| author | Mirek Kratochvil <exa.exa@gmail.com> | 2025-07-18 15:31:55 +0200 |
|---|---|---|
| committer | Mirek Kratochvil <exa.exa@gmail.com> | 2025-07-18 15:31:55 +0200 |
| commit | 44518ce94659a98527c606c5a3ddc52306f4105a (patch) | |
| tree | c4924aa2fdd00a6ca1f4e4e640441d5dc811ae62 /README.md | |
| parent | 6a2b2e314870468329d3653093bda404feb0c121 (diff) | |
| download | werge-44518ce94659a98527c606c5a3ddc52306f4105a.tar.gz werge-44518ce94659a98527c606c5a3ddc52306f4105a.tar.bz2 | |
document, change non-space token mark
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 37 |
1 files changed, 22 insertions, 15 deletions
````diff
@@ -1,20 +1,26 @@
 # werge (merge weird stuff)
 
-This is a partial work-alike of `diff3` and `git merge` and other merge-y tools
-that is capable of
+This is a partial work-alike of `diff3`, `patch`, `git merge` and other merge-y
+tools that is capable of:
 
-- merging token-size changes instead of line-size ones
-- largely ignoring changes in blank characters
+- merging token-size changes (words, identifiers, sentences) instead of
+  line-size ones
+- merging changes in blank characters separately or ignoring them altogether
 
 These properties are great for several use-cases:
 
-- merging free-flowing text changes (such as in TeX) irrespective of line breaks
-  etc,
-- merging of change sets that use different code formatters
+- combining changes in free-flowing text (such as in TeX or Markdown),
+  irrespectively of changed line breaks, paragraph breaking and justification,
+  etc.
+- merging of code formatted with different code formatters
 - minimizing the conflict size of tiny changes to a few characters, making them
   easier to resolve
 
+Separate `diff`&`patch` functionality is provided too for sending
+token-granularity patches. (The patches are similar to what `git diff
+--word-diff` produces, but can be applied to files.)
+
 ## Demo
 
 Original (`old` file):
 
@@ -85,21 +91,22 @@ type. This choice trades off some merge quality for (a lot of) complexity.
 
 Tokenizers are simple, implementable as linear scanners that print separate
 tokens on individual lines that are prefixed with a space mark (`.` for space
-and `|` for non-space), and also escape newlines and backslashes. A default
+and `/` for non-space), and also escape newlines and backslashes. A default
 tokenization of string "hello \ world" with a new line at the end is listed
 below (note the invisible space on the lines with dots):
 
 ```
-|hello
+/hello
 .
-|\\
+/\\
 .
-|world
+/world
 .\n
 ```
 
-Users may supply any tokenizer via option `-F`, e.g. this script makes
-line-size tokens (reproducing the usual line merges):
+Users may supply any tokenizer via option `-F`. The script below produces
+line-size tokens for demonstration (in turn, `werge` will do the usual line
+merges), and can be used e.g. via `-F ./tokenize.py`:
 
 ```py
 #!/usr/bin/env python3
@@ -107,9 +114,9 @@ import sys
 for l in sys.stdin.readlines():
     if len(l)==0: continue
     if l[-1]=='\n':
-        print('|'+l[:-1].replace('\\','\\\\')+'\\n')
+        print('/'+l[:-1].replace('\\','\\\\')+'\\n')
     else:
-        print('|'+l.replace('\\','\\\\'))
+        print('/'+l.replace('\\','\\\\'))
 ```
 
 ## Installation
````
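For context, a minimal sketch of a tokenizer that emits the token-stream format the patched README describes (one token per line, `.` marking a whitespace token and `/` a non-space token, with backslashes and newlines escaped). This is an illustrative approximation written for this page, assuming the default tokenization simply splits the input into maximal whitespace/non-whitespace runs as in the README's "hello \ world" example; it is not the tokenizer shipped with `werge`:

```py
#!/usr/bin/env python3
# Illustrative sketch (not werge's built-in tokenizer): split stdin into
# maximal runs of whitespace and non-whitespace characters, and print one
# token per line with the mark described above ('.' = space, '/' = non-space).
import re
import sys

def esc(tok):
    # Escape backslashes first, then newlines, matching the README example.
    return tok.replace('\\', '\\\\').replace('\n', '\\n')

for tok in re.findall(r'\s+|\S+', sys.stdin.read()):
    mark = '.' if tok[0].isspace() else '/'
    print(mark + esc(tok))
```

On the README's example input (`printf 'hello \\ world\n'`), this sketch should print the same six token lines as the default tokenization shown in the diff above.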
