document, change non-space token mark

Mirek Kratochvil 2025-07-18 15:31:55 +02:00
parent 6a2b2e3148
commit 44518ce946


@@ -1,20 +1,26 @@
 # werge (merge weird stuff)
-This is a partial work-alike of `diff3` and `git merge` and other merge-y tools
-that is capable of
-- merging token-size changes instead of line-size ones
-- largely ignoring changes in blank characters
+This is a partial work-alike of `diff3`, `patch`, `git merge` and other merge-y
+tools that is capable of:
+- merging token-size changes (words, identifiers, sentences) instead of
+  line-size ones
+- merging changes in blank characters separately or ignoring them altogether
 These properties are great for several use-cases:
-- merging free-flowing text changes (such as in TeX) irrespective of line breaks
-  etc,
-- merging of change sets that use different code formatters
+- combining changes in free-flowing text (such as in TeX or Markdown),
+  irrespectively of changed line breaks, paragraph breaking and justification,
+  etc.
+- merging of code formatted with different code formatters
 - minimizing the conflict size of tiny changes to a few characters, making them
   easier to resolve
+Separate `diff`&`patch` functionality is provided too for sending
+token-granularity patches. (The patches are similar to what `git diff
+--word-diff` produces, but can be applied to files.)
 ## Demo
 Original (`old` file):
@@ -85,21 +91,22 @@ type. This choice trades off some merge quality for (a lot of) complexity.
 Tokenizers are simple, implementable as linear scanners that print separate
 tokens on individual lines that are prefixed with a space mark (`.` for space
-and `|` for non-space), and also escape newlines and backslashes. A default
+and `/` for non-space), and also escape newlines and backslashes. A default
 tokenization of string "hello \ world" with a new line at the end is listed
 below (note the invisible space on the lines with dots):
 ```
-|hello
+/hello
 .
-|\\
+/\\
 .
-|world
+/world
 .\n
 ```
-Users may supply any tokenizer via option `-F`, e.g. this script makes
-line-size tokens (reproducing the usual line merges):
+Users may supply any tokenizer via option `-F`. The script below produces
+line-size tokens for demonstration (in turn, `werge` will do the usual line
+merges), and can be used e.g. via `-F ./tokenize.py`:
 ```py
 #!/usr/bin/env python3
@@ -107,9 +114,9 @@ import sys
 for l in sys.stdin.readlines():
     if len(l)==0: continue
     if l[-1]=='\n':
-        print('|'+l[:-1].replace('\\','\\\\')+'\\n')
+        print('/'+l[:-1].replace('\\','\\\\')+'\\n')
     else:
-        print('|'+l.replace('\\','\\\\'))
+        print('/'+l.replace('\\','\\\\'))
 ```
 ## Installation
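The token stream format touched by this commit (one token per line, marked with `.` for space tokens and `/` for non-space tokens, with backslashes and newlines escaped) can be illustrated with a small Python sketch. It assumes a token is simply a maximal run of whitespace or of non-whitespace characters; werge's built-in default tokenizer may split text differently (for example, separating punctuation from identifiers). The names `tokenize` and `untokenize` are hypothetical and not part of werge.

```python
import re

# Assumption for illustration: a token is a maximal run of whitespace
# (group 1) or of non-whitespace characters (group 2).
TOKEN = re.compile(r'(\s+)|(\S+)')


def escape(tok: str) -> str:
    # Escape backslashes first, so the backslash of a newly inserted
    # "\n" escape is not doubled afterwards.
    return tok.replace('\\', '\\\\').replace('\n', '\\n')


def tokenize(text: str) -> list[str]:
    """Hypothetical helper: one output line per token, prefixed with its mark."""
    lines = []
    for m in TOKEN.finditer(text):
        space, word = m.groups()
        if space is not None:
            lines.append('.' + escape(space))  # space token
        else:
            lines.append('/' + escape(word))   # non-space token
    return lines


def untokenize(lines: list[str]) -> str:
    """Hypothetical inverse: drop the marks and undo the escapes."""
    out = []
    for line in lines:
        body = line[1:]  # strip the "." or "/" mark
        i = 0
        while i < len(body):
            if body[i] == '\\' and i + 1 < len(body):
                # "\n" decodes to a newline, "\\" to a single backslash
                out.append('\n' if body[i + 1] == 'n' else body[i + 1])
                i += 2
            else:
                out.append(body[i])
                i += 1
    return ''.join(out)
```

For the README example, `tokenize("hello \\ world\n")` returns the same six lines as the listing above (`/hello`, a dot followed by a space, `/\\`, another dot and space, `/world`, and `.\n`), and `untokenize` restores the original string. A `-F` script built on this sketch would read standard input and print each element of `tokenize(...)` on its own line.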