| author | Mirek Kratochvil <exa.exa@gmail.com> | 2025-07-14 10:55:43 +0200 |
|---|---|---|
| committer | Mirek Kratochvil <exa.exa@gmail.com> | 2025-07-14 10:55:43 +0200 |
| commit | a8b38d647581b7bb6ae515b12bda81b7aae44fee | |
| tree | 6e4b3d8d4a339071043a195ae863ba66f89cb117 | |
| parent | 79977cdf4b9a2fcac4c47b458cccca101686da63 | |
doc
| -rw-r--r-- | README.md | 38 |
1 files changed, 37 insertions, 1 deletions
````diff
@@ -14,7 +14,41 @@ These properties are great for several use-cases:
 - minimizing the conflict size of tiny changes to a few characters, making them easier to resolve
 
-Better docs is WIP
+## How does it work?
+
+- Instead of lines, the files are torn to small tokens (words, spaces, symbols,
+  ...) and these are diffed and merged individually.
+- Some tokens are marked as spaces by the tokenizer, which allows the merge
+  algorithm to be (selectively) more zealous when resolving conflicts on these.
+
+Tokenizers are simple, implementable as linear scanners that print separate
+tokens on individual lines that are prefixed with a space mark (`.` for space
+and `|` for non-space), and also escape newlines and backslashes. A default
+tokenization of string "hello \ world" with a new line at the end is listed
+below (note the invisible space on the lines with dots):
+
+```
+|hello
+. 
+|\\
+. 
+|world
+.\n
+```
+
+Users may supply any tokenizer via option `-F`, e.g. this script makes
+line-size tokens (reproducing the usual line merges):
+
+```
+#!/usr/bin/env python3
+import sys
+for l in sys.stdin.readlines():
+  if len(l)==0: continue
+  if l[-1]=='\n':
+    print('|'+l[:-1].replace('\\','\\\\')+'\\n')
+  else:
+    print('|'+l.replace('\\','\\\\'))
+```
 
 ## Installation
 
@@ -74,3 +108,5 @@ Available commands:
 
 werge is a free software, use it accordingly.
 ```
+
+## External tokenizer
````
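The hunk above describes the tokenizer line format only in prose. Below is a minimal Python sketch of that format and its inverse, written under stated assumptions. The token classes used here are guesses picked so that the `hello \ world` example is reproduced, not werge's actual default rules. Runs of letters, digits and underscores count as words, maximal whitespace runs count as space tokens, and every other character is a one-character symbol. The names `split_tokens`, `escape`, `encode` and `decode` are hypothetical. `decode` undoes the format, which may help when checking that a custom tokenizer loses no characters.

```python
# Sketch of a werge-style tokenizer and its inverse. The token classes
# (word / whitespace run / single symbol) are an ASSUMPTION, chosen so that
# the README example is reproduced; werge's built-in rules may differ.


def split_tokens(text):
    """Yield (is_space, token) pairs that together cover `text` exactly."""
    i = 0
    while i < len(text):
        c = text[i]
        j = i + 1
        if c.isspace():
            # a maximal run of whitespace becomes one space token
            while j < len(text) and text[j].isspace():
                j += 1
            yield True, text[i:j]
        elif c.isalnum() or c == "_":
            # a maximal run of word characters becomes one word token
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            yield False, text[i:j]
        else:
            # any other character is a one-character symbol token
            yield False, c
        i = j


def escape(token):
    # Escape backslashes first so the backslash of "\n" is not doubled.
    return token.replace("\\", "\\\\").replace("\n", "\\n")


def encode(text):
    """One token per line: '.' marks a space token, '|' a non-space token."""
    return "".join(
        ("." if is_space else "|") + escape(tok) + "\n"
        for is_space, tok in split_tokens(text)
    )


def decode(stream):
    """Inverse of encode(): drop each mark and undo the two escapes."""
    out = []
    for line in stream.split("\n"):  # split on "\n" only, not splitlines()
        if not line:
            continue  # skip the empty string left after the final newline
        body, k = line[1:], 0
        while k < len(body):
            if body[k] == "\\" and k + 1 < len(body):
                out.append("\n" if body[k + 1] == "n" else body[k + 1])
                k += 2
            else:
                out.append(body[k])
                k += 1
    return "".join(out)
```

Under these assumptions, `encode("hello \\ world\n")` returns exactly the six lines listed in the README hunk, with the dot lines carrying a trailing space, and `decode` of that output gives the input back. The README's own `-F` example reads stdin and prints to stdout, so a wrapper around `encode` would presumably do the same. Reading `sys.stdin.buffer` and decoding the bytes, rather than using text-mode `sys.stdin`, would keep Python's newline translation from silently turning `\r\n` into `\n`.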
