document, change non-space token mark

parent 6a2b2e3148
commit 44518ce946

README.md | 37
````diff
@@ -1,20 +1,26 @@
 # werge (merge weird stuff)
 
-This is a partial work-alike of `diff3` and `git merge` and other merge-y tools
-that is capable of
+This is a partial work-alike of `diff3`, `patch`, `git merge` and other merge-y
+tools that is capable of:
 
-- merging token-size changes instead of line-size ones
-- largely ignoring changes in blank characters
+- merging token-size changes (words, identifiers, sentences) instead of
+  line-size ones
+- merging changes in blank characters separately or ignoring them altogether
 
 These properties are great for several use-cases:
 
-- merging free-flowing text changes (such as in TeX) irrespective of line breaks
-  etc,
-- merging of change sets that use different code formatters
+- combining changes in free-flowing text (such as in TeX or Markdown),
+  irrespectively of changed line breaks, paragraph breaking and justification,
+  etc.
+- merging of code formatted with different code formatters
+- minimizing the conflict size of tiny changes to a few characters, making them
+  easier to resolve
 
 Separate `diff`&`patch` functionality is provided too for sending
 token-granularity patches. (The patches are similar to what `git diff
 --word-diff` produces, but can be applied to files.)
 
 ## Demo
 
 Original (`old` file):
````
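The token-size diffing that the hunk above advertises can be illustrated with a short sketch. This is plain Python `difflib` over blank/non-blank token runs, not werge's own algorithm: comparing tokens instead of whole lines reports only the few tokens that actually changed.

```python
# Sketch of token-granularity diffing (illustration only, not werge's
# code): compare runs of blank / non-blank characters with difflib
# instead of comparing whole lines.
import difflib
import re

def tokens(s):
    # maximal runs of whitespace or non-whitespace characters
    return re.findall(r'\s+|\S+', s)

old = "The quick brown fox.\n"
new = "The quick red fox!\n"

changes = []
sm = difflib.SequenceMatcher(a=tokens(old), b=tokens(new))
for op, i1, i2, j1, j2 in sm.get_opcodes():
    if op != 'equal':
        changes.append((op, tokens(old)[i1:i2], tokens(new)[j1:j2]))

print(changes)
# only the two changed words are reported, not the entire line
```

A line-based diff of the same pair would flag the whole line as replaced; the token view is what keeps conflicts small.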
````diff
@@ -85,21 +91,22 @@ type. This choice trades off some merge quality for (a lot of) complexity.
 
 Tokenizers are simple, implementable as linear scanners that print separate
 tokens on individual lines that are prefixed with a space mark (`.` for space
-and `|` for non-space), and also escape newlines and backslashes. A default
+and `/` for non-space), and also escape newlines and backslashes. A default
 tokenization of string "hello \ world" with a new line at the end is listed
 below (note the invisible space on the lines with dots):
 
 ```
-|hello
+/hello
 . 
-|\\
+/\\
 . 
-|world
+/world
 .\n
 ```
 
-Users may supply any tokenizer via option `-F`, e.g. this script makes
-line-size tokens (reproducing the usual line merges):
+Users may supply any tokenizer via option `-F`. The script below produces
+line-size tokens for demonstration (in turn, `werge` will do the usual line
+merges), and can be used e.g. via `-F ./tokenize.py`:
 
 ```py
 #!/usr/bin/env python3
````
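The default token encoding described in the hunk above can be reproduced in a few lines of Python. This is a sketch, assuming a token is a maximal run of blank or non-blank characters; `tokenize` is a hypothetical helper name, not part of werge's interface.

```python
import re

def tokenize(text):
    # Sketch of the token encoding described above (assumption: a token
    # is a maximal run of blank or non-blank characters): '.' marks a
    # blank token, '/' a non-blank one. Backslashes are escaped before
    # newlines so the two escapes cannot interfere.
    lines = []
    for tok in re.findall(r'\s+|\S+', text):
        mark = '.' if tok.isspace() else '/'
        lines.append(mark + tok.replace('\\', '\\\\').replace('\n', '\\n'))
    return '\n'.join(lines)

print(tokenize('hello \\ world\n'))
```

Running it on `"hello \ world"` with a trailing newline reproduces the six token lines listed in the diff, invisible spaces included.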
````diff
@@ -107,9 +114,9 @@ import sys
 for l in sys.stdin.readlines():
     if len(l)==0: continue
     if l[-1]=='\n':
-        print('|'+l[:-1].replace('\\','\\\\')+'\\n')
+        print('/'+l[:-1].replace('\\','\\\\')+'\\n')
     else:
-        print('|'+l.replace('\\','\\\\'))
+        print('/'+l.replace('\\','\\\\'))
 ```
 
 ## Installation
````