# werge (merge weird stuff)

This is a partial work-alike of `diff3`, `patch`, `git merge` and other merge-y
tools that is capable of:

- merging token-size changes (words, identifiers, sentences) instead of
  line-size ones
- merging changes in blank characters separately or ignoring them altogether

These properties are great for several use-cases:

- combining changes in free-flowing text (such as in TeX or Markdown),
  irrespective of changed line breaks, paragraph breaking and justification,
  etc.
- merging of code formatted with different code formatters
- minimizing the conflict size of tiny changes to a few characters, making them
  easier to resolve

Separate `diff`&`patch` functionality is provided too for sending
token-granularity patches. (The patches are similar to what `git diff
--word-diff` produces, but can be applied to files.)

## Demo

Original (`old` file):
```
Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.
```

Local changes (`my` file):
```
Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.
```

Remote changes (`your` file):
```
Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.
```

Token-merged version with `werge merge my old your` (conflicts on the space
change that is too close to the added "still" token):
```
Roses are red.
Violets are blue.
Patching is hard.<<<<< I still||||| I=====
I>>>>> cannot do verses.
```
(NOTE: option `-G` gives nicely colored output that is much easier to read.)

Token-merged version with separate space resolution using `-s` (conflicts get
fixed separately):
```
Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.
```

A harder-conflicting file (`theirs`):
```
Roses are red.
Violets are blue.
Merging is quite hard.
I cannot do verses.
```

`werge merge my old theirs -s` highlights the actual unmergeable change:
```
Roses are red.
Violets are blue.
<<<<<Patching|||||Patch=====Merging>>>>> is hard.
I still cannot do verses.
```

## How does it work?

- Instead of lines, the files are split into small tokens (words, spaces,
  symbols, ...) and these are diffed and merged individually.
- Some tokens are marked as spaces by the tokenizer, which allows the merge
  algorithm to be (selectively) more zealous when resolving conflicts on these.

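To make the token-level idea concrete, here is a minimal sketch of a two-way token diff. It is an illustration only and not how `werge` is implemented: the `tokenize` regex is a hypothetical stand-in for the built-in tokenizer, and Python's `difflib` is used in place of the external `diff` program.

```python
import difflib
import re


def tokenize(text):
    # Runs of whitespace, runs of word characters, and single symbols.
    # Joining the tokens gives back the original text.
    return re.findall(r"\s+|\w+|[^\w\s]", text)


def token_diff(old, new):
    """Return (tag, old_tokens, new_tokens) for every changed token range."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "Patch is quite hard. I cannot rhyme."
my = "Patching is hard. I still cannot rhyme."
for change in token_diff(old, my):
    print(change)
```

Because the changes are reported per token, an edit to a single word stays a small change even when the surrounding line breaks move.
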
Technically, the ideas are similar to
[`spiff`](http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/) or `git diff
--word-diff`. Other tools, such as
[`difftastic`](https://difftastic.wilfred.me.uk/) and
[`mergiraf`](https://mergiraf.org/), are aware of the file structure (i.e.,
the actual syntax _tree_) and use it to improve their output. Compared to
these, **`werge` is completely oblivious to the actual file structure**, and
thus works quite well on any file type. This choice trades some diff&merge
quality for (a lot of) simplicity.

Tokenizers in `werge` are simple: they can be implemented as linear scanners
that print each token on its own line, prefixed with a space mark (`.` for
space and `/` for non-space), with newlines and backslashes escaped. The
default tokenization of the string "hello \ world" with a newline at the end is
listed below (note the invisible trailing space on the lines with dots):

```
/hello
. 
/\\
. 
/world
.\n
```

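The token-line format above can be sketched as a round trip in Python. This is a simplified illustration under stated assumptions: `to_token_lines` splits only on whitespace runs (the real default tokenizer separates more character classes), and all four function names are hypothetical helpers, not part of `werge`.

```python
import re


def escape(token):
    # Backslashes first, so the backslash of an escaped newline is not doubled.
    return token.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(token):
    # Undo escape() one pair at a time; assumes only \\ and \n escapes occur.
    return re.sub(
        r"\\(.)", lambda m: "\n" if m.group(1) == "n" else m.group(1), token
    )


def to_token_lines(text):
    # Whitespace runs become space tokens ('.'), everything else gets '/'.
    lines = []
    for token in re.findall(r"\s+|\S+", text):
        mark = "." if token.isspace() else "/"
        lines.append(mark + escape(token))
    return lines


def glue(lines):
    # Drop the one-character mark and unescape the rest.
    return "".join(unescape(line[1:]) for line in lines)


lines = to_token_lines("hello \\ world\n")
print("\n".join(lines))
assert glue(lines) == "hello \\ world\n"
```

For this input the sketch prints the same six token lines as the listing above, and gluing them restores the original string.
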
### Custom tokenizers

Users may supply any tokenizer via option `-F`. The script below produces
line-size tokens for demonstration (in turn, `werge` will do the usual line
merges), and can be used e.g. via `-F ./tokenize.py`:

```py
#!/usr/bin/env python3
import sys

# Emit every input line as a single non-space token (mark `/`).
for line in sys.stdin:
    if line.endswith('\n'):
        # The newline belongs to the token and is escaped as `\n`.
        print('/' + line[:-1].replace('\\', '\\\\') + '\\n')
    else:
        # Last line of a file that lacks a trailing newline.
        print('/' + line.replace('\\', '\\\\'))
```

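As a further illustration, the sketch below produces sentence-size tokens and marks the whitespace between sentences as space tokens. It is a hypothetical example rather than something shipped with `werge`: the sentence rule (whitespace that follows `.`, `!` or `?`) is a deliberately crude assumption, and `sentence_tokens` is a made-up name.

```python
import re


def escape(token):
    # Same escaping as in the token format: backslashes, then newlines.
    return token.replace("\\", "\\\\").replace("\n", "\\n")


def sentence_tokens(text):
    """Yield token lines; whitespace after '.', '!' or '?' is a space token."""
    # The capture group makes re.split() keep the separators in the result.
    for piece in re.split(r"((?<=[.!?])\s+)", text):
        if piece:
            mark = "." if piece.isspace() else "/"
            yield mark + escape(piece)


# Demo on a fixed string. To use this as a filter, replace the demo with
# a loop that prints sentence_tokens(sys.stdin.read()) line by line.
sample = "Roses are red. Violets are blue.\nI cannot rhyme.\n"
for line in sentence_tokens(sample):
    print(line)
```

With such a tokenizer, re-wrapping a paragraph only changes space tokens, while an edit inside a sentence is reported for that sentence as a whole.
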
### History

I previously made an attempt to solve this in the `adiff` software, which
failed because the approach was too complex. Before that, the issue was tackled
by Arek Antoniewicz at MFF CUNI, who used regex-edged DFAs (REDFAs) to
construct user-specifiable tokenizers in a pretty cool way.

## Installation

```sh
cabal install
```

Running `werge` requires a working installation of `diff` compatible
with the one from [GNU diffutils](https://www.gnu.org/software/diffutils/). You
may set a path to such `diff` (or a wrapper script) via the environment
variable `WERGE_DIFF`.

## Use with `git`

`werge` can automatically process files that are marked in `git` as merge
conflicts:

```sh
$ git merge somebranch
$ werge git -ua
```

With options `-ua` (`--unmerged --add`), `werge` finds all files that are
marked as unmerged, tries to merge them token-by-token, and runs `git add` on
those where the merge succeeds with the current settings. The current contents
of the files are replaced by the merged (or partially merged) state; backups
are written automatically to `filename.werge-backup`.

## Current `--help` and features

```
werge -- blanks-friendly mergetool for tiny interdwindled changes

Usage: werge [(-F|--tok-filter FILTER) | (-i|--simple-tokens) |
               (-I|--full-tokens)] [--no-zeal | (-z|--zeal)]
             [-S|--space (keep|my|old|your)]
             [-s | --resolve-space (normal|keep|my|old|your)]
             [--conflict-space-overlaps] [--conflict-space-separate]
             [--conflict-space-all] [-C|--expand-context N]
             [--resolve (keep|my|old|your)] [--conflict-overlaps]
             [--conflict-separate] [--conflict-all] [-G|--color]
             [--label-start "<<<<<"] [--label-mo "|||||"] [--label-diff "|||||"]
             [--label-oy "====="] [--label-end ">>>>>"] COMMAND

Available options:
  -F,--tok-filter FILTER   External program to separate the text to tokens
  -i,--simple-tokens       Use wider character class to separate the tokens
                           (results in larger tokens and ignores case)
  -I,--full-tokens         Separate characters by all known character classes
                           (default)
  --no-zeal                avoid zealous mode (default)
  -z,--zeal                Try to zealously minify conflicts, potentially
                           resolving them
  -S,--space (keep|my|old|your)
                           Retain spacing from a selected version, or keep all
                           space changes for merging (default: keep)
  -s                       Shortcut for `--resolve-space keep' (this separates
                           space-only conflicts, enabling better automated
                           resolution)
  --resolve-space (normal|keep|my|old|your)
                           Resolve conflicts in space-only tokens separately,
                           and either keep unresolved conflicts, or resolve in
                           favor of a given version; `normal' resolves the
                           spaces together with other tokens, ignoring choices
                           in --resolve-space-* (default: normal)
  --conflict-space-overlaps
                           Never resolve overlapping changes in space-only
                           tokens
  --conflict-space-separate
                           Never resolve separate (non-overlapping) changes in
                           space-only tokens
  --conflict-space-all     Never resolve any changes in space-only tokens
  -C,--expand-context N    Consider changes that are at less than N tokens apart
                           to be a single change; 0 turns off conflict
                           expansion, 1 may cause bad resolutions of near
                           conflicting edits (default: 2)
  --resolve (keep|my|old|your)
                           Resolve general conflicts in favor of a given
                           version, or keep the conflicts (default: keep)
  --conflict-overlaps      Never resolve overlapping changes in general tokens
  --conflict-separate      Never resolve separate (non-overlapping) changes in
                           general tokens
  --conflict-all           Never resolve any changes in general tokens
  -G,--color               Use shorter, gaily colored output markers by default
                           (requires ANSI color support; good for terminals or
                           `less -R')
  --label-start "<<<<<"    Label for beginning of the conflict
  --label-mo "|||||"       Separator of local edits and original
  --label-diff "|||||"     Separator for old and new version
  --label-oy "====="       Separator of original and other people's edits
  --label-end ">>>>>"      Label for end of the conflict
  -h,--help                Show this help text
  --version                Show version information

Available commands:
  merge                    diff3-style merge of two changesets
  git                      Automerge unmerged files in git conflict
  diff                     Find differences between two files
  patch                    Apply a patch from `diff' to file
  break                    Break text to tokens
  glue                     Glue tokens back to text

werge is a free software, use it accordingly.
```

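The `-C`/`--expand-context` description can be pictured as grouping nearby change ranges before conflicts are decided. The sketch below is one possible reading of that help text, not the actual implementation: it assumes changes are half-open `(start, end)` ranges of token indices and measures distance as the number of unchanged tokens between two ranges. `group_changes` is a hypothetical name.

```python
def group_changes(ranges, n):
    """Merge half-open (start, end) token ranges that are closer than n."""
    groups = []
    for start, end in sorted(ranges):
        if groups and start - groups[-1][1] < n:
            # Fewer than n unchanged tokens in between: extend the last group.
            groups[-1] = (groups[-1][0], max(groups[-1][1], end))
        else:
            groups.append((start, end))
    return groups


changes = [(2, 3), (4, 4), (10, 12)]
print(group_changes(changes, 2))  # -> [(2, 4), (10, 12)]
print(group_changes(changes, 0))  # -> [(2, 3), (4, 4), (10, 12)]
```

Under this reading, the default of 2 joins two edits separated by a single unchanged token, which matches the demo where a space change next to the word "I" conflicts with an edit right after it.
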
#### Manual merging
```
Usage: werge merge MYFILE OLDFILE YOURFILE

  diff3-style merge of two changesets

Available options:
  MYFILE                   Version with local edits
  OLDFILE                  Original file version
  YOURFILE                 Version with other people's edits
  -h,--help                Show this help text
```

#### Git interoperability
```
Usage: werge git (UNMERGED | (-u|--unmerged)) [(-a|--add) | --no-add]

  Automerge unmerged files in git conflict

Available options:
  UNMERGED                 Unmerged file tracked by git (can be specified
                           repeatedly)
  -u,--unmerged            Process all files marked as unmerged by git
  -a,--add                 Run `git add' for fully merged files
  --no-add                 Prevent running `git add'
  -h,--help                Show this help text
```

#### Finding differences
```
Usage: werge diff OLDFILE YOURFILE
                  [(-u|--unified) | (-U|--unified-size ARG) | (-m|--merge)]

  Find differences between two files

Available options:
  OLDFILE                  Original file version
  YOURFILE                 File version with changes
  -u,--unified             Produce unified-diff-like output for `patch' with
                           default context size (20)
  -U,--unified-size ARG    Produce unified diff with this context size
  -m,--merge               Highlight the differences as with `merge' (default)
  -h,--help                Show this help text
```

#### Patching files in place
```
Usage: werge patch (MYFILE | (-f|--format)) [-p|--patch PATCH]

  Modify a file using a patch from `diff'

Available options:
  MYFILE                   File to be patched
  -f,--format              Do not patch anything, only format the patch using
                           conflict marks on joined tokens
  -p,--patch PATCH         File with the patch (default: stdin)
  -h,--help                Show this help text
```

#### Converting between files and tokens

Both commands work as plain stdin-to-stdout filters:

```
Usage: werge break

  Break text to tokens
```

```
Usage: werge glue

  Glue tokens back to text
```