# werge (merge weird stuff)

This is a partial work-alike of `diff3`, `patch`, `git merge` and other merge-y tools that is capable of:

- merging token-size changes (words, identifiers, sentences) instead of line-size ones
- merging changes in blank characters separately or ignoring them altogether

These properties are great for several use-cases:

- combining changes in free-flowing text (such as in TeX or Markdown), irrespective of changed line breaks, paragraph breaking and justification, etc.
- merging of code formatted with different code formatters
- minimizing the conflict size of tiny changes to a few characters, making them easier to resolve

Separate `diff`&`patch` functionality is provided too for sending token-granularity patches. (The patches are similar to what `git diff --word-diff` produces, but can be applied to files.)

## Demo

Original (`old` file):
```
Roses are red. Violets are blue. Patch is quite hard. I cannot rhyme.
```

Local changes (`my` file):
```
Roses are red. Violets are blue. Patching is hard. I still cannot rhyme.
```

Remote changes (`your` file):
```
Roses are red. Violets are blue. Patch is quite hard. I cannot do verses.
```

Token-merged version with `werge merge my old your` (conflicts on the space change that is too close to the disappearing "still" token):
```
Roses are red. Violets are blue. Patching is hard.<<<<< I still||||| I===== I>>>>> cannot do verses.
```

(NOTE: option `-G` gives nicely colored output that is much easier to read.)

Token-merged version with separate space resolution using `-s` (conflicts get fixed separately):
```
Roses are red. Violets are blue. Patching is hard. I still cannot do verses.
```

A harder-conflicting file (`theirs`):
```
Roses are red. Violets are blue. Merging is quite hard. I cannot do verses.
```

`werge merge my old theirs -s` highlights the actual unmergeable change:
```
Roses are red. Violets are blue. <<<<<Patching|||||Patch=====Merging>>>>> is hard. I still cannot do verses.
```

## How does it work?

- Instead of lines, the files are torn into small tokens (words, spaces, symbols, ...) and these are diffed and merged individually.
- Some tokens are marked as spaces by the tokenizer, which allows the merge algorithm to be (selectively) more zealous when resolving conflicts on these.

Technically, the ideas are similar to [`spiff`](http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/) or `git diff --word-diff`. Other tools exist, such as [`difftastic`](https://difftastic.wilfred.me.uk/) and [`mergiraf`](https://mergiraf.org/), that are aware of the file structure (i.e., the actual syntax _tree_) and use it to improve the output. Compared to these, **`werge` is completely oblivious to the actual file structure**, and thus works quite well on any file type. This choice gives up some diff&merge quality in exchange for (a lot) less complexity.

Tokenizers in `werge` are simple: they can be implemented as linear scanners that print each token on its own line, prefixed with a space mark (`.` for space and `/` for non-space), with newlines and backslashes escaped. The default tokenization of the string "hello \ world" with a newline at the end is listed below (note the invisible space on the lines with dots):

```
/hello
. 
/\\
. 
/world
.\n
```

### Custom tokenizers

Users may supply any tokenizer via option `-F`. The script below produces line-size tokens for demonstration (in turn, `werge` will do the usual line merges), and can be used e.g. via `-F ./tokenize.py`:

```py
#!/usr/bin/env python3
import sys
for l in sys.stdin.readlines():
    if len(l)==0: continue
    if l[-1]=='\n':
        # full line: escape backslashes and emit the newline as an escaped `\n`
        print('/'+l[:-1].replace('\\','\\\\')+'\\n')
    else:
        # last line without a trailing newline
        print('/'+l.replace('\\','\\\\'))
```

### History

I previously made an attempt to solve this in the `adiff` software, which failed because the approach was too complex.

Before that, the issue was tackled by Arek Antoniewicz at MFF CUNI, who used regex-edged DFAs (REDFAs) to construct user-specifiable tokenizers in a pretty cool way.

## Installation

```sh
cabal install
```

Running `werge` requires a working installation of `diff` compatible with the one from [GNU diffutils](https://www.gnu.org/software/diffutils/). You may set the path to such a `diff` (or a wrapper script) via the environment variable `WERGE_DIFF`.

## Use with `git`

`werge` can automatically process files that are marked in `git` as merge conflicts:

```sh
$ git merge somebranch
$ werge git -ua
```

With options `-ua` (`--unmerged --add`), `werge` finds all files that are marked as unmerged, tries to merge them token-by-token, and runs `git add` on each file whose merge succeeds with the current settings. The current contents of the files are replaced by the merged (or partially merged) state; backups are written automatically to `filename.werge-backup`.

## Current `--help` and features

```
werge -- blanks-friendly mergetool for tiny interdwindled changes

Usage: werge [(-F|--tok-filter FILTER) | (-i|--simple-tokens) |
               (-I|--full-tokens)] [--no-zeal | (-z|--zeal)]
             [-S|--space (keep|my|old|your)]
             [-s | --resolve-space (normal|keep|my|old|your)]
             [--conflict-space-overlaps] [--conflict-space-separate]
             [--conflict-space-all] [-C|--expand-context N]
             [--resolve (keep|my|old|your)] [--conflict-overlaps]
             [--conflict-separate] [--conflict-all] [-G|--color]
             [--label-start "<<<<<"] [--label-mo "|||||"]
             [--label-diff "|||||"] [--label-oy "====="]
             [--label-end ">>>>>"] COMMAND

Available options:
  -F,--tok-filter FILTER   External program to separate the text to tokens
  -i,--simple-tokens       Use wider character class to separate the tokens
                           (results in larger tokens and ignores case)
  -I,--full-tokens         Separate characters by all known character classes
                           (default)
  --no-zeal                avoid zealous mode (default)
  -z,--zeal                Try to zealously minify conflicts, potentially
                           resolving them
  -S,--space (keep|my|old|your)
                           Retain spacing from a selected version, or keep all
                           space changes for merging (default: keep)
  -s                       Shortcut for `--resolve-space keep' (this separates
                           space-only conflicts,
                           enabling better automated resolution)
  --resolve-space (normal|keep|my|old|your)
                           Resolve conflicts in space-only tokens separately,
                           and either keep unresolved conflicts, or resolve in
                           favor of a given version; `normal' resolves the
                           spaces together with other tokens, ignoring choices
                           in --resolve-space-* (default: normal)
  --conflict-space-overlaps
                           Never resolve overlapping changes in space-only
                           tokens
  --conflict-space-separate
                           Never resolve separate (non-overlapping) changes in
                           space-only tokens
  --conflict-space-all     Never resolve any changes in space-only tokens
  -C,--expand-context N    Consider changes that are at less than N tokens apart
                           to be a single change; 0 turns off conflict
                           expansion, 1 may cause bad resolutions of near
                           conflicting edits (default: 2)
  --resolve (keep|my|old|your)
                           Resolve general conflicts in favor of a given
                           version, or keep the conflicts (default: keep)
  --conflict-overlaps      Never resolve overlapping changes in general tokens
  --conflict-separate      Never resolve separate (non-overlapping) changes in
                           general tokens
  --conflict-all           Never resolve any changes in general tokens
  -G,--color               Use shorter, gaily colored output markers by default
                           (requires ANSI color support; good for terminals or
                           `less -R')
  --label-start "<<<<<"    Label for beginning of the conflict
  --label-mo "|||||"       Separator of local edits and original
  --label-diff "|||||"     Separator for old and new version
  --label-oy "====="       Separator of original and other people's edits
  --label-end ">>>>>"      Label for end of the conflict
  -h,--help                Show this help text
  --version                Show version information

Available commands:
  merge                    diff3-style merge of two changesets
  git                      Automerge unmerged files in git conflict
  diff                     Find differences between two files
  patch                    Apply a patch from `diff' to file
  break                    Break text to tokens
  glue                     Glue tokens back to text

werge is a free software, use it accordingly.
```

#### Manual merging

```
Usage: werge merge MYFILE OLDFILE YOURFILE

  diff3-style merge of two changesets

Available options:
  MYFILE                   Version with local edits
  OLDFILE                  Original file version
  YOURFILE                 Version with other people's edits
  -h,--help                Show this help text
```

#### Git interoperability

```
Usage: werge git (UNMERGED | (-u|--unmerged)) [(-a|--add) | --no-add]

  Automerge unmerged files in git conflict

Available options:
  UNMERGED                 Unmerged file tracked by git (can be specified
                           repeatedly)
  -u,--unmerged            Process all files marked as unmerged by git
  -a,--add                 Run `git add' for fully merged files
  --no-add                 Prevent running `git add'
  -h,--help                Show this help text
```

#### Finding differences

```
Usage: werge diff OLDFILE YOURFILE
                  [(-u|--unified) | (-U|--unified-size ARG) | (-m|--merge)]

  Find differences between two files

Available options:
  OLDFILE                  Original file version
  YOURFILE                 File version with changes
  -u,--unified             Produce unified-diff-like output for `patch' with
                           default context size (20)
  -U,--unified-size ARG    Produce unified diff with this context size
  -m,--merge               Highlight the differences as with `merge' (default)
  -h,--help                Show this help text
```

#### Patching files in place

```
Usage: werge patch (MYFILE | (-f|--format)) [-p|--patch PATCH]

  Modify a file using a patch from `diff'

Available options:
  MYFILE                   File to be patched
  -f,--format              Do not patch anything, only format the patch using
                           conflict marks on joined tokens
  -p,--patch PATCH         File with the patch (default: stdin)
  -h,--help                Show this help text
```

#### Converting between files and tokens

Both commands work as plain stdin-to-stdout filters:

```
Usage: werge break

  Break text to tokens
```

```
Usage: werge glue

  Glue tokens back to text
```
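
To illustrate the token format that `break` and `glue` exchange, here is a minimal Python sketch of the round trip. It is not `werge`'s implementation: the names `escape`, `unescape`, `break_simple` and `glue` are hypothetical, and `break_simple` only splits on whitespace versus non-whitespace runs, which is an assumption that is much coarser than the default character-class tokenizer. The escaping follows the format described in "How does it work?" (a `.` or `/` mark, then the token with backslashes and newlines escaped).

```python
import re


def escape(text):
    # Escape backslashes first, so the backslash added for "\n" is not doubled.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(text):
    # Undo escape(): "\\n" becomes a newline, "\\\\" becomes one backslash.
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def break_simple(text):
    # Hypothetical tokenizer (assumption): alternating runs of whitespace
    # and non-whitespace, marked "." and "/" respectively.
    tokens = []
    for match in re.finditer(r"\s+|\S+", text):
        run = match.group(0)
        mark = "." if run.isspace() else "/"
        tokens.append(mark + escape(run))
    return tokens


def glue(token_lines):
    # Drop the one-character space mark, unescape, and concatenate.
    return "".join(unescape(line[1:]) for line in token_lines)


if __name__ == "__main__":
    sample = "hello \\ world\n"
    for token in break_simple(sample):
        print(token)
    assert glue(break_simple(sample)) == sample
```

For the "hello \ world" input this prints the same six token lines as the listing in "How does it work?"; on other inputs (punctuation, mixed character classes) the real tokenizer will split more finely than this sketch.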