# werge (merge weird stuff)

This is a partial work-alike of `diff3`, `patch`, `git merge` and other merge-y tools that is capable of:

- merging token-size changes (words, identifiers, sentences) instead of line-size ones
- merging changes in blank characters separately or ignoring them altogether

These properties are great for several use-cases:

- combining changes in free-flowing text (such as in TeX or Markdown), irrespective of changed line breaks, paragraph breaking and justification, etc.
- merging of code formatted with different code formatters
- minimizing the conflict size of tiny changes to a few characters, making them easier to resolve

Separate `diff`&`patch` functionality is provided too for sending token-granularity patches. (The patches are similar to what `git diff --word-diff` produces, but can be applied to files.)

## Installation

- To build from source, clone the repo and run `cabal install` in the directory (you need [a way to compile Haskell](https://www.haskell.org/downloads/)).
- [Releases](https://github.com/exaexa/werge/releases) come with prebuilt binaries that you may download and run as-is on many Linuxes and Macs.

Running `werge` requires working installations of `diff` and `patch` compatible with the ones from [GNU diffutils](https://www.gnu.org/software/diffutils/):

- Most Linux distributions contain the correct diffutils
- On BSDs you should be able to install these from Ports ([FreeBSD](https://cgit.freebsd.org/ports/tree/textproc/diffutils), [OpenBSD](https://openports.eu/ports/textproc/gdiff))
- On Macs, install [diffutils from brew](https://formulae.brew.sh/formula/diffutils).

In any other case, you may set up a path to any compatible `diff` and `patch` (or suitable wrapper scripts) via environment variables `WERGE_DIFF` and `WERGE_PATCH`. (If required, the same applies for `WERGE_GIT`.)

### Editor integration

There's a `vim` syntax highlighting file in `vim/werge.vim`.
To install, simply copy it to your local `vim` syntax configuration directory (usually to `~/.vim/syntax/werge.vim`). Then, you can activate the syntax in vim with:

```
:set syn=werge
```

## Demo

##### Original (`old` file):

```
Roses are red. Violets are blue. Patch is quite hard. I cannot rhyme.
```

##### Local changes (`my` file):

```
Roses are red. Violets are blue. Patching is hard. I still cannot rhyme.
```

##### Remote changes (`your` file):

```
Roses are red. Violets are blue. Patch is quite hard. I cannot do verses.
```

##### Token-merged version

This is produced with `werge merge my old your` (conflicts on the space change that is too close to the disappearing "still" token):

```
Roses are red. Violets are blue. Patching is hard.<<<<< I still||||| I===== I>>>>> cannot do verses.
```

(NOTE: option `-G` gives nicely colored output that is much easier to read. Alternatively you can install the syntax highlighting for `vim`.)

##### Merge with separate space resolution

Adding option `-s` to `werge merge` causes it to resolve space conflicts separately, usually helping many cases that would be easily resolvable by a human:

```
Roses are red. Violets are blue. Patching is hard. I still cannot do verses.
```

##### Mixing in an unresolvable conflict

A harder-conflicting file (`their`):

```
Roses are red. Violets are blue. Merging is quite hard. I cannot do verses.
```

`werge merge my old their -s` highlights the actual unmergeable change:

```
Roses are red. Violets are blue. <<<<<Patching|||||Patch=====Merging>>>>> is hard. I still cannot do verses.
```

## How does it work?

- Instead of lines, the files are torn into small tokens (words, spaces, symbols, ...) and these are diffed and merged individually.
- Some tokens are marked as spaces by the tokenizer, which allows the merge algorithm to be (selectively) more zealous when resolving conflicts on these.

Technically, the ideas are similar to [`spiff`](http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/) or `git diff --word-diff`.
Other tools exist, such as [`difftastic`](https://difftastic.wilfred.me.uk/) and [`mergiraf`](https://mergiraf.org/), that are aware of the file structure (i.e., the actual syntax _tree_), which they can use to improve output. Compared to these, **`werge` is completely oblivious about the actual file structure**, and thus works quite well on any file type. This choice gives up some diff&merge quality in exchange for (a lot) less complexity.

Tokenizers in `werge` are simple, implementable as linear scanners that print separate tokens on individual lines that are prefixed with a space mark (`.` for space and `/` for non-space), and escape newlines and backslashes. A default tokenization of string "hello \ world" with a new line at the end is listed below (note the invisible space on the lines with dots):

```
/hello
. 
/\\
. 
/world
.\n
```

### Custom tokenizers

Users may supply any tokenizer via option `-F`. The script below produces line-size tokens for demonstration (in turn, `werge` will do the usual line merges), and can be used e.g. via `-F ./tokenize.py`:

```py
#!/usr/bin/env python3
import sys

for line in sys.stdin.readlines():
    if len(line) == 0:
        continue
    if line[-1] == '\n':
        # a complete line: escape backslashes, then emit the newline as "\n"
        print('/' + line[:-1].replace('\\', '\\\\') + '\\n')
    else:
        # the last line of the input has no trailing newline
        print('/' + line.replace('\\', '\\\\'))
```

### History

I previously made an attempt to solve this in `adiff` software, which failed because the approach was too complex.

Before that, the issue was tackled by Arek Antoniewicz at MFF CUNI, who used regex-edged DFAs (REDFAs) to construct user-specifiable tokenizers in a pretty cool way.

## Integration with `git`

### Automerging conflicts

`werge` can automatically process files that are marked in `git` as merge conflicts:

```sh
$ git merge somebranch
$ werge git -ua
```

With options `-ua` (`--unmerged --add`), `werge` finds all files that are marked as unmerged, tries to merge them token-by-token, and, if the merge is successful with the current settings, runs `git add` on them.
The current changes in the files are replaced by the merged (or partially merged) state; backups are written automatically to `filename.werge-backup`.

Optionally, you can specify exact files to be automerged. That is useful for cases when only some of the conflicting files should be processed by `werge`:

```sh
$ werge git my/conflicting/file.txt
```

Support for merging complex types of changes (deletes, directory moves, symlinks, ...) via this interface is currently limited. `werge` can be used as a mergetool or a merge driver to ameliorate that.

### Use as `git difftool` and `git mergetool`

The `git` config below allows direct use of `werge` as `git difftool -t werge` and `git mergetool -t werge`:

```ini
[difftool "werge"]
    cmd = werge diff -G $LOCAL $REMOTE
[mergetool "werge"]
    cmd = werge merge $LOCAL $BASE $REMOTE > $MERGED
    trustExitCode = true
# variant for separate resolution of space (solves more conflicts):
[mergetool "spacewerge"]
    cmd = werge merge -s $LOCAL $BASE $REMOTE > $MERGED
    trustExitCode = true
```

One issue with `git` mergetools is that they are supposed to be interactive, and thus `git` expects them to always produce a completely merged, conflictless result. In turn, if the auto-merging with `git mergetool -t werge` fails with conflicts, `git` assumes a complete failure and restores the original version from the backup. To enable a more useful behavior, use `werge` as a merge driver (see below).

### Use as a `git` merge driver

Add this to your git config:

```ini
[merge "werge"]
    name = werge
    driver = werge merge %A %O %B > %P
    recursive = binary
```

Then, specify that the "werge" driver should be used for certain files in your repository's `.gitattributes`:

```
*.md merge=werge
*.tex merge=werge
# ... etc
```

With this in place, `git merge` will automatically run `werge` to merge the marked files in the repository.
On conflict, you will have the files marked with the usual (werge's usual) conflict markers, and you will be able to resolve them just as with the normal merging workflow.

**Hint:** As with `spacewerge` mergetool above, it is beneficial to add a few conflict-resolving options such as `-s` to the `driver`, in order to help the automerges pass nicely.

### Use with `git rebase`

The merge driver and mergetools as configured above will also automatically work with `git rebase` that runs in the "merge mode" (which is the default).

As a possible source of confusion, the "my" and "your" versions are somewhat swapped (as implied by semantics):

- With `git checkout mybranch; git merge otherbranch`, the conflicts will look roughly like this:
  ```
  <<<<< mybranch version ||||| merge base ===== otherbranch version >>>>>
  ```
- With `git checkout mybranch; git rebase otherbranch`, the logic is reversed:
  ```
  <<<<< otherbranch version ||||| common base ===== mybranch version >>>>>
  ```

## Current `--help` and features

```
werge -- blanks-friendly mergetool for tiny interdwindled changes

Usage: werge [(-F|--tok-filter FILTER) | (-i|--simple-tokens) |
               (-I|--full-tokens)] [--no-zeal | (-z|--zeal)]
             [-S|--space (keep|my|old|your)]
             [-s | --resolve-space (normal|keep|my|old|your)]
             [--conflict-space-overlaps] [--conflict-space-separate]
             [--conflict-space-all] [-C|--expand-context N]
             [--resolve (keep|my|old|your)] [--conflict-overlaps]
             [--conflict-separate] [--conflict-all] [-G|--color]
             [--label-start "<<<<<"] [--label-mo "|||||"]
             [--label-diff "|||||"] [--label-oy "====="]
             [--label-end ">>>>>"] COMMAND

Available options:
  -F,--tok-filter FILTER   External program to separate the text to tokens
  -i,--simple-tokens       Use wider character class to separate the tokens
                           (results in larger tokens and ignores case)
  -I,--full-tokens         Separate characters by all known character classes
                           (default)
  --no-zeal                avoid zealous mode (default)
  -z,--zeal                Try to zealously minify conflicts, potentially
                           resolving them
  -S,--space (keep|my|old|your)
                           Retain spacing from a selected version, or keep all
                           space changes for merging (default: keep)
  -s                       Shortcut for `--resolve-space keep' (this separates
                           space-only conflicts, enabling better automated
                           resolution)
  --resolve-space (normal|keep|my|old|your)
                           Resolve conflicts in space-only tokens separately,
                           and either keep unresolved conflicts, or resolve in
                           favor of a given version; `normal' resolves the
                           spaces together with other tokens, ignoring choices
                           in --resolve-space-* (default: normal)
  --conflict-space-overlaps
                           Never resolve overlapping changes in space-only
                           tokens
  --conflict-space-separate
                           Never resolve separate (non-overlapping) changes in
                           space-only tokens
  --conflict-space-all     Never resolve any changes in space-only tokens
  -C,--expand-context N    Consider changes that are at less than N tokens apart
                           to be a single change; 0 turns off conflict
                           expansion, 1 may cause bad resolutions of near
                           conflicting edits (default: 2)
  --resolve (keep|my|old|your)
                           Resolve general conflicts in favor of a given
                           version, or keep the conflicts (default: keep)
  --conflict-overlaps      Never resolve overlapping changes in general tokens
  --conflict-separate      Never resolve separate (non-overlapping) changes in
                           general tokens
  --conflict-all           Never resolve any changes in general tokens
  -G,--color               Use shorter, gaily colored output markers by default
                           (requires ANSI color support; good for terminals or
                           `less -R')
  --label-start "<<<<<"    Label for beginning of the conflict
  --label-mo "|||||"       Separator of local edits and original
  --label-diff "|||||"     Separator for old and new version
  --label-oy "====="       Separator of original and other people's edits
  --label-end ">>>>>"      Label for end of the conflict
  -h,--help                Show this help text
  --version                Show version information

Available commands:
  merge                    diff3-style merge of two changesets
  git                      Automerge unmerged files in git conflict
  diff                     Find differences between two files
  patch                    Apply a patch from `diff' to file
  break                    Break text to tokens
  glue                     Glue tokens back to text

werge is a free software, use it accordingly.
```

#### Manual merging

```
Usage: werge merge MYFILE OLDFILE YOURFILE

  diff3-style merge of two changesets

Available options:
  MYFILE                   Version with local edits
  OLDFILE                  Original file version
  YOURFILE                 Version with other people's edits
  -h,--help                Show this help text
```

#### Git interoperability

```
Usage: werge git (UNMERGED | (-u|--unmerged)) [(-a|--add) | --no-add]

  Automerge unmerged files in git conflict

Available options:
  UNMERGED                 Unmerged file tracked by git (can be specified
                           repeatedly)
  -u,--unmerged            Process all files marked as unmerged by git
  -a,--add                 Run `git add' for fully merged files
  --no-add                 Prevent running `git add'
  -h,--help                Show this help text
```

#### Finding differences

```
Usage: werge diff OLDFILE YOURFILE
                  [(-u|--unified) | (-U|--unified-size ARG) | (-m|--merge)]

  Find differences between two files

Available options:
  OLDFILE                  Original file version
  YOURFILE                 File version with changes
  -u,--unified             Produce unified-diff-like output for `patch' with
                           default context size (20)
  -U,--unified-size ARG    Produce unified diff with this context size
  -m,--merge               Highlight the differences as with `merge' (default)
  -h,--help                Show this help text
```

#### Patching files in place

```
Usage: werge patch (MYFILE | (-f|--format)) [-p|--patch PATCH]

  Modify a file using a patch from `diff'

Available options:
  MYFILE                   File to be patched
  -f,--format              Do not patch anything, only format the patch using
                           conflict marks on joined tokens
  -p,--patch PATCH         File with the patch (default: stdin)
  -h,--help                Show this help text
```

#### Converting between files and tokens

Both commands work as plain stdin-to-stdout filters:

```
Usage: werge break

  Break text to tokens
```

```
Usage: werge glue

  Glue tokens back to text
```
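
To make the token line format more concrete, here is a minimal Python sketch of a break/glue round trip. It is only an illustration under assumptions, not how `werge` itself is implemented: the function names (`break_text`, `glue_lines`, `escape`, `unescape`) are made up for this example, and the splitter only separates whitespace runs from non-whitespace runs, which is coarser than the default tokenizer. The escaping follows the description in "How does it work?" (`.` marks space tokens, `/` marks other tokens, newlines and backslashes are escaped).

```python
import re


def escape(token: str) -> str:
    # Escape backslashes first, so that the backslash introduced for
    # the newline escape is not doubled afterwards.
    return token.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(escaped: str) -> str:
    # Reverse of escape(): "\\n" becomes a newline, "\\\\" a backslash.
    out = []
    chars = iter(escaped)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "\\")
            out.append("\n" if nxt == "n" else nxt)
        else:
            out.append(ch)
    return "".join(out)


def break_text(text: str) -> list[str]:
    # ASSUMPTION: a simplified tokenizer that only distinguishes runs of
    # whitespace from runs of everything else.
    lines = []
    for space, other in re.findall(r"(\s+)|(\S+)", text):
        if space:
            lines.append("." + escape(space))
        else:
            lines.append("/" + escape(other))
    return lines


def glue_lines(lines: list[str]) -> str:
    # Drop the one-character space mark and undo the escaping.
    return "".join(unescape(line[1:]) for line in lines)


if __name__ == "__main__":
    sample = "hello \\ world\n"
    for line in break_text(sample):
        print(line)
    assert glue_lines(break_text(sample)) == sample
```

For the sample string, the sketch prints the same six token lines as the tokenization example in "How does it work?", and gluing them back yields the original text. A script built along these lines could presumably be adapted into a custom tokenizer for `-F`, provided it reads standard input and prints one token per line.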