422 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			422 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| 
 | |
| # werge (merge weird stuff)
 | |
| 
 | |
| This is a partial work-alike of `diff3`, `patch`, `git merge` and other merge-y
 | |
| tools that is capable of:
 | |
| 
 | |
| - merging token-size changes (words, identifiers, sentences) instead of
 | |
|   line-size ones
 | |
| - merging changes in blank characters separately or ignoring them altogether
 | |
| 
 | |
| These properties are great for several use-cases:
 | |
| 
 | |
| - combining changes in free-flowing text (such as in TeX or Markdown),
 | |
|   irrespectively of changed line breaks, paragraph breaking and justification,
 | |
|   etc.
 | |
| - merging of code formatted with different code formatters
 | |
| - minimizing the conflict size of tiny changes to a few characters, making them
 | |
|   easier to resolve
 | |
| 
 | |
| Separate `diff`&`patch` functionality is provided too for sending
 | |
| token-granularity patches. (The patches are similar to what `git diff
 | |
| --word-diff` produces, but can be applied to files.)
 | |
| 
 | |
| ## Installation
 | |
| 
 | |
| - To build from source, clone the repo and run `cabal install` in the directory
 | |
|   (you need [a way to compile Haskell](https://www.haskell.org/downloads/)).
 | |
| - [Releases](https://github.com/exaexa/werge/releases) come with prebuilt
 | |
|   binaries that you may download and run as-is on many Linuxes and Macs.
 | |
| 
 | |
| Running of `werge` requires working installations of `diff` and `patch`
 | |
| compatible with the ones from [GNU
 | |
| diffutils](https://www.gnu.org/software/diffutils/):
 | |
| - Most Linux distributions contain the correct diffutils
 | |
| - On BSDs you should be able to install these from Ports
 | |
|   ([FreeBSD](https://cgit.freebsd.org/ports/tree/textproc/diffutils),
 | |
|   [OpenBSD](https://openports.eu/ports/textproc/gdiff))
 | |
| - On Macs, install [diffutils from
 | |
|   brew](https://formulae.brew.sh/formula/diffutils).
 | |
| 
 | |
| In any other case, you may set up a path to any compatible `diff` and `patch`
 | |
| (or suitable wrapper scripts) via environment variables `WERGE_DIFF` and
 | |
| `WERGE_PATCH`. (If required, the same applies for `WERGE_GIT`.)
 | |
| 
 | |
| ### Editor integration
 | |
| 
 | |
| There's a `vim` syntax highlighting file in `vim/werge.vim`. To install, simply
 | |
| copy it to your local `vim` syntax configuration directory (usually to
 | |
| `~/.vim/syntax/werge.vim`). Then, you can activate the syntax in vim with:
 | |
| 
 | |
| ```
 | |
| :set syn=werge
 | |
| ```
 | |
| 
 | |
| ## Demo
 | |
| 
 | |
| ##### Original (`old` file):
 | |
| ```
 | |
| Roses are red. Violets are blue.
 | |
| Patch is quite hard. I cannot rhyme.
 | |
| ```
 | |
| 
 | |
| ##### Local changes (`my` file):
 | |
| ```
 | |
| Roses are red. Violets are blue.
 | |
| Patching is hard. I still cannot rhyme.
 | |
| ```
 | |
| 
 | |
| ##### Remote changes (`your` file):
 | |
| ```
 | |
| Roses are red.
 | |
| Violets are blue.
 | |
| Patch is quite hard.
 | |
| I cannot do verses.
 | |
| ```
 | |
| 
 | |
| ##### Token-merged version
 | |
| 
 | |
| This is produced with `werge merge my old your` (conflicts on the space change
 | |
| that is too close to the disappearing "still" token):
 | |
| ```
 | |
| Roses are red.
 | |
| Violets are blue.
 | |
| Patching is hard.<<<<< I still||||| I=====
 | |
| I>>>>> cannot do verses.
 | |
| ```
 | |
| (NOTE: option `-G` gives nicely colored output that is much easier to read.
 | |
| Alternatively you can install the syntax highlighting for `vim`.)
 | |
| 
 | |
| ##### Merge with separate space resultion
 | |
| Adding option `-s` to `werge merge` causes it to resolve space conflicts
 | |
| separately, usually helping many cases that would be easily resolvable by a
 | |
| human:
 | |
| ```
 | |
| Roses are red.
 | |
| Violets are blue.
 | |
| Patching is hard.
 | |
| I still cannot do verses.
 | |
| ```
 | |
| 
 | |
| ##### Mixing in unresolvable conflict
 | |
| A harder-conflicting file (`their`):
 | |
| ```
 | |
| Roses are red.
 | |
| Violets are blue.
 | |
| Merging is quite hard.
 | |
| I cannot do verses.
 | |
| ```
 | |
| 
 | |
| `werge merge my old their -s` highlights the actual unmergeable change:
 | |
| ```
 | |
| Roses are red.
 | |
| Violets are blue.
 | |
| <<<<<Patching|||||Patch=====Merging>>>>> is hard.
 | |
| I still cannot do verses.
 | |
| ```
 | |
| 
 | |
| ## How does it work?
 | |
| 
 | |
| - Instead of lines, the files are torn to small tokens (words, spaces, symbols,
 | |
|   ...) and these are diffed and merged individually.
 | |
| - Some tokens are marked as spaces by the tokenizer, which allows the merge
 | |
|   algorithm to be (selectively) more zealous when resolving conflicts on these.
 | |
| 
 | |
| Technically, the ideas are similar to
 | |
| [`spiff`](http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/) or `git diff
 | |
| --word-diff`.  Other tools exist such as
 | |
| [`difftastic`](https://difftastic.wilfred.me.uk/) and
 | |
| [`mergiraf`](https://mergiraf.org/) that are aware of the file structure (i.e.,
 | |
| the actual syntax _tree_) that can be used to improve output.  Compared to
 | |
| these, **`werge` is completely oblivious about the actual file structure**, and
 | |
| thus works quite well on any file type.  This choice trades off some diff&merge
 | |
| quality for (a lot of) complexity.
 | |
| 
 | |
| Tokenizers in `werge` are simple, implementable as linear scanners that print
 | |
| separate tokens on individual lines that are prefixed with a space mark (`.`
 | |
| for space and `/` for non-space), and escape newlines and backslashes. A
 | |
| default tokenization of string "hello \ world" with a new line at the end is
 | |
| listed below (note the invisible space on the lines with dots):
 | |
| 
 | |
| ```
 | |
| /hello
 | |
| . 
 | |
| /\\
 | |
| . 
 | |
| /world
 | |
| .\n
 | |
| ```
 | |
| 
 | |
| ### Custom tokenizers
 | |
| 
 | |
| Users may supply any tokenizer via option `-F`. The script below produces
 | |
| line-size tokens for demonstration (in turn, `werge` will do the usual line
 | |
| merges), and can be used e.g. via `-F ./tokenize.py`:
 | |
| 
 | |
| ```py
 | |
| #!/usr/bin/env python3
 | |
| import sys
 | |
| for l in sys.stdin.readlines():
 | |
|     if len(l)==0: continue
 | |
|     if l[-1]=='\n':
 | |
|         print('/'+l[:-1].replace('\\','\\\\')+'\\n')
 | |
|     else:
 | |
|         print('/'+l.replace('\\','\\\\'))
 | |
| ```
 | |
| 
 | |
| ### History
 | |
| 
 | |
| I previously made an attempt to solve this in `adiff` software, which failed
 | |
| because the approach was too complex. Before that, the issue was tackled by
 | |
| Arek Antoniewicz on MFF CUNI, who used regex-edged DFAs (REDFAs) to construct
 | |
| user-specifiable tokenizers in a pretty cool way.
 | |
| 
 | |
| ## Integration with `git`
 | |
| 
 | |
| ### Automerging conflicts
 | |
| 
 | |
| `werge` can automatically process files that are marked in `git` as merge
 | |
| conflicts:
 | |
| 
 | |
| ```sh
 | |
| $ git merge somebranch
 | |
| $ werge git -ua
 | |
| ```
 | |
| 
 | |
| Options `-ua` (`--unmerged --add`) find all files that are marked as unmerged,
 | |
| tries to merge them token-by-token, and if the merge is successful with current
 | |
| settings it runs `git add` on them. The current changes in the files are
 | |
| replaced by the merged (or partially merged) state; backups are written
 | |
| automatically to `filename.werge-backup`.
 | |
| 
 | |
| Optionally, you can specify exact files to be automerged. That is useful for
 | |
| cases when only some of the conflicting files should be processed by `werge`:
 | |
| 
 | |
| ```sh
 | |
| $ werge git my/conflicting/file.txt
 | |
| ```
 | |
| 
 | |
| Support for merging complex types of changes (deletes, directory moves,
 | |
| symlinks, ...) via this interface is currently limited. `werge` can be used as
 | |
| a mergetool or a merge driver to ameliorate that.
 | |
| 
 | |
| ### Use as `git difftool` and `git mergetool`
 | |
| 
 | |
| The `git` config below allows direct use of `werge` as `git difftool -t werge`
 | |
| and `git mergetool -t werge`:
 | |
| ```ini
 | |
| [difftool "werge"]
 | |
| 	cmd = werge diff -G $LOCAL $REMOTE
 | |
| [mergetool "werge"]
 | |
| 	cmd = werge merge $LOCAL $BASE $REMOTE > $MERGED
 | |
| 	trustExitCode = true
 | |
| 
 | |
| # variant for separate resolution of space (solves more conflicts):
 | |
| [mergetool "spacewerge"]
 | |
| 	cmd = werge merge -s $LOCAL $BASE $REMOTE > $MERGED
 | |
| 	trustExitCode = true
 | |
| ```
 | |
| 
 | |
| One issue with `git` mergetools is that they are supposed to be interactive,
 | |
| and thus `git` expects them to always produce a completely merged, conflictless
 | |
| result. In turn, if the auto-merging with `git mergetool -t werge` fails with
 | |
| conflicts, `git` assumes a complete failure and restores the original version
 | |
| from the backup. To enable a more useful behavior, use `werge` as a merge
 | |
| driver (see below).
 | |
| 
 | |
| ### Use as a `git` merge driver
 | |
| 
 | |
| Add this to your git config:
 | |
| ```ini
 | |
| [merge "werge"]
 | |
| 	name = werge
 | |
| 	driver = werge merge %A %O %B > %P
 | |
| 	recursive = binary
 | |
| ```
 | |
| 
 | |
| Then, specify that the "werge" driver should be used for certain files in your
 | |
| repository's `.gitattributes`:
 | |
| ```
 | |
| *.md merge=werge
 | |
| *.tex merge=werge
 | |
| # ... etc
 | |
| ```
 | |
| 
 | |
| With this in place, `git merge` will automatically run `werge` to merge the
 | |
| marked files in the repository. On conflict, you will have the files marked
 | |
| with the usual (werge's usual) conflict markers, and you will be able to
 | |
| resolve them just as with the normal merging workflow.
 | |
| 
 | |
| **Hint:** As with `spacewerge` mergetool above, it is beneficial to add a few
 | |
| conflict-resolving options such as `-s` to the `driver`, in order to help the
 | |
| automerges pass nicely.
 | |
| 
 | |
| ### Use with `git rebase`
 | |
| 
 | |
| The merge driver and mergetools as configured above will also automatically
 | |
| work with `git rebase` that runs in the "merge mode" (which is the default).
 | |
| 
 | |
| As a possible source of confusion, the "my" and "your" versions are somewhat swapped (as implied by semantics):
 | |
| 
 | |
| - With `git checkout mybranch; git merge otherbranch`, the conflicts will look
 | |
|   roughly like this:
 | |
|   ```
 | |
|   <<<<< mybranch version ||||| merge base ===== otherbranch version >>>>>
 | |
|   ```
 | |
| - With `git checkout mybranch; git rebase otherbranch`, the logic is reversed:
 | |
|   ```
 | |
|   <<<<< otherbranch version ||||| common base ===== mybranch version >>>>>
 | |
|   ```
 | |
| 
 | |
| ## Current `--help` and features
 | |
| 
 | |
| ```
 | |
| werge -- blanks-friendly mergetool for tiny interdwindled changes
 | |
| 
 | |
| Usage: werge [(-F|--tok-filter FILTER) | (-i|--simple-tokens) | 
 | |
|                (-I|--full-tokens)] [--no-zeal | (-z|--zeal)] 
 | |
|              [-S|--space (keep|my|old|your)] 
 | |
|              [-s | --resolve-space (normal|keep|my|old|your)] 
 | |
|              [--conflict-space-overlaps] [--conflict-space-separate] 
 | |
|              [--conflict-space-all] [-C|--expand-context N] 
 | |
|              [--resolve (keep|my|old|your)] [--conflict-overlaps] 
 | |
|              [--conflict-separate] [--conflict-all] [-G|--color] 
 | |
|              [--label-start "<<<<<"] [--label-mo "|||||"] [--label-diff "|||||"]
 | |
|              [--label-oy "====="] [--label-end ">>>>>"] COMMAND
 | |
| 
 | |
| Available options:
 | |
|   -F,--tok-filter FILTER   External program to separate the text to tokens
 | |
|   -i,--simple-tokens       Use wider character class to separate the tokens
 | |
|                            (results in larger tokens and ignores case)
 | |
|   -I,--full-tokens         Separate characters by all known character classes
 | |
|                            (default)
 | |
|   --no-zeal                avoid zealous mode (default)
 | |
|   -z,--zeal                Try to zealously minify conflicts, potentially
 | |
|                            resolving them
 | |
|   -S,--space (keep|my|old|your)
 | |
|                            Retain spacing from a selected version, or keep all
 | |
|                            space changes for merging (default: keep)
 | |
|   -s                       Shortcut for `--resolve-space keep' (this separates
 | |
|                            space-only conflicts, enabling better automated
 | |
|                            resolution)
 | |
|   --resolve-space (normal|keep|my|old|your)
 | |
|                            Resolve conflicts in space-only tokens separately,
 | |
|                            and either keep unresolved conflicts, or resolve in
 | |
|                            favor of a given version; `normal' resolves the
 | |
|                            spaces together with other tokens, ignoring choices
 | |
|                            in --resolve-space-* (default: normal)
 | |
|   --conflict-space-overlaps
 | |
|                            Never resolve overlapping changes in space-only
 | |
|                            tokens
 | |
|   --conflict-space-separate
 | |
|                            Never resolve separate (non-overlapping) changes in
 | |
|                            space-only tokens
 | |
|   --conflict-space-all     Never resolve any changes in space-only tokens
 | |
|   -C,--expand-context N    Consider changes that are at less than N tokens apart
 | |
|                            to be a single change; 0 turns off conflict
 | |
|                            expansion, 1 may cause bad resolutions of near
 | |
|                            conflicting edits (default: 2)
 | |
|   --resolve (keep|my|old|your)
 | |
|                            Resolve general conflicts in favor of a given
 | |
|                            version, or keep the conflicts (default: keep)
 | |
|   --conflict-overlaps      Never resolve overlapping changes in general tokens
 | |
|   --conflict-separate      Never resolve separate (non-overlapping) changes in
 | |
|                            general tokens
 | |
|   --conflict-all           Never resolve any changes in general tokens
 | |
|   -G,--color               Use shorter, gaily colored output markers by default
 | |
|                            (requires ANSI color support; good for terminals or
 | |
|                            `less -R')
 | |
|   --label-start "<<<<<"    Label for beginning of the conflict
 | |
|   --label-mo "|||||"       Separator of local edits and original
 | |
|   --label-diff "|||||"     Separator for old and new version
 | |
|   --label-oy "====="       Separator of original and other people's edits
 | |
|   --label-end ">>>>>"      Label for end of the conflict
 | |
|   -h,--help                Show this help text
 | |
|   --version                Show version information
 | |
| 
 | |
| Available commands:
 | |
|   merge                    diff3-style merge of two changesets
 | |
|   git                      Automerge unmerged files in git conflict
 | |
|   diff                     Find differences between two files
 | |
|   patch                    Apply a patch from `diff' to file
 | |
|   break                    Break text to tokens
 | |
|   glue                     Glue tokens back to text
 | |
| 
 | |
| werge is a free software, use it accordingly.
 | |
| ```
 | |
| 
 | |
| #### Manual merging
 | |
| ```
 | |
| Usage: werge merge MYFILE OLDFILE YOURFILE
 | |
| 
 | |
|   diff3-style merge of two changesets
 | |
| 
 | |
| Available options:
 | |
|   MYFILE                   Version with local edits
 | |
|   OLDFILE                  Original file version
 | |
|   YOURFILE                 Version with other people's edits
 | |
|   -h,--help                Show this help text
 | |
| ```
 | |
| 
 | |
| #### Git interoperability
 | |
| ```
 | |
| Usage: werge git (UNMERGED | (-u|--unmerged)) [(-a|--add) | --no-add]
 | |
| 
 | |
|   Automerge unmerged files in git conflict
 | |
| 
 | |
| Available options:
 | |
|   UNMERGED                 Unmerged file tracked by git (can be specified
 | |
|                            repeatedly)
 | |
|   -u,--unmerged            Process all files marked as unmerged by git
 | |
|   -a,--add                 Run `git add' for fully merged files
 | |
|   --no-add                 Prevent running `git add'
 | |
|   -h,--help                Show this help text
 | |
| ```
 | |
| 
 | |
| #### Finding differences
 | |
| ```
 | |
| Usage: werge diff OLDFILE YOURFILE 
 | |
|                   [(-u|--unified) | (-U|--unified-size ARG) | (-m|--merge)]
 | |
| 
 | |
|   Find differences between two files
 | |
| 
 | |
| Available options:
 | |
|   OLDFILE                  Original file version
 | |
|   YOURFILE                 File version with changes
 | |
|   -u,--unified             Produce unified-diff-like output for `patch' with
 | |
|                            default context size (20)
 | |
|   -U,--unified-size ARG    Produce unified diff with this context size
 | |
|   -m,--merge               Highlight the differences as with `merge' (default)
 | |
|   -h,--help                Show this help text
 | |
| ```
 | |
| 
 | |
| #### Patching files in place
 | |
| ```
 | |
| Usage: werge patch (MYFILE | (-f|--format)) [-p|--patch PATCH]
 | |
| 
 | |
|   Modify a file using a patch from `diff'
 | |
| 
 | |
| Available options:
 | |
|   MYFILE                   File to be patched
 | |
|   -f,--format              Do not patch anything, only format the patch using
 | |
|                            conflict marks on joined tokens
 | |
|   -p,--patch PATCH         File with the patch (default: stdin)
 | |
|   -h,--help                Show this help text
 | |
| ```
 | |
| 
 | |
| #### Converting between files and tokens
 | |
| 
 | |
| Both commands work as plain stdin-to-stdout filters:
 | |
| 
 | |
| ```
 | |
| Usage: werge break 
 | |
| 
 | |
|   Break text to tokens
 | |
| ```
 | |
| 
 | |
| ```
 | |
| Usage: werge glue 
 | |
| 
 | |
|   Glue tokens back to text
 | |
| ```
 | 
