# werge (merge weird stuff)

This is a partial work-alike of `diff3`, `patch`, `git merge` and other merge-y
tools that is capable of:

- merging token-size changes (words, identifiers, sentences) instead of
  line-size ones
- merging changes in blank characters separately or ignoring them altogether

These properties are great for several use-cases:

- combining changes in free-flowing text (such as in TeX or Markdown),
  irrespective of changed line breaks, paragraph breaking, justification,
  etc.
- merging of code formatted with different code formatters
- minimizing the conflict size of tiny changes to a few characters, making them
  easier to resolve

Separate `diff`&`patch` functionality is provided too for sending
token-granularity patches. (The patches are similar to what `git diff
--word-diff` produces, but can be applied to files.)

## Installation

- To build from source, clone the repo and run `cabal install` in the directory
  (you need [a way to compile Haskell](https://www.haskell.org/downloads/)).
- [Releases](https://github.com/exaexa/werge/releases) come with prebuilt
  binaries that you may download and run as-is on many Linuxes and Macs.

Running `werge` requires working installations of `diff` and `patch` that are
compatible with the ones from [GNU
diffutils](https://www.gnu.org/software/diffutils/):
- Most Linux distributions contain the correct diffutils
- On BSDs you should be able to install these from Ports
  ([FreeBSD](https://cgit.freebsd.org/ports/tree/textproc/diffutils),
  [OpenBSD](https://openports.eu/ports/textproc/gdiff))
- On Macs, install [diffutils from
  brew](https://formulae.brew.sh/formula/diffutils).

In any other case, you may set up a path to any compatible `diff` and `patch`
(or suitable wrapper scripts) via environment variables `WERGE_DIFF` and
`WERGE_PATCH`. (If required, the same applies for `WERGE_GIT`.)
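
A quick preflight check can save some debugging here. The sketch below is not
part of `werge`; it assumes that a set variable simply replaces the default
program name (`diff` or `patch` found on the `PATH`), and that GNU tools
mention `GNU` on the first line of their `--version` output. All helper names
are made up for illustration.

```python
import os
import subprocess


def resolve_tool(var, default, env=None):
    """Return the program named by environment variable `var`, or `default`.

    Assumption: an unset or empty variable means "use the default name".
    """
    env = os.environ if env is None else env
    return env.get(var) or default


def looks_gnu(version_output):
    """Heuristic: GNU diff and GNU patch print 'GNU' in their version banner."""
    first_line = version_output.splitlines()[0] if version_output else ""
    return "GNU" in first_line


def check_tool(var, default):
    """Run `<tool> --version` and report whether the tool looks GNU-compatible."""
    tool = resolve_tool(var, default)
    try:
        res = subprocess.run([tool, "--version"], capture_output=True, text=True)
    except OSError:
        # The program does not exist or cannot be executed.
        return tool, False
    return tool, res.returncode == 0 and looks_gnu(res.stdout)
```

Calling `check_tool("WERGE_DIFF", "diff")` and `check_tool("WERGE_PATCH",
"patch")` returns the resolved program together with a yes/no guess. A "no"
only means the banner did not look like a GNU one; a compatible wrapper script
may still work fine.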

### Editor integration

There's a `vim` syntax highlighting file in `vim/werge.vim`. To install, simply
copy it to your local `vim` syntax configuration directory (usually to
`~/.vim/syntax/werge.vim`). Then, you can activate the syntax in vim with:

```
:set syn=werge
```

## Demo

##### Original (`old` file):
```
Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.
```

##### Local changes (`my` file):
```
Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.
```

##### Remote changes (`your` file):
```
Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.
```

##### Token-merged version

This is produced with `werge merge my old your` (conflicts on the space change
that is too close to the disappearing "still" token):
```
Roses are red.
Violets are blue.
Patching is hard.<<<<< I still||||| I=====
I>>>>> cannot do verses.
```
(NOTE: option `-G` gives nicely colored output that is much easier to read.
Alternatively you can install the syntax highlighting for `vim`.)

##### Merge with separate space resolution
Adding option `-s` to `werge merge` causes it to resolve space conflicts
separately, usually helping many cases that would be easily resolvable by a
human:
```
Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.
```

##### Mixing in unresolvable conflict
A harder-conflicting file (`their`):
```
Roses are red.
Violets are blue.
Merging is quite hard.
I cannot do verses.
```

`werge merge my old their -s` highlights the actual unmergeable change:
```
Roses are red.
Violets are blue.
<<<<<Patching|||||Patch=====Merging>>>>> is hard.
I still cannot do verses.
```
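
Since the conflicts are marked inline, they are easy to post-process with a
script. The following is a minimal sketch (not a part of `werge`) that extracts
the conflicts from a merged text and picks one side after the fact. It assumes
the default labels (`<<<<<`, `|||||`, `=====`, `>>>>>`) and that these strings
do not otherwise occur in the text; the function names are hypothetical.

```python
import re

# Default conflict labels, as listed in `werge --help`.
CONFLICT = re.compile(
    r"<<<<<(?P<my>.*?)\|\|\|\|\|(?P<old>.*?)=====(?P<your>.*?)>>>>>",
    re.DOTALL,  # a conflict may contain line breaks
)


def conflicts(text):
    """Return a list of (my, old, your) triples for all conflicts in text."""
    return [(m["my"], m["old"], m["your"]) for m in CONFLICT.finditer(text)]


def pick(text, side):
    """Replace every conflict with its 'my', 'old' or 'your' part."""
    if side not in ("my", "old", "your"):
        raise ValueError(f"unknown side: {side}")
    return CONFLICT.sub(lambda m: m[side], text)
```

On the output above, `conflicts` finds a single triple
(`Patching`, `Patch`, `Merging`), and `pick(text, "your")` yields the text with
"Merging is hard."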

## How does it work?

- Instead of lines, the files are torn to small tokens (words, spaces, symbols,
  ...) and these are diffed and merged individually.
- Some tokens are marked as spaces by the tokenizer, which allows the merge
  algorithm to be (selectively) more zealous when resolving conflicts on these.
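
To illustrate the first point, here is a rough sketch of a token-level diff
that drops all space tokens before comparing. It is only an approximation of
the idea: `werge` itself relies on an external `diff`, while this sketch uses
Python's `difflib` as a stand-in and a much cruder tokenizer (whitespace vs.
everything else). The function names are made up.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Split text into non-space tokens, dropping all whitespace."""
    return re.findall(r"\S+", text)


def word_changes(old, new):
    """List the (tag, old_tokens, new_tokens) changes between two texts."""
    a, b = words(old), words(new)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

With this, re-wrapping "Roses are red. Violets are blue." onto two lines
produces no changes at all, whereas a line-based diff would report both lines
as changed.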

Technically, the ideas are similar to
[`spiff`](http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/) or `git diff
--word-diff`.  Other tools exist such as
[`difftastic`](https://difftastic.wilfred.me.uk/) and
[`mergiraf`](https://mergiraf.org/) that are aware of the file structure (i.e.,
the actual syntax _tree_) that can be used to improve output.  Compared to
these, **`werge` is completely oblivious about the actual file structure**, and
thus works quite well on any file type.  This choice gives up some diff&merge
quality in exchange for (a lot) less complexity.

Tokenizers in `werge` are simple, implementable as linear scanners. They print
each token on its own line, prefixed with a space mark (`.` for space tokens
and `/` for non-space tokens), with newlines and backslashes escaped. The
default tokenization of the string "hello \ world" with a newline at the end is
listed below (note the invisible trailing space on the lines that seem to
contain only a dot):

```
/hello
. 
/\\
. 
/world
.\n
```
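
The format is simple enough to be sketched in a few lines. The code below is
not the tokenizer that `werge` uses (it only distinguishes runs of whitespace
from runs of everything else, and the escaping rules are inferred from the
listing above), but it reproduces the listing for this input and shows that
the tokens can be glued back losslessly. All names are hypothetical.

```python
import re


def escape(token):
    """Escape backslashes first, then newlines (the order matters)."""
    return token.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(body):
    """Invert escape(): '\\\\' becomes a backslash, '\\n' becomes a newline."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append("\n" if body[i + 1] == "n" else body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def break_text(text):
    """Turn text into token lines marked '.' (space) or '/' (non-space)."""
    lines = []
    for match in re.finditer(r"\s+|\S+", text):
        token = match.group(0)
        mark = "." if token.isspace() else "/"
        lines.append(mark + escape(token))
    return lines


def glue(lines):
    """Drop the marks and join the unescaped tokens back into text."""
    return "".join(unescape(line[1:]) for line in lines)
```

For any input, `glue(break_text(text))` returns the original `text`, which is
the property that makes it safe to merge on tokens and convert back
afterwards.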

### Custom tokenizers

Users may supply any tokenizer via option `-F`. The script below produces
line-size tokens for demonstration (in turn, `werge` will do the usual line
merges), and can be used e.g. via `-F ./tokenize.py`:

```py
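#!/usr/bin/env python3
import sys

# Emit every input line as a single non-space token (mark '/').
for l in sys.stdin:
    # Backslashes are doubled; a trailing newline is written as literal '\n'.
    escaped = l.rstrip('\n').replace('\\', '\\\\')
    if l.endswith('\n'):
        print('/' + escaped + '\\n')
    else:
        # The last line of a file may lack the final newline.
        print('/' + escaped)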
```

### History

I previously made an attempt to solve this in the `adiff` software, which
failed because the approach was too complex. Before that, the issue was tackled
by Arek Antoniewicz at MFF CUNI, who used regex-edged DFAs (REDFAs) to construct
user-specifiable tokenizers in a pretty cool way.

## Integration with `git`

### Automerging conflicts

`werge` can automatically process files that are marked in `git` as merge
conflicts:

```sh
$ git merge somebranch
$ werge git -ua
```

Options `-ua` (`--unmerged --add`) make `werge` find all files that are marked
as unmerged, try to merge them token-by-token, and, if the merge is successful
with the current settings, run `git add` on them. The current contents of the
files are replaced by the merged (or partially merged) state; backups are
written automatically to `filename.werge-backup`.

Optionally, you can specify exact files to be automerged. That is useful for
cases when only some of the conflicting files should be processed by `werge`:

```sh
$ werge git my/conflicting/file.txt
```
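
If the selection should follow a rule (say, only prose files), the list of
unmerged files can be obtained from `git` and filtered before it is passed to
`werge git`. A minimal sketch follows; the helper names and the choice of
suffixes are made up, and the script is assumed to run from the repository
root (where the paths printed by `git diff` resolve correctly).

```python
import subprocess


def unmerged_files():
    """Ask git for the paths that are currently marked as unmerged."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=U", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    # With -z the paths are NUL-terminated, which is safe for odd file names.
    return [path for path in out.split("\0") if path]


def select_for_werge(paths, suffixes=(".md", ".tex")):
    """Keep only the paths that should be merged token-by-token."""
    return [path for path in paths if path.endswith(tuple(suffixes))]


def werge_git_command(paths, add=False):
    """Build the `werge git` command line for the given files."""
    return ["werge", "git"] + (["--add"] if add else []) + list(paths)


def automerge(add=False):
    """Run `werge git` on the selected unmerged files, if there are any."""
    paths = select_for_werge(unmerged_files())
    if not paths:
        return 0
    return subprocess.run(werge_git_command(paths, add)).returncode
```

The remaining unmerged files are left untouched for the usual `git` conflict
resolution workflow.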

Support for merging complex types of changes (deletes, directory moves,
symlinks, ...) via this interface is currently limited. `werge` can be used as
a mergetool or a merge driver to ameliorate that.

### Use as `git difftool` and `git mergetool`

The `git` config below allows direct use of `werge` as `git difftool -t werge`
and `git mergetool -t werge`:
```ini
[difftool "werge"]
	cmd = werge diff -G $LOCAL $REMOTE
[mergetool "werge"]
	cmd = werge merge $LOCAL $BASE $REMOTE > $MERGED
	trustExitCode = true

# variant for separate resolution of space (solves more conflicts):
[mergetool "spacewerge"]
	cmd = werge merge -s $LOCAL $BASE $REMOTE > $MERGED
	trustExitCode = true
```

One issue with `git` mergetools is that they are supposed to be interactive,
and thus `git` expects them to always produce a completely merged, conflictless
result. In turn, if the auto-merging with `git mergetool -t werge` fails with
conflicts, `git` assumes a complete failure and restores the original version
from the backup. To enable a more useful behavior, use `werge` as a merge
driver (see below).

### Use as a `git` merge driver

Add this to your git config:
```ini
[merge "werge"]
	name = werge
	driver = werge merge %A %O %B > %P
	recursive = binary
```

Then, specify that the "werge" driver should be used for certain files in your
repository's `.gitattributes`:
```
*.md merge=werge
*.tex merge=werge
# ... etc
```

With this in place, `git merge` will automatically run `werge` to merge the
marked files in the repository. On conflict, you will have the files marked
with the usual (werge's usual) conflict markers, and you will be able to
resolve them just as with the normal merging workflow.

**Hint:** As with `spacewerge` mergetool above, it is beneficial to add a few
conflict-resolving options such as `-s` to the `driver`, in order to help the
automerges pass nicely.

### Use with `git rebase`

The merge driver and mergetools as configured above will also automatically
work with `git rebase` that runs in the "merge mode" (which is the default).

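As a possible source of confusion, the "my" and "your" versions are swapped
during a rebase. This is implied by the semantics: a rebase replays your
commits on top of the other branch, so the other branch plays the role of the
"local" version.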

- With `git checkout mybranch; git merge otherbranch`, the conflicts will look
  roughly like this:
  ```
  <<<<< mybranch version ||||| merge base ===== otherbranch version >>>>>
  ```
- With `git checkout mybranch; git rebase otherbranch`, the logic is reversed:
  ```
  <<<<< otherbranch version ||||| common base ===== mybranch version >>>>>
  ```

## Current `--help` and features

```
werge -- blanks-friendly mergetool for tiny interdwindled changes

Usage: werge [(-F|--tok-filter FILTER) | (-i|--simple-tokens) | 
               (-I|--full-tokens)] [--no-zeal | (-z|--zeal)] 
             [-S|--space (keep|my|old|your)] 
             [-s | --resolve-space (normal|keep|my|old|your)] 
             [--conflict-space-overlaps] [--conflict-space-separate] 
             [--conflict-space-all] [-C|--expand-context N] 
             [--resolve (keep|my|old|your)] [--conflict-overlaps] 
             [--conflict-separate] [--conflict-all] [-G|--color] 
             [--label-start "<<<<<"] [--label-mo "|||||"] [--label-diff "|||||"]
             [--label-oy "====="] [--label-end ">>>>>"] COMMAND

Available options:
  -F,--tok-filter FILTER   External program to separate the text to tokens
  -i,--simple-tokens       Use wider character class to separate the tokens
                           (results in larger tokens and ignores case)
  -I,--full-tokens         Separate characters by all known character classes
                           (default)
  --no-zeal                avoid zealous mode (default)
  -z,--zeal                Try to zealously minify conflicts, potentially
                           resolving them
  -S,--space (keep|my|old|your)
                           Retain spacing from a selected version, or keep all
                           space changes for merging (default: keep)
  -s                       Shortcut for `--resolve-space keep' (this separates
                           space-only conflicts, enabling better automated
                           resolution)
  --resolve-space (normal|keep|my|old|your)
                           Resolve conflicts in space-only tokens separately,
                           and either keep unresolved conflicts, or resolve in
                           favor of a given version; `normal' resolves the
                           spaces together with other tokens, ignoring choices
                           in --resolve-space-* (default: normal)
  --conflict-space-overlaps
                           Never resolve overlapping changes in space-only
                           tokens
  --conflict-space-separate
                           Never resolve separate (non-overlapping) changes in
                           space-only tokens
  --conflict-space-all     Never resolve any changes in space-only tokens
  -C,--expand-context N    Consider changes that are at less than N tokens apart
                           to be a single change; 0 turns off conflict
                           expansion, 1 may cause bad resolutions of near
                           conflicting edits (default: 2)
  --resolve (keep|my|old|your)
                           Resolve general conflicts in favor of a given
                           version, or keep the conflicts (default: keep)
  --conflict-overlaps      Never resolve overlapping changes in general tokens
  --conflict-separate      Never resolve separate (non-overlapping) changes in
                           general tokens
  --conflict-all           Never resolve any changes in general tokens
  -G,--color               Use shorter, gaily colored output markers by default
                           (requires ANSI color support; good for terminals or
                           `less -R')
  --label-start "<<<<<"    Label for beginning of the conflict
  --label-mo "|||||"       Separator of local edits and original
  --label-diff "|||||"     Separator for old and new version
  --label-oy "====="       Separator of original and other people's edits
  --label-end ">>>>>"      Label for end of the conflict
  -h,--help                Show this help text
  --version                Show version information

Available commands:
  merge                    diff3-style merge of two changesets
  git                      Automerge unmerged files in git conflict
  diff                     Find differences between two files
  patch                    Apply a patch from `diff' to file
  break                    Break text to tokens
  glue                     Glue tokens back to text

werge is a free software, use it accordingly.
```
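
The effect of `-C`/`--expand-context` can be pictured as grouping of nearby
changes. The sketch below is only an illustration of that idea, not the actual
implementation: it assumes that changes are half-open ranges of token indices,
and that the distance between two changes is the number of unchanged tokens
between them. The function name is hypothetical.

```python
def group_changes(changes, n):
    """Join changes that are less than n tokens apart into single changes.

    `changes` is a sorted list of non-overlapping (start, end) token ranges,
    with `end` exclusive. n = 0 turns the grouping off.
    """
    grouped = []
    for start, end in changes:
        # Gap = count of unchanged tokens since the previous change ended.
        if grouped and n > 0 and start - grouped[-1][1] < n:
            grouped[-1] = (grouped[-1][0], end)
        else:
            grouped.append((start, end))
    return grouped
```

For example, with `n = 2`, changes at tokens `(0, 1)` and `(2, 3)` are
separated by a single unchanged token and thus become one change `(0, 3)`,
while a change at `(10, 12)` stays separate. Larger groups mean fewer
surprising resolutions of edits that touch almost the same place, at the cost
of slightly bigger conflicts.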

#### Manual merging
```
Usage: werge merge MYFILE OLDFILE YOURFILE

  diff3-style merge of two changesets

Available options:
  MYFILE                   Version with local edits
  OLDFILE                  Original file version
  YOURFILE                 Version with other people's edits
  -h,--help                Show this help text
```

#### Git interoperability
```
Usage: werge git (UNMERGED | (-u|--unmerged)) [(-a|--add) | --no-add]

  Automerge unmerged files in git conflict

Available options:
  UNMERGED                 Unmerged file tracked by git (can be specified
                           repeatedly)
  -u,--unmerged            Process all files marked as unmerged by git
  -a,--add                 Run `git add' for fully merged files
  --no-add                 Prevent running `git add'
  -h,--help                Show this help text
```

#### Finding differences
```
Usage: werge diff OLDFILE YOURFILE 
                  [(-u|--unified) | (-U|--unified-size ARG) | (-m|--merge)]

  Find differences between two files

Available options:
  OLDFILE                  Original file version
  YOURFILE                 File version with changes
  -u,--unified             Produce unified-diff-like output for `patch' with
                           default context size (20)
  -U,--unified-size ARG    Produce unified diff with this context size
  -m,--merge               Highlight the differences as with `merge' (default)
  -h,--help                Show this help text
```

#### Patching files in place
```
Usage: werge patch (MYFILE | (-f|--format)) [-p|--patch PATCH]

  Modify a file using a patch from `diff'

Available options:
  MYFILE                   File to be patched
  -f,--format              Do not patch anything, only format the patch using
                           conflict marks on joined tokens
  -p,--patch PATCH         File with the patch (default: stdin)
  -h,--help                Show this help text
```

#### Converting between files and tokens

Both commands work as plain stdin-to-stdout filters:

```
Usage: werge break 

  Break text to tokens
```

```
Usage: werge glue 

  Glue tokens back to text
```