# adiff (arbitrary-tokens diff, patch and merge)

This is a half-working pre-alpha version; use with care.

### Short summary

The main aim of this toolbox is to help find differences in text formats that lack the fixed "line-by-line" semantics assumed by the standard Unix `diff` and related tools.

The problem was previously tackled by Arek Antoniewicz at MFF CUNI. He produced a working software package in C++ and designed the Regex-edged DFAs (REDFAs) that were used for user-specifiable tokenization of the input. The work on the corresponding thesis is currently ongoing.

This started as a simple Haskell port of that work and picked up some relatively orthogonal improvements along the way (mainly the histogram-style diffing).

### TODO list

- Implement `patch` functionality, mainly patchfile parsing and fuzzy matching of hunk context. `diff` and `diff3` already work.
- Implement a splitting heuristic for diffs, so that diffing large files doesn't take aeons.
- Check whether REDFAs can even be implemented correctly with current Haskell libraries (most regex libraries target a completely different use case). Taking the lexer specification format from `alex` currently seems like a much better option. Deferring the task unix-ishly to another program could work too.

# How-To

Install using `cabal`.

The `adiff` program has 3 sub-commands that work like `diff`, `patch` and `diff3`. It expects a lexer specification, passed with `-l`; there are several very simple example lexers in `lexers/`.

## Example

Let's have a file `orig`:

```
Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.
```

and a modified file `mine`:

```
Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.
```

Let's use the `words` lexer, which marks runs of whitespace-ish characters as whitespace and groups everything else into non-whitespace tokens:

```
:[^ \t\n]*
_:[ \t\n]*
```

Diffing the two files is done like this:

```
$ cabal run adiff -- -l lexers/words diff orig mine
```

You should get something like this:

```
@@ -7 +7 @@
 . 
 |are
 . 
 |blue.
 .\n
-|Patch
+|Patching
 . 
 |is
-. 
-|quite
 . 
 |hard.
 . 
 |I
+. 
+|still
 . 
 |cannot
 . 
 |rhyme.
 .\n
```

Let's pretend someone has sent us a new version in file `yours`, with a better formatted verse and some other improvements:

```
Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.
```

We can run `diff3` to get a patch with both changes, optionally with reduced context:

```
$ cabal run adiff -- -l lexers/words diff3 mine orig yours -C1
```

...which outputs:

```
@@ -4 +4 @@
 |red.
-. 
+.\n
 |Violets
@@ -11 +11 @@
 .\n
-|Patch
+|Patching
 . 
 |is
-. 
-|quite
 . 
 |hard.
-. 
+.\n
 |I
+. 
+|still
 . 
@@ -23 +23 @@
 . 
-|rhyme.
+|do
+. 
+|verses.
 .\n
```

...or get a merged output right away, using the `-m`/`--merge` option:

```
Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.
```

...or completely ignore whatever whitespace changes people decided to make, with `-i`/`--ignore-whitespace` (this also works without `-m`):

```
Roses are red. Violets are blue.
Patching is hard. I still cannot do verses.
```

If there's a conflict (for example, changing `Patch` to `Merging` in file `yours`), it gets highlighted in the merged diff like this:

```
[...]
 . 
 |blue.
 .\n
<|Patching
=|Patch
>|Merging
 . 
 |is
-. 
-|quite
[...]
```

and marked with the standard conflict markers in the merged output:

```
Roses are red.
Violets are blue.
<<<<<<>>>>>> is hard.
I still cannot do verses.
```
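The token-level notation above may be easier to follow alongside a small model of it. Below is a minimal Haskell sketch of the same idea, not adiff's implementation. adiff uses histogram-style diffing, whereas this sketch swaps in a plain quadratic LCS.

The sketch does three things:

- It tokenizes text roughly as the `words` lexer would.
- It diffs the two token lists.
- It prints each token the way the example output appears to. The first column is ` `, `-` or `+`. Then comes `|` before a non-whitespace token, or `.` before a whitespace token, with newlines written as `\n`.

That reading of the notation is inferred from the examples. All names (`Token`, `tokenize`, `render`, `lcsTable`, `diffTokens`, `renderEdit`) are hypothetical. The module uses only `base` and has no `main`, so load it into GHCi.

```haskell
-- Hypothetical sketch, not adiff's code: a token-level diff using a plain LCS.
import Data.Function (on)
import Data.List (groupBy)

-- | Maximal runs of whitespace (space, tab, newline) or of anything else,
-- roughly what the `words` lexer from the example produces.
data Token = Word String | Space String
  deriving (Eq, Show)

tokenize :: String -> [Token]
tokenize = map classify . groupBy ((==) `on` isWs)
  where
    isWs c = c `elem` " \t\n"
    classify s@(c:_) | isWs c = Space s
    classify s = Word s

-- | Render a token in the notation inferred from the README examples:
-- '|' prefixes non-whitespace, '.' prefixes whitespace, and newlines and
-- tabs are escaped as \n and \t.
render :: Token -> String
render (Word w)  = '|' : w
render (Space s) = '.' : concatMap escape s
  where
    escape '\n' = "\\n"
    escape '\t' = "\\t"
    escape c    = [c]

data Edit a = Keep a | Del a | Ins a
  deriving (Eq, Show)

-- | (lcsTable xs ys !! i) !! j is the LCS length of (drop i xs) and (drop j ys).
-- Built bottom-up with scanr in O(length xs * length ys) time and space.
lcsTable :: Eq a => [a] -> [a] -> [[Int]]
lcsTable xs ys = scanr row (replicate (length ys + 1) 0) xs
  where
    row x below = scanr (step x) 0 (zip3 ys below (drop 1 below))
    -- down = L(i+1, j), diag = L(i+1, j+1), right = L(i, j+1)
    step x (y, down, diag) right
      | x == y    = 1 + diag
      | otherwise = max down right

-- | Walk the table from the top-left corner to produce an edit script.
-- `rows` holds the remaining table rows, with consumed columns dropped.
diffTokens :: Eq a => [a] -> [a] -> [Edit a]
diffTokens xs0 ys0 = walk xs0 ys0 (lcsTable xs0 ys0)
  where
    walk (x:xs) (y:ys) rows@((_:right:_):(down:_):_)
      | x == y        = Keep x : walk xs ys (map (drop 1) (drop 1 rows))
      | down >= right = Del x  : walk xs (y:ys) (drop 1 rows)
      | otherwise     = Ins y  : walk (x:xs) ys (map (drop 1) rows)
    walk xs [] _ = map Del xs
    walk [] ys _ = map Ins ys
    walk _ _ _ = error "diffTokens: malformed LCS table"

-- | One output line per token: ' ' for context, '-' for removed, '+' for added.
renderEdit :: Edit Token -> String
renderEdit (Keep t) = ' ' : render t
renderEdit (Del t)  = '-' : render t
renderEdit (Ins t)  = '+' : render t
```

For example, `mapM_ (putStrLn . renderEdit) (diffTokens (tokenize "Patch is quite hard.") (tokenize "Patching is hard."))` prints an edit script in the same style.

The sketch differs from adiff in two ways:

- When several alignments are equally short (such as which space is dropped together with `quite`), it may pick a different one than adiff does.
- Keeping the full table makes time and memory quadratic in the token count, so it only suits small inputs.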