diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1505734
--- /dev/null
+++ b/README.md
@@ -0,0 +1,173 @@
+
+# adiff (arbitrary-tokens diff, patch and merge)
+
+This is a half-working pre-alpha version; use with care.
+
+### Short summary
+
+The main aim of this toolbox is to help with finding differences in text
+formats that lack the fixed "line-by-line" semantics assumed by the standard
+Unix `diff` and related tools.
+
+The problem was previously tackled by Arek Antoniewicz at MFF CUNI, who
+produced a working software package in C++ and designed the Regex-edged DFAs
+(REDFAs) that were used for user-specifiable tokenization of the input. The
+work on the corresponding thesis is currently ongoing.
+
+This started as a simple Haskell port of that work, and gained some relatively
+orthogonal improvements (mainly the histogram-style diffing).
+
+### TODO list
+
+- Implement `patch` functionality (`diff` and `diff3` already work)
+- Implement the splitting heuristic for diffs, so that diffing larger
+  files doesn't take aeons
+- Check whether REDFAs can even be implemented correctly with current Haskell
+  libraries (most regex libraries target a completely different use case).
+  Taking the lexer specification format from `alex` currently seems like a much
+  better option. Deferring the task Unix-style to another program could work too.
+
+# How-To
+
+Install using `cabal`. The `adiff` program has 3 sub-commands that work like
+`diff`, `patch` (not yet implemented) and `diff3`. It expects a lexer
+specification given with `-l`; there are several very simple example lexers in `lexers/`.
+
+## Example
+
+Let's say we have a file `orig`:
+```
+Roses are red. Violets are blue.
+Patch is quite hard. I cannot rhyme.
+```
+
+and a modified file `mine`:
+```
+Roses are red. Violets are blue.
+Patching is hard. I still cannot rhyme.
+``` + +Let's use the `words` lexer, which marks everything whitespace-ish as +whitespace, and groups of non-whitespace: +``` +:[^ \t\n]* +_:[ \t\n]* +``` + +Diffing the 2 files gets done as such: +``` + $ cabal run adiff -- -l lexers/words diff orig mine +``` + +You should get something like this: +``` +@@ -7 +7 @@ + . + |are + . + |blue. + .\n +-|Patch ++|Patching + . + |is +-. +-|quite + . + |hard. + . + |I ++. ++|still + . + |cannot + . + |rhyme. + .\n +``` + +Let's pretend someone has sent us a patch with a better formated verse with +some other improvements, in file `yours`: +``` +Roses are red. +Violets are blue. +Patch is quite hard. +I cannot do verses. +``` + +We can run `diff3` to get a patch with both changes, optionally with reduced +context: +``` + $ cabal run adiff -- -l lexers/words diff3 mine orig yours -C1 +``` +...which outputs: +``` +@@ -4 +4 @@ + |red. +-. ++.\n + |Violets +@@ -11 +11 @@ + .\n +-|Patch ++|Patching + . + |is +-. +-|quite + . + |hard. +-. ++.\n + |I ++. ++|still + . +@@ -23 +23 @@ + . +-|rhyme. ++|do ++. ++|verses. + .\n +``` + +...or get a merged output right away, using the `-m`/`--merge` option: +``` +Roses are red. +Violets are blue. +Patching is hard. +I still cannot do verses. +``` + +...or completely ignore whatever whitespace changes that the people decided to +do for whatever reason, with `-i`/`--ignore-whitespace` (also works without +`-m`): +``` +Roses are red. Violets are blue. +Patching is hard. I still cannot do verses. +``` + +If there's a conflict (substituing the `Patch` to `Merging` in file `yours`), it gets highlighted in the merged diff as such: +``` +[...] + . + |blue. + .\n +<|Patching +=|Patch +>|Merging + . + |is +-. +-|quite +[...] +``` + +and using the standard conflict marks in the merged output: +``` +Roses are red. +Violets are blue. +<<<<<<>>>>>> is hard. +I still cannot do verses. 
+``` diff --git a/lexers/letters b/lexers/letters new file mode 100644 index 0000000..c8ca875 --- /dev/null +++ b/lexers/letters @@ -0,0 +1,2 @@ +[a-z] +_:[ \n] diff --git a/lexers/lines b/lexers/lines new file mode 100644 index 0000000..5bf4719 --- /dev/null +++ b/lexers/lines @@ -0,0 +1,2 @@ +[^\n]*\n +[^\n]* diff --git a/lexers/nums b/lexers/nums new file mode 100644 index 0000000..3e57b7e --- /dev/null +++ b/lexers/nums @@ -0,0 +1,2 @@ +[0-9] +_:\n diff --git a/lexers/words b/lexers/words new file mode 100644 index 0000000..a772682 --- /dev/null +++ b/lexers/words @@ -0,0 +1,2 @@ +:[^ \t\n]* +_:[ \t\n]*