adiff (arbitrary-tokens diff, patch and merge)
This is a half-working pre-alpha version; use with care.
Short summary
The main aim of this toolbox is to help with finding differences in text formats that do not have a fixed "line-by-line" semantics, as assumed by standard Unix diff and related tools.
The problem was previously tackled by Arek Antoniewicz at MFF CUNI, who produced a working software package in C++ and designed the Regex-edged DFAs (REDFAs) that were used for user-specifiable tokenization of the input. The work on the corresponding thesis is finished.
This started as a simple Haskell port of that work and gained some relatively orthogonal improvements (mainly the histogram-style diffing). I later got rid of the REDFA concept: while it is super-interesting and useful in theory, I didn't find a sufficiently universal way to build good lexers from user-specified strings. Having a proper regex representation library (so that e.g. reconstructing Flex is easy) would help a lot.
TODO list
- Implement patch functionality, mainly patchfile parsing and fuzzy matching of hunk context. diff and diff3 already work.
- Implement a splitting heuristic for diffs, so that diffing large files doesn't take aeons.
- Check whether we can have external lexers, Unix-style.
How-To
Install using cabal. The adiff program has 3 sub-commands that work like diff, patch and diff3.
Example
Let's have a file orig:
Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.
and a modified file mine:
Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.
Let's use the words lexer, which marks everything whitespace-ish as whitespace and picks up groups of non-whitespace "content" characters.
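To make the resulting token stream concrete, here is a minimal Haskell sketch of a words-style lexer. The Token type, lexWords and showToken are hypothetical names chosen for illustration, not adiff's actual API, and the escaping in showToken is only a guess at how the diff listings print whitespace: a space token prints as ". " (with a trailing space), a newline token as .\n.

```haskell
import Data.Char (isSpace)
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical token type: a whitespace run or a run of "content" characters.
data Token = Space String | Content String deriving (Eq, Show)

-- words-style lexing: split the input into maximal runs of
-- whitespace and non-whitespace characters (groups are never empty).
lexWords :: String -> [Token]
lexWords = map classify . groupBy ((==) `on` isSpace)
  where
    classify s
      | all isSpace s = Space s
      | otherwise = Content s

-- Print a token the way the diff listings show it: '|' for content,
-- '.' for whitespace with newlines and tabs escaped (assumed rule).
showToken :: Token -> String
showToken (Content s) = '|' : s
showToken (Space s) = '.' : concatMap escape s
  where
    escape '\n' = "\\n"
    escape '\t' = "\\t"
    escape ch = [ch]

main :: IO ()
main = mapM_ (putStrLn . showToken) (lexWords "are blue.\nPatch")
```

Running it prints |are, ". ", |blue., .\n and |Patch on separate lines, which is the shape of the token listing below.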
Diffing the two files is done like this:
$ cabal run adiff -- -l words diff orig mine
You should get something like this:
@@ -7 +7 @@
.
|are
.
|blue.
.\n
-|Patch
+|Patching
.
|is
-.
-|quite
.
|hard.
.
|I
+.
+|still
.
|cannot
.
|rhyme.
.\n
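Underneath, a token diff is just an edit script over two token sequences. adiff uses histogram-style diffing (see the summary); the sketch below swaps in the textbook longest-common-subsequence algorithm instead, because it is short. Edit and diffTokens are hypothetical names, and the demo tokenizes with Prelude words, which drops whitespace tokens, so its result only lines up with the content (|) lines of the output above.

```haskell
import Data.Array

-- One step of an edit script over tokens.
data Edit a = Keep a | Del a | Ins a deriving (Eq, Show)

-- Token-level diff via longest common subsequence (O(n*m) table).
diffTokens :: Eq a => [a] -> [a] -> [Edit a]
diffTokens xs ys = go 0 0
  where
    n = length xs
    m = length ys
    xa = listArray (0, n - 1) xs
    ya = listArray (0, m - 1) ys
    -- t ! (i, j) = LCS length of (drop i xs) and (drop j ys); the boxed
    -- array is lazy, so each cell is computed on demand from its neighbours.
    t :: Array (Int, Int) Int
    t = array ((0, 0), (n, m)) [((i, j), f i j) | i <- [0 .. n], j <- [0 .. m]]
    f i j
      | i == n || j == m = 0
      | xa ! i == ya ! j = 1 + t ! (i + 1, j + 1)
      | otherwise = max (t ! (i + 1, j)) (t ! (i, j + 1))
    go i j
      | i == n = map Ins (drop j ys)
      | j == m = map Del (drop i xs)
      | xa ! i == ya ! j = Keep (xa ! i) : go (i + 1) (j + 1)
      -- on ties, delete first, so a replaced token shows as "-" then "+"
      | t ! (i + 1, j) >= t ! (i, j + 1) = Del (xa ! i) : go (i + 1) j
      | otherwise = Ins (ya ! j) : go i (j + 1)

main :: IO ()
main = print (diffTokens (words "Patch is quite hard.") (words "Patching is hard."))
```

This prints [Del "Patch",Ins "Patching",Keep "is",Del "quite",Keep "hard."], the same -|Patch, +|Patching, -|quite pattern as the diff above.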
Let's pretend someone has sent us a new version, with a better formatted verse and some other improvements, in file yours:
Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.
We can run diff3 to get a patch with both changes, optionally with reduced context:
$ cabal run adiff -- -l words diff3 mine orig yours -C1
...which outputs:
@@ -4 +4 @@
|red.
-.
+.\n
|Violets
@@ -11 +11 @@
.\n
-|Patch
+|Patching
.
|is
-.
-|quite
.
|hard.
-.
+.\n
|I
+.
+|still
.
@@ -23 +23 @@
.
-|rhyme.
+|do
+.
+|verses.
.\n
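The -C option sets how many unchanged tokens surround each change. The -C1 output appears to keep exactly one token of context on each side and to start a new hunk whenever two changes are separated by more than two unchanged tokens (as between +|still and -|rhyme). A minimal sketch of that grouping over an edit script follows; Edit and hunks are hypothetical names, and the @@ headers (which, judging by the examples, look like 0-based token offsets into the original) are not computed here.

```haskell
import Data.Function (on)
import Data.List (groupBy)

data Edit a = Keep a | Del a | Ins a deriving (Eq, Show)

isChange :: Edit a -> Bool
isChange (Keep _) = False
isChange _ = True

-- Keep every edit within c positions of some change, then cut the script
-- wherever the run of kept positions is interrupted. Along a run of
-- consecutive positions i - r stays constant; it jumps at every gap.
hunks :: Int -> [Edit a] -> [[Edit a]]
hunks c es =
  map (map snd) . groupBy ((==) `on` fst) $
    [(i - r, e) | (r, (i, e)) <- zip [0 ..] kept]
  where
    indexed = zip [0 :: Int ..] es
    changed = [i | (i, e) <- indexed, isChange e]
    kept = [(i, e) | (i, e) <- indexed, any (\k -> abs (i - k) <= c) changed]

main :: IO ()
main = mapM_ print (hunks 1 script)
  where
    script =
      [ Keep 'a', Keep 'b', Keep 'c', Del 'd', Ins 'D'
      , Keep 'e', Keep 'f', Keep 'g', Keep 'h', Ins 'i', Keep 'j' ]
```

With c = 1 this prints two hunks, [Keep 'c',Del 'd',Ins 'D',Keep 'e'] and [Keep 'h',Ins 'i',Keep 'j']; with c = 2 the three unchanged tokens between the changes are all kept and the hunks merge into one.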
...or get the merged output right away, using the -m/--merge option:
Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.
...or completely ignore any whitespace changes people decided to make, for whatever reason, with -i/--ignore-whitespace (this also works without -m):
Roses are red. Violets are blue.
Patching is hard. I still cannot do verses.
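Ignoring whitespace boils down to changing what counts as an equal token. One possible way to do it (an assumption, not necessarily how adiff implements -i) is a wrapper type whose equality treats any two whitespace tokens as the same, with the diff or merge then run over the wrapped tokens. Loose, lexWords and sameIgnoringWhitespace are hypothetical names; which side's whitespace ends up in the output is a separate choice this sketch leaves open.

```haskell
import Data.Char (isSpace)
import Data.Function (on)
import Data.List (groupBy)

data Token = Space String | Content String deriving (Eq, Show)

-- words-style lexer: maximal runs of whitespace / non-whitespace.
lexWords :: String -> [Token]
lexWords = map classify . groupBy ((==) `on` isSpace)
  where
    classify s
      | all isSpace s = Space s
      | otherwise = Content s

-- Equality that treats any two whitespace tokens as equal, so a diff over
-- Loose tokens never reports one whitespace run replaced by another.
newtype Loose = Loose Token deriving Show

instance Eq Loose where
  Loose (Space _) == Loose (Space _) = True
  Loose a == Loose b = a == b

sameIgnoringWhitespace :: String -> String -> Bool
sameIgnoringWhitespace x y = map Loose (lexWords x) == map Loose (lexWords y)

main :: IO ()
main = do
  let orig = "Roses are red. Violets are blue."
      yours = "Roses are red.\nViolets are blue."
  print (lexWords orig == lexWords yours)   -- False: " " vs "\n"
  print (sameIgnoringWhitespace orig yours) -- True
```

Note that deleting whitespace inside a line still counts as a change: "a b" and "ab" lex to different content tokens.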
If there's a conflict (e.g. after replacing Patch with Merging in file yours), it gets highlighted in the merged diff like this:
[...]
.
|blue.
.\n
<|Patching
=|Patch
>|Merging
.
|is
-.
-|quite
[...]
and marked with the standard conflict markers in the merged output:
Roses are red.
Violets are blue.
<<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard.
I still cannot do verses.
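The merge can be sketched with the classic diff3 idea: align orig with mine and with yours, treat orig tokens matched on both sides as stable, and resolve each region between stable tokens separately. A region changed by only one side takes that side's version, identical changes on both sides are accepted, and anything else is a conflict. This is a minimal sketch with hypothetical names (Chunk, lcsPairs, merge3, renderMerge) that uses plain LCS rather than adiff's histogram-style diffing; the argument order follows the diff3 mine orig yours command line, and renderMerge mimics the conflict-marker layout shown above.

```haskell
import Data.Array

-- Merge result: agreed tokens, or a conflict holding (mine, orig, yours).
data Chunk a = Ok [a] | Conflict [a] [a] [a] deriving (Eq, Show)

-- Index pairs (i, j) with xs !! i == ys !! j along one longest common
-- subsequence of xs and ys (plain O(n*m) dynamic programming).
lcsPairs :: Eq a => [a] -> [a] -> [(Int, Int)]
lcsPairs xs ys = go 0 0
  where
    n = length xs
    m = length ys
    xa = listArray (0, n - 1) xs
    ya = listArray (0, m - 1) ys
    t :: Array (Int, Int) Int
    t = array ((0, 0), (n, m)) [((i, j), f i j) | i <- [0 .. n], j <- [0 .. m]]
    f i j
      | i == n || j == m = 0
      | xa ! i == ya ! j = 1 + t ! (i + 1, j + 1)
      | otherwise = max (t ! (i + 1, j)) (t ! (i, j + 1))
    go i j
      | i == n || j == m = []
      | xa ! i == ya ! j = (i, j) : go (i + 1) (j + 1)
      | t ! (i + 1, j) >= t ! (i, j + 1) = go (i + 1) j
      | otherwise = go i (j + 1)

-- diff3-style merge over token lists.
merge3 :: Eq a => [a] -> [a] -> [a] -> [Chunk a]
merge3 mine orig yours = walk 0 0 0 stable
  where
    pm = lcsPairs orig mine
    py = lcsPairs orig yours
    -- orig positions matched in both alignments: (orig, mine, yours) indices
    stable = [(o, a, b) | (o, a) <- pm, (o', b) <- py, o == o']
    slice from to = take (to - from) . drop from
    walk o a b ((o', a', b') : rest) =
      resolve (slice o o' orig) (slice a a' mine) (slice b b' yours)
        ++ Ok [orig !! o'] : walk (o' + 1) (a' + 1) (b' + 1) rest
    walk o a b [] = resolve (drop o orig) (drop a mine) (drop b yours)
    resolve oc mc yc
      | null oc && null mc && null yc = []
      | mc == oc = [Ok yc]       -- only yours changed this region
      | yc == oc = [Ok mc]       -- only mine changed this region
      | mc == yc = [Ok mc]       -- both made the same change
      | otherwise = [Conflict mc oc yc]

-- Concatenate tokens, writing conflicts as <<<mine|||orig===yours>>>.
renderMerge :: [Chunk String] -> String
renderMerge = concatMap render
  where
    render (Ok ts) = concat ts
    render (Conflict m o y) =
      "<<<<<<<" ++ concat m ++ "|||||||" ++ concat o
        ++ "=======" ++ concat y ++ ">>>>>>>"

main :: IO ()
main = do
  let orig = ["Patch", " ", "is", " ", "hard."]
      mine = ["Patching", " ", "is", " ", "hard."]
      yours = ["Patch", " ", "is", "\n", "hard."]   -- whitespace-only edit
      yours2 = ["Merging", " ", "is", " ", "hard."] -- conflicts with mine
  putStrLn (renderMerge (merge3 mine orig yours))
  putStrLn (renderMerge (merge3 mine orig yours2))
```

The first merge combines both edits into "Patching is", newline, "hard."; the second prints <<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard., matching the conflict line above.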