adiff (arbitrary-tokens diff, patch and merge)
This is a half-working pre-alpha version; use with care.
Short summary
The main aim of this toolbox is to help with finding differences in text
formats that do not have a fixed "line-by-line" semantics, as assumed by
standard unix diff and related tools.
The problem was previously tackled by Arek Antoniewicz at MFF CUNI, who produced a working software package in C++ and designed the Regex-edged DFAs (REDFAs) that were used for user-specifiable tokenization of the input. Work on the corresponding thesis is ongoing.
This started as a simple Haskell port of that work and has since gained some relatively orthogonal improvements (mainly histogram-style diffing).
TODO list
- Implement patch functionality (diff and diff3 work)
- Implement the splitting heuristic for diffs, so that the diffing of larger files doesn't take aeons
- Check whether REDFA can even be implemented correctly with current Haskell
libraries (most regex libraries target a completely different use case). Taking the
lexer specification format from alex currently seems like a much better option. Deferring the task unix-ishly to another program could work too.
How-To
Install using cabal. The adiff program has 3 sub-commands that work like
diff, patch and diff3. It expects a lexing specification on the input;
there are several very simple example lexers in lexers/.
Example
Let's have a file orig:
Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.
and a modified file mine:
Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.
Let's use the words lexer, which marks everything whitespace-ish as
whitespace and treats each group of non-whitespace characters as a token:
:[^ \t\n]*
_:[ \t\n]*
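To make the effect of these two rules concrete, here is a minimal Haskell sketch of the same splitting. It is not adiff's lexer code: the names `Kind`, `Token`, `tokenize` and `render` are hypothetical, and the display convention (`|` before word tokens, `.` before whitespace tokens, newline shown as `\n`) is only read off the sample output in this README. The escaping of tabs is an assumption.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Token kinds, mirroring the two rules of the words lexer (hypothetical names).
data Kind = Word | Space
  deriving (Eq, Show)

type Token = (Kind, String)

-- Characters matched by the whitespace rule [ \t\n].
isWs :: Char -> Bool
isWs c = c `elem` " \t\n"

-- Split the input into maximal runs of whitespace and non-whitespace.
tokenize :: String -> [Token]
tokenize = map tag . groupBy ((==) `on` isWs)
  where
    tag run@(c : _) = (if isWs c then Space else Word, run)
    tag []          = (Space, "")  -- unreachable: groupBy never yields empty runs

-- Show one token the way the sample diff output appears to:
-- '|' before words, '.' before whitespace, with newlines escaped.
render :: Token -> String
render (Word, s)  = '|' : s
render (Space, s) = '.' : concatMap esc s
  where
    esc '\n' = "\\n"
    esc '\t' = "\\t"  -- assumption: tabs are escaped like newlines
    esc c    = [c]
```

Under these assumptions, `map render (tokenize "are blue.\n")` yields the four strings `|are`, `. ` (a dot followed by a space), `|blue.` and `.\n`, which is the shape of the token lines in the diff output.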
The 2 files can be diffed like this:
 $ cabal run adiff -- -l lexers/words diff orig mine
You should get something like this:
@@ -7 +7 @@
 . 
 |are
 . 
 |blue.
 .\n
-|Patch
+|Patching
 . 
 |is
-. 
-|quite
 . 
 |hard.
 . 
 |I
+. 
+|still
 . 
 |cannot
 . 
 |rhyme.
 .\n
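The idea of diffing token sequences instead of lines can be sketched in a few lines of Haskell. This is not the algorithm adiff uses (the summary above mentions histogram-style diffing); it is a plain longest-common-subsequence diff, and the names `Edit` and `diffTokens` are hypothetical. It uses `Data.Array` from the `array` package that ships with GHC.

```haskell
import Data.Array (Array, array, listArray, (!))

-- One step of an edit script over tokens (hypothetical type).
data Edit a = Keep a | Del a | Ins a
  deriving (Eq, Show)

-- Diff two token sequences via a longest-common-subsequence table.
-- Quadratic in time and memory, so only suitable for small inputs.
diffTokens :: Eq a => [a] -> [a] -> [Edit a]
diffTokens xs ys = walk 0 0
  where
    n  = length xs
    m  = length ys
    xa = listArray (0, n - 1) xs
    ya = listArray (0, m - 1) ys

    -- lcs ! (i, j) is the LCS length of the suffixes starting at i and j.
    -- The array is lazy, so cells may refer to each other.
    lcs :: Array (Int, Int) Int
    lcs = array ((0, 0), (n, m))
            [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m] ]

    cell i j
      | i == n || j == m   = 0
      | xa ! i == ya ! j   = 1 + lcs ! (i + 1, j + 1)
      | otherwise          = max (lcs ! (i + 1, j)) (lcs ! (i, j + 1))

    -- Follow the table from the start, emitting deletions before insertions.
    walk i j
      | i == n                               = map Ins (drop j ys)
      | j == m                               = map Del (drop i xs)
      | xa ! i == ya ! j                     = Keep (xa ! i) : walk (i + 1) (j + 1)
      | lcs ! (i + 1, j) >= lcs ! (i, j + 1) = Del (xa ! i) : walk (i + 1) j
      | otherwise                            = Ins (ya ! j) : walk i (j + 1)
```

For example, `diffTokens ["Patch","is","quite","hard."] ["Patching","is","hard."]` evaluates to `[Del "Patch",Ins "Patching",Keep "is",Del "quite",Keep "hard."]`, which has the same removed and added words as the hunk above (whitespace tokens are left out here for brevity).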
Let's pretend someone has sent us a patch with a better formatted verse with
some other improvements, in file yours:
Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.
We can run diff3 to get a patch with both changes, optionally with reduced
context:
 $ cabal run adiff -- -l lexers/words diff3 mine orig yours -C1
...which outputs:
@@ -4 +4 @@
 |red.
-. 
+.\n
 |Violets
@@ -11 +11 @@
 .\n
-|Patch
+|Patching
 . 
 |is
-. 
-|quite
 . 
 |hard.
-. 
+.\n
 |I
+. 
+|still
 . 
@@ -23 +23 @@
 . 
-|rhyme.
+|do
+. 
+|verses.
 .\n
...or get a merged output right away, using the -m/--merge option:
Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.
...or completely ignore whatever whitespace changes people decided to
make for whatever reason, with -i/--ignore-whitespace (also works without
-m):
Roses are red. Violets are blue.
Patching is hard. I still cannot do verses.
If there's a conflict (e.g. after substituting Merging for Patch in file yours), it gets highlighted in the merged diff as such:
[...]
 . 
 |blue.
 .\n
<|Patching
=|Patch
>|Merging
 . 
 |is
-. 
-|quite
[...]
and using the standard conflict marks in the merged output:
Roses are red.
Violets are blue.
<<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard.
I still cannot do verses.
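The conflict handling shown above follows the usual three-way merge rule: a change made on only one side wins, identical changes on both sides are accepted, and anything else is a conflict. The following Haskell sketch illustrates that rule for a single token and prints conflicts in the marker format shown above. It is an illustration, not adiff's implementation; `Chunk`, `resolve`, `renderChunk` and `renderMerged` are hypothetical names.

```haskell
-- A piece of merged output: either resolved text or a three-way conflict.
-- Conflict fields are ordered mine, orig, yours (hypothetical type).
data Chunk
  = Agreed String
  | Conflict String String String
  deriving (Eq, Show)

-- Classic three-way rule for one aligned token.
resolve :: String -> String -> String -> Chunk
resolve mine orig yours
  | mine == yours = Agreed mine   -- both sides agree (changed or not)
  | mine == orig  = Agreed yours  -- only "yours" changed
  | yours == orig = Agreed mine   -- only "mine" changed
  | otherwise     = Conflict mine orig yours

-- Print a chunk, using the conflict marks from the merged output above.
renderChunk :: Chunk -> String
renderChunk (Agreed s) = s
renderChunk (Conflict mine orig yours) =
  "<<<<<<<" ++ mine ++ "|||||||" ++ orig ++ "=======" ++ yours ++ ">>>>>>>"

renderMerged :: [Chunk] -> String
renderMerged = concatMap renderChunk
```

With these definitions, `renderMerged [resolve "Patching" "Patch" "Merging", Agreed " is hard.\n"]` produces the conflicting line from the example, while `resolve "Patching" "Patch" "Patch"` simply gives `Agreed "Patching"`.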