exa/adiff

Mirek Kratochvil 7781cd6512 document this a bit, add redfa examples		2020-12-30 11:51:44 +01:00
lexers	document this a bit, add redfa examples	2020-12-30 11:51:44 +01:00
src	better commandline parsing	2020-12-29 19:45:36 +01:00
adiff.cabal	software F engineering	2020-09-27 19:26:53 +02:00
CHANGELOG.md	init	2020-08-13 12:54:20 +02:00
LICENSE	init	2020-08-13 12:54:20 +02:00
README.md	document this a bit, add redfa examples	2020-12-30 11:51:44 +01:00
Setup.hs	init	2020-08-13 12:54:20 +02:00

README.md

adiff (arbitrary-tokens diff, patch and merge)

This is a half-working pre-alpha version, use with care.

Short summary

The main aim of this toolbox is to help with finding differences in text formats that do not have the fixed "line-by-line" semantics assumed by the standard unix diff and related tools.

The problem was previously tackled by Arek Antoniewicz at MFF CUNI, who produced a working software package in C++ and designed the Regex-edged DFAs (REDFAs) that were used for user-specifiable tokenization of the input. The work on the corresponding thesis is currently ongoing.

This started as a simple Haskell port of that work, and has since picked up some relatively orthogonal improvements (mainly the histogram-style diffing).

TODO list

Implement patch functionality (diff and diff3 work)
Implement the splitting heuristic for diffs, so that the diffing of larger files doesn't take aeons
Check whether REDFA can even be implemented correctly with current Haskell libraries (most regex libraries target a completely different use case). Taking the lexer specification format from alex currently seems like a much better option. Deferring the task unix-ishly to another program could work too.

How-To

Install using cabal. The adiff program has 3 sub-commands that work like diff, patch and diff3. It expects a lexing specification, passed with the -l option; there are several very simple example lexers in lexers/.

Example

Let's have a file orig:

Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.

and a modified file mine:

Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.

Let's use the words lexer, which marks every run of whitespace-ish characters as a whitespace token and every group of non-whitespace characters as an ordinary token:

:[^ \t\n]*
_:[ \t\n]*
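
To make the effect of this lexer concrete, here is a minimal Haskell sketch that splits a string the same way: maximal runs of space, tab and newline become whitespace tokens, everything else becomes ordinary tokens. It does not read the lexer file and is not adiff's actual code; the `Token` type and the names `wsChar`, `tokenizeWords` and `render` are made up for illustration. `render` imitates the `.` and `|` marks used in the diff output shown further down.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical token type; adiff's real representation may differ.
data Token = Token
  { isWhitespace :: Bool   -- True for runs of space/tab/newline
  , tokenText    :: String -- the exact characters of the token
  } deriving (Eq, Show)

-- The character class [ \t\n] from the words lexer.
wsChar :: Char -> Bool
wsChar c = c `elem` " \t\n"

-- Split the input into maximal runs of whitespace and non-whitespace.
tokenizeWords :: String -> [Token]
tokenizeWords = map mk . groupBy ((==) `on` wsChar)
  where
    -- groupBy never produces empty groups, so the first character
    -- decides the kind of the whole run.
    mk s = Token (any wsChar (take 1 s)) s

-- Show a token roughly the way the diff output does:
-- '.' marks whitespace, '|' marks other tokens, newlines are escaped.
render :: Token -> String
render (Token ws s) = (if ws then '.' else '|') : concatMap escape s
  where
    escape '\n' = "\\n"
    escape c    = [c]
```

Loaded in GHCi, `map render (tokenizeWords "blue.\nPatch is")` gives the five marked tokens for `blue.`, the newline, `Patch`, the space and `is`. Concatenating `tokenText` over all tokens gives back the input unchanged.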

The two files can be diffed as follows:

 $ cabal run adiff -- -l lexers/words diff orig mine

You should get something like this:

@@ -7 +7 @@
 . 
 |are
 . 
 |blue.
 .\n
-|Patch
+|Patching
 . 
 |is
-. 
-|quite
 . 
 |hard.
 . 
 |I
+. 
+|still
 . 
 |cannot
 . 
 |rhyme.
 .\n
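
The idea of diffing token sequences instead of lines can be sketched in a few lines of Haskell. This sketch swaps in a plain longest-common-subsequence diff, not the histogram-style diffing that adiff uses, and the names `Edit`, `diffTokens` and `renderEdit` are hypothetical. It needs `Data.Array` from the `array` package that ships with GHC.

```haskell
import Data.Array (array, listArray, (!))

-- One step of an edit script over tokens of type a.
data Edit a = Keep a | Del a | Ins a
  deriving (Eq, Show)

-- Plain LCS-based diff (NOT the histogram diff used by adiff).
diffTokens :: Eq a => [a] -> [a] -> [Edit a]
diffTokens xs ys = go 0 0
  where
    n  = length xs
    m  = length ys
    xa = listArray (0, n - 1) xs
    ya = listArray (0, m - 1) ys

    -- lcs ! (i, j) is the LCS length of (drop i xs) and (drop j ys).
    -- The array is lazy in its elements, so cells may refer to each other.
    lcs = array ((0, 0), (n, m))
            [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m] ]

    cell :: Int -> Int -> Int
    cell i j
      | i == n || j == m = 0
      | xa ! i == ya ! j = 1 + lcs ! (i + 1, j + 1)
      | otherwise        = max (lcs ! (i + 1, j)) (lcs ! (i, j + 1))

    -- Walk the table from the top-left corner and emit the edit script.
    go i j
      | i == n           = map Ins (drop j ys)
      | j == m           = map Del (drop i xs)
      | xa ! i == ya ! j = Keep (xa ! i) : go (i + 1) (j + 1)
      | lcs ! (i + 1, j) >= lcs ! (i, j + 1) = Del (xa ! i) : go (i + 1) j
      | otherwise        = Ins (ya ! j) : go i (j + 1)

-- Prefix each token with ' ', '-' or '+', as in the output above.
renderEdit :: Edit String -> String
renderEdit (Keep t) = ' ' : t
renderEdit (Del t)  = '-' : t
renderEdit (Ins t)  = '+' : t
```

When several edit scripts are equally short, the one picked here may differ from adiff's. For example, this sketch may delete the space after a removed word where adiff deletes the one before it.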

Let's pretend someone has sent us a patch with a better formatted verse and some other improvements, in file yours:

Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.

We can run diff3 to get a patch with both changes, optionally with reduced context:

 $ cabal run adiff -- -l lexers/words diff3 mine orig yours -C1

...which outputs:

@@ -4 +4 @@
 |red.
-. 
+.\n
 |Violets
@@ -11 +11 @@
 .\n
-|Patch
+|Patching
 . 
 |is
-. 
-|quite
 . 
 |hard.
-. 
+.\n
 |I
+. 
+|still
 . 
@@ -23 +23 @@
 . 
-|rhyme.
+|do
+. 
+|verses.
 .\n

...or get a merged output right away, using the -m/--merge option:

Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.

...or completely ignore whatever whitespace changes people decided to make for whatever reason, with -i/--ignore-whitespace (also works without -m):

Roses are red. Violets are blue.
Patching is hard. I still cannot do verses.
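
One way to picture whitespace-insensitive comparison is as a relaxed equality on tokens. In the Haskell sketch below, any two whitespace tokens count as equal, so a space replaced by a newline is not seen as a change. This is an assumption about how such an option could work, not a description of adiff's internals, and the `Token` type and the name `sameIgnoringWs` are hypothetical.

```haskell
-- Hypothetical token type; adiff's real representation may differ.
data Token = Token
  { isWhitespace :: Bool   -- True for whitespace tokens
  , tokenText    :: String -- the exact characters of the token
  } deriving (Eq, Show)

-- Relaxed equality: all whitespace tokens are interchangeable,
-- every other token must match exactly.
sameIgnoringWs :: Token -> Token -> Bool
sameIgnoringWs a b
  | isWhitespace a && isWhitespace b = True
  | otherwise                        = a == b
```

A diff that matches tokens with `sameIgnoringWs` instead of `==` would treat the space and the newline after `red.` as the same token while still reporting `Patch` versus `Patching`.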

If there's a conflict (say, after substituting Merging for Patch in file yours), it gets highlighted in the merged diff as such:

[...]
 . 
 |blue.
 .\n
<|Patching
=|Patch
>|Merging
 . 
 |is
-. 
-|quite
[...]

and using the standard conflict marks in the merged output:

Roses are red.
Violets are blue.
<<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard.
I still cannot do verses.
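
The decision behind such output can be sketched with the usual three-way merge rule applied to one aligned chunk: if only one side changed the original, take that side; if both changed it differently, report a conflict. The Haskell below is a minimal illustration under that assumption, not adiff's implementation; `Merged`, `mergeChunk` and `renderMerged` are hypothetical names, and the marker layout is copied from the example output above.

```haskell
-- Result of merging one aligned chunk from the three files.
data Merged a
  = Resolved a
  | Conflict a a a  -- mine, orig, yours
  deriving (Eq, Show)

-- Standard three-way rule for a single chunk.
mergeChunk :: Eq a => a -> a -> a -> Merged a
mergeChunk mine orig yours
  | mine == yours = Resolved mine   -- both sides agree
  | mine == orig  = Resolved yours  -- only yours changed
  | yours == orig = Resolved mine   -- only mine changed
  | otherwise     = Conflict mine orig yours

-- Render a chunk for the merged output, using the conflict marks
-- in the same order as the example: mine, orig, yours.
renderMerged :: Merged String -> String
renderMerged (Resolved s) = s
renderMerged (Conflict mine orig yours) =
  "<<<<<<<" ++ mine ++ "|||||||" ++ orig ++ "=======" ++ yours ++ ">>>>>>>"
```

With the files from the example, `renderMerged (mergeChunk "Patching" "Patch" "Merging")` reproduces the conflict text on the third line of the merged output, while `mergeChunk "Patching" "Patch" "Patch"` resolves to `Patching`.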