document this a bit, add redfa examples

Mirek Kratochvil 2020-12-30 11:51:44 +01:00
parent 14d7b454ff
commit 7781cd6512
5 changed files with 181 additions and 0 deletions

173
README.md Normal file

@ -0,0 +1,173 @@
# adiff (arbitrary-tokens diff, patch and merge)
This is a half-working pre-alpha version; use with care.
### Short summary
The main aim of this toolbox is to help with finding differences in text
formats that do not have fixed "line-by-line" semantics, as assumed by the
standard unix `diff` and related tools.
The problem was previously tackled by Arek Antoniewicz at MFF CUNI, who
produced a working software package in C++, and designed the Regex-edged DFAs
(REDFAs) that were used for user-specifiable tokenization of the input. The
work on the corresponding thesis is currently ongoing.
This started as a simple Haskell port of that work, and now also includes some
relatively orthogonal improvements (mainly the histogram-style diffing).
### TODO list
- Implement `patch` functionality (`diff` and `diff3` work)
- Implement the splitting heuristic for diffs, so that the diffing of larger
files doesn't take aeons
- Check whether REDFA can even be implemented correctly with current Haskell
libraries (most regex libraries target a completely different use case). Taking
the lexer specification format from `alex` currently seems like a much better
option. Deferring the task unix-ishly to another program could work too.
# How-To
Install using `cabal`. The `adiff` program has 3 sub-commands that work like
`diff`, `patch` and `diff3`. It expects a lexing specification, passed with
`-l` as in the example below; there are several very simple example lexers in
`lexers/`.
## Example
Let's have a file `orig`:
```
Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.
```
and a modified file `mine`:
```
Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.
```
Let's use the `words` lexer, which marks everything whitespace-ish as
whitespace, and turns each run of non-whitespace characters into a token:
```
:[^ \t\n]*
_:[ \t\n]*
```
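To get a feeling for what such a lexer produces, here is a minimal sketch in Python. It is not how `adiff` itself (written in Haskell) implements lexing; it only assumes that the two rules above behave like two regex alternatives, and it uses `+` instead of `*` so that no empty tokens are produced. The names `TOKEN` and `tokenize` are made up for this illustration.

```python
import re

# Assumption: the two rules of `lexers/words` act like these two alternatives.
# `+` replaces the original `*` so that the matcher never yields empty tokens.
TOKEN = re.compile(r"(?P<word>[^ \t\n]+)|(?P<space>[ \t\n]+)")


def tokenize(text):
    """Split text into (is_whitespace, token) pairs that cover the whole input."""
    return [(m.lastgroup == "space", m.group()) for m in TOKEN.finditer(text)]


text = "Patch is quite hard. I cannot rhyme.\n"
tokens = tokenize(text)
for is_space, tok in tokens:
    # Mark whitespace tokens with '_' (like the lexer rule does), others with ' '.
    print(("_ " if is_space else "  ") + repr(tok))
```

Since every character is either whitespace or not, concatenating the tokens gives back the original text, which is what makes patching and merging on the token level possible.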
The two files can then be diffed like this:
```
$ cabal run adiff -- -l lexers/words diff orig mine
```
You should get something like this:
```
@@ -7 +7 @@
.
|are
.
|blue.
.\n
-|Patch
+|Patching
.
|is
-.
-|quite
.
|hard.
.
|I
+.
+|still
.
|cannot
.
|rhyme.
.\n
```
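The same idea can be sketched with Python's standard `difflib`, to show how a diff over tokens differs from a diff over lines. This is only an approximation under several assumptions: `difflib.SequenceMatcher` is not the histogram-style algorithm used by `adiff`, the output has no hunk headers, and the rendering (`.` for whitespace tokens, `|` for the others) merely imitates the output above. The helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Runs of non-whitespace and runs of whitespace, similar to the `words` lexer.
    return re.findall(r"[^ \t\n]+|[ \t\n]+", text)


def show(tok):
    # Assumed rendering: '.' marks whitespace tokens, '|' marks the others.
    mark = "." if tok.isspace() else "|"
    return mark + tok.replace("\n", "\\n").replace("\t", "\\t")


def token_diff(old, new):
    """Return diff lines prefixed with ' ', '-' or '+' (no hunk headers)."""
    a, b = tokenize(old), tokenize(new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    lines = []
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op == "equal":
            lines += [" " + show(t) for t in a[a0:a1]]
        else:
            # 'replace', 'delete' and 'insert' all reduce to removals + additions.
            lines += ["-" + show(t) for t in a[a0:a1]]
            lines += ["+" + show(t) for t in b[b0:b1]]
    return lines


orig = "Roses are red. Violets are blue.\nPatch is quite hard. I cannot rhyme.\n"
mine = "Roses are red. Violets are blue.\nPatching is hard. I still cannot rhyme.\n"
print("\n".join(token_diff(orig, mine)))
```

The order of the changed tokens may differ slightly from what `adiff` prints (for example, whether the whitespace is reported before or after `still`), because several equally short edit scripts exist.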
Let's pretend someone has sent us a patch with a better formatted verse and
some other improvements, in file `yours`:
```
Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.
```
We can run `diff3` to get a patch with both changes, optionally with reduced
context:
```
$ cabal run adiff -- -l lexers/words diff3 mine orig yours -C1
```
...which outputs:
```
@@ -4 +4 @@
|red.
-.
+.\n
|Violets
@@ -11 +11 @@
.\n
-|Patch
+|Patching
.
|is
-.
-|quite
.
|hard.
-.
+.\n
|I
+.
+|still
.
@@ -23 +23 @@
.
-|rhyme.
+|do
+.
+|verses.
.\n
```
...or get a merged output right away, using the `-m`/`--merge` option:
```
Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.
```
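For the curious, a three-way merge over tokens can be sketched in a few lines of Python. This is a simplified textbook-style `diff3`, not the algorithm `adiff` actually uses: it assumes `difflib.SequenceMatcher` for the pairwise matching, treats tokens of `orig` that survive in both other files as anchors, and resolves each chunk between two anchors separately. The names `match_map` and `merge3` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Runs of non-whitespace and runs of whitespace, similar to the `words` lexer.
    return re.findall(r"[^ \t\n]+|[ \t\n]+", text)


def match_map(base, other):
    """Map indices of base tokens to indices of their matches in other."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            mapping[a + k] = b + k
    return mapping


def merge3(mine_text, orig_text, yours_text):
    mine, base, yours = map(tokenize, (mine_text, orig_text, yours_text))
    to_mine, to_yours = match_map(base, mine), match_map(base, yours)
    # Anchors: base tokens kept by both sides, plus a sentinel for the end.
    anchors = [(i, to_mine[i], to_yours[i])
               for i in range(len(base)) if i in to_mine and i in to_yours]
    anchors.append((len(base), len(mine), len(yours)))
    out, pb, pm, py = [], 0, 0, 0
    for b, m, y in anchors:
        o_chunk, m_chunk, y_chunk = base[pb:b], mine[pm:m], yours[py:y]
        if m_chunk == y_chunk or y_chunk == o_chunk:
            out += m_chunk          # same change on both sides, or only mine changed
        elif m_chunk == o_chunk:
            out += y_chunk          # only yours changed
        else:                       # both changed differently: mark a conflict
            out += (["<<<<<<<"] + m_chunk + ["|||||||"] + o_chunk
                    + ["======="] + y_chunk + [">>>>>>>"])
        if b < len(base):
            out.append(base[b])     # the anchor token itself is unchanged
        pb, pm, py = b + 1, m + 1, y + 1
    return "".join(out)


orig = "Roses are red. Violets are blue.\nPatch is quite hard. I cannot rhyme.\n"
mine = "Roses are red. Violets are blue.\nPatching is hard. I still cannot rhyme.\n"
yours = "Roses are red.\nViolets are blue.\nPatch is quite hard.\nI cannot do verses.\n"
print(merge3(mine, orig, yours), end="")
```

On the three example files this sketch prints the same four lines as the merged output above. Chunks that both sides changed in different ways are wrapped in conflict marks instead of being resolved.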
...or completely ignore whatever whitespace changes that the people decided to
do for whatever reason, with `-i`/`--ignore-whitespace` (also works without
`-m`):
```
Roses are red. Violets are blue.
Patching is hard. I still cannot do verses.
```
If there's a conflict (e.g. after substituting `Patch` with `Merging` in file `yours`), it gets highlighted in the merged diff as such:
```
[...]
.
|blue.
.\n
<|Patching
=|Patch
>|Merging
.
|is
-.
-|quite
[...]
```
and using the standard conflict marks in the merged output:
```
Roses are red.
Violets are blue.
<<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard.
I still cannot do verses.
```

2
lexers/letters Normal file

@ -0,0 +1,2 @@
[a-z]
_:[ \n]

2
lexers/lines Normal file

@ -0,0 +1,2 @@
[^\n]*\n
[^\n]*

2
lexers/nums Normal file

@ -0,0 +1,2 @@
[0-9]
_:\n

2
lexers/words Normal file

@ -0,0 +1,2 @@
:[^ \t\n]*
_:[ \t\n]*