document this a bit, add redfa examples

Commit 7781cd6512 (parent 14d7b454ff)

README.md (new file, 173 lines):

# adiff (arbitrary-tokens diff, patch and merge)

This is a half-working pre-alpha version; use with care.

### Short summary

The main aim of this toolbox is to help with finding differences in text formats that do not have fixed "line-by-line" semantics, as assumed by the standard Unix `diff` and related tools.

The problem was previously tackled by Arek Antoniewicz at MFF CUNI, who produced a working software package in C++ and designed the Regex-edged DFAs (REDFAs) used for user-specifiable tokenization of the input. Work on the corresponding thesis is currently ongoing.

This started as a simple Haskell port of that work and has picked up some relatively orthogonal improvements along the way (mainly histogram-style diffing).

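Histogram-style diffing usually refers to the strategy behind `git diff --histogram`. Instead of searching for a minimal edit script, it anchors on a token that is rare in both inputs, matches it, and recurses on the parts before and after. The sketch below is a simplified, hypothetical illustration of that idea on token lists. It is not adiff's Haskell implementation, and the name `histogram_diff` is an assumption.

```python
from collections import Counter
from typing import List, Tuple

def histogram_diff(a: List[str], b: List[str]) -> List[Tuple[str, str]]:
    """Edit script of (op, token); op is ' ' (keep), '-' (remove) or '+' (add)."""
    # Common prefix and suffix never need an anchor; keep them as-is.
    pre = 0
    while pre < len(a) and pre < len(b) and a[pre] == b[pre]:
        pre += 1
    suf = 0
    while (suf < len(a) - pre and suf < len(b) - pre
           and a[len(a) - 1 - suf] == b[len(b) - 1 - suf]):
        suf += 1
    head = [(" ", t) for t in a[:pre]]
    tail = [(" ", t) for t in a[len(a) - suf:]]
    return head + _diff_middle(a[pre:len(a) - suf], b[pre:len(b) - suf]) + tail

def _diff_middle(a: List[str], b: List[str]) -> List[Tuple[str, str]]:
    count_a, count_b = Counter(a), Counter(b)
    common = [t for t in count_a if t in count_b]
    if not common:
        # Nothing to anchor on: replace the whole region.
        return [("-", t) for t in a] + [("+", t) for t in b]
    # Anchor on the common token with the fewest occurrences (simplified:
    # total count in both inputs, ties broken by first position in a).
    anchor = min(common, key=lambda t: (count_a[t] + count_b[t], a.index(t)))
    i, j = a.index(anchor), b.index(anchor)
    return (histogram_diff(a[:i], b[:j]) + [(" ", anchor)]
            + histogram_diff(a[i + 1:], b[j + 1:]))
```

For example, `histogram_diff("Patch is quite hard.".split(), "Patching is hard.".split())` anchors on `is` and yields `[("-", "Patch"), ("+", "Patching"), (" ", "is"), ("-", "quite"), (" ", "hard.")]`. Real implementations also cap how often a token may occur before it is considered as an anchor, and they fall back to another algorithm when no anchor exists.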
### TODO list

- Implement `patch` functionality (`diff` and `diff3` already work).
- Implement the splitting heuristic for diffs, so that diffing larger files doesn't take aeons.
- Check whether REDFAs can be implemented correctly with the current Haskell libraries at all (most regex libraries target a completely different). Taking the lexer specification format from `alex` currently seems like a much better option. Deferring the task unix-ishly to another program could work too.

# How-To

Install using `cabal`. The `adiff` program has 3 sub-commands that work like `diff`, `patch` and `diff3` (`patch` is not implemented yet; see the TODO list). It expects a lexer specification, passed with `-l`; there are several very simple example lexers in `lexers/`.

## Example

Let's start with a file `orig`:
```
Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.
```

and a modified file `mine`:
```
Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.
```

Let's use the `words` lexer, which marks every run of spaces, tabs and newlines as whitespace and makes every run of other characters a separate token:
```
:[^ \t\n]*
_:[ \t\n]*
```
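The spec format is not documented here. Judging from the files in `lexers/`, each line seems to hold one token rule written as a regular expression. A `_:` prefix appears to mark whitespace tokens, and a bare leading `:` apparently means "no flags". The Python sketch below tokenizes under exactly those assumptions. The standard `re` module stands in for adiff's REDFA matcher, and the longest-match rule (earlier rules winning ties) is itself an assumption. `parse_spec` and `tokenize` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Tuple

class Rule(NamedTuple):
    regex: re.Pattern
    whitespace: bool  # True for rules written with the (assumed) "_:" prefix

def parse_spec(spec: str) -> List[Rule]:
    """Parse one rule per non-empty line (format inferred from lexers/)."""
    rules = []
    for line in spec.splitlines():
        if not line:
            continue
        whitespace = line.startswith("_:")
        if whitespace:
            line = line[2:]
        elif line.startswith(":"):
            line = line[1:]  # bare ':' assumed to mean "no flags"
        rules.append(Rule(re.compile(line), whitespace))
    return rules

def tokenize(rules: List[Rule], text: str) -> List[Tuple[bool, str]]:
    """Split text into (is_whitespace, token) pairs by longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (end offset, is_whitespace) of the longest match so far
        for rule in rules:
            m = rule.regex.match(text, pos)
            # Ignore empty matches; on equal length the earlier rule wins.
            if m and m.end() > pos and (best is None or m.end() > best[0]):
                best = (m.end(), rule.whitespace)
        if best is None:
            raise ValueError(f"no lexer rule matches at offset {pos}")
        end, is_ws = best
        tokens.append((is_ws, text[pos:end]))
        pos = end
    return tokens

# The `words` lexer from above; a raw string keeps \t and \n as regex escapes.
WORDS = r"""
:[^ \t\n]*
_:[ \t\n]*
"""
```

With these rules, `tokenize(parse_spec(WORDS), "Roses are red.")` alternates word and whitespace tokens: `[(False, "Roses"), (True, " "), (False, "are"), (True, " "), (False, "red.")]`. Skipping empty matches matters here, because both `words` rules end in `*` and would otherwise match the empty string at every position.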

Diffing the two files is done like this:
```
$ cabal run adiff -- -l lexers/words diff orig mine
```

You should get something like this:
```
@@ -7 +7 @@
.
|are
.
|blue.
.\n
-|Patch
+|Patching
.
|is
-.
-|quite
.
|hard.
.
|I
+.
+|still
.
|cannot
.
|rhyme.
.\n
```
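Judging from the listing above, each token seems to be printed on its own line. A `|` prefix marks an ordinary token, and a `.` prefix marks a whitespace token, with a newline shown as `\n`. A `-` or `+` in front marks a removed or added token. The hunk header numbers look like 0-based token offsets: token 7 of `orig` is the space before the second `are`. The sketch below reproduces that rendering from an edit script. The format is inferred from this example only, and the names and the `(op, is_whitespace, token)` encoding are hypothetical. It also assumes unchanged tokens carry no prefix, as they appear here; adiff may actually print a leading space that was lost.

```python
from typing import List, Tuple

# One edit: (op, is_whitespace, token); op is "" for unchanged,
# "-" for removed and "+" for added tokens (assumed encoding).
Edit = Tuple[str, bool, str]

def render_token(is_whitespace: bool, token: str) -> str:
    """Prefix whitespace tokens with '.', other tokens with '|'."""
    if is_whitespace:
        # Newlines are shown escaped as \n; other whitespace is printed as-is.
        return "." + token.replace("\n", "\\n")
    return "|" + token

def render_hunk(old_start: int, new_start: int, edits: List[Edit]) -> List[str]:
    """Render one hunk; start offsets are 0-based token indices (inferred)."""
    lines = [f"@@ -{old_start} +{new_start} @@"]
    lines += [op + render_token(ws, tok) for op, ws, tok in edits]
    return lines
```

For example, rendering the tokens from the space before `blue.` (token 9) up to `Patching` as `render_hunk(9, 9, ...)` gives `@@ -9 +9 @@`, `. `, `|blue.`, `.\n`, `-|Patch`, `+|Patching`. A space token renders as `.` followed by a trailing space, which is invisible in the listing.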

Let's pretend someone has sent us a patch with a better-formatted verse and some other improvements, in file `yours`:
```
Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.
```

We can run `diff3` (with the files in the order `mine`, `orig`, `yours`) to get a patch with both changes, optionally with reduced context (`-C1` keeps one token of context on each side of a change):
```
$ cabal run adiff -- -l lexers/words diff3 mine orig yours -C1
```

...which outputs:
```
@@ -4 +4 @@
|red.
-.
+.\n
|Violets
@@ -11 +11 @@
.\n
-|Patch
+|Patching
.
|is
-.
-|quite
.
|hard.
-.
+.\n
|I
+.
+|still
.
@@ -23 +23 @@
.
-|rhyme.
+|do
+.
+|verses.
.\n
```

...or get the merged output right away, using the `-m`/`--merge` option:
```
Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.
```
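A token-level three-way merge in the spirit of `diff3 -m` can be sketched as follows. Diff `orig` against `mine` and against `yours`, then group the changed regions of `orig` that overlap or touch. For each group, take the side that changed it, or either side if both made the same change, and report a conflict otherwise. This illustrates the idea under stated assumptions and is not adiff's implementation. Python's `difflib.SequenceMatcher` replaces adiff's histogram diff, and the function names and the chunk representation are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def changes(orig: List[str], other: List[str]) -> List[Tuple[int, int, List[str]]]:
    """Changed regions of orig as (start, end, replacement tokens)."""
    sm = SequenceMatcher(a=orig, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def _apply(orig, start, end, side_changes):
    """Apply one side's (sorted, non-overlapping) changes to orig[start:end]."""
    out, p = [], start
    for s, e, rep, _side in side_changes:
        out += orig[p:s] + rep
        p = e
    return out + orig[p:end]

def merge3(mine: List[str], orig: List[str], yours: List[str]) -> list:
    """Return chunks: ("ok", tokens) or ("conflict", mine, orig, yours)."""
    tagged = sorted(
        [(s, e, rep, "mine") for s, e, rep in changes(orig, mine)]
        + [(s, e, rep, "yours") for s, e, rep in changes(orig, yours)],
        key=lambda c: (c[0], c[1]))
    chunks, pos, k = [], 0, 0
    while k < len(tagged):
        start, end = tagged[k][0], tagged[k][1]
        group = []
        # Absorb every change that overlaps or touches the current region.
        while k < len(tagged) and tagged[k][0] <= end:
            group.append(tagged[k])
            end = max(end, tagged[k][1])
            k += 1
        chunks.append(("ok", orig[pos:start]))
        m = _apply(orig, start, end, [c for c in group if c[3] == "mine"])
        y = _apply(orig, start, end, [c for c in group if c[3] == "yours"])
        o = orig[start:end]
        if m == y or y == o:      # both agree, or only mine changed
            chunks.append(("ok", m))
        elif m == o:              # only yours changed
            chunks.append(("ok", y))
        else:                     # both changed the region differently
            chunks.append(("conflict", m, o, y))
        pos = end
    chunks.append(("ok", orig[pos:]))
    # Drop empty unchanged chunks; keep every conflict.
    return [c for c in chunks if c[0] == "conflict" or c[1]]
```

On the word tokens of the example (whitespace dropped), `merge3` yields only clean chunks, which join to `Roses are red. Violets are blue. Patching is hard. I still cannot do verses.`. Changing `Patch` to `Merging` in `yours` produces a `("conflict", ["Patching"], ["Patch"], ["Merging"])` chunk instead. Treating touching changes as one region is a conservative choice: it reports a conflict rather than guessing an order for adjacent edits.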

...or completely ignore whatever whitespace changes people decided to make, using `-i`/`--ignore-whitespace` (this also works without `-m`):
```
Roses are red. Violets are blue.
Patching is hard. I still cannot do verses.
```
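Conceptually, ignoring whitespace amounts to comparing only the non-whitespace tokens. The sketch below shows that under assumptions and is not adiff's code. A regular-expression stand-in replaces the `words` lexer, and Python's `difflib` replaces adiff's diff; the function names are hypothetical. With it, reflowing `orig` into the line layout of `yours` leaves only the change from `rhyme.` to `do verses.`.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple

def words(text: str) -> List[Tuple[bool, str]]:
    """Stand-in for the `words` lexer: (is_whitespace, token) pairs."""
    return [(tok[0] in " \t\n", tok)
            for tok in re.findall(r"[ \t\n]+|[^ \t\n]+", text)]

def significant(tokens: List[Tuple[bool, str]]) -> List[str]:
    """Keep only the non-whitespace tokens."""
    return [tok for is_ws, tok in tokens if not is_ws]

def changed_regions(a_text: str, b_text: str):
    """Non-equal difflib opcodes between the non-whitespace tokens of two texts."""
    sm = SequenceMatcher(a=significant(words(a_text)),
                         b=significant(words(b_text)), autojunk=False)
    return [op for op in sm.get_opcodes() if op[0] != "equal"]
```

For `orig` and `yours` from above, `changed_regions` returns a single `("replace", 12, 13, 12, 14)` opcode, covering word 12 (`rhyme.`) in `orig` and words 12 to 13 (`do verses.`) in `yours`. Pure reflows such as `"a b"` versus `"a\n\n b"` return an empty list.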

If there's a conflict (for example, after replacing `Patch` with `Merging` in file `yours`), it gets highlighted in the merged diff like this:
```
[...]
.
|blue.
.\n
<|Patching
=|Patch
>|Merging
.
|is
-.
-|quite
[...]
```

and with the standard conflict markers in the merged output:
```
Roses are red.
Violets are blue.
<<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard.
I still cannot do verses.
```
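The merged output above puts the three conflicting versions inline, in the order `mine`, `orig`, `yours`. They sit between diff3-style markers (`<<<<<<<`, `|||||||`, `=======`, `>>>>>>>`), with no line breaks added around them. Here is a sketch of that rendering. It assumes a hypothetical chunk representation of `("ok", tokens)` and `("conflict", mine, orig, yours)` tuples over the original token strings.

```python
from typing import List, Sequence

def render_merge(chunks: Sequence[tuple]) -> str:
    """Join merge chunks, rendering conflicts with inline diff3-style markers."""
    out: List[str] = []
    for chunk in chunks:
        if chunk[0] == "conflict":
            _, mine, orig, yours = chunk
            out.append("<<<<<<<" + "".join(mine) + "|||||||" + "".join(orig)
                       + "=======" + "".join(yours) + ">>>>>>>")
        else:
            # Unchanged or cleanly merged tokens, whitespace included.
            out.append("".join(chunk[1]))
    return "".join(out)
```

Feeding it the tokens of the example, with a conflict chunk for `Patching`/`Patch`/`Merging`, reproduces the merged text above exactly, including the `is hard.` that follows the closing marker on the same line.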

lexers/letters (new file, 2 lines):

[a-z]
_:[ \n]

lexers/lines (new file, 2 lines):

[^\n]*\n
[^\n]*

lexers/nums (new file, 2 lines):

[0-9]
_:\n

lexers/words (new file, 2 lines):

:[^ \t\n]*
_:[ \t\n]*