document this a bit, add redfa examples

parent 14d7b454ff, commit 7781cd6512

README.md (Normal file, 173 lines added)

# adiff (arbitrary-tokens diff, patch and merge)

This is a half-working pre-alpha version; use with care.

### Short summary

The main aim of this toolbox is to help find differences in text formats that
do not have fixed "line-by-line" semantics, as assumed by the standard Unix
`diff` and related tools.

The problem was previously tackled by Arek Antoniewicz at MFF CUNI, who
produced a working software package in C++ and designed the Regex-edged DFAs
(REDFAs) used for user-specifiable tokenization of the input. Work on the
corresponding thesis is ongoing.

This started as a simple Haskell port of that work, and picked up some
relatively orthogonal improvements (mainly histogram-style diffing).

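Histogram-style diffing, the approach behind git's `--histogram` diff algorithm, anchors the alignment on a token that is rare in the compared region, extends that anchor into a run of equal tokens, and recurses on the parts before and after it. The Python sketch below only illustrates that general technique on token lists; it is not adiff's Haskell implementation, it leaves out the refinements of git's version, and the names `histogram_diff` and `edit_script` are hypothetical.

```python
from collections import Counter

def histogram_diff(a, b):
    """Return matched index pairs (i, j) with a[i] == b[j], increasing in both."""
    pairs = []
    _align(a, 0, len(a), b, 0, len(b), pairs)
    return pairs

def _align(a, alo, ahi, b, blo, bhi, pairs):
    if alo >= ahi or blo >= bhi:
        return
    counts = Counter(a[alo:ahi])  # histogram of the tokens in a's region
    best = None                   # (count, j) of the rarest shared token
    for j in range(blo, bhi):
        c = counts[b[j]]
        if c and (best is None or c < best[0]):
            best = (c, j)
    if best is None:
        return                    # nothing shared: the whole region is a change
    j = best[1]
    i = a.index(b[j], alo, ahi)   # first occurrence of that token in a's region
    # grow the anchor into a maximal run of equal tokens
    i0, j0 = i, j
    while i0 > alo and j0 > blo and a[i0 - 1] == b[j0 - 1]:
        i0, j0 = i0 - 1, j0 - 1
    i1, j1 = i + 1, j + 1
    while i1 < ahi and j1 < bhi and a[i1] == b[j1]:
        i1, j1 = i1 + 1, j1 + 1
    _align(a, alo, i0, b, blo, j0, pairs)            # before the run
    pairs.extend(zip(range(i0, i1), range(j0, j1)))
    _align(a, i1, ahi, b, j1, bhi, pairs)            # after the run

def edit_script(a, b):
    """Turn the matches into (op, token) pairs with op in ' ', '-', '+'."""
    ops, i, j = [], 0, 0
    for mi, mj in histogram_diff(a, b) + [(len(a), len(b))]:  # end sentinel
        ops += [("-", t) for t in a[i:mi]]
        ops += [("+", t) for t in b[j:mj]]
        if mi < len(a):
            ops.append((" ", a[mi]))
        i, j = mi + 1, mj + 1
    return ops

old = "the cat sat on the mat".split()
new = "the dog sat on the mat".split()
for op, tok in edit_script(old, new):
    print(op + tok)
```

Anchoring on rare tokens keeps very frequent tokens, such as the single spaces produced by the `words` lexer, from dominating the alignment.
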
### TODO list

- Implement `patch` functionality (`diff` and `diff3` work)
- Implement the splitting heuristic for diffs, so that diffing larger files
  doesn't take aeons
- Check whether REDFAs can be implemented correctly with the current Haskell
  libraries at all (most regex libraries target a completely different use
  case). Taking the lexer specification format from `alex` currently seems
  like a much better option. Deferring the task to another program,
  Unix-style, could work too.

# How-To

Install using `cabal`. The `adiff` program has three subcommands that work like
`diff`, `patch` and `diff3`. It also needs a lexer specification (passed with
`-l`); there are several very simple example lexers in `lexers/`.

## Example

Suppose we have a file `orig`:
```
Roses are red. Violets are blue.
Patch is quite hard. I cannot rhyme.
```

and a modified file `mine`:
```
Roses are red. Violets are blue.
Patching is hard. I still cannot rhyme.
```

Let's use the `words` lexer, which marks runs of whitespace as whitespace
tokens and groups everything else into word tokens:
```
:[^ \t\n]*
_:[ \t\n]*
```

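To make the specification format more concrete, here is a minimal Python sketch of applying a lexer like `words`. The format details are assumptions read off the example lexers, not adiff's documented semantics: each line is one rule, a `_:` prefix marks a whitespace token, a leading `:` or no prefix marks an ordinary token, and at each position the longest non-empty match wins, with the earlier rule breaking ties. Python's backtracking `re` module stands in for REDFAs here, which is only adequate for simple patterns like these. `parse_spec`, `tokenize` and `render` are hypothetical names; `render` prints tokens in the `|`/`.` notation used in adiff's output.

```python
import re

WORDS = r""":[^ \t\n]*
_:[ \t\n]*
"""

def parse_spec(spec):
    """Parse a lexer spec into (is_whitespace, compiled_regex) rules.

    Assumed format: "_:REGEX" is a whitespace rule; ":REGEX" or a bare
    "REGEX" is an ordinary token rule.
    """
    rules = []
    for line in spec.splitlines():
        if not line:
            continue
        if line.startswith("_:"):
            rules.append((True, re.compile(line[2:])))
        elif line.startswith(":"):
            rules.append((False, re.compile(line[1:])))
        else:
            rules.append((False, re.compile(line)))
    return rules

def tokenize(rules, text):
    """Split text into (is_whitespace, lexeme) pairs by longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (is_whitespace, end) of the longest match so far
        for is_ws, rx in rules:
            m = rx.match(text, pos)
            # ignore empty matches; only a strictly longer match replaces
            # an earlier one, so earlier rules win ties
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (is_ws, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens

def render(tokens):
    """One token per line: '.' for whitespace, '|' otherwise; newline as \\n."""
    return "\n".join(("." if ws else "|") + lexeme.replace("\n", "\\n")
                     for ws, lexeme in tokens)

print(render(tokenize(parse_spec(WORDS), "Roses are red.\n")))
```

Under the same assumptions, the `lines` lexer in `lexers/` yields one token per line, which is roughly the granularity of ordinary `diff`.
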
Diff the two files like this:
```
$ cabal run adiff -- -l lexers/words diff orig mine
```

You should get something like this:
```
@@ -7 +7 @@
 . 
 |are
 . 
 |blue.
 .\n
-|Patch
+|Patching
 . 
 |is
-. 
-|quite
 . 
 |hard.
 . 
 |I
+. 
+|still
 . 
 |cannot
 . 
 |rhyme.
 .\n
```

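Judging from the output above, each line shows one token: `|` marks an ordinary token and `.` a whitespace token (a newline is printed as `\n`), prefixed by a space, `-` or `+` for context, removed and added tokens, and the hunk header appears to give the 0-based index of the first token shown in each input. The Python sketch below reproduces that layout over `(is_whitespace, lexeme)` token pairs. It uses the standard library's `difflib.SequenceMatcher` rather than adiff's histogram-style diff, so on real inputs it may align tokens differently, and `render_diff` with its `context` parameter (loosely mirroring `-C`) is hypothetical.

```python
from difflib import SequenceMatcher

def show(tok):
    """Format an (is_whitespace, lexeme) token the way the example output does."""
    is_ws, lexeme = tok
    return ("." if is_ws else "|") + lexeme.replace("\n", "\\n")

def render_diff(a, b, context=3):
    """Token-level unified diff with `context` unchanged tokens around changes."""
    sm = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for group in sm.get_grouped_opcodes(context):
        _, first_a, _, first_b, _ = group[0]
        out.append(f"@@ -{first_a} +{first_b} @@")  # 0-based start indices
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                out += [" " + show(t) for t in a[i1:i2]]
            else:  # "replace", "delete" or "insert"
                out += ["-" + show(t) for t in a[i1:i2]]
                out += ["+" + show(t) for t in b[j1:j2]]
    return "\n".join(out)

old = [(False, "Patch"), (True, " "), (False, "is")]
new = [(False, "Patching"), (True, " "), (False, "is")]
print(render_diff(old, new, context=1))
```
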
Now suppose someone has sent us a better-formatted version of the verse with
some other improvements, in the file `yours`:
```
Roses are red.
Violets are blue.
Patch is quite hard.
I cannot do verses.
```

We can run `diff3` to get a patch with both changes, optionally with reduced
context (`-C1` keeps one token of context around each change):
```
$ cabal run adiff -- -l lexers/words diff3 mine orig yours -C1
```
...which outputs:
```
@@ -4 +4 @@
 |red.
-. 
+.\n
 |Violets
@@ -11 +11 @@
 .\n
-|Patch
+|Patching
 . 
 |is
-. 
-|quite
 . 
 |hard.
-. 
+.\n
 |I
+. 
+|still
 . 
@@ -23 +23 @@
 . 
-|rhyme.
+|do
+. 
+|verses.
 .\n
```

...or get a merged output right away, using the `-m`/`--merge` option:
```
Roses are red.
Violets are blue.
Patching is hard.
I still cannot do verses.
```

...or ignore whitespace changes completely with `-i`/`--ignore-whitespace`
(this also works without `-m`):
```
Roses are red. Violets are blue.
Patching is hard. I still cannot do verses.
```

If there's a conflict (say, `Patch` is replaced with `Merging` in the file
`yours`), it gets highlighted in the merged diff like this:
```
[...]
 . 
 |blue.
 .\n
<|Patching
=|Patch
>|Merging
 . 
 |is
-. 
-|quite
[...]
```

and marked with the standard conflict markers in the merged output:
```
Roses are red.
Violets are blue.
<<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard.
I still cannot do verses.
```

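The merge behaviour above follows the classic diff3 scheme: align both modified versions against the common base, copy the tokens that all three versions share, take whichever side changed an in-between region, and report a conflict when both sides changed it differently. Below is a simplified Python sketch of that scheme on token lists. It aligns with `difflib.SequenceMatcher` instead of adiff's histogram-style diff, so on real inputs the alignment (and therefore the result) can differ; the function names are hypothetical, and the inline markers only imitate the merged output shown above.

```python
from difflib import SequenceMatcher

def matches(base, other):
    """Map each base index to the index of the same token in `other`."""
    sm = SequenceMatcher(None, base, other, autojunk=False)
    return {i + k: j + k
            for i, j, size in sm.get_matching_blocks()
            for k in range(size)}

def resolve(o, a, b):
    """Settle one region where the versions disagree (o is the base)."""
    if a == o:
        return b                  # mine left it alone: take yours
    if b == o or a == b:
        return a                  # yours left it alone, or both agree
    return ["<<<<<<<", *a, "|||||||", *o, "=======", *b, ">>>>>>>"]

def merge3(mine, base, yours):
    """Three-way token merge; argument order as in `diff3 mine orig yours`."""
    ma, mb = matches(base, mine), matches(base, yours)
    out, o, a, b = [], 0, 0, 0
    while True:
        # next base token that both sides kept
        k = next((i for i in range(o, len(base)) if i in ma and i in mb), None)
        if k == o and ma[k] == a and mb[k] == b:
            out.append(base[o])   # stable: all three versions agree here
            o, a, b = o + 1, a + 1, b + 1
        elif k is None:           # unstable region running to the end
            return out + resolve(base[o:], mine[a:], yours[b:])
        else:                     # unstable region up to the next stable token
            out += resolve(base[o:k], mine[a:ma[k]], yours[b:mb[k]])
            o, a, b = k, ma[k], mb[k]

base = ["Patch", " ", "is", " ", "quite", " ", "hard."]
mine = ["Patching", " ", "is", " ", "hard."]
yours = ["Merging", " ", "is", " ", "quite", " ", "hard."]
print("".join(merge3(mine, base, yours)))
# prints: <<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard.
```

All of the merge policy sits in `resolve`: a side that left a region untouched defers to the other, identical changes merge cleanly, and anything else becomes a conflict.
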
							
								
								
									
lexers/letters (Normal file, 2 lines added)

```
[a-z]
_:[ \n]
```

lexers/lines (Normal file, 2 lines added)

```
[^\n]*\n
[^\n]*
```

lexers/nums (Normal file, 2 lines added)

```
[0-9]
_:\n
```

lexers/words (Normal file, 2 lines added)

```
:[^ \t\n]*
_:[ \t\n]*
```