document this a bit, add redfa examples
parent 14d7b454ff
commit 7781cd6512
README.md (173 lines added, normal file)
							|  | @ -0,0 +1,173 @@ | ||||||
|  | 
 | ||||||
|  | # adiff (arbitrary-tokens diff, patch and merge) | ||||||
|  | 
 | ||||||
|  | This is a half-working pre-alpha version; use with care. | ||||||
|  | 
 | ||||||
|  | ### Short summary | ||||||
|  | 
 | ||||||
|  | The main aim of this toolbox is to help with finding differences in text | ||||||
|  | formats that do not have the fixed "line-by-line" semantics assumed by | ||||||
|  | standard unix `diff` and related tools. | ||||||
|  | 
 | ||||||
|  | The problem was previously tackled by Arek Antoniewicz at MFF CUNI, who | ||||||
|  | produced a working software package in C++ and designed the Regex-edged DFAs | ||||||
|  | (REDFAs) that were used for user-specifiable tokenization of the input. The | ||||||
|  | work on the corresponding thesis is currently ongoing. | ||||||
|  | 
 | ||||||
|  | This started as a simple Haskell port of that work and has since picked up some | ||||||
|  | relatively orthogonal improvements (mainly the histogram-style diffing). | ||||||
|  | 
 | ||||||
|  | ### TODO list | ||||||
|  | 
 | ||||||
|  | - Implement `patch` functionality (`diff` and `diff3` work) | ||||||
|  | - Implement the splitting heuristic for diffs, so that the diffing of larger | ||||||
|  |   files doesn't take aeons | ||||||
|  | - Check whether REDFA can even be implemented correctly with current Haskell | ||||||
|  |   libraries (most regex libraries target a completely different use case). Taking | ||||||
|  |   the lexer specification format from `alex` currently seems like a much better | ||||||
|  |   option. Deferring the task unix-ishly to another program could work too. | ||||||
|  | 
 | ||||||
|  | # How-To | ||||||
|  | 
 | ||||||
|  | Install using `cabal`. The `adiff` program has 3 sub-commands that work like | ||||||
|  | `diff`, `patch` and `diff3`. It expects a lexing specification on the input; | ||||||
|  | there are several very simple example lexers in `lexers/`. | ||||||
|  | 
 | ||||||
|  | ## Example | ||||||
|  | 
 | ||||||
|  | Let's have a file `orig`: | ||||||
|  | ``` | ||||||
|  | Roses are red. Violets are blue. | ||||||
|  | Patch is quite hard. I cannot rhyme. | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | and a modified file `mine`: | ||||||
|  | ``` | ||||||
|  | Roses are red. Violets are blue. | ||||||
|  | Patching is hard. I still cannot rhyme. | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | Let's use the `words` lexer, which marks every run of whitespace-ish characters | ||||||
|  | as whitespace and every run of non-whitespace characters as a token: | ||||||
|  | ``` | ||||||
|  | :[^ \t\n]* | ||||||
|  | _:[ \t\n]* | ||||||
|  | ``` | ||||||
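
To make the two rules concrete, here is a minimal Python sketch of how such a lexer could split text. It is not adiff's Haskell implementation. It assumes that every rule is tried at the current position, that the longest non-empty match wins, and that the `_` prefix marks a rule as whitespace. The names `WORDS_RULES` and `tokenize` are hypothetical.

```python
import re

# Hypothetical in-memory form of the `words` lexer: (is_whitespace, pattern).
# Assumption: the "_" prefix in the lexer file marks a whitespace rule.
WORDS_RULES = [
    (False, re.compile(r"[^ \t\n]*")),
    (True, re.compile(r"[ \t\n]*")),
]


def tokenize(text, rules=WORDS_RULES):
    """Split text into (is_whitespace, lexeme) pairs, longest non-empty match first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for is_ws, pattern in rules:
            m = pattern.match(text, pos)
            # Skip empty matches, which the `*` patterns would otherwise allow.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (is_ws, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        is_ws, end = best
        tokens.append((is_ws, text[pos:end]))
        pos = end
    return tokens


tokens = tokenize("Patch is quite hard.\n")
```

Concatenating the lexemes in order gives back the input unchanged, which is what lets a token-level diff or merge be turned back into a file.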
|  | 
 | ||||||
|  | Diff the two files as follows: | ||||||
|  | ``` | ||||||
|  |  $ cabal run adiff -- -l lexers/words diff orig mine | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | You should get something like this: | ||||||
|  | ``` | ||||||
|  | @@ -7 +7 @@ | ||||||
|  |  .  | ||||||
|  |  |are | ||||||
|  |  .  | ||||||
|  |  |blue. | ||||||
|  |  .\n | ||||||
|  | -|Patch | ||||||
|  | +|Patching | ||||||
|  |  .  | ||||||
|  |  |is | ||||||
|  | -.  | ||||||
|  | -|quite | ||||||
|  |  .  | ||||||
|  |  |hard. | ||||||
|  |  .  | ||||||
|  |  |I | ||||||
|  | +.  | ||||||
|  | +|still | ||||||
|  |  .  | ||||||
|  |  |cannot | ||||||
|  |  .  | ||||||
|  |  |rhyme. | ||||||
|  |  .\n | ||||||
|  | ``` | ||||||
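
The shape of this output can be approximated in a few lines of Python. The sketch below is only an illustration: it uses the standard library's `difflib.SequenceMatcher` instead of adiff's histogram-style diffing, prints no hunk headers or context limits, and may place changes differently than adiff does. The helper names `tokenize`, `render` and `token_diff` are hypothetical.

```python
import difflib
import re


def tokenize(text):
    # Runs of non-whitespace and runs of whitespace, mirroring the `words` lexer.
    return re.findall(r"[^ \t\n]+|[ \t\n]+", text)


def render(op, token):
    # '.' marks whitespace tokens and '|' all other tokens, as in the output above.
    # Assumption: only newlines are escaped when printing a token.
    kind = "." if token.isspace() else "|"
    return op + kind + token.replace("\n", "\\n")


def token_diff(old, new):
    """Return diff lines over tokens: ' ' kept, '-' removed, '+' added."""
    a, b = tokenize(old), tokenize(new)
    lines = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            lines += [render(" ", t) for t in a[i1:i2]]
        else:
            # 'replace', 'delete' and 'insert' all reduce to removals plus additions.
            lines += [render("-", t) for t in a[i1:i2]]
            lines += [render("+", t) for t in b[j1:j2]]
    return lines


orig = "Roses are red. Violets are blue.\nPatch is quite hard. I cannot rhyme.\n"
mine = "Roses are red. Violets are blue.\nPatching is hard. I still cannot rhyme.\n"
print("\n".join(token_diff(orig, mine)))
```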
|  | 
 | ||||||
|  | Let's pretend someone has sent us a patch with a better-formatted verse and | ||||||
|  | some other improvements, in file `yours`: | ||||||
|  | ``` | ||||||
|  | Roses are red. | ||||||
|  | Violets are blue. | ||||||
|  | Patch is quite hard. | ||||||
|  | I cannot do verses. | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | We can run `diff3` to get a patch with both changes, optionally with reduced | ||||||
|  | context: | ||||||
|  | ``` | ||||||
|  |  $ cabal run adiff -- -l lexers/words diff3 mine orig yours -C1 | ||||||
|  | ``` | ||||||
|  | ...which outputs: | ||||||
|  | ``` | ||||||
|  | @@ -4 +4 @@ | ||||||
|  |  |red. | ||||||
|  | -.  | ||||||
|  | +.\n | ||||||
|  |  |Violets | ||||||
|  | @@ -11 +11 @@ | ||||||
|  |  .\n | ||||||
|  | -|Patch | ||||||
|  | +|Patching | ||||||
|  |  .  | ||||||
|  |  |is | ||||||
|  | -.  | ||||||
|  | -|quite | ||||||
|  |  .  | ||||||
|  |  |hard. | ||||||
|  | -.  | ||||||
|  | +.\n | ||||||
|  |  |I | ||||||
|  | +.  | ||||||
|  | +|still | ||||||
|  |  .  | ||||||
|  | @@ -23 +23 @@ | ||||||
|  |  .  | ||||||
|  | -|rhyme. | ||||||
|  | +|do | ||||||
|  | +.  | ||||||
|  | +|verses. | ||||||
|  |  .\n | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ...or get a merged output right away, using the `-m`/`--merge` option: | ||||||
|  | ``` | ||||||
|  | Roses are red. | ||||||
|  | Violets are blue. | ||||||
|  | Patching is hard. | ||||||
|  | I still cannot do verses. | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ...or completely ignore any whitespace changes that people decided to make, | ||||||
|  | with `-i`/`--ignore-whitespace` (also works without | ||||||
|  | `-m`): | ||||||
|  | ``` | ||||||
|  | Roses are red. Violets are blue. | ||||||
|  | Patching is hard. I still cannot do verses. | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | If there's a conflict (for example, after replacing `Patch` with `Merging` in file | ||||||
|  | `yours`), it gets highlighted in the merged diff as such: | ||||||
|  | ``` | ||||||
|  | [...] | ||||||
|  |  .  | ||||||
|  |  |blue. | ||||||
|  |  .\n | ||||||
|  | <|Patching | ||||||
|  | =|Patch | ||||||
|  | >|Merging | ||||||
|  |  .  | ||||||
|  |  |is | ||||||
|  | -.  | ||||||
|  | -|quite | ||||||
|  | [...] | ||||||
|  | ``` | ||||||
|  | 
 | ||||||
|  | ...and with the standard conflict markers in the merged output: | ||||||
|  | ``` | ||||||
|  | Roses are red. | ||||||
|  | Violets are blue. | ||||||
|  | <<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard. | ||||||
|  | I still cannot do verses. | ||||||
|  | ``` | ||||||
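
The decision behind these markers is the usual three-way merge rule, applied to one chunk of tokens at a time. The Python sketch below shows that rule under stated assumptions. It is not adiff's code, the function name `merge_chunk` is hypothetical, and the marker layout is copied from the example output above.

```python
def merge_chunk(mine, orig, yours):
    """Resolve one chunk three-way; fall back to conflict markers."""
    if mine == yours:
        return mine   # both sides agree (changed identically or not at all)
    if mine == orig:
        return yours  # only `yours` changed this chunk
    if yours == orig:
        return mine   # only `mine` changed this chunk
    # Both sides changed the chunk differently: emit a conflict.
    return f"<<<<<<<{mine}|||||||{orig}======={yours}>>>>>>>"


conflict = merge_chunk("Patching", "Patch", "Merging")
clean = merge_chunk("Patching", "Patch", "Patch")
```

Here `conflict` reproduces the marked-up token from the merged output, while `clean` is simply `Patching`, matching the earlier merge where only `mine` had touched that word.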
							
								
								
									
lexers/letters (2 lines added, normal file)
							|  | @ -0,0 +1,2 @@ | ||||||
|  | [a-z] | ||||||
|  | _:[ \n] | ||||||
							
								
								
									
lexers/lines (2 lines added, normal file)
							|  | @ -0,0 +1,2 @@ | ||||||
|  | [^\n]*\n | ||||||
|  | [^\n]* | ||||||
							
								
								
									
lexers/nums (2 lines added, normal file)
							|  | @ -0,0 +1,2 @@ | ||||||
|  | [0-9] | ||||||
|  | _:\n | ||||||
							
								
								
									
lexers/words (2 lines added, normal file)
							|  | @ -0,0 +1,2 @@ | ||||||
|  | :[^ \t\n]* | ||||||
|  | _:[ \t\n]* | ||||||