add a note about history
parent 5a88a00a0d
commit f5f206765c
|  | @@ -104,6 +104,8 @@ below (note the invisible space on the lines with dots): | ||||||
| .\n | .\n | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
|  | ### Custom tokenizers | ||||||
|  | 
 | ||||||
| Users may supply any tokenizer via option `-F`. The script below produces | Users may supply any tokenizer via option `-F`. The script below produces | ||||||
| line-size tokens for demonstration (in turn, `werge` will do the usual line | line-size tokens for demonstration (in turn, `werge` will do the usual line | ||||||
| merges), and can be used e.g. via `-F ./tokenize.py`: | merges), and can be used e.g. via `-F ./tokenize.py`: | ||||||
|  | @@ -119,6 +121,13 @@ for l in sys.stdin.readlines(): | ||||||
|         print('/'+l.replace('\\','\\\\')) |         print('/'+l.replace('\\','\\\\')) | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
|  | ### History | ||||||
|  | 
 | ||||||
|  | I previously attempted to solve this in the `adiff` software, which failed | ||||||
|  | because the approach was too complex. Before that, the issue was tackled by | ||||||
|  | Arek Antoniewicz at MFF CUNI, who used regex-edged DFAs (REDFAs) to construct | ||||||
|  | user-specifiable tokenizers in a pretty cool way. | ||||||
|  | 
 | ||||||
| ## Installation | ## Installation | ||||||
| 
 | 
 | ||||||
| ```sh | ```sh | ||||||
|  |  | ||||||