| author | Mirek Kratochvil <exa.exa@gmail.com> | 2025-07-18 15:58:24 +0200 |
|---|---|---|
| committer | Mirek Kratochvil <exa.exa@gmail.com> | 2025-07-18 15:58:24 +0200 |
| commit | f5f206765cf05f59b6482bca04a9913aef5330d5 (patch) | |
| tree | 357a71620e4d0e66f742813468dc6262f1f5cdb2 | |
| parent | 5a88a00a0db1400cff1641ba6aa800d7e8c6d8a7 (diff) | |
| download | werge-f5f206765cf05f59b6482bca04a9913aef5330d5.tar.gz werge-f5f206765cf05f59b6482bca04a9913aef5330d5.tar.bz2 | |
add a note about history
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | README.md | 9 |

1 file changed, 9 insertions, 0 deletions
````diff
@@ -104,6 +104,8 @@ below (note the invisible space on the lines with dots):
 .\n
 ```
 
+### Custom tokenizers
+
 Users may supply any tokenizer via option `-F`. The script below produces
 line-size tokens for demonstration (in turn, `werge` will do the usual line
 merges), and can be used e.g. via `-F ./tokenize.py`:
@@ -119,6 +121,13 @@ for l in sys.stdin.readlines():
     print('/'+l.replace('\\','\\\\'))
 ```
 
+### History
+
+I previously made an attempt to solve this in `adiff` software, which failed
+because the approach was too complex. Before that, the issue was tackled by
+Arek Antoniewicz on MFF CUNI, who used regex-edged DFAs (REDFAs) to construct
+user-specifiable tokenizers in a pretty cool way.
+
 ## Installation
 
 ```sh
````
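The `tokenize.py` script referenced in the hunk is only partially visible here (its last lines appear as diff context in the second hunk). A minimal sketch consistent with those lines, assuming `werge` reads one token per output line, prefixed with `/` and with backslashes doubled, might look like the following; the exact token-stream format expected by `werge` is an assumption, not something this page confirms:

```python
#!/usr/bin/env python3
# Hypothetical line-granularity tokenizer for `werge -F`, reconstructed from
# the two context lines visible in the diff above. Assumption: werge consumes
# one '/'-prefixed token per output line, with backslashes doubled as escaping.
import sys

for l in sys.stdin.readlines():
    # emit each input line (trailing newline included) as a single token
    print('/'+l.replace('\\','\\\\'))
```

Per the README text in the hunk, such a script would be passed to `werge` as `-F ./tokenize.py`.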
