author Mirek Kratochvil <exa.exa@gmail.com> 2025-07-18 15:58:24 +0200
committer Mirek Kratochvil <exa.exa@gmail.com> 2025-07-18 15:58:24 +0200
commit f5f206765cf05f59b6482bca04a9913aef5330d5
tree 357a71620e4d0e66f742813468dc6262f1f5cdb2 /README.md
parent 5a88a00a0db1400cff1641ba6aa800d7e8c6d8a7
add a note about history
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 9
1 files changed, 9 insertions, 0 deletions
diff --git a/README.md b/README.md
index c6f789f..e4b9398 100644
--- a/README.md
+++ b/README.md
@@ -104,6 +104,8 @@ below (note the invisible space on the lines with dots):
.\n
```
+### Custom tokenizers
+
Users may supply any tokenizer via option `-F`. The script below produces
line-size tokens for demonstration (in turn, `werge` will do the usual line
merges), and can be used e.g. via `-F ./tokenize.py`:
@@ -119,6 +121,13 @@ for l in sys.stdin.readlines():
print('/'+l.replace('\\','\\\\'))
```
+### History
+
+I previously attempted to solve this in the `adiff` software, which failed
+because the approach was too complex. Before that, the issue was tackled by
+Arek Antoniewicz at MFF CUNI, who used regex-edged DFAs (REDFAs) to construct
+user-specifiable tokenizers in a pretty cool way.
+
## Installation
```sh