Compare commits
No commits in common. "bddf3063f9ab21dd92d9b4962fe4026f20c2ea24" and "efae03223ed6bf0a57b4f4174545b337a95ccf98" have entirely different histories.
bddf3063f9
...
efae03223e
170
README.md
170
README.md
|
@ -1,170 +0,0 @@
|
||||||
|
|
||||||
# adiff (arbitrary-tokens diff, patch and merge)
|
|
||||||
|
|
||||||
This is a half-working pre-alpha version, use with care.
|
|
||||||
|
|
||||||
### Short summary
|
|
||||||
|
|
||||||
The main aim of this toolbox is to help with finding differences in text
|
|
||||||
formats that do not have a fixed "line-by-line" semantics, as assumed by
|
|
||||||
standard unix `diff` and related tools.
|
|
||||||
|
|
||||||
The problem was previously tackled by Arek Antoniewicz on MFF CUNI, who
|
|
||||||
produced a working software package in C++, and designed the Regex-edged DFAs
|
|
||||||
(REDFAs) that were used for user-specifiable tokenization of the input. The
|
|
||||||
work on the corresponding thesis is finished.
|
|
||||||
|
|
||||||
This started as a simple Haskell port of that work, and packed some relatively
|
|
||||||
orthogonal improvements (mainly the histogram-style diffing). I later got rid
|
|
||||||
of the REDFA concept -- while super-interesting and useful in theory, I didn't
|
|
||||||
find a sufficiently universal way to build good lexers from user-specified
|
|
||||||
strings. Having a proper Regex representation library (so that e.g.
|
|
||||||
reconstructing Flex is easy) would help a lot.
|
|
||||||
|
|
||||||
### TODO list
|
|
||||||
|
|
||||||
- Implement `patch` functionality, mainly patchfile parsing and fuzzy matching
|
|
||||||
of hunk context. `diff` and `diff3` works.
|
|
||||||
- Implement a splitting heuristic for diffs, so that diffing of large files
|
|
||||||
doesn't take aeons
|
|
||||||
- check if we can have external lexers, unix-style
|
|
||||||
|
|
||||||
# How-To
|
|
||||||
|
|
||||||
Install using `cabal`. The `adiff` program has 3 sub-commands that work like
|
|
||||||
`diff`, `patch` and `diff3`.
|
|
||||||
|
|
||||||
## Example
|
|
||||||
|
|
||||||
Let's have a file `orig`:
|
|
||||||
```
|
|
||||||
Roses are red. Violets are blue.
|
|
||||||
Patch is quite hard. I cannot rhyme.
|
|
||||||
```
|
|
||||||
|
|
||||||
and a modified file `mine`:
|
|
||||||
```
|
|
||||||
Roses are red. Violets are blue.
|
|
||||||
Patching is hard. I still cannot rhyme.
|
|
||||||
```
|
|
||||||
|
|
||||||
Let's use the `words` lexer, which marks everything whitespace-ish as
|
|
||||||
whitespace, and picks up groups of non-whitespace "content" characters.
|
|
||||||
|
|
||||||
Diffing the 2 files gets done as such:
|
|
||||||
```
|
|
||||||
$ cabal run adiff -- -l words diff orig mine
|
|
||||||
```
|
|
||||||
|
|
||||||
You should get something like this:
|
|
||||||
```
|
|
||||||
@@ -7 +7 @@
|
|
||||||
.
|
|
||||||
|are
|
|
||||||
.
|
|
||||||
|blue.
|
|
||||||
.\n
|
|
||||||
-|Patch
|
|
||||||
+|Patching
|
|
||||||
.
|
|
||||||
|is
|
|
||||||
-.
|
|
||||||
-|quite
|
|
||||||
.
|
|
||||||
|hard.
|
|
||||||
.
|
|
||||||
|I
|
|
||||||
+.
|
|
||||||
+|still
|
|
||||||
.
|
|
||||||
|cannot
|
|
||||||
.
|
|
||||||
|rhyme.
|
|
||||||
.\n
|
|
||||||
```
|
|
||||||
|
|
||||||
Let's pretend someone has sent us a new version, with a better formated verse
|
|
||||||
and some other improvements, in file `yours`:
|
|
||||||
```
|
|
||||||
Roses are red.
|
|
||||||
Violets are blue.
|
|
||||||
Patch is quite hard.
|
|
||||||
I cannot do verses.
|
|
||||||
```
|
|
||||||
|
|
||||||
We can run `diff3` to get a patch with both changes, optionally with reduced
|
|
||||||
context:
|
|
||||||
```
|
|
||||||
$ cabal run adiff -- -l words diff3 mine orig yours -C1
|
|
||||||
```
|
|
||||||
...which outputs:
|
|
||||||
```
|
|
||||||
@@ -4 +4 @@
|
|
||||||
|red.
|
|
||||||
-.
|
|
||||||
+.\n
|
|
||||||
|Violets
|
|
||||||
@@ -11 +11 @@
|
|
||||||
.\n
|
|
||||||
-|Patch
|
|
||||||
+|Patching
|
|
||||||
.
|
|
||||||
|is
|
|
||||||
-.
|
|
||||||
-|quite
|
|
||||||
.
|
|
||||||
|hard.
|
|
||||||
-.
|
|
||||||
+.\n
|
|
||||||
|I
|
|
||||||
+.
|
|
||||||
+|still
|
|
||||||
.
|
|
||||||
@@ -23 +23 @@
|
|
||||||
.
|
|
||||||
-|rhyme.
|
|
||||||
+|do
|
|
||||||
+.
|
|
||||||
+|verses.
|
|
||||||
.\n
|
|
||||||
```
|
|
||||||
|
|
||||||
...or get a merged output right away, using the `-m`/`--merge` option:
|
|
||||||
```
|
|
||||||
Roses are red.
|
|
||||||
Violets are blue.
|
|
||||||
Patching is hard.
|
|
||||||
I still cannot do verses.
|
|
||||||
```
|
|
||||||
|
|
||||||
...or completely ignore whatever whitespace changes that the people decided to
|
|
||||||
do for whatever reason, with `-i`/`--ignore-whitespace` (also works without
|
|
||||||
`-m`):
|
|
||||||
```
|
|
||||||
Roses are red. Violets are blue.
|
|
||||||
Patching is hard. I still cannot do verses.
|
|
||||||
```
|
|
||||||
|
|
||||||
If there's a conflict (substituing the `Patch` to `Merging` in file `yours`), it gets highlighted in the merged diff as such:
|
|
||||||
```
|
|
||||||
[...]
|
|
||||||
.
|
|
||||||
|blue.
|
|
||||||
.\n
|
|
||||||
<|Patching
|
|
||||||
=|Patch
|
|
||||||
>|Merging
|
|
||||||
.
|
|
||||||
|is
|
|
||||||
-.
|
|
||||||
-|quite
|
|
||||||
[...]
|
|
||||||
```
|
|
||||||
|
|
||||||
and using the standard conflict marks in the merged output:
|
|
||||||
```
|
|
||||||
Roses are red.
|
|
||||||
Violets are blue.
|
|
||||||
<<<<<<<Patching|||||||Patch=======Merging>>>>>>> is hard.
|
|
||||||
I still cannot do verses.
|
|
||||||
```
|
|
13
adiff.cabal
13
adiff.cabal
|
@ -42,8 +42,6 @@ executable adiff
|
||||||
-- .hs or .lhs file containing the Main module.
|
-- .hs or .lhs file containing the Main module.
|
||||||
main-is: Main.hs
|
main-is: Main.hs
|
||||||
|
|
||||||
ghc-options: -O2 -Wall
|
|
||||||
|
|
||||||
-- Modules included in this executable, other than Main.
|
-- Modules included in this executable, other than Main.
|
||||||
other-modules: Diff,
|
other-modules: Diff,
|
||||||
Diff3,
|
Diff3,
|
||||||
|
@ -51,7 +49,8 @@ executable adiff
|
||||||
Hunks,
|
Hunks,
|
||||||
Merge,
|
Merge,
|
||||||
Patch,
|
Patch,
|
||||||
Tokenizers,
|
Redfa,
|
||||||
|
Substr,
|
||||||
Types,
|
Types,
|
||||||
Version
|
Version
|
||||||
|
|
||||||
|
@ -59,15 +58,13 @@ executable adiff
|
||||||
other-extensions: CPP
|
other-extensions: CPP
|
||||||
|
|
||||||
-- Other library packages from which modules are imported.
|
-- Other library packages from which modules are imported.
|
||||||
build-depends: base ^>=4.15.0.0,
|
build-depends: base ^>=4.13.0.0,
|
||||||
attoparsec ^>=0.14,
|
|
||||||
extra ^>= 1.7,
|
extra ^>= 1.7,
|
||||||
mmap ^>=0.5,
|
mmap ^>=0.5,
|
||||||
|
regex-tdfa ^>= 1.3,
|
||||||
optparse-applicative ^>=0.16,
|
optparse-applicative ^>=0.16,
|
||||||
bytestring ^>= 0.11.2,
|
bytestring ^>= 0.10.12,
|
||||||
vector ^>=0.12,
|
vector ^>=0.12,
|
||||||
word8 ^>=0.1,
|
|
||||||
unicode-data ^>=0.3,
|
|
||||||
utf8-string ^>=1.0
|
utf8-string ^>=1.0
|
||||||
|
|
||||||
-- Directories containing source files.
|
-- Directories containing source files.
|
||||||
|
|
41
src/Diff.hs
41
src/Diff.hs
|
@ -4,8 +4,18 @@ module Diff
|
||||||
( diffToks
|
( diffToks
|
||||||
) where
|
) where
|
||||||
|
|
||||||
|
import Control.Monad
|
||||||
import qualified Data.ByteString as B
|
import qualified Data.ByteString as B
|
||||||
|
import qualified Data.ByteString.Builder as BB
|
||||||
|
import qualified Data.ByteString.Internal as BI
|
||||||
|
import qualified Data.ByteString.Lazy as BL
|
||||||
|
import Data.ByteString.UTF8 (fromString)
|
||||||
|
import Data.Function (on)
|
||||||
|
import Data.List (groupBy, mapAccumL)
|
||||||
|
import Data.List.Extra (split, takeEnd)
|
||||||
import qualified Data.Vector as V
|
import qualified Data.Vector as V
|
||||||
|
import qualified Data.Vector.Unboxed.Mutable as M
|
||||||
|
import Substr
|
||||||
import Types
|
import Types
|
||||||
|
|
||||||
data DiffEnv =
|
data DiffEnv =
|
||||||
|
@ -20,11 +30,10 @@ data DiffEnv =
|
||||||
, deB :: Int
|
, deB :: Int
|
||||||
, deVS :: V.Vector (Int, Int)
|
, deVS :: V.Vector (Int, Int)
|
||||||
, deVE :: V.Vector (Int, Int)
|
, deVE :: V.Vector (Int, Int)
|
||||||
, deTokPrio :: Tok -> Int
|
|
||||||
, deTrans :: Bool
|
, deTrans :: Bool
|
||||||
}
|
}
|
||||||
|
deriving (Show)
|
||||||
|
|
||||||
toksMatch :: Int -> Int -> DiffEnv -> Bool
|
|
||||||
toksMatch x y DiffEnv {deT1 = t1, deT2 = t2} = t1 V.! x == t2 V.! y
|
toksMatch x y DiffEnv {deT1 = t1, deT2 = t2} = t1 V.! x == t2 V.! y
|
||||||
|
|
||||||
stripEqToks :: TV -> TV -> (Diff, Diff, TV, TV)
|
stripEqToks :: TV -> TV -> (Diff, Diff, TV, TV)
|
||||||
|
@ -47,19 +56,10 @@ stripEqToks t1 t2 = (pre, post, t1', t2')
|
||||||
t1' = V.slice b (l1 - e - b) t1
|
t1' = V.slice b (l1 - e - b) t1
|
||||||
t2' = V.slice b (l2 - e - b) t2
|
t2' = V.slice b (l2 - e - b) t2
|
||||||
|
|
||||||
makePrios :: TV -> TV -> (Bool, BS) -> Int
|
|
||||||
makePrios _ _ = get
|
|
||||||
where
|
|
||||||
get (isToken, str) =
|
|
||||||
if isToken
|
|
||||||
then B.length str
|
|
||||||
else 0
|
|
||||||
|
|
||||||
diffToks :: TV -> TV -> Diff
|
diffToks :: TV -> TV -> Diff
|
||||||
diffToks t1' t2' = pre ++ res ++ post
|
diffToks t1' t2' = pre ++ res ++ post
|
||||||
where
|
where
|
||||||
(pre, post, t1, t2) = stripEqToks t1' t2'
|
(pre, post, t1, t2) = stripEqToks t1' t2'
|
||||||
stats = makePrios t1' t2'
|
|
||||||
res
|
res
|
||||||
| V.null t1 = map (Add, ) (V.toList t2)
|
| V.null t1 = map (Add, ) (V.toList t2)
|
||||||
| V.null t2 = map (Remove, ) (V.toList t1)
|
| V.null t2 = map (Remove, ) (V.toList t1)
|
||||||
|
@ -76,7 +76,6 @@ diffToks t1' t2' = pre ++ res ++ post
|
||||||
, deB = V.length t2
|
, deB = V.length t2
|
||||||
, deVS = V.fromList $ zip [0 .. V.length t2] (repeat 0)
|
, deVS = V.fromList $ zip [0 .. V.length t2] (repeat 0)
|
||||||
, deVE = V.fromList $ reverse $ zip [0 .. V.length t2] (repeat 0)
|
, deVE = V.fromList $ reverse $ zip [0 .. V.length t2] (repeat 0)
|
||||||
, deTokPrio = stats
|
|
||||||
, deTrans = False
|
, deTrans = False
|
||||||
}
|
}
|
||||||
| otherwise =
|
| otherwise =
|
||||||
|
@ -92,11 +91,9 @@ diffToks t1' t2' = pre ++ res ++ post
|
||||||
, deB = V.length t1
|
, deB = V.length t1
|
||||||
, deVS = V.fromList $ zip [0 .. V.length t1] (repeat 0)
|
, deVS = V.fromList $ zip [0 .. V.length t1] (repeat 0)
|
||||||
, deVE = V.fromList $ reverse $ zip [0 .. V.length t1] (repeat 0)
|
, deVE = V.fromList $ reverse $ zip [0 .. V.length t1] (repeat 0)
|
||||||
, deTokPrio = stats
|
|
||||||
, deTrans = True
|
, deTrans = True
|
||||||
}
|
}
|
||||||
|
|
||||||
minIndexFwd :: V.Vector (Int, Int) -> Int
|
|
||||||
minIndexFwd =
|
minIndexFwd =
|
||||||
V.minIndexBy
|
V.minIndexBy
|
||||||
(\x y ->
|
(\x y ->
|
||||||
|
@ -105,7 +102,6 @@ minIndexFwd =
|
||||||
else GT --basically normal V.minIndex
|
else GT --basically normal V.minIndex
|
||||||
)
|
)
|
||||||
|
|
||||||
minIndexRev :: V.Vector (Int, Int) -> Int
|
|
||||||
minIndexRev =
|
minIndexRev =
|
||||||
V.minIndexBy
|
V.minIndexBy
|
||||||
(\x y ->
|
(\x y ->
|
||||||
|
@ -121,7 +117,12 @@ diffToks' de@DiffEnv {deS = s, deE = e, deA = a, deB = b, deTrans = trans} =
|
||||||
mid = quot (s + e) 2
|
mid = quot (s + e) 2
|
||||||
vecSmid = vecS mid
|
vecSmid = vecS mid
|
||||||
vecEmid = vecE mid
|
vecEmid = vecE mid
|
||||||
prio i = negate . deTokPrio de $ deT1 de V.! i
|
extraScore i =
|
||||||
|
if isToken
|
||||||
|
then -(B.length s)
|
||||||
|
else 0
|
||||||
|
where
|
||||||
|
(isToken, s) = deT1 de V.! i
|
||||||
vecS = vec -- "forward" operation
|
vecS = vec -- "forward" operation
|
||||||
where
|
where
|
||||||
vec i
|
vec i
|
||||||
|
@ -137,7 +138,7 @@ diffToks' de@DiffEnv {deS = s, deE = e, deA = a, deB = b, deTrans = trans} =
|
||||||
(iupleft, supleft) = v V.! pred j
|
(iupleft, supleft) = v V.! pred j
|
||||||
keep
|
keep
|
||||||
| toksMatch (pred i) (pred j) de =
|
| toksMatch (pred i) (pred j) de =
|
||||||
min (iupleft, supleft + prio (pred i))
|
min (iupleft, supleft + extraScore (pred i))
|
||||||
| otherwise = id
|
| otherwise = id
|
||||||
res = keep $ min (succ iup, sup) (succ ileft, sleft)
|
res = keep $ min (succ iup, sup) (succ ileft, sleft)
|
||||||
in res : go (succ j) res
|
in res : go (succ j) res
|
||||||
|
@ -156,7 +157,7 @@ diffToks' de@DiffEnv {deS = s, deE = e, deA = a, deB = b, deTrans = trans} =
|
||||||
(idownright, sdownright) = v V.! succ j
|
(idownright, sdownright) = v V.! succ j
|
||||||
keep
|
keep
|
||||||
| toksMatch i j de =
|
| toksMatch i j de =
|
||||||
min (idownright, sdownright + prio i)
|
min (idownright, sdownright + extraScore i)
|
||||||
| otherwise = id
|
| otherwise = id
|
||||||
res = keep $ min (succ idown, sdown) (succ iright, sright)
|
res = keep $ min (succ idown, sdown) (succ iright, sright)
|
||||||
in res : go (pred j) res
|
in res : go (pred j) res
|
||||||
|
@ -219,7 +220,7 @@ diffToks' de@DiffEnv {deS = s, deE = e, deA = a, deB = b, deTrans = trans} =
|
||||||
, fst (vecLS V.! i) == fst sCost - a + i
|
, fst (vecLS V.! i) == fst sCost - a + i
|
||||||
, sumL V.! i == totalCost
|
, sumL V.! i == totalCost
|
||||||
, if doKeep
|
, if doKeep
|
||||||
then scoreAdd (vecLS V.! i) (0, prio s) ==
|
then scoreAdd (vecLS V.! i) (0, extraScore s) ==
|
||||||
vecRS V.! succ i
|
vecRS V.! succ i
|
||||||
else scoreAdd (vecLS V.! i) (1, 0) == vecRS V.! i
|
else scoreAdd (vecLS V.! i) (1, 0) == vecRS V.! i
|
||||||
, if doKeep
|
, if doKeep
|
||||||
|
@ -229,7 +230,7 @@ diffToks' de@DiffEnv {deS = s, deE = e, deA = a, deB = b, deTrans = trans} =
|
||||||
]
|
]
|
||||||
jumpEnd =
|
jumpEnd =
|
||||||
if doKeep
|
if doKeep
|
||||||
then succ jumpPos
|
then jumpPos + 1
|
||||||
else jumpPos
|
else jumpPos
|
||||||
in map
|
in map
|
||||||
(\i ->
|
(\i ->
|
||||||
|
|
87
src/Diff3.hs
87
src/Diff3.hs
|
@ -1,47 +1,56 @@
|
||||||
{-# LANGUAGE TupleSections #-}
|
{-# LANGUAGE TupleSections #-}
|
||||||
|
module Diff3 where
|
||||||
module Diff3
|
|
||||||
( diff3Toks
|
|
||||||
) where
|
|
||||||
|
|
||||||
import Diff
|
import Diff
|
||||||
import Merge
|
|
||||||
import Types
|
import Types
|
||||||
|
import Merge
|
||||||
|
|
||||||
stable :: (Origin, a) -> Bool
|
data Origin
|
||||||
stable (Stable, _) = True
|
= Stable
|
||||||
stable _ = False
|
| Mine
|
||||||
|
| Your
|
||||||
applySplit :: (a -> Bool) -> ([a] -> [b]) -> ([a] -> [b]) -> [a] -> [b]
|
deriving (Show, Eq)
|
||||||
applySplit cond f g xs = f (takeWhile cond xs) ++ g (dropWhile cond xs)
|
|
||||||
|
|
||||||
conflict :: MergeOpts -> [(Origin, (Op, Tok))] -> [(Op, Tok)]
|
|
||||||
conflict mo = go
|
|
||||||
where
|
|
||||||
go [] = []
|
|
||||||
go as@(a:_)
|
|
||||||
| stable a = applySplit stable (map snd) go as
|
|
||||||
| otherwise = applySplit (not . stable) (merge mo) go as
|
|
||||||
|
|
||||||
align :: Diff -> Diff -> [(Origin, (Op, Tok))]
|
|
||||||
align ((Keep, a):as) ((Keep, _):bs) = (Stable, (Keep, a)) : align as bs
|
|
||||||
align ((Add, a):as) ((Add, b):bs) =
|
|
||||||
(Mine, (Add, a)) : (Your, (Add, b)) : align as bs
|
|
||||||
align ((Remove, a):as) ((Remove, b):bs) =
|
|
||||||
(Mine, (Remove, a)) : (Your, (Remove, b)) : align as bs
|
|
||||||
align ((Add, a):as) bs@((Keep, _):_) = (Mine, (Add, a)) : align as bs
|
|
||||||
align as@((Keep, _):_) ((Add, b):bs) = (Your, (Add, b)) : align as bs
|
|
||||||
align ((Remove, a):as) ((Keep, b):bs) =
|
|
||||||
(Mine, (Remove, a)) : (Your, (Keep, b)) : align as bs
|
|
||||||
align ((Keep, a):as) ((Remove, b):bs) =
|
|
||||||
(Mine, (Keep, a)) : (Your, (Remove, b)) : align as bs
|
|
||||||
align ((Add, a):as) bs@((Remove, _):_) = (Mine, (Add, a)) : align as bs
|
|
||||||
align as@((Remove, _):_) ((Add, b):bs) = (Your, (Add, b)) : align as bs
|
|
||||||
align [] [] = []
|
|
||||||
align as@((Add, _):_) [] = map (Mine, ) as
|
|
||||||
align [] bs@((Add, _):_) = map (Your, ) bs
|
|
||||||
align _ _ = error "Internal failure: diffstreams seem broken, cannot align"
|
|
||||||
|
|
||||||
diff3Toks :: MergeOpts -> TV -> TV -> TV -> Diff
|
diff3Toks :: MergeOpts -> TV -> TV -> TV -> Diff
|
||||||
diff3Toks mo tMine tOrig tYour =
|
diff3Toks mo tMine tOrig tYour =
|
||||||
conflict mo $ align (diffToks tOrig tMine) (diffToks tOrig tYour)
|
conflict $ align (diffToks tOrig tMine) (diffToks tOrig tYour)
|
||||||
|
where
|
||||||
|
align :: Diff -> Diff -> [(Origin, (Op, Tok))]
|
||||||
|
align ((Keep, a):as) ((Keep, _):bs) = (Stable, (Keep, a)) : align as bs
|
||||||
|
align ((Add, a):as) ((Add, b):bs) =
|
||||||
|
(Mine, (Add, a)) : (Your, (Add, b)) : align as bs
|
||||||
|
align ((Remove, a):as) ((Remove, b):bs) =
|
||||||
|
(Mine, (Remove, a)) : (Your, (Remove, b)) : align as bs
|
||||||
|
align ((Add, a):as) bs@((Keep, _):_) = (Mine, (Add, a)) : align as bs
|
||||||
|
align as@((Keep, _):_) ((Add, b):bs) = (Your, (Add, b)) : align as bs
|
||||||
|
align ((Remove, a):as) ((Keep, b):bs) =
|
||||||
|
(Mine, (Remove, a)) : (Your, (Keep, b)) : align as bs
|
||||||
|
align ((Keep, a):as) ((Remove, b):bs) =
|
||||||
|
(Mine, (Keep, a)) : (Your, (Remove, b)) : align as bs
|
||||||
|
align ((Add, a):as) bs@((Remove, _):_) = (Mine, (Add, a)) : align as bs
|
||||||
|
align as@((Remove, _):_) ((Add, b):bs) = (Your, (Add, b)) : align as bs
|
||||||
|
align [] [] = []
|
||||||
|
align as@((Add, _):_) [] = map (Mine,) as
|
||||||
|
align [] bs@((Add, _):_) = map (Your,) bs
|
||||||
|
align _ _ = error "Internal failure: diffstreams seem broken, cannot align"
|
||||||
|
conflict :: [(Origin, (Op, Tok))] -> Diff
|
||||||
|
conflict [] = []
|
||||||
|
conflict as@(a:_)
|
||||||
|
| stable a = applySplit stable (map snd) conflict as
|
||||||
|
| unstable a = applySplit unstable merge conflict as
|
||||||
|
applySplit :: (a -> Bool) -> ([a] -> [b]) -> ([a] -> [b]) -> [a] -> [b]
|
||||||
|
applySplit cond f g xs = f (takeWhile cond xs) ++ g (dropWhile cond xs)
|
||||||
|
merge :: [(Origin, (Op,Tok))] -> Diff
|
||||||
|
merge cs =
|
||||||
|
let mys = map (\a -> map snd $ filter ((== a) . fst) cs) [Mine, Your]
|
||||||
|
[tokOrigsMine, tokOrigsYour] =
|
||||||
|
map (map snd.filter ((/= Add) . fst)) mys
|
||||||
|
[tokMine, tokYour] = map (map snd.filter ((/= Remove) . fst)) mys
|
||||||
|
in if tokOrigsMine /= tokOrigsYour
|
||||||
|
then error "Internal failure: merge origins differ"
|
||||||
|
else map (MineChanged,) tokMine ++
|
||||||
|
map (Original,) tokOrigsMine ++
|
||||||
|
map (YourChanged,) tokYour
|
||||||
|
stable (Stable, _) = True
|
||||||
|
stable _ = False
|
||||||
|
unstable = not . stable
|
||||||
|
|
113
src/Format.hs
113
src/Format.hs
|
@ -1,128 +1,43 @@
|
||||||
module Format
|
{-# LANGUAGE OverloadedStrings #-}
|
||||||
( pprHunks
|
|
||||||
, pprHunk
|
module Format where
|
||||||
, parsePatch
|
|
||||||
) where
|
|
||||||
|
|
||||||
import Types
|
import Types
|
||||||
|
|
||||||
import Control.Applicative
|
|
||||||
import Control.Monad
|
|
||||||
import Data.Attoparsec.ByteString as A
|
|
||||||
import qualified Data.ByteString as B
|
import qualified Data.ByteString as B
|
||||||
import qualified Data.ByteString.Builder as BB
|
import qualified Data.ByteString.Builder as BB
|
||||||
import qualified Data.ByteString.Internal as BI
|
import qualified Data.ByteString.Internal as BI
|
||||||
import qualified Data.ByteString.Lazy as BL
|
import qualified Data.ByteString.Lazy as BL
|
||||||
import Data.String
|
import Data.String
|
||||||
import Data.Word8 as W8
|
import Substr
|
||||||
|
|
||||||
backslash :: Word8
|
pprHunkHdr i j = fromString $ "@@ -" ++ show i ++ " +" ++ show j ++ " @@"
|
||||||
backslash = BI.c2w '\\'
|
|
||||||
|
|
||||||
newline :: Word8
|
lineSep = fromString "\n"
|
||||||
newline = BI.c2w '\n'
|
|
||||||
|
|
||||||
pprHunks :: [Hunk] -> BB.Builder
|
pprHunks :: [Hunk] -> BB.Builder
|
||||||
pprHunks = mconcat . map pprHunk
|
pprHunks = mconcat . map pprHunk
|
||||||
|
|
||||||
lineSep :: BB.Builder
|
|
||||||
lineSep = BB.word8 newline
|
|
||||||
|
|
||||||
pprHunkHdr :: Int -> Int -> BB.Builder
|
|
||||||
pprHunkHdr i j =
|
|
||||||
(fromString $ "@@ -" ++ show i ++ " +" ++ show j ++ " @@") <> lineSep
|
|
||||||
|
|
||||||
pprHunk :: Hunk -> BB.Builder
|
pprHunk :: Hunk -> BB.Builder
|
||||||
pprHunk ((i, j), toks) = mconcat (pprHunkHdr i j : map pprDiff1 toks)
|
pprHunk ((i, j), toks) = mconcat ((pprHunkHdr i j <> lineSep) : map pprDiff1 toks)
|
||||||
|
|
||||||
pprDiff1 :: (Op, Tok) -> BB.Builder
|
pprDiff1 :: (Op, Tok) -> BB.Builder
|
||||||
pprDiff1 (op, (tok, s)) = fromString pfx <> escNewlines s <> lineSep
|
pprDiff1 (op, (tok, s)) =
|
||||||
|
fromString pfx <> escNewlines s <> lineSep
|
||||||
where
|
where
|
||||||
pfx = [opc, tc]
|
pfx = [opc,tc]
|
||||||
opc =
|
opc = case op of
|
||||||
case op of
|
|
||||||
Add -> '+'
|
Add -> '+'
|
||||||
Keep -> ' '
|
Keep -> ' '
|
||||||
Remove -> '-'
|
Remove -> '-'
|
||||||
MineChanged -> '<'
|
MineChanged -> '<'
|
||||||
Original -> '='
|
Original -> '='
|
||||||
YourChanged -> '>'
|
YourChanged -> '>'
|
||||||
tc =
|
tc = if tok then '|' else '.'
|
||||||
if tok
|
|
||||||
then '|'
|
|
||||||
else '.'
|
|
||||||
|
|
||||||
escNewlines :: BS -> BB.Builder
|
escNewlines :: BS -> BB.Builder
|
||||||
escNewlines s
|
escNewlines s
|
||||||
| B.null s = mempty
|
| B.null s = mempty
|
||||||
| B.head s == newline =
|
| B.head s == BI.c2w '\n' = BB.stringUtf8 "\\n" <> escNewlines (B.tail s)
|
||||||
BB.word8 backslash <> BB.word8 (BI.c2w 'n') <> escNewlines (B.tail s)
|
| B.head s == BI.c2w '\\' = BB.stringUtf8 "\\\\" <> escNewlines (B.tail s)
|
||||||
| B.head s == backslash =
|
|
||||||
BB.word8 backslash <> BB.word8 backslash <> escNewlines (B.tail s)
|
|
||||||
| otherwise = BB.word8 (B.head s) <> escNewlines (B.tail s)
|
| otherwise = BB.word8 (B.head s) <> escNewlines (B.tail s)
|
||||||
|
|
||||||
--parsePatch :: BS -> Either String [Hunk]
|
|
||||||
parsePatch = parseOnly parseHunks
|
|
||||||
|
|
||||||
parseHunks :: Parser [Hunk]
|
|
||||||
parseHunks = many parseHunk <* endOfInput
|
|
||||||
|
|
||||||
parseHunk :: Parser Hunk
|
|
||||||
parseHunk = liftA2 (,) parseHunkHdr (many parseDiff1)
|
|
||||||
|
|
||||||
parseInt :: Parser Int
|
|
||||||
parseInt = read . map BI.w2c <$> some (satisfy W8.isDigit)
|
|
||||||
|
|
||||||
eol :: Parser ()
|
|
||||||
eol = void $ word8 newline
|
|
||||||
|
|
||||||
parseHunkHdr :: Parser (Int, Int)
|
|
||||||
parseHunkHdr = do
|
|
||||||
void . string $ fromString "@@ -"
|
|
||||||
i <- parseInt
|
|
||||||
void . string $ fromString " +"
|
|
||||||
j <- parseInt
|
|
||||||
void . string $ fromString " @@"
|
|
||||||
eol
|
|
||||||
return (i, j)
|
|
||||||
|
|
||||||
pairs2parsers :: [(a, Char)] -> [Parser a]
|
|
||||||
pairs2parsers = map (\(x, ch) -> x <$ word8 (BI.c2w ch))
|
|
||||||
|
|
||||||
parseOpList :: [Parser Op]
|
|
||||||
parseOpList =
|
|
||||||
pairs2parsers
|
|
||||||
[ (Add, '+')
|
|
||||||
, (Keep, ' ')
|
|
||||||
, (Remove, '-')
|
|
||||||
, (MineChanged, '<')
|
|
||||||
, (Original, '=')
|
|
||||||
, (YourChanged, '>')
|
|
||||||
]
|
|
||||||
|
|
||||||
parseOp :: Parser Op
|
|
||||||
parseOp = choice parseOpList
|
|
||||||
|
|
||||||
parseTokMarkList :: [Parser Bool]
|
|
||||||
parseTokMarkList = pairs2parsers [(True, '|'), (False, '.')]
|
|
||||||
|
|
||||||
parseTokMark :: Parser Bool
|
|
||||||
parseTokMark = choice parseTokMarkList
|
|
||||||
|
|
||||||
parseTokBS :: Parser BS
|
|
||||||
parseTokBS =
|
|
||||||
(BL.toStrict . BB.toLazyByteString . mconcat <$> many parseTokChar) <* eol
|
|
||||||
|
|
||||||
parseTokChar :: Parser BB.Builder
|
|
||||||
parseTokChar =
|
|
||||||
choice
|
|
||||||
[ BB.word8 newline <$ string (fromString "\\n")
|
|
||||||
, BB.word8 backslash <$ string (fromString "\\\\")
|
|
||||||
, BB.word8 <$> satisfy (\w -> w /= backslash && w /= newline)
|
|
||||||
]
|
|
||||||
|
|
||||||
parseTok :: Parser Tok
|
|
||||||
parseTok = liftA2 (,) parseTokMark parseTokBS
|
|
||||||
|
|
||||||
parseDiff1 :: Parser (Op, Tok)
|
|
||||||
parseDiff1 = liftA2 (,) parseOp parseTok
|
|
||||||
|
|
|
@ -21,7 +21,7 @@ hunks ctxt d =
|
||||||
zipWith (<=) (padEnd ctxt edits) (replicate (1 + 2 * ctxt) 0 ++ edits)
|
zipWith (<=) (padEnd ctxt edits) (replicate (1 + 2 * ctxt) 0 ++ edits)
|
||||||
addNums = snd . mapAccumL countTok (0, 0)
|
addNums = snd . mapAccumL countTok (0, 0)
|
||||||
stripNums = (,) <$> fst . head <*> map snd
|
stripNums = (,) <$> fst . head <*> map snd
|
||||||
countTok x@(i, j) d'@(op, _) =
|
countTok x@(i, j) d@(op, _) =
|
||||||
(,)
|
(,)
|
||||||
(case op of
|
(case op of
|
||||||
Remove -> (i + 1, j)
|
Remove -> (i + 1, j)
|
||||||
|
@ -30,4 +30,4 @@ hunks ctxt d =
|
||||||
MineChanged -> (i, j)
|
MineChanged -> (i, j)
|
||||||
Original -> (i + 1, j + 1)
|
Original -> (i + 1, j + 1)
|
||||||
YourChanged -> (i, j))
|
YourChanged -> (i, j))
|
||||||
(x, d')
|
(x, d)
|
||||||
|
|
249
src/Main.hs
249
src/Main.hs
|
@ -1,10 +1,9 @@
|
||||||
module Main where
|
module Main where
|
||||||
|
|
||||||
import Control.Exception
|
|
||||||
import Control.Monad
|
|
||||||
import qualified Data.ByteString as B
|
import qualified Data.ByteString as B
|
||||||
import qualified Data.ByteString.Builder as BB
|
import qualified Data.ByteString.Builder as BB
|
||||||
import Data.Foldable (traverse_)
|
import qualified Data.ByteString.Char8 as B8
|
||||||
|
import Data.ByteString.UTF8 (fromString)
|
||||||
import qualified Data.Vector as V
|
import qualified Data.Vector as V
|
||||||
import Diff
|
import Diff
|
||||||
import Diff3
|
import Diff3
|
||||||
|
@ -12,17 +11,14 @@ import Format
|
||||||
import Hunks
|
import Hunks
|
||||||
import Merge
|
import Merge
|
||||||
import Options.Applicative
|
import Options.Applicative
|
||||||
import Patch
|
import Redfa
|
||||||
import System.Exit
|
import System.IO (stdout)
|
||||||
import System.IO (hPutStrLn, stderr, stdout)
|
|
||||||
import System.IO.MMap
|
import System.IO.MMap
|
||||||
import Tokenizers
|
|
||||||
import Types
|
|
||||||
import Version
|
import Version
|
||||||
|
|
||||||
data ADiffOptions =
|
data ADiffOptions =
|
||||||
ADiffOptions
|
ADiffOptions
|
||||||
{ adiffTokOpts :: TokOpts
|
{ adiffRedfaOpt :: RedfaOption
|
||||||
, adiffCmdOpts :: ADiffCommandOpts
|
, adiffCmdOpts :: ADiffCommandOpts
|
||||||
}
|
}
|
||||||
deriving (Show)
|
deriving (Show)
|
||||||
|
@ -35,17 +31,11 @@ data ADiffCommandOpts
|
||||||
}
|
}
|
||||||
| CmdPatch
|
| CmdPatch
|
||||||
{ patchDryRun :: Bool
|
{ patchDryRun :: Bool
|
||||||
--, patchInDir :: Maybe String
|
, patchInDir :: Maybe String
|
||||||
--, patchPathStrip :: Int
|
|
||||||
, patchInputPatch :: String
|
|
||||||
, patchOutput :: String
|
|
||||||
, patchReject :: String --todo convert to Maybes with optional
|
|
||||||
, patchBackup :: String
|
|
||||||
, patchReverse :: Bool
|
|
||||||
, patchScanRange :: Int
|
|
||||||
, context :: Int
|
|
||||||
, patchMergeOpts :: MergeOpts
|
|
||||||
, patchInput :: String
|
, patchInput :: String
|
||||||
|
, patchReverse :: Bool
|
||||||
|
, patchPathStrip :: Int
|
||||||
|
, patchMergeOpts :: MergeOpts
|
||||||
}
|
}
|
||||||
| CmdDiff3
|
| CmdDiff3
|
||||||
{ context :: Int
|
{ context :: Int
|
||||||
|
@ -56,90 +46,50 @@ data ADiffCommandOpts
|
||||||
}
|
}
|
||||||
deriving (Show)
|
deriving (Show)
|
||||||
|
|
||||||
contextOpt :: Bool -> Parser Int
|
contextOpt =
|
||||||
contextOpt forPatch =
|
|
||||||
check <$>
|
|
||||||
option
|
option
|
||||||
auto
|
auto
|
||||||
(metavar "CONTEXT" <>
|
(metavar "CONTEXT" <>
|
||||||
short 'C' <>
|
short 'C' <>
|
||||||
long "context" <>
|
long "context" <>
|
||||||
value
|
value 5 <> help "How many tokens around the change to include in the patch")
|
||||||
(if forPatch
|
|
||||||
then 4
|
|
||||||
else 8) <>
|
|
||||||
help
|
|
||||||
(if forPatch
|
|
||||||
then "Maximum number of context tokens that may be discarded from the beginning and end of the hunk when attempting to find a match"
|
|
||||||
else "How many tokens around the change to include in the patch"))
|
|
||||||
where
|
|
||||||
check c
|
|
||||||
| c < 0 = error "Negative context"
|
|
||||||
| otherwise = c
|
|
||||||
|
|
||||||
diffCmdOptions :: Parser ADiffCommandOpts
|
|
||||||
diffCmdOptions =
|
diffCmdOptions =
|
||||||
CmdDiff <$> contextOpt False <*> strArgument (metavar "FROMFILE") <*>
|
CmdDiff <$> contextOpt <*> strArgument (metavar "FROMFILE") <*>
|
||||||
strArgument (metavar "TOFILE")
|
strArgument (metavar "TOFILE")
|
||||||
|
|
||||||
patchCmdOptions :: Parser ADiffCommandOpts
|
|
||||||
patchCmdOptions =
|
patchCmdOptions =
|
||||||
CmdPatch <$>
|
CmdPatch <$>
|
||||||
switch
|
switch
|
||||||
(short 'n' <>
|
(short 'n' <>
|
||||||
long "dry-run" <>
|
long "dry-run" <>
|
||||||
help "Do not patch anything, just print what would happen") <*>
|
help "Do not patch anything, just print what would happen") <*>
|
||||||
-- optional (strOption $ short 'd' <> long "directory" <> metavar "DIR" <> help "Work in DIR") <*>
|
optional
|
||||||
-- option auto (short 'p' <> long "strip" <> metavar "NUM" <> help "Strip NUM leading components from the paths" <> value 0) <*>
|
(strOption $
|
||||||
|
short 'd' <> long "directory" <> metavar "DIR" <> help "Work in DIR") <*>
|
||||||
strOption
|
strOption
|
||||||
(short 'i' <>
|
(short 'i' <>
|
||||||
long "input" <>
|
long "input" <>
|
||||||
metavar "PATCHFILE" <>
|
metavar "INPUT" <>
|
||||||
help "Read the patchfile from PATCHFILE, defaults to `-' for STDIN" <>
|
help "Read the patchfile from INPUT, defaults to `-' for STDIN" <>
|
||||||
value "-") <*>
|
value "-") <*>
|
||||||
strOption
|
|
||||||
(short 'o' <>
|
|
||||||
long "output" <>
|
|
||||||
metavar "OUTPUT" <>
|
|
||||||
help
|
|
||||||
"Write the patched file to OUTPUT, use `-' for STDOUT. By default, INPUT is rewritten." <>
|
|
||||||
value "") <*>
|
|
||||||
strOption
|
|
||||||
(short 'r' <>
|
|
||||||
long "reject" <>
|
|
||||||
metavar "REJECTS" <>
|
|
||||||
help
|
|
||||||
"Write the rejected hunks file to file REJECTS, instead of default `OUTPUT.rej'. Use `-' to discard rejects." <>
|
|
||||||
value "") <*>
|
|
||||||
strOption
|
|
||||||
(short 'b' <>
|
|
||||||
long "backup" <>
|
|
||||||
metavar "BACKUP" <>
|
|
||||||
help
|
|
||||||
"When rewriting INPUT after a partially applied or otherwise suspicious patch, back up the original file in BACKUP instead of default `INPUT.orig'. Use `-' to discard backups." <>
|
|
||||||
value "") <*>
|
|
||||||
switch (short 'R' <> long "reverse" <> help "Unapply applied patches") <*>
|
switch (short 'R' <> long "reverse" <> help "Unapply applied patches") <*>
|
||||||
option
|
option
|
||||||
auto
|
auto
|
||||||
(short 'S' <>
|
(short 'p' <>
|
||||||
long "scan-range" <>
|
long "strip" <>
|
||||||
metavar "RANGE" <>
|
metavar "NUM" <>
|
||||||
help
|
help "Strip NUM leading components from the paths" <> value 0) <*>
|
||||||
"Maximum distance from the indended patch position (in tokens) for fuzzy matching of hunks" <>
|
mergeOption True
|
||||||
value 42) <*>
|
|
||||||
contextOpt True <*>
|
|
||||||
mergeOption True <*>
|
|
||||||
strArgument (metavar "INPUT")
|
|
||||||
|
|
||||||
diff3CmdOptions :: Parser ADiffCommandOpts
|
|
||||||
diff3CmdOptions =
|
diff3CmdOptions =
|
||||||
CmdDiff3 <$> contextOpt False <*> strArgument (metavar "MYFILE") <*>
|
CmdDiff3 <$> contextOpt <*> strArgument (metavar "MYFILE") <*>
|
||||||
strArgument (metavar "OLDFILE") <*>
|
strArgument (metavar "OLDFILE") <*>
|
||||||
strArgument (metavar "YOURFILE") <*>
|
strArgument (metavar "YOURFILE") <*>
|
||||||
mergeOption False
|
mergeOption False
|
||||||
|
|
||||||
actionOptions :: Parser ADiffCommandOpts
|
actionOption :: Parser ADiffCommandOpts
|
||||||
actionOptions =
|
actionOption =
|
||||||
hsubparser $
|
hsubparser $
|
||||||
mconcat
|
mconcat
|
||||||
[ command "diff" $ info diffCmdOptions $ progDesc "Compare two files"
|
[ command "diff" $ info diffCmdOptions $ progDesc "Compare two files"
|
||||||
|
@ -148,138 +98,35 @@ actionOptions =
|
||||||
info diff3CmdOptions $ progDesc "Compare and merge three files"
|
info diff3CmdOptions $ progDesc "Compare and merge three files"
|
||||||
]
|
]
|
||||||
|
|
||||||
adiffOptions :: Parser ADiffOptions
|
adiffOptions = ADiffOptions <$> redfaOption <*> actionOption
|
||||||
adiffOptions = ADiffOptions <$> tokOptions <*> actionOptions
|
|
||||||
|
|
||||||
-- TODO: load in case it's not a regular file
|
loadToks redfa f =
|
||||||
loadToksMM :: TokOpts -> FilePath -> IO TV
|
V.fromList <$> (mmapFileByteString f Nothing >>= redfaTokenize redfa)
|
||||||
loadToksMM topt fn = loadToksWith topt fn (mmapFileByteString fn Nothing)
|
|
||||||
|
|
||||||
loadToksR :: TokOpts -> FilePath -> IO TV
|
main :: IO ()
|
||||||
loadToksR topt fn = loadToksWith topt fn (B.readFile fn)
|
main =
|
||||||
|
|
||||||
loadToksWith :: TokOpts -> FilePath -> IO BS -> IO TV
|
|
||||||
loadToksWith topt fn bs = V.fromList <$> (bs >>= tokenize topt fn)
|
|
||||||
|
|
||||||
doDiff :: TokOpts -> ADiffCommandOpts -> IO ()
|
|
||||||
doDiff topt (CmdDiff {diffFile1 = f1, diffFile2 = f2, context = ctxt}) = do
|
|
||||||
[toks1, toks2] <- traverse (loadToksMM topt) [f1, f2]
|
|
||||||
let output = hunks (max 0 ctxt) $ diffToks toks1 toks2
|
|
||||||
BB.hPutBuilder stdout $ pprHunks output
|
|
||||||
unless (null output) $ exitWith (ExitFailure 1)
|
|
||||||
doDiff _ _ = error "dispatch failure"
|
|
||||||
|
|
||||||
doDiff3 :: TokOpts -> ADiffCommandOpts -> IO ()
|
|
||||||
doDiff3 topt (CmdDiff3 ctxt f1 f2 f3 mo) = do
|
|
||||||
[toksMine, toksOld, toksYour] <- traverse (loadToksMM topt) [f1, f2, f3]
|
|
||||||
let d3 = diff3Toks mo toksMine toksOld toksYour
|
|
||||||
isConflict (MineChanged, _) = True
|
|
||||||
isConflict (YourChanged, _) = True
|
|
||||||
isConflict _ = False
|
|
||||||
hasConflict = any isConflict d3
|
|
||||||
BB.hPutBuilder stdout $
|
|
||||||
if mergeDoMerge mo
|
|
||||||
then do
|
|
||||||
fmtMerged mo d3
|
|
||||||
else pprHunks $ hunks (max 0 ctxt) d3
|
|
||||||
when hasConflict $ exitWith (ExitFailure 1)
|
|
||||||
doDiff3 _ _ = error "dispatch failure"
|
|
||||||
|
|
||||||
note :: String -> IO ()
|
|
||||||
note = hPutStrLn stderr
|
|
||||||
|
|
||||||
doPatch :: TokOpts -> ADiffCommandOpts -> IO ()
|
|
||||||
doPatch topt o = do
|
|
||||||
toksIn <- loadToksR topt (patchInput o)
|
|
||||||
hs' <-
|
|
||||||
parsePatch <$>
|
|
||||||
case (patchInputPatch o) of
|
|
||||||
"-" -> B.getContents
|
|
||||||
fn -> B.readFile fn
|
|
||||||
hs <-
|
|
||||||
case hs' of
|
|
||||||
Left _ -> ioError $ userError "PATCHFILE parsing failed"
|
|
||||||
Right x -> pure x
|
|
||||||
let (toks, rej, warns) =
|
|
||||||
patchToks
|
|
||||||
toksIn
|
|
||||||
hs
|
|
||||||
(patchReverse o)
|
|
||||||
(patchScanRange o)
|
|
||||||
(context o)
|
|
||||||
(patchMergeOpts o)
|
|
||||||
sus = not (null warns)
|
|
||||||
dry = patchDryRun o
|
|
||||||
rewritingInput = null (patchOutput o)
|
|
||||||
outputStdout = patchOutput o == "-"
|
|
||||||
rejFile
|
|
||||||
| rewritingInput || outputStdout = patchInput o ++ ".rej"
|
|
||||||
| otherwise = patchOutput o ++ ".rej"
|
|
||||||
backupFile
|
|
||||||
| patchBackup o == "-" = ""
|
|
||||||
| patchBackup o == "" && rewritingInput = patchInput o ++ ".orig"
|
|
||||||
| otherwise = patchBackup o
|
|
||||||
outFile
|
|
||||||
| rewritingInput = patchInput o
|
|
||||||
| otherwise = patchOutput o
|
|
||||||
traverse_ (note . pprPatchWarn) warns
|
|
||||||
when dry $
|
|
||||||
note $
|
|
||||||
(if not sus
|
|
||||||
then "OK"
|
|
||||||
else "Possibly problematic") ++
|
|
||||||
" patch with " ++ show (length rej :: Int) ++ " rejected hunks"
|
|
||||||
when (not (null rej)) $
|
|
||||||
if dry
|
|
||||||
then note $ "Would write rejected hunks to " ++ rejFile
|
|
||||||
else do
|
|
||||||
note $ "Writing rejected hunks to " ++ rejFile
|
|
||||||
BB.writeFile rejFile (pprHunks rej)
|
|
||||||
when (sus && not (null backupFile)) $
|
|
||||||
if dry
|
|
||||||
then note $ "Would write backup to " ++ backupFile
|
|
||||||
else do
|
|
||||||
note $ "Writing backup to " ++ backupFile
|
|
||||||
B.readFile (patchInput o) >>= B.writeFile backupFile
|
|
||||||
let doWrite output =
|
|
||||||
if outputStdout
|
|
||||||
then if dry
|
|
||||||
then note "Would write output to stdout"
|
|
||||||
else BB.hPutBuilder stdout output
|
|
||||||
else if dry
|
|
||||||
then note $ "Would write output to " ++ outFile
|
|
||||||
else do
|
|
||||||
note $ "Writing output to " ++ outFile
|
|
||||||
BB.writeFile outFile output
|
|
||||||
doWrite (mconcat . map (BB.byteString . snd) . V.toList $ toks)
|
|
||||||
when (dry && not (null rej)) $ do
|
|
||||||
note "Rejected hunks:"
|
|
||||||
BB.hPutBuilder stdout (pprHunks rej)
|
|
||||||
when sus $ exitWith (ExitFailure 1)
|
|
||||||
|
|
||||||
main' :: IO ()
|
|
||||||
main' =
|
|
||||||
let opts :: ParserInfo ADiffOptions
|
let opts :: ParserInfo ADiffOptions
|
||||||
opts =
|
opts =
|
||||||
info
|
info
|
||||||
(adiffOptions <**> versionOption "adiff" <**> helperOption)
|
(adiffOptions <**> versionOption "adiff" <**> helperOption)
|
||||||
(fullDesc <>
|
(fullDesc <>
|
||||||
progDesc
|
progDesc
|
||||||
"Compare, patch and merge files on arbitrarily tokenized sequences." <>
|
"Compare, patch and merge files on arbitrarily-tokenized sequences." <>
|
||||||
header "adiff: arbitrary-token diff utilities")
|
header "adiff: arbitrary-token diff utilities")
|
||||||
in do ADiffOptions {adiffTokOpts = topt, adiffCmdOpts = copt} <-
|
in do ADiffOptions {adiffRedfaOpt = ropt, adiffCmdOpts = copt} <-
|
||||||
customExecParser (prefs $ helpShowGlobals <> subparserInline) opts
|
execParser opts
|
||||||
(case copt of
|
redfa <- redfaPrepareRules ropt
|
||||||
CmdDiff {} -> doDiff
|
case copt of
|
||||||
CmdDiff3 {} -> doDiff3
|
CmdDiff {diffFile1 = f1, diffFile2 = f2, context = ctxt} -> do
|
||||||
CmdPatch {} -> doPatch)
|
[toks1, toks2] <- traverse (loadToks redfa) [f1, f2]
|
||||||
topt
|
BB.hPutBuilder stdout $
|
||||||
copt
|
pprHunks $ hunks (max 0 ctxt) $ diffToks toks1 toks2
|
||||||
|
CmdPatch {} -> putStrLn "not supported yet"
|
||||||
main :: IO ()
|
CmdDiff3 ctxt f1 f2 f3 mo -> do
|
||||||
main =
|
[toksMine, toksOld, toksYour] <-
|
||||||
main' `catch`
|
traverse (loadToks redfa) [f1, f2, f3]
|
||||||
(\e -> do
|
let d3 = diff3Toks mo toksMine toksOld toksYour
|
||||||
let err = show (e :: IOException)
|
BB.hPutBuilder stdout $
|
||||||
note err
|
if mergeDoMerge mo
|
||||||
exitWith $ ExitFailure 2)
|
then fmtMerged mo d3
|
||||||
|
else pprHunks $ hunks (max 0 ctxt) d3
|
||||||
|
|
76
src/Merge.hs
76
src/Merge.hs
|
@ -1,13 +1,12 @@
|
||||||
{-# LANGUAGE OverloadedStrings #-}
|
{-# LANGUAGE OverloadedStrings #-}
|
||||||
{-# LANGUAGE TupleSections #-}
|
|
||||||
|
|
||||||
module Merge
|
module Merge
|
||||||
( MergeOpts(..)
|
( MergeOpts(..)
|
||||||
, mergeOption
|
, mergeOption
|
||||||
, fmtMerged
|
, fmtMerged
|
||||||
, merge
|
|
||||||
) where
|
) where
|
||||||
|
|
||||||
|
import qualified Data.ByteString as B
|
||||||
import qualified Data.ByteString.Builder as BB
|
import qualified Data.ByteString.Builder as BB
|
||||||
import Data.String
|
import Data.String
|
||||||
import Options.Applicative
|
import Options.Applicative
|
||||||
|
@ -16,7 +15,6 @@ import Types
|
||||||
data MergeOpts =
|
data MergeOpts =
|
||||||
MergeOpts
|
MergeOpts
|
||||||
{ mergeDoMerge :: Bool
|
{ mergeDoMerge :: Bool
|
||||||
, mergeIgnoreWhitespace :: Bool
|
|
||||||
, mergeForceWhitespace :: Bool
|
, mergeForceWhitespace :: Bool
|
||||||
, mergeKeepWhitespace :: Bool
|
, mergeKeepWhitespace :: Bool
|
||||||
, mergeCStartStr :: BS
|
, mergeCStartStr :: BS
|
||||||
|
@ -26,17 +24,17 @@ data MergeOpts =
|
||||||
}
|
}
|
||||||
deriving (Show)
|
deriving (Show)
|
||||||
|
|
||||||
mergeOption :: Bool -> Parser MergeOpts
|
marker = fromString . replicate 7
|
||||||
|
|
||||||
mergeOption forPatch =
|
mergeOption forPatch =
|
||||||
addLBR <$>
|
addLBR <$>
|
||||||
((,) <$>
|
((,) <$>
|
||||||
switch
|
switch
|
||||||
(short 'a' <>
|
(short 'a' <>
|
||||||
long "add-linebreak" <>
|
long "add-linebreak" <>
|
||||||
help "Automatically add a line break after conflict markers. Useful with `lines' lexer.") <*>
|
help "Automatically add a line break after conflict markers") <*>
|
||||||
mo)
|
mo)
|
||||||
where
|
where
|
||||||
marker = fromString . replicate 7
|
|
||||||
mo =
|
mo =
|
||||||
MergeOpts <$>
|
MergeOpts <$>
|
||||||
switch
|
switch
|
||||||
|
@ -44,36 +42,27 @@ mergeOption forPatch =
|
||||||
long "merge" <>
|
long "merge" <>
|
||||||
help
|
help
|
||||||
(if forPatch
|
(if forPatch
|
||||||
then "Instead of printing the rejected thunks, merge using conflict markers as if the INPUT was `MYFILE' and the patch would produced `YOURFILE' from the original."
|
then "Merge using conflict markers instead of printing the rejected hunks"
|
||||||
else "Output the merged file instead of the patch")) <*>
|
else "Output the merged file instead of the patch")) <*>
|
||||||
switch
|
switch
|
||||||
(short 'I' <>
|
(short 'w' <>
|
||||||
long "ignore-whitespace" <>
|
long "whitespace" <>
|
||||||
help
|
|
||||||
("Ignore " ++
|
|
||||||
(if forPatch
|
|
||||||
then "hunks"
|
|
||||||
else "chunks") ++
|
|
||||||
" that change only whitespace")) <*>
|
|
||||||
switch
|
|
||||||
(short 'F' <>
|
|
||||||
long "force-whitespace" <>
|
|
||||||
help
|
help
|
||||||
((if forPatch
|
((if forPatch
|
||||||
then "Force rejecting a hunk"
|
then "Force rejecting a thunk"
|
||||||
else "Force a merge conflict") ++
|
else "Force a merge conflict") ++
|
||||||
" on whitespace mismatch (overrides `ignore-whitespace')")) <*>
|
" on whitespace mismatch")) <*>
|
||||||
switch
|
switch
|
||||||
(short 'K' <>
|
(short 'k' <>
|
||||||
long "keep-whitespace" <>
|
long "keep-whitespace" <>
|
||||||
help
|
help
|
||||||
("On whitespace mismatch, output the version from " ++
|
("On whitespace mismatch, default to the version from " ++
|
||||||
(if forPatch
|
(if forPatch
|
||||||
then "the original file"
|
then "original file"
|
||||||
else "MYFILE") ++
|
else "MYFILE") ++
|
||||||
" instead of the one from " ++
|
" instead of the one from " ++
|
||||||
(if forPatch
|
(if forPatch
|
||||||
then "the context in patch"
|
then "patch"
|
||||||
else "YOURFILE"))) <*>
|
else "YOURFILE"))) <*>
|
||||||
strOption
|
strOption
|
||||||
(long "merge-start" <>
|
(long "merge-start" <>
|
||||||
|
@ -100,27 +89,24 @@ mergeOption forPatch =
|
||||||
, mergeCEndStr = mergeCEndStr x <> "\n"
|
, mergeCEndStr = mergeCEndStr x <> "\n"
|
||||||
}
|
}
|
||||||
|
|
||||||
{- This kinda relies on reasonable ordering
|
{- This kinda relies on reasonable ordering within the conflicts in the Diff -}
|
||||||
- within the conflicts in the Diff -}
|
|
||||||
fmtMerged :: MergeOpts -> Diff -> BB.Builder
|
fmtMerged :: MergeOpts -> Diff -> BB.Builder
|
||||||
fmtMerged mo = go Keep
|
fmtMerged mo = go Keep
|
||||||
where
|
where
|
||||||
go op []
|
go op []
|
||||||
| conflictOp op = bb $ mergeCEndStr mo
|
| conflictOp op = bb $ mergeCEndStr mo
|
||||||
| otherwise = mempty
|
| otherwise = mempty
|
||||||
go prev l@((op, (_, tok)):xs)
|
go last l@((op, (_, tok)):xs)
|
||||||
| conflictOp prev && not (conflictOp op) =
|
| conflictOp last && not (conflictOp op) =
|
||||||
bb (mergeCEndStr mo) <> go Keep l
|
bb (mergeCEndStr mo) <> go Keep l
|
||||||
| not (conflictOp prev) && conflictOp op =
|
| not (conflictOp last) && conflictOp op =
|
||||||
bb (mergeCStartStr mo) <> go MineChanged l
|
bb (mergeCStartStr mo) <> go MineChanged l
|
||||||
| prev /= op && conflictOp op =
|
| last /= op && conflictOp op =
|
||||||
(case op of
|
(case op of
|
||||||
MineChanged -> bb $ mergeCStartStr mo
|
MineChanged -> bb $ mergeCStartStr mo
|
||||||
Original -> bb $ mergeMineSepStr mo
|
Original -> bb $ mergeMineSepStr mo
|
||||||
YourChanged -> bb $ mergeYourSepStr mo
|
YourChanged -> bb $ mergeYourSepStr mo) <>
|
||||||
_ -> error "Internal conflict handling failure") <>
|
|
||||||
go op l
|
go op l
|
||||||
| op == Remove = go op xs
|
|
||||||
| otherwise = bb tok <> go op xs
|
| otherwise = bb tok <> go op xs
|
||||||
conflictOp o =
|
conflictOp o =
|
||||||
case o of
|
case o of
|
||||||
|
@ -129,27 +115,3 @@ fmtMerged mo = go Keep
|
||||||
Remove -> False
|
Remove -> False
|
||||||
_ -> True
|
_ -> True
|
||||||
bb = BB.byteString
|
bb = BB.byteString
|
||||||
|
|
||||||
merge :: MergeOpts -> [(Origin, (Op, Tok))] -> Diff
|
|
||||||
merge mo cs = go
|
|
||||||
where
|
|
||||||
mys@[diffMine, diffYour] =
|
|
||||||
map (\a -> map snd $ filter ((a ==) . fst) cs) [Mine, Your]
|
|
||||||
[tokOrigsMine, tokOrigsYour] = map (map snd . filter ((Add /=) . fst)) mys
|
|
||||||
[tokMine, tokYour] = map (map snd . filter ((Remove /=) . fst)) mys
|
|
||||||
conflict =
|
|
||||||
map (MineChanged, ) tokMine ++
|
|
||||||
map (Original, ) tokOrigsMine ++ map (YourChanged, ) tokYour
|
|
||||||
noTokens = all (not . fst . snd) (diffMine ++ diffYour)
|
|
||||||
go
|
|
||||||
| tokOrigsMine /= tokOrigsYour =
|
|
||||||
error "Internal failure: merge origins differ"
|
|
||||||
| mergeIgnoreWhitespace mo && noTokens = map (Keep, ) tokOrigsMine
|
|
||||||
| all ((Keep ==) . fst) diffYour = diffMine -- only mine changed
|
|
||||||
| all ((Keep ==) . fst) diffMine = diffYour -- only your changed
|
|
||||||
| diffMine == diffYour = diffMine -- false conflict
|
|
||||||
| not (mergeForceWhitespace mo) && noTokens =
|
|
||||||
if mergeKeepWhitespace mo
|
|
||||||
then diffMine
|
|
||||||
else diffYour -- conflict happened, but not on significant tokens
|
|
||||||
| otherwise = conflict -- true conflict
|
|
||||||
|
|
212
src/Patch.hs
212
src/Patch.hs
|
@ -1,210 +1,4 @@
|
||||||
module Patch
|
module Patch where
|
||||||
( patchToks
|
|
||||||
, pprPatchWarn
|
|
||||||
) where
|
|
||||||
|
|
||||||
import Data.Maybe (catMaybes)
|
patchToks :: a
|
||||||
import qualified Data.Vector as V
|
patchToks = undefined
|
||||||
import Merge
|
|
||||||
import Types
|
|
||||||
|
|
||||||
data PatchWarn
|
|
||||||
= HunkMatched Hunk (Int, Int)
|
|
||||||
| HunkFailed Hunk (Int, Int)
|
|
||||||
deriving (Show)
|
|
||||||
|
|
||||||
data PatchState =
|
|
||||||
PatchState
|
|
||||||
{ input :: TV
|
|
||||||
, output :: [TV]
|
|
||||||
, warns :: [PatchWarn]
|
|
||||||
, inOff :: Int -- where we are in `input`
|
|
||||||
, patchInOff :: Int -- to what position does that correspond in patch input (gets adjusted on fuzzy matches)
|
|
||||||
, outOff :: Int -- where we are in output tokens (informative)
|
|
||||||
, patchOutOff :: Int -- to what position does that correspond in patch output (gets adjusted from processed hunks)
|
|
||||||
}
|
|
||||||
|
|
||||||
pprPatchWarn :: PatchWarn -> String
|
|
||||||
pprPatchWarn (HunkMatched (offs, _) poffs) =
|
|
||||||
"hunk (" ++ pprOffs offs ++ ") succeeded at " ++ pprOffs poffs
|
|
||||||
pprPatchWarn (HunkFailed (offs, _) poffs) =
|
|
||||||
"hunk (" ++ pprOffs offs ++ ") FAILED, expected at " ++ pprOffs poffs
|
|
||||||
|
|
||||||
pprOffs :: (Int, Int) -> String
|
|
||||||
pprOffs (o, n) = "-" ++ show o ++ " +" ++ show n
|
|
||||||
|
|
||||||
patchToks ::
|
|
||||||
TV
|
|
||||||
-> [Hunk]
|
|
||||||
-> Bool
|
|
||||||
-> Int
|
|
||||||
-> Int
|
|
||||||
-> MergeOpts
|
|
||||||
-> (TV, [Hunk], [PatchWarn])
|
|
||||||
patchToks toks hunks' revPatch scan ctxt mopt =
|
|
||||||
go hunks $ PatchState toks [] [] 0 0 0 0
|
|
||||||
where
|
|
||||||
hunks
|
|
||||||
| revPatch = map revHunk hunks'
|
|
||||||
| otherwise = hunks'
|
|
||||||
revHunk ((o, n), diff) = ((n, o), map revDiff diff)
|
|
||||||
revDiff (Add, t) = (Remove, t)
|
|
||||||
revDiff (Remove, t) = (Add, t)
|
|
||||||
revDiff (Keep, t) = (Keep, t)
|
|
||||||
revDiff _ = error "cannot reverse conflict diff"
|
|
||||||
go [] ps =
|
|
||||||
( V.concat (output ps ++ [V.drop (inOff ps) (input ps)])
|
|
||||||
, [rej | HunkFailed rej _ <- warns ps]
|
|
||||||
, warns ps)
|
|
||||||
go (h:hs) ps = go hs ps'
|
|
||||||
where
|
|
||||||
((fromPos, toPos), diff) = h
|
|
||||||
advance = fromPos - patchInOff ps
|
|
||||||
noMatch =
|
|
||||||
ps
|
|
||||||
{ warns =
|
|
||||||
warns ps ++
|
|
||||||
[HunkFailed h (advance + inOff ps, advance + outOff ps)]
|
|
||||||
, patchOutOff = patchOutOff ps - diffOffChange diff
|
|
||||||
}
|
|
||||||
cleanMatch :: Maybe PatchState
|
|
||||||
cleanMatch = patchHunkClean ps h mopt
|
|
||||||
isContext :: (Op, Tok) -> Bool
|
|
||||||
isContext (op, _) = op == Keep
|
|
||||||
discardedContextDiffs :: [(Int, Diff)]
|
|
||||||
discardedContextDiffs =
|
|
||||||
let (fwdCtxt, d') = span isContext diff
|
|
||||||
(revCtxt, revMid) = span isContext (reverse d')
|
|
||||||
mid = reverse revMid
|
|
||||||
discards n c@(_:r) = (n, c) : discards (n + 1) r
|
|
||||||
discards n [] = (n, []) : discards n []
|
|
||||||
in zipWith
|
|
||||||
(\(dfwd, fwd) (drev, rev) ->
|
|
||||||
(max dfwd drev, fwd ++ mid ++ reverse rev))
|
|
||||||
(discards 0 fwdCtxt)
|
|
||||||
(discards 0 revCtxt)
|
|
||||||
fuzzyHunks :: [Hunk]
|
|
||||||
fuzzyHunks = do
|
|
||||||
(discarded, ddiff) <- take (ctxt + 1) discardedContextDiffs
|
|
||||||
off <- 0 : concatMap (\x -> [-x, x]) [1 .. scan]
|
|
||||||
pure ((fromPos + discarded + off, toPos + discarded), ddiff)
|
|
||||||
fuzzyMatches =
|
|
||||||
[ (\x -> x {warns = warns x ++ [HunkMatched h fPos]}) <$>
|
|
||||||
patchHunkClean ps fh mopt
|
|
||||||
| fh@(fPos, _) <- tail fuzzyHunks -- tail omits the "clean" 0,0 one
|
|
||||||
]
|
|
||||||
ps' = head $ catMaybes (cleanMatch : fuzzyMatches) ++ [noMatch]
|
|
||||||
|
|
||||||
patchHunkClean :: PatchState -> Hunk -> MergeOpts -> Maybe PatchState
|
|
||||||
patchHunkClean ps ((fromPos, toPos), diff) mopts
|
|
||||||
| expInOff < 0 || expOutOff < 0 = Nothing
|
|
||||||
| mergeIgnoreWhitespace mopts && whitespaceOnly diff = Just ps
|
|
||||||
| Just repl <- matchDiff mopts (V.toList origPart) diff =
|
|
||||||
Just
|
|
||||||
ps
|
|
||||||
{ output = output ps ++ [skipped, V.fromList repl]
|
|
||||||
, inOff = expInOff + matchLen
|
|
||||||
, patchInOff = fromPos + matchLen
|
|
||||||
, outOff = expOutOff + replLen
|
|
||||||
, patchOutOff = toPos + replLen
|
|
||||||
}
|
|
||||||
| otherwise = Nothing
|
|
||||||
where
|
|
||||||
matchLen = diffMatchLen diff
|
|
||||||
replLen = diffReplLen diff
|
|
||||||
advance = fromPos - patchInOff ps
|
|
||||||
expInOff = advance + inOff ps
|
|
||||||
expOutOff = advance + outOff ps
|
|
||||||
skipped = V.take expInOff $ input ps
|
|
||||||
origPart = V.take matchLen $ V.drop expInOff $ input ps
|
|
||||||
|
|
||||||
whitespaceOnly :: Diff -> Bool
|
|
||||||
whitespaceOnly = all wsOnly
|
|
||||||
where
|
|
||||||
wsOnly (Keep, _) = True
|
|
||||||
wsOnly (Original, _) = True
|
|
||||||
wsOnly (_, (False, _)) = True
|
|
||||||
wsOnly _ = False
|
|
||||||
|
|
||||||
diffMatchLen :: Diff -> Int
|
|
||||||
diffMatchLen = sum . map (off . fst)
|
|
||||||
where
|
|
||||||
off Keep = 1
|
|
||||||
off Remove = 1
|
|
||||||
off Original = 1
|
|
||||||
off _ = 0
|
|
||||||
|
|
||||||
diffReplLen :: Diff -> Int
|
|
||||||
diffReplLen = sum . map (off . fst)
|
|
||||||
where
|
|
||||||
off Keep = 1
|
|
||||||
off Add = 1
|
|
||||||
off Original = 1
|
|
||||||
off _ = 0 -- tricky: the conflicts do not actually add to the diff counters
|
|
||||||
|
|
||||||
diffOffChange :: Diff -> Int
|
|
||||||
diffOffChange = sum . map (off . fst)
|
|
||||||
where
|
|
||||||
off Add = 1
|
|
||||||
off Remove = -1
|
|
||||||
off _ = 0
|
|
||||||
|
|
||||||
markToks :: MergeOpts -> Op -> Op -> [Tok]
|
|
||||||
markToks mopts x' y' = map (\s -> (True, s)) $ go x' y'
|
|
||||||
where
|
|
||||||
unmarked :: Op -> Bool
|
|
||||||
unmarked Keep = True
|
|
||||||
unmarked Add = True
|
|
||||||
unmarked Remove = True
|
|
||||||
unmarked _ = False
|
|
||||||
go :: Op -> Op -> [BS]
|
|
||||||
go x y
|
|
||||||
| x == y = []
|
|
||||||
| unmarked x = goUnmarked y
|
|
||||||
| x == MineChanged
|
|
||||||
, y /= Original = mergeYourSepStr mopts : go YourChanged y
|
|
||||||
| x == MineChanged = mergeMineSepStr mopts : go Original y
|
|
||||||
| x == Original = mergeYourSepStr mopts : go YourChanged y
|
|
||||||
| x == YourChanged = mergeCEndStr mopts : goUnmarked y
|
|
||||||
| otherwise = error "internal error in markToks"
|
|
||||||
goUnmarked :: Op -> [BS]
|
|
||||||
goUnmarked y
|
|
||||||
| unmarked y = []
|
|
||||||
| otherwise = mergeCStartStr mopts : go MineChanged y
|
|
||||||
|
|
||||||
matchDiff :: MergeOpts -> [Tok] -> Diff -> Maybe [Tok]
|
|
||||||
matchDiff mopt = go Keep
|
|
||||||
where
|
|
||||||
withMark :: Op -> Op -> [Tok] -> ([Tok] -> [Tok])
|
|
||||||
withMark prev op toks = (++) (markToks mopt prev op ++ toks)
|
|
||||||
go :: Op -> [Tok] -> Diff -> Maybe [Tok]
|
|
||||||
go prev ts ds
|
|
||||||
| null ts
|
|
||||||
, null ds = return $ markToks mopt prev Keep
|
|
||||||
| ((op, tok):ds') <- ds
|
|
||||||
, op == Add || op == MineChanged || op == YourChanged =
|
|
||||||
withMark prev op [tok] <$> go op ts ds'
|
|
||||||
| (intok:ts') <- ts
|
|
||||||
, ((op, tok):ds') <- ds
|
|
||||||
, op == Keep || op == Original
|
|
||||||
, tokCmp' mopt intok tok =
|
|
||||||
withMark
|
|
||||||
prev
|
|
||||||
op
|
|
||||||
[ if mergeKeepWhitespace mopt && not (fst intok)
|
|
||||||
then intok
|
|
||||||
else tok
|
|
||||||
] <$>
|
|
||||||
go op ts' ds'
|
|
||||||
| (intok:ts') <- ts
|
|
||||||
, ((op, tok):ds') <- ds
|
|
||||||
, op == Remove
|
|
||||||
, tokCmp' mopt intok tok = withMark prev op [] <$> go op ts' ds'
|
|
||||||
| otherwise = Nothing
|
|
||||||
|
|
||||||
tokCmp' :: MergeOpts -> Tok -> Tok -> Bool
|
|
||||||
tokCmp' MergeOpts {mergeForceWhitespace = x} = tokCmp x
|
|
||||||
|
|
||||||
tokCmp :: Bool -> Tok -> Tok -> Bool
|
|
||||||
tokCmp False (False, _) (False, _) = True -- do not force rejecting on whitespace change
|
|
||||||
tokCmp _ a b = a == b -- otherwise just compare
|
|
||||||
|
|
185
src/Redfa.hs
Normal file
185
src/Redfa.hs
Normal file
|
@ -0,0 +1,185 @@
|
||||||
|
{-# LANGUAGE FlexibleContexts #-}
|
||||||
|
|
||||||
|
module Redfa where
|
||||||
|
|
||||||
|
import Control.Monad
|
||||||
|
import qualified Data.ByteString as B
|
||||||
|
import qualified Data.ByteString.Builder as BB
|
||||||
|
import qualified Data.ByteString.Char8 as B8
|
||||||
|
import qualified Data.ByteString.Internal as BI
|
||||||
|
import qualified Data.ByteString.Lazy as BL
|
||||||
|
import Data.ByteString.UTF8 (fromString, toString)
|
||||||
|
import Data.List
|
||||||
|
import Data.Maybe
|
||||||
|
import Data.Monoid
|
||||||
|
import qualified Data.Vector as V
|
||||||
|
import Options.Applicative
|
||||||
|
import Text.Regex.TDFA
|
||||||
|
import Types
|
||||||
|
import Substr
|
||||||
|
|
||||||
|
data RedfaOption
|
||||||
|
= RedfaOptionRules [BS]
|
||||||
|
| RedfaOptionFile String
|
||||||
|
deriving (Show)
|
||||||
|
|
||||||
|
data RedfaSpec =
|
||||||
|
RedfaSpec
|
||||||
|
{ redfaStart :: Int
|
||||||
|
, redfaNames :: V.Vector BS
|
||||||
|
, redfaRules :: V.Vector [RedfaRule] --start, mid, jump, token
|
||||||
|
}
|
||||||
|
|
||||||
|
data RedfaRule =
|
||||||
|
RedfaRule
|
||||||
|
{ rrRegexStart :: Regex
|
||||||
|
, rrRegexMid :: Regex
|
||||||
|
, rrJump :: Int
|
||||||
|
, rrIsToken :: Bool
|
||||||
|
}
|
||||||
|
|
||||||
|
redfaOption :: Parser RedfaOption
|
||||||
|
redfaOption =
|
||||||
|
RedfaOptionRules <$>
|
||||||
|
some
|
||||||
|
(strOption $
|
||||||
|
metavar "RULE" <>
|
||||||
|
short 'L' <>
|
||||||
|
long "lex" <> help "Lexing rule (specify repeatedly for more rules)") <|>
|
||||||
|
RedfaOptionFile <$>
|
||||||
|
strOption
|
||||||
|
(metavar "FILE" <>
|
||||||
|
short 'l' <>
|
||||||
|
long "lexer" <> help "File from where to load the lexing rules")
|
||||||
|
|
||||||
|
redfaOptionToRuleStrings :: RedfaOption -> IO [BS]
|
||||||
|
redfaOptionToRuleStrings (RedfaOptionRules x) = return x
|
||||||
|
redfaOptionToRuleStrings (RedfaOptionFile fn) =
|
||||||
|
B8.lines <$> B.readFile fn -- TODO improve
|
||||||
|
|
||||||
|
splitFirst :: Char -> BS -> (BS, BS)
|
||||||
|
splitFirst c s = B.splitAt (fromMaybe (B.length s) $ B8.elemIndex c s) s
|
||||||
|
|
||||||
|
redfaRuleStringToRuleStr :: BS -> Maybe (BS, BS, BS, Bool)
|
||||||
|
redfaRuleStringToRuleStr s =
|
||||||
|
let (spec, regex) = splitFirst ':' s
|
||||||
|
(from, to) = splitFirst '>' spec
|
||||||
|
sf = B8.strip from
|
||||||
|
(cleanFrom, isToken)
|
||||||
|
| B.null sf = (sf, True)
|
||||||
|
| B.head sf == fromIntegral (fromEnum '_') = (B.tail sf, False)
|
||||||
|
| otherwise = (sf, True)
|
||||||
|
go
|
||||||
|
| B.null s = Nothing
|
||||||
|
| B.head s == 35 = Nothing -- # comment
|
||||||
|
| B.null regex = Just (B.empty, B.empty, spec, isToken)
|
||||||
|
| B.null to = Just (cleanFrom, cleanFrom, B.tail regex, isToken)
|
||||||
|
| otherwise =
|
||||||
|
Just (cleanFrom, B8.strip $ B.tail to, B.tail regex, isToken)
|
||||||
|
in go
|
||||||
|
|
||||||
|
unescapeRegex :: MonadFail m => BS -> m BS
|
||||||
|
unescapeRegex s = BL.toStrict . BB.toLazyByteString <$> unescape' s
|
||||||
|
where
|
||||||
|
unescape' :: MonadFail m => BS -> m BB.Builder
|
||||||
|
unescape' s
|
||||||
|
| B.null s = return mempty
|
||||||
|
| B.head s == BI.c2w '\\' && B.null (B.tail s) =
|
||||||
|
fail "incomplete escape sequence"
|
||||||
|
| B.head s == BI.c2w '\\' =
|
||||||
|
let rest = B.tail s
|
||||||
|
cc = B.head rest
|
||||||
|
thechar =
|
||||||
|
BB.stringUtf8 $
|
||||||
|
case BI.w2c cc of
|
||||||
|
'`' -> "\\`"
|
||||||
|
'\'' -> "\\'"
|
||||||
|
'b' -> "\\b"
|
||||||
|
'B' -> "\\B"
|
||||||
|
'<' -> "\\<"
|
||||||
|
'>' -> "\\>"
|
||||||
|
'a' -> "\a"
|
||||||
|
'e' -> "\x1b"
|
||||||
|
'f' -> "\f"
|
||||||
|
'n' -> "\n"
|
||||||
|
'r' -> "\r"
|
||||||
|
't' -> "\t"
|
||||||
|
'\\' -> "\\\\"
|
||||||
|
a -> [a] --TODO add support for \x and \u
|
||||||
|
in (thechar <>) <$> unescape' (B.tail rest)
|
||||||
|
| otherwise = mappend (BB.word8 $ B.head s) <$> unescape' (B.tail s)
|
||||||
|
|
||||||
|
redfaPrepareRules :: RedfaOption -> IO RedfaSpec
|
||||||
|
redfaPrepareRules opt = do
|
||||||
|
(states, jumps, regexes, isToken) <-
|
||||||
|
unzip4 . mapMaybe redfaRuleStringToRuleStr <$>
|
||||||
|
redfaOptionToRuleStrings opt
|
||||||
|
uRegexes <- traverse unescapeRegex regexes
|
||||||
|
startREs <- traverse (makeRegexM . (fromString "\\`" <>)) uRegexes
|
||||||
|
midREs <- traverse (makeRegexM . (fromString "\\`(.|\n)" <>)) uRegexes
|
||||||
|
let ids = nub . sort $ states ++ jumps
|
||||||
|
index' = fromJust . flip elemIndex ids
|
||||||
|
statesIds = map index' states
|
||||||
|
jumpsIds = map index' jumps
|
||||||
|
start = index' $ head states
|
||||||
|
rules :: [[RedfaRule]]
|
||||||
|
rules = do
|
||||||
|
stateId <- [0 .. length ids - 1]
|
||||||
|
return $ do
|
||||||
|
(a, b, (rs, rm), t) <-
|
||||||
|
zip4 statesIds jumpsIds (zip startREs midREs) isToken
|
||||||
|
guard $ a == stateId
|
||||||
|
return $ RedfaRule rs rm b t
|
||||||
|
return $ RedfaSpec start (V.fromList ids) (V.fromList rules)
|
||||||
|
|
||||||
|
redfaTokenize :: MonadFail m => RedfaSpec -> BS -> m [Tok]
|
||||||
|
redfaTokenize spec s = redfaTokenize' spec s (redfaStart spec) 0 []
|
||||||
|
|
||||||
|
redfaTokenize' ::
|
||||||
|
MonadFail m
|
||||||
|
=> RedfaSpec
|
||||||
|
-> BS
|
||||||
|
-> Int
|
||||||
|
-> Int
|
||||||
|
-> [Int]
|
||||||
|
-> m [Tok]
|
||||||
|
redfaTokenize' spec s state off visited
|
||||||
|
| off >= B.length s = pure []
|
||||||
|
| otherwise =
|
||||||
|
let (ooff, reg) =
|
||||||
|
if off == 0
|
||||||
|
then (0, rrRegexStart)
|
||||||
|
else (1, rrRegexMid)
|
||||||
|
matchString = B.drop (off - ooff) s
|
||||||
|
matches :: [(RedfaRule, (MatchOffset, MatchLength))]
|
||||||
|
matches =
|
||||||
|
filter contOK $
|
||||||
|
zip rules $ map (\x -> match (reg x) matchString) rules
|
||||||
|
contOK (RedfaRule {rrJump = j}, (off', len))
|
||||||
|
| off' /= 0 = False
|
||||||
|
| len > ooff = True
|
||||||
|
| otherwise = j `notElem` visited
|
||||||
|
in case matches of
|
||||||
|
[] ->
|
||||||
|
fail $
|
||||||
|
"Tokenization could not continue from " <>
|
||||||
|
stateStrPretty <> " at offset " <> show (off + ooff)
|
||||||
|
((rule, (_, len)):_) ->
|
||||||
|
let matchLen = len - ooff
|
||||||
|
in (if matchLen > 0
|
||||||
|
then (:) (rrIsToken rule, substr off matchLen s)
|
||||||
|
else id) <$>
|
||||||
|
redfaTokenize'
|
||||||
|
spec
|
||||||
|
s
|
||||||
|
(rrJump rule)
|
||||||
|
(off + matchLen)
|
||||||
|
(if matchLen > 0
|
||||||
|
then []
|
||||||
|
else state : visited)
|
||||||
|
where
|
||||||
|
rules = redfaRules spec V.! state
|
||||||
|
stateStr = redfaNames spec V.! state
|
||||||
|
stateStrPretty
|
||||||
|
| B.null stateStr = "anonymous state"
|
||||||
|
| otherwise = "state `" <> toString stateStr <> "'"
|
7
src/Substr.hs
Normal file
7
src/Substr.hs
Normal file
|
@ -0,0 +1,7 @@
|
||||||
|
module Substr where
|
||||||
|
|
||||||
|
import Types
|
||||||
|
import qualified Data.ByteString as B
|
||||||
|
|
||||||
|
substr :: Int -> Int -> BS -> BS
|
||||||
|
substr b l = B.take l . B.drop b
|
|
@ -1,114 +0,0 @@
|
||||||
{-# LANGUAGE Rank2Types #-}
|
|
||||||
{-# LANGUAGE TupleSections #-}
|
|
||||||
|
|
||||||
module Tokenizers (TokOpts, tokOptions, tokenize) where
|
|
||||||
|
|
||||||
import qualified Data.ByteString as B
|
|
||||||
import qualified Data.ByteString.UTF8 as BU8
|
|
||||||
import qualified Data.Word8 as W8
|
|
||||||
import Options.Applicative
|
|
||||||
import Types
|
|
||||||
import qualified Unicode.Char.General as UC
|
|
||||||
|
|
||||||
data TokOpts =
|
|
||||||
TokOpts
|
|
||||||
{ optTokenizerName :: String
|
|
||||||
}
|
|
||||||
deriving (Show)
|
|
||||||
|
|
||||||
tokOptions :: Parser TokOpts
|
|
||||||
tokOptions =
|
|
||||||
TokOpts <$>
|
|
||||||
strOption
|
|
||||||
(metavar "LEXER" <>
|
|
||||||
short 'l' <>
|
|
||||||
long "lexer" <>
|
|
||||||
value "text" <>
|
|
||||||
help "Lexer name (use --help-lexers to list available ones)")
|
|
||||||
|
|
||||||
--TODO this should later allow choosing the lexer by suffix
|
|
||||||
tokenize :: MonadFail m => TokOpts -> FilePath -> BS -> m [Tok]
|
|
||||||
tokenize topt _ =
|
|
||||||
case filter ((== optTokenizerName topt) . tkName) tokenizers of
|
|
||||||
[t] -> runTokenizer t
|
|
||||||
_ -> const $ fail "Lexer not found"
|
|
||||||
|
|
||||||
data Tokenizer =
|
|
||||||
Tokenizer
|
|
||||||
{ tkName :: String
|
|
||||||
, tkDescription :: String
|
|
||||||
, runTokenizer :: forall m. MonadFail m =>
|
|
||||||
BS -> m [Tok]
|
|
||||||
}
|
|
||||||
|
|
||||||
tokenizers :: [Tokenizer]
|
|
||||||
tokenizers =
|
|
||||||
[ Tokenizer "lines" "works like the traditional diff" tokenizeLines
|
|
||||||
, Tokenizer "asciiwords" "separate by ASCII whitespace" tokenizeWords8
|
|
||||||
, Tokenizer "words" "separate on any UTF8 spacing" tokenizeWords
|
|
||||||
, Tokenizer
|
|
||||||
"text"
|
|
||||||
"separate groups of similar-class UTF8 characters"
|
|
||||||
tokenizeUnicode
|
|
||||||
]
|
|
||||||
|
|
||||||
tokenizeLines :: MonadFail m => BS -> m [Tok]
|
|
||||||
tokenizeLines = pure . map (True, ) . BU8.lines'
|
|
||||||
|
|
||||||
tokenizeWords8 :: MonadFail m => BS -> m [Tok]
|
|
||||||
tokenizeWords8 =
|
|
||||||
pure . map (\x -> (not $ W8.isSpace $ B.head x, x)) . bsGroupOn W8.isSpace
|
|
||||||
where
|
|
||||||
bsGroupOn f = B.groupBy (\l r -> f l == f r)
|
|
||||||
|
|
||||||
tokenizeWords :: MonadFail m => BS -> m [Tok]
|
|
||||||
tokenizeWords = pure . map makeTokenBU8 . groupOnBU8 UC.isWhiteSpace
|
|
||||||
|
|
||||||
makeTokenBU8 :: BS -> Tok
|
|
||||||
makeTokenBU8 s = if
|
|
||||||
maybe False (not . UC.isWhiteSpace . fst) (BU8.decode s)
|
|
||||||
then (True, s)
|
|
||||||
else (False, s)
|
|
||||||
|
|
||||||
{- needed?
|
|
||||||
foldri :: (Char -> Int -> a -> a) -> a -> BS -> a
|
|
||||||
foldri cons nil cs =
|
|
||||||
case decode cs of
|
|
||||||
Just (a, l) -> cons a l (foldrS cons nil $ drop l cs)
|
|
||||||
Nothing -> nil
|
|
||||||
-}
|
|
||||||
|
|
||||||
spanBU8 :: (Char -> Bool) -> BS -> (BS, BS)
|
|
||||||
spanBU8 f orig = B.splitAt (go 0 orig) orig
|
|
||||||
where
|
|
||||||
go :: Int -> BS -> Int
|
|
||||||
go n bs =
|
|
||||||
case BU8.decode bs of
|
|
||||||
Just (a, l) ->
|
|
||||||
if f a
|
|
||||||
then go (n + l) (B.drop l bs)
|
|
||||||
else n
|
|
||||||
Nothing -> n
|
|
||||||
|
|
||||||
groupByBU8 :: (Char -> Char -> Bool) -> BS -> [BS]
|
|
||||||
groupByBU8 f s =
|
|
||||||
case BU8.decode s of
|
|
||||||
Just (a, _) ->
|
|
||||||
let (g, s') = spanBU8 (f a) s
|
|
||||||
in g : groupByBU8 f s'
|
|
||||||
Nothing -> []
|
|
||||||
|
|
||||||
groupOnBU8 :: Eq a => (Char -> a) -> BS -> [BS]
|
|
||||||
groupOnBU8 f = groupByBU8 (\l r -> f l == f r)
|
|
||||||
|
|
||||||
tokenizeUnicode :: MonadFail m => BS -> m [Tok]
|
|
||||||
tokenizeUnicode = pure . map makeTokenBU8 . groupOnBU8 simpleUnicodeCategory
|
|
||||||
|
|
||||||
simpleUnicodeCategory :: Char -> Int
|
|
||||||
simpleUnicodeCategory c
|
|
||||||
| UC.isWhiteSpace c = 1
|
|
||||||
| UC.isAlphaNum c = 2
|
|
||||||
| UC.isSymbol c = 3
|
|
||||||
| UC.isPunctuation c = 4
|
|
||||||
| UC.isSeparator c = 5
|
|
||||||
| otherwise = 0
|
|
|
@ -1,5 +1,3 @@
|
||||||
--{-# LANGUAGE BangPatterns #-}
|
|
||||||
|
|
||||||
module Types where
|
module Types where
|
||||||
|
|
||||||
import Data.ByteString
|
import Data.ByteString
|
||||||
|
@ -7,7 +5,6 @@ import Data.Vector
|
||||||
|
|
||||||
type BS = ByteString
|
type BS = ByteString
|
||||||
|
|
||||||
{- TODO: all this needs to get unboxed -}
|
|
||||||
type Tok = (Bool, BS)
|
type Tok = (Bool, BS)
|
||||||
|
|
||||||
type TV = Vector Tok
|
type TV = Vector Tok
|
||||||
|
@ -24,9 +21,3 @@ data Op
|
||||||
| Original
|
| Original
|
||||||
| YourChanged
|
| YourChanged
|
||||||
deriving (Show, Eq)
|
deriving (Show, Eq)
|
||||||
|
|
||||||
data Origin
|
|
||||||
= Stable
|
|
||||||
| Mine
|
|
||||||
| Your
|
|
||||||
deriving (Show, Eq)
|
|
||||||
|
|
|
@ -7,7 +7,6 @@ import Options.Applicative
|
||||||
adiffVersion :: String
|
adiffVersion :: String
|
||||||
adiffVersion = VERSION_adiff
|
adiffVersion = VERSION_adiff
|
||||||
|
|
||||||
versionOption :: String -> Parser (a -> a)
|
|
||||||
versionOption prog =
|
versionOption prog =
|
||||||
infoOption
|
infoOption
|
||||||
(prog <>
|
(prog <>
|
||||||
|
|
Loading…
Reference in a new issue