I'd not to be brainfucked and I write first brute force version. No surprisingly, Time Limit Exceeded.
foreach i in haystack
foreach j in needle
if haystack[i + j] != needle[j]
i = i + 1
retry
found
simple and easy but slow.
I only heard the KMP method, but I have no idea about how it goes. now, I will show how do I understand KMP step by step.
lets say
0 1 2 3 4 5 6
A B C D E F G
A B C E
when
i = 0
j = 3
if haystack[i + j] != needle[j]
i = i + 1
retry
i
jump from 0
to 1
. but wait, why not jump to 3 (i + j)
? comparing 1
and 2
are obviously needless.
e.g.
0 1 2 3 4 5 6
A B A B A C G
A B A C
A B A C
A B A C
i = 0
j = 3
retry from i = 3
? Wrong. it seems i
should jump to 2
instead of 3
0 1 2 3
A B A C
A B A C
i = 0
j = 3
mismatch at 3
indicates that haystack[3 - 1] == needle[3 - 1]
.
In additional, needle[3 - 1] == needle[0]
, so i
jumps to 3
and then goes back 1
char to 2
, in the formula: 0 + 3 - 1 (i + j - 1)
more complex example
0 1 2 3 4 5
A B C A B C
A B C A B C
i = 0
j = 5
mismatch at 5
indicates that haystack[5 - 1] == needle[5 - 1]
.
In additional, needle[5 - 2] == needle[0]
and needle[5 - 1] == needle[1]
, so i
jumps to 5 (i + j)
and then goes back 2
chars to 3
, in the formula: 0 + 5 - 2 (i + j - 2)
From above, needle[i]
should go back P
chars
such that
needle[i - P] == needle[0] and needle[i - (P - 1)] == needle[1] and ...
for n = 0 .. P - 1
needle[i - P + n] == needle[0 + n]
the P
here DO have name. In wikipedia or some other places, it was known as needle's longest proper suffix which is a proper prefix. I do not understand why they like brainfucking ppl :(.
now we can write some easy code to find all P
of needle[i]
to P[i]
, P[i]
here means how many chars should needle[i] go back.
The first brute force P
finding version. Because I do not know how to calculate the P
, I search from max possible value to zero. I hope codes below would help you understand what the P
is
here S.substring(i, j) == S[i] + S[i + 1] + ... S[j - 1]
for i in needle
Possible_P = i - 1
while Possible_P > 0 and needle.substring(0, Possible_P) != needle.substring(i - Possible_P, i)
Possible_P = Possible_P - 1
P[i] = Possible_P
as you can see
needle.substring(0, Possible_P)
is the prefix and needle.substring(i - Possible_P, i)
is the suffix
for i in needle
Possible_P = i - 1
while Possible_P > 0 and needle.substring(0, Possible_P) != needle.substring(i - Possible_P, i)
Possible_P = Possible_P - 1
P[i] = Possible_P
foreach i in haystack
foreach j in needle
if haystack[i + j] != needle[j]
i = i + j - P[i]
i = MAX(i, 1) // at least jump 1 char, think j = 0, infinite looping
retry
found
Time limited again, last P finding method is silly and takes a long time to run.
Each loop in needle only add 1
char, so the maximum value of P[i]
is P[i - 1] + 1
for i in needle
Possible_P = P[i - 1] + 1
while Possible_P > 0 and needle.substring(0, Possible_P) != needle.substring(i - Possible_P, i)
Possible_P = Possible_P - 1
P[i] = Possible_P
and think can the case like P[i - 1] = 10 and P[i] = 6
happens?
No, becasue P[i]
depends on P[i - 1]
.
P[i - 1] = 10
means that in the string of needle[0 .. i - 1 - 1]
, that the fist 10 chars(prefix) of needle[0 .. i - 1 - 1] = last 10 chars(suffix) of needle[0 .. i - 1 - 1]
.
P[i] = 6
means that in the string of needle[0 .. i - 1]
, that the fist 6 chars(prefix) of needle[0 .. i - 1] = last 6 chars(suffix) of needle[0 .. i - 1]
.
So, P[i]
should be P[i - 1] + 1
or 0
for i in needle
Possible_P = P[i - 1] + 1
if needle.substring(0, Possible_P) != needle.substring(i - Possible_P, i)
Possible_P = 0
P[i] = Possible_P
Now, substring is also needless. P[i - 1]
indicates that needle.substring(0, P[i - 1 - 1]) == needle.substring(i - P[i - 1 - 1], i)
.
if any new char(needle[i - 1]
) comes, checking needle[0 + P[i - 1]] == needle[i - 1]
is enough.
e.g.
0 1 2 3 4 5
A B C A B C
A B C A B C
^ ^
m n
when i = 5
P[5 - 1] = 1 (needle[0] == needle[3])
if needle[0 + 1] == needle[5 - 1]
P[5] = P[4] + 1
here we can see m = 0 + P[i - 1]
and n = i - 1
for i in needle
Possible_P = P[i - 1] + 1
if needle[0 + P[i - 1]] != needle[i - 1]
Possible_P = 0
P[i] = Possible_P