May 2016

Manacher algorithm for finding the longest palindromic substring

0. Longest palindromic substring

A palindromic substring is a substring that reads the same forwards and backwards. Assuming the string has exactly one longest palindromic substring, the task is to find it. Comparing the string directly against its reversed copy is slow, and the cost grows quickly as the string gets longer. Solving the problem in linear time saves a considerable amount of work, and the Manacher algorithm is one method that does so.
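The text does not spell out the naive approach. The sketch below is one straightforward reading of "compare with the reversed string": test every substring against its own reversal, which takes roughly O(n^3) time. The function name is hypothetical.

```python
def longest_palindrome_bruteforce(s: str) -> str:
    """Check every substring against its reversal (roughly O(n^3) time)."""
    best = ""
    n = len(s)
    for start in range(n):
        for end in range(start + 1, n + 1):
            candidate = s[start:end]
            # A palindrome is equal to its own reversal.
            if len(candidate) > len(best) and candidate == candidate[::-1]:
                best = candidate
    return best


print(longest_palindrome_bruteforce("babcbabcbaccba"))  # prints abcbabcba
```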

1. Manacher algorithm

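This algorithm uses the symmetry of palindromes, together with information already computed, to reduce the number of character comparisons. That information consists of an array P, where P[i] is the radius of the palindrome centered at i (how many characters it extends on each side), and a variable max holding the center of the longest palindrome found so far, so that max+P[max] (center + radius) is the index of that palindrome's rightmost character. Strictly speaking, the property the steps below rely on is that this palindrome reaches furthest to the right among those found so far; standard implementations track the furthest-reaching center, and that is what keeps the running time linear.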

index     0   1   2   3   4   5   6   7   8   9   10  11  12  13
str       b   a   b   c   b   a   b   c   b   a   c   c   b   a
P[index]  0   1   0   3   0   4   0   2   0   0   0   0   0   0
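As a check on the table, P can be recomputed with plain expansion around each center, which is the slow method the algorithm improves on. The helper name is hypothetical.

```python
def palindrome_radii_naive(s: str) -> list[int]:
    """P[i] = how far the odd-length palindrome centered at i extends on each
    side, found by plain expansion (O(n^2) in the worst case)."""
    radii = []
    for center in range(len(s)):
        r = 0
        # Grow the radius while the characters just outside still match.
        while (center - r - 1 >= 0 and center + r + 1 < len(s)
               and s[center - r - 1] == s[center + r + 1]):
            r += 1
        radii.append(r)
    return radii


print(palindrome_radii_naive("babcbabcbaccba"))
# prints [0, 1, 0, 3, 0, 4, 0, 2, 0, 0, 0, 0, 0, 0]
```

Because it only looks for odd-length palindromes, the even palindrome "cc" at indices 10 and 11 leaves P[10] and P[11] at 0.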

How to use this array

For example, suppose we now need to calculate P[5], and we already know max, the center of the longest palindrome found so far (in this example it is also the one reaching furthest right). This palindrome spans str[max-P[max]] to str[max+P[max]]. In the example above, max=3, so str[3-P[3]] to str[3+P[3]], that is str[0] to str[6], is "babcbab". At this point:
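The quantities in this paragraph can be checked directly on the example string. Only P[0..4] are assumed known; the variable center stands in for max from the text, renamed to avoid shadowing Python's built-in max.

```python
s = "babcbabcbaccba"
P = [0, 1, 0, 3, 0]   # radii already computed before P[5] (from the table)
center = 3            # plays the role of "max" in the text
right = center + P[center]               # index of the rightmost covered character
span = s[center - P[center]:right + 1]   # +1 because Python slices exclude the end
print(right, span)    # prints: 6 babcbab
```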

  1. If index lies inside this palindrome, that is, before its right end max+P[max] (here max+P[max]=3+3=6 and 5<6), then index has a mirror image about max: index' = 2*max-index = 1. By symmetry, what is known about index' can be reused. The palindrome centered at index' has radius P[index']=1 ("bab"), so the palindrome centered at index=5 also has radius at least 1, covering str[index-1] to str[index+1]. When expanding around index=5, the comparisons up to position min(index+P[index'], max+P[max]) = min(5+1, 6) = 6 can therefore be skipped, and matching continues from that position plus one, i.e. by comparing str[7] with str[3]. The minimum is taken because the mirror argument only reaches as far as the right end max+P[max]: the part from index up to that boundary is guaranteed to be symmetric about index, but nothing is known yet about the characters beyond it, so even if P[index'] suggests a longer palindrome, those characters must still be compared.
  2. If index>=max+P[max], index is at or beyond the right end of the known palindrome, so no known information applies and the palindrome centered at index has to be found by ordinary character-by-character matching.
  3. Another important problem is that a palindrome of even length has no middle character, so the process above cannot find it (in the example, "cc" at indices 10 and 11 is never detected). The solution is to insert a special separator symbol, one that does not occur in the original string, on both sides of every character. A string of length n then becomes one of length 2n+1, which is always odd, and every palindrome in it, including those coming from even-length palindromes, has a center character.
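Putting the three points together gives the minimal Python sketch below. Assumptions: the character "\0" never occurs in the input (point 3 only requires some unused symbol), and, as in standard implementations, the code tracks the center whose palindrome reaches furthest right. It also relies on the fact that, after the separators are inserted, the radius of a palindrome in the transformed string equals the length of the corresponding palindrome in the original. Function and variable names are hypothetical.

```python
def longest_palindrome_manacher(s: str) -> str:
    """Longest palindromic substring in O(n) time (Manacher's algorithm)."""
    if not s:
        return ""
    # Point 3: put a separator on both sides of every character so that all
    # palindromes have odd length. "\0" is assumed not to occur in s.
    sep = "\0"
    t = sep + sep.join(s) + sep          # length 2n + 1
    n = len(t)
    P = [0] * n                          # P[i] = radius of the palindrome centered at i in t
    center = right = 0                   # "max" and "max + P[max]" from the text
    for i in range(n):
        if i < right:
            # Point 1: reuse the mirror position, capped at the known right end.
            mirror = 2 * center - i
            P[i] = min(P[mirror], right - i)
        # Point 2 (i >= right) starts from P[i] = 0; in both cases, keep
        # comparing characters beyond what is already known.
        while (i - P[i] - 1 >= 0 and i + P[i] + 1 < n
               and t[i - P[i] - 1] == t[i + P[i] + 1]):
            P[i] += 1
        # Track the palindrome that reaches furthest right.
        if i + P[i] > right:
            center, right = i, i + P[i]
    # The radius in t equals the palindrome's length in s; the left end
    # t[best - P[best]] is a separator, so halving it gives the start in s.
    best = max(range(n), key=lambda i: P[i])
    start = (best - P[best]) // 2
    return s[start:start + P[best]]


print(longest_palindrome_manacher("babcbabcbaccba"))  # prints abcbabcba
```

Each character of t is passed by right at most once, and every successful comparison pushes right further, which is why the total work stays linear.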