|
| 1 | +--- |
| 2 | +id: Z Algorithm |
| 3 | +title: Z Algorithm |
| 4 | +sidebar_label: Z-Algorithm |
| 5 | +tags: |
| 6 | + - Intermediate |
| 7 | + - String Matching Algorithms |
| 8 | + - CPP |
| 9 | + - Python |
| 10 | + - Java |
| 11 | + - DSA |
| 12 | +description: "This is a solution for the string matching in linear time using Z algorithm." |
| 13 | +--- |
| 14 | + |
| 15 | +## 1. What is the Z Algorithm? |
| 16 | + |
| 17 | +This algorithm finds all occurrences of a pattern in a text in linear time. If the text has length n and the pattern has length m, the total time taken is `O(m + n)`, with linear space complexity. Both the time and space complexity match those of the KMP algorithm, but the Z algorithm is often considered simpler to understand.
| 18 | +In this algorithm, we construct a Z array.
| 19 | + |
| 20 | +## 2. What is Z Array? |
| 21 | + |
| 22 | +For a string `str[0..n-1]`, the Z array has the same length as the string. An element `Z[i]` of the Z array stores the length of the longest substring starting at `str[i]` that is also a prefix of `str[0..n-1]`. The first entry of the Z array is meaningless, because the complete string is always a prefix of itself.
| 23 | + |
| 24 | +#### Examples |
| 25 | +``` |
| 26 | +Index 0 1 2 3 4 5 6 7 8 9 10 11 |
| 27 | +Text a a b c a a b x a a a z |
| 28 | +Z values X 1 0 0 3 1 0 0 2 2 1 0 |
| 29 | +``` |
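The Z values in the table above can be reproduced by applying the definition directly. The following is a minimal quadratic-time sketch for checking small inputs by hand; the function name `naive_z` is chosen here for illustration and is not part of the implementations later in this document. It stores 0 in the meaningless first entry, where the table shows `X`.

```python
def naive_z(s):
    """Compute the Z array straight from its definition (quadratic time)."""
    n = len(s)
    z = [0] * n  # z[0] is left as 0 because the first entry is not used
    for i in range(1, n):
        # Extend the match between the prefix s[0..] and the substring s[i..]
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z


print(naive_z("aabcaabxaaaz"))  # [0, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0]
```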
| 30 | +## 3. How to construct Z array? |
| 31 | + |
| 32 | +A simple solution is to run two nested loops: the outer loop visits every index, and the inner loop finds the length of the longest prefix that matches the substring starting at the current index. The time complexity of this solution is $O(n^2)$.
| 33 | +However, we can construct the Z array in linear time.
| 34 | + |
| 35 | +``` |
| 36 | +The idea is to maintain an interval [L, R], the interval with the largest R
| 37 | +such that str[L..R] is a prefix substring (a substring that is also a prefix).
| 38 | +
| 39 | +Steps for maintaining this interval are as follows:
| 40 | +
| 41 | +1) If i > R, then no prefix substring starts before i and ends at or
| 42 | +   after i, so we reset L and R and compute a new [L,R] by comparing
| 43 | +   str[0..] to str[i..], which gives Z[i] (= R-L+1).
| 44 | +
| 45 | +2) If i <= R, then let K = i-L. Now Z[i] >= min(Z[K], R-i+1) because
| 46 | +   str[i..] matches str[K..] for at least R-i+1 characters (both lie in the
| 47 | +   [L,R] interval, which we know is a prefix substring).
| 48 | +   Now two sub-cases arise:
| 49 | +   a) If Z[K] < R-i+1, then the match starting at str[i] ends inside
| 50 | +      [L,R] and cannot be longer (otherwise Z[K] would be larger), so
| 51 | +      Z[i] = Z[K] and the interval [L,R] stays the same.
| 52 | +   b) If Z[K] >= R-i+1, then the match may extend beyond R, so we
| 53 | +      set L = i, continue matching past str[R] to get the new R,
| 54 | +      then update the interval [L,R] and calculate Z[i] (= R-L+1).
| 55 | +
| 56 | +``` |
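The steps above can be condensed into a short function. The sketch below is one possible formulation, not the implementation given later in this document: it keeps a half-open window `[l, r)` instead of the closed interval `[L, R]`, which lets cases 1, 2a and 2b share a single extension loop. The name `z_array` is chosen here for illustration.

```python
def z_array(s):
    """Linear-time Z array using a half-open window [l, r) that matches a prefix."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0  # s[l:r] equals s[0:r-l]; r is one past the last matched index
    for i in range(1, n):
        if i < r:
            # Inside the window: reuse the value computed for the mirrored index i - l,
            # capped at the number of characters left in the window
            z[i] = min(z[i - l], r - i)
        # Extend by direct comparison (stops immediately in sub-case 2a)
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Keep the window with the largest right end
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


print(z_array("aabcaabxaaaz"))  # [0, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0]
print(z_array("ababab"))        # [0, 0, 4, 0, 2, 0]
```

Capping the reused value at `r - i` is what keeps the comparison loop from re-reading characters that are already known to match.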
| 57 | + |
| 58 | + |
| 59 | + |
| 60 | + |
| 61 | +## 4. Problem Description |
| 62 | + |
| 63 | +Given a text string and a pattern string, implement the Z algorithm to find all occurrences of the pattern in the text. |
| 64 | + |
| 65 | +## 5. Examples |
| 66 | + |
| 67 | +**Example 1:** |
| 68 | +``` |
| 69 | +Input: text = "GEEKS FOR GEEKS", pattern = "GEEK" |
| 70 | +Output: Pattern found at index 0, Pattern found at index 10 |
| 71 | +``` |
| 72 | + |
| 73 | +**Example 2:** |
| 74 | +``` |
| 75 | +Input: text = "ABABDABACDABABCABAB", pattern = "ABAB" |
| 76 | +Output: Pattern found at index 0, Pattern found at index 10, Pattern found at index 15
| 77 | +``` |
| 78 | + |
| 79 | +**Explanation of Example 1:** |
| 80 | +- The pattern "GEEK" is found in the text "GEEKS FOR GEEKS" starting from index 0 and index 10. |
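Expected outputs like these can be cross-checked independently of the Z algorithm. The sketch below uses Python's built-in `str.find` as a reference; the helper name `occurrences` is an assumption made for this example. It restarts the search one position after each hit so that overlapping matches are also reported.

```python
def occurrences(text, pattern):
    """Reference search: list every start index of pattern in text."""
    found = []
    i = text.find(pattern)
    while i != -1:
        found.append(i)
        # Restart one position later so overlapping matches are not skipped
        i = text.find(pattern, i + 1)
    return found


print(occurrences("GEEKS FOR GEEKS", "GEEK"))      # [0, 10]
print(occurrences("ABABDABACDABABCABAB", "ABAB"))  # [0, 10, 15]
```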
| 81 | + |
| 82 | +## 6. Constraints |
| 83 | + |
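| 84 | +- The text and pattern can contain any number of characters.
| 85 | +- All characters are ASCII characters. The implementations below use `$` as a separator, so they assume that `$` does not occur in the text or the pattern.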
| 86 | + |
| 87 | +## 7. Implementation |
| 88 | + |
| 89 | +<Tabs> |
| 90 | + <TabItem value="Python" label="Python" default>
| 91 | + ```python |
| 92 | +def getZarr(string, z):
| 93 | +    n = len(string)
| 94 | +
| 95 | +    # [L,R] make a window which matches
| 96 | +    # with prefix of s
| 97 | +    l, r, k = 0, 0, 0
| 98 | +    for i in range(1, n):
| 99 | +
| 100 | +        # if i>R nothing matches so we will calculate
| 101 | +        # Z[i] using naive way.
| 102 | +        if i > r:
| 103 | +            l, r = i, i
| 104 | +
| 105 | +            # R-L = 0 in starting, so it will start
| 106 | +            # checking from 0'th index. For example,
| 107 | +            # for "ababab" and i = 1, the value of R
| 108 | +            # remains 0 and Z[i] becomes 0. For string
| 109 | +            # "aaaaaa" and i = 1, Z[i] and R become 5
| 110 | +            while r < n and string[r - l] == string[r]:
| 111 | +                r += 1
| 112 | +            z[i] = r - l
| 113 | +            r -= 1
| 114 | +        else:
| 115 | +
| 116 | +            # k = i-L so k corresponds to number which
| 117 | +            # matches in [L,R] interval.
| 118 | +            k = i - l
| 119 | +
| 120 | +            # if Z[k] is less than remaining interval
| 121 | +            # then Z[i] will be equal to Z[k].
| 122 | +            # For example, str = "ababab", i = 3, R = 5
| 123 | +            # and L = 2
| 124 | +            if z[k] < r - i + 1:
| 125 | +                z[i] = z[k]
| 126 | +
| 127 | +            # For example str = "aaaaaa" and i = 2,
| 128 | +            # R is 5, L is 0
| 129 | +            else:
| 130 | +
| 131 | +                # else start from R and check manually
| 132 | +                l = i
| 133 | +                while r < n and string[r - l] == string[r]:
| 134 | +                    r += 1
| 135 | +                z[i] = r - l
| 136 | +                r -= 1
| 137 | +
| 138 | +# prints all occurrences of pattern
| 139 | +# in text using Z algo
| 140 | +def search(text, pattern):
| 141 | +
| 142 | +    # Create concatenated string "P$T"
| 143 | +    concat = pattern + "$" + text
| 144 | +    l = len(concat)
| 145 | +
| 146 | +    # Construct Z array
| 147 | +    z = [0] * l
| 148 | +    getZarr(concat, z)
| 149 | +
| 150 | +    # now looping through Z array for matching condition
| 151 | +    for i in range(l):
| 152 | +
| 153 | +        # if Z[i] (matched region) is equal to pattern
| 154 | +        # length we got the pattern
| 155 | +        if z[i] == len(pattern):
| 156 | +            print("Pattern found at index",
| 157 | +                  i - len(pattern) - 1)
| 158 | +
| 159 | +
| 160 | +if __name__ == "__main__":
| 161 | +    text = "GEEKS FOR GEEKS"
| 162 | +    pattern = "GEEK"
| 163 | +    search(text, pattern)
| 164 | +
| 165 | +
| 166 | +``` |
| 167 | +</TabItem> |
| 168 | + |
| 169 | + <TabItem value="C++" label="C++"> |
| 170 | + ```cpp |
| 171 | +#include <iostream>
| 172 | +#include <string>
| 173 | +#include <vector>
| 174 | +using namespace std;
| 175 | +
| 176 | +void getZarr(const string& str, vector<int>& Z);
| 177 | +
| 178 | +// prints all occurrences of pattern in text using Z algo
| 179 | +void search(const string& text, const string& pattern)
| 180 | +{
| 181 | +    // Create concatenated string "P$T"
| 182 | +    string concat = pattern + "$" + text;
| 183 | +    int l = concat.length(), m = pattern.length();
| 184 | +
| 185 | +    // Construct Z array (a zero-filled vector; variable-length arrays are not standard C++)
| 186 | +    vector<int> Z(l, 0);
| 187 | +    getZarr(concat, Z);
| 188 | +
| 189 | +    // now looping through Z array for matching condition
| 190 | +    for (int i = 0; i < l; ++i)
| 191 | +    {
| 192 | +        // if Z[i] (matched region) is equal to pattern length we got the pattern
| 193 | +        if (Z[i] == m)
| 194 | +            cout << "Pattern found at index " << i - m - 1 << endl;
| 195 | +    }
| 196 | +}
| 197 | +
| 198 | +// Fills Z array for given string str
| 199 | +void getZarr(const string& str, vector<int>& Z)
| 200 | +{
| 201 | +    int n = str.length();
| 202 | +    int L, R, k;
| 203 | +
| 204 | +    // [L,R] make a window which matches with prefix of s
| 205 | +    L = R = 0;
| 206 | +    for (int i = 1; i < n; ++i)
| 207 | +    {
| 208 | +        // if i>R nothing matches so we will calculate
| 209 | +        // Z[i] using naive way.
| 210 | +        if (i > R)
| 211 | +        {
| 212 | +            L = R = i;
| 213 | +
| 214 | +            // R-L = 0 in starting, so it will start
| 215 | +            // checking from 0'th index. For example,
| 216 | +            // for "ababab" and i = 1, the value of R
| 217 | +            // remains 0 and Z[i] becomes 0. For string
| 218 | +            // "aaaaaa" and i = 1, Z[i] and R become 5
| 219 | +            while (R < n && str[R - L] == str[R])
| 220 | +                R++;
| 221 | +            Z[i] = R - L;
| 222 | +            R--;
| 223 | +        }
| 224 | +        else
| 225 | +        {
| 226 | +            // k = i-L so k corresponds to number which
| 227 | +            // matches in [L,R] interval.
| 228 | +            k = i - L;
| 229 | +
| 230 | +            // if Z[k] is less than remaining interval
| 231 | +            // then Z[i] will be equal to Z[k].
| 232 | +            // For example, str = "ababab", i = 3, R = 5
| 233 | +            // and L = 2
| 234 | +            if (Z[k] < R - i + 1)
| 235 | +                Z[i] = Z[k];
| 236 | +
| 237 | +            // For example str = "aaaaaa" and i = 2, R is 5,
| 238 | +            // L is 0
| 239 | +            else
| 240 | +            {
| 241 | +                // else start from R and check manually
| 242 | +                L = i;
| 243 | +                while (R < n && str[R - L] == str[R])
| 244 | +                    R++;
| 245 | +                Z[i] = R - L;
| 246 | +                R--;
| 247 | +            }
| 248 | +        }
| 249 | +    }
| 250 | +}
| 251 | +
| 252 | +// Driver program
| 253 | +int main()
| 254 | +{
| 255 | +    string text = "GEEKS FOR GEEKS";
| 256 | +    string pattern = "GEEK";
| 257 | +    search(text, pattern);
| 258 | +    return 0;
| 259 | +}
| 260 | +
| 261 | + ``` |
| 262 | + </TabItem> |
| 263 | +
|
| 264 | + <TabItem value="Java" label="Java"> |
| 265 | + ```java |
| 266 | +
| 267 | +class GFG {
| 268 | +
| 269 | +    // prints all occurrences of pattern in text using
| 270 | +    // Z algo
| 271 | +    public static void search(String text, String pattern)
| 272 | +    {
| 273 | +
| 274 | +        // Create concatenated string "P$T"
| 275 | +        String concat = pattern + "$" + text;
| 276 | +
| 277 | +        int l = concat.length();
| 278 | +
| 279 | +        int[] Z = new int[l];
| 280 | +
| 281 | +        // Construct Z array
| 282 | +        getZarr(concat, Z);
| 283 | +
| 284 | +        // now looping through Z array for matching condition
| 285 | +        for (int i = 0; i < l; ++i) {
| 286 | +
| 287 | +            // if Z[i] (matched region) is equal to pattern
| 288 | +            // length we got the pattern
| 289 | +
| 290 | +            if (Z[i] == pattern.length()) {
| 291 | +                System.out.println("Pattern found at index "
| 292 | +                                   + (i - pattern.length() - 1));
| 293 | +            }
| 294 | +        }
| 295 | +    }
| 296 | +
| 297 | +    // Fills Z array for given string str
| 298 | +    private static void getZarr(String str, int[] Z) {
| 299 | +
| 300 | +        int n = str.length();
| 301 | +
| 302 | +        // [L,R] make a window which matches with
| 303 | +        // prefix of s
| 304 | +        int L = 0, R = 0;
| 305 | +
| 306 | +        for (int i = 1; i < n; ++i) {
| 307 | +
| 308 | +            // if i>R nothing matches so we will calculate
| 309 | +            // Z[i] using naive way.
| 310 | +            if (i > R) {
| 311 | +
| 312 | +                L = R = i;
| 313 | +
| 314 | +                // R-L = 0 in starting, so it will start
| 315 | +                // checking from 0'th index. For example,
| 316 | +                // for "ababab" and i = 1, the value of R
| 317 | +                // remains 0 and Z[i] becomes 0. For string
| 318 | +                // "aaaaaa" and i = 1, Z[i] and R become 5
| 319 | +
| 320 | +                while (R < n && str.charAt(R - L) == str.charAt(R))
| 321 | +                    R++;
| 322 | +
| 323 | +                Z[i] = R - L;
| 324 | +                R--;
| 325 | +
| 326 | +            }
| 327 | +            else {
| 328 | +
| 329 | +                // k = i-L so k corresponds to number which
| 330 | +                // matches in [L,R] interval.
| 331 | +                int k = i - L;
| 332 | +
| 333 | +                // if Z[k] is less than remaining interval
| 334 | +                // then Z[i] will be equal to Z[k].
| 335 | +                // For example, str = "ababab", i = 3, R = 5
| 336 | +                // and L = 2
| 337 | +                if (Z[k] < R - i + 1)
| 338 | +                    Z[i] = Z[k];
| 339 | +
| 340 | +                // For example str = "aaaaaa" and i = 2, R is 5,
| 341 | +                // L is 0
| 342 | +                else {
| 343 | +
| 344 | +
| 345 | +                    // else start from R and check manually
| 346 | +                    L = i;
| 347 | +                    while (R < n && str.charAt(R - L) == str.charAt(R))
| 348 | +                        R++;
| 349 | +
| 350 | +                    Z[i] = R - L;
| 351 | +                    R--;
| 352 | +                }
| 353 | +            }
| 354 | +        }
| 355 | +    }
| 356 | +
| 357 | +    // Driver program
| 358 | +    public static void main(String[] args)
| 359 | +    {
| 360 | +        String text = "GEEKS FOR GEEKS";
| 361 | +        String pattern = "GEEK";
| 362 | +
| 363 | +        search(text, pattern);
| 364 | +    }
| 365 | +}
| 366 | +
| 367 | + ``` |
| 368 | + </TabItem> |
| 369 | +</Tabs> |
| 370 | + |
| 371 | +## 8. Complexity Analysis |
| 372 | + |
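| 373 | +- **Time Complexity**:
| 374 | +  - Best, Average, and Worst Case: $O(N + M)$, where $N$ is the length of the text and $M$ is the length of the pattern
| 375 | +
| 376 | +- **Space Complexity**: $O(N + M)$ for the concatenated string and its Z array.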
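The linear bound follows from counting character comparisons: every successful comparison moves the right end of the matching window forward, which can happen at most once per character, and every index causes at most one failed comparison. The sketch below instruments a Z array construction to illustrate this. The function name `z_with_count` and the half-open window `[l, r)` are choices made for this example and differ from the implementations in section 7.

```python
def z_with_count(s):
    """Return (Z array, number of character comparisons performed)."""
    n = len(s)
    z = [0] * n
    l = r = 0  # s[l:r] is the rightmost window known to match a prefix
    comparisons = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(z[i - l], r - i)
        while i + z[i] < n:
            comparisons += 1
            if s[z[i]] != s[i + z[i]]:
                break  # at most one failed comparison per index i
            z[i] += 1  # a successful comparison extends the match
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z, comparisons


for sample in ["a" * 200, "ab" * 100, "aabcaabxaaaz"]:
    _, count = z_with_count(sample)
    print(len(sample), count, count <= 2 * len(sample))
```

For each sample the comparison count stays below twice the string length, including the all-`a` string that makes the nested-loop approach quadratic.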
| 377 | + |
| 378 | +## 9. Advantages and Disadvantages |
| 379 | + |
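| 380 | +**Advantages:**
| 381 | +- Guaranteed $O(N + M)$ running time even in the worst case. No hashing is involved, so there are no collisions to handle.
| 382 | +- A single pass over the Z array reports every occurrence of the pattern, including overlapping ones.
| 383 | +
| 384 | +**Disadvantages:**
| 385 | +- Builds the concatenated string and a Z array over it, so it uses $O(N + M)$ extra space, whereas KMP needs only $O(M)$ for its failure table.
| 386 | +- Relies on a separator character (`$` in the code above) that must not occur in the pattern or the text.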
| 389 | + |
| 390 | +## 10. References |
| 391 | + |
| 392 | +- **GFG Problem:** [GFG Problem](https://www.geeksforgeeks.org/z-algorithm-linear-time-pattern-searching-algorithm/) |