---
id: Z Algorithm
title: Z Algorithm
sidebar_label: Z-Algorithm
tags:
  - Intermediate
  - String Matching Algorithms
  - CPP
  - Python
  - Java
  - DSA
description: "This is a solution for string matching in linear time using the Z algorithm."
---

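## 1. What is the Z Algorithm?

The Z algorithm finds all occurrences of a pattern in a text in linear time. If the text has length n and the pattern has length m, the total running time is `O(m + n)` and the extra space is also linear. These bounds match those of the KMP algorithm, but the Z algorithm is often considered simpler to understand.

In this algorithm, we construct a Z array (defined below) for the concatenated string `pattern + "$" + text`, where the separator `$` occurs in neither the pattern nor the text. Every position of this string whose Z value equals m marks an occurrence of the pattern in the text.
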
## 2. What is the Z Array?

For a string `str[0..n-1]`, the Z array has the same length as the string. Element `Z[i]` stores the length of the longest substring starting at `str[i]` that is also a prefix of `str[0..n-1]`. The first entry is meaningless because the complete string is always a prefix of itself, so it is marked `X` below.

#### Examples
```
Index     0   1   2   3   4   5   6   7   8   9   10  11
Text      a   a   b   c   a   a   b   x   a   a   a   z
Z values  X   1   0   0   3   1   0   0   2   2   1   0
```

## 3. How to construct the Z array?

A simple solution is to run two nested loops: the outer loop visits every index, and the inner loop finds the length of the longest prefix that matches the substring starting at the current index. The time complexity of this solution is $O(n^2)$.
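A minimal Python sketch of this nested-loop approach follows. The helper name `naive_z` is hypothetical (it is not part of the implementations later in this document), and it stores 0 as a placeholder for the meaningless `Z[0]`. Run on the example string, it reproduces the Z values from the table above.

```python
def naive_z(s):
    """Compute the Z array by direct comparison, O(n^2) in the worst case.

    z[0] is set to 0 as a placeholder, since the whole string is
    trivially a prefix of itself.
    """
    n = len(s)
    z = [0] * n
    for i in range(1, n):  # outer loop: every starting index
        length = 0
        # inner loop: extend the match against the prefix one character at a time
        while i + length < n and s[length] == s[i + length]:
            length += 1
        z[i] = length
    return z


print(naive_z("aabcaabxaaaz"))  # [0, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0]
```

The inner loop can restart from scratch at every index, which is where the quadratic cost comes from, for example on a string of identical characters.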

However, we can construct the Z array in linear time by reusing values that have already been computed.

```
The idea is to maintain an interval [L, R] with the maximum R such that
[L, R] is a prefix substring (a substring that is also a prefix of str).

Steps for maintaining this interval are as follows:

1) If i > R, no prefix substring found so far starts before i and ends
   at or after i, so we reset L and R and compute a new [L, R] by
   comparing str[0..] with str[i..]. This gives Z[i] (= R - L + 1).

2) If i <= R, let K = i - L. Then Z[i] >= min(Z[K], R - i + 1), because
   str[i..] matches str[K..] for at least R - i + 1 characters (both lie
   inside the [L, R] interval, which we know is a prefix substring).
   Two sub-cases arise:
   a) If Z[K] < R - i + 1, no longer prefix substring starts at str[i]
      (otherwise Z[K] would be larger), so Z[i] = Z[K] and the interval
      [L, R] stays the same.
   b) If Z[K] >= R - i + 1, it may be possible to extend the interval,
      so we set L = i, continue matching from str[R] onwards to find the
      new R, update [L, R], and compute Z[i] (= R - L + 1).
```

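To see which step handles each index, here is a sketch with a hypothetical helper `z_with_trace`. It follows steps 1, 2a and 2b above with the same inclusive [L, R] window and records, for every index, which case computed its Z value. It is an illustration, not one of the implementations in this document.

```python
def z_with_trace(s):
    """Linear-time Z array that also records which case of the
    interval steps handled each index ("1", "2a" or "2b")."""
    n = len(s)
    z = [0] * n
    cases = [None] * n  # cases[0] stays None: Z[0] is not computed
    L, R = 0, 0  # inclusive window [L, R] that matches a prefix of s
    for i in range(1, n):
        if i > R:
            cases[i] = "1"  # nothing known about i: compare from scratch
            L, R = i, i
            while R < n and s[R - L] == s[R]:
                R += 1
            z[i] = R - L
            R -= 1
        else:
            K = i - L
            if z[K] < R - i + 1:
                cases[i] = "2a"  # copy Z[K]; the interval is unchanged
                z[i] = z[K]
            else:
                cases[i] = "2b"  # the match may extend beyond R
                L = i
                while R < n and s[R - L] == s[R]:
                    R += 1
                z[i] = R - L
                R -= 1
    return z, cases


z, cases = z_with_trace("aabcaabxaaaz")
for i in range(1, len(z)):
    print(i, cases[i], z[i])
```

For the example string `aabcaabxaaaz`, indices 5 and 6 are handled by case 2a, indices 9 and 10 by case 2b, and every other index by case 1.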
## 4. Problem Description

Given a text string and a pattern string, implement the Z algorithm to find all occurrences of the pattern in the text.

## 5. Examples

**Example 1:**
```
Input: text = "GEEKS FOR GEEKS", pattern = "GEEK"
Output: Pattern found at index 0, Pattern found at index 10
```

**Example 2:**
```
Input: text = "ABABDABACDABABCABAB", pattern = "ABAB"
Output: Pattern found at index 0, Pattern found at index 10, Pattern found at index 15
```

**Explanation of Example 1:**
- The pattern "GEEK" occurs in the text "GEEKS FOR GEEKS" starting at index 0 and at index 10.

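A naive scan gives an independent reference answer for both examples. The helper `brute_force_search` below is hypothetical and only meant for checking results: it tests every alignment directly in $O(N \cdot M)$ time and also reports overlapping occurrences.

```python
def brute_force_search(text, pattern):
    """Return every index where pattern occurs in text by checking each
    alignment directly (O(n * m)); intended only as a reference for tests."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


print(brute_force_search("GEEKS FOR GEEKS", "GEEK"))      # [0, 10]
print(brute_force_search("ABABDABACDABABCABAB", "ABAB"))  # [0, 10, 15]
```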
## 6. Constraints

- The text and pattern can be of any length.
- All characters are ASCII characters.
- The separator character `$` used by the implementations must not occur in either the text or the pattern.

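If the text may itself contain `$`, one alternative is to drop the separator. This is a sketch of a commonly used variant, not the method used in this document's implementations. Build the Z array of `pattern + text` and report a match at every position `i >= m` where `Z[i] >= m`. The restriction `i >= m` keeps each match entirely inside the text, and `>=` is needed because Z values are no longer capped at `m`. The names `z_array` and `search_no_separator` are hypothetical. `z_array` also uses the common half-open window `[l, r)` formulation, not the inclusive `[L, R]` one used elsewhere in this document.

```python
def z_array(s):
    """Linear-time Z array using a half-open window [l, r);
    z[0] is left as 0 as a placeholder."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])  # reuse the value mirrored inside the window
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1  # extend the match explicitly
        if i + z[i] > r:
            l, r = i, i + z[i]  # the window now reaches further right
    return z


def search_no_separator(text, pattern):
    """Return match indices in text without a separator: compute Z on
    pattern + text and accept positions i >= m with z[i] >= m."""
    m = len(pattern)
    if m == 0:
        return []
    z = z_array(pattern + text)
    return [i - m for i in range(m, m + len(text)) if z[i] >= m]


print(search_no_separator("ABABDABACDABABCABAB", "ABAB"))  # [0, 10, 15]
print(search_no_separator("a$b$a$", "$a"))                 # [3]
```

The second call is a case where the `$`-separated version misses the occurrence at index 3, because the pattern itself contains the separator.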
## 7. Implementation

<Tabs>
<TabItem value="Pyhon" label="Python" default>
```python
def getZarr(string, z):
    """Fill z so that z[i] is the length of the longest substring
    starting at string[i] that is also a prefix of string."""
    n = len(string)

    # [l, r] is the window with the largest r found so far
    # that matches a prefix of string
    l, r = 0, 0
    for i in range(1, n):

        # Case 1: i > r, nothing is known about position i,
        # so compute z[i] naively.
        if i > r:
            l, r = i, i

            # r - l = 0 at the start, so matching begins at index 0.
            # For "ababab" and i = 1, r ends up back at 0 and z[i] = 0.
            # For "aaaaaa" and i = 1, both z[i] and r become 5.
            while r < n and string[r - l] == string[r]:
                r += 1
            z[i] = r - l
            r -= 1
        else:

            # Case 2: i <= r, so k = i - l is the position that
            # matches i inside the [l, r] window.
            k = i - l

            # Case 2a: z[k] fits inside the remaining window, so
            # z[i] equals z[k]. Example: "ababab", i = 3, r = 5, l = 2.
            if z[k] < r - i + 1:
                z[i] = z[k]

            # Case 2b: the match may extend past r, so continue
            # matching from r. Example: "aaaaaa", i = 2, r = 5, l = 1.
            else:
                l = i
                while r < n and string[r - l] == string[r]:
                    r += 1
                z[i] = r - l
                r -= 1


# Prints all occurrences of pattern in text using the Z algorithm
def search(text, pattern):

    # Create the concatenated string "P$T"; assumes "$" occurs
    # in neither the pattern nor the text
    concat = pattern + "$" + text
    l = len(concat)

    # Construct the Z array (z[0] stays 0)
    z = [0] * l
    getZarr(concat, z)

    # A Z value equal to the pattern length marks a match
    for i in range(l):
        if z[i] == len(pattern):
            print("Pattern found at index", i - len(pattern) - 1)


if __name__ == "__main__":
    text = "GEEKS FOR GEEKS"
    pattern = "GEEK"
    search(text, pattern)
```
</TabItem>

<TabItem value="C++" label="C++">
```cpp
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void getZarr(const string& str, vector<int>& Z);

// Prints all occurrences of pattern in text using the Z algorithm
void search(const string& text, const string& pattern)
{
    // Create the concatenated string "P$T"; assumes '$' occurs
    // in neither the pattern nor the text
    string concat = pattern + "$" + text;
    int l = concat.length();

    // Construct the Z array; zero-initialized so Z[0] is never garbage
    // (a variable-length array like int Z[l] is not standard C++)
    vector<int> Z(l, 0);
    getZarr(concat, Z);

    // A Z value equal to the pattern length marks a match
    int m = pattern.length();
    for (int i = 0; i < l; ++i)
    {
        if (Z[i] == m)
            cout << "Pattern found at index "
                 << i - m - 1 << endl;
    }
}

// Fills the Z array for the given string str
void getZarr(const string& str, vector<int>& Z)
{
    int n = str.length();
    int L, R, k;

    // [L, R] is the window with the largest R that matches a prefix of str
    L = R = 0;
    for (int i = 1; i < n; ++i)
    {
        // Case 1: i > R, nothing is known about position i,
        // so compute Z[i] naively.
        if (i > R)
        {
            L = R = i;

            // R - L = 0 at the start, so matching begins at index 0.
            // For "ababab" and i = 1, R ends up back at 0 and Z[i] = 0.
            // For "aaaaaa" and i = 1, both Z[i] and R become 5.
            while (R < n && str[R - L] == str[R])
                R++;
            Z[i] = R - L;
            R--;
        }
        else
        {
            // Case 2: k = i - L is the position that matches i
            // inside the [L, R] window.
            k = i - L;

            // Case 2a: Z[k] fits inside the remaining window, so
            // Z[i] equals Z[k]. Example: "ababab", i = 3, R = 5, L = 2.
            if (Z[k] < R - i + 1)
                Z[i] = Z[k];

            // Case 2b: the match may extend past R, so continue
            // matching from R. Example: "aaaaaa", i = 2, R = 5, L = 1.
            else
            {
                L = i;
                while (R < n && str[R - L] == str[R])
                    R++;
                Z[i] = R - L;
                R--;
            }
        }
    }
}

// Driver program
int main()
{
    string text = "GEEKS FOR GEEKS";
    string pattern = "GEEK";
    search(text, pattern);
    return 0;
}
```
</TabItem>

<TabItem value="Java" label="Java">
```java
class GFG {

    // Prints all occurrences of pattern in text using the Z algorithm
    public static void search(String text, String pattern)
    {
        // Create the concatenated string "P$T"; assumes '$' occurs
        // in neither the pattern nor the text
        String concat = pattern + "$" + text;
        int l = concat.length();

        // Construct the Z array (Java zero-initializes int arrays)
        int[] Z = new int[l];
        getZarr(concat, Z);

        // A Z value equal to the pattern length marks a match
        for (int i = 0; i < l; ++i) {
            if (Z[i] == pattern.length()) {
                System.out.println("Pattern found at index "
                                   + (i - pattern.length() - 1));
            }
        }
    }

    // Fills the Z array for the given string str
    private static void getZarr(String str, int[] Z)
    {
        int n = str.length();

        // [L, R] is the window with the largest R that matches a prefix of str
        int L = 0, R = 0;

        for (int i = 1; i < n; ++i) {

            // Case 1: i > R, nothing is known about position i,
            // so compute Z[i] naively.
            if (i > R) {
                L = R = i;

                // R - L = 0 at the start, so matching begins at index 0.
                // For "ababab" and i = 1, R ends up back at 0 and Z[i] = 0.
                // For "aaaaaa" and i = 1, both Z[i] and R become 5.
                while (R < n && str.charAt(R - L) == str.charAt(R))
                    R++;
                Z[i] = R - L;
                R--;
            }
            else {
                // Case 2: k = i - L is the position that matches i
                // inside the [L, R] window.
                int k = i - L;

                // Case 2a: Z[k] fits inside the remaining window, so
                // Z[i] equals Z[k]. Example: "ababab", i = 3, R = 5, L = 2.
                if (Z[k] < R - i + 1)
                    Z[i] = Z[k];

                // Case 2b: the match may extend past R, so continue
                // matching from R. Example: "aaaaaa", i = 2, R = 5, L = 1.
                else {
                    L = i;
                    while (R < n && str.charAt(R - L) == str.charAt(R))
                        R++;
                    Z[i] = R - L;
                    R--;
                }
            }
        }
    }

    // Driver program
    public static void main(String[] args)
    {
        String text = "GEEKS FOR GEEKS";
        String pattern = "GEEK";
        search(text, pattern);
    }
}
```
</TabItem>
</Tabs>

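## 8. Complexity Analysis

- **Time Complexity**:
  - Best, Average, and Worst Case: $O(N + M)$, where $N$ is the length of the text and $M$ is the length of the pattern.

- **Space Complexity**: $O(N + M)$ for the concatenated string `pattern + "$" + text` and its Z array.
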
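The linear bound can also be checked empirically. The sketch below uses a hypothetical helper `z_with_count` that restructures the [L, R] construction used in the implementations so that it can count every character comparison. Each iteration makes at most one failing comparison. The end of the matched window never moves left, so successful comparisons add up to at most $n$, plus at most one repeated comparison at the old `R` per case 2b step. This keeps the total below $3n$.

```python
import random


def z_with_count(s):
    """Build the Z array with an inclusive [L, R] window and count
    every character comparison made while extending matches."""
    n = len(s)
    z = [0] * n
    L = R = 0
    comparisons = 0
    for i in range(1, n):
        if i > R:                      # case 1: start a fresh window at i
            L = R = i
        elif z[i - L] < R - i + 1:     # case 2a: copy the mirrored value, no comparisons
            z[i] = z[i - L]
            continue
        else:                          # case 2b: keep R, move L to i
            L = i
        while R < n:                   # extend the match one character at a time
            comparisons += 1
            if s[R - L] != s[R]:
                break
            R += 1
        z[i] = R - L
        R -= 1
    return z, comparisons


for s in ["a" * 1000, "ab" * 500, "".join(random.choice("ab") for _ in range(1000))]:
    z, c = z_with_count(s)
    print(len(s), c, c <= 3 * len(s))
```

On a run of 1000 identical characters, for example, the helper makes 1997 comparisons: 999 at `i = 1` and one for each later index.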
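## 9. Advantages and Disadvantages

**Advantages:**
- Linear $O(N + M)$ running time in the worst case, not just on average.
- No hashing is involved, so there are no collisions and no false matches to re-check.
- The Z array is simple to compute and is often considered easier to understand than KMP's failure function.

**Disadvantages:**
- Needs $O(N + M)$ extra space for the concatenated string and its Z array.
- The concatenation approach needs a separator character that occurs in neither the pattern nor the text.
- Each pattern requires its own pass over the text, so searching for many patterns means repeating the work for each one.
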
## 10. References

- **GFG Problem:** [GFG Problem](https://www.geeksforgeeks.org/z-algorithm-linear-time-pattern-searching-algorithm/)
