
Commit 0c7d3e5

Merge branch 'CodeHarborHub:main' into URLShortner
2 parents 862dee5 + c7ebcb9 commit 0c7d3e5

34 files changed: +4423 −22 lines changed
Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
---
id: Hierarchical Clustering
title: Hierarchical Clustering
sidebar_label: Introduction to Hierarchical Clustering
sidebar_position: 1
tags: [hierarchical clustering, clustering algorithm, machine learning, data analysis, data science, dendrogram, agglomerative clustering, divisive clustering, unsupervised learning, data visualization, career opportunities, personal growth, clustering techniques, data segmentation, exploratory data analysis, machine learning algorithms]
description: In this tutorial, you will learn about Hierarchical Clustering, its importance, what Hierarchical Clustering is, why learn Hierarchical Clustering, how to use Hierarchical Clustering, steps to start using Hierarchical Clustering, and more.
---

### Introduction to Hierarchical Clustering
Hierarchical clustering is a powerful unsupervised learning algorithm used for clustering tasks. Unlike partitioning methods such as K-Means, hierarchical clustering builds a tree-like structure (dendrogram) that captures the nested grouping relationships among data points. This algorithm is intuitive, effective, and widely used for understanding the hierarchical relationships within datasets.

### What is Hierarchical Clustering?
Hierarchical clustering can be divided into two main types:

- **Agglomerative (Bottom-Up) Clustering**: Starts with each data point as an individual cluster and iteratively merges the closest pairs of clusters until a single cluster remains.
- **Divisive (Top-Down) Clustering**: Starts with all data points in a single cluster and recursively splits them into smaller clusters.

:::info
The resulting dendrogram has three key components:

**Leaves**: Represent individual data points.

**Nodes**: Represent clusters formed at different stages of the algorithm.

**Height**: Represents the distance or dissimilarity at which clusters are merged or split.
:::

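For a concrete sense of what the dendrogram encodes, the short sketch below (assuming SciPy is installed; the data points are made up purely for illustration) prints the linkage matrix, where each row records one merge: the two clusters joined, the height at which they merge, and the size of the new cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five illustrative 2-D points: two tight groups plus one distant point
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]])

# Each row of Z is one merge: [cluster_i, cluster_j, merge height, size of new cluster]
Z = linkage(X, method='ward')
print(Z)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws exactly this information as the tree visualized later in this tutorial.
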
### Example:
Consider hierarchical clustering for customer segmentation in a retail company. Initially, each customer is a separate cluster. The algorithm merges customers based on purchase behavior and demographics, forming larger clusters. The dendrogram provides a visual representation of how clusters are nested, helping the company understand customer segments at different levels of granularity.

### Advantages of Hierarchical Clustering
Hierarchical clustering offers several advantages:

- **Interpretability**: The dendrogram provides a clear and interpretable visual representation of the nested clustering structure.
- **No Need to Specify Number of Clusters**: Unlike K-Means, hierarchical clustering does not require a predefined number of clusters, allowing for flexible exploration of the data.
- **Deterministic**: The algorithm is deterministic, meaning it produces the same result with each run, given the same data and parameters.

### Example:
In a healthcare setting, hierarchical clustering can group patients based on a mix of symptoms, medical history, and demographics, providing interpretable insights into patient subgroups and their relationships.

### Disadvantages of Hierarchical Clustering
Despite its advantages, hierarchical clustering has limitations:

- **Computational Complexity**: The algorithm can be computationally expensive, especially with large datasets, as it requires computing and updating a distance matrix.
- **Sensitivity to Noise and Outliers**: Hierarchical clustering can be sensitive to noise and outliers, which may lead to the formation of less meaningful clusters.
- **Difficulty in Scaling**: The time complexity of hierarchical clustering makes it challenging to scale to very large datasets.

### Example:
In financial markets, hierarchical clustering of assets based on historical price movements may be impacted by noise and outliers, leading to less stable clustering results.

### Practical Tips for Using Hierarchical Clustering
To maximize the effectiveness of hierarchical clustering:

- **Distance Metrics**: Choose an appropriate distance metric (e.g., Euclidean, Manhattan, or cosine) based on the nature of your data.
- **Linkage Criteria**: Select a suitable linkage criterion (e.g., single, complete, or average linkage) to define how the distance between clusters is computed; the short sketch after this list compares common criteria on the same data.
- **Data Preprocessing**: Standardize or normalize your data to ensure that all features contribute equally to the distance calculations.

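As a quick comparison (a minimal sketch assuming SciPy is available; the two-blob data is synthetic), the snippet below builds a hierarchy with several linkage criteria and cuts each tree into two flat clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs of 20 points each
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)), rng.normal(5, 0.5, size=(20, 2))])

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)                    # build the hierarchy
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, np.bincount(labels)[1:])           # cluster sizes per method
```

On clean, well-separated data like this, every criterion should recover the same two groups; on noisier or chained data the criteria can differ noticeably, which is why the choice matters.
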
### Example:
In e-commerce, hierarchical clustering can be used to segment products based on attributes like price, category, and customer ratings. Preprocessing the data to standardize these attributes ensures that the clustering results are meaningful and interpretable.

### Real-World Examples

#### Customer Segmentation
Hierarchical clustering is extensively used in retail for customer segmentation. By analyzing customer demographics, purchase history, and behavior, retailers can understand the hierarchical relationships among customer groups and tailor their marketing strategies accordingly.

#### Gene Expression Analysis
In bioinformatics, hierarchical clustering helps analyze gene expression data by grouping genes with similar expression patterns. This aids in identifying gene functions and understanding the underlying biological processes.

### Difference Between Agglomerative and Divisive Clustering

| Feature | Agglomerative Clustering (Bottom-Up) | Divisive Clustering (Top-Down) |
|---------------------------------|-----------------------------------------|---------------------------------|
| Starting Point | Each data point starts as its own cluster. | All data points start in a single cluster. |
| Process | Iteratively merges the closest pairs of clusters. | Recursively splits the largest clusters. |
| Dendrogram Construction | Built from the leaves (individual points) up to the root (single cluster). | Built from the root (single cluster) down to the leaves (individual points). |
| Complexity | Generally more computationally efficient and widely used. | Typically more computationally intensive and less commonly used. |
| Use Cases | More suitable for large datasets where fine-grained merging is needed. | Can be useful when the top-down approach aligns better with the problem domain. |

### Implementation
To implement and train a hierarchical clustering model, you can use a machine learning library such as scikit-learn. Below are the steps to install the necessary libraries and train a hierarchical clustering model.

#### Libraries to Download
- `scikit-learn`: This is the primary library for machine learning in Python, including the hierarchical clustering implementation.
- `pandas`: Useful for data manipulation and analysis.
- `numpy`: Useful for numerical operations.
- `scipy` and `matplotlib`: Used below to build and plot the dendrogram.

You can install these libraries using pip:

```bash
pip install scikit-learn pandas numpy scipy matplotlib
```

#### Training a Hierarchical Clustering Model
Here’s a step-by-step guide to training a hierarchical clustering model:

**Import Libraries:**

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
```

**Load and Prepare Data:**
Assuming you have a dataset in a CSV file:

```python
# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Prepare features (X)
X = data.drop('target_column', axis=1)  # replace 'target_column' with the name of your target column if applicable
```

**Feature Scaling:**

```python
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

**Determine Optimal Number of Clusters:**
Using the dendrogram to visualize the cluster formation:

```python
# Plot the dendrogram
plt.figure(figsize=(10, 7))
dendrogram = sch.dendrogram(sch.linkage(X_scaled, method='ward'))
plt.title('Dendrogram')
plt.xlabel('Samples')
plt.ylabel('Euclidean distances')
plt.show()
```

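If you want flat cluster labels directly from a distance threshold read off the dendrogram, a small optional sketch (reusing `X_scaled`, `np`, and the `sch` alias from the steps above; the threshold of 5.0 is only an illustrative value) looks like this:

```python
from scipy.cluster.hierarchy import fcluster

# Build the linkage matrix once and cut the tree at a chosen height
Z = sch.linkage(X_scaled, method='ward')
labels_from_cut = fcluster(Z, t=5.0, criterion='distance')  # 5.0 is an illustrative threshold
print(np.unique(labels_from_cut))  # cluster ids produced by the cut
```
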
**Initialize and Train the Hierarchical Clustering Model:**

```python
# Initialize the Hierarchical Clustering model
# Note: recent scikit-learn releases use `metric` (the older `affinity` argument was removed);
# with linkage='ward' the metric must be Euclidean.
hc = AgglomerativeClustering(n_clusters=3, metric='euclidean', linkage='ward')  # choose the number of clusters from the dendrogram

# Train the model
hc.fit(X_scaled)
```

**Evaluate the Model:**

```python
# Cluster labels assigned during fitting (AgglomerativeClustering has no separate predict step)
cluster_labels = hc.labels_

# Optionally, visualize the clusters (using only the first two scaled features)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=cluster_labels, cmap='rainbow')
plt.title('Clusters')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
```

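For a quantitative check (a small sketch using scikit-learn's `silhouette_score`), you can also score how well-separated the resulting clusters are:

```python
from sklearn.metrics import silhouette_score

# Ranges from -1 to 1; higher values indicate better-separated clusters
score = silhouette_score(X_scaled, cluster_labels)
print(f"Silhouette score: {score:.3f}")
```
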
This example demonstrates how to load data, prepare features, scale the features, determine the optimal number of clusters, train a hierarchical clustering model, and visualize the clustering results. You can adjust parameters and the dataset as needed for your specific use case.
### Performance Considerations

#### Scalability and Computational Efficiency
- **Large Datasets**: Hierarchical clustering can be slow with large datasets due to the need to compute and update the distance matrix.
- **Algorithmic Complexity**: Using techniques like approximate hierarchical clustering or limiting the dendrogram depth (see the truncation sketch after this list) can improve scalability and efficiency.

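As one example of limiting what is drawn (a small sketch that reuses `X_scaled`, `sch`, and `plt` from the implementation above), SciPy's dendrogram can be truncated so only the last few merges are shown, which keeps the plot readable for large datasets:

```python
# Show only the last 10 merged clusters instead of every leaf
plt.figure(figsize=(10, 7))
sch.dendrogram(sch.linkage(X_scaled, method='ward'), truncate_mode='lastp', p=10)
plt.title('Truncated Dendrogram (last 10 merges)')
plt.show()
```

Note that truncation only simplifies the plot; the linkage computation itself still scales with the full dataset.
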
### Example:
In geospatial analysis, hierarchical clustering is used to identify patterns in geographical data. Optimizing the algorithm for large-scale geospatial data ensures efficient and accurate clustering, aiding in urban planning and resource allocation.

### Conclusion
Hierarchical clustering is a versatile and powerful unsupervised learning algorithm suitable for a variety of applications. Understanding its strengths, limitations, and proper usage is crucial for effectively applying it to different datasets. By carefully selecting parameters, scaling features, and considering computational efficiency, hierarchical clustering can provide valuable insights and groupings for numerous real-world problems.

dsa-problems/leetcode-problems/0100-0199.md

Lines changed: 6 additions & 5 deletions
@@ -518,31 +518,32 @@ export const problems = [
"problemName": "184. Department Highest Salary",
"difficulty": "Medium",
"leetCodeLink": "https://leetcode.com/problems/department-highest-salary",
-"solutionLink": "#"
+"solutionLink": "/dsa-solutions/lc-solutions/0100-0199/department-highest-salary"
},
{
"problemName": "185. Department Top Three Salaries",
"difficulty": "Hard",
"leetCodeLink": "https://leetcode.com/problems/department-top-three-salaries",
-"solutionLink": "#"
+"solutionLink": "/dsa-solutions/lc-solutions/0100-0199/department-top-three-salaries"
},
{
"problemName": "186. Reverse Words in a String II",
"difficulty": "Medium",
"leetCodeLink": "https://leetcode.com/problems/reverse-words-in-a-string-ii",
-"solutionLink": "#"
+"solutionLink": "/dsa-solutions/lc-solutions/0100-0199/reverse-words-in-string-II"
},
{
"problemName": "187. Repeated DNA Sequences",
"difficulty": "Medium",
"leetCodeLink": "https://leetcode.com/problems/repeated-dna-sequences",
-"solutionLink": "#"
+"solutionLink": "/dsa-solutions/lc-solutions/0100-0199/repeated-dna-sequence"
+
},
{
"problemName": "188. Best Time to Buy and Sell Stock IV",
"difficulty": "Hard",
"leetCodeLink": "https://leetcode.com/problems/best-time-to-buy-and-sell-stock-iv",
-"solutionLink": "#"
+"solutionLink": "/dsa-solutions/lc-solutions/0100-0199/best-time-to-buy-sell-stock-IV"
},
{
"problemName": "189. Rotate Array",
Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
---
id: Backward-Search-Algorithm
title: Backward Search Algorithm
sidebar_label: Backward Search Algorithm
tags:
- Advanced
- Search Algorithms
- Text Processing
- CPP
- Python
- Java
- JavaScript
- DSA
description: "This is a solution to the Backward Search Algorithm problem."
---

## What is the Backward Search Algorithm?

The Backward Search Algorithm is a search method typically used in text processing and pattern matching. It works by comparing the pattern against the text from right to left at each alignment, which can be more efficient in certain contexts compared to traditional left-to-right search algorithms.

## Algorithm Steps

1. **Preprocessing**:
   - Construct a rightmost occurrence function that maps each character in the pattern to the index of its rightmost occurrence (a small illustration follows this list).

2. **Searching**:
   - Align the pattern with the beginning of the text.
   - Compare characters from right to left.
   - If a mismatch is found, use the rightmost occurrence function to determine the next alignment.
   - If the entire pattern matches, record the position of the match.

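For instance (a small sketch; the full implementation appears in the Implementation section below), the rightmost-occurrence table for the pattern `"ABC"` is:

```python
pattern = "ABC"
# Map each character to the index of its rightmost occurrence in the pattern
rightmost = {ch: idx for idx, ch in enumerate(pattern)}
print(rightmost)  # {'A': 0, 'B': 1, 'C': 2}
```
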
## Complexity Analysis

- **Time Complexity**: The average-case time complexity is \(O(n/m)\), where \(n\) is the length of the text and \(m\) is the length of the pattern; in the worst case the bad-character shift can be small at every step, degrading to \(O(nm)\).
- **Space Complexity**: The space complexity is \(O(k)\), where \(k\) is the size of the character set.

## Example

Given a pattern and a text:

```
pattern = "ABC"
text = "AABCABCDABC"
```

Using the Backward Search Algorithm:

- The algorithm aligns the pattern with the text and compares the characters of "ABC" from right to left at each alignment, shifting right after a mismatch; the first full match starts at index 1 of "AABCABCDABC".

## Implementation

<Tabs>
<TabItem value="Python" label="Python" default>

```python
def rightmost_occurrence_function(pattern):
    # Map each character of the pattern to the index of its rightmost occurrence
    rightmost = {}
    for i in range(len(pattern)):
        rightmost[pattern[i]] = i
    return rightmost

def backward_search(text, pattern):
    rightmost = rightmost_occurrence_function(pattern)
    m, n = len(pattern), len(text)
    i = m - 1  # index into the text
    j = m - 1  # index into the pattern (scanned right to left)

    while i < n:
        if text[i] == pattern[j]:
            if j == 0:
                return i  # full match; i is the start of the occurrence
            else:
                i -= 1
                j -= 1
        else:
            # Bad-character shift based on the rightmost occurrence of text[i] in the pattern
            lo = rightmost.get(text[i], -1)
            i += m - min(j, 1 + lo)
            j = m - 1

    return -1

# Example usage:
pattern = "ABC"
text = "AABCABCDABC"
result = backward_search(text, pattern)
print(f"Pattern found at index: {result}")
```

</TabItem>
<TabItem value="C++" label="C++">

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>

// Map each character of the pattern to the index of its rightmost occurrence
std::unordered_map<char, int> rightmost_occurrence_function(const std::string &pattern) {
    std::unordered_map<char, int> rightmost;
    for (int i = 0; i < static_cast<int>(pattern.size()); ++i) {
        rightmost[pattern[i]] = i;
    }
    return rightmost;
}

int backward_search(const std::string &text, const std::string &pattern) {
    auto rightmost = rightmost_occurrence_function(pattern);
    int m = pattern.size();
    int n = text.size();
    int i = m - 1;  // index into the text
    int j = m - 1;  // index into the pattern (scanned right to left)

    while (i < n) {
        if (text[i] == pattern[j]) {
            if (j == 0) {
                return i;  // full match; i is the start of the occurrence
            } else {
                --i;
                --j;
            }
        } else {
            // Bad-character shift based on the rightmost occurrence of text[i] in the pattern
            int lo = rightmost.find(text[i]) != rightmost.end() ? rightmost[text[i]] : -1;
            i += m - std::min(j, 1 + lo);
            j = m - 1;
        }
    }

    return -1;
}

int main() {
    std::string pattern = "ABC";
    std::string text = "AABCABCDABC";
    int result = backward_search(text, pattern);
    std::cout << "Pattern found at index: " << result << std::endl;
    return 0;
}
```

</TabItem>
</Tabs>

## Conclusion
The Backward Search Algorithm is a powerful method for pattern matching in text processing. By searching from right to left and leveraging the rightmost occurrence function, it provides efficient searching capabilities, especially useful in specific contexts where this directionality offers advantages over traditional approaches.
