
Commit f32d182

Merge pull request #31 from ScrapeGraphAI/pre/beta
SearchScraper
2 parents d1e4e7d + 49271b0 · commit f32d182

27 files changed: +925 −315 lines

README.md  +12 −11

@@ -9,15 +9,15 @@
 <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
 </p>
 
-Official SDKs for the ScrapeGraph AI API - Intelligent web scraping powered by AI. Extract structured data from any webpage with natural language prompts.
+Official SDKs for the ScrapeGraph AI API - Intelligent web scraping and search powered by AI. Extract structured data from any webpage or perform AI-powered web searches with natural language prompts.
 
 Get your [API key](https://scrapegraphai.com)!
 
 ## 🚀 Quick Links
 
 - [Python SDK Documentation](scrapegraph-py/README.md)
 - [JavaScript SDK Documentation](scrapegraph-js/README.md)
-- [API Documentation](https://docs.scrapegraphai.com)
+- [API Documentation](https://docs.scrapegraphai.com)
 - [Website](https://scrapegraphai.com)
 
 ## 📦 Installation

@@ -34,31 +34,31 @@ npm install scrapegraph-js
 
 ## 🎯 Core Features
 
-- 🤖 **AI-Powered Extraction**: Use natural language to describe what data you want
+- 🤖 **AI-Powered Extraction & Search**: Use natural language to extract data or search the web
 - 📊 **Structured Output**: Get clean, structured data with optional schema validation
 - 🔄 **Multiple Formats**: Extract data as JSON, Markdown, or custom schemas
 - ⚡ **High Performance**: Concurrent processing and automatic retries
 - 🔒 **Enterprise Ready**: Production-grade security and rate limiting
 
 ## 🛠️ Available Endpoints
 
-### 🔍 SmartScraper
-Extract structured data from any webpage using natural language prompts.
+### 🤖 SmartScraper
+Using AI to extract structured data from any webpage or HTML content with natural language prompts.
+
+### 🔍 SearchScraper
+Perform AI-powered web searches with structured results and reference URLs.
 
 ### 📝 Markdownify
 Convert any webpage into clean, formatted markdown.
 
-### 💻 LocalScraper
-Extract information from a local HTML file using AI.
-
-
 ## 🌟 Key Benefits
 
 - 📝 **Natural Language Queries**: No complex selectors or XPath needed
 - 🎯 **Precise Extraction**: AI understands context and structure
-- 🔄 **Adaptive Scraping**: Works with dynamic and static content
+- 🔄 **Adaptive Processing**: Works with both web content and direct HTML
 - 📊 **Schema Validation**: Ensure data consistency with Pydantic/TypeScript
-- ⚡ **Async Support**: Handle multiple requests efficiently
+- 🔍 **Source Attribution**: Get reference URLs for search results
 
 ## 💡 Use Cases
 

@@ -67,13 +67,14 @@ Extract information from a local HTML file using AI.
 - 📰 **Content Aggregation**: Convert articles to structured formats
 - 🔍 **Data Mining**: Extract specific information from multiple sources
 - 📱 **App Integration**: Feed clean data into your applications
+- 🌐 **Web Research**: Perform AI-powered searches with structured results
 
 ## 📖 Documentation
 
 For detailed documentation and examples, visit:
 - [Python SDK Guide](scrapegraph-py/README.md)
 - [JavaScript SDK Guide](scrapegraph-js/README.md)
-- [API Documentation](https://docs.scrapegraphai.com)
+- [API Documentation](https://docs.scrapegraphai.com)
 
 ## 💬 Support & Feedback
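The SearchScraper endpoint announced in this README is shown with full examples in the Python SDK README further down in this commit; for quick orientation, here is a minimal sketch of the new call (placeholder API key, usage mirroring the scrapegraph-py/README.md diff in this same commit):

```python
from scrapegraph_py import Client

# Placeholder API key; real keys come from https://scrapegraphai.com
client = Client(api_key="your-api-key-here")

# SearchScraper: AI-powered web search with structured results and source URLs
response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?"
)

print(response["result"])          # structured answer
print(response["reference_urls"])  # source attribution
```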

scrapegraph-py/CHANGELOG.md  +14 −0

@@ -1,3 +1,4 @@
+## [1.9.0-beta.7](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.9.0-beta.6...v1.9.0-beta.7) (2025-02-03)
 ## [1.10.2](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.10.1...v1.10.2) (2025-01-22)
 
 

@@ -18,6 +19,19 @@
 
 ### Features
 
+* add optional headers to request ([bb851d7](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/bb851d785d121b039d5e968327fb930955a3fd92))
+* merged localscraper into smartscraper ([503dbd1](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/503dbd19b8cec4d2ff4575786b0eec25db2e80e6))
+* modified icons ([bcb9b0b](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/bcb9b0b731b057d242fdf80b43d96879ff7a2764))
+* searchscraper ([2e04e5a](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/2e04e5a1bbd207a7ceeea594878bdea542a7a856))
+* updated readmes ([bfdbea0](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/bfdbea038918d79df2e3e9442e25d5f08bbccbbc))
+
+
+### chore
+
+* refactor examples ([8e00846](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/8e008465f7280c53e2faab7a92f02871ffc5b867))
+* **tests:** updated tests ([9149ce8](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/9149ce85a78b503098f80910c20de69831030378))
+
+
 ## [1.9.0-beta.6](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.9.0-beta.5...v1.9.0-beta.6) (2025-01-08)
 * add integration for sql ([2543b5a](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/2543b5a9b84826de5c583d38fe89cf21aad077e6))
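One changelog entry, "add optional headers to request", has no example elsewhere in this commit. A hedged sketch of what it likely enables follows; the `headers` keyword argument and its dict shape are assumptions inferred from the entry, not confirmed by this diff:

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Assumption: custom HTTP headers (e.g. User-Agent, cookies) can be passed
# alongside the scrape request via a `headers` dict. The parameter name is
# inferred from the changelog entry and is not shown in this diff.
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description",
    headers={"User-Agent": "my-custom-agent/1.0"},
)

print(response)
```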

scrapegraph-py/README.md  +53 −28

@@ -4,7 +4,7 @@
 [![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
-[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)
+[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)
 
 <p align="left">
 <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">

@@ -20,7 +20,7 @@ pip install scrapegraph-py
 
 ## 🚀 Features
 
-- 🤖 AI-powered web scraping
+- 🤖 AI-powered web scraping and search
 - 🔄 Both sync and async clients
 - 📊 Structured output with Pydantic schemas
 - 🔍 Detailed logging

@@ -40,21 +40,36 @@ client = Client(api_key="your-api-key-here")
 
 ## 📚 Available Endpoints
 
-### 🔍 SmartScraper
+### 🤖 SmartScraper
 
-Scrapes any webpage using AI to extract specific information.
+Extract structured data from any webpage or HTML content using AI.
 
 ```python
 from scrapegraph_py import Client
 
 client = Client(api_key="your-api-key-here")
 
-# Basic usage
+# Using a URL
 response = client.smartscraper(
     website_url="https://example.com",
     user_prompt="Extract the main heading and description"
 )
 
+# Or using HTML content
+html_content = """
+<html>
+    <body>
+        <h1>Company Name</h1>
+        <p>We are a technology company focused on AI solutions.</p>
+    </body>
+</html>
+"""
+
+response = client.smartscraper(
+    website_html=html_content,
+    user_prompt="Extract the company description"
+)
+
 print(response)
 ```
 

@@ -80,46 +95,56 @@ response = client.smartscraper(
 
 </details>
 
-### 📝 Markdownify
+### 🔍 SearchScraper
 
-Converts any webpage into clean, formatted markdown.
+Perform AI-powered web searches with structured results and reference URLs.
 
 ```python
 from scrapegraph_py import Client
 
 client = Client(api_key="your-api-key-here")
 
-response = client.markdownify(
-    website_url="https://example.com"
+response = client.searchscraper(
+    user_prompt="What is the latest version of Python and its main features?"
 )
 
-print(response)
+print(f"Answer: {response['result']}")
+print(f"Sources: {response['reference_urls']}")
 ```
 
-### 💻 LocalScraper
-
-Extracts information from HTML content using AI.
+<details>
+<summary>Output Schema (Optional)</summary>
 
 ```python
+from pydantic import BaseModel, Field
 from scrapegraph_py import Client
 
 client = Client(api_key="your-api-key-here")
 
-html_content = """
-<html>
-    <body>
-        <h1>Company Name</h1>
-        <p>We are a technology company focused on AI solutions.</p>
-        <div class="contact">
-            <p>Email: [email protected]</p>
-        </div>
-    </body>
-</html>
-"""
+class PythonVersionInfo(BaseModel):
+    version: str = Field(description="The latest Python version number")
+    release_date: str = Field(description="When this version was released")
+    major_features: list[str] = Field(description="List of main features")
+
+response = client.searchscraper(
+    user_prompt="What is the latest version of Python and its main features?",
+    output_schema=PythonVersionInfo
+)
+```
+
+</details>
 
-response = client.localscraper(
-    user_prompt="Extract the company description",
-    website_html=html_content
+### 📝 Markdownify
+
+Converts any webpage into clean, formatted markdown.
+
+```python
+from scrapegraph_py import Client
+
+client = Client(api_key="your-api-key-here")
+
+response = client.markdownify(
+    website_url="https://example.com"
 )
 
 print(response)

@@ -177,7 +202,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 ## 🔗 Links
 
 - [Website](https://scrapegraphai.com)
-- [Documentation](https://docs.scrapegraphai.com)
+- [Documentation](https://docs.scrapegraphai.com)
 - [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)
 
 ---
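Per the changelog entry "merged localscraper into smartscraper", the LocalScraper section removed above maps directly onto SmartScraper's `website_html` argument. A minimal before/after sketch of the migration, using a shortened HTML snippet:

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
html_content = "<html><body><h1>Company Name</h1><p>We build AI solutions.</p></body></html>"

# Before this commit (LocalScraper, now removed):
# response = client.localscraper(
#     user_prompt="Extract the company description",
#     website_html=html_content,
# )

# After this commit (merged into SmartScraper):
response = client.smartscraper(
    website_html=html_content,
    user_prompt="Extract the company description",
)

print(response)
```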
New file (async searchscraper example; filename not shown in this view)  +46 −0

@@ -0,0 +1,46 @@
+"""
+Example of using the async searchscraper functionality to search for information concurrently.
+"""
+
+import asyncio
+
+from scrapegraph_py import AsyncClient
+from scrapegraph_py.logger import sgai_logger
+
+sgai_logger.set_logging(level="INFO")
+
+
+async def main():
+    # Initialize async client
+    sgai_client = AsyncClient(api_key="your-api-key-here")
+
+    # List of search queries
+    queries = [
+        "What is the latest version of Python and what are its main features?",
+        "What are the key differences between Python 2 and Python 3?",
+        "What is Python's GIL and how does it work?",
+    ]
+
+    # Create tasks for concurrent execution
+    tasks = [sgai_client.searchscraper(user_prompt=query) for query in queries]
+
+    # Execute requests concurrently
+    responses = await asyncio.gather(*tasks, return_exceptions=True)
+
+    # Process results
+    for i, response in enumerate(responses):
+        if isinstance(response, Exception):
+            print(f"\nError for query {i+1}: {response}")
+        else:
+            print(f"\nSearch {i+1}:")
+            print(f"Query: {queries[i]}")
+            print(f"Result: {response['result']}")
+            print("Reference URLs:")
+            for url in response["reference_urls"]:
+                print(f"- {url}")
+
+    await sgai_client.close()
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
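The example above uses asyncio.gather with return_exceptions=True so a single failed query does not cancel the rest. For scripts that do not need concurrency, a sequential sketch with the synchronous Client, assuming the same response shape ("result" and "reference_urls" keys), could look like this:

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

queries = [
    "What is the latest version of Python and what are its main features?",
    "What are the key differences between Python 2 and Python 3?",
]

# Sequential variant: each search finishes before the next one starts.
for query in queries:
    response = client.searchscraper(user_prompt=query)
    print(f"Query: {query}")
    print(f"Result: {response['result']}")
    for url in response["reference_urls"]:
        print(f"- {url}")
```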
