 [![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
-[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)
+[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)

 <p align="left">
   <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
@@ -20,7 +20,7 @@ pip install scrapegraph-py

 ## 🚀 Features

-- 🤖 AI-powered web scraping
+- 🤖 AI-powered web scraping and search
 - 🔄 Both sync and async clients
 - 📊 Structured output with Pydantic schemas
 - 🔍 Detailed logging
@@ -40,21 +40,36 @@ client = Client(api_key="your-api-key-here")

 ## 📚 Available Endpoints

-### 🔍 SmartScraper
+### 🤖 SmartScraper

-Scrapes any webpage using AI to extract specific information.
+Extract structured data from any webpage or HTML content using AI.

 ```python
 from scrapegraph_py import Client

 client = Client(api_key="your-api-key-here")

-# Basic usage
+# Using a URL
 response = client.smartscraper(
     website_url="https://example.com",
     user_prompt="Extract the main heading and description"
 )

+# Or using HTML content
+html_content = """
+<html>
+    <body>
+        <h1>Company Name</h1>
+        <p>We are a technology company focused on AI solutions.</p>
+    </body>
+</html>
+"""
+
+response = client.smartscraper(
+    website_html=html_content,
+    user_prompt="Extract the company description"
+)
+
 print(response)
 ```

@@ -80,46 +95,56 @@ response = client.smartscraper(

 </details>

-### 📝 Markdownify
+### 🔍 SearchScraper

-Converts any webpage into clean, formatted markdown.
+Perform AI-powered web searches with structured results and reference URLs.

 ```python
 from scrapegraph_py import Client

 client = Client(api_key="your-api-key-here")

-response = client.markdownify(
-    website_url="https://example.com"
+response = client.searchscraper(
+    user_prompt="What is the latest version of Python and its main features?"
 )

-print(response)
+print(f"Answer: {response['result']}")
+print(f"Sources: {response['reference_urls']}")
 ```

-### 💻 LocalScraper
-
-Extracts information from HTML content using AI.
+<details>
+<summary>Output Schema (Optional)</summary>

 ```python
+from pydantic import BaseModel, Field
 from scrapegraph_py import Client

 client = Client(api_key="your-api-key-here")

-html_content = """
-<html>
-    <body>
-        <h1>Company Name</h1>
-        <p>We are a technology company focused on AI solutions.</p>
-        <div class="contact">
-
-        </div>
-    </body>
-</html>
-"""
+class PythonVersionInfo(BaseModel):
+    version: str = Field(description="The latest Python version number")
+    release_date: str = Field(description="When this version was released")
+    major_features: list[str] = Field(description="List of main features")
+
+response = client.searchscraper(
+    user_prompt="What is the latest version of Python and its main features?",
+    output_schema=PythonVersionInfo
+)
+```
+
+</details>

-response = client.localscraper(
-    user_prompt="Extract the company description",
-    website_html=html_content
+### 📝 Markdownify
+
+Converts any webpage into clean, formatted markdown.
+
+```python
+from scrapegraph_py import Client
+
+client = Client(api_key="your-api-key-here")
+
+response = client.markdownify(
+    website_url="https://example.com"
 )

 print(response)
@@ -177,7 +202,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 ## 🔗 Links

 - [Website](https://scrapegraphai.com)
-- [Documentation](https://docs.scrapegraphai.com)
+- [Documentation](https://docs.scrapegraphai.com)
 - [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

 ---
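
The SearchScraper example in this diff reads the response with `response['result']` and `response['reference_urls']`. The following is a minimal sketch, assuming the response really is a plain dict with those two keys (the diff implies this but never shows a full response), of how calling code might unpack it defensively. `SearchAnswer`, `parse_search_response`, and the sample data are hypothetical names and values introduced for illustration only; they are not part of `scrapegraph_py`.

```python
from dataclasses import dataclass, field


@dataclass
class SearchAnswer:
    """Hypothetical container for a SearchScraper-style response."""

    result: object
    reference_urls: list = field(default_factory=list)


def parse_search_response(response: dict) -> SearchAnswer:
    # Assumption: the response is a dict with "result" and
    # "reference_urls" keys, as the README's print statements suggest.
    if "result" not in response:
        raise KeyError("response has no 'result' key")
    return SearchAnswer(
        result=response["result"],
        # Treat missing sources as an empty list instead of failing.
        reference_urls=list(response.get("reference_urls", [])),
    )


# Hand-written sample data, not real API output.
sample = {
    "result": {"version": "3.x"},
    "reference_urls": ["https://www.python.org/downloads/"],
}

answer = parse_search_response(sample)
print(answer.result)
print(answer.reference_urls)
```

Keeping the unpacking in one helper means a change in the response shape only needs to be handled in one place.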