Commit d789f59

Merge pull request #14 from ScrapeGraphAI/pre/beta
Added Markdownify and Localscraper
2 parents: e9c852c + 5e65800

15 files changed: +679 −111 lines

scrapegraph-py/CHANGELOG.md (+7)

@@ -1,3 +1,10 @@
+## [1.7.0-beta.1](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.6.0...v1.7.0-beta.1) (2024-12-05)
+
+
+### Features
+
+* add markdownify and localscraper ([6296510](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/6296510b22ce511adde4265532ac6329a05967e0))
+
 ## [1.6.0](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.5.0...v1.6.0) (2024-12-05)
 
 
scrapegraph-py/CONTRIBUTING.md (+28 −3)

@@ -13,11 +13,36 @@ Thank you for your interest in contributing to **ScrapeGraphAI**! We welcome con
 
 ## Getting Started
 
-To get started with contributing, follow these steps:
+### Development Setup
 
 1. Fork the repository on GitHub **(FROM pre/beta branch)**.
-2. Clone your forked repository to your local machine.
-3. Install the necessary dependencies from requirements.txt or via pyproject.toml as you prefere :).
+2. Clone your forked repository:
+```bash
+git clone https://github.com/ScrapeGraphAI/scrapegraph-sdk.git
+cd scrapegraph-sdk/scrapegraph-py
+```
+
+3. Install dependencies using uv (recommended):
+```bash
+# Install uv if you haven't already
+pip install uv
+
+# Install dependencies
+uv sync
+
+# Install pre-commit hooks
+uv run pre-commit install
+```
+
+4. Run tests:
+```bash
+# Run all tests
+uv run pytest
+
+# Run specific test file
+uv run pytest tests/test_client.py
+```
+
 4. Make your changes or additions.
 5. Test your changes thoroughly.
 6. Commit your changes with descriptive commit messages.
scrapegraph-py/README.md (+107 −96)

@@ -6,164 +6,175 @@
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://scrapegraph-py.readthedocs.io/en/latest/?badge=latest)
 
-Official Python SDK for the ScrapeGraph AI API - Smart web scraping powered by AI.
-
-## 🚀 Features
-
-- ✨ Smart web scraping with AI
-- 🔄 Both sync and async clients
-- 📊 Structured output with Pydantic schemas
-- 🔍 Detailed logging with emojis
-- ⚡ Automatic retries and error handling
-- 🔐 Secure API authentication
+Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.
 
 ## 📦 Installation
 
-### Using pip
-
-```
+```bash
 pip install scrapegraph-py
 ```
 
-### Using uv
+## 🚀 Features
 
-We recommend using [uv](https://docs.astral.sh/uv/) to install the dependencies and pre-commit hooks.
+- 🤖 AI-powered web scraping
+- 🔄 Both sync and async clients
+- 📊 Structured output with Pydantic schemas
+- 🔍 Detailed logging
+- ⚡ Automatic retries
+- 🔐 Secure authentication
 
-```
-# Install uv if you haven't already
-pip install uv
+## 🎯 Quick Start
 
-# Install dependencies
-uv sync
+```python
+from scrapegraph_py import Client
 
-# Install pre-commit hooks
-uv run pre-commit install
+client = Client(api_key="your-api-key-here")
 ```
 
-## 🔧 Quick Start
-
 > [!NOTE]
-> If you prefer, you can use the environment variables to configure the API key and load them using `load_dotenv()`
+> You can set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`
 
-```python
-from scrapegraph_py import SyncClient
-from scrapegraph_py.logger import get_logger
+## 📚 Available Endpoints
+
+### 🔍 SmartScraper
 
-# Enable debug logging
-logger = get_logger(level="DEBUG")
+Scrapes any webpage using AI to extract specific information.
+
+```python
+from scrapegraph_py import Client
 
-# Initialize client
-sgai_client = SyncClient(api_key="your-api-key-here")
+client = Client(api_key="your-api-key-here")
 
-# Make a request
-response = sgai_client.smartscraper(
+# Basic usage
+response = client.smartscraper(
     website_url="https://example.com",
     user_prompt="Extract the main heading and description"
 )
 
-print(response["result"])
-```
-
-## 🎯 Examples
-
-### Async Usage
-
-```python
-import asyncio
-from scrapegraph_py import AsyncClient
-
-async def main():
-    async with AsyncClient(api_key="your-api-key-here") as sgai_client:
-        response = await sgai_client.smartscraper(
-            website_url="https://example.com",
-            user_prompt="Summarize the main content"
-        )
-        print(response["result"])
-
-asyncio.run(main())
+print(response)
 ```
 
 <details>
-<summary><b>With Output Schema</b></summary>
+<summary>Output Schema (Optional)</summary>
 
 ```python
 from pydantic import BaseModel, Field
-from scrapegraph_py import SyncClient
+from scrapegraph_py import Client
+
+client = Client(api_key="your-api-key-here")
 
 class WebsiteData(BaseModel):
     title: str = Field(description="The page title")
     description: str = Field(description="The meta description")
 
-sgai_client = SyncClient(api_key="your-api-key-here")
-response = sgai_client.smartscraper(
+response = client.smartscraper(
     website_url="https://example.com",
     user_prompt="Extract the title and description",
     output_schema=WebsiteData
 )
-
-print(response["result"])
 ```
+
 </details>
 
-## 📚 Documentation
+### 📝 Markdownify
 
-For detailed documentation, visit [docs.scrapegraphai.com](https://docs.scrapegraphai.com)
+Converts any webpage into clean, formatted markdown.
 
-## 🛠️ Development
+```python
+from scrapegraph_py import Client
 
-### Setup
+client = Client(api_key="your-api-key-here")
 
-1. Clone the repository:
-```
-git clone https://github.com/ScrapeGraphAI/scrapegraph-sdk.git
-cd scrapegraph-sdk/scrapegraph-py
-```
+response = client.markdownify(
+    website_url="https://example.com"
+)
 
-2. Install dependencies:
-```
-uv sync
+print(response)
 ```
 
-3. Install pre-commit hooks:
-```
-uv run pre-commit install
-```
+### 💻 LocalScraper
 
-### Running Tests
+Extracts information from HTML content using AI.
 
-```
-# Run all tests
-uv run pytest
+```python
+from scrapegraph_py import Client
+
+client = Client(api_key="your-api-key-here")
+
+html_content = """
+<html>
+    <body>
+        <h1>Company Name</h1>
+        <p>We are a technology company focused on AI solutions.</p>
+        <div class="contact">
+            <p>Email: [email protected]</p>
+        </div>
+    </body>
+</html>
+"""
+
+response = client.localscraper(
+    user_prompt="Extract the company description",
+    website_html=html_content
+)
 
-# Run specific test file
-poetry run pytest tests/test_client.py
+print(response)
 ```
 
-## 📝 License
+## ⚡ Async Support
 
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+All endpoints support async operations:
+
+```python
+import asyncio
+from scrapegraph_py import AsyncClient
 
-## 🤝 Contributing
+async def main():
+    async with AsyncClient() as client:
+        response = await client.smartscraper(
+            website_url="https://example.com",
+            user_prompt="Extract the main content"
+        )
+        print(response)
+
+asyncio.run(main())
+```
 
-Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
+## 📖 Documentation
 
-1. Fork the repository
-2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
-3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
-4. Push to the branch (`git push origin feature/AmazingFeature`)
-5. Open a Pull Request
+For detailed documentation, visit [scrapegraphai.com/docs](https://scrapegraphai.com/docs)
 
-## 🔗 Links
+## 🛠️ Development
 
-- [Website](https://scrapegraphai.com)
-- [Documentation](https://scrapegraphai.com/documentation)
-- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)
+For information about setting up the development environment and contributing to the project, see our [Contributing Guide](CONTRIBUTING.md).
 
-## 💬 Support
+## 💬 Support & Feedback
 
 - 📧 Email: [email protected]
 - 💻 GitHub Issues: [Create an issue](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues)
 - 🌟 Feature Requests: [Request a feature](https://github.com/ScrapeGraphAI/scrapegraph-sdk/issues/new)
+- ⭐ API Feedback: You can also submit feedback programmatically using the feedback endpoint:
+```python
+from scrapegraph_py import Client
+
+client = Client(api_key="your-api-key-here")
+
+client.submit_feedback(
+    request_id="your-request-id",
+    rating=5,
+    feedback_text="Great results!"
+)
+```
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+## 🔗 Links
+
+- [Website](https://scrapegraphai.com)
+- [Documentation](https://scrapegraphai.com/docs)
+- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)
 
 ---
 
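The pieces added to this README compose naturally: the async client from "Async Support" can take the same `output_schema` shown in the SmartScraper section. Below is a minimal, unverified sketch of that combination, assuming the `scrapegraph-py` and `pydantic` packages are installed and `SGAI_API_KEY` is set; the live call is left as a comment since it needs a real key.

```python
import asyncio

from pydantic import BaseModel, Field


class WebsiteData(BaseModel):
    # Same schema as the README's output-schema example
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")


async def main():
    # Imported lazily so the schema above stays usable without the SDK installed
    from scrapegraph_py import AsyncClient

    # Assumes SGAI_API_KEY is set in the environment (see the README note)
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the title and description",
            output_schema=WebsiteData,
        )
        print(response)


# To run against the live API: asyncio.run(main())
```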
New file (+37)

@@ -0,0 +1,37 @@
+import asyncio
+
+from scrapegraph_py import AsyncClient
+from scrapegraph_py.logger import sgai_logger
+
+sgai_logger.set_logging(level="INFO")
+
+
+async def main():
+    # Initialize async client
+    sgai_client = AsyncClient(api_key="your-api-key-here")
+
+    # Concurrent markdownify requests
+    urls = [
+        "https://scrapegraphai.com/",
+        "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
+    ]
+
+    tasks = [sgai_client.markdownify(website_url=url) for url in urls]
+
+    # Execute requests concurrently
+    responses = await asyncio.gather(*tasks, return_exceptions=True)
+
+    # Process results
+    for i, response in enumerate(responses):
+        if isinstance(response, Exception):
+            print(f"\nError for {urls[i]}: {response}")
+        else:
+            print(f"\nPage {i+1} Markdown:")
+            print(f"URL: {urls[i]}")
+            print(f"Result: {response['result']}")
+
+    await sgai_client.close()
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
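A natural extension of this concurrent example is writing each result to disk. The sketch below assumes the response shape used above (`response['result']`); the `markdown_filename` helper is hypothetical, not an SDK function, and the live call is shown only in comments since it needs an API key.

```python
from pathlib import Path
from urllib.parse import urlparse


def markdown_filename(url: str) -> str:
    # Hypothetical helper (not an SDK function): turn a URL into a flat,
    # filesystem-safe markdown filename.
    parsed = urlparse(url)
    stem = (parsed.netloc + parsed.path).strip("/").replace("/", "-") or "page"
    return stem + ".md"


# Usage with a live client (requires scrapegraph-py and an API key):
#
#   from scrapegraph_py import Client
#   client = Client(api_key="your-api-key-here")
#   url = "https://scrapegraphai.com/"
#   response = client.markdownify(website_url=url)
#   Path(markdown_filename(url)).write_text(response["result"])
```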
New file (+31)

@@ -0,0 +1,31 @@
+from scrapegraph_py import Client
+from scrapegraph_py.logger import sgai_logger
+
+sgai_logger.set_logging(level="INFO")
+
+# Initialize the client
+sgai_client = Client(api_key="your-api-key-here")
+
+# Example HTML content
+html_content = """
+<html>
+    <body>
+        <h1>Company Name</h1>
+        <p>We are a technology company focused on AI solutions.</p>
+        <div class="contact">
+            <p>Email: [email protected]</p>
+            <p>Phone: (555) 123-4567</p>
+        </div>
+    </body>
+</html>
+"""
+
+# LocalScraper request
+response = sgai_client.localscraper(
+    user_prompt="Extract the company description and contact information",
+    website_html=html_content,
+)
+
+# Print the response
+print(f"Request ID: {response['request_id']}")
+print(f"Result: {response['result']}")
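The example prints the two response fields by hand. As a small refactoring sketch, a helper (hypothetical, not part of the SDK) can format any response dict with the same `request_id`/`result` shape; the live call is shown only in comments since it needs an API key.

```python
def format_response(response: dict) -> str:
    # Hypothetical helper (not part of the SDK): renders the request_id and
    # result fields that the example above prints line by line.
    return f"Request ID: {response['request_id']}\nResult: {response['result']}"


# Usage with a live client (requires scrapegraph-py and an API key):
#
#   from scrapegraph_py import Client
#   sgai_client = Client(api_key="your-api-key-here")
#   response = sgai_client.localscraper(
#       user_prompt="Extract the company description",
#       website_html="<html><body><h1>Company Name</h1></body></html>",
#   )
#   print(format_response(response))
```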
