Commit 76e8656

feat: add new doc
1 parent d22ca14 commit 76e8656

3 files changed: +577 -0 lines changed

services/markdownify.mdx

+145
@@ -130,6 +130,151 @@ Want to learn more about our AI-powered scraping technology? Visit our [main web
## Advanced Usage

### Request Helper Function

The `getMarkdownifyRequest` function helps create properly formatted request objects for the Markdownify service:

```javascript
import { getMarkdownifyRequest } from 'scrapegraph-js';

const request = getMarkdownifyRequest({
  websiteUrl: "https://example.com/article",
  headers: {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
  }
});
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| websiteUrl | string | Yes | The URL of the webpage to convert to markdown. |
| headers | object | No | Custom headers for the request (e.g., User-Agent, cookies). |

#### Return Value

Returns an object with the following structure:

```typescript
{
  request_id: string;
  status: "queued" | "processing" | "completed" | "failed";
  website_url: string;
  result?: string | null;
  error: string;
}
```
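One way to consume an object of this shape is to branch on `status` before reading `result`. The sketch below is a minimal illustration that assumes only the field names listed above; `summarizeResponse` and the sample `request_id` are hypothetical and not part of `scrapegraph-js`.

```javascript
// Hypothetical helper, not part of scrapegraph-js: reduce an object of the
// shape documented above to a small summary.
const TERMINAL_STATUSES = new Set(["completed", "failed"]);

function summarizeResponse(response) {
  const done = TERMINAL_STATUSES.has(response.status);
  if (response.status === "completed") {
    // `result` is optional and nullable, so fall back to an empty string.
    return { done, ok: true, markdown: response.result ?? "" };
  }
  if (response.status === "failed") {
    return { done, ok: false, error: response.error || "unknown error" };
  }
  // "queued" and "processing" are not terminal: check again later.
  return { done, ok: false };
}

const summary = summarizeResponse({
  request_id: "req-1", // made-up id for illustration
  status: "completed",
  website_url: "https://example.com/article",
  result: "# Title",
  error: ""
});
console.log(summary.markdown);
```

Treating `queued` and `processing` as "not done" keeps the caller from reading `result` before it exists.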

#### Error Handling

The example below shows how to handle errors from the function, using the `code` property to distinguish common failure cases:

```javascript
try {
  const request = getMarkdownifyRequest({
    websiteUrl: "https://example.com/article"
  });
} catch (error) {
  if (error.code === 'INVALID_URL') {
    console.error('The provided URL is not valid');
  } else if (error.code === 'MISSING_REQUIRED') {
    console.error('Required parameters are missing');
  } else {
    console.error('An unexpected error occurred:', error);
  }
}
```

#### Advanced Examples

##### Using Custom Headers

```javascript
const request = getMarkdownifyRequest({
  websiteUrl: "https://example.com/article",
  headers: {
    "User-Agent": "Custom User Agent",
    "Accept-Language": "en-US,en;q=0.9",
    "Cookie": "session=abc123; user=john",
    "Authorization": "Bearer your-auth-token"
  }
});
```
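When some headers are shared across requests and others vary per call, it can help to merge them before building the request. The following is a sketch only: `mergeHeaders` is a hypothetical helper, not a `scrapegraph-js` function, and it assumes header names should be compared case-insensitively, as HTTP does.

```javascript
// Hypothetical helper: merge default and per-request headers, treating
// header names as case-insensitive.
function mergeHeaders(defaults, overrides) {
  const merged = new Map();
  for (const source of [defaults, overrides]) {
    for (const [name, value] of Object.entries(source || {})) {
      // Key by lower-cased name so "user-agent" replaces "User-Agent";
      // the spelling of the latest entry is the one that is kept.
      merged.set(name.toLowerCase(), [name, value]);
    }
  }
  return Object.fromEntries(merged.values());
}

const headers = mergeHeaders(
  { "User-Agent": "Custom User Agent", "Accept-Language": "en-US,en;q=0.9" },
  { "user-agent": "Another Agent" }
);
console.log(headers);
```

The merged object can then be passed as the `headers` parameter shown above.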

##### Handling Dynamic Content

For pages that load content dynamically, you may need to adjust the request headers:

```javascript
const request = getMarkdownifyRequest({
  websiteUrl: "https://example.com/dynamic-content",
  headers: {
    // Headers to handle dynamic content
    "X-Requested-With": "XMLHttpRequest",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9",
    // Add any required session cookies
    "Cookie": "dynamicContent=enabled; sessionId=xyz789"
  }
});
```

#### Best Practices

1. **URL Validation**
   - Always validate URLs before making requests
   - Ensure URLs use HTTPS when possible
   - Handle URL encoding properly

```javascript
import { getMarkdownifyRequest } from 'scrapegraph-js';
import { isValidUrl } from 'scrapegraph-js/utils';

const url = "https://example.com/article with spaces";
const encodedUrl = encodeURI(url);

if (isValidUrl(encodedUrl)) {
  const request = getMarkdownifyRequest({ websiteUrl: encodedUrl });
}
```
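If you prefer not to depend on a library utility for this check, the standard WHATWG `URL` class (global in Node.js and browsers) can do the same job. This is a minimal sketch; `isHttpUrl` is a hypothetical name and is not claimed to behave identically to the `isValidUrl` import above.

```javascript
// Stand-in URL check built on the standard WHATWG URL class.
// It accepts only absolute http(s) URLs.
function isHttpUrl(candidate) {
  try {
    const { protocol } = new URL(candidate);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // the URL constructor throws a TypeError on unparsable input
  }
}

console.log(isHttpUrl(encodeURI("https://example.com/article with spaces"))); // true
console.log(isHttpUrl("example.com/article")); // false: no scheme, so not an absolute URL
console.log(isHttpUrl("ftp://example.com/file")); // false: parses, but not http(s)
```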

2. **Header Management**
   - Use appropriate User-Agent strings
   - Include necessary cookies for authenticated content
   - Set proper Accept headers

3. **Error Recovery**
   - Implement retry logic for transient failures
   - Cache successful responses when appropriate
   - Log errors for debugging

```javascript
import { getMarkdownifyRequest, retry } from 'scrapegraph-js';

const makeRequest = retry(async () => {
  const request = await getMarkdownifyRequest({
    websiteUrl: "https://example.com/article"
  });
  return request;
}, {
  retries: 3,
  backoff: true
});
```

4. **Performance Optimization**
   - Batch requests when possible
   - Use caching strategies
   - Monitor API usage

```javascript
import { getMarkdownifyRequest } from 'scrapegraph-js';
import { cache } from 'scrapegraph-js/utils';

const cachedRequest = cache(getMarkdownifyRequest, {
  ttl: 3600,   // Cache for 1 hour
  maxSize: 100 // Cache up to 100 requests
});
```
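To make the `ttl` and `maxSize` options concrete, here is a self-contained sketch of a cache wrapper with those two limits. It is an illustration under assumptions, not the implementation behind the `cache` import above: `memoizeWithTtl`, the injectable `now` clock, the JSON-based key, and evicting the oldest entry when full are all choices made for this example.

```javascript
// Hypothetical memoizer with a time-to-live and a size cap. `ttl` is in
// seconds to mirror the option above; `now` is injectable for testing.
function memoizeWithTtl(fn, { ttl, maxSize, now = Date.now }) {
  const entries = new Map(); // key -> { value, expiresAt }
  return function cached(params) {
    const key = JSON.stringify(params);
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;
    entries.delete(key); // drop an expired entry, if any
    if (entries.size >= maxSize) {
      // Map preserves insertion order, so the first key is the oldest.
      entries.delete(entries.keys().next().value);
    }
    const value = fn(params);
    entries.set(key, { value, expiresAt: now() + ttl * 1000 });
    return value;
  };
}

let calls = 0;
let clock = 0;
const buildRequest = (params) => ({ ...params, seq: ++calls }); // stand-in for the real helper
const cachedBuild = memoizeWithTtl(buildRequest, { ttl: 3600, maxSize: 100, now: () => clock });

cachedBuild({ websiteUrl: "https://example.com/a" });
cachedBuild({ websiteUrl: "https://example.com/a" }); // served from the cache
console.log(calls); // 1
clock += 3600 * 1000; // one hour later the entry has expired
cachedBuild({ websiteUrl: "https://example.com/a" });
console.log(calls); // 2
```

Injecting the clock keeps the expiry behavior easy to test without waiting for real time to pass.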

### Async Support

For applications requiring asynchronous execution, Markdownify provides async support through the `AsyncClient`:

services/searchscraper.mdx

+204
@@ -195,6 +195,210 @@ Want to learn more about our AI-powered search technology? Visit our [main websi
## Advanced Usage

### Request Helper Function

The `getSearchScraperRequest` function helps create properly formatted request objects for the SearchScraper service:

```javascript
import { getSearchScraperRequest } from 'scrapegraph-js';

const request = getSearchScraperRequest({
  userPrompt: "What is the latest version of Python and its main features?",
  outputSchema: {
    version: { type: "string" },
    release_date: { type: "string" },
    major_features: { type: "array", items: { type: "string" } }
  }
});
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| userPrompt | string | Yes | The search query or question to answer. |
| headers | object | No | Custom headers for the request. |
| outputSchema | object | No | Schema defining the structure of the search results. |

#### Return Value

Returns an object with the following structure:

```typescript
{
  request_id: string;
  status: "queued" | "processing" | "completed" | "failed";
  user_prompt: string;
  result?: object | null;
  reference_urls: string[];
  error: string;
}
```

#### Error Handling

The example below shows how to handle errors from the function, using the `code` property to distinguish prompt, schema, and missing-parameter failures:

```javascript
try {
  const request = getSearchScraperRequest({
    userPrompt: "What are the latest AI chip developments?",
    outputSchema: {
      manufacturers: { type: "array" },
      technologies: { type: "object" }
    }
  });
} catch (error) {
  if (error.code === 'INVALID_PROMPT') {
    console.error('The search prompt is invalid or empty');
  } else if (error.code === 'SCHEMA_VALIDATION') {
    console.error('The output schema is invalid:', error.details);
  } else if (error.code === 'MISSING_REQUIRED') {
    console.error('Required parameters are missing');
  } else {
    console.error('An unexpected error occurred:', error);
  }
}
```

#### Advanced Examples

##### Complex Search Queries

```javascript
const request = getSearchScraperRequest({
  userPrompt: "Compare the top 3 cloud providers (AWS, Azure, GCP) focusing on ML services pricing and features",
  outputSchema: {
    providers: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          ml_services: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                pricing: { type: "string" },
                features: { type: "array", items: { type: "string" } }
              }
            }
          }
        }
      }
    },
    comparison_matrix: { type: "object" },
    recommendation: { type: "string" }
  }
});
```

##### Time-Sensitive Searches

```javascript
const request = getSearchScraperRequest({
  userPrompt: "Latest cryptocurrency market trends in the past 24 hours",
  headers: {
    // Headers for real-time data sources
    "Cache-Control": "no-cache",
    "Pragma": "no-cache"
  },
  outputSchema: {
    timestamp: { type: "string" },
    trends: { type: "array" },
    market_summary: { type: "object" }
  }
});
```

#### Best Practices

1. **Query Optimization**
   - Be specific and clear in your prompts
   - Include relevant context
   - Use appropriate keywords

```javascript
// Good prompt example
const request = getSearchScraperRequest({
  userPrompt: "Compare iPhone 15 Pro Max and Samsung S24 Ultra specifications, focusing on camera capabilities, battery life, and performance benchmarks"
});

// Less effective prompt
const badRequest = getSearchScraperRequest({
  userPrompt: "Compare phones" // Too vague
});
```

2. **Schema Design**
   - Start with essential fields
   - Use appropriate data types
   - Include field descriptions
   - Handle nested data properly

```javascript
const schema = {
  comparison: {
    type: "object",
    properties: {
      date: { type: "string", description: "Comparison date" },
      devices: {
        type: "array",
        items: {
          type: "object",
          properties: {
            name: { type: "string" },
            specs: { type: "object" },
            pros: { type: "array" },
            cons: { type: "array" }
          }
        }
      }
    }
  }
};
```
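A nested schema like this is easier to trust if returned data is checked against it before use. The sketch below is a deliberately small checker for the `type` / `properties` / `items` notation used in these examples. `matchesSchema` is a hypothetical helper, it is not a full JSON Schema validator, and it is not claimed to match how the service itself validates results.

```javascript
// Hypothetical checker for the simplified schema notation used above
// ({ type, properties, items }); it is not a full JSON Schema validator.
function matchesSchema(value, schema) {
  switch (schema.type) {
    case "string":
      return typeof value === "string";
    case "array":
      return Array.isArray(value) &&
        (!schema.items || value.every((item) => matchesSchema(item, schema.items)));
    case "object":
      if (value === null || typeof value !== "object" || Array.isArray(value)) return false;
      // Every declared property must be present and match its own schema.
      return Object.entries(schema.properties || {}).every(
        ([key, sub]) => key in value && matchesSchema(value[key], sub)
      );
    default:
      return false; // types outside this sketch are rejected
  }
}

const deviceSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    pros: { type: "array", items: { type: "string" } }
  }
};

console.log(matchesSchema({ name: "Phone A", pros: ["battery"] }, deviceSchema)); // true
console.log(matchesSchema({ name: "Phone A", pros: [42] }, deviceSchema)); // false
```

Declaring `items` on array fields, as `deviceSchema` does, lets a check like this catch wrongly typed elements that a bare `{ type: "array" }` would accept.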

3. **Error Recovery**
   - Implement retry logic
   - Handle rate limits
   - Cache results when appropriate

```javascript
import { getSearchScraperRequest, retry } from 'scrapegraph-js';

const searchWithRetry = retry(async (prompt) => {
  const request = await getSearchScraperRequest({
    userPrompt: prompt
  });
  return request;
}, {
  retries: 3,
  backoff: {
    initial: 1000,
    multiplier: 2,
    maxDelay: 10000
  }
});
```
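The `backoff` options above describe an exponential schedule. As a worked example, the sketch below computes the wait before each retry, assuming the values are milliseconds and that each delay is `initial * multiplier^attempt` capped at `maxDelay`. The source does not state how the library applies these options, so `backoffDelays` is a hypothetical illustration of the arithmetic only.

```javascript
// Delay before each retry for an exponential backoff policy, capped at maxDelay.
// Assumption: values are milliseconds; the library's own behavior may differ.
function backoffDelays(retries, { initial, multiplier, maxDelay }) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt += 1) {
    delays.push(Math.min(initial * multiplier ** attempt, maxDelay));
  }
  return delays;
}

console.log(backoffDelays(3, { initial: 1000, multiplier: 2, maxDelay: 10000 }));
// [ 1000, 2000, 4000 ]
console.log(backoffDelays(6, { initial: 1000, multiplier: 2, maxDelay: 10000 }));
// [ 1000, 2000, 4000, 8000, 10000, 10000 ]
```

Under these assumptions, three retries never reach the cap; `maxDelay` only matters from the fifth retry onward.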

4. **Performance Optimization**
   - Use caching for repeated searches
   - Batch related queries
   - Monitor API usage

```javascript
import { getSearchScraperRequest } from 'scrapegraph-js';
import { cache } from 'scrapegraph-js/utils';

const cachedSearch = cache(getSearchScraperRequest, {
  ttl: 1800,   // Cache for 30 minutes
  maxSize: 50, // Cache up to 50 requests
  keyGenerator: (params) => params.userPrompt // Cache key based on prompt
});
```
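A key built from `userPrompt` alone would map two calls with the same prompt but different `outputSchema` to the same cache entry. One possible alternative, sketched below, folds the schema into the key and normalizes whitespace and case in the prompt. `cacheKey` is a hypothetical function, and whether case-insensitive matching is appropriate depends on your prompts.

```javascript
// Hypothetical key generator: combine a normalized prompt with the schema.
function cacheKey({ userPrompt, outputSchema }) {
  // Collapse runs of whitespace and ignore case so trivially different
  // spellings of the same prompt share one entry.
  const prompt = userPrompt.trim().replace(/\s+/g, " ").toLowerCase();
  // JSON.stringify is order-sensitive: the same fields in a different
  // order produce a different key.
  return `${prompt}::${JSON.stringify(outputSchema ?? null)}`;
}

const withoutSchema = cacheKey({ userPrompt: "  Compare   Phones " });
const withSchema = cacheKey({
  userPrompt: "compare phones",
  outputSchema: { name: { type: "string" } }
});
console.log(withoutSchema); // compare phones::null
console.log(withoutSchema === withSchema); // false
```

A function like this could be passed as the `keyGenerator` option shown above, assuming that option receives the same `params` object.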

### Custom Schema Example

Define exactly what data you want to extract using Pydantic or Zod:
