| 1 | +--- |
| 2 | +title: 'Headers & Cookies' |
| 3 | +description: 'Customize request headers and cookies for web scraping' |
| 4 | +icon: 'gear' |
| 5 | +--- |
| 6 | + |
| 7 | +<Frame> |
| 8 | + <img src="/services/images/headers-banner.png" alt="Headers Configuration" /> |
| 9 | +</Frame> |
| 10 | + |
| 11 | +## Overview |
| 12 | + |
| 13 | +All our services (SmartScraper, SearchScraper, and Markdownify) support custom headers and cookies to help you: |
| 14 | +- Bypass basic anti-bot protections |
| 15 | +- Access authenticated content |
| 16 | +- Maintain sessions |
| 17 | +- Customize request behavior |
| 18 | + |
| 19 | +## Headers |
| 20 | + |
| 21 | +### Common Headers |
| 22 | + |
| 23 | +You can set any of the following headers in your requests: |
| 24 | + |
| 25 | +```json |
| 26 | +{ |
| 27 | + "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36", // Browser identification |
| 28 | + "Accept": "*/*", // Accepted content types |
| 29 | + "Accept-Encoding": "gzip, deflate, br", // Supported encodings |
| 30 | + "Accept-Language": "en-US,en;q=0.9", // Preferred languages |
| 31 | + "Cache-Control": "no-cache", // Caching behavior
| 32 | + "Sec-Ch-Ua": "\"Google Chrome\";v=\"107\", \"Chromium\";v=\"107\"", // Browser details |
| 33 | + "Sec-Ch-Ua-Mobile": "?0", // Mobile browser flag |
| 34 | + "Sec-Ch-Ua-Platform": "\"Windows\"", // Operating system (should match the User-Agent)
| 35 | + "Sec-Fetch-Dest": "document", // Request destination |
| 36 | + "Sec-Fetch-Mode": "navigate", // Request mode |
| 37 | + "Sec-Fetch-Site": "none", // Request origin |
| 38 | + "Sec-Fetch-User": "?1", // User-initiated flag |
| 39 | + "Upgrade-Insecure-Requests": "1" // HTTPS upgrade |
| 40 | +} |
| 41 | +``` |
| 42 | + |
| 43 | +### Usage Examples |
| 44 | + |
| 45 | +<CodeGroup> |
| 46 | + |
| 47 | +```python Python |
| 48 | +from scrapegraph_py import Client |
| 49 | + |
| 50 | +client = Client(api_key="your-api-key") |
| 51 | + |
| 52 | +# Define custom headers |
| 53 | +headers = { |
| 54 | + "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36", |
| 55 | + "Accept-Language": "en-US,en;q=0.9", |
| 56 | + "Sec-Ch-Ua-Platform": "\"Windows\"" |
| 57 | +} |
| 58 | + |
| 59 | +# Use with SmartScraper |
| 60 | +response = client.smartscraper( |
| 61 | + website_url="https://example.com", |
| 62 | + user_prompt="Extract the main content", |
| 63 | + headers=headers |
| 64 | +) |
| 65 | + |
| 66 | +# Use with SearchScraper |
| 67 | +response = client.searchscraper( |
| 68 | + user_prompt="Find information about...", |
| 69 | + headers=headers |
| 70 | +) |
| 71 | + |
| 72 | +# Use with Markdownify |
| 73 | +response = client.markdownify( |
| 74 | + website_url="https://example.com", |
| 75 | + headers=headers |
| 76 | +) |
| 77 | +``` |
| 78 | + |
| 79 | +```typescript TypeScript |
| 80 | +import { Client } from '@scrapegraph/sdk'; |
| 81 | + |
| 82 | +const client = new Client('your-api-key'); |
| 83 | + |
| 84 | +// Define custom headers |
| 85 | +const headers = { |
| 86 | + 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36', |
| 87 | + 'Accept-Language': 'en-US,en;q=0.9', |
| 88 | + 'Sec-Ch-Ua-Platform': '"Windows"' |
| 89 | +}; |
| 90 | + |
| 91 | +// Use with SmartScraper |
| 92 | +const response = await client.smartscraper({ |
| 93 | + websiteUrl: 'https://example.com', |
| 94 | + userPrompt: 'Extract the main content', |
| 95 | + headers: headers |
| 96 | +}); |
| 97 | +``` |
| 98 | + |
| 99 | +</CodeGroup> |
| 100 | + |
| 101 | +## Cookies |
| 102 | + |
| 103 | +### Overview |
| 104 | + |
| 105 | +Cookies are essential for: |
| 106 | +- Accessing authenticated content |
| 107 | +- Maintaining user sessions |
| 108 | +- Handling website preferences |
| 109 | +- Bypassing certain security measures |
| 110 | + |
| 111 | +### Setting Cookies |
| 112 | + |
| 113 | +Set cookies through the `Cookie` header as a single string of `name=value` pairs separated by `; `:
| 114 | + |
| 115 | +```python |
| 116 | +headers = { |
| 117 | + "Cookie": "session_id=abc123; user_id=12345; theme=dark" |
| 118 | +} |
| 119 | +``` |
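If your cookies live in a dictionary, a small helper can build the header string for you. The following is a minimal sketch; `build_cookie_header` is a hypothetical helper name and is not part of the SDK.

```python
def build_cookie_header(cookies: dict[str, str]) -> str:
    """Join name/value pairs into a single Cookie header value.

    Hypothetical helper for illustration; it is not part of scrapegraph_py.
    """
    for name, value in cookies.items():
        # ';' and '=' in a name, or ';' in a value, would corrupt the header.
        if ";" in name or "=" in name or ";" in value:
            raise ValueError(f"Invalid character in cookie {name!r}")
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


headers = {
    "Cookie": build_cookie_header(
        {"session_id": "abc123", "user_id": "12345", "theme": "dark"}
    )
}
print(headers["Cookie"])  # session_id=abc123; user_id=12345; theme=dark
```

The resulting `headers` dictionary can be passed to any of the calls shown in the examples on this page.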
| 120 | + |
| 121 | +### Examples |
| 122 | + |
| 123 | +<CodeGroup> |
| 124 | + |
| 125 | +```python Python |
| 126 | +# Example with session cookies |
| 127 | +headers = { |
| 128 | + "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36", |
| 129 | + "Cookie": "session_id=abc123; user_id=12345; theme=dark" |
| 130 | +} |
| 131 | + |
| 132 | +response = client.smartscraper( |
| 133 | + website_url="https://example.com/dashboard", |
| 134 | + user_prompt="Extract user information", |
| 135 | + headers=headers |
| 136 | +) |
| 137 | +``` |
| 138 | + |
| 139 | +```typescript TypeScript |
| 140 | +// Example with session cookies |
| 141 | +const headers = { |
| 142 | + 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36', |
| 143 | + 'Cookie': 'session_id=abc123; user_id=12345; theme=dark' |
| 144 | +}; |
| 145 | + |
| 146 | +const response = await client.smartscraper({ |
| 147 | + websiteUrl: 'https://example.com/dashboard', |
| 148 | + userPrompt: 'Extract user information', |
| 149 | + headers: headers |
| 150 | +}); |
| 151 | +``` |
| 152 | + |
| 153 | +</CodeGroup> |
| 154 | + |
| 155 | +### Common Use Cases |
| 156 | + |
| 157 | +1. **Authentication** |
| 158 | +```python |
| 159 | +headers = { |
| 160 | + "Cookie": "auth_token=xyz789; session_id=abc123" |
| 161 | +} |
| 162 | +``` |
| 163 | + |
| 164 | +2. **Regional Settings** |
| 165 | +```python |
| 166 | +headers = { |
| 167 | + "Cookie": "country=US; language=en; currency=USD" |
| 168 | +} |
| 169 | +``` |
| 170 | + |
| 171 | +3. **User Preferences** |
| 172 | +```python |
| 173 | +headers = { |
| 174 | + "Cookie": "theme=dark; notifications=enabled" |
| 175 | +} |
| 176 | +``` |
| 177 | + |
| 178 | +## Best Practices |
| 179 | + |
| 180 | +1. **User Agent Best Practices** |
| 181 | + - Use recent browser versions |
| 182 | + - Match User-Agent with Sec-Ch-Ua headers |
| 183 | + - Consider region-specific variations |
| 184 | + |
| 185 | +2. **Cookie Management** |
| 186 | + - Keep cookies up to date |
| 187 | + - Include all required session cookies |
| 188 | + - Remove unnecessary cookies |
| 189 | + - Handle cookie expiration |
| 190 | + |
| 191 | +3. **Security Considerations** |
| 192 | + - Don't share sensitive cookies |
| 193 | + - Rotate User-Agents when appropriate |
| 194 | + - Use HTTPS when sending sensitive data |
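As a rough illustration of matching the User-Agent with the client hint headers, the sketch below checks that `Sec-Ch-Ua-Platform` agrees with the platform token in `User-Agent`. The function name and the token table are assumptions made for this example, and the check is a simple heuristic rather than a complete validator.

```python
# Hypothetical mapping from Sec-Ch-Ua-Platform values to the token that
# desktop browsers usually place in the User-Agent string.
PLATFORM_TOKENS = {
    "Windows": "Windows NT",
    "macOS": "Macintosh",
    "Linux": "Linux",
}


def platform_matches(headers: dict[str, str]) -> bool:
    """Return False when Sec-Ch-Ua-Platform contradicts the User-Agent."""
    user_agent = headers.get("User-Agent", "")
    # The header value is a quoted string, e.g. '"Windows"'.
    platform = headers.get("Sec-Ch-Ua-Platform", "").strip('"')
    token = PLATFORM_TOKENS.get(platform)
    if token is None:
        # Header missing or platform unknown to this sketch: nothing to check.
        return True
    return token in user_agent


headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Sec-Ch-Ua-Platform": '"macOS"',
}
if not platform_matches(headers):
    print("Sec-Ch-Ua-Platform does not match the User-Agent")
```

Running a check like this before sending a request catches the most common mismatch, a Windows User-Agent paired with a macOS platform hint.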
| 195 | + |
| 196 | +## Common Issues |
| 197 | + |
| 198 | +<Accordion title="Cookie Expiration" icon="clock"> |
| 199 | +Cookies may expire during scraping. Solutions: |
| 200 | +- Implement cookie refresh logic |
| 201 | +- Monitor session status |
| 202 | +- Handle re-authentication |
| 203 | +</Accordion> |
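One way to approach cookie refresh logic is to track expiry times on your side and re-authenticate when a required cookie is gone. The sketch below assumes you store each cookie as a `(value, expires_at)` pair with `expires_at` as a Unix timestamp or `None`; that storage format and both function names are hypothetical.

```python
import time
from typing import Optional


def drop_expired(
    cookies: dict[str, tuple[str, Optional[float]]],
    now: Optional[float] = None,
) -> dict[str, str]:
    """Return only the cookies that have not expired yet.

    `cookies` maps name -> (value, expires_at); expires_at is a Unix
    timestamp, or None for cookies without a known expiry.
    """
    now = time.time() if now is None else now
    return {
        name: value
        for name, (value, expires_at) in cookies.items()
        if expires_at is None or expires_at > now
    }


def missing_required(live: dict[str, str], required: list[str]) -> list[str]:
    """List the required cookie names that are no longer available."""
    return [name for name in required if name not in live]


stored = {
    "session_id": ("abc123", 900.0),
    "auth_token": ("xyz789", 2000.0),
    "theme": ("dark", None),
}
live = drop_expired(stored, now=1000.0)
if missing_required(live, ["session_id", "auth_token"]):
    print("Session expired: re-authenticate before the next request")
```

How you obtain fresh cookies depends on the target site, so the re-authentication step itself is left out of this sketch.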
| 204 | + |
| 205 | +<Accordion title="Header Conflicts" icon="exclamation-triangle"> |
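| 206 | +Some headers can contradict each other or be sent twice. Common fixes:
| 207 | +- Remove duplicate or contradictory headers
| 208 | +- Ensure related header values agree (for example, `User-Agent` and `Sec-Ch-Ua-Platform`)
| 209 | +- Use one consistent casing per header; HTTP header names are case-insensitive, so `user-agent` and `User-Agent` are the same header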
| 210 | +</Accordion> |
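Because HTTP header names are case-insensitive while Python dictionary keys are not, the same header can end up in a dictionary twice. The following sketch shows one way to detect that; `find_duplicate_headers` is a hypothetical helper, not an SDK function.

```python
def find_duplicate_headers(headers: dict[str, str]) -> dict[str, list[str]]:
    """Group header names that differ only by letter case.

    Returns a mapping of lowercased name -> list of the spellings found,
    restricted to names that occur more than once.
    """
    seen: dict[str, list[str]] = {}
    for name in headers:
        seen.setdefault(name.lower(), []).append(name)
    return {lower: names for lower, names in seen.items() if len(names) > 1}


headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "user-agent": "my-scraper/1.0",
    "Accept-Language": "en-US,en;q=0.9",
}
print(find_duplicate_headers(headers))
# {'user-agent': ['User-Agent', 'user-agent']}
```

If the helper reports a duplicate, keep a single spelling and remove the other entry before sending the request.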
| 211 | + |
| 212 | +## Support |
| 213 | + |
| 214 | +<CardGroup cols={2}> |
| 215 | + <Card title="Documentation" icon="book" href="/introduction"> |
| 216 | + Comprehensive guides and tutorials |
| 217 | + </Card> |
| 218 | + <Card title="API Reference" icon="code" href="/api-reference/introduction"> |
| 219 | + Detailed API documentation |
| 220 | + </Card> |
| 221 | + <Card title="Community" icon="discord" href="https://discord.gg/uJN7TYcpNa"> |
| 222 | + Join our Discord community |
| 223 | + </Card> |
| 224 | + <Card title="GitHub" icon="github" href="https://github.com/ScrapeGraphAI"> |
| 225 | + Check out our open-source projects |
| 226 | + </Card> |
| 227 | +</CardGroup> |
| 228 | + |
| 229 | +<Card title="Need Help?" icon="question" href="mailto:[email protected]">
| 230 | + Contact our support team for assistance with headers, cookies, or any other questions! |
| 231 | +</Card> |