A FastAPI wrapper around crawl4ai for extracting markdown content from web pages.
- Install dependencies:
  pip install -r requirements.txt
- Start the server:
  python api.py
The server will start on http://localhost:8000
Once the server is running, visit http://localhost:8000/docs for interactive API documentation.
POST /crawl: Crawl a URL and return fitted markdown content.
Request Body:
{
  "url": "https://iq.linkedin.com/in/mazyarf",
  "profile_name": "profile_1759825962",
  "headless": false,
  "delay_before_return_html": 5.0,
  "threshold": 0.4
}
Response:
{
  "success": true,
  "url": "https://iq.linkedin.com/in/mazyarf",
  "raw_markdown_length": 15234,
  "fit_markdown_length": 8432,
  "fit_markdown": "# LinkedIn Profile\n\nMazyar Farhad...",
  "raw_markdown": null,
  "error_message": null
}
GET /crawl: Simple GET endpoint for testing.
Example:
curl "http://localhost:8000/crawl?url=https://iq.linkedin.com/in/mazyarf"
POST request with curl:
curl -X POST 'http://localhost:8000/crawl' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://iq.linkedin.com/in/mazyarf"}'
GET request with curl:
curl 'http://localhost:8000/crawl?url=https://iq.linkedin.com/in/mazyarf'
Python client:
import requests

# Simple function to get fitted markdown
def get_fitted_markdown(url):
    # POST the URL to the crawl endpoint; other parameters keep their defaults
    response = requests.post(
        'http://localhost:8000/crawl',
        json={'url': url}
    )
    if response.status_code == 200:
        result = response.json()
        if result['success']:
            # Prefer fitted markdown, fall back to raw markdown (may be null)
            return result.get('fit_markdown') or result.get('raw_markdown') or ''
    return None

# Usage
markdown = get_fitted_markdown("https://iq.linkedin.com/in/mazyarf")
print(markdown)
Run the test script:
python test_api.py
Request parameters:
- url: The URL to crawl (required)
- profile_name: Browser profile to use (default: "profile_1759825962")
- headless: Run browser in headless mode (default: false)
- delay_before_return_html: Delay before returning HTML in seconds (default: 5.0)
- threshold: Content filtering threshold (default: 0.4)
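The parameters above map directly onto the JSON request body. A minimal sketch, using only the standard library, of building that body with the documented defaults. The `CrawlRequest` class and its `to_payload` method are hypothetical names chosen for illustration and are not part of api.py:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CrawlRequest:
    """Hypothetical client-side mirror of the documented request fields."""
    url: str                                   # required
    profile_name: str = "profile_1759825962"   # documented default
    headless: bool = False                     # documented default
    delay_before_return_html: float = 5.0      # seconds
    threshold: float = 0.4                     # content filtering threshold

    def to_payload(self) -> str:
        """Serialize the request as a JSON string suitable for a POST body."""
        return json.dumps(asdict(self))


if __name__ == "__main__":
    # Only url is required; every other field falls back to its default.
    req = CrawlRequest(url="https://example.com", headless=True)
    print(req.to_payload())
```

Keeping the defaults in one place on the client side makes it easy to see which values a request overrides.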
Install the crwl CLI first.
Important: You need to create a browser profile first using the crwl CLI before using the API:
crwl profiles
This will create a new browser profile that can be used for crawling. The default profile name is "profile_1759825962". You can change this in the API request.
To list existing profiles:
crwl profiles
The API returns a JSON object with the following fields:
- success: Boolean indicating if the crawl was successful
- url: The crawled URL
- raw_markdown_length: Length of the raw markdown content
- fit_markdown_length: Length of the fitted markdown content (if available)
- fit_markdown: The fitted markdown content (preferred)
- raw_markdown: The raw markdown content (fallback)
- error_message: Error message if the crawl failed
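The fields above can be read into a small typed container on the client side. A minimal sketch, assuming the response body has already been decoded from JSON into a dict; `CrawlResult`, `best_markdown`, and `parse_result` are hypothetical names for illustration, not part of the API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlResult:
    """Hypothetical container for the documented response fields."""
    success: bool
    url: str
    raw_markdown_length: int = 0
    fit_markdown_length: Optional[int] = None
    fit_markdown: Optional[str] = None
    raw_markdown: Optional[str] = None
    error_message: Optional[str] = None

    def best_markdown(self) -> Optional[str]:
        """Prefer fitted markdown, fall back to raw; None if the crawl failed."""
        if not self.success:
            return None
        return self.fit_markdown or self.raw_markdown


def parse_result(data: dict) -> CrawlResult:
    """Build a CrawlResult from a decoded JSON body, ignoring unknown keys."""
    known = CrawlResult.__dataclass_fields__
    return CrawlResult(**{k: v for k, v in data.items() if k in known})
```

Calling `parse_result(response.json()).best_markdown()` then applies the preferred/fallback order described in the field list.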
The API will return appropriate HTTP status codes:
- 200: Success
- 400: Bad request (invalid URL, profile not found, crawl failed)
- 422: Validation error (invalid request format)
- 500: Internal server error
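A client can turn these status codes into explicit errors. A minimal sketch, assuming error bodies are JSON objects that carry either a `detail` key (FastAPI's default for HTTP exceptions) or the documented `error_message` field; which of these api.py actually sends is an assumption, and `CrawlError` and `check_response` are hypothetical names:

```python
# Meanings taken from the status code list above.
STATUS_MEANINGS = {
    400: "bad request (invalid URL, profile not found, or crawl failed)",
    422: "validation error (invalid request format)",
    500: "internal server error",
}


class CrawlError(Exception):
    """Hypothetical exception raised for any non-200 response."""


def check_response(status_code: int, body: dict) -> dict:
    """Return the body on 200; otherwise raise CrawlError with context."""
    if status_code == 200:
        return body
    meaning = STATUS_MEANINGS.get(status_code, "unexpected status")
    detail = None
    if isinstance(body, dict):
        # Assumed error-body keys; adjust to whatever the server really returns.
        detail = body.get("detail") or body.get("error_message")
    message = f"HTTP {status_code}: {meaning}"
    if detail:
        message += f" ({detail})"
    raise CrawlError(message)
```

With `requests`, this would be called as `check_response(response.status_code, response.json())`.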
- api.py: Main FastAPI application
- crawler.py: Original crawler script
- test_api.py: Test script for the API
- requirements.txt: Python dependencies
- README.md: This file