Skip to main content

Internal Service: scrape_website

Scrapes all pages from the source website using Firecrawl.

Function Signature

async def scrape_website(
    url: str,
    is_root: bool = True,
    max_pages: int = None
) -> dict

Parameters

ParameterTypeDefaultDescription
urlstr-Source website URL
is_rootboolTrueWhether this is a root domain scrape
max_pagesint5000Maximum pages to scrape

Returns

{
    "status": "success",
    "pages": [
        {
            "url": "https://example.com/page",
            "content": "...",
            "title": "Page Title",
            ...
        }
    ],
    "error": None  # or error message
}

Code Location

src/app/shared/scraping/service.py