Internal Service: scrape_ai_site
Fetches the `llms.txt` file from our deployed AI-optimized site. For product-specific boosted pages, it also fetches the dedicated product llms file.
Function Signature
Parameters
| Parameter | Type | Description |
|---|---|---|
| `deployment_url` | `str` | The Vercel deployment URL of the AI site |
| `product_name` | `str` | Optional. Product name for product-specific pages |
Returns
Scalable Product LLMs Architecture
For sites with many products (5,000 or more), we use a scalable file structure:

| File | Path | Purpose |
|---|---|---|
| Root llms.txt | /llms.txt | Business overview, key details, FAQs |
| Product llms | /llms/{product-slug}.txt | Detailed product info, features, pricing |
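The page does not say how `{product-slug}` is derived from `product_name`. The sketch below assumes a common slug rule (ASCII-fold, lowercase, collapse anything that is not a letter or digit into a single hyphen) purely for illustration; `product_slug` and `llms_paths` are hypothetical helpers, not the service's actual code.

```python
import re
import unicodedata
from typing import List, Optional


def product_slug(product_name: str) -> str:
    """Hypothetical slug rule: ASCII-fold, lowercase, collapse non-alphanumerics to '-'."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent) so the
    # ASCII encode below drops the accent and keeps the base letter.
    folded = unicodedata.normalize("NFKD", product_name)
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters outside [a-z0-9] with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    return slug.strip("-")


def llms_paths(product_name: Optional[str] = None) -> List[str]:
    """Paths to fetch: always the root file, plus the product file when a product is given."""
    paths = ["/llms.txt"]
    if product_name:
        paths.append(f"/llms/{product_slug(product_name)}.txt")
    return paths
```

Under this assumed rule, `llms_paths("Café Crème!")` yields `["/llms.txt", "/llms/cafe-creme.txt"]`. Whatever rule the site generator actually uses, the fetcher must apply the same one, or product lookups will miss and hit the fallback.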
For Business Boosted Pages:
- Fetches only `/llms.txt`
- Uses general business context
For Product Boosted Pages:
- Fetches `/llms.txt` (business context)
- Also fetches `/llms/{product-slug}.txt` (product details)
- Falls back to the root `llms.txt` if the product file doesn't exist
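The two page types differ only in whether a product file is requested and what happens when it is missing. A minimal sketch of that selection and fallback logic, assuming a hypothetical `fetch` callable that returns the file body or `None` when the file does not exist; `gather_page_context` and the returned dict keys are illustrative names, and "fall back" is interpreted here as reusing the root content in the product slot.

```python
from typing import Callable, Dict, Optional

# A fetcher takes a site-relative path and returns the file body,
# or None when the file does not exist (e.g. HTTP 404).
Fetcher = Callable[[str], Optional[str]]


def gather_page_context(
    fetch: Fetcher, product_path: Optional[str] = None
) -> Dict[str, Optional[str]]:
    """Collect llms content for a business page (no product_path) or a product page."""
    business = fetch("/llms.txt")
    if business is None:
        raise RuntimeError("root /llms.txt is missing; cannot build business context")

    context: Dict[str, Optional[str]] = {"business": business, "product": None}
    if product_path is not None:
        product = fetch(product_path)
        # Product page: use the product file if present, otherwise fall back to the root file.
        context["product"] = product if product is not None else business
    return context
```

Passing a plain dict's `get` method as the fetcher (for example `{"/llms.txt": "...", "/llms/widget.txt": "..."}.get`) is enough to exercise both page types without a network.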
Behavior
- Strips the trailing slash from the deployment URL
- Adds the Vercel bypass protection header if `VERCEL_BYPASS_PROTECTION_SECRET` is set
- Fetches `/llms.txt` with up to 5 retries (8-second delay between retries)
- If `product_name` is provided, also fetches `/llms/{product-slug}.txt`
- Returns the content for use in Step 3
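The root-file fetch described above can be sketched with the standard library as follows. Several details are assumptions, not taken from this page: the header name `x-vercel-protection-bypass` (the one Vercel documents for Protection Bypass for Automation), reading "up to 5 retries" as five attempts in total, the 30-second request timeout, and the hypothetical `fetch_root_llms` function with injectable `opener` and `sleep` hooks so the retry loop can be tested offline. The product-file fetch is left out.

```python
import os
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

MAX_ATTEMPTS = 5          # assumption: "up to 5 retries" read as 5 attempts in total
RETRY_DELAY_SECONDS = 8   # wait between attempts

# Transport hook so the retry logic can be exercised without a network.
Opener = Callable[[urllib.request.Request], bytes]


def _urlopen_bytes(request: urllib.request.Request) -> bytes:
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def _request_headers() -> dict:
    headers = {}
    secret = os.environ.get("VERCEL_BYPASS_PROTECTION_SECRET")
    if secret:
        # Assumed header name, taken from Vercel's Protection Bypass for Automation docs.
        headers["x-vercel-protection-bypass"] = secret
    return headers


def fetch_root_llms(
    deployment_url: str,
    opener: Opener = _urlopen_bytes,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch <deployment_url>/llms.txt, retrying failed attempts."""
    url = deployment_url.rstrip("/") + "/llms.txt"
    headers = _request_headers()
    last_error: Optional[BaseException] = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return opener(urllib.request.Request(url, headers=headers)).decode("utf-8")
        except (urllib.error.URLError, TimeoutError) as exc:
            # URLError also covers HTTPError, so HTTP errors (including 404) are retried too.
            last_error = exc
            if attempt < MAX_ATTEMPTS:
                sleep(RETRY_DELAY_SECONDS)
    raise RuntimeError(f"could not fetch {url} after {MAX_ATTEMPTS} attempts") from last_error
```

With five attempts and an 8-second delay, a deployment that never responds costs four sleeps (32 seconds) plus request time before the error surfaces.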
Why This Architecture?
| Approach | Root llms.txt Size | Notes |
|---|---|---|
| Everything in root | Grows with product count | 5,000 products = massive file |
| Separate files | Stays small (~3KB) | Each product file is ~1-2KB |
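- Constant memory - Each boosted page loads ~5KB total (root file plus one product file), regardless of catalog size
- No token limits - The per-page context stays far below Gemini's context window
- Fast - Small files fetch quickly
- Independent - One product's file can be updated without touching the others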
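A back-of-envelope check of the size comparison, assuming the approximate figures from the table (root ~3KB, each product ~1-2KB) hold at 5,000 products. The constants and function names are illustrative, not measured values.

```python
# Assumed sizes, taken from the approximate figures in the table above.
ROOT_KB = 3.0           # business overview, FAQs, key details
PRODUCT_KB_LOW = 1.0    # smaller product file
PRODUCT_KB_HIGH = 2.0   # larger product file


def single_file_kb(n_products: int, per_product_kb: float) -> float:
    """Everything-in-root: the one file grows linearly with the catalog."""
    return ROOT_KB + n_products * per_product_kb


def per_page_kb(per_product_kb: float) -> float:
    """Separate files: a product page loads the root file plus its own product file."""
    return ROOT_KB + per_product_kb


if __name__ == "__main__":
    for kb in (PRODUCT_KB_LOW, PRODUCT_KB_HIGH):
        print(f"{kb:.0f}KB/product: single file {single_file_kb(5000, kb):,.0f}KB, "
              f"per page {per_page_kb(kb):.0f}KB")
```

At 5,000 products the single file would be roughly 5,003-10,003KB (about 5-10MB), while each boosted page under the split layout loads about 4-5KB, consistent with the ~5KB figure above.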
Environment Variables
| Variable | Description |
|---|---|
| `VERCEL_BYPASS_PROTECTION_SECRET` | Optional. Bypasses Vercel deployment protection |