Internal Service — This is not an HTTP endpoint. It’s called directly by the generate-all orchestrator.
Purpose
Creates an AI-optimized website with llms.txt, robots.txt, sitemap.xml, structured data, and markdown replica pages. Deploys to Vercel and assigns a *.searchcompany.dev subdomain.
Runs in GROUP 2a (parallel with 2b and 2c after GROUP 1a + 1b + 1d complete).
Function Signature (Onboarding)
```python
async def create_ai_website_from_business_info(
    url: str,
    business_id: str,
    business_info: dict,
    pages: List[dict]
) -> dict
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | `str` | required | The business website URL |
| `business_id` | `str` | required | The Clerk organization slug |
| `business_info` | `dict` | required | Business info from the Firecrawl agent (GROUP 1a) |
| `pages` | `List[dict]` | required | Scraped pages from GROUP 1b (for replicas ONLY) |
Key Change: During onboarding, business_info is used for LLM content generation (llms.txt, Q&A, data.json). Scraped pages are ONLY used for markdown replica generation.
Returns
```json
{
  "status": "success",
  "ai_site_url": "https://my-business-abc123.searchcompany.dev",
  "entity_id": "uuid-...",
  "pages_hashed": 42,
  "qa_slugs": ["what-is-business-name", "how-does-business-work"],
  "replica_paths": ["/about", "/pricing", "/contact"]
}
```
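Since this is an internal service rather than an HTTP endpoint, the orchestrator awaits it directly and inspects the returned dict. A minimal sketch of that calling convention follows; the stub below only mirrors the documented return shape (the real implementation lives in `src/app/shared/ai_website/service.py`), and `run_group_2a` is a hypothetical orchestrator step.

```python
import asyncio

# Hypothetical stand-in that mirrors the documented return shape; the real
# implementation lives in src/app/shared/ai_website/service.py.
async def create_ai_website_from_business_info(
    url: str, business_id: str, business_info: dict, pages: list[dict]
) -> dict:
    return {
        "status": "success",
        "ai_site_url": f"https://{business_id}.searchcompany.dev",
        "pages_hashed": len(pages),
    }

async def run_group_2a(url: str, business_id: str,
                       business_info: dict, pages: list[dict]) -> dict:
    result = await create_ai_website_from_business_info(
        url=url, business_id=business_id,
        business_info=business_info, pages=pages,
    )
    if result["status"] != "success":
        # Fail open: log and continue, onboarding is not aborted.
        print(f"AI website generation failed: {result.get('error')}")
    return result

result = asyncio.run(run_group_2a(
    "https://example.com", "my-business-abc123",
    {"description": "Example Co"},
    [{"url": "https://example.com/about", "markdown": "# About"}],
))
```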
File Generation: Two Distinct Sources
The AI website content comes from two different sources:
From Business Info (Firecrawl Agent)
LLM-generated files use business_info from GROUP 1a:
| File | Gemini Call | Input |
|---|---|---|
| `llms.txt` | Call 1 | `business_info.description`, `products_services`, etc. |
| `pages/index.js` | Call 2 | `business_info` for Q&A generation |
| `pages/[slug].js` | Call 2 | Individual Q&A pages (8-15 pages) |
| `data.json` | Call 3 | `business_info` for Schema.org |
From Scraped Pages (GROUP 1b)
Deterministic files use scraped pages:
| File | Source | Description |
|---|---|---|
| `pages/*.js` (replicas) | `generate_files.py` | 1:1 markdown copies of scraped pages |
| `robots.txt` | `static_templates.py` | Standard robots.txt |
| `sitemap.xml` | `static_templates.py` | Generated from Q&A slugs + replica paths |
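Because `sitemap.xml` is built deterministically from the Q&A slugs and replica paths, its generation can be sketched as follows. The function name and exact XML layout are assumptions; the real template lives in `static_templates.py`.

```python
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(site_url: str, qa_slugs: list[str], replica_paths: list[str]) -> str:
    # Root page, then Q&A pages, then replica pages, as absolute <loc> URLs.
    paths = ["/"] + [f"/{slug}" for slug in qa_slugs] + list(replica_paths)
    lastmod = date.today().isoformat()
    entries = "\n".join(
        f"  <url><loc>{escape(site_url.rstrip('/') + p)}</loc>"
        f"<lastmod>{lastmod}</lastmod></url>"
        for p in paths
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )
```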
Pipeline
The Three Gemini Calls
All three calls run in parallel using asyncio.gather() with business_info:
Call 1: llms.txt Generation
- Input:
business_info (description, products_services, target_market, key_features, value_proposition)
- Output: Comprehensive AI-readable summary (500-1500 words)
- Prompt:
build_llms_txt_prompt_from_business_info()
Call 2: Homepage + Q&A Pages
- Input:
business_info + AI site URL
- Output: JSON with homepage structure + 8-15 Q&A pages
- Prompt:
build_index_html_prompt_from_business_info()
Call 3: Schema.org data.json
- Input:
business_info + source URL
- Output: JSON-LD structured data
- Prompt:
build_data_json_prompt_from_business_info()
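The fan-out of the three calls above can be sketched with `asyncio.gather()`. The coroutine bodies below are hypothetical stand-ins for the real Gemini-backed calls, and `return_exceptions=True` is an assumption about how a single failed call is isolated.

```python
import asyncio

# Hypothetical stand-ins for the three Gemini-backed generators; the real
# prompts come from the *_from_business_info() builders named above.
async def generate_llms_txt(business_info: dict) -> str:
    return f"# {business_info['name']}"

async def generate_homepage_and_qa(business_info: dict, site_url: str) -> dict:
    return {"homepage": site_url, "qa_pages": []}

async def generate_data_json(business_info: dict, source_url: str) -> dict:
    return {"@context": "https://schema.org", "url": source_url}

async def run_three_calls(business_info: dict, site_url: str, source_url: str):
    # return_exceptions=True lets one failed call surface as a value
    # instead of cancelling the other two (an assumption, not documented).
    return await asyncio.gather(
        generate_llms_txt(business_info),
        generate_homepage_and_qa(business_info, site_url),
        generate_data_json(business_info, source_url),
        return_exceptions=True,
    )

llms_txt, qa_json, data_json = asyncio.run(run_three_calls(
    {"name": "Example Co"},
    "https://my-business.searchcompany.dev",
    "https://example.com",
))
```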
LLMs.txt Structure
```markdown
# Business Name
> One-line description of the business

## Overview
Detailed description of what the business does...

## Products & Services
- Product A: Description
- Product B: Description

## Key Details
- Target Market: ...
- Key Features: ...

## Frequently Asked Questions
- What is [Business Name]?
- How does [Business Name] work?
...

---
*Website: https://example.com | Last updated: 2026-01-04*
```
Markdown Replica Pages
For each scraped page, the service creates a markdown replica at `/{slug}`:

```
Source:  https://example.com/about
Replica: https://my-business.searchcompany.dev/about
```
These replicas:
- Preserve the original content in markdown format
- Are optimized for AI crawlers
- Include structured metadata
- Have collision detection (adds 4-char suffix if slug conflicts with Q&A page)
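The collision rule in the last bullet can be sketched as below. Deriving the 4-char suffix from a content hash is an assumption; the source only specifies the suffix length.

```python
import hashlib

def resolve_replica_slug(slug: str, qa_slugs: set[str]) -> str:
    # No conflict with a Q&A page slug: keep the original.
    if slug not in qa_slugs:
        return slug
    # Conflict: append a short, deterministic 4-character suffix
    # (hash-derived here as an assumption).
    suffix = hashlib.sha256(slug.encode("utf-8")).hexdigest()[:4]
    return f"{slug}-{suffix}"
```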
Product LLMs Architecture
Product-specific llms files are generated by GROUP 2c (Generate Product LLMs) which runs in parallel with GROUP 2a and 2b.
File Structure
| File | When Created | Purpose |
|---|---|---|
| `/llms.txt` | GROUP 2a (Create AI Website) | Business overview from `business_info` |
| `/llms/{product-slug}.txt` | GROUP 2c (Generate Product LLMs) | Detailed product info |
Code Location
```
src/app/shared/ai_website/
├── __init__.py
├── service.py              # create_ai_website_from_business_info (onboarding)
├── check_url.py            # URL validation
├── llm_organize.py         # organize_with_llm_from_business_info
├── generate_files.py       # File generation
├── deploy.py               # Vercel deployment
├── assign_domain.py        # Subdomain assignment
├── html_generators.py      # HTML/JS page generation
├── static_templates.py     # robots.txt, sitemap.xml templates
└── product_llms.py         # Product llms.txt generation

src/app/shared/prompts/templates/ai_website/
├── llms_txt_from_business_info.py     # Prompt for Call 1 (NEW)
├── index_js_from_business_info.py     # Prompt for Call 2 (NEW)
├── data_json_from_business_info.py    # Prompt for Call 3 (NEW)
├── llms_txt_generation.py             # Original (for cron updates)
├── index_js_homepage_generation.py    # Original (for cron updates)
└── data_json_generation.py            # Original (for cron updates)
```
Database Updates
Writes a row to the `ai_sites` table:

```sql
INSERT INTO ai_sites (
    entity_id,
    ai_site_url,
    vercel_deployment_url,
    page_hashes,
    site_map,
    deployed_at
) VALUES (...);
```
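The `page_hashes` value can be populated with one content hash per replica page, so a later update run can detect which pages changed. A minimal sketch, assuming each scraped page carries `url` and `markdown` fields (the exact field names are assumptions):

```python
import hashlib

def hash_pages(pages: list[dict]) -> dict[str, str]:
    # Map each source URL to the SHA-256 of its markdown content; comparing
    # these hashes later reveals which replica pages need regeneration.
    return {
        page["url"]: hashlib.sha256(page["markdown"].encode("utf-8")).hexdigest()
        for page in pages
    }
```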
Error Handling
On failure, the service returns an error dict instead of raising:

```json
{
  "status": "error",
  "error": "Vercel deployment failed: rate limit exceeded"
}
```
If deployment fails, the error is logged but onboarding continues. The site can be regenerated later via the manual trigger endpoint.
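This fail-open behavior can be sketched as a wrapper that logs the exception and converts it into the error-dict shape above; `safe_deploy` and `deploy_fn` are hypothetical names, not part of the documented API.

```python
import logging

logger = logging.getLogger("ai_website")

def safe_deploy(deploy_fn, *args, **kwargs) -> dict:
    # Run a deployment step; on failure, log and return the error dict so
    # onboarding keeps going instead of crashing the whole pipeline.
    try:
        return {"status": "success", "result": deploy_fn(*args, **kwargs)}
    except Exception as exc:
        logger.error("Vercel deployment failed: %s", exc)
        return {"status": "error", "error": f"Vercel deployment failed: {exc}"}
```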