Internal Service — This is not an HTTP endpoint. It’s called directly by the generate-all orchestrator.

Purpose

Creates an AI-optimized website with llms.txt, robots.txt, sitemap.xml, structured data, and markdown replica pages. Deploys to Vercel and assigns a *.searchcompany.dev subdomain. Runs in GROUP 2a (parallel with other tasks after scrape completes).

Function Signature

async def create_ai_website(
    url: str,
    business_id: str,
    force_refresh: bool = False,
    pages: Optional[List[dict]] = None,
    max_pages: Optional[int] = None
) -> dict

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | `str` | required | The business website URL |
| `business_id` | `str` | required | The Clerk organization slug |
| `force_refresh` | `bool` | `False` | Force regeneration even if the site already exists |
| `pages` | `Optional[List[dict]]` | `None` | Pre-scraped pages from GROUP 1a |
| `max_pages` | `Optional[int]` | `None` (falls back to 5000) | Max pages to scrape if `pages` is not provided |

Returns

{
  "status": "success",
  "ai_site_url": "https://my-business-abc123.searchcompany.dev",
  "entity_id": "uuid-...",
  "deployment_url": "https://my-business-abc123-xyz.vercel.app",
  "qa_slugs": ["what-is-business-name", "how-does-business-work"],
  "replica_paths": ["/about", "/pricing", "/contact"]
}

File Generation: Deterministic vs LLM-Generated

The AI website consists of two types of files:

Deterministic Files (Generated by Code)

These files are created programmatically without LLM involvement:
| File | Source | Description |
| --- | --- | --- |
| `robots.txt` | `static_templates.py` | Standard robots.txt with AI bot allowances |
| `sitemap.xml` | `static_templates.py` | Generated from Q&A slugs + replica paths |
| `pages/*.js` | `generate_files.py` | Markdown replica pages (1:1 from scraped content) |
| `_app.js` | `html_generators.py` | Next.js app wrapper |
| `_document.js` | `html_generators.py` | Next.js document with meta tags |
| `package.json` | `generate_files.py` | Next.js dependencies |
| `next.config.js` | `generate_files.py` | Next.js configuration |
| `vercel.json` | `generate_files.py` | Vercel deployment config |
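The deterministic files are straightforward templating. As an illustration, sitemap.xml can be assembled directly from the Q&A slugs and replica paths; this is a hedged sketch, not the actual `static_templates.py` API (the function name and signature here are assumptions):

```python
# Hypothetical sketch of deterministic sitemap generation from Q&A slugs and
# replica paths. Names are illustrative, not the real static_templates.py API.
from typing import List

def build_sitemap_xml(base_url: str, qa_slugs: List[str], replica_paths: List[str]) -> str:
    """Emit a minimal sitemap.xml covering the homepage, Q&A pages, and replicas."""
    paths = [""] + [f"/{slug}" for slug in qa_slugs] + replica_paths
    urls = "\n".join(f"  <url><loc>{base_url}{p}</loc></url>" for p in paths)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{urls}\n"
        "</urlset>"
    )

sitemap = build_sitemap_xml(
    "https://my-business-abc123.searchcompany.dev",
    ["what-is-business-name"],
    ["/about", "/pricing"],
)
```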

LLM-Generated Files (Created by Gemini)

These files are generated by three parallel Gemini calls:
| File | Gemini Call | Description |
| --- | --- | --- |
| `llms.txt` | Call 1 | AI-readable summary of the business (Markdown) |
| `pages/index.js` | Call 2 | Homepage with Q&A navigation |
| `pages/[slug].js` | Call 2 | Individual Q&A pages (8-15 pages) |
| `data.json` | Call 3 | Schema.org JSON-LD structured data |

Pipeline

The Three Gemini Calls

All three calls run in parallel using asyncio.gather():

Call 1: llms.txt Generation

  • Input: All scraped pages (markdown content)
  • Output: Comprehensive AI-readable summary (500-1500 words)
  • Prompt: build_llms_txt_prompt()

Call 2: Homepage + Q&A Pages

  • Input: All scraped pages + AI site URL
  • Output: JSON with homepage structure + 8-15 Q&A pages
  • Prompt: build_index_html_prompt()

Call 3: Schema.org data.json

  • Input: All scraped pages + source URL
  • Output: JSON-LD structured data
  • Prompt: build_data_json_prompt()
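The parallel fan-out can be sketched as follows. This is an illustrative shape only: `call_gemini` is a stand-in for the real `llm_organize.py` client, and the prompt strings are placeholders for the three prompt builders named above:

```python
# Illustrative sketch of the three parallel Gemini calls via asyncio.gather().
# call_gemini is a stub standing in for the real Gemini client.
import asyncio

async def call_gemini(prompt: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real API round-trip
    return f"response for: {prompt.splitlines()[0]}"

async def generate_site_content(pages_md: str, site_url: str, source_url: str):
    # The three calls share the scraped pages but differ in prompt and extras.
    llms_txt, homepage, data_json = await asyncio.gather(
        call_gemini(f"llms.txt prompt\n{pages_md}"),                 # Call 1
        call_gemini(f"homepage prompt\n{pages_md}\n{site_url}"),     # Call 2
        call_gemini(f"data.json prompt\n{pages_md}\n{source_url}"),  # Call 3
    )
    return llms_txt, homepage, data_json

results = asyncio.run(
    generate_site_content("# About...", "https://x.searchcompany.dev", "https://example.com")
)
```

Because the calls are independent, total latency is roughly the slowest single call rather than the sum of all three.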

LLMs.txt Structure

# Business Name

> One-line description of the business

## About
Detailed description of what the business does...

## Products & Services
- Product A: Description
- Product B: Description

## Key Features
- Feature 1
- Feature 2

## Frequently Asked Questions - [Business Name] - About
- What is [Business Name]? [Link to Q&A page]
- How does [Business Name] work? [Link to Q&A page]
...

## Contact
- Website: https://example.com
- Email: [email protected]

Markdown Replica Pages

For each scraped page, a markdown replica is created at `/{slug}`:

  • Source: https://example.com/about
  • Replica: https://my-business.searchcompany.dev/about

These replicas:
  • Preserve the original content in markdown format
  • Are optimized for AI crawlers
  • Include structured metadata
  • Have collision detection (adds 4-char suffix if slug conflicts with Q&A page)
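The collision rule can be sketched as below. This is a hedged illustration: the exact slug derivation and suffix scheme in `generate_files.py` may differ (here the suffix is the first 4 hex chars of a SHA-1 of the source URL):

```python
# Hedged sketch of replica slug derivation with collision handling: if a
# replica slug collides with an existing Q&A slug, append a 4-char suffix.
# The real suffix scheme in generate_files.py may differ.
import hashlib
from urllib.parse import urlparse

def replica_slug(source_url: str, taken: set) -> str:
    path = urlparse(source_url).path.strip("/") or "home"
    slug = path.replace("/", "-")
    if slug in taken:  # collides with a Q&A page (or earlier replica)
        suffix = hashlib.sha1(source_url.encode()).hexdigest()[:4]
        slug = f"{slug}-{suffix}"
    taken.add(slug)
    return slug

taken = {"about"}  # e.g. an existing Q&A page slug
colliding = replica_slug("https://example.com/about", taken)   # e.g. "about-1a2b"
clean = replica_slug("https://example.com/pricing", taken)
```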

Product LLMs Architecture (Scalable)

Product-specific llms files are generated by GROUP 3b (Generate Product LLMs) which runs in parallel with GROUP 3a (Product Prompts) after products are discovered.

File Structure

| File | When Created | Purpose |
| --- | --- | --- |
| `/llms.txt` | GROUP 2a (Create AI Website) | Business overview, key details, FAQs |
| `/llms/{product-slug}.txt` | GROUP 3b (Generate Product LLMs) | Detailed product info, features, pricing |

Why Separate Files?

For sites with many products (up to 5,000+):

| Approach | Root llms.txt Size | Trade-off |
| --- | --- | --- |
| Everything in root | Grows with products | 5,000 products = massive file, hits token limits |
| Separate files | Stays small (~3KB) | Each product file is ~1-2KB, fetched on demand |
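Under this scheme the root file only needs a short index that points at the per-product files. A minimal sketch, assuming a "## Products" section format (the function name and section layout are illustrative, not the actual `product_llms.py` output):

```python
# Illustrative sketch: the root llms.txt stays small by linking out to
# per-product files under /llms/. Section format is an assumption.
from typing import List

def product_index_section(base_url: str, product_slugs: List[str]) -> str:
    lines = ["## Products"]
    lines += [f"- {base_url}/llms/{slug}.txt" for slug in product_slugs]
    return "\n".join(lines)

section = product_index_section(
    "https://x.searchcompany.dev", ["widget-a", "widget-b"]
)
```

Even with thousands of products, this index grows by one short line per product, while the detailed content lives in the individual `/llms/{product-slug}.txt` files fetched on demand.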

Flow

Vercel Deployment

Code Location

src/app/shared/ai_website/
├── __init__.py
├── service.py           # Main create_ai_website function
├── check_url.py         # URL validation
├── llm_organize.py      # 3 parallel Gemini calls
├── generate_files.py    # File generation (deterministic + from LLM output)
├── deploy.py            # Vercel deployment
├── assign_domain.py     # Subdomain assignment
├── html_generators.py   # HTML/JS page generation
├── static_templates.py  # robots.txt, sitemap.xml templates
└── product_llms.py      # Product llms.txt generation (used by discover_products)

src/app/shared/prompts/templates/ai_website/
├── llms_txt_generation.py           # Prompt for Call 1 (root llms.txt)
├── product_llms_txt_generation.py   # Prompt for product-specific llms.txt
├── index_js_homepage_generation.py  # Prompt for Call 2
└── data_json_generation.py          # Prompt for Call 3

Database Updates

Updates the ai_sites table:
INSERT INTO ai_sites (
  entity_id,
  ai_site_url,
  vercel_deployment_url,
  page_hashes,
  site_map,
  deployed_at
) VALUES (...)
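The write should use parameterized SQL. The following is a self-contained sketch using `sqlite3` purely for illustration (production presumably targets a different database, and `page_hashes`/`site_map` would likely be JSON/JSONB columns rather than text):

```python
# Hedged sketch of the ai_sites insert with parameterized SQL, shown with
# sqlite3 so it runs standalone; the production database and column types differ.
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ai_sites (
  entity_id TEXT PRIMARY KEY,
  ai_site_url TEXT,
  vercel_deployment_url TEXT,
  page_hashes TEXT,
  site_map TEXT,
  deployed_at TEXT
)""")
conn.execute(
    "INSERT INTO ai_sites VALUES (?, ?, ?, ?, ?, ?)",
    (
        "uuid-123",
        "https://my-business.searchcompany.dev",
        "https://my-business-xyz.vercel.app",
        json.dumps({"/about": "abc123"}),          # content hash per page
        json.dumps(["/about", "/pricing"]),        # deployed paths
        datetime.now(timezone.utc).isoformat(),
    ),
)
row = conn.execute(
    "SELECT ai_site_url FROM ai_sites WHERE entity_id = ?", ("uuid-123",)
).fetchone()
```

Storing `page_hashes` lets a later `force_refresh=False` run skip regeneration when the scraped content is unchanged.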

Error Handling

{
  "status": "error",
  "error": "Vercel deployment failed: rate limit exceeded"
}
If deployment fails, the error is logged but onboarding continues. The site can be regenerated later via the manual trigger endpoint.
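The error contract above can be sketched as a wrapper around the deploy step. `deploy_to_vercel` here is a stand-in for the real `deploy.py` call, rigged to fail so the error path is visible:

```python
# Sketch of the error contract described above: deployment failures are
# caught, logged, and returned as a status dict so onboarding can continue.
# deploy_to_vercel is a stand-in for the real deploy.py call.
import logging

logger = logging.getLogger("ai_website")

def deploy_to_vercel(files: dict) -> str:
    raise RuntimeError("rate limit exceeded")  # simulate a Vercel failure

def safe_deploy(files: dict) -> dict:
    try:
        url = deploy_to_vercel(files)
        return {"status": "success", "deployment_url": url}
    except Exception as exc:
        logger.error("Vercel deployment failed: %s", exc)
        return {"status": "error", "error": f"Vercel deployment failed: {exc}"}

result = safe_deploy({})
```

Because `safe_deploy` never raises, the orchestrator can record the error and move on to the other GROUP 2a tasks, leaving regeneration to the manual trigger endpoint.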