Internal Service — This is not an HTTP endpoint. It’s called directly by the generate-all orchestrator.

Purpose

Creates an AI-optimized website with llms.txt, robots.txt, sitemap.xml, structured data, and markdown replica pages. Deploys to Vercel and assigns a *.searchcompany.dev subdomain. Runs in GROUP 2a (in parallel with GROUPs 2b and 2c, after GROUPs 1a, 1b, and 1d complete).

Function Signature (Onboarding)

async def create_ai_website_from_business_info(
    url: str,
    business_id: str,
    business_info: dict,
    pages: List[dict]
) -> dict

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | required | The business website URL |
| business_id | str | required | The Clerk organization slug |
| business_info | dict | required | Business info from the Firecrawl agent (GROUP 1a) |
| pages | List[dict] | required | Scraped pages from GROUP 1b (for replicas ONLY) |
Key Change: During onboarding, business_info is used for LLM content generation (llms.txt, Q&A, data.json). Scraped pages are ONLY used for markdown replica generation.
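To make the call shape concrete, here is a minimal sketch of how the orchestrator might invoke the service. The function body below is a hypothetical stub (the real implementation lives in service.py); only the signature and return shape mirror what is documented above.

```python
import asyncio

# Hypothetical stub standing in for the real service function; it only
# illustrates the documented call shape and return shape.
async def create_ai_website_from_business_info(url, business_id, business_info, pages):
    return {
        "status": "success",
        "ai_site_url": f"https://{business_id}-abc123.searchcompany.dev",
    }

result = asyncio.run(create_ai_website_from_business_info(
    url="https://example.com",
    business_id="my-business",
    business_info={"description": "Example Co sells widgets"},  # from GROUP 1a
    pages=[{"url": "https://example.com/about", "markdown": "# About"}],  # from GROUP 1b
))
```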

Returns

{
  "status": "success",
  "ai_site_url": "https://my-business-abc123.searchcompany.dev",
  "entity_id": "uuid-...",
  "pages_hashed": 42,
  "qa_slugs": ["what-is-business-name", "how-does-business-work"],
  "replica_paths": ["/about", "/pricing", "/contact"]
}

File Generation: Two Distinct Sources

The AI website content comes from two different sources:

From Business Info (Firecrawl Agent)

LLM-generated files use business_info from GROUP 1a:
| File | Gemini Call | Input |
| --- | --- | --- |
| llms.txt | Call 1 | business_info.description, products_services, etc. |
| pages/index.js | Call 2 | business_info for Q&A generation |
| pages/[slug].js | Call 2 | Individual Q&A pages (8-15 pages) |
| data.json | Call 3 | business_info for Schema.org |

From Scraped Pages (GROUP 1b)

Deterministic files use scraped pages:
| File | Source | Description |
| --- | --- | --- |
| pages/*.js (replicas) | generate_files.py | 1:1 markdown copies of scraped pages |
| robots.txt | static_templates.py | Standard robots.txt |
| sitemap.xml | static_templates.py | Generated from Q&A slugs + replica paths |
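A sketch of how sitemap.xml can be assembled from Q&A slugs and replica paths. The function name and entry layout are assumptions; the real static_templates.py implementation may differ.

```python
from datetime import date

def build_sitemap(base_url, qa_slugs, replica_paths):
    """Hypothetical sitemap.xml builder: one <url> entry for the homepage,
    each Q&A slug, and each replica path."""
    urls = [base_url + "/"]
    urls += [f"{base_url}/{slug}" for slug in qa_slugs]
    urls += [f"{base_url}{path}" for path in replica_paths]
    today = date.today().isoformat()
    entries = "\n".join(
        f"  <url><loc>{u}</loc><lastmod>{today}</lastmod></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n</urlset>"
    )

sitemap = build_sitemap(
    "https://my-business.searchcompany.dev",
    ["what-is-business-name"],
    ["/about", "/pricing"],
)
```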

Pipeline

The Three Gemini Calls

All three calls run in parallel using asyncio.gather() with business_info:

Call 1: llms.txt Generation

  • Input: business_info (description, products_services, target_market, key_features, value_proposition)
  • Output: Comprehensive AI-readable summary (500-1500 words)
  • Prompt: build_llms_txt_prompt_from_business_info()

Call 2: Homepage + Q&A Pages

  • Input: business_info + AI site URL
  • Output: JSON with homepage structure + 8-15 Q&A pages
  • Prompt: build_index_html_prompt_from_business_info()

Call 3: Schema.org data.json

  • Input: business_info + source URL
  • Output: JSON-LD structured data
  • Prompt: build_data_json_prompt_from_business_info()
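The fan-out of the three calls can be sketched as below. The generator coroutines are stubs standing in for the real Gemini calls; only the asyncio.gather() pattern reflects the documented behavior.

```python
import asyncio

# Stub coroutines in place of the three real Gemini calls.
async def generate_llms_txt(business_info):            # Call 1
    return f"# {business_info['name']}"

async def generate_homepage_and_qa(business_info, site_url):  # Call 2
    return {"homepage": {}, "qa_pages": []}

async def generate_data_json(business_info, source_url):      # Call 3
    return {"@context": "https://schema.org", "@type": "Organization"}

async def run_gemini_calls(business_info, site_url, source_url):
    # All three calls run concurrently; results come back in call order.
    return await asyncio.gather(
        generate_llms_txt(business_info),
        generate_homepage_and_qa(business_info, site_url),
        generate_data_json(business_info, source_url),
    )

llms_txt, homepage, data_json = asyncio.run(run_gemini_calls(
    {"name": "Acme"},
    "https://acme.searchcompany.dev",
    "https://acme.com",
))
```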

LLMs.txt Structure

# Business Name

> One-line description of the business

## Overview
Detailed description of what the business does...

## Products & Services
- Product A: Description
- Product B: Description

## Key Details
- Target Market: ...
- Key Features: ...

## Frequently Asked Questions - [Business Name] - About
- What is [Business Name]?
- How does [Business Name] work?
...

---
*Website: https://example.com | Last updated: 2026-01-04*

Markdown Replica Pages

For each scraped page, the service creates a markdown replica at /{slug}:

  Source:  https://example.com/about
  Replica: https://my-business.searchcompany.dev/about
These replicas:
  • Preserve the original content in markdown format
  • Are optimized for AI crawlers
  • Include structured metadata
  • Have collision detection (adds 4-char suffix if slug conflicts with Q&A page)
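The collision handling above can be sketched as follows. The slug derivation and the use of a hash for the 4-char suffix are assumptions; the real generate_files.py logic may differ.

```python
import hashlib
import re

def slug_for_page(url, taken):
    """Hypothetical replica slug derivation. If the natural slug collides
    with an already-taken slug (e.g. a Q&A page), append a 4-char suffix."""
    path = url.rstrip("/").rsplit("/", 1)[-1] or "home"
    slug = re.sub(r"[^a-z0-9-]+", "-", path.lower()).strip("-")
    if slug in taken:
        suffix = hashlib.sha1(url.encode()).hexdigest()[:4]
        slug = f"{slug}-{suffix}"
    taken.add(slug)
    return slug

taken = {"about"}  # e.g. a Q&A page already owns /about
slug = slug_for_page("https://example.com/about", taken)  # collides -> suffixed
```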

Product LLMs Architecture

Product-specific llms files are generated by GROUP 2c (Generate Product LLMs) which runs in parallel with GROUP 2a and 2b.

File Structure

| File | When Created | Purpose |
| --- | --- | --- |
| /llms.txt | GROUP 2a (Create AI Website) | Business overview from business_info |
| /llms/{product-slug}.txt | GROUP 2c (Generate Product LLMs) | Detailed product info |

Flow

Code Location

src/app/shared/ai_website/
├── __init__.py
├── service.py           # create_ai_website_from_business_info (onboarding)
├── check_url.py         # URL validation
├── llm_organize.py      # organize_with_llm_from_business_info
├── generate_files.py    # File generation
├── deploy.py            # Vercel deployment
├── assign_domain.py     # Subdomain assignment
├── html_generators.py   # HTML/JS page generation
├── static_templates.py  # robots.txt, sitemap.xml templates
└── product_llms.py      # Product llms.txt generation

src/app/shared/prompts/templates/ai_website/
├── llms_txt_from_business_info.py       # Prompt for Call 1 (NEW)
├── index_js_from_business_info.py       # Prompt for Call 2 (NEW)
├── data_json_from_business_info.py      # Prompt for Call 3 (NEW)
├── llms_txt_generation.py               # Original (for cron updates)
├── index_js_homepage_generation.py      # Original (for cron updates)
└── data_json_generation.py              # Original (for cron updates)

Database Updates

Updates the ai_sites table:
INSERT INTO ai_sites (
  entity_id,
  ai_site_url,
  vercel_deployment_url,
  page_hashes,
  site_map,
  deployed_at
) VALUES (...)
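A parameterized sketch of the insert, using SQLite for a self-contained example; the real service presumably targets a different database, and the column types shown (JSON serialized as text) are assumptions.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# In-memory stand-in for the real ai_sites table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ai_sites (
        entity_id TEXT PRIMARY KEY,
        ai_site_url TEXT,
        vercel_deployment_url TEXT,
        page_hashes TEXT,
        site_map TEXT,
        deployed_at TEXT
    )
""")
conn.execute(
    "INSERT INTO ai_sites VALUES (?, ?, ?, ?, ?, ?)",
    (
        str(uuid.uuid4()),
        "https://my-business.searchcompany.dev",
        "https://my-business.vercel.app",
        json.dumps({"/about": "abc123"}),          # page -> content hash
        json.dumps(["/about", "/pricing"]),        # deployed paths
        datetime.now(timezone.utc).isoformat(),
    ),
)
row = conn.execute("SELECT ai_site_url FROM ai_sites").fetchone()
```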

Error Handling

{
  "status": "error",
  "error": "Vercel deployment failed: rate limit exceeded"
}
If deployment fails, the error is logged but onboarding continues. The site can be regenerated later via the manual trigger endpoint.
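The log-and-continue behavior can be sketched as below. deploy_to_vercel is a hypothetical stand-in for the real deploy step; only the shape of the error result mirrors the documented response.

```python
import logging

logger = logging.getLogger("ai_website")

def deploy_to_vercel(files):
    # Hypothetical deploy step, failing here to exercise the error path.
    raise RuntimeError("rate limit exceeded")

def create_site_step(files):
    try:
        url = deploy_to_vercel(files)
        return {"status": "success", "ai_site_url": url}
    except Exception as exc:
        # Log and return an error result; onboarding continues regardless,
        # and the site can be regenerated via the manual trigger endpoint.
        logger.warning("Vercel deployment failed: %s", exc)
        return {"status": "error", "error": f"Vercel deployment failed: {exc}"}

result = create_site_step({})
```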