# Create AI Website

<Note>
  **Internal Service** — This is not an HTTP endpoint. It's called directly by the `generate-all` orchestrator.
</Note>

## Purpose

Creates an AI-optimized website with `llms.txt`, `robots.txt`, `sitemap.xml`, structured data, and markdown replica pages. Deploys to Vercel and assigns a `*.searchcompany.dev` subdomain.

Runs in **GROUP 2a**, in parallel with GROUP 2b and 2c, once GROUP 1a, 1b, and 1d have completed.

## Function Signature (Onboarding)

```python theme={null}
from typing import List


async def create_ai_website_from_business_info(
    url: str,
    business_id: str,
    business_info: dict,
    pages: List[dict],
) -> dict:
    ...  # implemented in service.py
```

## Parameters

| Parameter       | Type         | Default  | Description                                     |
| --------------- | ------------ | -------- | ----------------------------------------------- |
| `url`           | `str`        | required | The business website URL                        |
| `business_id`   | `str`        | required | The Clerk organization slug                     |
| `business_info` | `dict`       | required | Business info from Firecrawl agent (GROUP 1a)   |
| `pages`         | `List[dict]` | required | Scraped pages from GROUP 1b (for replicas ONLY) |

<Info>
  **Key Change**: During onboarding, `business_info` is used for LLM content generation (llms.txt, Q\&A, data.json). Scraped `pages` are ONLY used for markdown replica generation.
</Info>

## Returns

```json theme={null}
{
  "status": "success",
  "ai_site_url": "https://my-business-abc123.searchcompany.dev",
  "entity_id": "uuid-...",
  "pages_hashed": 42,
  "qa_slugs": ["what-is-business-name", "how-does-business-work"],
  "replica_paths": ["/about", "/pricing", "/contact"]
}
```
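
A caller might model this payload with a `TypedDict` so that field names are checked by a type checker. The sketch below is illustrative only: `AiWebsiteResult` and `summarize` are hypothetical names, and only the field names and sample values come from the example response above.

```python
from typing import List, TypedDict


class AiWebsiteResult(TypedDict):
    """Shape of the success payload (field names taken from the example above)."""
    status: str
    ai_site_url: str
    entity_id: str
    pages_hashed: int
    qa_slugs: List[str]
    replica_paths: List[str]


def summarize(result: AiWebsiteResult) -> str:
    # Hypothetical helper: one log-friendly line describing what was deployed.
    return (
        f"{result['ai_site_url']}: {len(result['qa_slugs'])} Q&A pages, "
        f"{len(result['replica_paths'])} replicas, "
        f"{result['pages_hashed']} pages hashed"
    )


example: AiWebsiteResult = {
    "status": "success",
    "ai_site_url": "https://my-business-abc123.searchcompany.dev",
    "entity_id": "uuid-...",
    "pages_hashed": 42,
    "qa_slugs": ["what-is-business-name", "how-does-business-work"],
    "replica_paths": ["/about", "/pricing", "/contact"],
}
print(summarize(example))
```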

## File Generation: Two Distinct Sources

The AI website content comes from two different sources:

### From Business Info (Firecrawl Agent)

LLM-generated files use `business_info` from GROUP 1a:

| File              | Gemini Call | Input                                                  |
| ----------------- | ----------- | ------------------------------------------------------ |
| `llms.txt`        | Call 1      | `business_info.description`, `products_services`, etc. |
| `pages/index.js`  | Call 2      | `business_info` for Q\&A generation                    |
| `pages/[slug].js` | Call 2      | Individual Q\&A pages (8-15 pages)                     |
| `data.json`       | Call 3      | `business_info` for Schema.org                         |

### From Scraped Pages (GROUP 1b)

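Deterministic files are generated without an LLM call. The replicas are built from the scraped `pages`, while `robots.txt` and `sitemap.xml` come from static templates: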

| File                    | Source                | Description                               |
| ----------------------- | --------------------- | ----------------------------------------- |
| `pages/*.js` (replicas) | `generate_files.py`   | 1:1 markdown copies of scraped pages      |
| `robots.txt`            | `static_templates.py` | Standard robots.txt                       |
| `sitemap.xml`           | `static_templates.py` | Generated from Q\&A slugs + replica paths |

```mermaid theme={null}
flowchart TD
    subgraph inputs [Inputs]
        BI["business_info (from 1a)"]
        Pages["pages[] (from 1b)"]
    end

    subgraph gemini [3 Parallel Gemini Calls]
        G1["Call 1: llms.txt"]
        G2["Call 2: Homepage + Q&A Pages"]
        G3["Call 3: data.json Schema.org"]
    end
    
    subgraph deterministic [Deterministic Generation]
        D1["robots.txt"]
        D2["sitemap.xml"]
        D3["Replica pages"]
        D4["Next.js boilerplate"]
    end
    
    BI --> gemini
    Pages --> deterministic
    
    gemini --> FinalSite["AI Website"]
    deterministic --> FinalSite
```
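
As a rough illustration of the deterministic side, a sitemap can be assembled from the Q\&A slugs and replica paths with plain string templating. This is a sketch, not the contents of `static_templates.py`: the function name `build_sitemap` is hypothetical, and it assumes Q\&A pages are served at `/{slug}` and that replica paths already start with `/`.

```python
from typing import List
from xml.sax.saxutils import escape


def build_sitemap(base_url: str, qa_slugs: List[str], replica_paths: List[str]) -> str:
    """Build a minimal sitemap.xml listing the homepage, Q&A pages, and replicas."""
    base = base_url.rstrip("/")
    # Assumption: Q&A pages live at /{slug}; replica paths already begin with "/".
    paths = ["/"] + [f"/{slug}" for slug in qa_slugs] + list(replica_paths)
    entries = "\n".join(
        f"  <url><loc>{escape(base + path)}</loc></url>" for path in paths
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


print(build_sitemap(
    "https://my-business.searchcompany.dev",
    ["what-is-business-name"],
    ["/about", "/pricing"],
))
```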

## Pipeline

```mermaid theme={null}
flowchart TD
    A[Check if site exists] -->|No| B[Hash pages for change detection]
    A -->|Yes| Z[Return existing site]
    B --> C[Run 3 parallel Gemini calls with business_info]
    C --> D[Generate deterministic files]
    D --> E[Generate markdown replicas from pages]
    E --> F[Deploy to Vercel]
    F --> G[Assign subdomain]
    G --> H[Store page hashes]
    H --> I[Return AI site URL]
```
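
The "hash pages for change detection" step could look something like the sketch below. It assumes each scraped page is a dict with `url` and `markdown` keys and that SHA-256 is the digest; neither detail is confirmed by this page, and `hash_pages` and `changed_urls` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def hash_pages(pages: List[dict]) -> Dict[str, str]:
    """Map each page URL to a SHA-256 hex digest of its content."""
    hashes = {}
    for page in pages:
        # Assumed page shape: {"url": ..., "markdown": ...}
        content = page.get("markdown", "")
        hashes[page["url"]] = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return hashes


def changed_urls(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return URLs that are new or whose digest differs from the stored one."""
    return sorted(url for url, digest in new.items() if old.get(url) != digest)


stored = hash_pages([{"url": "https://example.com/about", "markdown": "# About"}])
fresh = hash_pages([{"url": "https://example.com/about", "markdown": "# About us"}])
print(changed_urls(stored, fresh))
```

Storing the digests rather than the page bodies keeps the stored record small while still letting a later run detect which pages need regenerating.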

## The Three Gemini Calls

All three calls run in parallel using `asyncio.gather()` with `business_info`:

### Call 1: llms.txt Generation

* **Input**: `business_info` (description, products\_services, target\_market, key\_features, value\_proposition)
* **Output**: Comprehensive AI-readable summary (500-1500 words)
* **Prompt**: `build_llms_txt_prompt_from_business_info()`

### Call 2: Homepage + Q\&A Pages

* **Input**: `business_info` + AI site URL
* **Output**: JSON with homepage structure + 8-15 Q\&A pages
* **Prompt**: `build_index_html_prompt_from_business_info()`

### Call 3: Schema.org data.json

* **Input**: `business_info` + source URL
* **Output**: JSON-LD structured data
* **Prompt**: `build_data_json_prompt_from_business_info()`
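
A minimal sketch of how three such calls can be fanned out with `asyncio.gather()`. The three coroutine functions are hypothetical stand-ins that return canned values instead of calling Gemini, and the `name` key on `business_info` is an assumption; the real signatures in `llm_organize.py` are not shown on this page.

```python
import asyncio


async def generate_llms_txt(business_info: dict) -> str:
    # Stand-in for Call 1; a real implementation would await the LLM client.
    await asyncio.sleep(0)
    return f"# {business_info['name']}\n"


async def generate_homepage_and_qa(business_info: dict, ai_site_url: str) -> dict:
    # Stand-in for Call 2.
    await asyncio.sleep(0)
    return {"homepage": {"title": business_info["name"]}, "qa_pages": []}


async def generate_data_json(business_info: dict, source_url: str) -> dict:
    # Stand-in for Call 3.
    await asyncio.sleep(0)
    return {"@context": "https://schema.org", "@type": "Organization", "url": source_url}


async def run_gemini_calls(business_info: dict, ai_site_url: str, source_url: str) -> dict:
    # gather() runs the coroutines concurrently and returns their results
    # in the same order the coroutines were passed in.
    llms_txt, site, data_json = await asyncio.gather(
        generate_llms_txt(business_info),
        generate_homepage_and_qa(business_info, ai_site_url),
        generate_data_json(business_info, source_url),
    )
    return {"llms_txt": llms_txt, "site": site, "data_json": data_json}


result = asyncio.run(
    run_gemini_calls({"name": "Acme"}, "https://acme.searchcompany.dev", "https://acme.example")
)
print(sorted(result))
```

By default, `gather()` propagates the first exception raised by any of the awaited coroutines; passing `return_exceptions=True` would instead return exceptions as results so that one failed call does not discard the other two.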

## llms.txt Structure

```markdown theme={null}
# Business Name

> One-line description of the business

## Overview
Detailed description of what the business does...

## Products & Services
- Product A: Description
- Product B: Description

## Key Details
- Target Market: ...
- Key Features: ...

## Frequently Asked Questions - [Business Name] - About
- What is [Business Name]?
- How does [Business Name] work?
...

---
*Website: https://example.com | Last updated: 2026-01-04*
```

## Markdown Replica Pages

For each scraped page, the service creates a markdown replica at `/{slug}`:

```
Source: https://example.com/about
Replica: https://my-business.searchcompany.dev/about
```

These replicas:

* Preserve the original content in markdown format
* Are optimized for AI crawlers
* Include structured metadata
* Have collision detection (adds 4-char suffix if slug conflicts with Q\&A page)
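
The collision rule in the last bullet could be sketched as follows. The page only states that a 4-character suffix is added; the hyphen separator, the random lowercase-alphanumeric alphabet, and the function name `resolve_replica_slug` are assumptions made for illustration.

```python
import secrets
import string
from typing import Set

SUFFIX_ALPHABET = string.ascii_lowercase + string.digits


def resolve_replica_slug(slug: str, taken: Set[str]) -> str:
    """Return slug unchanged if it is free, else slug plus a 4-char suffix."""
    if slug not in taken:
        return slug
    while True:
        suffix = "".join(secrets.choice(SUFFIX_ALPHABET) for _ in range(4))
        candidate = f"{slug}-{suffix}"
        # Retry in the unlikely case the suffixed slug is also taken.
        if candidate not in taken:
            return candidate


qa_slugs = {"what-is-business-name", "pricing"}
print(resolve_replica_slug("about", qa_slugs))    # no conflict, slug kept as-is
print(resolve_replica_slug("pricing", qa_slugs))  # conflict, suffix appended
```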

## Product LLMs Architecture

Product-specific llms files are generated by **GROUP 2c** (Generate Product LLMs), which runs in parallel with GROUP 2a and 2b.

### File Structure

| File                       | When Created                     | Purpose                               |
| -------------------------- | -------------------------------- | ------------------------------------- |
| `/llms.txt`                | GROUP 2a (Create AI Website)     | Business overview from business\_info |
| `/llms/{product-slug}.txt` | GROUP 2c (Generate Product LLMs) | Detailed product info                 |

### Flow

```mermaid theme={null}
flowchart LR
    A[GROUP 2a: Create AI Website] -->|Creates| B["/llms.txt (business)"]
    C[GROUP 1d: Discover Products] -->|"products[]"| D[GROUP 2c: Generate Product LLMs]
    D -->|Creates| E["/llms/product-a.txt"]
    D -->|Creates| F["/llms/product-b.txt"]
```
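
To make the `/llms/{product-slug}.txt` layout concrete, here is one way a product name might be mapped to its file path. The slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) and the name `product_llms_path` are assumptions; the actual rule used by `product_llms.py` is not documented here.

```python
import re


def product_llms_path(product_name: str) -> str:
    """Map a product name to its per-product llms file path."""
    # Assumed slug rule: lowercase, non-alphanumeric runs become single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", product_name.lower()).strip("-")
    return f"/llms/{slug}.txt"


print(product_llms_path("Product A"))
```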

## Code Location

```
src/app/shared/ai_website/
├── __init__.py
├── service.py           # create_ai_website_from_business_info (onboarding)
├── check_url.py         # URL validation
├── llm_organize.py      # organize_with_llm_from_business_info
├── generate_files.py    # File generation
├── deploy.py            # Vercel deployment
├── assign_domain.py     # Subdomain assignment
├── html_generators.py   # HTML/JS page generation
├── static_templates.py  # robots.txt, sitemap.xml templates
└── product_llms.py      # Product llms.txt generation

src/app/shared/prompts/templates/ai_website/
├── llms_txt_from_business_info.py       # Prompt for Call 1 (NEW)
├── index_js_from_business_info.py       # Prompt for Call 2 (NEW)
├── data_json_from_business_info.py      # Prompt for Call 3 (NEW)
├── llms_txt_generation.py               # Original (for cron updates)
├── index_js_homepage_generation.py      # Original (for cron updates)
└── data_json_generation.py              # Original (for cron updates)
```

## Database Updates

Writes the deployment record to the `ai_sites` table:

```sql theme={null}
INSERT INTO ai_sites (
  entity_id,
  ai_site_url,
  vercel_deployment_url,
  page_hashes,
  site_map,
  deployed_at
) VALUES (...)
```

## Error Handling

```json theme={null}
{
  "status": "error",
  "error": "Vercel deployment failed: rate limit exceeded"
}
```

If deployment fails, the error is logged but onboarding continues. The site can be regenerated later via the manual trigger endpoint.
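
The log-and-continue behavior described above might be handled by the caller roughly like this. `create_site_stub` is a hypothetical stand-in for the real service call, and `run_group_2a` is an illustrative name rather than the orchestrator's actual function.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("onboarding")


async def create_site_stub(should_fail: bool) -> dict:
    # Hypothetical stand-in for create_ai_website_from_business_info.
    if should_fail:
        return {"status": "error", "error": "Vercel deployment failed: rate limit exceeded"}
    return {"status": "success", "ai_site_url": "https://my-business-abc123.searchcompany.dev"}


async def run_group_2a(should_fail: bool) -> Optional[str]:
    """Return the AI site URL, or None if creation failed (onboarding continues)."""
    result = await create_site_stub(should_fail)
    if result.get("status") != "success":
        # Log the failure and let the rest of onboarding proceed.
        logger.error("AI website creation failed: %s", result.get("error"))
        return None
    return result["ai_site_url"]


print(asyncio.run(run_group_2a(should_fail=False)))
print(asyncio.run(run_group_2a(should_fail=True)))
```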
