# AI Website Overview

> Understanding the files and pages that make up an AI-optimized website

When we create an AI website for a business, we generate a complete site optimized for AI search engines like ChatGPT, Perplexity, Claude, and Gemini. This page explains what each file and section does.

## Site Structure

```
ai-{business-slug}.searchcompany.dev/
├── llms.txt                    # Primary AI-readable content
├── data.json                   # Schema.org structured data
├── robots.txt                  # Crawler permissions
├── sitemap.xml                 # Site structure for crawlers
├── {indexnow-key}.txt          # IndexNow verification
│
├── /                           # Homepage with FAQ links
├── /what-is-{business}/        # Q&A page (example)
├── /how-to-contact-{business}/ # Q&A page (example)
│
├── /about/                     # Markdown replica of real /about page
├── /products/teddy-bear/       # Markdown replica of real product page
│
├── /llms/teddy-bear.txt        # Product-specific AI content for AI Article Context
├── /llms/premium-widget.txt    # Product-specific AI content for AI Article Context
│
└── /expert-review-of-{biz}/    # AI Article (added by cron)
```
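
As a rough illustration of the layout above, the sketch below derives the core file URLs from a business slug. The function name `ai_site_urls` and the `https` scheme are assumptions for illustration; only the host pattern and file names come from the tree.

```python
def ai_site_urls(business_slug: str) -> dict[str, str]:
    """Return the core file URLs for an AI site (hypothetical helper)."""
    # Host pattern taken from the tree above; the https scheme is assumed.
    base = f"https://ai-{business_slug}.searchcompany.dev"
    core_files = ["llms.txt", "data.json", "robots.txt", "sitemap.xml"]
    return {name: f"{base}/{name}" for name in core_files}


urls = ai_site_urls("acme-corp")
print(urls["llms.txt"])  # https://ai-acme-corp.searchcompany.dev/llms.txt
```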

***

## Core Files

### `/llms.txt` - Primary AI Content

<Info>
  **Purpose**: The main file AI search engines read to understand your business.
  Think of it as a comprehensive "about us" document written specifically for
  AI.
</Info>

**What it contains:**

* Business name and one-sentence description
* Comprehensive overview (2-3 paragraphs)
* Products/services with brief descriptions
* Key details (location, contact, hours, etc.)
* 5-10 FAQs about the business with detailed answers
* Links to product-specific llms files

**Format**: Plain-text Markdown, structured for LLM parsing.

**Example structure:**

```markdown
# Acme Corp

> Acme Corp is a leading provider of innovative widgets for enterprise customers.

## Overview

[2-3 paragraphs about the business...]

## Products & Services

### Enterprise Solutions

- **Widget Pro:** Enterprise-grade widget with advanced features
- **Widget Lite:** Lightweight solution for small teams

## Key Details

- Location: San Francisco, CA
- Founded: 2015
- Contact: hello@acme.com

## Frequently Asked Questions - Acme Corp - About

**Q: What is Acme Corp?**
A: [Detailed 2-3 paragraph answer...]
```
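
To make the structure concrete, here is a minimal sketch of how a file in this shape could be rendered from structured data. The production files are generated by a model, not by a template like this; `render_llms_txt` and the dictionary keys are hypothetical names chosen for the example.

```python
def render_llms_txt(business: dict) -> str:
    """Render an llms.txt-style document from a plain dict (illustrative only)."""
    lines = [f"# {business['name']}", "", f"> {business['tagline']}", ""]
    lines += ["## Overview", "", business["overview"], ""]
    lines += ["## Products & Services", ""]
    for product in business["products"]:
        lines.append(f"- **{product['name']}:** {product['summary']}")
    lines += ["", "## Key Details", ""]
    for label, value in business["details"].items():
        lines.append(f"- {label}: {value}")
    lines += ["", f"## Frequently Asked Questions - {business['name']} - About", ""]
    for question, answer in business["faqs"]:
        lines += [f"**Q: {question}**", f"A: {answer}", ""]
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


acme = {
    "name": "Acme Corp",
    "tagline": "Acme Corp is a leading provider of innovative widgets.",
    "overview": "Acme Corp builds widgets for enterprise customers.",
    "products": [{"name": "Widget Pro", "summary": "Enterprise-grade widget"}],
    "details": {"Location": "San Francisco, CA", "Founded": "2015"},
    "faqs": [("What is Acme Corp?", "Acme Corp is a widget provider.")],
}
print(render_llms_txt(acme))
```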

***

### `/data.json` - Schema.org Structured Data

<Info>
  **Purpose**: Machine-readable structured data that helps AI and search engines
  understand the business type, offerings, and key information.
</Info>

**What it contains:**

* `@type` - The most specific Schema.org type (Organization, LocalBusiness, SoftwareApplication, etc.)
* Business name, description, URL
* Contact information
* Location/address (if applicable)
* Products/services catalog
* Social media links

**Example:**

```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme Corp",
  "description": "Enterprise widget solutions",
  "url": "https://acme.com",
  "applicationCategory": "BusinessApplication",
  "offers": {
    "@type": "Offer",
    "price": "99.00",
    "priceCurrency": "USD"
  }
}
```
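
A document like the one above could be assembled and serialized with the standard library, as in the following sketch. The function name and parameters are assumptions for illustration; the field names are the Schema.org properties shown in the example.

```python
import json


def build_data_json(name: str, description: str, url: str,
                    price: str, currency: str = "USD") -> str:
    """Serialize a SoftwareApplication JSON-LD document (illustrative helper)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "description": description,
        "url": url,
        "applicationCategory": "BusinessApplication",
        # Price is kept as a string so trailing zeros ("99.00") survive.
        "offers": {"@type": "Offer", "price": price, "priceCurrency": currency},
    }
    return json.dumps(doc, indent=2)


print(build_data_json("Acme Corp", "Enterprise widget solutions",
                      "https://acme.com", "99.00"))
```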

***

### `/robots.txt` - Crawler Permissions

<Info>
  **Purpose**: Tells web crawlers (including AI bots) they're allowed to access
  all content, and points them to the sitemap.
</Info>

**Contents:**

```
User-agent: *
Allow: /

Sitemap: https://customer-domain.com/sitemap.xml
```
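
One way to sanity-check these rules is Python's standard `urllib.robotparser`, as sketched below. `GPTBot` is used only as an example user-agent string, and the URL checked is an assumed path on the placeholder domain.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://customer-domain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The wildcard group applies to any crawler, so an AI bot is allowed too.
allowed = parser.can_fetch("GPTBot", "https://customer-domain.com/llms.txt")
# site_maps() requires Python 3.8 or newer.
sitemaps = parser.site_maps()

print(allowed)
print(sitemaps)
```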

***

### `/sitemap.xml` - Site Structure

<Info>
  **Purpose**: Lists all pages on the AI site so crawlers can discover and index
  everything. Updated whenever new pages are added.
</Info>

**Includes:**

* Homepage
* All Q\&A pages
* All markdown replica pages
* All product llms files (`/llms/{slug}.txt`)
* All AI articles (added by cron)
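
The sketch below shows one way a sitemap covering these page types could be produced with the standard library. The base URL and the example paths are assumptions; the actual file comes from a static template.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, paths: list[str]) -> str:
    """Build a minimal sitemap.xml string (illustrative helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap = build_sitemap(
    "https://ai-acme-corp.searchcompany.dev",  # assumed host for the example
    ["/", "/what-is-acme-corp/", "/about/", "/llms/widget-pro.txt"],
)
print(sitemap)
```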

***

### `/{indexnow-key}.txt` - IndexNow Verification

<Info>
  **Purpose**: Verification file for the IndexNow protocol, which enables instant
  notification to Bing, Yandex, and other search engines when content changes.
</Info>
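
For context, an IndexNow submission is a JSON document that names the host, the key, the location of the key file, and the changed URLs. The sketch below only builds that payload and does not send it; the helper name, host, and key value are made up for the example, and the key file is assumed to sit at the site root as in the tree above.

```python
import json

# Shared IndexNow endpoint; the payload would be sent as an HTTP POST
# with Content-Type "application/json; charset=utf-8".
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(host: str, key: str, changed_urls: list[str]) -> dict:
    """Build the JSON body for an IndexNow submission (illustrative helper)."""
    return {
        "host": host,
        "key": key,
        # Assumes the verification file is served from the site root.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(changed_urls),
    }


payload = build_indexnow_payload(
    "ai-acme-corp.searchcompany.dev",  # assumed host
    "0123456789abcdef",                # placeholder key, not a real one
    ["https://ai-acme-corp.searchcompany.dev/llms.txt"],
)
print(json.dumps(payload, indent=2))
```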

***

## Page Types

### Homepage (`/`)

The homepage serves as a navigation hub with:

1. **Business name** as H1
2. **Resources section** - Links to llms.txt, data.json, robots.txt, sitemap.xml
3. **FAQ sections** - Organized by category, each question links to its dedicated page

**Example:**

```html
<h1>Acme Corp</h1>

<h2>Resources</h2>
<ul>
  <li><a href="/llms.txt">llms.txt</a> - Primary AI-readable content</li>
  <li><a href="/data.json">data.json</a> - Schema.org structured data</li>
</ul>

<h2>Frequently Asked Questions</h2>
<h3>Acme Corp - About</h3>
<ul>
  <li><a href="/what-is-acme-corp/">What is Acme Corp?</a></li>
  <li><a href="/how-to-contact-acme-corp/">How do I contact Acme Corp?</a></li>
</ul>
```

***

### Q\&A Pages (`/{slug}/`)

<Info>
  **Purpose**: Each FAQ gets its own dedicated page at the root level for
  maximum SEO authority. AI search engines can link directly to specific
  answers.
</Info>

**Structure:**

* Meta title: "What is Acme Corp? | Acme Corp"
* Meta description: Direct answer summary
* Full detailed answer (2-3 paragraphs)
* Proper Schema.org FAQPage markup

**Why separate pages?** AI search engines often cite specific URLs. Having dedicated pages for each question means they can link directly to the authoritative answer.
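
The FAQPage markup mentioned above can be sketched as follows for a single question. The helper names are hypothetical, and the exact markup emitted on the generated pages may differ; the property names are the standard Schema.org ones.

```python
import json


def build_faq_jsonld(question: str, answer: str) -> dict:
    """Build Schema.org FAQPage JSON-LD for one question (illustrative helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }


def to_script_tag(doc: dict) -> str:
    """Wrap JSON-LD in a script tag, escaping '</' so it cannot end the tag early."""
    body = json.dumps(doc).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


faq = build_faq_jsonld("What is Acme Corp?", "Acme Corp is a widget provider.")
print(to_script_tag(faq))
```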

***

### Markdown Replica Pages (`/{original-path}/`)

<Info>
  **Purpose**: Copies of the real website's pages with the content preserved,
  converted to clean markdown and served as simple HTML. This gives AI crawlers
  easy access to all your content.
</Info>

**Example:**

* Real site: `https://acme.com/about` → AI site: `/about/`
* Real site: `https://acme.com/products/widget-pro` → AI site: `/products/widget-pro/`

**What they contain:**

* The scraped markdown content from the original page
* Proper meta tags and canonical URLs pointing to the real site
* Clean, AI-readable formatting

**Why replicas?** AI search engines can struggle with JavaScript-heavy sites. The markdown replicas provide clean, easily parsed versions of all your content.
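
The path mapping and canonical tag described above could look roughly like this. Both helper names are hypothetical, and the trailing-slash normalization is inferred from the examples in this section.

```python
from html import escape
from urllib.parse import urlsplit


def replica_path(real_url: str) -> str:
    """Map a real-site URL to its replica path on the AI site (illustrative)."""
    trimmed = urlsplit(real_url).path.strip("/")
    # An empty path means the homepage; otherwise keep the same path
    # and normalize to a trailing slash.
    return f"/{trimmed}/" if trimmed else "/"


def canonical_tag(real_url: str) -> str:
    """Return a canonical link tag pointing back to the real site."""
    return f'<link rel="canonical" href="{escape(real_url, quote=True)}">'


print(replica_path("https://acme.com/products/widget-pro"))  # /products/widget-pro/
print(canonical_tag("https://acme.com/about"))
```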

***

### Product LLMs Files (`/llms/{product-slug}.txt`)

<Info>
  **Purpose**: Dedicated AI content files for each product. Keeps the main
  llms.txt small while providing detailed product information for AI queries.
</Info>

**What they contain:**

* Product name and one-sentence description
* Detailed overview (what it does, who it's for)
* Key features list
* Pricing information
* Best use cases
* Product-specific FAQs

**Example structure:**

```markdown
# Widget Pro

> Enterprise-grade widget solution with advanced analytics and integrations.

_A product by Acme Corp_

## Overview

Widget Pro is designed for enterprise teams who need...

## Key Features

- **Real-time Analytics:** Track widget performance...
- **API Integrations:** Connect with 50+ tools...

## Pricing

Starting at $99/month. Enterprise plans available.

## Frequently Asked Questions - Widget Pro

**Q: What is Widget Pro?**
A: Widget Pro is Acme Corp's flagship product...
```

***

### AI Articles (`/{slug}/`)

<Info>
  **Purpose**: SEO-optimized content pages generated by the daily cron job to
  improve AI discoverability. These are NOT replicas - they're new content.
</Info>

**Added by**: Daily cron job (Batch 2a)

**Weekly target**: 100 pages

* 50 pages about the business
* 50 pages distributed across products
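
How the 50 product pages are divided is not specified here. As one possible reading, the sketch below splits them as evenly as possible, giving any remainder to the first products in the list. The function name and the even-split rule are assumptions.

```python
def distribute_pages(product_slugs: list[str], total: int = 50) -> dict[str, int]:
    """Split a weekly page budget across products as evenly as possible."""
    if not product_slugs:
        return {}
    base, extra = divmod(total, len(product_slugs))
    # The first `extra` products each receive one additional page.
    return {
        slug: base + (1 if index < extra else 0)
        for index, slug in enumerate(product_slugs)
    }


plan = distribute_pages(["widget-pro", "widget-lite", "teddy-bear"])
print(plan)  # {'widget-pro': 17, 'widget-lite': 17, 'teddy-bear': 16}
```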

**Example titles:**

* "Expert Review of Acme Corp's Widget Solutions"
* "Deep Dive into Widget Pro Features"
* "How Acme Corp Compares to Competitors"

**Why AI articles?** They provide additional entry points for AI search engines to discover and recommend your business for relevant queries.

***

## How Files Are Generated

| File              | Generated By    | When                 |
| ----------------- | --------------- | -------------------- |
| `llms.txt`        | Gemini 3 Flash  | Onboarding           |
| `data.json`       | Gemini 3 Flash  | Onboarding           |
| `robots.txt`      | Static template | Onboarding           |
| `sitemap.xml`     | Static template | Onboarding + Updates |
| Homepage          | Gemini 3 Flash  | Onboarding           |
| Q\&A pages        | Gemini 3 Flash  | Onboarding           |
| Markdown replicas | Scraped content | Onboarding           |
| Product llms      | Gemini 3 Flash  | Onboarding + Cron    |
| AI articles       | Gemini 3 Flash  | Daily cron           |

***

## Update Flow

1. **Onboarding**: All core files are generated and deployed
2. **Daily Cron (Batch 1a)**: Detects changes on the real site, then updates llms.txt, Q\&A pages, and replicas
3. **Daily Cron (Batch 1b)**: Discovers new products, generates product llms files
4. **Daily Cron (Batch 2a)**: Creates new AI articles, updates timestamps on all pages
5. **Daily Cron (Batch 3)**: Notifies search engines of all changes via IndexNow
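
The ordering of the daily steps can be sketched as a small pipeline in which the content batches report what they changed and the final step notifies search engines once. This is a simplified model, not the actual cron implementation; `run_daily_cron` and the stub batches are hypothetical.

```python
from typing import Callable

Batch = Callable[[], list[str]]  # each batch returns the URLs it changed


def run_daily_cron(content_batches: list[tuple[str, Batch]],
                   notify: Callable[[list[str]], None]) -> list[str]:
    """Run content batches in order, then notify once with all changed URLs."""
    changed: list[str] = []
    for _name, batch in content_batches:
        changed.extend(batch())
    # De-duplicate while preserving the order in which URLs were reported.
    unique = list(dict.fromkeys(changed))
    if unique:
        notify(unique)  # stands in for Batch 3 (IndexNow notification)
    return unique


notified: list[str] = []
result = run_daily_cron(
    [
        ("Batch 1a", lambda: ["/llms.txt", "/about/"]),
        ("Batch 1b", lambda: ["/llms/widget-pro.txt"]),
        ("Batch 2a", lambda: ["/expert-review-of-acme-corp/", "/about/"]),
    ],
    notify=notified.extend,
)
print(result)
```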

<Note>
  All files use the customer's real domain as the canonical URL, so search
  engines attribute the content to the original site, not the AI subdomain.
</Note>
