AI Website Overview - The Search Company API

When we create an AI website for a business, we generate a complete site optimized for AI search engines like ChatGPT, Perplexity, Claude, and Gemini. This page explains what each file and section does.

Site Structure

ai-{business-slug}.searchcompany.dev/
├── llms.txt                    # Primary AI-readable content
├── data.json                   # Schema.org structured data
├── robots.txt                  # Crawler permissions
├── sitemap.xml                 # Site structure for crawlers
├── {indexnow-key}.txt          # IndexNow verification
│
├── /                           # Homepage with FAQ links
├── /what-is-{business}/        # Q&A page (example)
├── /how-to-contact-{business}/ # Q&A page (example)
│
├── /about/                     # Markdown replica of real /about page
├── /products/teddy-bear/       # Markdown replica of real product page
│
├── /llms/teddy-bear.txt        # Product-specific AI content for AI Article Context
├── /llms/premium-widget.txt    # Product-specific AI content for AI Article Context
│
└── /expert-review-of-{biz}/    # AI Article (added by cron)
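
As a rough illustration of how the URLs in the tree above fit together, the following sketch derives the AI site base URL and core file URLs from a business name. The `slugify` rule and both helper names are assumptions made for this example; the actual slug logic is not documented here.

```python
import re

AI_SITE_DOMAIN = "searchcompany.dev"  # domain taken from the tree above


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into single dashes.

    This slug rule is an assumption for illustration only.
    """
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def ai_site_base(business_name: str) -> str:
    """Return the base URL of the AI site (hypothetical helper)."""
    return f"https://ai-{slugify(business_name)}.{AI_SITE_DOMAIN}"


def core_file_urls(business_name: str) -> list[str]:
    """List the core files that sit at the root of the AI site."""
    base = ai_site_base(business_name)
    core_files = ("llms.txt", "data.json", "robots.txt", "sitemap.xml")
    return [f"{base}/{name}" for name in core_files]


urls = core_file_urls("Acme Corp")
```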

Core Files

`/llms.txt` - Primary AI Content

Purpose: The main file AI search engines read to understand your business. Think of it as a comprehensive “about us” document written specifically for AI.

What it contains:

Business name and one-sentence description
Comprehensive overview (2-3 paragraphs)
Products/services with brief descriptions
Key details (location, contact, hours, etc.)
5-10 FAQs about the business with detailed answers
Links to product-specific llms files

Format: Plain text markdown, optimized for LLM parsing. Example structure:

# Acme Corp

> Acme Corp is a leading provider of innovative widgets for enterprise customers.

## Overview

[2-3 paragraphs about the business...]

## Products & Services

### Enterprise Solutions

- **Widget Pro:** Enterprise-grade widget with advanced features
- **Widget Lite:** Lightweight solution for small teams

## Key Details

- Location: San Francisco, CA
- Founded: 2015
- Contact: hello@acme.com

## Frequently Asked Questions - Acme Corp - About

**Q: What is Acme Corp?**
A: [Detailed 2-3 paragraph answer...]
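
To make the layout concrete, here is a minimal sketch that renders a business profile into the structure shown above. The `render_llms_txt` function and the dictionary keys are hypothetical, and the sketch skips product subcategories such as "Enterprise Solutions" for brevity; the real file is generated by an LLM, not by a template like this.

```python
def render_llms_txt(business: dict) -> str:
    """Render a business profile into the llms.txt layout shown above."""
    lines = [
        f"# {business['name']}",
        "",
        f"> {business['summary']}",
        "",
        "## Overview",
        "",
        business["overview"],
        "",
        "## Products & Services",
        "",
    ]
    lines += [f"- **{p['name']}:** {p['description']}" for p in business["products"]]
    lines += ["", "## Key Details", ""]
    lines += [f"- {label}: {value}" for label, value in business["details"].items()]
    lines += ["", f"## Frequently Asked Questions - {business['name']} - About", ""]
    for question, answer in business["faqs"]:
        lines += [f"**Q: {question}**", f"A: {answer}", ""]
    # Drop the trailing blank line and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"


# Sample data mirroring the Acme Corp example.
acme = {
    "name": "Acme Corp",
    "summary": "Acme Corp is a leading provider of innovative widgets for enterprise customers.",
    "overview": "Acme Corp builds widgets for enterprise teams.",
    "products": [
        {"name": "Widget Pro", "description": "Enterprise-grade widget with advanced features"},
        {"name": "Widget Lite", "description": "Lightweight solution for small teams"},
    ],
    "details": {"Location": "San Francisco, CA", "Founded": "2015", "Contact": "hello@acme.com"},
    "faqs": [("What is Acme Corp?", "Acme Corp builds widgets for enterprise customers.")],
}

llms_txt = render_llms_txt(acme)
```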

`/data.json` - Schema.org Structured Data

Purpose: Machine-readable structured data that helps AI and search engines understand the business type, offerings, and key information.

What it contains:

@type - The most specific Schema.org type (Organization, LocalBusiness, SoftwareApplication, etc.)
Business name, description, URL
Contact information
Location/address (if applicable)
Products/services catalog
Social media links

Example:

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme Corp",
  "description": "Enterprise widget solutions",
  "url": "https://acme.com",
  "applicationCategory": "BusinessApplication",
  "offers": {
    "@type": "Offer",
    "price": "99.00",
    "priceCurrency": "USD"
  }
}
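
A document like the one above could be assembled roughly as follows. This is only a sketch: `build_data_json` and its parameters are hypothetical names, and choosing the most specific `@type` is left to the caller.

```python
import json


def build_data_json(name, description, url, schema_type="Organization",
                    offer_price=None, currency="USD"):
    """Build a Schema.org JSON-LD dictionary (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": schema_type,
        "name": name,
        "description": description,
        "url": url,
    }
    if offer_price is not None:
        # Prices are serialized as strings with two decimals, as in the example.
        data["offers"] = {
            "@type": "Offer",
            "price": f"{offer_price:.2f}",
            "priceCurrency": currency,
        }
    return data


data = build_data_json(
    "Acme Corp",
    "Enterprise widget solutions",
    "https://acme.com",
    schema_type="SoftwareApplication",
    offer_price=99,
)
data_json = json.dumps(data, indent=2)
```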

`/robots.txt` - Crawler Permissions

Purpose: Tells web crawlers (including AI bots) they’re allowed to access all content, and points them to the sitemap.

Contents:

User-agent: *
Allow: /

Sitemap: https://customer-domain.com/sitemap.xml
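
One way to sanity-check these rules is to run them through Python's standard `urllib.robotparser` module. The sketch below parses the contents shown above and asks whether a crawler may fetch a page; the `GPTBot` user agent and the tested URL are illustrative choices only.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://customer-domain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The wildcard rule applies to any user agent, including AI crawlers.
allowed = parser.can_fetch("GPTBot", "https://customer-domain.com/llms.txt")

# site_maps() returns the Sitemap entries found in the file (Python 3.8+).
sitemaps = parser.site_maps()
```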

`/sitemap.xml` - Site Structure

Purpose: Lists all pages on the AI site so crawlers can discover and index everything. Updated whenever new pages are added.

Includes:

Homepage
All Q&A pages
All markdown replica pages
All product llms files (/llms/{slug}.txt)
All AI articles (added by cron)
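
For illustration, a sitemap covering these page types could be produced with the standard library as sketched below. The `build_sitemap` helper, the sample paths, and the choice of base URL are assumptions; the page does not specify how the real file is built.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, paths: list[str]) -> str:
    """Return sitemap XML listing base_url + path for every path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap(
    "https://customer-domain.com",  # placeholder domain, as in robots.txt above
    ["/", "/what-is-acme-corp/", "/about/", "/llms/widget-pro.txt"],
)
```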

`/{indexnow-key}.txt` - IndexNow Verification

Purpose: Verification file for the IndexNow protocol, which lets the site notify Bing, Yandex, and other participating search engines as soon as content changes.
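
The sketch below shows the general shape of an IndexNow submission: a JSON body with `host`, `key`, `keyLocation`, and `urlList`, which would be POSTed to an IndexNow endpoint. The helper name, the sample key, and the host are made up for this example, and the request itself is deliberately not sent.

```python
from urllib.parse import urlsplit

# Shared IndexNow endpoint; participating search engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(key: str, urls: list[str]) -> dict:
    """Build the JSON body for an IndexNow submission (hypothetical helper)."""
    hosts = {urlsplit(u).netloc for u in urls}
    if len(hosts) != 1:
        raise ValueError("all URLs in one submission must share a single host")
    host = hosts.pop()
    return {
        "host": host,
        "key": key,
        # The key file sits at the site root, matching /{indexnow-key}.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }


payload = build_indexnow_payload(
    "abc123",  # placeholder key
    [
        "https://ai-acme-corp.searchcompany.dev/llms.txt",
        "https://ai-acme-corp.searchcompany.dev/about/",
    ],
)
```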

Page Types

Homepage (`/`)

The homepage serves as a navigation hub with:

Business name as H1
Resources section - Links to llms.txt, data.json, robots.txt, sitemap.xml
FAQ sections - Organized by category, each question links to its dedicated page

Example:

<h1>Acme Corp</h1>

<h2>Resources</h2>
<ul>
  <li><a href="/llms.txt">llms.txt</a> - Primary AI-readable content</li>
  <li><a href="/data.json">data.json</a> - Schema.org structured data</li>
</ul>

<h2>Frequently Asked Questions</h2>
<h3>Acme Corp - About</h3>
<ul>
  <li><a href="/what-is-acme-corp">What is Acme Corp?</a></li>
  <li><a href="/how-to-contact-acme">How do I contact Acme?</a></li>
</ul>

Q&A Pages (`/{slug}/`)

Purpose: Each FAQ gets its own dedicated page at the root level for maximum SEO authority. AI search engines can link directly to specific answers.

Structure:

Meta title: “What is Acme Corp? | Acme Corp”
Meta description: Direct answer summary
Full detailed answer (2-3 paragraphs)
Proper Schema.org FAQPage markup

Why separate pages? AI search engines often cite specific URLs. Having dedicated pages for each question means they can link directly to the authoritative answer.
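
As a sketch of the metadata a single Q&A page might carry, the following builds the meta title in the format shown above and a Schema.org FAQPage object for one question. Both function names are hypothetical, and the exact markup the generator emits may differ.

```python
import json


def meta_title(question: str, business_name: str) -> str:
    """Format the page title as 'Question | Business'."""
    return f"{question} | {business_name}"


def faq_page_jsonld(question: str, answer: str) -> dict:
    """Build Schema.org FAQPage markup for a single question."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }


title = meta_title("What is Acme Corp?", "Acme Corp")
markup = faq_page_jsonld("What is Acme Corp?", "Acme Corp builds enterprise widgets.")
# The JSON-LD would typically be embedded in a <script type="application/ld+json"> tag.
script_body = json.dumps(markup, indent=2)
```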

Markdown Replica Pages (`/{original-path}/`)

Purpose: Copies of the real website’s pages, converted to clean, markdown-based HTML. This gives AI crawlers easy access to all your content.

Example:

Real site: https://acme.com/about → AI site: /about/
Real site: https://acme.com/products/widget-pro → AI site: /products/widget-pro/

What they contain:

The scraped markdown content from the original page
Proper meta tags and canonical URLs pointing to the real site
Clean, AI-readable formatting

Why replicas? AI search engines can struggle with complex JavaScript sites. The markdown replicas provide clean, easily-parsed versions of all your content.
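
The canonical mapping can be pictured with a small sketch: given the real site and a replica path, produce the canonical link tag that points back to the original page. The `canonical_link` helper is hypothetical, and dropping the trailing slash simply follows the `/about/` to `https://acme.com/about` example above.

```python
from html import escape
from urllib.parse import urljoin


def canonical_link(real_site: str, replica_path: str) -> str:
    """Return a canonical <link> tag pointing at the real page."""
    base = real_site.rstrip("/") + "/"
    # Strip slashes so /about/ on the AI site maps to /about on the real site.
    real_url = urljoin(base, replica_path.strip("/"))
    return f'<link rel="canonical" href="{escape(real_url, quote=True)}">'


about_tag = canonical_link("https://acme.com", "/about/")
product_tag = canonical_link("https://acme.com", "/products/widget-pro/")
```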

Product LLMs Files (`/llms/{product-slug}.txt`)

Purpose: Dedicated AI content files for each product. Keeps the main llms.txt small while providing detailed product information for AI queries.

What they contain:

Product name and one-sentence description
Detailed overview (what it does, who it’s for)
Key features list
Pricing information
Best use cases
Product-specific FAQs

Example structure:

# Widget Pro

> Enterprise-grade widget solution with advanced analytics and integrations.

_A product by Acme Corp_

## Overview

Widget Pro is designed for enterprise teams who need...

## Key Features

- **Real-time Analytics:** Track widget performance...
- **API Integrations:** Connect with 50+ tools...

## Pricing

Starting at $99/month. Enterprise plans available.

## Frequently Asked Questions - Widget Pro

**Q: What is Widget Pro?**
A: Widget Pro is Acme Corp's flagship product...
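
Since the main llms.txt links to these files, a small sketch of the path convention may help. The slug rule and the markdown link format below are assumptions for illustration; only the `/llms/{product-slug}.txt` pattern comes from this page.

```python
import re


def product_llms_path(product_name: str) -> str:
    """Map a product name to its /llms/{product-slug}.txt path."""
    slug = re.sub(r"[^a-z0-9]+", "-", product_name.lower()).strip("-")
    return f"/llms/{slug}.txt"


def product_links(product_names: list[str]) -> str:
    """Render a markdown list of links to product llms files (assumed format)."""
    return "\n".join(
        f"- [{name}]({product_llms_path(name)})" for name in product_names
    )


links = product_links(["Teddy Bear", "Premium Widget"])
```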

AI Articles (`/{slug}/`)

Purpose: SEO-optimized content pages generated by the daily cron job to improve AI discoverability. These are NOT replicas - they’re new content.

Added by: Daily cron job (Batch 2a)

Weekly target: 100 pages per week

50 pages about the business
50 pages distributed across products

Example titles:

“Expert Review of Acme Corp’s Widget Solutions”
“Deep Dive into Widget Pro Features”
“How Acme Corp Compares to Competitors”

Why AI articles? They provide additional entry points for AI search engines to discover and recommend your business for relevant queries.
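
To illustrate the weekly target, the sketch below splits the 50 product pages across a list of products. An even split with the remainder going to the first products is an assumption; the page does not say how the pages are actually distributed.

```python
BUSINESS_PAGES_PER_WEEK = 50
PRODUCT_PAGES_PER_WEEK = 50


def split_product_pages(products: list[str],
                        total: int = PRODUCT_PAGES_PER_WEEK) -> dict[str, int]:
    """Spread `total` pages across products as evenly as possible."""
    if not products:
        return {}
    base, extra = divmod(total, len(products))
    # The first `extra` products each receive one additional page.
    return {
        name: base + (1 if index < extra else 0)
        for index, name in enumerate(products)
    }


plan = split_product_pages(["Widget Pro", "Widget Lite", "Widget Mini"])
weekly_total = BUSINESS_PAGES_PER_WEEK + sum(plan.values())
```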

How Files Are Generated

| File | Generated By | When |
| --- | --- | --- |
| `llms.txt` | Gemini 3 Flash | Onboarding |
| `data.json` | Gemini 3 Flash | Onboarding |
| `robots.txt` | Static template | Onboarding |
| `sitemap.xml` | Static template | Onboarding + Updates |
| Homepage | Gemini 3 Flash | Onboarding |
| Q&A pages | Gemini 3 Flash | Onboarding |
| Markdown replicas | Scraped content | Onboarding |
| Product llms | Gemini 3 Flash | Onboarding + Cron |
| AI articles | Gemini 3 Flash | Daily cron |

Update Flow

Onboarding: All core files are generated and deployed
Daily Cron (Batch 1a): Detects changes on real site, updates llms.txt, Q&A pages, and replicas
Daily Cron (Batch 1b): Discovers new products, generates product llms files
Daily Cron (Batch 2a): Creates new AI articles, updates timestamps on all pages
Daily Cron (Batch 3): Notifies search engines of all changes via IndexNow

All files use the customer’s real domain as the canonical URL, so search engines attribute the content to the original site, not the AI subdomain.
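
The change-detection step in Batch 1a is not described in detail here. One plausible approach, shown purely as an assumption, is to store a hash of each page's scraped markdown and compare it on the next run; both helper names are hypothetical.

```python
import hashlib


def content_hash(markdown: str) -> str:
    """Hash the scraped markdown, ignoring leading and trailing whitespace."""
    return hashlib.sha256(markdown.strip().encode("utf-8")).hexdigest()


def changed_paths(previous_hashes: dict[str, str],
                  current_pages: dict[str, str]) -> list[str]:
    """Return paths whose content is new or differs from the stored hash."""
    return sorted(
        path
        for path, markdown in current_pages.items()
        if previous_hashes.get(path) != content_hash(markdown)
    )


stored = {"/about": content_hash("About Acme"), "/pricing": content_hash("Old pricing")}
scraped = {"/about": "About Acme\n", "/pricing": "New pricing", "/careers": "We are hiring"}
to_update = changed_paths(stored, scraped)
```

Only the paths returned by `changed_paths` would then need regeneration and an IndexNow notification.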
