Site Structure
Core Files
/llms.txt - Primary AI Content
Purpose: The main file AI search engines read to understand your business.
Think of it as a comprehensive "about us" document written specifically for
AI.
- Business name and one-sentence description
- Comprehensive overview (2-3 paragraphs)
- Products/services with brief descriptions
- Key details (location, contact, hours, etc.)
- 5-10 FAQs about the business with detailed answers
- Links to product-specific llms files
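As a rough illustration, the contents listed above could be assembled into a single text file along the following lines. This is a minimal sketch, not the actual generator: the function name `build_llms_txt`, the input shapes, and the markdown-style layout (heading, blockquote summary, sections) are all assumptions made for the example.

```python
def build_llms_txt(name, summary, overview, products, faqs):
    """Assemble an llms.txt body from business data (hypothetical layout)."""
    lines = [f"# {name}", "", f"> {summary}", "", overview, "", "## Products"]
    for product in products:
        # Each product entry links to its dedicated /llms/{slug}.txt file.
        lines.append(
            f"- [{product['name']}](/llms/{product['slug']}.txt): "
            f"{product['description']}"
        )
    lines += ["", "## FAQ"]
    for question, answer in faqs:
        lines += ["", f"### {question}", answer]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Acme Corp",
        "Acme Corp makes widgets.",
        "Acme Corp designs and sells widgets for businesses.",
        [{"name": "Widget Pro", "slug": "widget-pro",
          "description": "Flagship widget"}],
        [("What is Acme Corp?", "Acme Corp is a widget maker.")],
    ))
```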
/data.json - Schema.org Structured Data
Purpose: Machine-readable structured data that helps AI and search engines
understand the business type, offerings, and key information.
- @type - The most specific Schema.org type (Organization, LocalBusiness, SoftwareApplication, etc.)
- Business name, description, URL
- Contact information
- Location/address (if applicable)
- Products/services catalog
- Social media links
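The fields above map naturally onto Schema.org JSON-LD. The sketch below shows one possible shape for data.json, assuming a LocalBusiness with a postal address and a small product catalog. The helper name `build_data_json` and the input dictionary keys are hypothetical; the Schema.org property names (`sameAs`, `hasOfferCatalog`, `PostalAddress`, and so on) are standard vocabulary.

```python
import json


def build_data_json(business):
    """Serialize business details as Schema.org JSON-LD (illustrative shape)."""
    data = {
        "@context": "https://schema.org",
        "@type": business["type"],  # the most specific applicable type
        "name": business["name"],
        "description": business["description"],
        "url": business["url"],
        "telephone": business["phone"],
        "address": {"@type": "PostalAddress", **business["address"]},
        "sameAs": business["social_links"],  # social media profile URLs
        "hasOfferCatalog": {
            "@type": "OfferCatalog",
            "name": "Products",
            "itemListElement": [
                {
                    "@type": "Offer",
                    "itemOffered": {
                        "@type": "Product",
                        "name": product["name"],
                        "description": product["description"],
                    },
                }
                for product in business["products"]
            ],
        },
    }
    return json.dumps(data, indent=2)
```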
/robots.txt - Crawler Permissions
Purpose: Tells web crawlers (including AI bots) they're allowed to access
all content, and points them to the sitemap.
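A permissive robots.txt of this kind is short enough to produce from a template. The sketch below is one way to do it, assuming a single wildcard rule for all crawlers; the function name and the `ai.acme.com` host used in the usage line are placeholders, not the real template.

```python
def build_robots_txt(sitemap_url):
    """Return a robots.txt that allows every crawler and names the sitemap."""
    return (
        "User-agent: *\n"   # applies to all crawlers, AI bots included
        "Allow: /\n"        # nothing is disallowed
        "\n"
        f"Sitemap: {sitemap_url}\n"
    )


if __name__ == "__main__":
    # Hypothetical AI-site host, used only for illustration.
    print(build_robots_txt("https://ai.acme.com/sitemap.xml"))
```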
/sitemap.xml - Site Structure
Purpose: Lists all pages on the AI site so crawlers can discover and index
everything. Updated whenever new pages are added.
- Homepage
- All Q&A pages
- All markdown replica pages
- All product llms files (/llms/{slug}.txt)
- All AI articles (added by cron)
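To make the sitemap contents concrete, here is a minimal sketch that turns a list of site paths into a sitemap.xml document using the standard sitemaps.org namespace. The function name, the `ai.acme.com` base URL, and the choice to stamp every entry with the same `lastmod` date are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths, lastmod):
    """Build a sitemap.xml string with one <url> entry per path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2025-01-31
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(
        "https://ai.acme.com",  # hypothetical AI-site host
        ["/", "/what-is-acme-corp/", "/about/", "/llms/widget-pro.txt"],
        "2025-01-31",
    ))
```

Regenerating the whole file from the full path list on each update is the simplest approach; appending entries in place would also work but needs more care to avoid duplicates.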
/{indexnow-key}.txt - IndexNow Verification
Purpose: Verification file for IndexNow protocol, which enables instant
notification to Bing, Yandex, and other search engines when content changes.
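Under the IndexNow protocol, the key file at /{indexnow-key}.txt contains the key itself, and changed URLs are submitted as a JSON POST that names the host, the key, and the key file location. The sketch below only builds such a request and does not send it; the function name, host, and key value are placeholders, and the shared `api.indexnow.org` endpoint is used on the assumption that no engine-specific endpoint is required.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host, key, urls):
    """Build (but do not send) an IndexNow bulk-submission request."""
    payload = {
        "host": host,
        "key": key,
        # The verification file served at /{indexnow-key}.txt
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_indexnow_request(
        "ai.acme.com",       # hypothetical host
        "0123456789abcdef",  # hypothetical key
        ["https://ai.acme.com/about/"],
    )
    # Sending would be: urllib.request.urlopen(request)
    print(request.full_url, request.data.decode("utf-8"))
```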
Page Types
Homepage (/)
The homepage serves as a navigation hub with:
- Business name as H1
- Resources section - Links to llms.txt, data.json, robots.txt, sitemap.xml
- FAQ sections - Organized by category, each question links to its dedicated page
Q&A Pages (/{slug}/)
Purpose: Each FAQ gets its own dedicated page at the root level for
maximum SEO authority. AI search engines can link directly to specific
answers.
- Meta title: "What is Acme Corp? | Acme Corp"
- Meta description: Direct answer summary
- Full detailed answer (2-3 paragraphs)
- Proper Schema.org FAQPage markup
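For reference, FAQPage markup is usually embedded as a JSON-LD script tag containing one Question with an acceptedAnswer. The following sketch shows that shape for a single Q&A page. The helper name `faq_jsonld` is hypothetical, and the exact markup the generated pages emit may differ.

```python
import json


def faq_jsonld(question, answer):
    """Return a JSON-LD <script> tag with FAQPage markup for one question."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }
    # Escape "</" so answer text cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


if __name__ == "__main__":
    print(faq_jsonld("What is Acme Corp?", "Acme Corp is a widget maker."))
```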
Markdown Replica Pages (/{original-path}/)
Purpose: Exact copies of the real website's pages, converted to clean
markdown HTML. This gives AI crawlers easy access to all your content.
- Real site: https://acme.com/about → AI site: /about/
- Real site: https://acme.com/products/widget-pro → AI site: /products/widget-pro/
- The scraped markdown content from the original page
- Proper meta tags and canonical URLs pointing to the real site
- Clean, AI-readable formatting
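The path mapping and canonical tag described above can be sketched as two small helpers. Both function names are hypothetical, and the trailing-slash normalization is inferred from the two example mappings rather than taken from the actual implementation.

```python
from html import escape
from urllib.parse import urlsplit


def replica_path(real_url):
    """Map a real-site URL to its AI-site path, always ending in a slash."""
    path = urlsplit(real_url).path or "/"
    return path if path.endswith("/") else path + "/"


def canonical_tag(real_url):
    """Return a canonical link tag pointing back at the real site."""
    return f'<link rel="canonical" href="{escape(real_url, quote=True)}">'


if __name__ == "__main__":
    url = "https://acme.com/products/widget-pro"
    print(replica_path(url))
    print(canonical_tag(url))
```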
Product LLMs Files (/llms/{product-slug}.txt)
Purpose: Dedicated AI content files for each product. Keeps the main
llms.txt small while providing detailed product information for AI queries.
- Product name and one-sentence description
- Detailed overview (what it does, who it's for)
- Key features list
- Pricing information
- Best use cases
- Product-specific FAQs
AI Articles (/{slug}/)
Purpose: SEO-optimized content pages generated by the daily cron job to
improve AI discoverability. These are NOT replicas - they're new content.
- 50 pages about the business
- 50 pages distributed across products
- "Expert Review of Acme Corp's Widget Solutions"
- "Deep Dive into Widget Pro Features"
- "How Acme Corp Compares to Competitors"
How Files Are Generated
| File | Generated By | When |
|---|---|---|
| llms.txt | Gemini 3 Flash | Onboarding |
| data.json | Gemini 3 Flash | Onboarding |
| robots.txt | Static template | Onboarding |
| sitemap.xml | Static template | Onboarding + Updates |
| Homepage | Gemini 3 Flash | Onboarding |
| Q&A pages | Gemini 3 Flash | Onboarding |
| Markdown replicas | Scraped content | Onboarding |
| Product llms | Gemini 3 Flash | Onboarding + Cron |
| AI articles | Gemini 3 Flash | Daily cron |
Update Flow
- Onboarding: All core files are generated and deployed
- Daily Cron (Batch 1a): Detects changes on real site, updates llms.txt, Q&A pages, and replicas
- Daily Cron (Batch 1b): Discovers new products, generates product llms files
- Daily Cron (Batch 2a): Creates new AI articles, updates timestamps on all pages
- Daily Cron (Batch 3): Notifies search engines of all changes via IndexNow
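The ordering matters because the final batch needs to know about every URL touched earlier in the run. A minimal sketch of that idea follows, assuming each batch is a callable that receives the URLs changed so far and returns the ones it changed itself. The runner, the step signature, and the example URLs are all hypothetical; the real cron job is not described at this level of detail.

```python
def run_daily_cron(steps):
    """Run (label, step) pairs in order, accumulating changed URLs."""
    changed = []
    for label, step in steps:
        # Each step sees a copy of everything changed so far and
        # returns the URLs it changed (possibly none).
        changed.extend(step(list(changed)))
    return changed


if __name__ == "__main__":
    notified = []

    def notify(urls):
        notified.extend(urls)  # stand-in for the IndexNow submission
        return []

    steps = [
        ("1a", lambda seen: ["/llms.txt", "/about/"]),
        ("1b", lambda seen: ["/llms/widget-pro.txt"]),
        ("2a", lambda seen: ["/expert-review/"]),
        ("3", notify),
    ]
    run_daily_cron(steps)
    print(notified)
```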
All files use the customer's real domain as the canonical URL, so search
engines attribute the content to the original site, not the AI subdomain.