Site Structure
Core Files
/llms.txt - Primary AI Content
Purpose: The main file AI search engines read to understand your business.
Think of it as a comprehensive "about us" document written specifically for
AI.
- Business name and one-sentence description
- Comprehensive overview (2-3 paragraphs)
- Products/services with brief descriptions
- Key details (location, contact, hours, etc.)
- 5-10 FAQs about the business with detailed answers
- Links to product-specific llms files
/data.json - Schema.org Structured Data
Purpose: Machine-readable structured data that helps AI and search engines
understand the business type, offerings, and key information.
- @type - The most specific Schema.org type (Organization, LocalBusiness, SoftwareApplication, etc.)
- Business name, description, URL
- Contact information
- Location/address (if applicable)
- Products/services catalog
- Social media links
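As an illustration, the fields above could be assembled into Schema.org JSON-LD roughly as follows. This is a minimal sketch: the helper name `build_structured_data`, the sample values, and the social URL are assumptions for illustration, not the actual generator output.

```python
import json


def build_structured_data(name, description, url,
                          schema_type="Organization", same_as=None):
    """Return a Schema.org JSON-LD dict for a business (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": schema_type,  # use the most specific type that applies
        "name": name,
        "description": description,
        "url": url,
    }
    if same_as:
        # Social media profiles are conventionally listed under "sameAs".
        data["sameAs"] = list(same_as)
    return data


if __name__ == "__main__":
    doc = build_structured_data(
        "Acme Corp",
        "Maker of industrial widgets.",        # sample value, assumed
        "https://acme.com",
        schema_type="LocalBusiness",
        same_as=["https://social.example/acme"],  # placeholder URL
    )
    print(json.dumps(doc, indent=2))
```

Contact details, address, and a product catalog would be added as further keys in the same dictionary.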
/robots.txt - Crawler Permissions
Purpose: Tells web crawlers (including AI bots) they're allowed to access
all content, and points them to the sitemap.
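A minimal sketch of what such a file could contain, assuming a fully permissive policy for every user agent. The function name and the `ai.acme.com` hostname are hypothetical; the real static template may differ.

```python
def build_robots_txt(sitemap_url):
    """Render a permissive robots.txt that also advertises the sitemap."""
    lines = [
        "User-agent: *",   # applies to every crawler, AI bots included
        "Allow: /",        # nothing is blocked
        "",
        f"Sitemap: {sitemap_url}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hostname is an assumption for illustration.
    print(build_robots_txt("https://ai.acme.com/sitemap.xml"), end="")
```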
/sitemap.xml - Site Structure
Purpose: Lists all pages on the AI site so crawlers can discover and index
everything. Updated whenever new pages are added.
- Homepage
- All Q&A pages
- All markdown replica pages
- All product llms files (/llms/{slug}.txt)
- All AI articles (added by cron)
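The sitemap described above can be sketched with the standard library. This is an illustrative example only: `build_sitemap`, the sample URLs, and the `ai.acme.com` hostname are assumptions, and the real template may include additional fields.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, lastmod=None):
    """Return sitemap XML listing each URL, with an optional shared lastmod date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
        if lastmod:
            ET.SubElement(url_el, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration by hand so it does not depend on locale.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    pages = [
        "https://ai.acme.com/",                      # homepage
        "https://ai.acme.com/what-is-acme-corp/",    # a Q&A page (sample slug)
        "https://ai.acme.com/llms/widget-pro.txt",   # a product llms file
    ]
    print(build_sitemap(pages, lastmod="2025-01-01"))
```

Regenerating the whole file from the full page list whenever a page is added keeps the sitemap consistent with what is deployed.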
/{indexnow-key}.txt - IndexNow Verification
Purpose: Verification file for IndexNow protocol, which enables instant
notification to Bing, Yandex, and other search engines when content changes.
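For context, an IndexNow submission is a JSON POST that names the host, the key, the key file location, and the changed URLs. The sketch below builds that payload and shows one way to send it with the standard library. The helper names, the sample key, and the hostname are assumptions, and the shared endpoint `api.indexnow.org` is used here only as an example target.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(host, key, urls):
    """Build the JSON body for an IndexNow bulk submission."""
    return {
        "host": host,
        "key": key,
        # The verification file served at /{indexnow-key}.txt
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }


def submit(payload, endpoint=INDEXNOW_ENDPOINT):
    """POST the payload and return the HTTP status code (performs a network call)."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Sample key and hostname are placeholders, not real credentials.
    payload = build_indexnow_payload(
        "ai.acme.com", "abc123", ["https://ai.acme.com/what-is-acme-corp/"]
    )
    print(json.dumps(payload, indent=2))
```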
Page Types
Homepage (/)
The homepage serves as a navigation hub with:
- Business name as H1
- Resources section - Links to llms.txt, data.json, robots.txt, sitemap.xml
- FAQ sections - Organized by category, each question links to its dedicated page
Q&A Pages (/{slug}/)
Purpose: Each FAQ gets its own dedicated page at the root level for
maximum SEO authority. AI search engines can link directly to specific
answers.
- Meta title: "What is Acme Corp? | Acme Corp"
- Meta description: Direct answer summary
- Full detailed answer (2-3 paragraphs)
- Proper Schema.org FAQPage markup
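The FAQPage markup for a single Q&A page could look roughly like the output of the sketch below. The helper name and the sample answer text are assumptions; the structure (`FAQPage` containing `Question` with an `acceptedAnswer`) follows the Schema.org vocabulary.

```python
import json


def build_faq_jsonld(question, answer):
    """Return Schema.org FAQPage JSON-LD for one question/answer pair."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }


if __name__ == "__main__":
    markup = build_faq_jsonld(
        "What is Acme Corp?",
        "Acme Corp is a maker of industrial widgets.",  # sample text, assumed
    )
    # Typically embedded in the page inside a JSON-LD script tag.
    print('<script type="application/ld+json">')
    print(json.dumps(markup, indent=2))
    print("</script>")
```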
Markdown Replica Pages (/{original-path}/)
Purpose: Exact copies of the real website's pages, converted to clean
markdown HTML. This gives AI crawlers easy access to all your content.
- Real site: https://acme.com/about → AI site: /about/
- Real site: https://acme.com/products/widget-pro → AI site: /products/widget-pro/
- The scraped markdown content from the original page
- Proper meta tags and canonical URLs pointing to the real site
- Clean, AI-readable formatting
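The path mapping and canonical tag can be sketched as follows. This is a hypothetical illustration: `replica_path` and `canonical_tag` are made-up helper names, and the rule of appending a trailing slash is inferred from the examples above rather than taken from the actual implementation.

```python
import html
from urllib.parse import urlsplit


def replica_path(real_url):
    """Map a real-site URL to the replica path on the AI site (trailing slash added)."""
    path = urlsplit(real_url).path
    if not path.endswith("/"):
        path += "/"
    return path


def canonical_tag(real_url):
    """Return a canonical link element pointing back at the real site."""
    return f'<link rel="canonical" href="{html.escape(real_url, quote=True)}">'


if __name__ == "__main__":
    for url in ("https://acme.com/about",
                "https://acme.com/products/widget-pro"):
        print(replica_path(url), canonical_tag(url))
```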
Product LLMs Files (/llms/{product-slug}.txt)
Purpose: Dedicated AI content files for each product. Keeps the main
llms.txt small while providing detailed product information for AI queries.
- Product name and one-sentence description
- Detailed overview (what it does, who it's for)
- Key features list
- Pricing information
- Best use cases
- Product-specific FAQs
AI Articles (/{slug}/)
Purpose: SEO-optimized content pages generated by the daily cron job to
improve AI discoverability. These are NOT replicas - they're new content.
- 50 pages about the business
- 50 pages distributed across products
- "Expert Review of Acme Corp's Widget Solutions"
- "Deep Dive into Widget Pro Features"
- "How Acme Corp Compares to Competitors"
How Files Are Generated
| File | Generated By | When |
|---|---|---|
| llms.txt | Gemini 3 Flash | Onboarding |
| data.json | Gemini 3 Flash | Onboarding |
| robots.txt | Static template | Onboarding |
| sitemap.xml | Static template | Onboarding + Updates |
| Homepage | Gemini 3 Flash | Onboarding |
| Q&A pages | Gemini 3 Flash | Onboarding |
| Markdown replicas | Scraped content | Onboarding |
| Product llms | Gemini 3 Flash | Onboarding + Cron |
| AI articles | Gemini 3 Flash | Daily cron |
Update Flow
- Onboarding: All core files are generated and deployed
- Daily Cron (Batch 1a): Detects changes on the real site and updates llms.txt, Q&A pages, and replicas
- Daily Cron (Batch 1b): Discovers new products, generates product llms files
- Daily Cron (Batch 2a): Creates new AI articles, updates timestamps on all pages
- Daily Cron (Batch 3): Notifies search engines of all changes via IndexNow
All files use the customer's real domain as the canonical URL, so search
engines attribute the content to the original site, not the AI subdomain.