
# Domain Overview

> Connect your domain to enable AI-optimized content delivery

The Domain API connects customer domains to our CloudFront proxy. This enables path-based routing: AI content is served under `/ai/*` (plus a few root-level files such as `/llms.txt`), while the rest of the site is served unchanged from the customer's store.

## Services Used

| Service                    | Purpose                                                              |
| -------------------------- | -------------------------------------------------------------------- |
| **AWS CloudFront**         | CDN and edge proxy - routes traffic based on URL path                |
| **AWS Lambda@Edge**        | Runs at the CloudFront edge to route each path to the correct origin |
| **AWS ACM**                | SSL certificate management (stores Let's Encrypt + native certs)     |
| **AWS Global Accelerator** | Static IPs for apex domain support (A records)                       |
| **Let's Encrypt**          | Initial SSL certificates (bypasses CAA restrictions)                 |
| **Entri**                  | One-click DNS configuration for customers (no manual record editing) |

## Architecture

```
                    ┌─────────────────────────────────────────────┐
                    │              CloudFront Edge                │
                    │         (Lambda@Edge at origin-request)     │
                    └─────────────────────────────────────────────┘
                                        │
                    ┌───────────────────┴───────────────────┐
                    │                                       │
              AI Paths                              Everything Else
         /ai/*, /llms.txt,                         /, /products/*,
      /robots.txt, /sitemap.xml                   /collections/*, etc.
                    │                                       │
                    ▼                                       ▼
     ┌────────────────────────────┐               ┌───────────────────┐
     │      AI Site (Vercel)      │               │   Shopify Store   │
     │ org-slug.searchcompany.dev │               │ (customer's host) │
     └────────────────────────────┘               └───────────────────┘
```

## Path-Based Routing (No UA Cloaking)

We use **path-based routing**, NOT User-Agent based cloaking. This is safer and more transparent:

| Path                  | Destination      | Content                              |
| --------------------- | ---------------- | ------------------------------------ |
| `/ai/*`               | AI Site (Vercel) | Markdown mirrors + FAQ pages         |
| `/llms.txt`           | AI Site (Vercel) | AI discovery file                    |
| `/robots.txt`         | AI Site (Vercel) | Our robots.txt (overrides Shopify)   |
| `/sitemap.xml`        | AI Site (Vercel) | Unified sitemap (Shopify + AI pages) |
| `/search-company.txt` | AI Site (Vercel) | IndexNow verification key            |
| Everything else       | Shopify          | Original store content               |

### Why Path-Based?

* **No cloaking risk**: bots and humans get the same content at the same URL
* **Transparent**: humans can open `/ai/*` pages too; they just aren't surfaced in the storefront, so most visitors never come across them
* **SEO safe**: clear separation between Shopify pages and AI pages
* **Canonical tags**: `/ai/*` pages set their canonical URL to the corresponding Shopify page
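To illustrate the canonical-tag rule, here is a minimal sketch of how an `/ai/*` page could compute its canonical URL. The helper names, the assumption that `/ai/products/*` and `/ai/collections/*` mirror Shopify paths one-to-one, and the choice to keep FAQ pages (which have no Shopify counterpart) self-canonical are illustrative assumptions, not the production rules.

```javascript
// Hypothetical helper: compute the canonical URL for an /ai/* page.
// Assumption: /ai/products/* and /ai/collections/* mirror Shopify paths 1:1,
// while FAQ pages (no Shopify equivalent) stay self-canonical.
const MIRRORED_PREFIXES = ["/ai/products/", "/ai/collections/"];

function canonicalFor(origin, aiPath) {
  const mirrored = MIRRORED_PREFIXES.some((p) => aiPath.startsWith(p));
  // Drop the leading "/ai" so /ai/products/x becomes /products/x
  const path = mirrored ? aiPath.slice("/ai".length) : aiPath;
  return new URL(path, origin).toString();
}

// The <link> tag an AI page would place in its <head>
function canonicalLinkTag(origin, aiPath) {
  return `<link rel="canonical" href="${canonicalFor(origin, aiPath)}">`;
}
```

For example, `/ai/products/blue-shirt` on `https://www.example.com` would declare `https://www.example.com/products/blue-shirt` as its canonical URL.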

## Control File Takeover

We **override** these Shopify files at the edge:

| File           | What We Serve                            | Why                                       |
| -------------- | ---------------------------------------- | ----------------------------------------- |
| `/robots.txt`  | Our robots.txt with `Sitemap:` directive | Single source of crawler directives       |
| `/sitemap.xml` | Unified sitemap with ALL URLs            | One sitemap containing Shopify + AI pages |
| `/llms.txt`    | AI discovery file                        | Entry point for AI crawlers               |
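A robots.txt carrying the `Sitemap:` directive takes only a few lines to generate. This is a sketch under assumptions: the function name, the allow-all default, and the optional `disallow` list are hypothetical, since the real file's rules are not documented here. `Sitemap:` itself is a widely supported robots.txt extension that takes an absolute URL.

```javascript
// Hypothetical generator for the robots.txt served at the edge.
// Assumption: allow all crawlers, optionally disallow some paths, and
// point every crawler at the single unified sitemap.
function buildRobotsTxt(origin, disallow = []) {
  const lines = ["User-agent: *"];
  for (const path of disallow) lines.push(`Disallow: ${path}`);
  // Sitemap must be an absolute URL on the customer's own domain
  lines.push("Allow: /", "", `Sitemap: ${new URL("/sitemap.xml", origin)}`);
  return lines.join("\n") + "\n";
}
```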

### Unified Sitemap Strategy

Our `/sitemap.xml` contains **all URLs** in a single file:

1. **Shopify URLs** - Fetched during onboarding and stored in `ai_sites.shopify_sitemap_urls`
2. **AI pages** - `/ai/products/*`, `/ai/collections/*`, `/ai/<faq-slug>`
3. **AI files** - `/llms.txt`, `/llms/*.txt`

This ensures:

* No confusion from multiple sitemaps
* Shopify indexing is preserved
* AI pages are discoverable
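The merge described above can be sketched as follows. It assumes `shopifyUrls` holds absolute URLs (as loaded from `ai_sites.shopify_sitemap_urls`) and `aiPaths` holds site-relative paths for AI pages and files; the function names are hypothetical. The output follows the sitemaps.org `urlset` format.

```javascript
// Hypothetical sketch of the unified sitemap: Shopify URLs plus AI pages
// and AI files, written into a single sitemaps.org <urlset>.
const SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

// Escape the five XML special characters for use inside <loc>
function xmlEscape(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildUnifiedSitemap(origin, shopifyUrls, aiPaths) {
  // AI pages/files are paths; resolve them against the customer domain
  const aiUrls = aiPaths.map((p) => new URL(p, origin).toString());
  // De-duplicate while keeping first-seen order (Shopify URLs first)
  const all = [...new Set([...shopifyUrls, ...aiUrls])];
  const entries = all.map((u) => `  <url><loc>${xmlEscape(u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<urlset xmlns="${SITEMAP_NS}">`,
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

One consequence worth knowing: the sitemaps.org protocol caps a single sitemap file at 50,000 URLs and 50 MB uncompressed, so a very large catalog would need a sitemap index instead. This sketch does not handle that case.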

## Connection Flow

The domain connection is a **two-part process** designed for zero downtime:

### Part 1: SSL Certificate + Google TXT

1. User clicks "Start Secure Connection"
2. We request a Let's Encrypt certificate via a DNS-01 challenge
3. We also request a Google verification token
4. User adds BOTH TXT records via Entri (one-click DNS):
   * SSL validation TXT record
   * Google verification TXT record
5. Let's Encrypt validates the challenge and issues the certificate
6. The certificate is imported to ACM and attached to CloudFront
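For reference, the SSL validation TXT record is an ACME DNS-01 record (RFC 8555, section 8.4): it lives at `_acme-challenge.<domain>`, and its value is the unpadded base64url SHA-256 digest of the key authorization (the challenge token, a dot, and the base64url JWK thumbprint of the ACME account key). Below is a minimal sketch using Node's `crypto` module. It assumes the token and thumbprint come from an ACME client; the function name is hypothetical, and wildcard names are not handled.

```javascript
const crypto = require("crypto");

// Hypothetical helper: build the DNS-01 TXT record for one hostname.
function dns01TxtRecord(domain, token, accountKeyThumbprint) {
  // Key authorization = token "." base64url(JWK thumbprint) (RFC 8555, 8.1)
  const keyAuthorization = `${token}.${accountKeyThumbprint}`;
  // TXT value = base64url(SHA-256(key authorization)), no padding (RFC 8555, 8.4)
  const value = crypto
    .createHash("sha256")
    .update(keyAuthorization)
    .digest("base64url"); // Node >= 15.7; base64url output is unpadded
  return { type: "TXT", name: `_acme-challenge.${domain}`, value };
}
```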

### Part 2: Connect Domain + Verify Google

1. The SSL certificate is ready on CloudFront
2. User clicks "Point Domain"
3. User configures DNS via Entri (one-click):
   * **CNAME `www`** → CloudFront distribution
   * **A `@`** → Our gateway IPs (AWS Global Accelerator static IPs, for the naked domain)
4. Traffic now flows through our proxy for both `www` and the naked domain
5. Background tasks trigger:
   * ACM-native certificate upgrade (auto-renewal)
   * **Google verification polling** (every 10 s, up to 12 attempts: about 2 min)
   * Once Google verification succeeds → IndexNow + Google Search Console (GSC) sitemap submission
   * If Google verification fails → IndexNow only (GSC submission is skipped)
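
The background polling and submission logic can be sketched as a small async loop. The callbacks `checkVerified`, `submitIndexNow`, and `submitGscSitemap` are hypothetical placeholders for the real integrations; only the control flow (10 s interval, 12 attempts, IndexNow always, GSC only after verification) comes from the steps above. The returned string matches the `google_verification_status` values in the database schema.

```javascript
// Hypothetical sketch of the post-connection background task.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollGoogleVerification(checkVerified, { intervalMs = 10_000, maxAttempts = 12 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await checkVerified()) return true;
    // No need to wait after the final failed attempt
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return false;
}

async function afterDomainConnected({ checkVerified, submitIndexNow, submitGscSitemap }, opts) {
  const verified = await pollGoogleVerification(checkVerified, opts);
  await submitIndexNow(); // IndexNow runs whether or not Google verified
  if (verified) await submitGscSitemap(); // GSC only after verification
  return verified ? "VERIFIED" : "FAILED";
}
```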

<Note>
  **Both www and naked domain are configured together.** Whether the user enters `www.example.com` or `example.com` as their business URL, Part 2 sets up DNS records for both to ensure the full domain works.
</Note>
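
A minimal sketch of how both hostnames could be derived from whatever the user entered, and how the Part 2 record set follows from them. The `{ type, host, value }` record shape is a generic illustration, not Entri's actual payload. The apex derivation simply strips a leading `www.`, so an input such as `shop.example.com` would need different handling. The IPs used below are documentation placeholders (RFC 5737 TEST-NET), not real gateway addresses.

```javascript
// Hypothetical sketch: normalize the business URL and build both records.
function part2DnsRecords(businessUrl, cloudfrontDomain, gatewayIps) {
  // Accept "example.com", "www.example.com" or a full URL
  const withScheme = businessUrl.includes("://") ? businessUrl : `https://${businessUrl}`;
  const host = new URL(withScheme).hostname; // lowercased, no path or port
  // Naive apex derivation: drop a leading "www."
  const apex = host.startsWith("www.") ? host.slice(4) : host;
  return [
    { type: "CNAME", host: `www.${apex}`, value: cloudfrontDomain },
    ...gatewayIps.map((ip) => ({ type: "A", host: apex, value: ip })),
  ];
}
```

Whether the input is `example.com` or `https://www.example.com/shop`, the result is the same: one `www` CNAME to CloudFront plus one apex A record per gateway IP.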

## Status Flow

```
PENDING_VALIDATION  →  SSL_VALIDATING  →  SSL_VALIDATED  →  DEPLOYED
      │                      │                  │              │
      │                      │                  │              │
  CloudFront             TXT record          Certificate    www CNAME
   created              added, waiting       attached to    points to
                        for Let's Encrypt    CloudFront     CloudFront
                                                               │
                                                               ▼
                                                         DISCONNECTED
                                                               │
                                                         (can reconnect
                                                          instantly)
```
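
The diagram can be read as a simple transition table. In this illustration the forward transitions come straight from the diagram, while the two exits from `DISCONNECTED` are assumptions based on the relink flow (reuse a still-valid certificate, or request a new one). The `transition` helper name is hypothetical.

```javascript
// Hypothetical transition table for ai_sites.proxy_status.
const TRANSITIONS = {
  PENDING_VALIDATION: ["SSL_VALIDATING"],
  SSL_VALIDATING: ["SSL_VALIDATED"],
  SSL_VALIDATED: ["DEPLOYED"],
  DEPLOYED: ["DISCONNECTED"],
  DISCONNECTED: ["SSL_VALIDATING", "DEPLOYED"], // assumption: relink paths
};

// Return the new status, or throw if the move isn't allowed
function transition(current, next) {
  const allowed = TRANSITIONS[current] ?? [];
  if (!allowed.includes(next)) {
    throw new Error(`Invalid proxy_status transition: ${current} -> ${next}`);
  }
  return next;
}
```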

## Endpoints

| Endpoint                                                                                    | Purpose                                                                 |
| ------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| [Get Proxy](/api-reference/endpoint/domain/get-proxy)                                       | Get current proxy status and DNS records                                |
| [Start Certificate](/api-reference/endpoint/domain/start-certificate)                       | Begin Let's Encrypt certificate request                                 |
| [Complete Certificate](/api-reference/endpoint/domain/complete-certificate)                 | Finish certificate after TXT record added                               |
| [Start Google Verification](/api-reference/endpoint/domain/start-google-verification)       | Get Google TXT verification token (Step 1)                              |
| [Complete Google Verification](/api-reference/endpoint/domain/complete-google-verification) | Verify domain + add to Search Console (now backend-driven in Step 2)    |
| [Resubmit Sitemap](/api-reference/endpoint/domain/resubmit-sitemap)                         | Resubmit the sitemap after new AI pages are published (run by cron)     |
| [Mark Step Complete](/api-reference/endpoint/domain/mark-step-complete)                     | Update status after Entri success + trigger Google verification polling |
| [Verify DNS](/api-reference/endpoint/domain/verify-dns)                                     | Live DNS lookup verification                                            |
| [Disconnect Proxy](/api-reference/endpoint/domain/disconnect-proxy)                         | Get DNS records to restore original config                              |
| [Mark Disconnect Complete](/api-reference/endpoint/domain/mark-disconnect-complete)         | Update status after disconnect completes                                |
| [Update Lambda@Edge](/api-reference/endpoint/domain/update-lambda-edge)                     | Deploy new routing rules to all distributions                           |

<Note>
  **Setup CloudFront Proxy** is now part of the [Onboarding](/api-reference/endpoint/onboarding/index) flow - it runs automatically during `generate-all`.
</Note>

## Why Let's Encrypt + ACM?

We use a hybrid approach for SSL certificates:

**Initial: Let's Encrypt**

* Works with ANY DNS provider
* Bypasses CAA restrictions (e.g., domains hosted on Vercel/Netlify, whose CAA records block ACM)
* User adds one TXT record, done

**After Connection: ACM-Native**

* Once www points to CloudFront, ACM can validate the domain
* The upgrade to an ACM-native certificate happens automatically in the background
* ACM-native certificates renew automatically through ACM managed renewal
* User never has to touch DNS again

## Lambda@Edge Path-Based Routing

The Lambda@Edge function attached to the `origin-request` trigger routes each request based on its URL path:

```javascript
// Paths that route to the AI origin (Vercel)
const AI_PATHS = [
  "/ai/",               // All AI mirror pages
  "/llms.txt",          // AI discovery file
  "/robots.txt",        // We control this (overrides Shopify)
  "/sitemap.xml",       // Unified sitemap (overrides Shopify)
  "/search-company.txt" // IndexNow verification key
];

// Route to the AI origin if the path matches, else Shopify.
// `uri` is event.Records[0].cf.request.uri (path only, no query string).
function isAIPath(uri) {
  return AI_PATHS.some(p => uri === p || uri.startsWith(p));
}
```

| Path                       | Destination                  | Host Header     |
| -------------------------- | ---------------------------- | --------------- |
| `/ai/*`, `/llms.txt`, etc. | `org-slug.searchcompany.dev` | AI site host    |
| Everything else            | Customer's Shopify origin    | Original domain |
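
For a fuller picture of the origin swap, here is a hedged sketch of what an origin-request handler could do with a matching path: point `request.origin.custom.domainName` at the AI site and rewrite the `Host` header to match, which is the pattern AWS documents for changing the origin in an origin-request trigger. `AI_HOST` is a placeholder for however the real function determines the customer's AI site host (Lambda@Edge does not support environment variables). Unlike the prefix check in the snippet above, this sketch matches the root files exactly.

```javascript
// Hypothetical origin-request routing sketch; AI_HOST is a placeholder.
const AI_HOST = "org-slug.searchcompany.dev";
const AI_PREFIXES = ["/ai/"];
const AI_FILES = new Set(["/llms.txt", "/robots.txt", "/sitemap.xml", "/search-company.txt"]);

function routeRequest(request) {
  const uri = request.uri;
  const toAI = AI_FILES.has(uri) || AI_PREFIXES.some((p) => uri.startsWith(p));
  if (toAI && request.origin && request.origin.custom) {
    // Send the request to the AI site and make the Host header agree with it
    request.origin.custom.domainName = AI_HOST;
    request.headers["host"] = [{ key: "Host", value: AI_HOST }];
  }
  // Non-AI paths pass through unchanged to the Shopify origin
  return request;
}

// Lambda@Edge entry point (exported as `handler` in a deployed file)
async function handler(event) {
  return routeRequest(event.Records[0].cf.request);
}
```

Non-matching requests are returned untouched, so they continue to the Shopify origin with the original `Host` header.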

### Updating Routing Rules

When routing rules need to change:

```bash
# 1. Edit the Lambda code
vim src/app/apis/domain/setup_proxy/step_0_lambda_setup/lambda_code.js

# 2. Deploy to all distributions
curl -X POST https://api.searchcompany.ai/api/domain/update-lambda-edge \
  -H "Authorization: Bearer $TOKEN"
```

All customer distributions update automatically (about 15 min for 1,000 customers). There is no downtime: the old version keeps serving requests until the new one has propagated.
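
Rolling out a new version means pointing each distribution's `origin-request` association at a newly published Lambda version. The pure part of that step can be sketched as a transform over the `DistributionConfig` object returned by CloudFront's `GetDistributionConfig`. The function name is hypothetical, only the default cache behavior is handled, and the assumption that the router is attached there is ours. Lambda@Edge associations require a numbered version ARN, not `$LATEST` or an alias.

```javascript
// Hypothetical helper: swap the origin-request Lambda@Edge association.
function withOriginRequestLambda(distributionConfig, versionArn) {
  if (!/:\d+$/.test(versionArn)) {
    throw new Error("Lambda@Edge needs a numbered version ARN, e.g. ...:function:name:7");
  }
  // Deep copy so the caller's object is not mutated (plain JSON data)
  const config = JSON.parse(JSON.stringify(distributionConfig));
  const behavior = config.DefaultCacheBehavior;
  const old = behavior.LambdaFunctionAssociations?.Items ?? [];
  const existing = old.find((a) => a.EventType === "origin-request");
  // Keep any other associations (viewer-request etc.) as they were
  const items = old.filter((a) => a.EventType !== "origin-request");
  items.push({
    LambdaFunctionARN: versionArn,
    EventType: "origin-request",
    IncludeBody: existing?.IncludeBody ?? false,
  });
  behavior.LambdaFunctionAssociations = { Quantity: items.length, Items: items };
  return config;
}
```

The updated config would then go to `UpdateDistribution` together with the `ETag` from `GetDistributionConfig` as `IfMatch`. CloudFront propagates the change to its edge locations asynchronously.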

## Database Schema

The `ai_sites` table stores all proxy configuration:

| Column                        | Purpose                                            |
| ----------------------------- | -------------------------------------------------- |
| `entity_id`                   | Links to business entity                           |
| `cloudfront_distribution_id`  | CloudFront distribution ID                         |
| `cloudfront_domain`           | e.g., `d123abc.cloudfront.net`                     |
| `custom_domain`               | e.g., `www.example.com`                            |
| `origin_cname`                | Where www originally pointed                       |
| `original_www_cname`          | Preserved for disconnect                           |
| `certificate_arn`             | ACM certificate ARN                                |
| `certificate_type`            | `LETS_ENCRYPT` or `ACM_NATIVE`                     |
| `proxy_status`                | Current status                                     |
| `shopify_sitemap_urls`        | Shopify URLs for unified sitemap                   |
| `le_*`                        | Let's Encrypt temp data                            |
| `google_verification_token`   | TXT record value from Google Site Verification API |
| `google_verification_status`  | `PENDING`, `VERIFIED`, or `FAILED`                 |
| `google_sitemap_submitted_at` | Last sitemap submission timestamp                  |

## Disconnect and Relink Flow

Users can disconnect their domain and reconnect later:

### Disconnect Flow

1. User clicks "Disconnect Domain"
2. Frontend calls `/disconnect-proxy` → gets the DNS records needed to restore the original configuration
3. Entri restores the www CNAME to its original target
4. Frontend calls `/mark-disconnect-complete` → status becomes DISCONNECTED
5. CloudFront + ACM stay intact (for fast relink)

### Relink Flow (Fast)

1. User clicks "Reconnect"
2. **No CloudFront setup needed** - distribution already exists
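3. Start a new certificate only if the existing one has expired; otherwise skip this step
4. Same two-step DNS flow as the initial connection: SSL validation record (only if a new certificate was started) → www CNAME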
5. Status becomes DEPLOYED

This is much faster than initial setup because CloudFront and ACM are preserved.
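
The relink decision can be summarized as a short plan of endpoint calls. This is a sketch under assumptions: `certificateNotAfter` stands in for however the certificate's expiry is actually looked up (it is not an `ai_sites` column), the helper name is hypothetical, and the user's Entri DNS steps between calls are left out. The returned strings are endpoint paths from the Endpoints table.

```javascript
// Hypothetical relink planner: which endpoints to call, in order.
function relinkPlan(site, certificateNotAfter, now = new Date()) {
  if (site.proxy_status !== "DISCONNECTED") {
    throw new Error(`Relink expects DISCONNECTED, got ${site.proxy_status}`);
  }
  const steps = [];
  // CloudFront already exists, so setup is skipped; re-issue SSL only if expired
  if (new Date(certificateNotAfter) <= now) {
    steps.push("start-certificate", "complete-certificate");
  }
  // Point www back at CloudFront, then record the step
  steps.push("mark-step-complete");
  return steps;
}
```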

## Testing

The domain connection can be tested end-to-end:

```bash
# Check current status
curl https://api.searchcompany.ai/api/domain/get-proxy/org_xxx \
  -H "Authorization: Bearer $TOKEN"

# Start certificate (Part 1)
curl -X POST https://api.searchcompany.ai/api/domain/start-certificate/org_xxx \
  -H "Authorization: Bearer $TOKEN"

# After adding TXT record, complete certificate
curl -X POST https://api.searchcompany.ai/api/domain/complete-certificate/org_xxx \
  -H "Authorization: Bearer $TOKEN"

# After switching www CNAME (Part 2)
curl -X POST https://api.searchcompany.ai/api/domain/mark-step-complete/org_xxx \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"step": 2}'

# Disconnect domain
curl -X POST https://api.searchcompany.ai/api/domain/disconnect-proxy/org_xxx \
  -H "Authorization: Bearer $TOKEN"

# After Entri restores DNS
curl -X POST https://api.searchcompany.ai/api/domain/mark-disconnect-complete/org_xxx \
  -H "Authorization: Bearer $TOKEN"
```
