The Domain API handles connecting customer domains to our CloudFront proxy. This enables path-based routing where AI content lives at /ai/* while the rest of the site remains unchanged.
Services Used
| Service | Purpose |
|---|---|
| AWS CloudFront | CDN and edge proxy - routes traffic based on URL path |
| AWS Lambda@Edge | Runs at CloudFront edge to route paths to correct origin |
| AWS ACM | SSL certificate management (stores Let's Encrypt + native certs) |
| AWS Global Accelerator | Static IPs for apex domain support (A records) |
| Let's Encrypt | Initial SSL certificates (bypasses CAA restrictions) |
| Entri | One-click DNS configuration for customers (no manual record editing) |
Architecture
         ┌─────────────────────────────────────────────┐
         │               CloudFront Edge               │
         │       (Lambda@Edge at origin-request)       │
         └──────────────────────┬──────────────────────┘
                                │
              ┌─────────────────┴─────────────────┐
              │                                   │
          AI Paths                         Everything Else
      /ai/*, /llms.txt,                    /, /products/*,
  /robots.txt, /sitemap.xml             /collections/*, etc.
              │                                   │
              ▼                                   ▼
┌────────────────────────────┐          ┌───────────────────┐
│      AI Site (Vercel)      │          │   Shopify Store   │
│ org-slug.searchcompany.dev │          │ (customer's host) │
└────────────────────────────┘          └───────────────────┘
Path-Based Routing (No UA Cloaking)
We use path-based routing, NOT User-Agent-based cloaking. This is safer and more transparent:
| Path | Destination | Content |
|---|---|---|
| /ai/* | AI Site (Vercel) | Markdown mirrors + FAQ pages |
| /llms.txt | AI Site (Vercel) | AI discovery file |
| /robots.txt | AI Site (Vercel) | Our robots.txt (overrides Shopify) |
| /sitemap.xml | AI Site (Vercel) | Unified sitemap (Shopify + AI pages) |
| /search-company.txt | AI Site (Vercel) | IndexNow verification key |
| Everything else | Shopify | Original store content |
Why Path-Based?
- No cloaking risk: Bots and humans get the same content on the same URLs
- Transparent: Humans can access /ai/* pages too (they just won't find them)
- SEO safe: Clear separation between Shopify pages and AI pages
- Canonical tags: /ai/* pages point their canonical tag at the matching Shopify URLs
Control File Takeover
We override these Shopify files at the edge:
| File | What We Serve | Why |
|---|---|---|
| /robots.txt | Our robots.txt with Sitemap: directive | Single source of crawler directives |
| /sitemap.xml | Unified sitemap with ALL URLs | One sitemap containing Shopify + AI pages |
| /llms.txt | AI discovery file | Entry point for AI crawlers |
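As an illustration of the robots.txt override in the table above, here is a minimal sketch of a generator that emits the Sitemap: directive. The function name and the crawl rules (User-agent: * with Allow: /) are assumptions for the example, not the rules actually served.

```javascript
// Hypothetical helper: builds a robots.txt body that advertises the unified sitemap.
// The "User-agent: *" / "Allow: /" rules are placeholders, not the real policy.
function buildRobotsTxt(customDomain) {
  const lines = [
    "User-agent: *",
    "Allow: /",
    "",
    // Points crawlers at the single unified sitemap served from the same domain.
    `Sitemap: https://${customDomain}/sitemap.xml`,
  ];
  return lines.join("\n") + "\n";
}

console.log(buildRobotsTxt("www.example.com"));
```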
Unified Sitemap Strategy
Our /sitemap.xml contains all URLs in a single file:
- Shopify URLs - Fetched during onboarding and stored in ai_sites.shopify_sitemap_urls
- AI pages - /ai/products/*, /ai/collections/*, /ai/<faq-slug>
- AI files - /llms.txt, /llms/*.txt
This ensures:
- No confusion from multiple sitemaps
- Shopify indexing is preserved
- AI pages are discoverable
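The merge described above can be sketched as a small function that combines the stored Shopify URLs with the AI URLs into one sitemap document. This is a sketch under assumptions: the function names are hypothetical, inputs are plain arrays of absolute URLs, and duplicates are dropped by exact string match.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: shopifyUrls would come from ai_sites.shopify_sitemap_urls,
// aiUrls from the generated /ai/* pages and /llms files.
function buildUnifiedSitemap(shopifyUrls, aiUrls) {
  const seen = new Set();
  const entries = [];
  for (const url of [...shopifyUrls, ...aiUrls]) {
    if (seen.has(url)) continue; // keep the first occurrence only
    seen.add(url);
    entries.push(`  <url><loc>${escapeXml(url)}</loc></url>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Listing Shopify URLs first keeps the existing store pages at the top of the file; the order carries no meaning in the sitemap protocol, so this is only a readability choice.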
Connection Flow
The domain connection is a two-part process designed for zero downtime:
Part 1: SSL Certificate + Google TXT
- User clicks "Start Secure Connection"
- We request a Let's Encrypt certificate via DNS-01 challenge
- We also request a Google verification token
- User adds BOTH TXT records via Entri (one-click DNS):
  - SSL validation TXT record
  - Google verification TXT record
- Let's Encrypt validates and issues certificate
- Certificate is imported to ACM and attached to CloudFront
Part 2: Connect Domain + Verify Google
- SSL is ready on CloudFront
- User clicks "Point Domain"
- User configures DNS via Entri (one-click):
  - CNAME www → CloudFront distribution
  - A @ → Our gateway IPs (for naked domain)
- Traffic now flows through our proxy (both www and naked domain)
- Background tasks trigger:
  - ACM-native certificate upgrade (auto-renewal)
  - Google verification polling (10s × 12 = 2 min)
  - Once Google verified → IndexNow + GSC sitemap submission
  - If Google fails → IndexNow only (GSC skipped)
Both www and naked domain are configured together. Whether the user enters www.example.com or example.com as their business URL, Part 2 sets up DNS records for both to ensure the full domain works.
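The Google verification polling in Part 2 (10s × 12 = 2 min) and its two outcomes can be sketched as follows. This is an illustration only: checkVerified is a hypothetical callback standing in for the real Google verification call, and the action names returned by followUpActions are made up for the example.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Waits intervalMs, then checks; repeats up to `attempts` times.
// With the defaults this gives up after 12 × 10s = 2 minutes.
async function pollGoogleVerification(
  checkVerified, // hypothetical async callback: resolves true once Google verifies the domain
  { intervalMs = 10_000, attempts = 12, wait = sleep } = {}
) {
  for (let i = 0; i < attempts; i++) {
    await wait(intervalMs);
    if (await checkVerified()) return true;
  }
  return false;
}

// Maps the polling result to the follow-up work described above.
function followUpActions(verified) {
  return verified
    ? ["indexnow", "gsc_sitemap"] // verified: IndexNow + GSC sitemap submission
    : ["indexnow"]; // not verified in time: IndexNow only, GSC skipped
}
```

Waiting before each check (rather than after) gives freshly written TXT records time to propagate before the first attempt.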
Status Flow
PENDING_VALIDATION → SSL_VALIDATING → SSL_VALIDATED → DEPLOYED
         │                  │               │             │
         │                  │               │             │
    CloudFront         TXT record      Certificate    www CNAME
      created        added, waiting    attached to    points to
                    for Let's Encrypt  CloudFront    CloudFront
                                                          │
                                                          ▼
                                                    DISCONNECTED
                                                          │
                                                   (can reconnect
                                                     instantly)
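The status flow above can be expressed as a small transition table, which is handy for rejecting out-of-order status updates. A minimal sketch: the forward transitions come from the diagram, while the two exits from DISCONNECTED are an assumption based on the relink flow (re-issue an expired certificate, or go straight back to DEPLOYED).

```javascript
// Allowed proxy_status transitions. A Map avoids accidental hits on
// inherited object properties such as "constructor".
const TRANSITIONS = new Map([
  ["PENDING_VALIDATION", ["SSL_VALIDATING"]],
  ["SSL_VALIDATING", ["SSL_VALIDATED"]],
  ["SSL_VALIDATED", ["DEPLOYED"]],
  ["DEPLOYED", ["DISCONNECTED"]],
  // Assumption: relinking either restarts certificate validation or redeploys directly.
  ["DISCONNECTED", ["SSL_VALIDATING", "DEPLOYED"]],
]);

function canTransition(from, to) {
  return (TRANSITIONS.get(from) ?? []).includes(to);
}

console.log(canTransition("SSL_VALIDATED", "DEPLOYED"));
console.log(canTransition("PENDING_VALIDATION", "DEPLOYED"));
```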
Endpoints
| Endpoint | Purpose |
|---|---|
| Get Proxy | Get current proxy status and DNS records |
| Start Certificate | Begin Let's Encrypt certificate request |
| Complete Certificate | Finish certificate after TXT record added |
| Start Google Verification | Get Google TXT verification token (Step 1) |
| Complete Google Verification | Verify domain + add to Search Console (now backend-driven in Step 2) |
| Resubmit Sitemap | Resubmit sitemap after new AI pages (cron) |
| Mark Step Complete | Update status after Entri success + trigger Google verification polling |
| Verify DNS | Live DNS lookup verification |
| Disconnect Proxy | Get DNS records to restore original config |
| Mark Disconnect Complete | Update status after disconnect completes |
| Update Lambda@Edge | Deploy new routing rules to all distributions |
Setup CloudFront Proxy is now part of the Onboarding flow - it runs automatically during generate-all.
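To illustrate what a live DNS lookup like the one behind Verify DNS could look like, here is a minimal sketch using Node's built-in resolver. The function name and the injectable resolveCname parameter are assumptions for the example; the actual endpoint may check more record types (for instance the A records for the naked domain).

```javascript
const dns = require("node:dns").promises;

// Compare hostnames case-insensitively and ignore a trailing root dot.
const normalize = (host) => host.toLowerCase().replace(/\.$/, "");

// Returns true when customDomain has a CNAME pointing at cloudfrontDomain.
// resolveCname is injectable so the check can be exercised without network access.
async function verifyWwwCname(
  customDomain,
  cloudfrontDomain,
  resolveCname = (host) => dns.resolveCname(host)
) {
  try {
    const targets = await resolveCname(customDomain);
    return targets.some((t) => normalize(t) === normalize(cloudfrontDomain));
  } catch (err) {
    // No such name or no CNAME record: treat as "not pointed yet".
    if (err.code === "ENOTFOUND" || err.code === "ENODATA") return false;
    throw err;
  }
}
```

For example, `await verifyWwwCname("www.example.com", "d123abc.cloudfront.net")` would resolve to true only once the customer's www record points at the distribution.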
Why Let's Encrypt + ACM?
We use a hybrid approach for SSL certificates:
Initial: Let's Encrypt
- Works with ANY DNS provider
- Bypasses CAA restrictions (Vercel/Netlify block ACM)
- User adds one TXT record, done
After Connection: ACM-Native
- Once www points to CloudFront, ACM can validate
- Background upgrade happens automatically
- ACM-native certs auto-renew forever
- User never has to touch DNS again
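The CAA point above can be made concrete with a simplified check of which certificate authority a set of CAA records permits. This is a sketch, not the production logic: it takes records shaped like the output of Node's dns.promises.resolveCaa(), looks only at issue tags, and skips the parent-domain and CNAME lookups a real CA performs. The sample records are hypothetical.

```javascript
// CA identifiers used in CAA "issue" records.
const LETS_ENCRYPT_CA = ["letsencrypt.org"];
const AMAZON_CAS = ["amazon.com", "amazontrust.com", "awstrust.com", "amazonaws.com"];

// records: e.g. [{ critical: 0, issue: "letsencrypt.org" }]
function caaAllows(records, caDomains) {
  const issuers = records
    .filter((r) => typeof r.issue === "string")
    // Drop any parameters after ";" and normalise case.
    .map((r) => r.issue.split(";")[0].trim().toLowerCase());
  // With no "issue" records at all, any CA may issue.
  if (issuers.length === 0) return true;
  return issuers.some((issuer) => caDomains.includes(issuer));
}

// Hypothetical records for a host that only authorises Let's Encrypt.
const sampleRecords = [{ critical: 0, issue: "letsencrypt.org" }];
console.log(caaAllows(sampleRecords, LETS_ENCRYPT_CA));
console.log(caaAllows(sampleRecords, AMAZON_CAS));
```

In that situation a Let's Encrypt certificate can be issued right away, while an ACM-native one has to wait until the records in effect for the domain no longer exclude Amazon.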
Lambda@Edge Path-Based Routing
The Lambda@Edge function at origin-request routes based on URL path:
// Paths that route to the AI origin (Vercel)
const AI_PATHS = [
  "/ai/",                // All AI mirror pages (prefix match)
  "/llms.txt",           // AI discovery file
  "/robots.txt",         // We control this (overrides Shopify)
  "/sitemap.xml",        // Unified sitemap (overrides Shopify)
  "/search-company.txt", // IndexNow verification key
];

// `uri` is the request path from the origin-request event (event.Records[0].cf.request.uri).
// Route to the AI origin if the path matches, else Shopify.
// startsWith also covers exact matches, so no separate equality check is needed.
const isAIPath = AI_PATHS.some((p) => uri.startsWith(p));
| Path | Destination | Host Header |
|---|---|---|
| /ai/*, /llms.txt, etc. | org-slug.searchcompany.dev | AI site host |
| Everything else | Customer's Shopify origin | Original domain |
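To show how the path check might sit inside a complete origin-request handler, here is a minimal sketch. It assumes a single hard-coded AI origin host (AI_ORIGIN_HOST); the deployed function presumably resolves the host per customer, and the origin settings (port, TLS version, timeouts) are illustrative values rather than the real configuration.

```javascript
"use strict";

// Assumption: one fixed AI origin, hard-coded for illustration.
const AI_ORIGIN_HOST = "org-slug.searchcompany.dev";

const AI_PATHS = ["/ai/", "/llms.txt", "/robots.txt", "/sitemap.xml", "/search-company.txt"];

const isAIPath = (uri) => AI_PATHS.some((p) => uri.startsWith(p));

const handler = async (event) => {
  // Standard Lambda@Edge event shape for CloudFront triggers.
  const request = event.Records[0].cf.request;

  if (isAIPath(request.uri)) {
    // Swap the origin to the AI site for this request only.
    request.origin = {
      custom: {
        domainName: AI_ORIGIN_HOST,
        port: 443,
        protocol: "https",
        path: "",
        sslProtocols: ["TLSv1.2"],
        readTimeout: 30,
        keepaliveTimeout: 5,
        customHeaders: {},
      },
    };
    // The Host header must match the new origin, as in the table above.
    request.headers.host = [{ key: "Host", value: AI_ORIGIN_HOST }];
  }

  // Non-AI paths fall through untouched and reach the Shopify origin.
  return request;
};

exports.handler = handler;
```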
Updating Routing Rules
When routing rules need to change:
# 1. Edit the Lambda code
vim src/app/apis/domain/setup_proxy/step_0_lambda_setup/lambda_code.js
# 2. Deploy to all distributions
curl -X POST https://api.searchcompany.ai/api/domain/update-lambda-edge \
-H "Authorization: Bearer $TOKEN"
All customer distributions update automatically (~15 min for 1,000 customers). Zero downtime - old version runs until new one propagates.
Database Schema
The ai_sites table stores all proxy configuration:
| Column | Purpose |
|---|---|
| entity_id | Links to business entity |
| cloudfront_distribution_id | CloudFront distribution ID |
| cloudfront_domain | e.g., d123abc.cloudfront.net |
| custom_domain | e.g., www.example.com |
| origin_cname | Where www originally pointed |
| original_www_cname | Preserved for disconnect |
| certificate_arn | ACM certificate ARN |
| certificate_type | LETS_ENCRYPT or ACM_NATIVE |
| proxy_status | Current status |
| shopify_sitemap_urls | Shopify URLs for unified sitemap |
| le_* | Let's Encrypt temp data |
| google_verification_token | TXT record value from Google Site Verification API |
| google_verification_status | PENDING, VERIFIED, or FAILED |
| google_sitemap_submitted_at | Last sitemap submission timestamp |
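As a quick way to sanity-check rows read from ai_sites, the enumerated columns above can be validated with a few lines. A sketch under assumptions: proxy_status is taken to hold the values from the Status Flow section, null is treated as "not set yet", and the example row is invented.

```javascript
const PROXY_STATUSES = ["PENDING_VALIDATION", "SSL_VALIDATING", "SSL_VALIDATED", "DEPLOYED", "DISCONNECTED"];
const CERTIFICATE_TYPES = ["LETS_ENCRYPT", "ACM_NATIVE"];
const GOOGLE_STATUSES = ["PENDING", "VERIFIED", "FAILED"];

// Returns a list of problems; an empty list means the enum columns look valid.
function validateAiSiteRow(row) {
  const errors = [];
  const checkEnum = (column, allowed) => {
    const value = row[column];
    // Assumption: null/undefined means the column has not been populated yet.
    if (value != null && !allowed.includes(value)) {
      errors.push(`${column}: unexpected value ${JSON.stringify(value)}`);
    }
  };
  checkEnum("proxy_status", PROXY_STATUSES);
  checkEnum("certificate_type", CERTIFICATE_TYPES);
  checkEnum("google_verification_status", GOOGLE_STATUSES);
  return errors;
}

// Invented example row; values are placeholders.
const exampleRow = {
  custom_domain: "www.example.com",
  cloudfront_domain: "d123abc.cloudfront.net",
  certificate_type: "LETS_ENCRYPT",
  proxy_status: "SSL_VALIDATED",
  google_verification_status: "PENDING",
};
console.log(validateAiSiteRow(exampleRow));
```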
Disconnect and Relink Flow
Users can disconnect their domain and reconnect later:
Disconnect Flow
- User clicks "Disconnect Domain"
- Frontend calls /disconnect-proxy → gets restore DNS records
- Entri restores www CNAME to original
- Frontend calls /mark-disconnect-complete → status becomes DISCONNECTED
- CloudFront + ACM stay intact (for fast relink)
Relink Flow (Fast)
- User clicks "Reconnect"
- No CloudFront setup needed - distribution already exists
- Start certificate (if expired) or skip
- 2-step DNS flow: SSL CNAME → www CNAME
- Status becomes DEPLOYED
This is much faster than initial setup because CloudFront and ACM are preserved.
Testing
The domain connection can be tested end-to-end:
# Check current status
curl https://api.searchcompany.ai/api/domain/get-proxy/org_xxx \
-H "Authorization: Bearer $TOKEN"
# Start certificate (Part 1)
curl -X POST https://api.searchcompany.ai/api/domain/start-certificate/org_xxx \
-H "Authorization: Bearer $TOKEN"
# After adding TXT record, complete certificate
curl -X POST https://api.searchcompany.ai/api/domain/complete-certificate/org_xxx \
-H "Authorization: Bearer $TOKEN"
# After switching www CNAME (Part 2)
curl -X POST https://api.searchcompany.ai/api/domain/mark-step-complete/org_xxx \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"step": 2}'
# Disconnect domain
curl -X POST https://api.searchcompany.ai/api/domain/disconnect-proxy/org_xxx \
-H "Authorization: Bearer $TOKEN"
# After Entri restores DNS
curl -X POST https://api.searchcompany.ai/api/domain/mark-disconnect-complete/org_xxx \
-H "Authorization: Bearer $TOKEN"