These endpoints are called by the Cron service (Railway scheduled jobs) to perform background tasks.
Schedule: The cron runs daily at 10:00 PM Singapore Time (SGT) / 14:00 UTC.

3-Batch Architecture (Custom Mapper + Batch Scrape)

The cron uses a custom website mapper + Firecrawl Batch Scrape API for efficient change detection:
For each customer:

   BATCH 1: Update AI Site + Discover Products (parallel)
   β”œβ”€β”€ Batch 1a: detect-changes β†’ update-ai-site (skip_deploy)
   β”‚   └── Collect files for deploy
   β”œβ”€β”€ Batch 1b: discover-products (skip_deploy)
   β”‚   β”œβ”€β”€ Fetch from Shopify /products.json
   β”‚   β”œβ”€β”€ Save NEW products to DB
   β”‚   β”œβ”€β”€ Generate prompts for NEW products
   β”‚   β”œβ”€β”€ Generate /llms/{slug}.txt for NEW products
   β”‚   └── Collect files for deploy
   β”œβ”€β”€ Merge files from 1a + 1b
   β”œβ”€β”€ Deploy to Vercel (single combined deploy)
   └── Wait 10s for edge propagation

   BATCH 2a: Create Content (parallel)
   β”œβ”€β”€ Update Timestamps β†’ collect files
   β”œβ”€β”€ Create AI Articles β†’ collect files + new page URLs
   └── Deploy to Vercel (combined files)

   BATCH 2b: Analyze Visibility
   └── Analyze Visibility (sampling) β†’ DB only

   BATCH 3: Notify Search Engines
   β”œβ”€β”€ Aggregate all changed URLs from BATCH 1a + BATCH 2a
   β”œβ”€β”€ Submit to IndexNow (one call per business)
   └── Resubmit sitemap to Google Search Console (one call per business)
Why this architecture is efficient:
  • Custom Mapper combines sitemap, robots.txt, and HTML link extraction for comprehensive URL discovery [FREE]
  • Hashing Service fetches raw HTML and hashes it to detect changes [FREE]
  • Batch Scrape only scrapes NEW + CHANGED pages (not all pages) [PAID]
  • Batch 1a and 1b run in parallel (no deploy conflicts - files are collected first)
  • Single combined deploy for Batch 1 (1a + 1b files merged)
  • Visibility sampling reduces API calls by 50%
  • Only 2 deploys per customer (Batch 1 combined + Batch 2a)
  • BATCH 3 notifies search engines AFTER all content is deployed
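The batch structure above can be sketched as a per-customer pipeline. This is an illustrative outline, not the production code: the `api` client and all method names are assumptions standing in for calls to the cron endpoints.

```python
import asyncio

async def run_batches_for_customer(customer, api, propagation_wait=10):
    """Hypothetical sketch of the per-customer batch pipeline."""
    # BATCH 1: run 1a (site update) and 1b (product discovery) in parallel.
    # Both skip deployment and return their files instead (skip_deploy).
    files_1a, files_1b = await asyncio.gather(
        api.update_ai_site(customer, skip_deploy=True),
        api.discover_products(customer, skip_deploy=True),
    )
    await api.deploy_to_vercel(customer, {**files_1a, **files_1b})
    await asyncio.sleep(propagation_wait)  # wait for Vercel edge propagation

    # BATCH 2a: timestamps + AI articles, then one combined deploy.
    files_ts, files_articles = await asyncio.gather(
        api.update_all_timestamps(customer, skip_deploy=True),
        api.create_ai_articles(customer, skip_deploy=True),
    )
    await api.deploy_to_vercel(customer, {**files_ts, **files_articles})

    # BATCH 2b: visibility analysis writes to the DB only, no deploy.
    await api.analyze_visibility(customer)

    # BATCH 3: notify search engines only after all content is live.
    await api.submit_indexnow(customer)
    await api.resubmit_sitemap(customer)
```

Note that the two deploys per customer fall out of this structure directly: one after the Batch 1 merge, one after Batch 2a.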

The Six Jobs

| Job | Name | Purpose | Batch |
| --- | --- | --- | --- |
| 1 | Update AI Sites | Refresh AI websites with content changes | Batch 1a |
| 2 | Discover Products | Find new products from Shopify | Batch 1b |
| 3 | Update Timestamps | Refresh timestamps on all pages for freshness signals | Batch 2a |
| 4 | Create AI Articles | Generate AI-specific content pages (100/week target) | Batch 2a |
| 5 | Analyze Visibility | Check visibility across 8 AI platforms (sampled) | Batch 2b |
| 6 | Notify Search Engines | Submit URLs to IndexNow + Google Search Console | Batch 3 |

Batch 1a: Update AI Site

Detects changes on the real website and updates the AI site accordingly.
  1. detect_changes: Custom Mapper + Hashing Service find changes; Batch Scrape runs only for new and changed pages
  2. update_ai_site: Send changes to Gemini (3 parallel calls), regenerate files
  3. Return files (skip_deploy=True) for combined deploy
Batch 1a Flow:
1. Custom Mapper β†’ Get current URL list (up to 5000 pages) [FREE]
2. Compare URLs vs stored site_map β†’ find new/removed pages
3. Hashing Service β†’ Fetch raw HTML + hash existing pages [FREE]
4. Hash comparison β†’ Find changed pages
5. Batch Scrape β†’ Get markdown for NEW + CHANGED pages only [PAID]
6. If changes detected:
   └── update_ai_site: 3 parallel Gemini calls (skip_deploy=True)
7. Return files for combined deploy
8. Collect changed_urls for BATCH 3 submission
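Steps 2-4 of the flow above boil down to two set comparisons plus a hash check. The sketch below assumes the stored site_map is a `{url: content_hash}` mapping and that `fetch_html` is a hypothetical helper wrapping the free raw-HTML fetch; both are illustrative assumptions.

```python
import hashlib

def classify_pages(current_urls, stored_site_map, fetch_html):
    """Split mapped URLs into NEW, REMOVED, and CHANGED pages.

    stored_site_map: {url: md5_hex_of_last_seen_html} (assumed shape)
    fetch_html(url): hypothetical helper returning raw HTML [FREE]
    """
    current = set(current_urls)
    stored = set(stored_site_map)

    new_pages = current - stored          # mapped now, never seen before
    removed_pages = stored - current      # stored, but gone from the map

    changed_pages = []
    for url in current & stored:          # only re-hash pages we already know
        digest = hashlib.md5(fetch_html(url).encode("utf-8")).hexdigest()
        if digest != stored_site_map[url]:
            changed_pages.append(url)

    # Only NEW + CHANGED pages are sent on to Batch Scrape [PAID]
    return sorted(new_pages), sorted(removed_pages), sorted(changed_pages)
```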

Batch 1b: Discover Products (Parallel with 1a)

Decoupled: Batch 1b runs in parallel with 1a. It fetches products directly from Shopify’s /products.json API - no scraped content needed.
  1. fetch_shopify_products: Fetch all products from /products.json API
  2. Hash comparison: Compute MD5 hash of sorted product handles, compare with stored products_hash
  3. Snapshot comparison: If hash changed, compare with stored products_snapshot to find NEW products
  4. Save products: Save new products to entities table
  5. generate_product_prompts: Generate 10 prompts per new product
  6. generate_product_llms_txt: Generate /llms/{slug}.txt for new products
  7. Return files (skip_deploy=True) for combined deploy
Batch 1b Flow:
1. Fetch products from Shopify /products.json [FREE]
2. Compute hash of sorted product handles
3. Compare hash with stored products_hash in ai_sites
4. If unchanged β†’ skip (no work needed)
5. If changed β†’ compare products_snapshot to find NEW products
6. Save new products to entities table
7. Generate prompts + llms.txt for NEW products only (skip_deploy=True)
8. Return files for combined deploy
9. Update products_hash + products_snapshot in ai_sites
Database columns used (in ai_sites table):
  • products_hash (TEXT): MD5 hash of sorted product handles for quick comparison
  • products_snapshot (JSONB): Full product list from last sync [{handle, name, ...}]
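The hash and snapshot comparisons can be sketched as follows. The exact serialization used for the MD5 input (here, a JSON dump of the sorted handles) is an assumption; what matters is that it is deterministic and order-independent.

```python
import hashlib
import json

def products_changed(products, stored_hash):
    """Quick check (steps 2-3): MD5 over sorted product handles,
    matching the products_hash column. Returns (changed?, new_hash)."""
    handles = sorted(p["handle"] for p in products)
    new_hash = hashlib.md5(json.dumps(handles).encode("utf-8")).hexdigest()
    return new_hash != stored_hash, new_hash

def find_new_products(products, products_snapshot):
    """Step 5: diff against the stored snapshot to find NEW products only."""
    known = {p["handle"] for p in products_snapshot}
    return [p for p in products if p["handle"] not in known]
```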

Batch 1 Combined Deploy

After Batch 1a and 1b complete in parallel, their files are merged and deployed in a single call:
Batch 1 Deploy Flow:
1. Batch 1a returns files_by_business
2. Batch 1b returns files_by_business
3. Merge files from both batches
4. Deploy to Vercel (single combined deploy per business)
5. Wait 10s for edge propagation
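Step 3 is a simple dictionary merge. The sketch below assumes each batch returns a `{business_id: {path: content}}` mapping (the files_by_business shape is an assumption inferred from the flow above).

```python
def merge_files_by_business(files_1a, files_1b):
    """Merge the per-business file dicts from Batch 1a and 1b into one
    payload, so each business gets a single combined Vercel deploy."""
    merged = {}
    for files in (files_1a, files_1b):
        for business_id, paths in files.items():
            merged.setdefault(business_id, {}).update(paths)
    return merged
```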

Edge Propagation Wait

After the Batch 1 combined deploy, we wait 10 seconds for Vercel edge propagation.
This wait is critical because Batch 2a scrapes the AI site for context. Without this wait, it might hit stale/cached content or 404s.

Job 3: Update Timestamps (Batch 2a)

Refreshes timestamps on ALL pages (AI site core files + AI articles) to signal freshness to AI search engines.
Updates on every page:
  • Meta tags: article:modified_time
  • Year in titles: β€œ2025” β†’ β€œ2026” (if year changed)
  • Footer: β€œLast updated: December 24, 2025”
This helps because Bing and other AI search engines favor fresh content and may include dates in citations.
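The three freshness edits above can be sketched with regex substitutions. The exact markup of the real pages is an assumption; this only illustrates the meta tag, title year, and footer updates.

```python
import re
from datetime import datetime, timezone

def refresh_timestamps(html, now=None):
    """Apply the three freshness updates to a page (illustrative markup)."""
    now = now or datetime.now(timezone.utc)
    # 1. Meta tag: article:modified_time
    html = re.sub(
        r'(<meta property="article:modified_time" content=")[^"]*(")',
        r"\g<1>" + now.isoformat() + r"\g<2>",
        html,
    )
    # 2. Year in title, e.g. "2025" -> "2026" (if the year changed)
    html = re.sub(
        r"(<title>[^<]*?)\b(20\d\d)\b",
        lambda m: m.group(1) + str(now.year),
        html,
    )
    # 3. Footer "Last updated" line
    html = re.sub(
        r"(Last updated: )[^<]*",
        r"\g<1>" + now.strftime("%B %d, %Y"),
        html,
    )
    return html
```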

Job 4: Create AI Articles (Batch 2a)

Generates AI-specific content pages at the root level (/{slug}/) to improve discoverability.

Weekly Target (Per Customer)

  • 100 pages per week (Monday-Sunday)
  • 50 pages for the business (50%)
  • 50 pages distributed across products (50%)
Special cases:
  • No products: Business gets all 100 pages
  • 1-50 products: All products included, 50 pages split evenly
  • 51+ products: Round-robin rotation selects 50 products per week
When there are more than 50 products, the system uses a rotating selection each week so all products eventually get coverage.
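One way to implement that rotation is a deterministic window keyed on the week number; the exact scheme used in production is an assumption, but the sketch shows how every product gets covered over successive weeks.

```python
def select_products_for_week(product_slugs, week_number, batch_size=50):
    """Pick this week's products. 1-50 products: all included.
    51+ products: a round-robin window that advances by week."""
    if len(product_slugs) <= batch_size:
        return list(product_slugs)
    ordered = sorted(product_slugs)       # stable order across runs
    start = (week_number * batch_size) % len(ordered)
    # Wrap around the list so the window is always batch_size long
    return [ordered[(start + i) % len(ordered)] for i in range(batch_size)]
```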

URL Structure

AI articles are deployed at the root level for maximum SEO authority:
| Entity Type | URL Pattern | Example |
| --- | --- | --- |
| Business | /{slug}/ | /expert-review-of-website-arena/ |
| Product | /{slug}/ | /deep-dive-into-remix-tool/ |

Job 5: Analyze Visibility (Batch 2b, Sampling Architecture)

Cost Optimization: We sample 10 prompts per day (prioritizing untested ones) instead of checking all prompts. This reduces API costs by ~50% while ensuring all prompts eventually get tested.

How It Works

  1. Sample 10 prompts from the org’s total pool (untested first, then random)
  2. Analyze each prompt across 8 AI platforms (80 API calls total)
  3. Store results with pass/fail per platform and update last_tested_at
  4. Update overall score with floor protection (never dips below previous high)
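Steps 1 and 4 can be sketched as below. The prompt record shape (`last_tested_at` field) follows the description above; the function names are illustrative.

```python
import random

def sample_prompts(prompts, k=10, rng=random):
    """Step 1: take untested prompts (last_tested_at is None) first,
    then fill the remainder with a random sample of tested ones."""
    untested = [p for p in prompts if p.get("last_tested_at") is None]
    tested = [p for p in prompts if p.get("last_tested_at") is not None]
    if len(untested) >= k:
        return untested[:k]
    return untested + rng.sample(tested, min(k - len(untested), len(tested)))

def update_score_with_floor(new_score, previous_score):
    """Step 4: floor protection - the stored score never dips below the
    previous high, so one bad sample day cannot erase earlier progress."""
    return max(new_score, previous_score)
```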

Prompt Limits

  • Business: 50 prompts (10 via Exa during onboarding + 40 via regular generation)
  • Products: 10 prompts each (unlimited products)

Pass/Fail Paradigm

Each prompt shows visibility status per platform:
  • true (βœ“): Entity was mentioned/recommended by this AI platform
  • false (βœ—): Entity was not found in the AI platform’s response
  • null (-): Not yet tested

The 8 AI Platforms

Each platform uses its native search capabilities, then Gemini 3 Flash provides unified evaluation:
  • ChatGPT - OpenAI Direct w/ Search
  • Claude - Anthropic Direct w/ Search
  • Gemini - GCP AI Studio Direct w/ Search
  • Perplexity - Sonar API
  • Copilot - Parallel Search API
  • DeepSeek - Firecrawl Search API
  • Grok - X.AI Direct w/ Search
  • Google AI - Serp API (AI Overview)

Job 6: Notify Search Engines (Batch 3)

BATCH 3 runs AFTER all content is deployed (BATCH 1a + BATCH 2a) to ensure search engines see the latest content.

How It Works

  1. Aggregate URLs from BATCH 1a (changed pages) and BATCH 2a (new AI articles)
  2. Submit to IndexNow - Instant notification to Bing, Yandex, and other IndexNow-compatible engines
  3. Resubmit sitemap to Google Search Console - Signals Google to re-crawl the sitemap
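Step 1 is a deduplicating merge of the two URL sources. The sketch below assumes Batch 1a yields ready-made paths and Batch 2a yields article slugs that map to `/{slug}/` URLs, as described above.

```python
def aggregate_urls_for_batch3(changed_urls_1a, new_article_slugs):
    """Merge BATCH 1a changed pages with BATCH 2a article URLs into one
    deduplicated, order-preserving list for the single IndexNow call."""
    seen, urls = set(), []
    for url in list(changed_urls_1a) + [f"/{slug}/" for slug in new_article_slugs]:
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```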

URL Sources

| Source | URLs Submitted |
| --- | --- |
| Batch 1a (update_ai_site) | changed_urls - new + changed pages from detect-changes |
| Batch 2a (create_ai_articles) | New AI article slugs (e.g., /{slug}/) |

APIs Called

# IndexNow - one call per business
POST /api/cron/submit-indexnow
{
  "urls": ["/about-us/", "/new-ai-article/", ...],
  "source_url": "https://customer-domain.com"
}

# Google Search Console - one call per business
POST /api/domain/resubmit-sitemap/{org_id}
Why BATCH 3 is separate: Search engines should only be notified AFTER content is deployed. If we submitted URLs before deployment, crawlers might hit 404s or stale content.

All Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/cron/entities | GET | Fetch all businesses/products to process |
| /api/cron/detect-changes | POST | Detect content changes using Mapper + Hashing + Batch Scrape |
| /api/cron/update-ai-site | POST | Update AI website with changes |
| /api/cron/discover-products | POST | Discover NEW products from Shopify (hash-based detection) |
| /api/cron/generate-product-prompts | POST | Generate visibility prompts for products |
| /api/cron/generate-product-llms-txt | POST | Generate product llms.txt files |
| /api/cron/update-all-timestamps | POST | Refresh timestamps on all AI website pages |
| /api/cron/ai-articles-quota | GET | Calculate today’s AI articles quota |
| /api/cron/create-ai-article | POST | Generate AI article content (no deploy) |
| /api/cron/deploy-to-vercel | POST | Deploy all files to Vercel (single deployment) |
| /api/cron/submit-indexnow | POST | Notify search engines of new URLs |
| /api/cron/sample-prompts | GET | Randomly sample prompts for visibility check |
| /api/cron/analyze-visibility | POST | Check visibility across 8 AI platforms |
| /api/cron/store-visibility-report | POST | Store daily visibility report |
| /api/cron/store-visibility-score | POST | Calculate and store visibility score |
Deprecated: /api/cron/discover-products-from-changes is deprecated. Use /api/cron/discover-products instead (decoupled, hash-based detection).
Looking for prompt regeneration? See Regenerate Prompts in the Manual Trigger section.

Manual Trigger

Run all cron jobs immediately for testing or recovery:
curl -X POST https://searchcompany-main.up.railway.app/api/cron/trigger-all
⚠️ Warning: This can take up to 10 minutes depending on customer count.