These endpoints are called by the Cron service (Railway scheduled jobs) to perform background tasks.
Schedule: The cron runs daily at 10:00 PM Singapore Time (SGT) / 14:00 UTC.

3-Batch Architecture (Custom Mapper + Batch Scrape)

The cron uses a custom website mapper + Firecrawl Batch Scrape API for efficient change detection:
For each customer:

   BATCH 1: Update AI Site + Discover Products (parallel)
   β”œβ”€β”€ Batch 1a: detect-changes β†’ update-ai-site (skip_deploy)
   β”‚   └── Collect files for deploy
   β”œβ”€β”€ Batch 1b: discover-products (skip_deploy)
   β”‚   β”œβ”€β”€ Fetch from Shopify /products.json
   β”‚   β”œβ”€β”€ Save NEW products to DB
   β”‚   β”œβ”€β”€ Generate prompts for NEW products
   β”‚   β”œβ”€β”€ Generate /llms/{slug}.txt for NEW products
   β”‚   └── Collect files for deploy
   β”œβ”€β”€ Merge files from 1a + 1b
   β”œβ”€β”€ Deploy to Vercel (single combined deploy)
   └── Wait 10s for edge propagation

   BATCH 2a: Create Content (parallel)
   β”œβ”€β”€ Update Timestamps β†’ collect files
   β”œβ”€β”€ Create AI Articles β†’ collect files + new page URLs
   └── Deploy to Vercel (combined files)

   BATCH 2b: Analyze Visibility
   └── Analyze Visibility (sampling) β†’ DB only

   BATCH 3: Notify Search Engines
   β”œβ”€β”€ Aggregate all changed URLs from BATCH 1a + BATCH 2a
   β”œβ”€β”€ Submit to IndexNow (one call per business)
   └── Resubmit sitemap to Google Search Console (one call per business)
Why this architecture is efficient:
  • Custom Mapper combines sitemap, robots.txt, and HTML link extraction for comprehensive URL discovery [FREE]
  • Hashing Service fetches raw HTML and hashes it to detect changes [FREE]
  • Batch Scrape only scrapes NEW + CHANGED pages (not all pages) [PAID]
  • Batch 1a and 1b run in parallel (no deploy conflicts - files are collected first)
  • Single combined deploy for Batch 1 (1a + 1b files merged)
  • Visibility sampling reduces API calls by 50%
  • Only 2 deploys per customer (Batch 1 combined + Batch 2a)
  • BATCH 3 notifies search engines AFTER all content is deployed
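The batch structure above can be sketched as a per-customer pipeline. This is an illustrative outline, not the production code: the `api` client and all method names are assumptions standing in for calls to the cron endpoints.

```python
import asyncio

async def run_batches_for_customer(customer, api, propagation_wait=10):
    """Hypothetical sketch of the per-customer batch pipeline."""
    # BATCH 1: run 1a (site update) and 1b (product discovery) in parallel.
    # Both skip deployment and return their files instead (skip_deploy).
    files_1a, files_1b = await asyncio.gather(
        api.update_ai_site(customer, skip_deploy=True),
        api.discover_products(customer, skip_deploy=True),
    )
    await api.deploy_to_vercel(customer, {**files_1a, **files_1b})
    await asyncio.sleep(propagation_wait)  # wait for Vercel edge propagation

    # BATCH 2a: timestamps + AI articles, then one combined deploy.
    files_ts, files_articles = await asyncio.gather(
        api.update_all_timestamps(customer, skip_deploy=True),
        api.create_ai_articles(customer, skip_deploy=True),
    )
    await api.deploy_to_vercel(customer, {**files_ts, **files_articles})

    # BATCH 2b: visibility analysis writes to the DB only, no deploy.
    await api.analyze_visibility(customer)

    # BATCH 3: notify search engines only after all content is live.
    await api.submit_indexnow(customer)
    await api.resubmit_sitemap(customer)
```

Note that the two deploys per customer fall out of this structure directly: one after the Batch 1 merge, one after Batch 2a.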

The Six Jobs

| Job | Name | Purpose | Batch |
| --- | --- | --- | --- |
| 1 | Update AI Sites | Refresh AI websites with content changes | Batch 1a |
| 2 | Discover Products | Find new products from Shopify | Batch 1b |
| 3 | Update Timestamps | Refresh timestamps on all pages for freshness signals | Batch 2a |
| 4 | Create AI Articles | Generate AI-specific content pages (100/week target) | Batch 2a |
| 5 | Analyze Visibility | Check visibility across 8 AI platforms (sampled) | Batch 2b |
| 6 | Notify Search Engines | Submit URLs to IndexNow + Google Search Console | Batch 3 |

Batch 1a: Update AI Site

Detects changes on the real website and updates the AI site accordingly.
  1. detect_changes: Custom Mapper + Hashing Service find changes; Batch Scrape runs only for new and changed pages
  2. update_ai_site: Send changes to Gemini (3 parallel calls), regenerate files
  3. Return files (skip_deploy=True) for combined deploy
Batch 1a Flow:
1. Custom Mapper β†’ Get current URL list (up to 5000 pages) [FREE]
2. Compare URLs vs stored site_map β†’ find new/removed pages
3. Hashing Service β†’ Fetch raw HTML + hash existing pages [FREE]
4. Hash comparison β†’ Find changed pages
5. Batch Scrape β†’ Get markdown for NEW + CHANGED pages only [PAID]
6. If changes detected:
   └── update_ai_site: 3 parallel Gemini calls (skip_deploy=True)
7. Return files for combined deploy
8. Collect changed_urls for BATCH 3 submission
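Steps 2-4 of the flow above boil down to two set comparisons plus a hash check. The sketch below assumes the stored site_map is a `{url: content_hash}` mapping and that `fetch_html` is a hypothetical helper wrapping the free raw-HTML fetch; both are illustrative assumptions.

```python
import hashlib

def classify_pages(current_urls, stored_site_map, fetch_html):
    """Split mapped URLs into NEW, REMOVED, and CHANGED pages.

    stored_site_map: {url: md5_hex_of_last_seen_html} (assumed shape)
    fetch_html(url): hypothetical helper returning raw HTML [FREE]
    """
    current = set(current_urls)
    stored = set(stored_site_map)

    new_pages = current - stored          # mapped now, never seen before
    removed_pages = stored - current      # stored, but gone from the map

    changed_pages = []
    for url in current & stored:          # only re-hash pages we already know
        digest = hashlib.md5(fetch_html(url).encode("utf-8")).hexdigest()
        if digest != stored_site_map[url]:
            changed_pages.append(url)

    # Only NEW + CHANGED pages are sent on to Batch Scrape [PAID]
    return sorted(new_pages), sorted(removed_pages), sorted(changed_pages)
```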

Batch 1b: Discover Products (Parallel with 1a)

Decoupled: Batch 1b runs in parallel with 1a. It fetches products directly from Shopify’s /products.json API - no scraped content needed.
  1. fetch_shopify_products: Fetch all products from /products.json API
  2. Hash comparison: Compute MD5 hash of sorted product handles, compare with stored products_hash
  3. Snapshot comparison: If hash changed, compare with stored products_snapshot to find NEW products
  4. Save products: Save new products to entities table
  5. generate_product_prompts: Generate 10 prompts per new product
  6. generate_product_llms_txt: Generate /llms/{slug}.txt for new products
  7. Return files (skip_deploy=True) for combined deploy
Batch 1b Flow:
1. Fetch products from Shopify /products.json [FREE]
2. Compute hash of sorted product handles
3. Compare hash with stored products_hash in ai_sites
4. If unchanged β†’ skip (no work needed)
5. If changed β†’ compare products_snapshot to find NEW products
6. Save new products to entities table
7. Generate prompts + llms.txt for NEW products only (skip_deploy=True)
8. Return files for combined deploy
9. Update products_hash + products_snapshot in ai_sites
Database columns used (in ai_sites table):
  • products_hash (TEXT): MD5 hash of sorted product handles for quick comparison
  • products_snapshot (JSONB): Full product list from last sync [{handle, name, ...}]
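The hash and snapshot comparisons can be sketched as follows. The exact serialization used for the MD5 input (here, a JSON dump of the sorted handles) is an assumption; what matters is that it is deterministic and order-independent.

```python
import hashlib
import json

def products_changed(products, stored_hash):
    """Quick check (steps 2-3): MD5 over sorted product handles,
    matching the products_hash column. Returns (changed?, new_hash)."""
    handles = sorted(p["handle"] for p in products)
    new_hash = hashlib.md5(json.dumps(handles).encode("utf-8")).hexdigest()
    return new_hash != stored_hash, new_hash

def find_new_products(products, products_snapshot):
    """Step 5: diff against the stored snapshot to find NEW products only."""
    known = {p["handle"] for p in products_snapshot}
    return [p for p in products if p["handle"] not in known]
```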

Batch 1 Combined Deploy

After Batch 1a and 1b complete in parallel, their files are merged and deployed in a single call:
Batch 1 Deploy Flow:
1. Batch 1a returns files_by_business
2. Batch 1b returns files_by_business
3. Merge files from both batches
4. Deploy to Vercel (single combined deploy per business)
5. Wait 10s for edge propagation
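Step 3 is a simple dictionary merge. The sketch below assumes each batch returns a `{business_id: {path: content}}` mapping (the files_by_business shape is an assumption inferred from the flow above).

```python
def merge_files_by_business(files_1a, files_1b):
    """Merge the per-business file dicts from Batch 1a and 1b into one
    payload, so each business gets a single combined Vercel deploy."""
    merged = {}
    for files in (files_1a, files_1b):
        for business_id, paths in files.items():
            merged.setdefault(business_id, {}).update(paths)
    return merged
```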

Edge Propagation Wait

After the Batch 1 combined deploy, we wait 10 seconds for Vercel edge propagation.
This wait is critical because Batch 2a scrapes the AI site for context. Without this wait, it might hit stale/cached content or 404s.

Job 3: Update Timestamps (Batch 2a)

Refreshes timestamps on ALL pages (AI site core files + AI articles) to signal freshness to AI search engines.
Updates on every page:
  • Meta tags: article:modified_time
  • Year in titles: β€œ2025” β†’ β€œ2026” (if year changed)
  • Footer: β€œLast updated: December 24, 2025”
This helps because Bing and other AI search engines favor fresh content and may include dates in citations.
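The three freshness edits above can be sketched with regex substitutions. The exact markup of the real pages is an assumption; this only illustrates the meta tag, title year, and footer updates.

```python
import re
from datetime import datetime, timezone

def refresh_timestamps(html, now=None):
    """Apply the three freshness updates to a page (illustrative markup)."""
    now = now or datetime.now(timezone.utc)
    # 1. Meta tag: article:modified_time
    html = re.sub(
        r'(<meta property="article:modified_time" content=")[^"]*(")',
        r"\g<1>" + now.isoformat() + r"\g<2>",
        html,
    )
    # 2. Year in title, e.g. "2025" -> "2026" (if the year changed)
    html = re.sub(
        r"(<title>[^<]*?)\b(20\d\d)\b",
        lambda m: m.group(1) + str(now.year),
        html,
    )
    # 3. Footer "Last updated" line
    html = re.sub(
        r"(Last updated: )[^<]*",
        r"\g<1>" + now.strftime("%B %d, %Y"),
        html,
    )
    return html
```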

Job 4: Create AI Articles (Batch 2a)

Generates AI-specific content pages at the root level (/{slug}/) to improve discoverability.

Weekly Target (Per Customer)

  • 100 pages per week (Monday-Sunday)
  • 50 pages for the business (50%)
  • 50 pages distributed across products (50%)
Special cases:
  • No products: Business gets all 100 pages
  • 1-50 products: All products included, 50 pages split evenly
  • 51+ products: Round-robin rotation selects 50 products per week
When there are more than 50 products, the system uses a rotating selection each week so all products eventually get coverage.
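One way to implement that rotation is a deterministic window keyed on the week number; the exact scheme used in production is an assumption, but the sketch shows how every product gets covered over successive weeks.

```python
def select_products_for_week(product_slugs, week_number, batch_size=50):
    """Pick this week's products. 1-50 products: all included.
    51+ products: a round-robin window that advances by week."""
    if len(product_slugs) <= batch_size:
        return list(product_slugs)
    ordered = sorted(product_slugs)       # stable order across runs
    start = (week_number * batch_size) % len(ordered)
    # Wrap around the list so the window is always batch_size long
    return [ordered[(start + i) % len(ordered)] for i in range(batch_size)]
```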

URL Structure

AI articles are deployed at the root level for maximum SEO authority:
| Entity Type | URL Pattern | Example |
| --- | --- | --- |
| Business | /{slug}/ | /expert-review-of-website-arena/ |
| Product | /{slug}/ | /deep-dive-into-remix-tool/ |

Job 5: Analyze Visibility (Batch 2b, Sampling Architecture)

Cost Optimization: We sample 10 prompts per day (prioritizing untested ones) instead of checking all prompts. This reduces API costs by ~50% while ensuring all prompts eventually get tested.

How It Works

  1. Sample 10 prompts from the org’s total pool (untested first, then random)
  2. Analyze each prompt across 8 AI platforms (80 API calls total)
  3. Store results with pass/fail per platform and update last_tested_at
  4. Update overall score with floor protection (never dips below previous high)
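Steps 1 and 4 can be sketched as below. The prompt record shape (`last_tested_at` field) follows the description above; the function names are illustrative.

```python
import random

def sample_prompts(prompts, k=10, rng=random):
    """Step 1: take untested prompts (last_tested_at is None) first,
    then fill the remainder with a random sample of tested ones."""
    untested = [p for p in prompts if p.get("last_tested_at") is None]
    tested = [p for p in prompts if p.get("last_tested_at") is not None]
    if len(untested) >= k:
        return untested[:k]
    return untested + rng.sample(tested, min(k - len(untested), len(tested)))

def update_score_with_floor(new_score, previous_score):
    """Step 4: floor protection - the stored score never dips below the
    previous high, so one bad sample day cannot erase earlier progress."""
    return max(new_score, previous_score)
```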

Prompt Limits

  • Business: 50 prompts (10 via Exa during onboarding + 40 via regular generation)
  • Products: 10 prompts each (unlimited products)

Pass/Fail Paradigm

Each prompt shows visibility status per platform:
  • true (βœ“): Entity was mentioned/recommended by this AI platform
  • false (βœ—): Entity was not found in the AI platform’s response
  • null (-): Not yet tested

The 8 AI Platforms

Each platform uses its native search capabilities, then Gemini 3 Flash provides unified evaluation:
  • ChatGPT - OpenAI Direct w/ Search
  • Claude - Anthropic Direct w/ Search
  • Gemini - GCP AI Studio Direct w/ Search
  • Perplexity - Sonar API
  • Copilot - Parallel Search API
  • DeepSeek - Firecrawl Search API
  • Grok - X.AI Direct w/ Search
  • Google AI - Serp API (AI Overview)

Job 6: Notify Search Engines (Batch 3)

BATCH 3 runs AFTER all content is deployed (BATCH 1a + BATCH 2a) to ensure search engines see the latest content.

How It Works

  1. Aggregate URLs from BATCH 1a (changed pages) and BATCH 2a (new AI articles)
  2. Submit to IndexNow - Instant notification to Bing, Yandex, and other IndexNow-compatible engines
  3. Resubmit sitemap to Google Search Console - Signals Google to re-crawl the sitemap
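Step 1 is a deduplicating merge of the two URL sources. The sketch below assumes Batch 1a yields ready-made paths and Batch 2a yields article slugs that map to `/{slug}/` URLs, as described above.

```python
def aggregate_urls_for_batch3(changed_urls_1a, new_article_slugs):
    """Merge BATCH 1a changed pages with BATCH 2a article URLs into one
    deduplicated, order-preserving list for the single IndexNow call."""
    seen, urls = set(), []
    for url in list(changed_urls_1a) + [f"/{slug}/" for slug in new_article_slugs]:
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```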

URL Sources

| Source | URLs Submitted |
| --- | --- |
| Batch 1a (update_ai_site) | changed_urls - new + changed pages from detect-changes |
| Batch 2a (create_ai_articles) | New AI article slugs (e.g., /{slug}/) |

APIs Called

# IndexNow - one call per business
POST /api/cron/submit-indexnow
{
  "urls": ["/about-us/", "/new-ai-article/", ...],
  "source_url": "https://customer-domain.com"
}

# Google Search Console - one call per business
POST /api/domain/resubmit-sitemap/{org_id}
Why BATCH 3 is separate: Search engines should only be notified AFTER content is deployed. If we submitted URLs before deployment, crawlers might hit 404s or stale content.

All Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/cron/entities | GET | Fetch all businesses/products to process |
| /api/cron/detect-changes | POST | Detect content changes using Mapper + Hashing + Batch Scrape |
| /api/cron/update-ai-site | POST | Update AI website with changes |
| /api/cron/discover-products | POST | Discover NEW products from Shopify (hash-based detection) |
| /api/cron/generate-product-prompts | POST | Generate visibility prompts for products |
| /api/cron/generate-product-llms-txt | POST | Generate product llms.txt files |
| /api/cron/update-all-timestamps | POST | Refresh timestamps on all AI website pages |
| /api/cron/ai-articles-quota | GET | Calculate today’s AI articles quota |
| /api/cron/create-ai-article | POST | Generate AI article content (no deploy) |
| /api/cron/deploy-to-vercel | POST | Deploy all files to Vercel (single deployment) |
| /api/cron/submit-indexnow | POST | Notify search engines of new URLs |
| /api/cron/sample-prompts | GET | Randomly sample prompts for visibility check |
| /api/cron/analyze-visibility | POST | Check visibility across 8 AI platforms |
| /api/cron/store-visibility-report | POST | Store daily visibility report |
| /api/cron/store-visibility-score | POST | Calculate and store visibility score |
Deprecated: /api/cron/discover-products-from-changes is deprecated. Use /api/cron/discover-products instead (decoupled, hash-based detection).
Looking for prompt regeneration? See Regenerate Prompts in the Manual Trigger section.

Manual Trigger

Run all cron jobs immediately for testing or recovery:
curl -X POST https://searchcompany-main.up.railway.app/api/cron/trigger-all
⚠️ Warning: This can take up to 10 minutes depending on customer count.