# Cron Overview

> Background jobs that run on a schedule to keep your AI visibility optimized

These endpoints are called by the Cron service (Railway scheduled jobs) to perform background tasks.

<Info>
  **Schedule**: The cron runs daily at **10:00 PM Singapore Time (SGT)** / 14:00
  UTC.
</Info>
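
As a quick sanity check on the schedule, the sketch below converts the Singapore run time into a UTC cron expression. It assumes the scheduler interprets cron expressions in UTC; the helper name `sgt_to_utc_cron` is hypothetical and not part of the service.

```python
from datetime import datetime, timedelta, timezone

# Singapore Time is a fixed UTC+8 offset (no daylight saving).
SGT = timezone(timedelta(hours=8), "SGT")


def sgt_to_utc_cron(hour: int, minute: int = 0) -> str:
    """Return a daily cron expression in UTC for a given SGT wall-clock time."""
    local = datetime(2025, 1, 1, hour, minute, tzinfo=SGT)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(sgt_to_utc_cron(22))  # 0 14 * * *
```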

## 3-Batch Architecture (Custom Mapper + Batch Scrape)

The cron uses a custom website mapper + Firecrawl Batch Scrape API for efficient change detection:

```
For each customer:

   BATCH 1: Update AI Site + Discover Products (parallel)
   ├── Batch 1a: detect-changes → update-ai-site (skip_deploy)
   │   └── Collect files for deploy
   ├── Batch 1b: discover-products (skip_deploy)
   │   ├── Fetch from Shopify /products.json
   │   ├── Save NEW products to DB
   │   ├── Generate prompts for NEW products
   │   ├── Generate /llms/{slug}.txt for NEW products
   │   └── Collect files for deploy
   ├── Merge files from 1a + 1b
   ├── Deploy to Vercel (single combined deploy)
   └── Wait 10s for edge propagation

   BATCH 2a: Create Content (parallel)
   ├── Update Timestamps → collect files
   ├── Create AI Articles → collect files + new page URLs
   └── Deploy to Vercel (combined files)

   BATCH 2b: Analyze Visibility
   └── Analyze Visibility (sampling) → DB only

   BATCH 3: Notify Search Engines
   ├── Aggregate all changed URLs from BATCH 1a + BATCH 2a
   ├── Submit to IndexNow (one call per business)
   └── Resubmit sitemap to Google Search Console (one call per business)
```

<Note>
  **Why this architecture is efficient:**

  * **Custom Mapper** combines sitemap, robots.txt, and HTML link extraction for comprehensive URL discovery \[FREE]
  * **Hashing Service** fetches raw HTML and hashes to detect changes \[FREE]
  * **Batch Scrape** only scrapes NEW + CHANGED pages (not all pages) \[PAID]
  * **Batch 1a and 1b run in parallel** (no deploy conflicts - files collected first)
  * **Single combined deploy** for Batch 1 (1a + 1b files merged)
  * **Visibility sampling** reduces API calls by 50%
  * **2 deploys per customer** (Batch 1 combined + Batch 2a)
  * **BATCH 3** notifies search engines AFTER all content is deployed
</Note>
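
The parallel step in Batch 1 can be sketched with `asyncio.gather`. This is only an illustration of the control flow: `run_batch_1a`, `run_batch_1b`, and `deploy` are hypothetical stand-ins for the real calls, and the file contents are placeholders.

```python
import asyncio


async def run_batch_1a(business_id: str) -> dict[str, str]:
    # Stand-in for detect-changes -> update-ai-site with skip_deploy=True.
    return {"index.html": f"<html>{business_id}</html>"}


async def run_batch_1b(business_id: str) -> dict[str, str]:
    # Stand-in for discover-products with skip_deploy=True.
    return {"llms/example-product.txt": "example"}


async def deploy(business_id: str, files: dict[str, str]) -> int:
    # Stand-in for the Vercel deploy call; returns the number of files sent.
    return len(files)


async def run_batch_1(business_id: str, propagation_wait: float = 10.0) -> int:
    # 1a and 1b only collect files, so they can run concurrently.
    files_1a, files_1b = await asyncio.gather(
        run_batch_1a(business_id), run_batch_1b(business_id)
    )
    # One combined deploy, then wait for edge propagation.
    deployed = await deploy(business_id, {**files_1a, **files_1b})
    await asyncio.sleep(propagation_wait)
    return deployed
```

Because neither sub-batch deploys on its own, there is nothing to conflict on until the single combined deploy.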

***

## The Six Jobs

| Job   | Name                  | Purpose                                               | Batch    |
| ----- | --------------------- | ----------------------------------------------------- | -------- |
| **1** | Update AI Sites       | Refresh AI websites with content changes              | Batch 1a |
| **2** | Discover Products     | Find new products from Shopify                        | Batch 1b |
| **3** | Update Timestamps     | Refresh timestamps on all pages for freshness signals | Batch 2a |
| **4** | Create AI Articles    | Generate AI-specific content pages (100/week target)  | Batch 2a |
| **5** | Analyze Visibility    | Check visibility across 8 AI platforms (sampled)      | Batch 2b |
| **6** | Notify Search Engines | Submit URLs to IndexNow + Google Search Console       | Batch 3  |

***

## Batch 1a: Update AI Site

Detects changes on the real website and updates the AI site accordingly.

1. **detect\_changes**: Custom Mapper + Hashing Service find new, removed, and changed pages; Batch Scrape runs only for new and changed pages
2. **update\_ai\_site**: Send changes to Gemini (3 parallel calls), regenerate files
3. **Return files** (skip\_deploy=True) for combined deploy

```
Batch 1a Flow:
1. Custom Mapper → Get current URL list (up to 5000 pages) [FREE]
2. Compare URLs vs stored site_map → find new/removed pages
3. Hashing Service → Fetch raw HTML + hash existing pages [FREE]
4. Hash comparison → Find changed pages
5. Batch Scrape → Get markdown for NEW + CHANGED pages only [PAID]
6. If changes detected:
   └── update_ai_site: 3 parallel Gemini calls (skip_deploy=True)
7. Return files for combined deploy
8. Collect changed_urls for BATCH 3 submission
```
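
Steps 2 to 4 of the flow above amount to a set difference plus a hash comparison. The following is a minimal sketch under stated assumptions: the hash algorithm (SHA-256 here) and the shape of the stored data (URL mapped to hash) are not specified in this page, and `diff_site` is a hypothetical name.

```python
import hashlib


def html_hash(html: str) -> str:
    # Assumption: SHA-256 of the raw HTML; the real service may hash differently.
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def diff_site(stored: dict[str, str], current_pages: dict[str, str]):
    """Compare stored {url: hash} against freshly fetched {url: raw_html}."""
    current_urls, stored_urls = set(current_pages), set(stored)
    new = sorted(current_urls - stored_urls)
    removed = sorted(stored_urls - current_urls)
    changed = sorted(
        url
        for url in current_urls & stored_urls
        if html_hash(current_pages[url]) != stored[url]
    )
    return new, removed, changed


stored = {"/": html_hash("home"), "/about": html_hash("v1"), "/old": html_hash("x")}
current = {"/": "home", "/about": "v2", "/new": "hello"}
new, removed, changed = diff_site(stored, current)
to_scrape = new + changed  # only these pages go to the paid Batch Scrape step
```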

***

## Batch 1b: Discover Products (Parallel with 1a)

<Info>
  **Decoupled**: Batch 1b runs in parallel with 1a. It fetches products directly
  from Shopify's `/products.json` API - no scraped content needed.
</Info>

1. **fetch\_shopify\_products**: Fetch all products from `/products.json` API
2. **Hash comparison**: Compute MD5 hash of sorted product handles, compare with stored `products_hash`
3. **Snapshot comparison**: If hash changed, compare with stored `products_snapshot` to find NEW products
4. **Save products**: Save new products to `entities` table
5. **generate\_product\_prompts**: Generate 10 prompts per new product
6. **generate\_product\_llms\_txt**: Generate `/llms/{slug}.txt` for new products
7. **Return files** (skip\_deploy=True) for combined deploy

```
Batch 1b Flow:
1. Fetch products from Shopify /products.json [FREE]
2. Compute hash of sorted product handles
3. Compare hash with stored products_hash in ai_sites
4. If unchanged → skip (no work needed)
5. If changed → compare products_snapshot to find NEW products
6. Save new products to entities table
7. Generate prompts + llms.txt for NEW products only (skip_deploy=True)
8. Return files for combined deploy
9. Update products_hash + products_snapshot in ai_sites
```

**Database columns used (in `ai_sites` table):**

* `products_hash` (TEXT): MD5 hash of sorted product handles for quick comparison
* `products_snapshot` (JSONB): Full product list from last sync `[{handle, name, ...}]`
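
The hash and snapshot comparison can be sketched as follows. How the sorted handles are serialized before hashing is an assumption (newline-joined here), and both function names are hypothetical.

```python
import hashlib


def compute_products_hash(products: list[dict]) -> str:
    # MD5 over the sorted handles, so product order in the API response is irrelevant.
    handles = sorted(p["handle"] for p in products)
    return hashlib.md5("\n".join(handles).encode("utf-8")).hexdigest()


def find_new_products(snapshot: list[dict], fetched: list[dict]) -> list[dict]:
    # A product is NEW if its handle is absent from the stored snapshot.
    known = {p["handle"] for p in snapshot}
    return [p for p in fetched if p["handle"] not in known]


snapshot = [{"handle": "mug"}, {"handle": "shirt"}]
fetched = [{"handle": "shirt"}, {"handle": "cap"}, {"handle": "mug"}]

if compute_products_hash(fetched) != compute_products_hash(snapshot):
    new_products = find_new_products(snapshot, fetched)
else:
    new_products = []  # unchanged hash: skip all further work
```

The hash is a cheap gate: the snapshot comparison runs only when the set of handles has actually changed.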

***

## Batch 1 Combined Deploy

After Batch 1a and 1b complete in parallel, their files are merged and deployed in a single call:

```
Batch 1 Deploy Flow:
1. Batch 1a returns files_by_business
2. Batch 1b returns files_by_business
3. Merge files from both batches
4. Deploy to Vercel (single combined deploy per business)
5. Wait 10s for edge propagation
```
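
A minimal sketch of the merge step, assuming `files_by_business` maps a business ID to a `{path: content}` dictionary (the actual shape is not documented here). On a path collision the later batch wins, which is also an assumption.

```python
def merge_files_by_business(*batches: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Merge per-business file maps from several batches into one map."""
    merged: dict[str, dict[str, str]] = {}
    for batch in batches:
        for business_id, files in batch.items():
            merged.setdefault(business_id, {}).update(files)
    return merged


files_1a = {"biz-1": {"index.html": "<html>...</html>"}}
files_1b = {"biz-1": {"llms/mug.txt": "Mug"}, "biz-2": {"llms/cap.txt": "Cap"}}
merged = merge_files_by_business(files_1a, files_1b)
```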

***

## Edge Propagation Wait

After the Batch 1 combined deploy, we wait **10 seconds** for Vercel edge propagation.

<Warning>
  This wait is critical because Batch 2a scrapes the AI site for context.
  Without this wait, it might hit stale/cached content or 404s.
</Warning>

***

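## Batch 2a: Create Content

Batch 2a runs two jobs in parallel, Update Timestamps and Create AI Articles, and then deploys their combined files to Vercel. The first job, Update Timestamps, is described here.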

<Info>
  Refreshes timestamps on ALL pages (AI site core files + AI articles) to
  signal freshness to AI search engines.
</Info>

Updates on every page:

* Meta tags: `article:modified_time`
* Year in titles: "2025" → "2026" (if year changed)
* Footer: "Last updated: December 24, 2025"

Freshness matters because Bing and AI search engines tend to favor recently updated content and may include dates in citations.
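
The three updates can be illustrated with a few regular expressions. This is a rough sketch, not the service's implementation: it assumes the meta tag uses `property="article:modified_time"` with `content` as the next attribute, and that only the previous year is bumped in titles. `refresh_timestamps` is a hypothetical name.

```python
import re
from datetime import datetime, timezone


def refresh_timestamps(html: str, now: datetime) -> str:
    # 1. Meta tag: replace the content of article:modified_time.
    html = re.sub(
        r'(<meta\s+property="article:modified_time"\s+content=")[^"]*(")',
        lambda m: f"{m.group(1)}{now.isoformat()}{m.group(2)}",
        html,
    )
    # 2. Year in titles: bump last year's number to the current year, if present.
    prev_year, this_year = str(now.year - 1), str(now.year)
    html = re.sub(
        r"(<title>)(.*?)(</title>)",
        lambda m: m.group(1) + m.group(2).replace(prev_year, this_year) + m.group(3),
        html,
        flags=re.S,
    )
    # 3. Footer: rewrite the human-readable date.
    footer = f"Last updated: {now:%B} {now.day}, {now.year}"
    return re.sub(r"Last updated: [A-Z][a-z]+ \d{1,2}, \d{4}", footer, html)


page = (
    '<meta property="article:modified_time" content="2025-12-24T00:00:00+00:00">'
    "<title>Best Tools 2025</title>"
    "<footer>Last updated: December 24, 2025</footer>"
)
updated = refresh_timestamps(page, datetime(2026, 1, 5, tzinfo=timezone.utc))
```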

## Job 4: Create AI Articles

Generates AI-specific content pages at the **root level** (`/{slug}/`) to improve discoverability.

### Weekly Target (Per Customer)

* **100 pages per week** (Monday-Sunday)
* **50 pages** for the business (50%)
* **50 pages** distributed across products (50%)

**Special cases:**

* **No products**: Business gets all 100 pages
* **1-50 products**: All products included, 50 pages split evenly
* **51+ products**: Round-robin rotation selects 50 products per week

When there are more than 50 products, the system uses a rotating selection each week so all products eventually get coverage.
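
The allocation rules can be sketched as a small function. Several details are assumptions made for illustration: how the remainder is handed out when 50 does not divide evenly, and how the weekly rotation offset is derived from a `week_index`. The name `weekly_allocation` is hypothetical.

```python
def weekly_allocation(
    product_ids: list[str], week_index: int, total: int = 100, max_products: int = 50
) -> tuple[int, dict[str, int]]:
    """Return (business_pages, {product_id: pages}) for one week."""
    if not product_ids:
        return total, {}  # no products: the business gets every page

    product_budget = total // 2
    if len(product_ids) <= max_products:
        selected = list(product_ids)
    else:
        # Rotate the starting point each week so every product gets covered over time.
        start = (week_index * max_products) % len(product_ids)
        selected = [
            product_ids[(start + i) % len(product_ids)] for i in range(max_products)
        ]

    # Split evenly; earlier products absorb any remainder (assumption).
    base, extra = divmod(product_budget, len(selected))
    per_product = {
        pid: base + (1 if i < extra else 0) for i, pid in enumerate(selected)
    }
    return total - product_budget, per_product


business_pages, per_product = weekly_allocation(["a", "b", "c"], week_index=0)
# business_pages == 50, per_product == {"a": 17, "b": 17, "c": 16}
```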

### URL Structure

AI articles are deployed at the **root level** for maximum SEO authority:

| Entity Type | URL Pattern | Example                            |
| ----------- | ----------- | ---------------------------------- |
| Business    | `/{slug}/`  | `/expert-review-of-website-arena/` |
| Product     | `/{slug}/`  | `/deep-dive-into-remix-tool/`      |

## Job 5: Analyze Visibility (Sampling Architecture)

<Info>
  **Cost Optimization**: We sample 10 prompts per day (prioritizing untested
  ones) instead of checking all prompts. This reduces API costs by \~50% while
  ensuring all prompts eventually get tested.
</Info>

### How It Works

1. **Sample 10 prompts** from the org's total pool (untested first, then random)
2. **Analyze each prompt** across 8 AI platforms (80 API calls total)
3. **Store results** with pass/fail per platform and update `last_tested_at`
4. **Update overall score** with floor protection (never dips below previous high)

### Prompt Limits

* **Business**: 50 prompts (10 via Exa during onboarding + 40 via regular generation)
* **Products**: 10 prompts each (unlimited products)

### Pass/Fail Paradigm

Each prompt shows visibility status per platform:

* **true (✓)**: Entity was mentioned/recommended by this AI platform
* **false (✗)**: Entity was not found in the AI platform's response
* **null (-)**: Not yet tested
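
Because the status has three states, client code should check for `None` explicitly rather than relying on truthiness, which would conflate "not yet tested" with "not found". A small illustrative helper (hypothetical name):

```python
from typing import Optional


def status_symbol(result: Optional[bool]) -> str:
    """Map a tri-state visibility result to its display symbol."""
    if result is None:  # must come first: `not None` is also True
        return "-"
    return "✓" if result else "✗"


row = {"ChatGPT": True, "Claude": False, "Grok": None}
rendered = {platform: status_symbol(value) for platform, value in row.items()}
```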

### The 8 AI Platforms

Each platform uses its native search capabilities, then Gemini 3 Flash provides unified evaluation:

* **ChatGPT** - OpenAI Direct w/ Search
* **Claude** - Anthropic Direct w/ Search
* **Gemini** - GCP AI Studio Direct w/ Search
* **Perplexity** - Sonar API
* **Copilot** - Parallel Search API
* **DeepSeek** - Firecrawl Search API
* **Grok** - X.AI Direct w/ Search
* **Google AI** - Serp API (AI Overview)

***

## Job 6: Notify Search Engines (BATCH 3)

<Info>
  **BATCH 3** runs AFTER all content is deployed (BATCH 1a + BATCH 2a) to ensure
  search engines see the latest content.
</Info>

### How It Works

1. **Aggregate URLs** from BATCH 1a (changed pages) and BATCH 2a (new AI articles)
2. **Submit to IndexNow** - Instant notification to Bing, Yandex, and other IndexNow-compatible engines
3. **Resubmit sitemap to Google Search Console** - Signals Google to re-crawl the sitemap

### URL Sources

| Source                       | URLs Submitted                                           |
| ---------------------------- | -------------------------------------------------------- |
| Job 1 (update\_ai\_site)     | `changed_urls` - new + changed pages from detect-changes |
| Job 4 (create\_ai\_articles) | New AI article slugs (e.g., `/{slug}/`)                  |

### APIs Called

```
# IndexNow - one call per business
POST /api/cron/submit-indexnow
{
  "urls": ["/about-us/", "/new-ai-article/", ...],
  "source_url": "https://customer-domain.com"
}

# Google Search Console - one call per business
POST /api/domain/resubmit-sitemap/{org_id}
```
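
Building the IndexNow request body from the two URL sources might look like the sketch below. It only constructs the payload and sends nothing; the de-duplication and slug normalization are assumptions, and `build_indexnow_payload` is a hypothetical helper.

```python
def build_indexnow_payload(
    source_url: str, changed_urls: list[str], article_slugs: list[str]
) -> dict:
    """Combine changed pages and new article slugs into one request body."""
    # Normalize slugs to the root-level /{slug}/ form.
    article_paths = [f"/{slug.strip('/')}/" for slug in article_slugs]
    # dict.fromkeys removes duplicates while preserving order.
    urls = list(dict.fromkeys(list(changed_urls) + article_paths))
    return {"urls": urls, "source_url": source_url}


payload = build_indexnow_payload(
    "https://customer-domain.com",
    changed_urls=["/about-us/", "/new-ai-article/"],
    article_slugs=["new-ai-article", "/expert-review-of-website-arena/"],
)
```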

<Note>
  **Why BATCH 3 is separate**: Search engines should only be notified AFTER
  content is deployed. If we submitted URLs before deployment, crawlers might
  hit 404s or stale content.
</Note>

***

## All Endpoints

| Endpoint                              | Method | Description                                                  |
| ------------------------------------- | ------ | ------------------------------------------------------------ |
| `/api/cron/entities`                  | GET    | Fetch all businesses/products to process                     |
| `/api/cron/detect-changes`            | POST   | Detect content changes using Mapper + Hashing + Batch Scrape |
| `/api/cron/update-ai-site`            | POST   | Update AI website with changes                               |
| `/api/cron/discover-products`         | POST   | Discover NEW products from Shopify (hash-based detection)    |
| `/api/cron/generate-product-prompts`  | POST   | Generate visibility prompts for products                     |
| `/api/cron/generate-product-llms-txt` | POST   | Generate product llms.txt files                              |
| `/api/cron/update-all-timestamps`     | POST   | Refresh timestamps on all AI website pages                   |
| `/api/cron/ai-articles-quota`         | GET    | Calculate today's AI articles quota                          |
| `/api/cron/create-ai-article`         | POST   | Generate AI article content (no deploy)                      |
| `/api/cron/deploy-to-vercel`          | POST   | Deploy all files to Vercel (single deployment)               |
| `/api/cron/submit-indexnow`           | POST   | Notify search engines of new URLs                            |
| `/api/cron/sample-prompts`            | GET    | Randomly sample prompts for visibility check                 |
| `/api/cron/analyze-visibility`        | POST   | Check visibility across 8 AI platforms                       |
| `/api/cron/store-visibility-report`   | POST   | Store daily visibility report                                |
| `/api/cron/store-visibility-score`    | POST   | Calculate and store visibility score                         |

<Warning>
  **Deprecated**: `/api/cron/discover-products-from-changes` is deprecated.
  Use `/api/cron/discover-products` instead (decoupled, hash-based detection).
</Warning>

<Note>
  Looking for prompt regeneration? See [Regenerate
  Prompts](/api-reference/endpoint/manual-trigger/regenerate-prompts) in the
  Manual Trigger section.
</Note>

***

## Manual Trigger

Run all cron jobs immediately for testing or recovery:

```bash
curl -X POST https://searchcompany-main.up.railway.app/api/cron/trigger-all
```

<Warning>
  This can take up to 10 minutes depending on customer count.
</Warning>