Detect content changes on a website using an efficient 2-stage approach:
  1. Hashing Service (free) - Fetch raw HTML and hash to find changes
  2. Firecrawl Batch Scrape (paid) - Only scrape pages that actually changed
This is the first step in the 3-API update flow:
  1. detect-changes (this endpoint) - Find what changed
  2. update-ai-site - Update the AI website
  3. discover-products-from-changes - Find new products

How It Works

1. Custom Mapper → Get current URL list (sitemap + robots.txt + HTML links) [FREE]
2. Compare URLs vs stored site_map
   - New URLs = new pages
   - Missing URLs = removed pages
3. Hashing Service → Fetch raw HTML + hash for existing pages [FREE]
4. Compare hashes vs stored hashes
   - Hash mismatch = content changed
5. Batch Scrape ONLY new + changed URLs → Get markdown [PAID - only what's needed]
6. Return all data for downstream APIs
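The URL-set diff and hash comparison in steps 2–4 can be sketched as below. The function name and the fetch_html callable (url → raw HTML string, standing in for the Hashing Service's plain HTTP GET) are hypothetical:

```python
import hashlib

def detect_page_changes(current_urls, stored_site_map, stored_hashes, fetch_html):
    """Diff URL sets, then hash surviving pages to spot content changes.

    fetch_html is a hypothetical callable (url -> raw HTML string) standing in
    for the free Hashing Service's plain HTTP GET.
    """
    current, stored = set(current_urls), set(stored_site_map)
    new_urls = current - stored        # pages added since the last run
    removed_urls = stored - current    # pages no longer on the site

    changed_urls, updated_hashes = [], {}
    for url in current & stored:       # existing pages: compare content hashes
        page_hash = hashlib.sha256(fetch_html(url).encode()).hexdigest()
        updated_hashes[url] = page_hash
        if page_hash != stored_hashes.get(url):
            changed_urls.append(url)

    return new_urls, removed_urls, changed_urls, updated_hashes
```

Only the URLs in new_urls and changed_urls would then be passed to the paid batch scrape.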

Cost Efficiency

| Step | Cost | Description |
|---|---|---|
| Custom Mapper | Free | Our internal service |
| Hashing Service | Free | Raw HTTP GET + SHA-256 |
| Batch Scrape | ~$0.01/page | Only for new + changed pages |
Before optimization: Batch scrape ALL pages every time.
After optimization: Batch scrape only pages that actually changed.
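As a rough worked example using the ~$0.01/page figure above (the site size and change count are hypothetical):

```python
PAGE_COST = 0.01                 # approximate Firecrawl batch-scrape cost per page
total_pages, changed = 500, 5    # hypothetical site size and changed-page count

cost_before = total_pages * PAGE_COST   # old approach: scrape every page each run
cost_after = changed * PAGE_COST        # new approach: scrape only new + changed
print(f"before: ${cost_before:.2f}, after: ${cost_after:.2f}")
```

For a mostly-static 500-page site, that is roughly $5.00 per run down to about $0.05.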

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| business_id | string | Yes | Clerk org ID |
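A minimal request sketch; the base URL, endpoint path, and helper name are assumptions to adapt to your deployment:

```python
import json
import urllib.request

def post_detect_changes(base_url: str, business_id: str) -> dict:
    """POST the Clerk org ID to the detect-changes endpoint (path assumed)."""
    req = urllib.request.Request(
        f"{base_url}/detect-changes",
        data=json.dumps({"business_id": business_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Batch scraping changed pages can take a while; allow a generous timeout.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```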

Response

{
  "status": "success",  // or "unchanged" or "error"
  "new_pages": [
    {"url": "https://example.com/new-page", "markdown": "...", "hash": "abc123"}
  ],
  "changed_pages": [
    {"url": "https://example.com/about", "markdown": "...", "old_hash": "def456", "new_hash": "ghi789"}
  ],
  "removed_urls": ["https://example.com/old-page"],
  "unchanged_pages": [
    {"url": "https://example.com/", "hash": "..."}
  ],
  "updated_site_map": ["https://example.com/", "https://example.com/about", ...],
  "updated_hashes": {"https://example.com/": "abc123", ...},
  "business_info": {
    "entity_id": "uuid",
    "url": "https://example.com",
    "name": "Example Company",
    "clerk_org_id": "org_xxx",
    "ai_site_id": "uuid",
    "deployment_url": "https://example.searchcompany.dev",
    "project_name": "example-searchcompany-dev"
  }
}
Key difference: unchanged_pages entries do NOT include markdown, because those pages were never batch scraped. Only new_pages and changed_pages carry markdown content.
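A small helper (hypothetical, mirroring the response shape above) that collects only the pages carrying markdown for downstream processing:

```python
def pages_to_reprocess(changes: dict) -> dict:
    """Map url -> markdown for pages that need downstream processing.

    unchanged_pages are deliberately excluded: they were never batch
    scraped, so they carry no markdown.
    """
    if changes.get("status") != "success":
        return {}
    return {
        page["url"]: page["markdown"]
        for page in changes.get("new_pages", []) + changes.get("changed_pages", [])
    }
```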

Status Values

| Status | Meaning |
|---|---|
| success | Changes detected, proceed with update |
| unchanged | No changes detected, skip update |
| error | Something went wrong |

Usage in Cron

import asyncio

async def run_update_flow(business_id: str) -> None:
    # Step 1: Detect changes
    changes = await detect_changes(business_id)

    if changes["status"] == "unchanged":
        return  # Nothing to do

    # Steps 2 & 3: Run in parallel
    await asyncio.gather(
        update_ai_site(business_id, changes),
        discover_products_from_changes(business_id, changes),
    )

Database Reads

  • entities table - Get business entity info
  • ai_sites table - Get site_map and page_hashes

External API Calls

  • Custom Website Mapper (src/app/shared/mapping) - Get URL list [FREE]
  • Hashing Service (src/app/shared/hashing) - Fetch raw HTML + hash [FREE]
  • Firecrawl Batch Scrape API (/v2/batch/scrape) - Get markdown [PAID, only for changed pages]
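The free per-page check done by the Hashing Service amounts to a plain GET plus SHA-256 over the raw bytes; a sketch of that assumed behavior (function name is hypothetical):

```python
import hashlib
import urllib.request

def fetch_and_hash(url: str) -> str:
    """Plain HTTP GET + SHA-256 over raw response bytes.

    No rendering and no scraping fees -- just enough to tell
    whether a page's raw HTML differs from the stored hash.
    """
    with urllib.request.urlopen(url, timeout=30) as resp:
        return hashlib.sha256(resp.read()).hexdigest()
```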