> ## Documentation Index
> Fetch the complete documentation index at: https://docs.searchcompany.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Detect Changes

Detect content changes on a website using an **efficient 2-stage approach**:

1. **Hashing Service** (free) - Fetch raw HTML and hash to find changes
2. **Firecrawl Batch Scrape** (paid) - Only scrape pages that actually changed

This is part of Batch 1a in the cron architecture:

<Info>
  **Batch 1a**: `detect-changes` + `update-ai-site` run together.
  **Batch 1b**: `discover-products` runs in parallel (completely decoupled).
</Info>

1. **detect-changes** (this endpoint) - Find what changed
2. `update-ai-site` - Update the AI website

## How It Works

```
1. Custom Mapper → Get current URL list (sitemap + robots.txt + HTML links) [FREE]
2. Compare URLs vs stored site_map
   - New URLs = new pages
   - Missing URLs = removed pages
3. Hashing Service → Fetch raw HTML + hash for existing pages [FREE]
4. Compare hashes vs stored hashes
   - Hash mismatch = content changed
5. Batch Scrape ONLY new + changed URLs → Get markdown [PAID - only what's needed]
6. Return all data for downstream APIs
```

## Cost Efficiency

| Step            | Cost          | Description                      |
| --------------- | ------------- | -------------------------------- |
| Custom Mapper   | Free          | Our internal service             |
| Hashing Service | Free          | Raw HTTP GET + SHA-256           |
| Batch Scrape    | \~\$0.01/page | **Only for new + changed pages** |

**Before optimization**: Batch scrape ALL pages every time
**After optimization**: Batch scrape only pages that actually changed

## Request Body

| Field         | Type   | Required | Description  |
| ------------- | ------ | -------- | ------------ |
| `business_id` | string | Yes      | Clerk org ID |

## Response

```json theme={null}
{
  "status": "success",  // or "unchanged" or "error"
  "new_pages": [
    {"url": "https://example.com/new-page", "markdown": "...", "hash": "abc123"}
  ],
  "changed_pages": [
    {"url": "https://example.com/about", "markdown": "...", "old_hash": "def456", "new_hash": "ghi789"}
  ],
  "removed_urls": ["https://example.com/old-page"],
  "unchanged_pages": [
    {"url": "https://example.com/", "hash": "..."}
  ],
  "updated_site_map": ["https://example.com/", "https://example.com/about", ...],
  "updated_hashes": {"https://example.com/": "abc123", ...},
  "business_info": {
    "entity_id": "uuid",
    "url": "https://example.com",
    "name": "Example Company",
    "clerk_org_id": "org_xxx",
    "ai_site_id": "uuid",
    "deployment_url": "https://example.searchcompany.dev",
    "project_name": "example-searchcompany-dev"
  }
}
```

<Note>
  **Key difference**: `unchanged_pages` does NOT have `markdown` - they weren't
  batch scraped. Only `new_pages` and `changed_pages` have markdown content.
</Note>

## Status Values

| Status      | Meaning                               |
| ----------- | ------------------------------------- |
| `success`   | Changes detected, proceed with update |
| `unchanged` | No changes detected, skip update      |
| `error`     | Something went wrong                  |

## Usage in Cron

```python theme={null}
# Batch 1a: Detect changes and update AI site
changes = await detect_changes(business_id)

if changes["status"] == "unchanged":
    return  # Nothing to do for this business

# Update AI site with the changes
await update_ai_site(business_id, changes, skip_deploy=True)
# Files collected for combined deploy after Batch 1a + 1b complete
```

<Note>
  **Decoupled architecture**: Product discovery (Batch 1b) now runs in parallel
  and uses the Shopify `products.json` API directly - it doesn't depend on
  scraped content from `detect-changes`.
</Note>

## Database Reads

* `entities` table - Get business entity info
* `ai_sites` table - Get `site_map` and `page_hashes`

## External API Calls

* Custom Website Mapper (`src/app/shared/mapping`) - Get URL list \[FREE]
* Hashing Service (`src/app/shared/hashing`) - Fetch raw HTML + hash \[FREE]
* Firecrawl Batch Scrape API (`/v2/batch/scrape`) - Get markdown \[PAID, only for changed pages]
