
# Overview

> Architecture for the Detect Changes endpoint

## Purpose

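Detects content changes on a website using an **efficient 2-stage approach** that minimizes Firecrawl costs by running the free check first and paying only for pages that need a fresh scrape:

1. **Hashing Service** - Fetch raw HTML directly (free) and compare each page's hash against the stored hash
2. **Batch Scrape** - Only scrape pages that are new or whose hash changed (paid)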

## Architecture

```mermaid theme={null}
flowchart TD
    Request["POST /api/cron/detect-changes"]
    
    subgraph orchestrator [Mini Orchestrator]
        GetStored["Get stored site_map + hashes"]
        CustomMapper["Custom Website Mapper"]
        HashService["Hashing Service - Fetch HTML + Hash"]
        CompareHashes["Compare hashes"]
        BatchScrape["Batch Scrape NEW + CHANGED only"]
    end
    
    subgraph db [Database]
        AISites["ai_sites table"]
    end
    
    subgraph external [External]
        Firecrawl["Firecrawl API"]
    end
    
    Request --> GetStored
    GetStored --> AISites
    AISites --> CustomMapper
    CustomMapper --> HashService
    HashService --> CompareHashes
    CompareHashes --> BatchScrape
    BatchScrape --> Firecrawl
    Firecrawl --> Response
```

## Detection Logic

1. **Get stored data** - Retrieve `site_map` (URL list) and `page_hashes` from the `ai_sites` table
2. **Map current URLs** - Use the custom mapper to get the current URL list \[FREE]
3. **Compare URL lists** - Find new URLs, removed URLs, and existing URLs
4. **Fetch + hash existing pages** - Use the hashing service to fetch raw HTML and hash it \[FREE]
5. **Compare hashes** - Find which existing pages changed by comparing the new hashes against `page_hashes`
6. **Batch scrape** - Only scrape NEW + CHANGED pages with Firecrawl \[PAID]
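
The comparison steps above (3 and 5) can be sketched with plain set operations. This is a minimal illustration, not the code in `mini_orchestrator.py`: the `ChangeSet` class and `classify_urls` function are hypothetical names, and the choice to treat a page whose current hash is missing as changed is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Hypothetical container for the outcome of steps 3 and 5."""
    new_urls: list = field(default_factory=list)
    removed_urls: list = field(default_factory=list)
    changed_urls: list = field(default_factory=list)
    unchanged_urls: list = field(default_factory=list)

    @property
    def urls_to_scrape(self) -> list:
        # Step 6: only NEW + CHANGED pages are sent to the paid scraper.
        return self.new_urls + self.changed_urls


def classify_urls(stored_site_map, stored_hashes, current_urls, current_hashes) -> ChangeSet:
    """Split URLs into new, removed, changed, and unchanged."""
    stored = set(stored_site_map)
    current = set(current_urls)

    result = ChangeSet(
        new_urls=sorted(current - stored),      # in the current map only
        removed_urls=sorted(stored - current),  # in the stored map only
    )

    # Existing URLs appear in both lists; their hashes decide the category.
    for url in sorted(current & stored):
        # Assumption: a missing current hash (e.g. a failed fetch) counts as changed.
        if current_hashes.get(url) != stored_hashes.get(url):
            result.changed_urls.append(url)
        else:
            result.unchanged_urls.append(url)
    return result
```

Calling `classify_urls(...)` and passing `urls_to_scrape` to the batch scrape keeps the paid call limited to the pages that need it.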

## Cost Optimization

| Stage | Service         | Cost          | What it does                      |
| ----- | --------------- | ------------- | --------------------------------- |
| 1     | Custom Mapper   | Free          | Sitemap + robots.txt + HTML links |
| 2     | Hashing Service | Free          | Raw HTTP GET + SHA-256 hash       |
| 3     | Firecrawl Batch | \~\$0.01/page | Only for new + changed pages      |
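
As a rough illustration of stage 2, a raw HTTP GET followed by a SHA-256 hash needs only the Python standard library. The function names below are hypothetical and are not the `fetch_and_hash` exported from `src/app/shared/hashing/`; the timeout value is likewise an assumption.

```python
import hashlib
import urllib.request


def hash_html(raw_html: bytes) -> str:
    """Return the SHA-256 hex digest of the raw response body."""
    return hashlib.sha256(raw_html).hexdigest()


def fetch_and_hash_sketch(url: str, timeout: float = 10.0) -> str:
    """Plain HTTP GET, then hash the bytes; no rendering or paid API involved."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return hash_html(response.read())
```

Because the hash covers the raw bytes, any byte-level difference in the HTML produces a different digest, including markup that changes on every request.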

**Example**: A site with 100 pages, 3 of which changed

* **Old approach** (scrape every page): 100 × \$0.01 = \$1.00
* **New approach** (scrape only the changed pages): 3 × \$0.01 = \$0.03
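
The same arithmetic generalizes to any site size. The sketch below assumes the approximate per-page price from the table; `scrape_cost` is a hypothetical helper, not part of the codebase.

```python
COST_PER_PAGE_USD = 0.01  # approximate per-page figure from the table above


def scrape_cost(pages_scraped: int, cost_per_page: float = COST_PER_PAGE_USD) -> float:
    """Estimated scrape cost in USD for a number of scraped pages."""
    return round(pages_scraped * cost_per_page, 2)


total_pages = 100
pages_needing_scrape = 3  # new + changed pages

old_cost = scrape_cost(total_pages)           # scrape every page
new_cost = scrape_cost(pages_needing_scrape)  # scrape only what changed
savings = 1 - new_cost / old_cost

print(f"old: {old_cost:.2f} USD, new: {new_cost:.2f} USD, saved: {savings:.0%}")
```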

## Response Format

```json theme={null}
{
  "status": "success",
  "new_pages": [...],        // Have markdown (batch scraped)
  "changed_pages": [...],    // Have markdown (batch scraped)
  "removed_urls": [...],
  "unchanged_pages": [...],  // NO markdown (not scraped)
  "updated_site_map": [...],
  "updated_hashes": {...},
  "business_info": {...}
}
```

## Change Categories

| Category          | Has Markdown? | Description                       |
| ----------------- | ------------- | --------------------------------- |
| `new_pages`       | Yes           | URLs that didn't exist before     |
| `changed_pages`   | Yes           | URLs with different content hash  |
| `unchanged_pages` | No            | URLs with same hash (not scraped) |
| `removed_urls`    | N/A           | URLs that no longer exist         |
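
To show how the categories map onto the response, here is a minimal sketch of assembling the payload. The per-page shape (`url` and `markdown` keys), the `build_response` name, and the way `updated_site_map` and `updated_hashes` are derived are all assumptions for illustration; the source does not specify them.

```python
def build_response(new_urls, changed_urls, unchanged_urls, removed_urls,
                   scraped_markdown, current_hashes, business_info=None):
    """Assemble a response dict; only scraped pages carry markdown."""

    def with_markdown(url):
        # Hypothetical page shape: new and changed pages were batch scraped.
        return {"url": url, "markdown": scraped_markdown[url]}

    # Assumption: the updated site map is every URL that still exists.
    live_urls = sorted(set(new_urls) | set(changed_urls) | set(unchanged_urls))

    return {
        "status": "success",
        "new_pages": [with_markdown(u) for u in new_urls],
        "changed_pages": [with_markdown(u) for u in changed_urls],
        "removed_urls": list(removed_urls),
        # Unchanged pages were never scraped, so they have no markdown.
        "unchanged_pages": [{"url": u} for u in unchanged_urls],
        "updated_site_map": live_urls,
        # Assumption: hashes are kept only for live URLs that have one.
        "updated_hashes": {u: current_hashes[u] for u in live_urls if u in current_hashes},
        "business_info": business_info or {},
    }
```

A consumer can then rely on `markdown` being present for entries in `new_pages` and `changed_pages` and absent everywhere else.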

## Code Location

```
src/app/apis/cron/detect_changes/
├── routes.py           # HTTP endpoint
└── mini_orchestrator.py # Detection logic

src/app/shared/hashing/
├── __init__.py         # Exports
└── service.py          # fetch_and_hash, fetch_and_hash_batch

src/app/shared/mapping/
├── __init__.py         # Exports
└── mapper.py           # map_website
```

## Used By

* Daily cron job for site freshness
* Triggers `update-ai-site` when changes are found
* Triggers `discover-products-from-changes` for new pages
