Detect content changes on a website using an efficient 2-stage approach:
- Hashing Service (free) - Fetch raw HTML and hash to find changes
- Firecrawl Batch Scrape (paid) - Only scrape pages that actually changed
This is the first step in the 3-API update flow:
1. detect-changes (this endpoint) - Find what changed
2. update-ai-site - Update the AI website
3. discover-products-from-changes - Find new products
## How It Works
1. Custom Mapper → Get current URL list (sitemap + robots.txt + HTML links) [FREE]
2. Compare URLs vs stored site_map
   - New URLs = new pages
   - Missing URLs = removed pages
3. Hashing Service → Fetch raw HTML + hash for existing pages [FREE]
4. Compare hashes vs stored hashes
   - Hash mismatch = content changed
5. Batch Scrape ONLY new + changed URLs → Get markdown [PAID - only what's needed]
6. Return all data for downstream APIs
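The free diff stages (steps 2–4) can be sketched as follows. This is a minimal illustration, not the actual service code: `fetch_html` is a hypothetical stand-in for the raw HTTP GET, and `stored_hashes` is assumed to map URL → SHA-256 hex digest.

```python
import hashlib

def sha256_of(html: str) -> str:
    # Stage-1 change check: SHA-256 over the raw HTML (free).
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def diff_site(current_urls, stored_site_map, stored_hashes, fetch_html):
    """Classify URLs as new / removed / changed / unchanged.

    fetch_html(url) -> str stands in for a raw HTTP GET; only the
    new + changed URLs would then go to the paid batch scrape.
    """
    current, stored = set(current_urls), set(stored_site_map)
    new_urls = sorted(current - stored)       # step 2: new pages
    removed_urls = sorted(stored - current)   # step 2: removed pages
    changed, unchanged = [], []
    for url in sorted(current & stored):      # steps 3-4: hash existing pages
        if sha256_of(fetch_html(url)) != stored_hashes.get(url):
            changed.append(url)
        else:
            unchanged.append(url)
    return new_urls, removed_urls, changed, unchanged
```

Only `new_urls` and `changed` are forwarded to the batch scrape; `unchanged` pages never incur a scrape cost.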
## Cost Efficiency
| Step | Cost | Description |
|---|---|---|
| Custom Mapper | Free | Our internal service |
| Hashing Service | Free | Raw HTTP GET + SHA-256 |
| Batch Scrape | ~$0.01/page | Only for new + changed pages |
Before optimization: Batch scrape ALL pages every time
After optimization: Batch scrape only pages that actually changed
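To make the savings concrete, a back-of-the-envelope sketch. The ~$0.01/page figure comes from the table above; the site size and change counts are made-up example numbers.

```python
PAGE_COST_CENTS = 1  # ~$0.01/page for Batch Scrape; mapping + hashing are free

def scrape_cost_cents(total_pages: int, new_pages: int, changed_pages: int):
    before = total_pages * PAGE_COST_CENTS                  # old: scrape every page, every run
    after = (new_pages + changed_pages) * PAGE_COST_CENTS   # new: scrape only the deltas
    return before, after

# Hypothetical 200-page site with 2 new and 3 changed pages:
before, after = scrape_cost_cents(200, 2, 3)  # 200 cents vs 5 cents per run
```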
## Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| business_id | string | Yes | Clerk org ID |
## Response
```jsonc
{
  "status": "success",  // or "unchanged" or "error"
  "new_pages": [
    {"url": "https://example.com/new-page", "markdown": "...", "hash": "abc123"}
  ],
  "changed_pages": [
    {"url": "https://example.com/about", "markdown": "...", "old_hash": "def456", "new_hash": "ghi789"}
  ],
  "removed_urls": ["https://example.com/old-page"],
  "unchanged_pages": [
    {"url": "https://example.com/", "hash": "..."}
  ],
  "updated_site_map": ["https://example.com/", "https://example.com/about", ...],
  "updated_hashes": {"https://example.com/": "abc123", ...},
  "business_info": {
    "entity_id": "uuid",
    "url": "https://example.com",
    "name": "Example Company",
    "clerk_org_id": "org_xxx",
    "ai_site_id": "uuid",
    "deployment_url": "https://example.searchcompany.dev",
    "project_name": "example-searchcompany-dev"
  }
}
```
Key difference: unchanged_pages does NOT have markdown - those pages weren't
batch scraped. Only new_pages and changed_pages carry markdown content.
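A small helper illustrating how a caller might collect only the pages that were actually scraped (field names are taken from the example response above; the helper itself is an illustration, not part of the API):

```python
def scraped_pages(response: dict) -> dict:
    """Map URL -> markdown for pages that were batch scraped.

    unchanged_pages carry only hashes, so they are deliberately skipped.
    """
    pages = response.get("new_pages", []) + response.get("changed_pages", [])
    return {p["url"]: p["markdown"] for p in pages}
```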
## Status Values
| Status | Meaning |
|---|---|
| success | Changes detected, proceed with update |
| unchanged | No changes detected, skip update |
| error | Something went wrong |
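A caller might branch on the status like this (a sketch only; the `message` field on errors is an assumption, not part of the documented response):

```python
def next_action(changes: dict) -> str:
    status = changes["status"]
    if status == "unchanged":
        return "skip"    # no changes, nothing to update
    if status == "error":
        # "message" is a hypothetical error-detail field
        raise RuntimeError(changes.get("message", "detect-changes failed"))
    return "update"      # "success": run the two downstream APIs
```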
## Usage in Cron
```python
# Step 1: Detect changes
changes = await detect_changes(business_id)
if changes["status"] == "unchanged":
    return  # Nothing to do

# Step 2 & 3: Run in parallel
await asyncio.gather(
    update_ai_site(business_id, changes),
    discover_products_from_changes(business_id, changes)
)
```
## Database Reads
- entities table - Get business entity info
- ai_sites table - Get site_map and page_hashes
## External API Calls
- Custom Website Mapper (src/app/shared/mapping) - Get URL list [FREE]
- Hashing Service (src/app/shared/hashing) - Fetch raw HTML + hash [FREE]
- Firecrawl Batch Scrape API (/v2/batch/scrape) - Get markdown [PAID, only for changed pages]