## Purpose
Detects content changes on a website using an efficient 2-stage approach that minimizes Firecrawl costs:

- Hashing Service - fetch raw HTML directly (free) and compare hashes
- Batch Scrape - only scrape pages that actually changed (paid)
## Architecture

### Detection Logic
- Get stored data - Retrieve `site_map` (URL list) and `page_hashes` from `ai_sites`
- Map current URLs - Use custom mapper to get current URL list [FREE]
- Compare URL lists - Find new URLs, removed URLs, existing URLs
- Fetch + Hash existing pages - Use hashing service to get raw HTML and hash [FREE]
- Compare hashes - Find which existing pages changed
- Batch scrape - Only scrape NEW + CHANGED pages with Firecrawl [PAID]
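The steps above might look like this in outline. This is a sketch: `detect_changes`, its input shapes, and the `to_scrape` field are assumptions, not the real implementation.

```python
import hashlib


def detect_changes(stored: dict[str, str], current: dict[str, bytes]) -> dict:
    """Compare the stored url->hash map against freshly fetched raw HTML.

    `stored` stands in for the page_hashes map from ai_sites; `current` maps
    each URL found by the mapper to its raw HTML. Returns the four change
    categories; only new + changed pages go on to the paid Firecrawl batch.
    """
    new_pages = [u for u in current if u not in stored]
    removed_urls = [u for u in stored if u not in current]
    changed_pages, unchanged_pages = [], []
    for url in current:
        if url in stored:
            h = hashlib.sha256(current[url]).hexdigest()
            (changed_pages if h != stored[url] else unchanged_pages).append(url)
    return {
        "new_pages": new_pages,
        "changed_pages": changed_pages,
        "unchanged_pages": unchanged_pages,
        "removed_urls": removed_urls,
        "to_scrape": new_pages + changed_pages,  # the only paid step
    }
```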
## Cost Optimization
| Stage | Service | Cost | What it does |
|---|---|---|---|
| 1 | Custom Mapper | Free | Sitemap + robots.txt + HTML links |
| 2 | Hashing Service | Free | Raw HTTP GET + SHA-256 hash |
| 3 | Firecrawl Batch | ~$0.01/page | Only for new + changed pages |
- Old approach: 100 pages × $0.01 = $1.00 per run
- New approach: 3 changed pages × $0.01 = $0.03 per run
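The savings arithmetic is simple enough to sketch, using the ~$0.01/page figure from the table above (`run_cost` is illustrative, not part of the service):

```python
PRICE_PER_PAGE = 0.01  # ~$0.01/page, per the cost table above


def run_cost(pages_scraped: int) -> float:
    """Cost of one detection run: only pages sent to Firecrawl are billed."""
    return round(pages_scraped * PRICE_PER_PAGE, 2)


old = run_cost(100)  # old approach: scrape every page, every run
new = run_cost(3)    # new approach: scrape only new + changed pages
```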
## Response Format

### Change Categories
| Category | Has Markdown? | Description |
|---|---|---|
| `new_pages` | Yes | URLs that didn't exist before |
| `changed_pages` | Yes | URLs with different content hash |
| `unchanged_pages` | No | URLs with same hash (not scraped) |
| `removed_urls` | N/A | URLs that no longer exist |
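A hypothetical response illustrating the four categories above (the exact field names inside each entry are assumptions):

```json
{
  "new_pages": [
    { "url": "https://example.com/pricing", "markdown": "# Pricing ..." }
  ],
  "changed_pages": [
    { "url": "https://example.com/blog", "markdown": "# Blog ..." }
  ],
  "unchanged_pages": ["https://example.com/about"],
  "removed_urls": ["https://example.com/old-page"]
}
```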
## Code Location

## Used By
- Daily cron job for site freshness
- Triggers `update-ai-site` when changes found
- Triggers `discover-products-from-changes` for new pages