Purpose

Extracts products from page content using Gemini and saves them to the database. This is the first step in the product pipeline.

Architecture

Why Chunking?

Large websites can yield 50K+ characters of content, and asking Gemini to process all of it in one call causes timeouts. Splitting the content into 15K-character chunks brings three benefits (see the sketch after this list):
  1. Faster responses - each chunk completes in ~10-30s
  2. Parallel processing - all chunks run simultaneously
  3. Better reliability - if one chunk fails, the others still succeed
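
A minimal sketch of this chunk-and-gather pattern (split_into_chunks and extract_all are illustrative names, not the real API; extract_chunk is an async callable like the one sketched in the next section):

import asyncio

CHUNK_SIZE = 15_000  # characters per Gemini call

def split_into_chunks(content: str, size: int = CHUNK_SIZE) -> list[str]:
    # Naive fixed-width split; the real splitter may respect page boundaries.
    return [content[i:i + size] for i in range(0, len(content), size)]

async def extract_all(content: str, extract_chunk) -> list[dict]:
    chunks = split_into_chunks(content)
    # return_exceptions=True lets the surviving chunks succeed if one fails.
    results = await asyncio.gather(
        *(extract_chunk(chunk) for chunk in chunks), return_exceptions=True
    )
    # Flatten successful results; failed chunks are simply dropped.
    return [p for r in results if not isinstance(r, Exception) for p in r]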

Simplified Extraction

Each Gemini call extracts only the product name and description, using a prompt of this shape:
BUSINESS: {business_name}
EXISTING PRODUCTS (DO NOT INCLUDE): [list]
CONTENT: [chunk content]

Return JSON:
{"products": [{"name": "...", "description": "..."}]}
Source URLs are mapped after extraction using simple Python string matching - much faster than asking Gemini to track URLs.

Source URL Mapping

After extraction, we find which pages mention each product:
for product in products:
    product["source_urls"] = []  # start fresh for each product
    for page in pages:
        # Case-insensitive substring match of the product name in the page body
        if product["name"].lower() in page["markdown"].lower():
            product["source_urls"].append(page["url"])
This in-process scan is effectively instant compared to routing the same question through Gemini.

Pipeline Flow

1. discover-products (this endpoint)
   ├── Split content into chunks
   ├── Extract names + descriptions in parallel
   ├── Deduplicate similar products (sketched below)
   ├── Map source URLs
   └── Save to entities table

2. generate-product-prompts
   ├── Takes entity_ids from step 1
   └── Generates 10 prompts per product

3. generate-product-llms-txt
   ├── Takes entity_ids from step 1
   └── Generates /llms/{slug}.txt files
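
The deduplication step in stage 1 is not spelled out above; a minimal sketch, assuming products count as duplicates when their normalized names match (the real logic may use fuzzier similarity):

def dedupe_products(products: list[dict]) -> list[dict]:
    # Keep the first occurrence of each normalized name.
    seen: set[str] = set()
    unique = []
    for product in products:
        key = product["name"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(product)
    return unique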

Code Location

src/app/apis/cron/discover_products/routes.py  # Endpoint route handler
src/app/shared/products/discover.py            # Core extraction logic