# Overview

> Architecture for the Discover Products endpoint

## Purpose

Discovers NEW products from a Shopify store using hash-based change detection. This endpoint handles the **complete product discovery pipeline** internally: fetching, saving, prompt generation, and LLMs file generation.

<Note>
  **Batch 1b**: This endpoint runs in parallel with Batch 1a (update-ai-site).
  It is fully decoupled from that batch and needs no scraped content.
</Note>

## Architecture

```mermaid
flowchart TD
    Request["POST /api/cron/discover-products"]
    
    subgraph fetch [Shopify API]
        ProductsJSON["Fetch /products.json"]
        ComputeHash["Compute MD5 hash of handles"]
    end
    
    subgraph detect [Change Detection]
        CompareHash{"Hash changed?"}
        CompareSnapshot["Compare snapshots"]
        FindNew["Identify NEW products only"]
    end
    
    subgraph save [Database]
        SaveProducts["Save new products to entities"]
    end
    
    subgraph generate [Content Generation]
        GenPrompts["Generate 10 prompts per product"]
        GenLLMs["Generate /llms/slug.txt files"]
    end
    
    subgraph deploy [Deployment]
        SkipDeploy{"skip_deploy?"}
        ReturnFiles["Return files for combined deploy"]
        DeployVercel["Deploy to Vercel"]
    end
    
    Request --> ProductsJSON
    ProductsJSON --> ComputeHash
    ComputeHash --> CompareHash
    CompareHash -->|No change| Skip["Return unchanged"]
    CompareHash -->|Changed| CompareSnapshot
    CompareSnapshot --> FindNew
    FindNew --> SaveProducts
    SaveProducts --> GenPrompts
    GenPrompts --> GenLLMs
    GenLLMs --> SkipDeploy
    SkipDeploy -->|Yes| ReturnFiles
    SkipDeploy -->|No| DeployVercel
```

## Why Shopify products.json?

Shopify stores expose a public `/products.json` endpoint that returns structured product data. Compared with scraping product pages, reading it is:

1. **Faster** - Direct API call vs scraping/AI extraction
2. **More reliable** - Structured JSON data, no HTML parsing to go wrong
3. **Complete data** - Title, description, variants, images, pricing
4. **No AI costs** - No Gemini/OpenAI API calls needed for product discovery
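
As a rough illustration of the fetch step, here is a minimal sketch in Python. It is not the code in `discover.py`. The function names, the `get_json` parameter, and the page cap are hypothetical, and it assumes the store serves `/products.json` with the `limit` and `page` query parameters.

```python
import json
from urllib.request import Request, urlopen


def http_get_json(url):
    """Fetch a URL and decode the JSON body (hypothetical helper)."""
    req = Request(url, headers={"User-Agent": "product-discovery-sketch"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_all_products(store_url, get_json=http_get_json, page_size=250, max_pages=100):
    """Collect products page by page until a short page signals the end.

    Assumption: the store accepts ?limit=N&page=M on /products.json.
    max_pages is an arbitrary safety cap, not a Shopify limit.
    """
    base = store_url.rstrip("/")
    products = []
    for page in range(1, max_pages + 1):
        data = get_json(f"{base}/products.json?limit={page_size}&page={page}")
        batch = data.get("products", [])
        products.extend(batch)
        if len(batch) < page_size:
            break  # last page reached
    return products
```

Passing the HTTP function in as `get_json` keeps the paging logic testable without network access.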

## Hash-Based Change Detection

Instead of re-processing every product on each run, the endpoint stores two values from the last sync and compares the current catalog against them:

| Column              | Type  | Purpose                                                 |
| ------------------- | ----- | ------------------------------------------------------- |
| `products_hash`     | TEXT  | MD5 hash of sorted product handles for quick comparison |
| `products_snapshot` | JSONB | Full product list from last sync                        |

```
Hash comparison:
- If hash unchanged → skip entirely (no work needed)
- If hash changed → compare snapshots to find NEW products only
```
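
The comparison above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the function names are hypothetical, and joining handles with a newline before hashing is a guess at the serialization.

```python
import hashlib


def compute_products_hash(handles):
    """MD5 over the sorted handles, so product order never affects the hash.

    Assumption: handles are joined with newlines before hashing.
    """
    joined = "\n".join(sorted(handles))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def find_new_products(current, snapshot):
    """Return products whose handle is absent from the previous snapshot."""
    known = {p["handle"] for p in snapshot}
    return [p for p in current if p["handle"] not in known]


def detect_changes(current, stored_hash, snapshot):
    """Cheap hash check first; diff the snapshots only when the hash moved."""
    new_hash = compute_products_hash(p["handle"] for p in current)
    if new_hash == stored_hash:
        return {"status": "unchanged", "hash": new_hash, "new_products": []}
    return {
        "status": "changed",
        "hash": new_hash,
        "new_products": find_new_products(current, snapshot),
    }
```

Because the hash covers handles only, a price or description edit leaves it unchanged. A removed product changes the hash but yields an empty `new_products` list.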

## Product Data Extracted

Each product from Shopify includes:

```json
{
  "name": "Product Title",
  "description": "Product description (cleaned from HTML)",
  "url": "https://store.com/products/handle",
  "handle": "product-handle",
  "product_type": "Category",
  "vendor": "Brand Name",
  "price": "29.99",
  "image_url": "https://cdn.shopify.com/...",
  "meta_title": "Product Title | Store Name",
  "meta_description": "SEO description for this product",
  "variants": [
    {"id": 123, "title": "Default", "price": "29.99", "sku": "ABC123"}
  ],
  "images": ["https://cdn.shopify.com/..."],
  "tags": "tag1, tag2",
  "created_at": "2024-01-01T00:00:00Z",
  "updated_at": "2024-01-15T00:00:00Z"
}
```

`meta_title` and `meta_description` are always fetched from each product's own page, with rate limiting so the store is not flooded with requests.
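
To show how a raw `products.json` entry might map onto the shape above, here is a hedged Python sketch. The function names are hypothetical, and the raw field names (`title`, `body_html`, `images[].src`) are assumptions about the Shopify payload. Taking the price from the first variant is also an assumption. The sketch leaves out `meta_title` and `meta_description` because they come from a separate page fetch.

```python
import html
import re


def clean_html(raw):
    """Strip tags, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw or "")
    return re.sub(r"\s+", " ", html.unescape(text)).strip()


def normalize_product(raw, store_url):
    """Map one raw product dict to the extracted shape (sketch only)."""
    variants = [
        {"id": v.get("id"), "title": v.get("title"),
         "price": v.get("price"), "sku": v.get("sku")}
        for v in raw.get("variants", [])
    ]
    images = [img["src"] for img in raw.get("images", []) if img.get("src")]
    tags = raw.get("tags", "")
    if isinstance(tags, list):  # tags may arrive as a list or a string
        tags = ", ".join(tags)
    return {
        "name": raw.get("title", ""),
        "description": clean_html(raw.get("body_html")),
        "url": f"{store_url.rstrip('/')}/products/{raw['handle']}",
        "handle": raw["handle"],
        "product_type": raw.get("product_type", ""),
        "vendor": raw.get("vendor", ""),
        "price": variants[0]["price"] if variants else None,
        "image_url": images[0] if images else None,
        "variants": variants,
        "images": images,
        "tags": tags,
        "created_at": raw.get("created_at"),
        "updated_at": raw.get("updated_at"),
    }
```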

## What This Endpoint Does (All-in-One)

This endpoint handles the **complete product pipeline** internally:

```
discover-products (this endpoint)
├── Fetch /products.json from Shopify
├── Compute hash & compare with stored products_hash
├── If changed:
│   ├── Compare snapshots to find NEW products
│   ├── Save new products to entities table
│   ├── Generate 10 prompts per new product (internal call)
│   ├── Generate /llms/{slug}.txt for new products (internal call)
│   ├── Return files for combined deploy (if skip_deploy=True), else deploy to Vercel
│   └── Update products_hash & products_snapshot
└── If unchanged:
    └── Return "unchanged" status (skip)
```
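
The ordering in the tree above can be sketched as a single Python function. Everything here is hypothetical scaffolding: the `steps` callables stand in for the real save, prompt, LLMs, and deploy logic, `state` stands in for the stored `products_hash` and `products_snapshot`, and the newline-joined hash input is an assumption.

```python
import hashlib


def handles_hash(products):
    """MD5 of sorted handles (newline separator is an assumption)."""
    handles = sorted(p["handle"] for p in products)
    return hashlib.md5("\n".join(handles).encode("utf-8")).hexdigest()


def run_discovery(products, state, steps, skip_deploy=True):
    """Run the pipeline in the order shown in the tree (sketch only)."""
    new_hash = handles_hash(products)
    if new_hash == state.get("products_hash"):
        return {"status": "unchanged"}  # nothing else runs

    known = {p["handle"] for p in state.get("products_snapshot", [])}
    new_products = [p for p in products if p["handle"] not in known]

    steps["save"](new_products)           # save to entities
    steps["prompts"](new_products)        # 10 prompts per new product
    files = steps["llms"](new_products)   # /llms/{slug}.txt contents

    result = {"status": "changed", "new_products": len(new_products)}
    if skip_deploy:
        result["files"] = files           # caller handles the combined deploy
    else:
        steps["deploy"](files)

    # Persist the new state last, so a failed run is retried next time.
    state["products_hash"] = new_hash
    state["products_snapshot"] = products
    return result
```

Updating the hash and snapshot only after the other steps succeed means a run that fails midway is detected as changed again on the next invocation. That ordering follows the tree, though the real error handling may differ.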

<Note>
  **Standalone endpoints exist for manual use**: The separate endpoints
  (`generate-product-prompts`, `generate-product-llms-txt`) exist for
  manual triggering and debugging, but are NOT called by the cron job.
</Note>

## Code Location

```
src/app/apis/cron/discover_products/routes.py
src/app/shared/products/discover.py  # Core Shopify product fetching logic
src/app/shared/products/generate_llms_txt.py  # LLMs generation
```
