Discover Products

Internal Service: This is not an HTTP endpoint. It is called directly by the generate-all orchestrator.

Purpose

Fetches products from a Shopify store through the public products.json API and creates an entity record for each discovered product. Runs in GROUP 1d, in parallel with the other GROUP 1 tasks; its output feeds GROUP 2b and 2c.

Shared Service

This task uses the shared discover_products service:

src/app/shared/products/discover.py

The same service is used by:

Onboarding (this task - GROUP 1d)
Cron (discover-products)

Function Signature

async def run_discover_products(
    url: str,
    org_slug: str,
    business_name: str,
    pages: list | None = None,  # Deprecated, kept for backwards compatibility
) -> tuple[StepResult, list[dict]]: ...

Parameters

Parameter	Type	Description
`url`	`str`	The Shopify store URL
`org_slug`	`str`	The Clerk organization slug
`business_name`	`str`	The business name
`pages`	`list`	(Deprecated) Ignored - kept for backwards compatibility

Returns

{
  "name": "discover_products",
  "status": "success",
  "data": {
    "product_count": 3,
    "existing_count": 0,
    "products": ["Product A", "Product B", "Product C"]
  }
}

The second element of the returned tuple is the list of discovered products, which feeds GROUP 2b and 2c:

[
  {"name": "Product A", "entity_id": "uuid-1", "url": "https://store.com/products/a", "handle": "a"},
  {"name": "Product B", "entity_id": "uuid-2", "url": "https://store.com/products/b", "handle": "b"},
  {"name": "Product C", "entity_id": "uuid-3", "url": "https://store.com/products/c", "handle": "c"}
]

Execution Flow

GROUP 2 starts after GROUP 1a, 1b, and 1d complete. If products were discovered, GROUP 2b (Product Prompts) and GROUP 2c (Generate Product LLMs) then run in parallel; if none were, both are skipped.
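A minimal asyncio sketch of that ordering, assuming the orchestrator uses `asyncio.gather`. The task functions are hypothetical placeholders that only record when they ran, and the other GROUP 1 and GROUP 2 tasks are left out.

```python
import asyncio


async def group_task(label: str, log: list[str]) -> None:
    """Hypothetical placeholder for a GROUP task; records that it ran."""
    await asyncio.sleep(0)
    log.append(label)


async def discover(products: list[dict], log: list[str]) -> list[dict]:
    """Hypothetical placeholder for GROUP 1d; returns the discovered products."""
    await asyncio.sleep(0)
    log.append("1d")
    return products


async def orchestrate(products_from_store: list[dict]) -> list[str]:
    log: list[str] = []
    # GROUP 1: 1a, 1b and 1d run concurrently; gather waits for all three.
    _, _, products = await asyncio.gather(
        group_task("1a", log),
        group_task("1b", log),
        discover(products_from_store, log),
    )
    # GROUP 2b and 2c run in parallel, but only when products were discovered.
    if products:
        await asyncio.gather(group_task("2b", log), group_task("2c", log))
    return log
```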

How It Works

Products are fetched directly from Shopify’s public products.json API:

Direct API access - Fetch from {store_url}/products.json
Pagination - Automatically fetches all pages of products
No AI required - Product data comes directly from Shopify
Instant extraction - No scraping or content parsing needed

This is faster and more reliable than using AI to extract products from scraped content.
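The pagination loop might look like the following sketch. It assumes the storefront endpoint accepts `limit` (up to 250) and a 1-based `page` query parameter and returns `{"products": [...]}`; `fetch_all_products`, `fetch_page_json`, and the `max_pages` cap are hypothetical names, not the actual contents of `discover.py`. The transport is injectable so the loop can be exercised without network access.

```python
import json
import urllib.request
from collections.abc import Callable

# Assumed page size: the maximum commonly accepted by products.json.
PAGE_LIMIT = 250


def fetch_page_json(url: str) -> dict:
    """Default transport: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all_products(
    store_url: str,
    fetch: Callable[[str], dict] = fetch_page_json,
    max_pages: int = 100,  # hypothetical safety cap against runaway loops
) -> list[dict]:
    """Walk {store_url}/products.json page by page and collect every product."""
    base = store_url.rstrip("/")
    products: list[dict] = []
    for page in range(1, max_pages + 1):
        data = fetch(f"{base}/products.json?limit={PAGE_LIMIT}&page={page}")
        batch = data.get("products", [])
        if not batch:
            break  # an empty page means every product has been seen
        products.extend(batch)
        if len(batch) < PAGE_LIMIT:
            break  # a short page is the last page
    return products
```

Stopping on a short page saves one request; the empty-page check still covers a store whose last page holds exactly 250 products.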

Database Schema

Products are stored as entities with type = 'product':

INSERT INTO entities (
  clerk_org_id,
  type,
  parent_id,
  name,
  url,
  product_source_urls
) VALUES (
  'my-business-abc123',
  'product',
  'business-entity-uuid',
  'Product A',
  'https://store.com/products/product-a',
  '["https://store.com/products/product-a"]'
);
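The mapping from a products.json item to that row could be as simple as the sketch below. It assumes the entity name comes from the Shopify `title` field and the product URL is built from `handle`, consistent with the discovered-product list under Returns; `product_to_entity_row` is a hypothetical helper.

```python
import json


def product_to_entity_row(
    product: dict, store_url: str, org_id: str, business_entity_id: str
) -> dict:
    """Map one products.json item to the entity columns used in the INSERT."""
    url = f"{store_url.rstrip('/')}/products/{product['handle']}"
    return {
        "clerk_org_id": org_id,
        "type": "product",
        "parent_id": business_entity_id,  # products hang off the business entity
        "name": product["title"],
        "url": url,
        # product_source_urls is stored as a JSON array string
        "product_source_urls": json.dumps([url]),
    }
```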

Code Location

src/app/shared/products/
├── __init__.py
├── discover.py           # Shared discover_products service (Shopify products.json)
└── generate_llms_txt.py  # Shared generate_product_llms_txt service

src/app/apis/onboarding/generate_all/tasks/
├── discover_products.py  # run_discover_products task wrapper
└── product_llms.py       # run_generate_product_llms_txt task wrapper

Error Handling

{
  "name": "discover_products",
  "status": "error",
  "error": "Failed to fetch products from Shopify"
}

If product discovery fails, GROUP 2b and 2c are skipped and onboarding continues without product prompts or product LLMs.
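One way to produce that payload is to catch failures in the task wrapper and return an empty product list alongside the error, so the orchestrator has nothing to feed to GROUP 2b and 2c. The sketch assumes broad exception handling and a hypothetical `timeout_s` limit; `run_with_error_result` is a hypothetical name, and the real wrapper may handle errors differently.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_with_error_result(
    discover: Callable[[], Awaitable[list[dict]]],
    timeout_s: float = 60.0,  # hypothetical limit
) -> tuple[dict, list[dict]]:
    """Run discovery; on any failure return the error payload and no products."""
    try:
        products = await asyncio.wait_for(discover(), timeout=timeout_s)
    except Exception:
        # An empty list gives GROUP 2b and 2c nothing to process, so they are skipped.
        error = {
            "name": "discover_products",
            "status": "error",
            "error": "Failed to fetch products from Shopify",
        }
        return error, []
    success = {
        "name": "discover_products",
        "status": "success",
        "data": {"product_count": len(products)},
    }
    return success, products
```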
