Internal Service: This is not an HTTP endpoint. It's called directly by the generate-all orchestrator.
Purpose
Fetches products from a Shopify store using the public products.json API and creates entity records for each discovered product.
Runs in GROUP 1d, in parallel with the other GROUP 1 tasks. Its output feeds GROUP 2b and 2c.
Shared Service
This task uses the shared discover_products service:
src/app/shared/products/discover.py
The same service is used by:
- Onboarding (this task - GROUP 1d)
- Cron (discover-products)
Function Signature
async def run_discover_products(
    url: str,
    org_slug: str,
    business_name: str,
    pages: list = None  # Deprecated, kept for backwards compatibility
) -> tuple[StepResult, list]
Parameters
| Parameter | Type | Description |
|---|---|---|
| url | str | The Shopify store URL |
| org_slug | str | The Clerk organization slug |
| business_name | str | The business name |
| pages | list | (Deprecated) Ignored - kept for backwards compatibility |
Returns
{
  "name": "discover_products",
  "status": "success",
  "data": {
    "product_count": 3,
    "existing_count": 0,
    "products": ["Product A", "Product B", "Product C"]
  }
}
Also returns a list of discovered products for GROUP 2b and 2c:
[
  {"name": "Product A", "entity_id": "uuid-1", "url": "https://store.com/products/a", "handle": "a"},
  {"name": "Product B", "entity_id": "uuid-2", "url": "https://store.com/products/b", "handle": "b"},
  {"name": "Product C", "entity_id": "uuid-3", "url": "https://store.com/products/c", "handle": "c"}
]
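As a rough illustration of how one raw Shopify product could be shaped into an entry of the list above, here is a minimal sketch. The helper name `to_discovered_product` is hypothetical, and it assumes each raw product carries `title` and `handle` fields and that the product URL is built from the store URL and the handle, as the sample output suggests. The actual mapping in discover.py may differ.

```python
import uuid
from typing import Optional


def to_discovered_product(
    store_url: str, product: dict, entity_id: Optional[str] = None
) -> dict:
    """Shape one raw product into the record passed on to GROUP 2b and 2c.

    Hypothetical helper: the field names "title" and "handle" are assumed
    to be present on the raw product.
    """
    handle = product["handle"]
    return {
        "name": product["title"],
        # In the real service the ID would come from the created entity row;
        # a random UUID stands in for it here.
        "entity_id": entity_id or str(uuid.uuid4()),
        "url": f"{store_url.rstrip('/')}/products/{handle}",
        "handle": handle,
    }
```

Passing `entity_id` explicitly keeps the helper deterministic, which makes it easier to test.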
Execution Flow
GROUP 2b (Product Prompts) and GROUP 2c (Generate Product LLMs) run in parallel when products are discovered. GROUP 2 starts after GROUP 1a, 1b, and 1d complete.
How It Works
Products are fetched directly from Shopify's public products.json API:
- Direct API access - Fetch from {store_url}/products.json
- Pagination - Automatically fetches all pages of products
- No AI required - Product data comes directly from Shopify
- Instant extraction - No scraping or content parsing needed
This is faster and more reliable than using AI to extract products from scraped content.
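The fetch-and-paginate behaviour described above can be sketched roughly as follows. This is not the code in discover.py: the function names, the `limit` and `page` query parameters, the page size of 250, and the stop-on-empty-page rule are all assumptions made for illustration.

```python
import json
from typing import Callable
from urllib.request import Request, urlopen

PAGE_SIZE = 250  # assumed page size for the public endpoint


def fetch_page(store_url: str, page: int) -> list[dict]:
    """Fetch one page of {store_url}/products.json (query params are assumed)."""
    endpoint = f"{store_url.rstrip('/')}/products.json?limit={PAGE_SIZE}&page={page}"
    request = Request(endpoint, headers={"User-Agent": "product-discovery-sketch"})
    with urlopen(request, timeout=10) as response:
        return json.load(response).get("products", [])


def fetch_all_products(
    store_url: str,
    fetch: Callable[[str, int], list[dict]] = fetch_page,
    max_pages: int = 100,
) -> list[dict]:
    """Collect products page by page until an empty page is returned."""
    products: list[dict] = []
    for page in range(1, max_pages + 1):  # hard cap guards against endless loops
        batch = fetch(store_url, page)
        if not batch:
            break
        products.extend(batch)
    return products
```

The `fetch` argument is injectable so the pagination loop can be exercised without network access, for example by passing a function that returns canned pages.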
Database Schema
Products are stored as entities with type = 'product':
INSERT INTO entities (
  clerk_org_id,
  type,
  parent_id,
  name,
  url,
  product_source_urls
) VALUES (
  'my-business-abc123',
  'product',
  'business-entity-uuid',
  'Product A',
  'https://store.com/products/product-a',
  '["https://store.com/products/product-a"]'
);
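For illustration, the same insert can be issued from Python with bound parameters instead of inline literals. This sketch uses an in-memory SQLite table only so that it runs standalone. The production database engine, the `id` column, and the helper name `insert_product_entity` are assumptions and are not taken from this page.

```python
import json
import sqlite3
import uuid

# Stand-in table: column names follow the INSERT above; "id" is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entities (
        id TEXT PRIMARY KEY,
        clerk_org_id TEXT,
        type TEXT,
        parent_id TEXT,
        name TEXT,
        url TEXT,
        product_source_urls TEXT
    )
    """
)


def insert_product_entity(
    conn: sqlite3.Connection, clerk_org_id: str, parent_id: str, name: str, url: str
) -> str:
    """Insert one product entity and return its generated ID (hypothetical helper)."""
    entity_id = str(uuid.uuid4())
    conn.execute(
        """
        INSERT INTO entities
            (id, clerk_org_id, type, parent_id, name, url, product_source_urls)
        VALUES (?, ?, 'product', ?, ?, ?, ?)
        """,
        # product_source_urls is stored as a JSON-encoded list of URLs.
        (entity_id, clerk_org_id, parent_id, name, url, json.dumps([url])),
    )
    return entity_id
```

Binding values as parameters avoids quoting problems with product names that contain apostrophes.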
Code Location
src/app/shared/products/
├── __init__.py
├── discover.py            # Shared discover_products service (Shopify products.json)
└── generate_llms_txt.py   # Shared generate_product_llms_txt service
src/app/apis/onboarding/generate_all/tasks/
├── discover_products.py   # run_discover_products task wrapper
└── product_llms.py        # run_generate_product_llms_txt task wrapper
Error Handling
{
  "name": "discover_products",
  "status": "error",
  "error": "Failed to fetch products from Shopify"
}
If product discovery fails, GROUP 2b and 2c are skipped and onboarding continues without product prompts or product LLMs.
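One way the orchestrator could apply this rule is sketched below. The function `run_product_groups` and its `run_2b` / `run_2c` arguments are hypothetical stand-ins for the real GROUP 2b and 2c tasks, and returning an empty list for the skipped case is an assumption. The generate-all orchestrator may represent skipped steps differently.

```python
import asyncio
from typing import Awaitable, Callable

ProductTask = Callable[[list], Awaitable[dict]]


async def run_product_groups(
    step_result: dict, products: list, run_2b: ProductTask, run_2c: ProductTask
) -> list:
    """Run GROUP 2b and 2c in parallel, or skip both when discovery failed."""
    if step_result.get("status") != "success" or not products:
        # Discovery failed or found nothing: onboarding continues without
        # product prompts or product LLMs.
        return []
    # Both product tasks consume the same discovered-product list concurrently.
    return list(await asyncio.gather(run_2b(products), run_2c(products)))
```

Checking the status once, before scheduling either task, keeps the skip decision in a single place.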