Internal Service: This is not an HTTP endpoint. It's called directly by the generate-all orchestrator.

Purpose

Fetches products from a Shopify store using the public products.json API and creates an entity record for each discovered product. Runs in GROUP 1d, in parallel with the other GROUP 1 tasks; its output feeds GROUP 2b and 2c.

Shared Service

This task uses the shared discover_products service:
src/app/shared/products/discover.py
The same service is used by:
  • Onboarding (this task - GROUP 1d)
  • Cron (discover-products)

Function Signature

async def run_discover_products(
    url: str, 
    org_slug: str,
    business_name: str,
    pages: list = None  # Deprecated, kept for backwards compatibility
) -> tuple[StepResult, list]

Parameters

| Parameter | Type | Description |
| url | str | The Shopify store URL |
| org_slug | str | The Clerk organization slug |
| business_name | str | The business name |
| pages | list | (Deprecated) Ignored - kept for backwards compatibility |

Returns

{
  "name": "discover_products",
  "status": "success",
  "data": {
    "product_count": 3,
    "existing_count": 0,
    "products": ["Product A", "Product B", "Product C"]
  }
}
The second element of the returned tuple is the list of discovered products, which GROUP 2b and 2c consume:
[
  {"name": "Product A", "entity_id": "uuid-1", "url": "https://store.com/products/a", "handle": "a"},
  {"name": "Product B", "entity_id": "uuid-2", "url": "https://store.com/products/b", "handle": "b"},
  {"name": "Product C", "entity_id": "uuid-3", "url": "https://store.com/products/c", "handle": "c"}
]
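The record shape above can be illustrated with a small sketch. It assumes each raw product from products.json carries `title` and `handle` fields and that the product URL is `{store_url}/products/{handle}`, as the examples suggest. The helper name `to_discovered_product` is hypothetical, and `entity_id` is passed in because it comes from the entity record created for the product.

```python
def to_discovered_product(store_url: str, product: dict, entity_id: str) -> dict:
    """Shape one raw Shopify product into the record passed to GROUP 2b and 2c.

    Hypothetical helper: the field names `title` and `handle` are assumed
    from Shopify's product JSON, not taken from the service's source.
    """
    handle = product["handle"]
    return {
        "name": product["title"],
        "entity_id": entity_id,
        # Trailing slash on the store URL is stripped to avoid "//products".
        "url": f"{store_url.rstrip('/')}/products/{handle}",
        "handle": handle,
    }


record = to_discovered_product(
    "https://store.com/",
    {"title": "Product A", "handle": "a"},
    "uuid-1",
)
```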

Execution Flow

GROUP 2b (Product Prompts) and GROUP 2c (Generate Product LLMs) run in parallel when products are discovered. GROUP 2 starts after GROUP 1a, 1b, and 1d complete.
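One way the parallel step could look is sketched below with `asyncio.gather`. The coroutine names `run_product_prompts` and `run_product_llms` are hypothetical stand-ins for the GROUP 2b and 2c tasks, and their return values are invented for illustration; the real orchestrator may be structured differently.

```python
import asyncio


async def run_product_prompts(products: list) -> dict:
    # Hypothetical stand-in for GROUP 2b (Product Prompts).
    return {"name": "product_prompts", "status": "success", "count": len(products)}


async def run_product_llms(products: list) -> dict:
    # Hypothetical stand-in for GROUP 2c (Generate Product LLMs).
    return {"name": "product_llms", "status": "success", "count": len(products)}


async def run_product_groups(products: list) -> list:
    """Run 2b and 2c concurrently, but only when products were discovered."""
    if not products:
        return []
    # gather() returns results in the order the awaitables were passed in.
    return list(
        await asyncio.gather(
            run_product_prompts(products),
            run_product_llms(products),
        )
    )


results = asyncio.run(run_product_groups([{"name": "Product A"}]))
```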

How It Works

Products are fetched directly from Shopify's public products.json API:
  1. Direct API access - Fetch from {store_url}/products.json
  2. Pagination - Automatically fetches all pages of products
  3. No AI required - Product data comes directly from Shopify
  4. Instant extraction - No scraping or content parsing needed
This is faster and more reliable than using AI to extract products from scraped content.
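The fetch-and-paginate steps above might be sketched as follows. This is not the code in discover.py: the `limit` and `page` query parameters, the page size of 250, and the "stop on a short page" rule are assumptions about how the public endpoint is paged, and the function names are hypothetical. The fetcher is injectable so the loop can be exercised without network access.

```python
import json
from typing import Callable
from urllib.request import urlopen

PAGE_SIZE = 250  # assumed page size; not taken from the service's source


def fetch_json(url: str) -> dict:
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_all_products(
    store_url: str, fetch: Callable[[str], dict] = fetch_json
) -> list:
    """Collect products from every page of {store_url}/products.json."""
    base = store_url.rstrip("/")
    products: list = []
    page = 1
    while True:
        payload = fetch(f"{base}/products.json?limit={PAGE_SIZE}&page={page}")
        batch = payload.get("products", [])
        products.extend(batch)
        # A short (or empty) page means there is nothing left to fetch.
        if len(batch) < PAGE_SIZE:
            return products
        page += 1
```

Calling `fetch_all_products("https://store.com")` would hit the live store; passing a fake `fetch` callable keeps the pagination logic testable in isolation.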

Database Schema

Products are stored as entities with type = 'product':
INSERT INTO entities (
  clerk_org_id,
  type,
  parent_id,
  name,
  url,
  product_source_urls
) VALUES (
  'my-business-abc123',
  'product',
  'business-entity-uuid',
  'Product A',
  'https://store.com/products/product-a',
  '["https://store.com/products/product-a"]'
);
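For illustration, the same insert can be issued from Python with bound parameters. The sketch below uses an in-memory `sqlite3` database purely as a stand-in, since the document does not say which database engine is used; the table definition, the column types, and the helper name `insert_product_entity` are assumptions.

```python
import json
import sqlite3


def insert_product_entity(conn, org_id: str, parent_id: str, name: str, url: str) -> None:
    """Insert one product entity; the source URL list is stored as JSON text."""
    conn.execute(
        "INSERT INTO entities "
        "(clerk_org_id, type, parent_id, name, url, product_source_urls) "
        "VALUES (?, 'product', ?, ?, ?, ?)",
        (org_id, parent_id, name, url, json.dumps([url])),
    )


# Stand-in schema with only the columns shown above (types are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (clerk_org_id TEXT, type TEXT, parent_id TEXT, "
    "name TEXT, url TEXT, product_source_urls TEXT)"
)
insert_product_entity(
    conn,
    "my-business-abc123",
    "business-entity-uuid",
    "Product A",
    "https://store.com/products/product-a",
)
row = conn.execute("SELECT type, name, product_source_urls FROM entities").fetchone()
```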

Code Location

src/app/shared/products/
├── __init__.py
├── discover.py           # Shared discover_products service (Shopify products.json)
└── generate_llms_txt.py  # Shared generate_product_llms_txt service

src/app/apis/onboarding/generate_all/tasks/
├── discover_products.py  # run_discover_products task wrapper
└── product_llms.py       # run_generate_product_llms_txt task wrapper

Error Handling

{
  "name": "discover_products",
  "status": "error",
  "error": "Failed to fetch products from Shopify"
}
If product discovery fails, GROUP 2b and 2c are skipped and onboarding continues without product prompts or product LLMs.
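A minimal sketch of this behaviour follows, assuming the step result is a plain dict in the shape shown above. The names `discover_step` and `should_run_product_groups` are hypothetical, and `existing_count` is left out because the document does not describe how it is computed.

```python
import asyncio


async def discover_step(discover) -> tuple:
    """Await a discovery coroutine function and return (step_result, products)."""
    try:
        products = await discover()
    except Exception:
        # Failure is reported in the step result instead of aborting onboarding.
        return (
            {
                "name": "discover_products",
                "status": "error",
                "error": "Failed to fetch products from Shopify",
            },
            [],
        )
    result = {
        "name": "discover_products",
        "status": "success",
        "data": {
            "product_count": len(products),
            "products": [p["name"] for p in products],
        },
    }
    return result, products


def should_run_product_groups(result: dict, products: list) -> bool:
    """GROUP 2b and 2c run only after a successful step that found products."""
    return result["status"] == "success" and bool(products)


async def failing_discover():
    raise ConnectionError("store unreachable")


error_result, no_products = asyncio.run(discover_step(failing_discover))
```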