Internal Service: This is not an HTTP endpoint. It's called directly by the generate-all orchestrator.
Purpose
Uses Gemini to analyze scraped website content and identify products, services, or SaaS offerings. Creates entity records for each discovered product.
Runs in GROUP 2b, in parallel with GROUP 2a. Its output feeds GROUP 3a and 3b.
Shared Service
This task uses the shared discover_products service:
src/app/shared/products/discover.py
The same service is used by:
- Onboarding (this task - GROUP 2b)
- Cron (discover-products-from-changes)
Function Signature
async def run_discover_products(
    url: str,
    org_slug: str,
    business_name: str,
    pages: list
) -> tuple[StepResult, list]
Parameters
| Parameter | Type | Description |
|---|---|---|
| url | str | The business website URL |
| org_slug | str | The Clerk organization slug |
| business_name | str | The business name |
| pages | list | Pre-scraped pages from GROUP 1a |
Returns
{
  "name": "discover_products",
  "status": "success",
  "data": {
    "product_count": 3,
    "existing_count": 0,
    "products": ["Product A", "Product B", "Product C"]
  }
}
The second element of the returned tuple is the list of discovered products, which is passed to GROUP 3a and 3b:
[
  {"name": "Product A", "entity_id": "uuid-1", "source_urls": [...], "url": "..."},
  {"name": "Product B", "entity_id": "uuid-2", "source_urls": [...], "url": "..."},
  {"name": "Product C", "entity_id": "uuid-3", "source_urls": [...], "url": "..."}
]
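As a rough illustration of the two return values, the sketch below uses a hypothetical stand-in, `fake_run_discover_products`, in place of the real task. The `DiscoveredProduct` type, the field types, the sample business name, and the treatment of `StepResult` as a plain dict are all assumptions; only the field names and the overall shape come from the examples above.

```python
import asyncio
from typing import Any, TypedDict


class DiscoveredProduct(TypedDict):
    # Field names come from the example list above; the types are assumed.
    name: str
    entity_id: str
    source_urls: list[str]
    url: str


async def fake_run_discover_products(
    url: str, org_slug: str, business_name: str, pages: list
) -> tuple[dict[str, Any], list[DiscoveredProduct]]:
    """Hypothetical stand-in that returns data in the documented shape."""
    products: list[DiscoveredProduct] = [
        {
            "name": "Product A",
            "entity_id": "uuid-1",
            "source_urls": [f"{url}/product-a"],
            "url": f"{url}/product-a",
        }
    ]
    step_result = {
        "name": "discover_products",
        "status": "success",
        "data": {
            "product_count": len(products),
            "existing_count": 0,
            "products": [p["name"] for p in products],
        },
    }
    return step_result, products


async def main() -> tuple[dict[str, Any], list[DiscoveredProduct]]:
    # Unpack the step result and the product list separately.
    return await fake_run_discover_products(
        url="https://example.com",
        org_slug="my-business-abc123",
        business_name="Example Co",  # illustrative value only
        pages=[],
    )


step_result, products = asyncio.run(main())
print(step_result["status"], [p["entity_id"] for p in products])
```

The step result is the part that gets reported for the step, while the product list is what downstream groups would iterate over.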
Execution Flow
GROUP 3a (Product Prompts) and GROUP 3b (Generate Product LLMs) start once GROUP 2b has discovered products. They run in parallel with each other and do not wait for GROUP 2a to complete.
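The ordering described above can be sketched with asyncio. This is a minimal illustration, not the orchestrator's actual code: the `stage` helper, the delays, and the placeholder product list are all assumptions made up for the example.

```python
import asyncio

events: list[str] = []  # records the order in which stages start and finish


async def stage(name: str, delay: float) -> str:
    """Stand-in for a real task: log start, wait, log finish."""
    events.append(f"{name}:start")
    await asyncio.sleep(delay)
    events.append(f"{name}:done")
    return name


async def products_branch() -> list[str]:
    """GROUP 2b, then GROUP 3a and 3b in parallel if products were found."""
    await stage("2b", 0.01)
    products = ["Product A"]  # placeholder for the discovered products
    if not products:
        return []
    return await asyncio.gather(stage("3a", 0.01), stage("3b", 0.01))


async def main() -> None:
    # GROUP 2a is deliberately slow here so that it outlives the whole
    # 2b -> 3a/3b branch, which shows the branch does not wait for it.
    await asyncio.gather(stage("2a", 0.2), products_branch())


asyncio.run(main())
print(events)
```

Chaining 3a and 3b inside the same coroutine as 2b is one simple way to express the dependency; the slow 2a stage still finishes last.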
Database Schema
Products are stored as entities with type = 'product':
INSERT INTO entities (
    clerk_org_id,
    type,
    parent_id,
    name,
    url,
    product_source_urls
) VALUES (
    'my-business-abc123',
    'product',
    'business-entity-uuid',
    'Product A',
    'https://example.com/product-a',
    '["https://example.com/product-a", "https://example.com/categories/widgets"]'
);
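The same insert can be sketched from Python with bound parameters, serializing the source URLs to a JSON array. The production database and driver are not stated here, so this sketch uses an in-memory `sqlite3` table purely so it runs anywhere. The column types in the `CREATE TABLE` statement are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns named in the INSERT above.
conn.execute(
    """
    CREATE TABLE entities (
        clerk_org_id TEXT, type TEXT, parent_id TEXT,
        name TEXT, url TEXT, product_source_urls TEXT
    )
    """
)

source_urls = [
    "https://example.com/product-a",
    "https://example.com/categories/widgets",
]

# Bound parameters avoid hand-quoting values; the URL list is stored as JSON text.
conn.execute(
    "INSERT INTO entities "
    "(clerk_org_id, type, parent_id, name, url, product_source_urls) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (
        "my-business-abc123",
        "product",
        "business-entity-uuid",
        "Product A",
        "https://example.com/product-a",
        json.dumps(source_urls),
    ),
)

row = conn.execute(
    "SELECT name, product_source_urls FROM entities WHERE type = 'product'"
).fetchone()
stored_urls = json.loads(row[1])
print(row[0], stored_urls)
```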
Code Location
src/app/shared/products/
├── __init__.py
├── discover.py # Shared discover_products service
└── generate_llms_txt.py # Shared generate_product_llms_txt service
src/app/apis/onboarding/generate_all/tasks/
├── discover_products.py # run_discover_products task wrapper
└── product_llms.py # run_generate_product_llms_txt task wrapper
Error Handling
{
  "name": "discover_products",
  "status": "error",
  "error": "Gemini API failed to parse products"
}
If product discovery fails, GROUP 3a and 3b are skipped and onboarding continues with just business prompts.
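The skip rule can be sketched as a small decision function. `follow_up_groups` is a hypothetical helper, not part of the codebase, and it assumes the step result is available as a dict with the `status` field shown above.

```python
def follow_up_groups(step_result: dict, products: list) -> list[str]:
    """Return the groups to run after product discovery (hypothetical helper)."""
    # A failed step or an empty product list means there is nothing
    # for GROUP 3a and 3b to work on, so both are skipped.
    if step_result.get("status") != "success" or not products:
        return []
    return ["3a", "3b"]


failed = {
    "name": "discover_products",
    "status": "error",
    "error": "Gemini API failed to parse products",
}
succeeded = {"name": "discover_products", "status": "success"}

print(follow_up_groups(failed, []))
print(follow_up_groups(succeeded, [{"name": "Product A"}]))
```

Returning an empty list instead of raising keeps the rest of onboarding running, which matches the behavior described above.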