Internal Service: This is not an HTTP endpoint. It's called directly by the generate-all orchestrator.

Purpose

Uses Gemini to analyze scraped website content and identify products, services, or SaaS offerings, then creates an entity record for each discovered product. The task runs in GROUP 2b, in parallel with GROUP 2a, and its output feeds GROUP 3a and 3b.

Shared Service

This task uses the shared discover_products service:
src/app/shared/products/discover.py
The same service is used by:
  • Onboarding (this task - GROUP 2b)
  • Cron (discover-products-from-changes)

Function Signature

async def run_discover_products(
    url: str, 
    org_slug: str,
    business_name: str,
    pages: list
) -> tuple[StepResult, list]

Parameters

Parameter       Type   Description
url             str    The business website URL
org_slug        str    The Clerk organization slug
business_name   str    The business name
pages           list   Pre-scraped pages from GROUP 1a

Returns

{
  "name": "discover_products",
  "status": "success",
  "data": {
    "product_count": 3,
    "existing_count": 0,
    "products": ["Product A", "Product B", "Product C"]
  }
}
Also returns a list of discovered products for GROUP 3a and 3b:
[
  {"name": "Product A", "entity_id": "uuid-1", "source_urls": [...], "url": "..."},
  {"name": "Product B", "entity_id": "uuid-2", "source_urls": [...], "url": "..."},
  {"name": "Product C", "entity_id": "uuid-3", "source_urls": [...], "url": "..."}
]
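As a rough illustration of how a caller might consume the two return values, the sketch below uses a stand-in coroutine that mimics the shapes shown above. The name fake_run_discover_products, its hard-coded product, and the choice to model StepResult as a plain dict are all assumptions made for this example; the real types in the codebase may differ.

```python
import asyncio


# Hypothetical stand-in for run_discover_products. It only mimics the
# documented return shape: a step result plus a list of product dicts.
async def fake_run_discover_products(url, org_slug, business_name, pages):
    products = [
        {
            "name": "Product A",
            "entity_id": "uuid-1",
            "source_urls": [url + "/product-a"],
            "url": url + "/product-a",
        },
    ]
    step_result = {
        "name": "discover_products",
        "status": "success",
        "data": {
            "product_count": len(products),
            "existing_count": 0,
            "products": [p["name"] for p in products],
        },
    }
    return step_result, products


async def main():
    # Unpack the tuple: the first item is for reporting, the second is
    # the product list that GROUP 3a and 3b would receive.
    return await fake_run_discover_products(
        url="https://example.com",
        org_slug="my-business-abc123",
        business_name="My Business",
        pages=[],
    )


step_result, products = asyncio.run(main())
```

The point of the split is that the step result summarizes what happened, while the product list carries the entity IDs and URLs that downstream groups need.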

Execution Flow

GROUP 3a (Product Prompts) and GROUP 3b (Generate Product LLMs) run in parallel when products are discovered, without waiting for GROUP 2a to complete.
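The dependency described above can be sketched with asyncio.gather. This is a minimal illustration, not the orchestrator's actual code: every function name here (group_2a, group_2b, group_3a, group_3b, product_branch, orchestrate) is a hypothetical placeholder, and the sleeps only simulate work.

```python
import asyncio

events = []  # records completion order for illustration


async def group_2a():
    await asyncio.sleep(0.05)  # simulate slower, unrelated work
    events.append("2a")


async def group_2b():
    await asyncio.sleep(0)  # simulate product discovery
    events.append("2b")
    return [{"name": "Product A"}]


async def group_3a(products):
    events.append("3a")
    return f"prompts for {len(products)} product(s)"


async def group_3b(products):
    events.append("3b")
    return f"llms.txt for {len(products)} product(s)"


async def product_branch():
    # GROUP 3a and 3b depend only on GROUP 2b, so they are chained here
    # rather than placed after a barrier that also waits for GROUP 2a.
    products = await group_2b()
    if not products:
        return []
    return await asyncio.gather(group_3a(products), group_3b(products))


async def orchestrate():
    # GROUP 2a runs alongside the whole product branch.
    _, group_3_results = await asyncio.gather(group_2a(), product_branch())
    return group_3_results


group_3_results = asyncio.run(orchestrate())
```

Chaining 3a and 3b behind 2b inside one coroutine lets them start as soon as products are available, while GROUP 2a is still in flight.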

Database Schema

Products are stored as entities with type = 'product':
INSERT INTO entities (
  clerk_org_id,
  type,
  parent_id,
  name,
  url,
  product_source_urls
) VALUES (
  'my-business-abc123',
  'product',
  'business-entity-uuid',
  'Product A',
  'https://example.com/product-a',
  '["https://example.com/product-a", "https://example.com/categories/widgets"]'
);
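For illustration, a discovered product dict could be mapped onto these columns roughly as follows. The helper product_to_entity_row is hypothetical, and the sketch assumes product_source_urls is stored as JSON-encoded text, as the INSERT example suggests; the actual column type and persistence code are not shown in this page.

```python
import json


def product_to_entity_row(product, clerk_org_id, parent_id):
    """Build column values for one product entity (hypothetical helper)."""
    return {
        "clerk_org_id": clerk_org_id,
        "type": "product",  # products are entities with type = 'product'
        "parent_id": parent_id,  # the owning business entity
        "name": product["name"],
        "url": product["url"],
        # Assumption: the list of source URLs is serialized to a JSON array.
        "product_source_urls": json.dumps(product["source_urls"]),
    }


row = product_to_entity_row(
    {
        "name": "Product A",
        "url": "https://example.com/product-a",
        "source_urls": [
            "https://example.com/product-a",
            "https://example.com/categories/widgets",
        ],
    },
    clerk_org_id="my-business-abc123",
    parent_id="business-entity-uuid",
)
```

Passing such values as bound parameters, rather than interpolating them into the SQL string, avoids quoting problems with product names and URLs.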

Code Location

src/app/shared/products/
├── __init__.py
├── discover.py           # Shared discover_products service
└── generate_llms_txt.py  # Shared generate_product_llms_txt service

src/app/apis/onboarding/generate_all/tasks/
├── discover_products.py  # run_discover_products task wrapper
└── product_llms.py       # run_generate_product_llms_txt task wrapper

Error Handling

{
  "name": "discover_products",
  "status": "error",
  "error": "Gemini API failed to parse products"
}
If product discovery fails, GROUP 3a and 3b are skipped and onboarding continues with just business prompts.
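The skip behavior can be sketched as a small decision function. The name groups_to_run is hypothetical, the step result is modeled as a plain dict, and treating an empty product list the same as a failure is an assumption based on the note that GROUP 3a and 3b run when products are discovered.

```python
def groups_to_run(step_result, products):
    """Return the product groups to schedule after discovery (sketch)."""
    # On an error status, or when nothing was discovered, skip the
    # product groups; onboarding continues with business prompts only.
    if step_result.get("status") != "success" or not products:
        return []
    return ["3a", "3b"]


failed = {
    "name": "discover_products",
    "status": "error",
    "error": "Gemini API failed to parse products",
}
succeeded = {"name": "discover_products", "status": "success", "data": {}}

after_failure = groups_to_run(failed, [])
after_success = groups_to_run(succeeded, [{"name": "Product A"}])
```

Returning an empty list instead of raising keeps a discovery failure from aborting the rest of onboarding.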