Architecture
The generate-all endpoint is a thin orchestrator that coordinates internal services. It returns immediately and runs all tasks in the background.
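The "return immediately, work in the background" behaviour can be sketched with plain `asyncio`. This is a minimal illustration, not the code in `routes.py`: the function names `generate_all` and `run_all_groups`, the response shape, and the `log` argument are all hypothetical.

```python
import asyncio

# Hypothetical stand-in for the real orchestration work (all groups).
async def run_all_groups(url: str, log: list[str]) -> None:
    await asyncio.sleep(0.01)  # simulate slow scraping / LLM calls
    log.append(f"finished {url}")

# Keep references so pending background tasks are not garbage-collected.
_background: set[asyncio.Task] = set()

async def generate_all(url: str, log: list[str]) -> dict:
    """Schedule the work and return without awaiting it."""
    task = asyncio.create_task(run_all_groups(url, log))
    _background.add(task)
    task.add_done_callback(_background.discard)
    # Assumed response shape, for illustration only.
    return {"status": "accepted", "url": url}

async def demo() -> tuple[dict, list[str], list[str]]:
    log: list[str] = []
    response = await generate_all("https://example.com", log)
    log_at_return = list(log)           # snapshot taken when the handler returns
    await asyncio.gather(*_background)  # only the demo waits; the handler does not
    return response, log_at_return, log
```

Calling `asyncio.run(demo())` shows that the handler's response is available before the background work has written anything to `log`.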
Execution Groups
| Group | Services | Waits For | Purpose |
|---|---|---|---|
| 1a | Discover Business Info | Nothing | Firecrawl agent for business context |
| 1b | Scrape Website | Nothing | Get pages for markdown replicas |
| 1c | Discover Competitors | Nothing | Fire-and-forget, doesn't block |
| 1d | Discover Products | Nothing | Uses Shopify products.json API |
| 1e | Fetch Favicon | Nothing | Fire-and-forget, doesn't block |
| 1f | Materialize Score | Nothing | Fire-and-forget, doesn't block |
| 1g | Setup CloudFront | Nothing | Fire-and-forget, doesn't block |
| 2a | Create AI Website | 1a, 1b, 1d | Uses business_info for LLM content, pages for replicas |
| 2b | Product Prompts | 1d | Uses discovered products |
| 2c | Generate Product LLMs | 1b, 1d | Uses products and pages |
Key Architecture Change: Business info from the Firecrawl agent (GROUP 1a) is used for LLM content generation (llms.txt, Q&A pages, data.json). Scraped pages (GROUP 1b) are used ONLY for markdown replica generation.
GROUP 2 runs in parallel: all three tasks (2a, 2b, 2c) start simultaneously once their dependencies are ready.
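The dependency gating in the table can be sketched with `asyncio` tasks. This sketch assumes one possible reading of the rule above, namely that each GROUP 2 task waits only for its own "Waits For" entries. `run_service` is a hypothetical placeholder for the real service calls, and the real orchestrator may be structured differently.

```python
import asyncio

# Dependencies copied from the "Waits For" column of the Execution Groups table.
GROUP_2_DEPS = {
    "2a": ("1a", "1b", "1d"),
    "2b": ("1d",),
    "2c": ("1b", "1d"),
}
GROUP_1 = ("1a", "1b", "1c", "1d", "1e", "1f", "1g")

async def run_service(name: str, order: list[str]) -> str:
    """Hypothetical placeholder: records completion order instead of doing work."""
    await asyncio.sleep(0)
    order.append(name)
    return name

async def run_group_2_task(name: str, group_1: dict, order: list[str]) -> str:
    # Wait only for the GROUP 1 tasks this service depends on.
    await asyncio.gather(*(group_1[dep] for dep in GROUP_2_DEPS[name]))
    return await run_service(name, order)

async def orchestrate() -> list[str]:
    order: list[str] = []
    # GROUP 1: everything starts at once, nothing waits.
    group_1 = {n: asyncio.create_task(run_service(n, order)) for n in GROUP_1}
    # GROUP 2: all three tasks are created together and gate on their own deps.
    group_2 = [
        asyncio.create_task(run_group_2_task(n, group_1, order))
        for n in GROUP_2_DEPS
    ]
    await asyncio.gather(*group_2)
    # 1c, 1e, 1f, 1g never block GROUP 2; they are awaited here only so the
    # demo shuts down cleanly.
    await asyncio.gather(*group_1.values())
    return order
```

`asyncio.run(orchestrate())` returns the completion order; every GROUP 2 entry appears after the GROUP 1 entries it depends on, while the fire-and-forget tasks are never awaited by GROUP 2.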
Internal Services
Each service is documented on its own page:
| Service | Group | Input | Output | Page |
|---|---|---|---|---|
| Discover Business Info | 1a | url, business_name | business_info{} | View → |
| Scrape Website | 1b | url | pages[] | View → |
| Discover Competitors | 1c | url, business_name | competitors[] | View → |
| Discover Products | 1d | url, org_slug | products[] | View → |
| Fetch Favicon | 1e | url, org_slug | favicon URL | View → |
| Materialize Score | 1f | url, org_slug | visibility score | View → |
| Setup CloudFront | 1g | url, org_slug | CloudFront distribution | View → |
| Create AI Website | 2a | business_info, pages[], products[] | AI site URL | View → |
| Product Prompts | 2b | products[], business_name | 5+ prompts/product | View → |
| Generate Product LLMs | 2c | products[], pages[], business_name | llms.txt files | View → |
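As a consistency check, the Input and Output columns above can be transcribed into a small mapping and used to derive which groups each service must wait for. The output keys (`favicon_url`, `visibility_score`, and so on) are illustrative labels chosen for this sketch, not identifiers from the codebase.

```python
# group -> (inputs, output), transcribed from the Internal Services table.
# Output names are illustrative labels, not real identifiers.
SERVICES = {
    "1a": ({"url", "business_name"}, "business_info"),
    "1b": ({"url"}, "pages"),
    "1c": ({"url", "business_name"}, "competitors"),
    "1d": ({"url", "org_slug"}, "products"),
    "1e": ({"url", "org_slug"}, "favicon_url"),
    "1f": ({"url", "org_slug"}, "visibility_score"),
    "1g": ({"url", "org_slug"}, "cloudfront_distribution"),
    "2a": ({"business_info", "pages", "products"}, "ai_site_url"),
    "2b": ({"products", "business_name"}, "prompts"),
    "2c": ({"products", "pages", "business_name"}, "llms_txt_files"),
}

def producers_of(group: str) -> set[str]:
    """Return the groups whose output is consumed as an input by `group`."""
    inputs, _ = SERVICES[group]
    return {g for g, (_, output) in SERVICES.items() if output in inputs}
```

The derived sets match the "Waits For" column of the Execution Groups table: `producers_of("2a")` gives 1a, 1b and 1d, `producers_of("2b")` gives 1d, and `producers_of("2c")` gives 1b and 1d. Inputs such as `url`, `business_name` and `org_slug` come from the request rather than from another service, so every GROUP 1 service has no producers.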
Prompt Generation
All prompts are now tied to products:
| Metric | Value |
|---|---|
| Prompts per product | 5 (default) |
| Minimum total prompts | 50 |
| New products (via cron) | 5 prompts each |
| Daily sampling | 10 prompts |
During onboarding, if there are fewer than 10 products, prompts per product is increased to ensure at least 50 total prompts.
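The onboarding rule can be worked through as arithmetic. The sketch below assumes the per-product count is rounded up with a ceiling division so the total never falls below 50; the source does not state the exact rounding rule or what happens with zero products, so both are assumptions, and the function name is hypothetical.

```python
import math

DEFAULT_PROMPTS_PER_PRODUCT = 5  # from the table above
MIN_TOTAL_PROMPTS = 50           # from the table above

def prompts_per_product(product_count: int) -> int:
    """Prompts to generate for each product during onboarding (assumed rule)."""
    if product_count <= 0:
        return 0  # assumption: nothing to generate without products
    # Smallest per-product count that reaches the minimum total.
    needed = math.ceil(MIN_TOTAL_PROMPTS / product_count)
    # Never go below the default of 5 per product.
    return max(DEFAULT_PROMPTS_PER_PRODUCT, needed)
```

Under these assumptions, 10 or more products keep the default of 5 each, while 4 products get 13 prompts each (52 total) and a single product gets all 50.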
Data Flow
GROUP 1 (all parallel):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ business_info│  │    scrape    │  │ competitors  │  │   products   │
│     (1a)     │  │     (1b)     │  │     (1c)     │  │     (1d)     │
└──────┬───────┘  └──────┬───────┘  └──────────────┘  └──────┬───────┘
       │                 │                                   │
       │                 │                                   │
       ▼                 ▼                                   ▼
┌────────────────────────────────────────────────────────────────────┐
│                         GROUP 2 (parallel)                         │
├────────────────────────┬────────────────────┬──────────────────────┤
│ ai_website (2a)        │ prompts (2b)       │ llms_txt (2c)        │
│ uses: business_info    │ uses: products     │ uses: products       │
│       pages (replicas) │                    │       pages          │
│       products         │                    │       business_name  │
└────────────────────────┴────────────────────┴──────────────────────┘
Also running in parallel (GROUP 1e-1g):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   favicon    │  │   scoring    │  │  cloudfront  │
└──────────────┘  └──────────────┘  └──────────────┘
Code Location
src/app/apis/onboarding/
├── generate_all/
│   ├── routes.py                 # Main orchestrator (all groups)
│   ├── models.py                 # Pydantic models
│   ├── scrape_website.py         # GROUP 1b: Scrape service wrapper
│   └── tasks/                    # Individual task wrappers
│       ├── business_info.py      # GROUP 1a: NEW
│       ├── ai_website.py         # GROUP 2a
│       ├── cloudfront.py         # GROUP 1g
│       ├── discover_products.py  # GROUP 1d
│       ├── favicon.py            # GROUP 1e
│       ├── prompts.py            # GROUP 2b
│       ├── product_llms.py       # GROUP 2c
│       └── scoring.py            # GROUP 1f
└── services/
    └── discover_competitors/     # GROUP 1c: Competitor discovery service
src/app/shared/
├── discover_business_info/       # NEW: Firecrawl agent for business info
│   ├── __init__.py
│   └── service.py
├── ai_website/                   # AI Website service
│   ├── service.py                # create_ai_website_from_business_info
│   └── llm_organize.py           # organize_with_llm_from_business_info
└── products/                     # Shared services used by both onboarding and cron
    ├── discover.py               # Product discovery via Shopify products.json
    └── generate_llms_txt.py      # Product llms.txt generation service
Testing Individual Services
Each service can be tested independently via pytest:
# Test the orchestrator
uv run pytest src/pytests/onboarding/test_generate_all.py -v
# Test shared services used by generate-all
uv run pytest src/pytests/cron/test_generate_prompts.py -v