Architecture
The generate-all endpoint is a thin orchestrator that coordinates internal services. It returns immediately and runs all tasks in the background.
Execution Groups
| Group | Services | Waits For | Purpose |
|---|---|---|---|
| 1a | Discover Business Info | Nothing | Firecrawl agent for business context |
| 1b | Scrape Website | Nothing | Get pages for markdown replicas |
| 1c | Discover Competitors | Nothing | Fire-and-forget, doesn’t block |
| 1d | Discover Products | Nothing | Uses Shopify products.json API |
| 1e | Fetch Favicon | Nothing | Fire-and-forget, doesn’t block |
| 1f | Materialize Score | Nothing | Fire-and-forget, doesn’t block |
| 1g | Setup CloudFront | Nothing | Fire-and-forget, doesn’t block |
| 2a | Create AI Website | 1a, 1b, 1d | Uses business_info for LLM content, pages for replicas |
| 2b | Product Prompts | 1d | Uses discovered products |
| 2c | Generate Product LLMs | 1b, 1d | Uses products and pages |
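The "Waits For" column forms a dependency graph. The sketch below layers that graph with Kahn's algorithm as a quick sanity check. It is not part of the codebase, and `WAITS_FOR` and `execution_waves` are illustrative names. The check confirms that every GROUP 1 service can start immediately and that GROUP 2 needs exactly one prior layer.

```python
# "Waits For" column from the Execution Groups table, keyed by group ID.
WAITS_FOR: dict[str, set[str]] = {
    "1a": set(), "1b": set(), "1c": set(), "1d": set(),
    "1e": set(), "1f": set(), "1g": set(),
    "2a": {"1a", "1b", "1d"},
    "2b": {"1d"},
    "2c": {"1b", "1d"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Layered Kahn's algorithm: each wave depends only on earlier waves."""
    remaining = {task: set(d) for task, d in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        # Tasks whose dependencies have all been placed in earlier waves.
        ready = sorted(task for task, d in remaining.items() if not d)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


for i, wave in enumerate(execution_waves(WAITS_FOR), start=1):
    print(f"wave {i}: {', '.join(wave)}")
```

Strict layering is only a check on the table. A scheduler can start 2b as soon as 1d finishes without waiting for the rest of the first wave.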
Key Architecture Change: Create AI Website (GROUP 2a) generates its LLM content (llms.txt, Q&A pages, data.json) from the business info returned by the Firecrawl agent (GROUP 1a). Within that service, scraped pages (GROUP 1b) are used ONLY for markdown replica generation.
GROUP 2 runs in parallel: tasks 2a, 2b, and 2c start once their dependencies are ready and then run concurrently.
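One way to get this behavior with asyncio is to start each GROUP 1 service as a task and have each GROUP 2 coroutine await only the tasks it needs. This is a sketch with hypothetical stub services that return placeholder values. It is not the code in routes.py, and all function names here are illustrative.

```python
import asyncio


# Hypothetical stubs standing in for the real GROUP 1 services.
async def discover_business_info(url: str) -> dict:
    await asyncio.sleep(0.03)
    return {"name": "Example Co"}


async def scrape_website(url: str) -> list[str]:
    await asyncio.sleep(0.02)
    return ["<home page>", "<about page>"]


async def discover_products(url: str) -> list[str]:
    await asyncio.sleep(0.01)
    return ["widget", "gadget"]


async def create_ai_website(info_t: asyncio.Task, pages_t: asyncio.Task,
                            products_t: asyncio.Task) -> str:
    # 2a waits for 1a, 1b and 1d.
    info, pages, products = await asyncio.gather(info_t, pages_t, products_t)
    return f"site for {info['name']}: {len(pages)} replicas, {len(products)} products"


async def product_prompts(products_t: asyncio.Task) -> dict[str, int]:
    # 2b waits only for 1d, so it does not block on scraping.
    products = await products_t
    return {p: 5 for p in products}


async def generate_product_llms(pages_t: asyncio.Task, products_t: asyncio.Task) -> list[str]:
    # 2c waits for 1b and 1d.
    pages, products = await asyncio.gather(pages_t, products_t)
    return [f"llms.txt for {p} ({len(pages)} pages)" for p in products]


async def orchestrate(url: str) -> list:
    # GROUP 1: start everything at once. Fire-and-forget services (1c, 1e-1g)
    # would also be started here, but nothing below awaits them.
    info_t = asyncio.create_task(discover_business_info(url))
    pages_t = asyncio.create_task(scrape_website(url))
    products_t = asyncio.create_task(discover_products(url))

    # GROUP 2: run concurrently; each coroutine blocks only on its own inputs.
    return await asyncio.gather(
        create_ai_website(info_t, pages_t, products_t),
        product_prompts(products_t),
        generate_product_llms(pages_t, products_t),
    )


if __name__ == "__main__":
    print(asyncio.run(orchestrate("https://example.com")))
```

Awaiting a shared task from several coroutines is safe. A task is a future, so every awaiter receives the same result and the service runs only once.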
Internal Services
Each service is documented on its own page:
| Service | Group | Input | Output |
|---|---|---|---|
| Discover Business Info | 1a | url, business_name | business_info{} |
| Scrape Website | 1b | url | pages[] |
| Discover Competitors | 1c | url, business_name | competitors[] |
| Discover Products | 1d | url, org_slug | products[] |
| Fetch Favicon | 1e | url, org_slug | favicon URL |
| Materialize Score | 1f | url, org_slug | visibility score |
| Setup CloudFront | 1g | url, org_slug | CloudFront distribution |
| Create AI Website | 2a | business_info, pages[], products[] | AI site URL |
| Product Prompts | 2b | products[], business_name | 5+ prompts/product |
| Generate Product LLMs | 2c | products[], pages[], business_name | llms.txt files |
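The Input and Output columns should agree with the "Waits For" column of the Execution Groups table. The consistency check sketched below derives each GROUP 2 dependency set from its inputs; its names are illustrative. The check treats `url`, `business_name`, and `org_slug` as request fields. This is an assumption based on GROUP 1 services consuming those fields without waiting on anything. The `[]`/`{}` suffixes from the table are dropped.

```python
# Output of each GROUP 1 service that GROUP 2 consumes (Output column).
PRODUCES: dict[str, str] = {"1a": "business_info", "1b": "pages", "1d": "products"}

# Assumed request-level fields, available without waiting on any service.
REQUEST_FIELDS: set[str] = {"url", "business_name", "org_slug"}

# Input column for the GROUP 2 services.
INPUTS: dict[str, set[str]] = {
    "2a": {"business_info", "pages", "products"},
    "2b": {"products", "business_name"},
    "2c": {"products", "pages", "business_name"},
}


def derive_waits_for(
    inputs: dict[str, set[str]],
    produces: dict[str, str],
    request_fields: set[str],
) -> dict[str, set[str]]:
    """Map each service to the producers of its non-request inputs."""
    producer_of = {output: service for service, output in produces.items()}
    waits_for: dict[str, set[str]] = {}
    for service, needed in inputs.items():
        fetched = needed - request_fields
        missing = fetched - set(producer_of)
        if missing:
            raise KeyError(f"{service} needs inputs no service produces: {sorted(missing)}")
        waits_for[service] = {producer_of[name] for name in fetched}
    return waits_for


# Should match "Waits For": 2a -> 1a, 1b, 1d; 2b -> 1d; 2c -> 1b, 1d.
for service, deps in derive_waits_for(INPUTS, PRODUCES, REQUEST_FIELDS).items():
    print(service, "waits for", sorted(deps))
```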
Prompt Generation
All prompts are now tied to products:
| Metric | Value |
|---|---|
| Prompts per product | 5 (default) |
| Minimum total prompts | 50 |
| New products (via cron) | 5 prompts each |
| Daily sampling | 10 prompts |
During onboarding, if there are fewer than 10 products, the number of prompts per product is increased so that the total is at least 50.
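The page states the rule but not the exact arithmetic. Assume the per-product count is rounded up so the total never drops below 50. With n products, the count is then max(5, ⌈50 / n⌉), which exceeds the default only when n < 10. That matches the "fewer than 10 products" condition. `prompts_per_product` is a hypothetical helper. The zero-product case is not specified here, so the sketch rejects it.

```python
import math


def prompts_per_product(num_products: int, default: int = 5, min_total: int = 50) -> int:
    """Prompts per product during onboarding (assumed rounding-up rule)."""
    if num_products <= 0:
        # Behavior for zero products is not documented; refuse rather than guess.
        raise ValueError("at least one product is required")
    # Raise the per-product count only when the default would fall short of min_total.
    return max(default, math.ceil(min_total / num_products))


for n in (20, 10, 7, 3):
    per = prompts_per_product(n)
    print(f"{n} products -> {per} each, {n * per} total")
```

For example, 7 products get 8 prompts each (56 total). At 10 or more products, the default of 5 already reaches 50.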
Data Flow
```
GROUP 1 (all parallel):

┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ business_info │ │    scrape     │ │  competitors  │ │   products    │
│     (1a)      │ │     (1b)      │ │     (1c)      │ │     (1d)      │
└───────┬───────┘ └───────┬───────┘ └───────────────┘ └───────┬───────┘
        │                 │                                   │
        │                 │                                   │
        ▼                 ▼                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         GROUP 2 (parallel)                          │
├────────────────────────┬─────────────────────┬──────────────────────┤
│ ai_website (2a)        │ prompts (2b)        │ llms_txt (2c)        │
│ uses: business_info    │ uses: products      │ uses: products       │
│       pages (replicas) │                     │       pages          │
│       products         │                     │       business_name  │
└────────────────────────┴─────────────────────┴──────────────────────┘

Also running in parallel (GROUP 1e-1g):

┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│    favicon    │ │    scoring    │ │  cloudfront   │
└───────────────┘ └───────────────┘ └───────────────┘
```
Code Location
```
src/app/apis/onboarding/
├── generate_all/
│   ├── routes.py                 # Main orchestrator (all groups)
│   ├── models.py                 # Pydantic models
│   ├── scrape_website.py         # GROUP 1b: Scrape service wrapper
│   └── tasks/                    # Individual task wrappers
│       ├── business_info.py      # GROUP 1a: NEW
│       ├── ai_website.py         # GROUP 2a
│       ├── cloudfront.py         # GROUP 1g
│       ├── discover_products.py  # GROUP 1d
│       ├── favicon.py            # GROUP 1e
│       ├── prompts.py            # GROUP 2b
│       ├── product_llms.py       # GROUP 2c
│       └── scoring.py            # GROUP 1f
└── services/
    └── discover_competitors/     # GROUP 1c: Competitor discovery service

src/app/shared/
├── discover_business_info/       # NEW: Firecrawl agent for business info
│   ├── __init__.py
│   └── service.py
├── ai_website/                   # AI Website service
│   ├── service.py                # create_ai_website_from_business_info
│   └── llm_organize.py           # organize_with_llm_from_business_info
└── products/                     # Shared services used by both onboarding and cron
    ├── discover.py               # Product discovery via Shopify products.json
    └── generate_llms_txt.py      # Product llms.txt generation service
```
Testing Individual Services
Each service can be tested independently via pytest:
```bash
# Test the orchestrator
uv run pytest src/pytests/onboarding/test_generate_all.py -v

# Test shared services used by generate-all
uv run pytest src/pytests/cron/test_generate_prompts.py -v
```