Overview

Architecture

The generate-all endpoint is a thin orchestrator that coordinates internal services. It returns immediately and runs all tasks in the background.

Execution Groups

Group	Services	Waits For	Purpose
1a	Discover Business Info	Nothing	Firecrawl agent for business context
1b	Scrape Website	Nothing	Get pages for markdown replicas
1c	Discover Competitors	Nothing	Fire-and-forget, doesn’t block
1d	Discover Products	Nothing	Uses Shopify products.json API
1e	Fetch Favicon	Nothing	Fire-and-forget, doesn’t block
1f	Materialize Score	Nothing	Fire-and-forget, doesn’t block
1g	Setup CloudFront	Nothing	Fire-and-forget, doesn’t block
2a	Create AI Website	1a, 1b, 1d	Uses business_info for LLM content, pages for replicas
2b	Product Prompts	1d	Uses discovered products
2c	Generate Product LLMs	1b, 1d	Uses products and pages
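The Waits For column defines a small dependency graph. As an illustration (not the orchestrator's actual code), the standard-library `graphlib.TopologicalSorter` can turn that column into execution waves and confirm the table has no cycle; `WAITS_FOR`, `execution_waves` and `fire_and_forget` are hypothetical names. The waves are a static view only: at runtime a GROUP 2 task needs just its own prerequisites, and the fire-and-forget tasks gate nothing.

```python
from graphlib import TopologicalSorter

# "Waits For" column of the Execution Groups table: task -> set of prerequisites.
WAITS_FOR: dict[str, set[str]] = {
    "1a": set(), "1b": set(), "1c": set(), "1d": set(),
    "1e": set(), "1f": set(), "1g": set(),
    "2a": {"1a", "1b", "1d"},  # Create AI Website
    "2b": {"1d"},              # Product Prompts
    "2c": {"1b", "1d"},        # Generate Product LLMs
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Split tasks into waves; each task's prerequisites all sit in earlier waves."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def fire_and_forget(graph: dict[str, set[str]]) -> list[str]:
    """GROUP 1 tasks that no other task waits for."""
    needed = set().union(*graph.values())
    return sorted(t for t in graph if t.startswith("1") and t not in needed)
```

`execution_waves(WAITS_FOR)` yields one wave with all of 1a to 1g followed by a wave with 2a, 2b and 2c, and `fire_and_forget(WAITS_FOR)` returns exactly the four rows the table marks as fire-and-forget (1c, 1e, 1f, 1g).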

Key Architecture Change: Business info from the Firecrawl agent (GROUP 1a) is used for LLM content generation (llms.txt, Q&A pages, data.json). Scraped pages (GROUP 1b) are used ONLY for markdown replica generation.

GROUP 2 runs in parallel: all three tasks (2a, 2b, 2c) start simultaneously once their dependencies are ready.
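One way to honour the per-task Waits For lists with `asyncio` is sketched below: start every GROUP 1 task at once, then schedule each GROUP 2 task so that it awaits only its own prerequisites. The service calls are replaced by a hypothetical `fake_service` that sleeps and logs, and all delays are invented. In this variant 2b can start before 2a when 1d finishes ahead of 1a; if the real orchestrator instead waits for 1a, 1b and 1d together before launching GROUP 2, the three tasks start at the same moment.

```python
import asyncio

log: list[str] = []  # start/end events, in the order they happen


async def fake_service(name: str, delay: float) -> str:
    """Hypothetical stand-in for a real service call."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)
    log.append(f"end {name}")
    return f"{name}-result"


async def orchestrate() -> dict[str, str]:
    # GROUP 1: start everything at once (delays are illustrative only).
    delays = {"1a": 0.1, "1b": 0.02, "1c": 0.05, "1d": 0.01}
    group1 = {
        name: asyncio.create_task(fake_service(name, d)) for name, d in delays.items()
    }

    async def after(deps: list[str], name: str) -> str:
        # Block only on this task's own prerequisites, then run it.
        await asyncio.gather(*(group1[d] for d in deps))
        return await fake_service(name, 0.01)

    # GROUP 2: scheduled immediately, each gated by its own Waits For list.
    group2 = {
        "2a": asyncio.create_task(after(["1a", "1b", "1d"], "2a")),
        "2b": asyncio.create_task(after(["1d"], "2b")),
        "2c": asyncio.create_task(after(["1b", "1d"], "2c")),
    }
    results = await asyncio.gather(*group2.values())
    # 1c (competitors) is never awaited here: it is fire-and-forget.
    return dict(zip(group2, results))
```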

Internal Services

Each service is documented on its own page:

Service	Group	Input	Output
Discover Business Info	1a	`url`, `business_name`	`business_info{}`
Scrape Website	1b	`url`	`pages[]`
Discover Competitors	1c	`url`, `business_name`	`competitors[]`
Discover Products	1d	`url`, `org_slug`	`products[]`
Fetch Favicon	1e	`url`, `org_slug`	favicon URL
Materialize Score	1f	`url`, `org_slug`	visibility score
Setup CloudFront	1g	`url`, `org_slug`	CloudFront distribution
Create AI Website	2a	`business_info{}`, `pages[]`, `products[]`	AI site URL
Product Prompts	2b	`products[]`, `business_name`	5+ prompts/product
Generate Product LLMs	2c	`products[]`, `pages[]`, `business_name`	llms.txt files
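The Input and Output columns can be read as data contracts. A hypothetical check like the one below (not part of the documented codebase) confirms that every GROUP 2 input is supplied either by the request or by a GROUP 1 output. It assumes `url`, `business_name` and `org_slug` arrive with the request, and it uses invented names (`ai_site_url`, `prompts`, `llms_txt`) for outputs the table describes in words. GROUPS 1e-1g are left out because nothing consumes their outputs.

```python
# Input/output names from the Internal Services table, brackets dropped.
REQUEST_FIELDS = {"url", "business_name", "org_slug"}  # assumed request payload

SERVICES: dict[str, tuple[set[str], set[str]]] = {
    # group: (inputs, outputs)
    "1a": ({"url", "business_name"}, {"business_info"}),
    "1b": ({"url"}, {"pages"}),
    "1c": ({"url", "business_name"}, {"competitors"}),
    "1d": ({"url", "org_slug"}, {"products"}),
    "2a": ({"business_info", "pages", "products"}, {"ai_site_url"}),
    "2b": ({"products", "business_name"}, {"prompts"}),
    "2c": ({"products", "pages", "business_name"}, {"llms_txt"}),
}


def unmet_inputs(services: dict[str, tuple[set[str], set[str]]]) -> dict[str, set[str]]:
    """Map each GROUP 2 service to inputs that neither the request nor GROUP 1 provides."""
    available = set(REQUEST_FIELDS)
    for group, (_, outputs) in services.items():
        if group.startswith("1"):
            available |= outputs
    unmet: dict[str, set[str]] = {}
    for group, (inputs, _) in services.items():
        missing = inputs - available
        if group.startswith("2") and missing:
            unmet[group] = missing
    return unmet
```

With the full table the result is empty; removing 1d reports `products` as missing for all three GROUP 2 services, which matches the Waits For column above.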

Prompt Generation

All prompts are now tied to products:

Metric	Value
Prompts per product	5 (default)
Minimum total prompts	50
New products (via cron)	5 prompts each
Daily sampling	10 prompts

During onboarding, if fewer than 10 products are discovered, the number of prompts per product is raised so that the total is at least 50.
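The onboarding rule can be written as max(5, ceil(50 / n)) prompts for n products. This sketch assumes the boost rounds up and that the daily sample is drawn uniformly at random; neither detail is stated above, and the function names are hypothetical. Under that assumption, 3 products get 17 prompts each (51 total), 7 products get 8 each (56 total), and 10 or more products get the default 5.

```python
import math
import random
from typing import Optional

DEFAULT_PROMPTS_PER_PRODUCT = 5
MIN_TOTAL_PROMPTS = 50  # onboarding floor
DAILY_SAMPLE_SIZE = 10


def prompts_per_product(n_products: int, onboarding: bool = True) -> int:
    """Prompts to generate for each product (rounding up is an assumption)."""
    if n_products <= 0:
        raise ValueError("prompts are tied to products, so at least one is required")
    if not onboarding:
        # New products picked up by the cron get the default count.
        return DEFAULT_PROMPTS_PER_PRODUCT
    return max(DEFAULT_PROMPTS_PER_PRODUCT, math.ceil(MIN_TOTAL_PROMPTS / n_products))


def daily_sample(prompts: list[str], rng: Optional[random.Random] = None) -> list[str]:
    """Pick up to 10 distinct prompts (uniform sampling is an assumption)."""
    if rng is None:
        rng = random.Random()
    return rng.sample(prompts, k=min(DAILY_SAMPLE_SIZE, len(prompts)))
```

The threshold of 10 products falls out of the formula: at n = 10, 5 prompts each already gives exactly 50.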

Data Flow

GROUP 1 (all parallel):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ business_info│  │   scrape     │  │ competitors  │  │   products   │
│   (1a)       │  │    (1b)      │  │    (1c)      │  │    (1d)      │
└──────┬───────┘  └──────┬───────┘  └──────────────┘  └──────┬───────┘
       │                 │                                   │
       │                 │                                   │
       ▼                 ▼                                   ▼
┌────────────────────────────────────────────────────────────────────┐
│                         GROUP 2 (parallel)                         │
├────────────────────────┬───────────────────┬───────────────────────┤
│  ai_website (2a)       │  prompts (2b)     │  llms_txt (2c)        │
│  uses: business_info   │  uses: products   │  uses: products       │
│        pages (replicas)│                   │        pages          │
│        products        │                   │        business_name  │
└────────────────────────┴───────────────────┴───────────────────────┘

Also running in parallel (GROUPS 1e-1g, fire-and-forget):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   favicon    │  │   scoring    │  │  cloudfront  │
│    (1e)      │  │    (1f)      │  │    (1g)      │
└──────────────┘  └──────────────┘  └──────────────┘

Code Location

src/app/apis/onboarding/
├── generate_all/
│   ├── routes.py              # Main orchestrator (all groups)
│   ├── models.py              # Pydantic models
│   ├── scrape_website.py      # GROUP 1b: Scrape service wrapper
│   └── tasks/                 # Individual task wrappers
│       ├── business_info.py   # GROUP 1a: NEW
│       ├── ai_website.py      # GROUP 2a
│       ├── cloudfront.py      # GROUP 1g
│       ├── discover_products.py # GROUP 1d
│       ├── favicon.py         # GROUP 1e
│       ├── prompts.py         # GROUP 2b
│       ├── product_llms.py    # GROUP 2c
│       └── scoring.py         # GROUP 1f
└── services/
    └── discover_competitors/  # GROUP 1c: Competitor discovery service

src/app/shared/
├── discover_business_info/    # NEW: Firecrawl agent for business info
│   ├── __init__.py
│   └── service.py
├── ai_website/                # AI Website service
│   ├── service.py             # create_ai_website_from_business_info
│   └── llm_organize.py        # organize_with_llm_from_business_info
└── products/                  # Shared services used by both onboarding and cron
    ├── discover.py            # Product discovery via Shopify products.json
    └── generate_llms_txt.py   # Product llms.txt generation service

Testing Individual Services

Each service can be tested independently via pytest:

# Test the orchestrator
uv run pytest src/pytests/onboarding/test_generate_all.py -v

# Test shared services used by generate-all
uv run pytest src/pytests/cron/test_generate_prompts.py -v
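Network-bound services (Firecrawl, Shopify, the LLM) are easiest to test in isolation by injecting them. The sketch below shows one such unit test using only the standard library's `unittest.mock.AsyncMock`, so pytest can collect it without extra plugins. `run_product_prompts` is a hypothetical GROUP 2b wrapper, not the signature in `tasks/prompts.py`, and returning no prompts for an empty product list is likewise an assumption.

```python
import asyncio
from unittest.mock import AsyncMock


# Hypothetical GROUP 2b wrapper: takes the GROUP 1d output plus an injected generator.
async def run_product_prompts(products, business_name, generate_prompts):
    if not products:
        return []  # assumed: prompts are tied to products, so nothing to generate
    return await generate_prompts(products, business_name)


def test_prompts_use_discovered_products():
    generate = AsyncMock(return_value=["p1", "p2", "p3", "p4", "p5"])
    products = [{"handle": "mug"}]
    result = asyncio.run(run_product_prompts(products, "Acme", generate))
    assert result == ["p1", "p2", "p3", "p4", "p5"]
    # The stub must receive exactly the discovered products and business name.
    generate.assert_awaited_once_with(products, "Acme")


def test_no_products_skips_generation():
    generate = AsyncMock()
    assert asyncio.run(run_product_prompts([], "Acme", generate)) == []
    generate.assert_not_awaited()
```

Saved as any `test_*.py` file under `src/pytests/onboarding/`, it runs with the same `uv run pytest <path> -v` command shown above.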
