
Architecture

The generate-all endpoint is a thin orchestrator that coordinates internal services. It returns immediately and runs all tasks in the background.

Execution Groups

| Group | Service | Waits for | Purpose |
|---|---|---|---|
| 1a | Discover Business Info | None | Firecrawl agent for business context |
| 1b | Scrape Website | None | Fetches pages for markdown replicas |
| 1c | Discover Competitors | None | Fire-and-forget; doesn't block |
| 1d | Discover Products | None | Uses the Shopify products.json API |
| 1e | Fetch Favicon | None | Fire-and-forget; doesn't block |
| 1f | Materialize Score | None | Fire-and-forget; doesn't block |
| 1g | Setup CloudFront | None | Fire-and-forget; doesn't block |
| 2a | Create AI Website | 1a, 1b, 1d | Uses business_info for LLM content, pages for replicas, and discovered products |
| 2b | Product Prompts | 1d | Uses discovered products |
| 2c | Generate Product LLMs | 1b, 1d | Uses products and pages |
Key architecture change: business info from the Firecrawl agent (GROUP 1a) is used for LLM content generation (llms.txt, Q&A pages, data.json). Scraped pages (GROUP 1b) are used only for markdown replica generation.
GROUP 2 runs in parallel: the three tasks (2a, 2b, 2c) run concurrently, each starting once its dependencies are ready.
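A rough asyncio sketch of how the dependency table above could be wired, assuming each service is an async function. The real wrappers live under `tasks/`; `fake_service`, `spawn` and `generate_all` here are hypothetical, and the delays are arbitrary. Each GROUP 1 result is a `Task`, and awaiting the same task from several coroutines is allowed, so 2b can proceed as soon as 1d finishes without waiting for 1a or 1b. The fire-and-forget services are started but never awaited.

```python
import asyncio
from typing import Any, Coroutine

# Strong references to fire-and-forget tasks so they are not garbage-collected.
_fire_and_forget: set[asyncio.Task] = set()


def spawn(coro: Coroutine[Any, Any, Any]) -> None:
    """Start a task without awaiting it (GROUP 1c and 1e-1g style)."""
    task = asyncio.create_task(coro)
    _fire_and_forget.add(task)
    task.add_done_callback(_fire_and_forget.discard)


async def fake_service(name: str, delay: float, log: list[str]) -> str:
    """Hypothetical stand-in for a real service: wait, record, return its name."""
    await asyncio.sleep(delay)
    log.append(name)
    return name


async def generate_all(log: list[str]) -> list[str]:
    # GROUP 1: everything starts immediately.
    business_info = asyncio.create_task(fake_service("1a", 0.03, log))
    pages = asyncio.create_task(fake_service("1b", 0.02, log))
    products = asyncio.create_task(fake_service("1d", 0.01, log))
    for name in ("1c", "1e", "1f", "1g"):
        spawn(fake_service(name, 0.0, log))  # never awaited below

    # GROUP 2: each task awaits only its own dependencies.
    async def ai_website() -> str:  # 2a: needs 1a, 1b, 1d
        await asyncio.gather(business_info, pages, products)
        return await fake_service("2a", 0.0, log)

    async def product_prompts() -> str:  # 2b: needs 1d only
        await products
        return await fake_service("2b", 0.0, log)

    async def product_llms() -> str:  # 2c: needs 1b and 1d
        await asyncio.gather(pages, products)
        return await fake_service("2c", 0.0, log)

    return await asyncio.gather(ai_website(), product_prompts(), product_llms())


if __name__ == "__main__":
    log: list[str] = []
    print(asyncio.run(generate_all(log)))  # ['2a', '2b', '2c']
    print(log)  # 2b typically appears right after 1d, before 1a finishes
```

Sharing one task per GROUP 1 service also means products are discovered once, even though all three GROUP 2 tasks consume them.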

Internal Services

Each service is documented on its own page:
| Service | Group | Input | Output |
|---|---|---|---|
| Discover Business Info | 1a | url, business_name | business_info{} |
| Scrape Website | 1b | url | pages[] |
| Discover Competitors | 1c | url, business_name | competitors[] |
| Discover Products | 1d | url, org_slug | products[] |
| Fetch Favicon | 1e | url, org_slug | favicon URL |
| Materialize Score | 1f | url, org_slug | visibility score |
| Setup CloudFront | 1g | url, org_slug | CloudFront distribution |
| Create AI Website | 2a | business_info, pages[], products[] | AI site URL |
| Product Prompts | 2b | products[], business_name | 5+ prompts per product |
| Generate Product LLMs | 2c | products[], pages[], business_name | llms.txt files |
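Discover Products (1d) takes `url` and `org_slug` and returns `products[]`. As a rough sketch of the parsing half only, assuming the service reads a Shopify storefront's public `/products.json` listing: the `Product` fields kept here and the `parse_products_json` helper are hypothetical, and the HTTP fetch is omitted so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    title: str
    handle: str
    url: str
    price: Optional[str]  # first variant's price, if any (assumed field choice)


def parse_products_json(store_url: str, payload: str) -> list[Product]:
    """Turn a /products.json response body into Product records (hypothetical helper)."""
    data = json.loads(payload)
    base = store_url.rstrip("/")
    products: list[Product] = []
    for item in data.get("products", []):
        handle = item.get("handle")
        if not handle:
            continue  # skip entries without a URL slug
        variants = item.get("variants") or []
        price = variants[0].get("price") if variants else None
        products.append(
            Product(
                title=item.get("title", handle),
                handle=handle,
                # Shopify serves each product page at /products/<handle>.
                url=f"{base}/products/{handle}",
                price=price,
            )
        )
    return products


sample = json.dumps({
    "products": [
        {"title": "Ceramic Mug", "handle": "ceramic-mug", "variants": [{"price": "18.00"}]},
        {"title": "Logo Tee", "handle": "logo-tee", "variants": []},
    ]
})

for p in parse_products_json("https://shop.example.com/", sample):
    print(p.title, p.url, p.price)
```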

Prompt Generation

All prompts are now tied to products:
| Metric | Value |
|---|---|
| Prompts per product | 5 (default) |
| Minimum total prompts | 50 |
| Prompts per new product (added via cron) | 5 |
| Prompts sampled daily | 10 |

During onboarding, if there are fewer than 10 products, the number of prompts per product is increased so that there are at least 50 prompts in total.
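One way to read the onboarding rule is that the per-product count for n products is max(5, ⌈50 / n⌉), which only exceeds the default of 5 when n < 10. A small sketch with hypothetical names; the uniform random choice in `daily_sample` is an assumption, since the page only says that 10 prompts are sampled daily.

```python
import math
import random
from typing import Optional

DEFAULT_PROMPTS_PER_PRODUCT = 5
MIN_TOTAL_PROMPTS = 50
DAILY_SAMPLE_SIZE = 10


def prompts_per_product(num_products: int) -> int:
    """Smallest per-product count that keeps onboarding at >= 50 prompts total."""
    if num_products <= 0:
        return 0  # assumption: nothing to generate without products
    needed = math.ceil(MIN_TOTAL_PROMPTS / num_products)
    return max(DEFAULT_PROMPTS_PER_PRODUCT, needed)


def daily_sample(prompts: list[str], k: int = DAILY_SAMPLE_SIZE,
                 seed: Optional[int] = None) -> list[str]:
    """Pick up to k distinct prompts for the daily run (uniform sampling assumed)."""
    rng = random.Random(seed)
    return rng.sample(prompts, min(k, len(prompts)))


for n in (3, 7, 10, 25):
    per = prompts_per_product(n)
    print(f"{n} products -> {per} prompts each, {n * per} total")
# 3 products -> 17 prompts each, 51 total
# 7 products -> 8 prompts each, 56 total
# 10 products -> 5 prompts each, 50 total
# 25 products -> 5 prompts each, 125 total
```

Rounding up means small catalogs slightly overshoot 50 (51 prompts for 3 products) rather than fall short.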

Data Flow

GROUP 1 (all parallel):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ business_info│  │   scrape     │  │ competitors  │  │   products   │
│   (1a)       │  │    (1b)      │  │    (1c)      │  │    (1d)      │
└──────┬───────┘  └──────┬───────┘  └──────────────┘  └──────┬───────┘
       │                 │                                   │
       │                 │                                   │
       ▼                 ▼                                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                         GROUP 2 (parallel)                       │
├──────────────────────┬───────────────────┬───────────────────────┤
│    ai_website (2a)   │   prompts (2b)    │   llms_txt (2c)       │
│  uses: business_info │  uses: products   │  uses: products       │
│        pages         │                   │        pages          │
│        (replicas)    │                   │        business_name  │
│        products      │                   │                       │
└──────────────────────┴───────────────────┴───────────────────────┘

Also running in parallel (GROUP 1e-1g):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   favicon    │  │   scoring    │  │  cloudfront  │
└──────────────┘  └──────────────┘  └──────────────┘

Code Location

src/app/apis/onboarding/
├── generate_all/
│   ├── routes.py              # Main orchestrator (all groups)
│   ├── models.py              # Pydantic models
│   ├── scrape_website.py      # GROUP 1b: Scrape service wrapper
│   └── tasks/                 # Individual task wrappers
│       ├── business_info.py   # GROUP 1a: NEW
│       ├── ai_website.py      # GROUP 2a
│       ├── cloudfront.py      # GROUP 1g
│       ├── discover_products.py # GROUP 1d
│       ├── favicon.py         # GROUP 1e
│       ├── prompts.py         # GROUP 2b
│       ├── product_llms.py    # GROUP 2c
│       └── scoring.py         # GROUP 1f
└── services/
    └── discover_competitors/  # GROUP 1c: Competitor discovery service

src/app/shared/
├── discover_business_info/    # NEW: Firecrawl agent for business info
│   ├── __init__.py
│   └── service.py
├── ai_website/                # AI Website service
│   ├── service.py             # create_ai_website_from_business_info
│   └── llm_organize.py        # organize_with_llm_from_business_info
└── products/                  # Shared services used by both onboarding and cron
    ├── discover.py            # Product discovery via Shopify products.json
    └── generate_llms_txt.py   # Product llms.txt generation service

Testing Individual Services

Each service can be tested independently via pytest:
# Test the orchestrator
uv run pytest src/pytests/onboarding/test_generate_all.py -v

# Test shared services used by generate-all
uv run pytest src/pytests/cron/test_generate_prompts.py -v