
Architecture

The generate-all endpoint is a thin orchestrator that coordinates internal services. It returns immediately and runs all tasks in the background.

Execution Groups

| Group | Services | Waits For | Purpose |
|-------|----------|-----------|---------|
| 1a | Scrape Website | Nothing | Get pages for GROUP 2 services |
| 1b | Discover Competitors | Nothing | Fire-and-forget, doesn’t block |
| 1c | Exa Prompts | Nothing | Fire-and-forget, doesn’t block |
| 1d | Fetch Favicon | Nothing | Fire-and-forget, doesn’t block |
| 1e | Materialize Score | Nothing | Fire-and-forget, doesn’t block |
| 1f | Setup CloudFront | Nothing | Fire-and-forget, doesn’t block |
| 2a | AI Website, Business Prompts | Group 1a | Uses scraped pages |
| 2b | Discover Products | Group 1a | Uses scraped pages |
| 3a | Product Prompts | Group 2b | Uses discovered products |
| 3b | Generate Product LLMs | Group 2b | Uses discovered products |

Groups 3a and 3b run in parallel once products are discovered, so product prompts and product llms.txt files are generated simultaneously.
Groups 1b-1f run in parallel with 1a but don’t block Group 2. This means the main onboarding flow (scrape → create AI site → generate prompts) isn’t slowed down by tasks that don’t need scraped pages.
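
The group ordering can be sketched with asyncio as follows. This is an illustration under assumptions, not the orchestrator's actual code: `service` and `run_groups` are hypothetical, and each service is replaced by a short sleep that records its group label when it completes.

```python
import asyncio


async def service(name: str, log: list[str], delay: float = 0.01) -> str:
    """Hypothetical stand-in for one internal service call."""
    await asyncio.sleep(delay)
    log.append(name)  # record completion order
    return name


async def run_groups() -> list[str]:
    log: list[str] = []
    # Groups 1b-1f start alongside 1a; nothing later waits on them.
    side_tasks = [
        asyncio.create_task(service(name, log))
        for name in ("1b", "1c", "1d", "1e", "1f")
    ]
    # Group 1a is the only Group 1 task the main flow awaits.
    await service("1a", log)
    # Group 2: 2a and 2b both consume the scraped pages.
    task_2a = asyncio.create_task(service("2a", log))
    await service("2b", log)
    # Group 3: 3a and 3b wait only for 2b, then run in parallel.
    await asyncio.gather(service("3a", log), service("3b", log))
    # Demo only: drain the remaining tasks so the event loop exits cleanly.
    await asyncio.gather(task_2a, *side_tasks)
    return log


if __name__ == "__main__":
    print(asyncio.run(run_groups()))
```

In the returned log, 1a always precedes 2a and 2b, and 2b always precedes 3a and 3b. The position of 1b-1f is unconstrained, which is the point of making them fire-and-forget.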

Internal Services

Each service is documented on its own page:
| Service | Group | Input | Output | Page |
|---------|-------|-------|--------|------|
| Scrape Website | 1a | url | pages[] | View → |
| Discover Competitors | 1b | url, business_name | competitors[] | View → |
| Exa Prompts | 1c | org_slug, business_name, url | 10 pre-tested prompts | View → |
| Fetch Favicon | 1d | url, org_slug | favicon URL | View → |
| Materialize Score | 1e | url, org_slug | visibility score | View → |
| Setup CloudFront | 1f | url, org_slug | CloudFront distribution | View → |
| Create AI Website | 2a | url, org_slug, pages[] | AI site URL | View → |
| Business Prompts | 2a | url, org_slug, pages[] | 40 prompts | View → |
| Discover Products | 2b | url, org_slug, pages[] | products[] | View → |
| Product Prompts | 3a | url, org_slug, products[], pages[] | 10 prompts/product | View → |
| Generate Product LLMs | 3b | url, org_slug, products[], pages[] | llms.txt files | View → |

Data Flow

scrape_website(url)
    │
    ├── pages[] ─────────────────────────────────────┐
    │                                                │
    ▼                                                ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ create_ai_website│  │ business_prompts │  │ discover_products│
│    (pages[])     │  │    (pages[])     │  │    (pages[])     │
└──────────────────┘  └──────────────────┘  └────────┬─────────┘
                                                     │
                                          ┌──────────┴─────────┐
                                          ▼                    ▼
                                 ┌─────────────────┐  ┌─────────────────┐
                                 │ product_prompts │  │generate_product │
                                 │   (products[])  │  │   _llms_txt     │
                                 │    GROUP 3a     │  │    GROUP 3b     │
                                 └────────┬────────┘  └────────┬────────┘
                                          │                    │
                                          └──────────┬─────────┘
                                                     │
                                                (parallel)

Tasks that DON'T need pages[] (run in parallel with scrape):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   favicon    │  │   scoring    │  │  cloudfront  │
└──────────────┘  └──────────────┘  └──────────────┘
┌──────────────┐  ┌──────────────┐
│ competitors  │  │ exa_prompts  │
└──────────────┘  └──────────────┘
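
The data flow above is a small dependency graph, and the execution order falls out of it mechanically. The sketch below uses the standard-library `graphlib` module to derive the waves of services that can run together. The `DEPENDS_ON` mapping and `execution_waves` are illustrative names, transcribed from the diagram rather than taken from the codebase.

```python
from graphlib import TopologicalSorter

# Each service maps to the services whose output it consumes (from the diagram).
DEPENDS_ON: dict[str, set[str]] = {
    "scrape_website": set(),
    "competitors": set(),
    "exa_prompts": set(),
    "favicon": set(),
    "scoring": set(),
    "cloudfront": set(),
    "create_ai_website": {"scrape_website"},
    "business_prompts": {"scrape_website"},
    "discover_products": {"scrape_website"},
    "product_prompts": {"discover_products"},
    "generate_product_llms_txt": {"discover_products"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves; every service in a wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()  # all services whose inputs are available
        waves.append(sorted(ready))
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
        print(number, wave)
```

The three waves correspond to Groups 1, 2, and 3. The graph alone does not capture that 1b-1f are fire-and-forget; that is a scheduling choice made on top of the dependencies.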

Code Location

src/app/apis/onboarding/
├── generate_all/
│   ├── routes.py              # Main orchestrator (GROUP 1a-1f)
│   ├── models.py              # Pydantic models
│   ├── scrape_website.py      # GROUP 1a: Scrape service wrapper
│   ├── exa_prompts.py         # GROUP 1c: Exa prompts service
│   ├── task_orchestrator.py   # GROUP 2 and 3 coordinator
│   └── tasks/                 # Individual task wrappers
│       ├── ai_website.py      # GROUP 2a
│       ├── cloudfront.py      # GROUP 1f
│       ├── discover_products.py # GROUP 2b
│       ├── favicon.py         # GROUP 1d
│       ├── prompts.py         # GROUP 2a + 3a
│       ├── product_llms.py    # GROUP 3b
│       └── scoring.py         # GROUP 1e
└── services/
    └── discover_competitors/  # GROUP 1b: Competitor discovery service

src/app/shared/products/       # Shared services used by both onboarding and cron
├── discover.py                # Product discovery service
└── generate_llms_txt.py       # Product llms.txt generation service

Testing Individual Services

Each service can be tested independently via pytest:
# Test the orchestrator
uv run pytest src/pytests/onboarding/test_generate_all.py -v

# Test shared services used by generate-all
uv run pytest src/pytests/cron/test_generate_prompts.py -v
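
Because each service takes its inputs explicitly (for example `pages[]`), a test can feed it canned data instead of running the upstream group. The sketch below shows the shape such a test might take. The `discover_products` stub, its filtering rule, and the page dictionary layout are all assumptions made for illustration; a real test would import the service under test instead.

```python
import asyncio


async def discover_products(url: str, org_slug: str, pages: list[dict]) -> list[dict]:
    """Hypothetical stub standing in for the real product discovery service."""
    return [{"name": p["title"]} for p in pages if "/products/" in p["url"]]


def test_discover_products_with_canned_pages() -> None:
    # Canned pages[] replace a live scrape, so Group 1a never has to run.
    pages = [
        {"url": "https://example.com/", "title": "Home"},
        {"url": "https://example.com/products/widget", "title": "Widget"},
    ]
    products = asyncio.run(discover_products("https://example.com", "acme", pages))
    assert products == [{"name": "Widget"}]
```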