The Onboarding API handles everything needed when a new business signs up. The backend orchestrates all onboarding tasks behind a single endpoint: the frontend triggers it once and can navigate away without waiting for the tasks to finish.

Flow

When a user completes payment and onboarding:
FRONTEND:
1. POST /api/business                     → Create org metadata + entity
2. POST /api/onboarding/generate-all      → Trigger backend orchestrator (returns immediately)

BACKEND (runs in background):
├── GROUP 1a - Scrape (await):
│   └── Scrape website once (pages used by GROUP 2+)
│
├── GROUP 1b - Discover Competitors (parallel with 1a, non-blocking):
│   └── Competitors Service (finds up to 10 competitors using Firecrawl agent)
│
├── GROUP 1c - Exa Prompts + Visibility (parallel with 1a, non-blocking):
│   └── Exa Answer API generates 10 prompts, then visibility checked across 8 platforms
│
├── GROUP 1d - Fetch Favicon (parallel with 1a, non-blocking):
│   └── Favicon Service (doesn't need scrape data)
│
├── GROUP 1e - Materialize Score (parallel with 1a, non-blocking):
│   └── Scoring Service (doesn't need scrape data)
│
├── GROUP 1f - Setup CloudFront (parallel with 1a, non-blocking):
│   └── CloudFront Service (doesn't need scrape data)
│
├── GROUP 2a - Main Tasks (parallel, starts when 1a completes):
│   ├── AI Website Service (uses scrape data)
│   └── Business Prompts Service (40 prompts, uses scrape data)
│
├── GROUP 2b - Discover Products (parallel with 2a):
│   └── Products Service (uses scrape data)
│
├── GROUP 3a - Product Prompts (starts when 2b completes, parallel with 3b):
│   └── Prompts Service for each product (10 prompts per product)
│
└── GROUP 3b - Generate Product LLMs (starts when 2b completes, parallel with 3a):
    └── Product LLMs Service (generates /llms/{product-slug}.txt files)
Optimized Flow: GROUP 1b-1f run in parallel with GROUP 1a (scrape) but don’t block GROUP 2. Favicon, Scoring, and CloudFront were moved to GROUP 1 since they don’t need scraped pages. GROUP 2 starts as soon as GROUP 1a completes. GROUP 3a and 3b start as soon as products are discovered (GROUP 2b), without waiting for GROUP 2a to complete. GROUP 3a and 3b run in parallel.
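
The dependency rules above can be sketched with asyncio. This is only an illustration of the ordering, not the actual orchestrator: `run_step`, `generate_all`, the step names, and the delays are all hypothetical stand-ins for the real services.

```python
import asyncio


async def run_step(name: str, log: list[str], delay: float = 0.0) -> str:
    """Hypothetical stand-in for a service call; records start and finish order."""
    log.append(f"start:{name}")
    await asyncio.sleep(delay)
    log.append(f"done:{name}")
    return name


async def generate_all(log: list[str]) -> None:
    # GROUP 1b-1f: launched alongside the scrape and never awaited before GROUP 2.
    background = [
        asyncio.create_task(run_step(name, log, delay=0.03))
        for name in ("competitors", "exa_prompts", "favicon", "scoring", "cloudfront")
    ]

    # GROUP 1a: the only step that GROUP 2 waits for.
    await run_step("scrape", log, delay=0.01)

    # GROUP 2a and 2b start together once the scrape is done.
    group_2a = asyncio.gather(
        run_step("ai_website", log, delay=0.05),
        run_step("business_prompts", log, delay=0.05),
    )
    products = asyncio.create_task(run_step("products", log, delay=0.01))

    # GROUP 3a and 3b wait only for product discovery (2b), not for 2a.
    await products
    group_3 = asyncio.gather(
        run_step("product_prompts", log, delay=0.01),
        run_step("product_llms", log, delay=0.01),
    )

    # Collect every outstanding task before the orchestrator finishes.
    await asyncio.gather(group_2a, group_3, *background)
```

Running `asyncio.run(generate_all(log))` with an empty list fills `log` with start and finish events, which makes it easy to check that GROUP 3 begins while GROUP 2a is still in progress.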

Endpoints

The frontend only calls these two endpoints:
| Endpoint | Purpose | Auth Required |
|---|---|---|
| Create Business | Create org metadata + entity | Yes (JWT) |
| Generate All | Backend orchestrator - runs all onboarding tasks | Yes (JWT) |
All other onboarding tasks are internal services called by generate-all. They are not exposed as HTTP endpoints.
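
As a rough sketch of the two calls, the snippet below builds both requests without sending them. Several details are assumptions not stated on this page: the `Authorization: Bearer` header scheme for the JWT, the shape of the JSON body, and reusing the same body for both endpoints. The helper name `build_onboarding_requests` is hypothetical.

```python
import json
import urllib.request


def build_onboarding_requests(
    base_url: str, jwt: str, business: dict
) -> list[urllib.request.Request]:
    """Build (but do not send) the two requests the frontend makes, in order."""
    headers = {
        "Authorization": f"Bearer {jwt}",  # assumed header scheme for the JWT
        "Content-Type": "application/json",
    }
    body = json.dumps(business).encode("utf-8")  # assumed payload shape
    paths = ("/api/business", "/api/onboarding/generate-all")
    return [
        urllib.request.Request(
            f"{base_url}{path}", data=body, headers=headers, method="POST"
        )
        for path in paths
    ]
```

Each request can then be sent with `urllib.request.urlopen(request)`. The second call should only be made after the first succeeds, since the business entity must exist before generate-all runs.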

Internal Services

The generate-all orchestrator calls these shared services directly (not via HTTP):
| Service | Purpose | Location |
|---|---|---|
| Scraping | Custom mapper + Firecrawl batch scrape | shared/scraping/, shared/mapping/ |
| AI Website | Deploy AI-optimized site to Vercel | shared/ai_website/ |
| Prompts | Generate visibility prompts with Gemini | shared/prompts/ |
| Products | Discover products and generate llms files | shared/products/ |
| CloudFront | Create CloudFront distribution | shared/cloudfront/ |
| Content Hasher | Store page hashes for change detection | shared/content_hasher/ |
| Hashing | Fetch raw HTML and create hashes | shared/hashing/ |
| Exa Answer | Generate prompts via Exa Answer API | shared/exa/ |
| Favicon | Fetch & store favicon (onboarding-only) | onboarding/generate_all/tasks/favicon.py |
| Scoring | Copy pre-payment ranking score (onboarding-only) | onboarding/generate_all/tasks/scoring.py |
| Competitors | Discover up to 10 competitors using Firecrawl agent | onboarding/services/discover_competitors/ |
| Exa Prompts | Generate 10 pre-tested prompts (onboarding-only) | onboarding/generate_all/exa_prompts.py |
Services in shared/ are used by multiple modules (onboarding, cron, domain). Services in onboarding/generate_all/ are only used during onboarding.

What Gets Created

After onboarding completes, the business has:
| Asset | Description | Created By |
|---|---|---|
| Org Metadata | Clerk org details in database | POST /api/business |
| Business Entity | Entity record in entities table | POST /api/business |
| Favicon | Stored favicon URL | Favicon Service |
| AI Site | AI-optimized website at *.searchcompany.dev | AI Website Service |
| Markdown Replica Pages | 1:1 markdown copies of source website pages | AI Website Service |
| 50 Business Prompts | 10 pre-tested (Exa) + 40 via Gemini | Exa Prompts + Business Prompts Service |
| Product Entities | Auto-discovered products with source URLs | Products Service |
| 10 Prompts per Product | Product visibility prompts | Product Prompts Service |
| Product LLMs Files | /llms/{product-slug}.txt for each product | Product LLMs Service |
| Visibility Score | Initial pre-payment score | Scoring Service |
| CloudFront Distribution | Pre-created proxy for DNS propagation | CloudFront Service |
| Competitors | Up to 10 auto-discovered competitors | Competitors Service |
The org metadata and business entity are created by POST /api/business before generate-all is called. All other assets are created by the backend orchestrator running in the background.

Prompt Generation Strategy

| Source | Count | Pre-tested | Description |
|---|---|---|---|
| GROUP 1c (Exa) | 10 | ✅ Yes | Generated via Exa Answer API, visibility checked across 8 platforms |
| GROUP 2a (Gemini) | 40 | ❌ No | Generated via Gemini 3 Flash from scraped content |
| GROUP 3a (Products) | 10/product | ❌ No | Generated via Gemini for each discovered product |
The 10 pre-tested prompts from GROUP 1c appear first in the UI with real visibility data, so users see immediate results after onboarding.
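
To make the counts and the display order concrete, here is a small sketch. The `Prompt` dataclass, its field names, and both helper functions are hypothetical; only the numbers (10 Exa, 40 Gemini, 10 per product) and the rule that pre-tested prompts come first are taken from the table above.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    text: str
    source: str       # hypothetical labels: "exa", "gemini", or "product"
    pre_tested: bool  # True only for the GROUP 1c (Exa) prompts


def expected_prompt_count(num_products: int) -> int:
    """10 Exa prompts + 40 Gemini prompts + 10 prompts per discovered product."""
    return 10 + 40 + 10 * num_products


def order_for_display(prompts: list[Prompt]) -> list[Prompt]:
    """Put pre-tested prompts first; sorted() is stable, so order within each group is kept."""
    return sorted(prompts, key=lambda p: not p.pre_tested)
```

For example, a business with three discovered products would end up with `expected_prompt_count(3)`, that is 80 prompts, of which only the 10 from Exa carry visibility data right after onboarding.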

Regenerating Prompts

If prompts need to be regenerated (e.g., after website content changes), use the manual trigger endpoint:
curl -X POST https://searchcompany-main.up.railway.app/api/manual-trigger/regenerate-prompts \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "business_id": "my-business-abc123", "prompt_count": 50}'
See Regenerate Prompts for details.
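
For callers scripting this from Python, the same request can be sketched with the standard library. The URL, headers, and body fields mirror the curl command above; the helper name `build_regenerate_request` and the default of 50 for `prompt_count` are assumptions made for illustration.

```python
import json
import urllib.request

REGENERATE_URL = (
    "https://searchcompany-main.up.railway.app"
    "/api/manual-trigger/regenerate-prompts"
)


def build_regenerate_request(
    api_key: str, url: str, business_id: str, prompt_count: int = 50
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "url": url,
        "business_id": business_id,
        "prompt_count": prompt_count,  # default of 50 is an assumption
    }
    return urllib.request.Request(
        REGENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the request. The response format is not described here, so the sketch stops at building the request.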

Testing

Run all onboarding tests:
cd Backend
uv run pytest src/app/onboarding/pytest_generate_all.py -v -s
Run a specific test:
uv run pytest src/app/onboarding/pytest_generate_all.py::TestOnboarding::test_03_create_ai_website -v -s