Onboarding Overview

The Onboarding API handles everything needed when a new business signs up. The backend orchestrates all tasks through a single endpoint; the frontend triggers it and can then navigate away.

Flow

When a user completes payment and onboarding:

FRONTEND:
1. POST /api/business                     → Create org metadata + entity
2. POST /api/onboarding/generate-all      → Trigger backend orchestrator (returns immediately)

BACKEND (runs in background):
├── GROUP 1 - All Parallel:
│   ├── 1a: Scrape website (pages used by GROUP 2)
│   ├── 1b: Discover Competitors (Firecrawl agent)
│   ├── 1c: Discover Products (Shopify products.json API)
│   ├── 1d: Fetch Favicon
│   ├── 1e: Materialize Score
│   └── 1f: Setup CloudFront
│
├── GROUP 2 - After Scrape (1a):
│   └── 2a: Create AI Website (uses scraped pages)
│
└── GROUP 3 - After Products (1c) - Parallel:
    ├── 3a: Product Prompts (5+ prompts per product, min 50 total)
    └── 3b: Generate Product LLMs (/llms/{product-slug}.txt files)

Optimized flow: all GROUP 1 tasks run in parallel. GROUP 2 starts as soon as the scrape (1a) completes. GROUP 3 starts as soon as products are discovered (1c), so it can begin before GROUP 2. Tasks 3a and 3b run in parallel with each other.
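The dependency groups above can be sketched with `asyncio`. This is a minimal illustration, not the actual orchestrator: `run_task`, `generate_all`, the task names, and the artificial delays are all hypothetical stand-ins for the real services.

```python
import asyncio


async def run_task(name: str, log: list[str], delay: float = 0.0) -> str:
    """Hypothetical placeholder for a real onboarding task; records completion."""
    await asyncio.sleep(delay)
    log.append(name)
    return name


async def generate_all(log: list[str]) -> None:
    # GROUP 1: every task is started immediately and runs concurrently.
    scrape = asyncio.create_task(run_task("1a_scrape", log, 0.02))
    products = asyncio.create_task(run_task("1c_products", log, 0.01))
    others = [
        asyncio.create_task(run_task(name, log))
        for name in ("1b_competitors", "1d_favicon", "1e_score", "1f_cloudfront")
    ]

    async def group2() -> None:
        await scrape  # GROUP 2 waits only for the scrape (1a)
        await run_task("2a_ai_website", log)

    async def group3() -> None:
        await products  # GROUP 3 waits only for product discovery (1c)
        await asyncio.gather(
            run_task("3a_product_prompts", log),
            run_task("3b_product_llms", log),
        )

    # GROUP 2 and GROUP 3 are independent of each other.
    await asyncio.gather(group2(), group3(), *others)


if __name__ == "__main__":
    completed: list[str] = []
    asyncio.run(generate_all(completed))
    print(completed)
```

Because each later group awaits only the one GROUP 1 task it depends on, a slow scrape does not delay product prompts or product LLMs files.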

Endpoints

The frontend calls these endpoints:

Endpoint	Purpose	Auth Required
Product Names	Fetch product names for scanning UI	No
Create Business	Create org metadata + entity	Yes (JWT)
Generate All	Backend orchestrator - runs all onboarding tasks	Yes (JWT)

All other onboarding tasks are internal services called by generate-all. They are not exposed as HTTP endpoints.
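As a rough sketch of the two authenticated calls, the requests could be built as follows. The paths come from the flow above; everything else is an assumption: the JWT is presumed to be sent as a Bearer token, the request bodies are not documented here, and `build_onboarding_requests` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_onboarding_requests(
    base_url: str, jwt: str, business: dict, generate_payload: dict
) -> list[Request]:
    """Build (but do not send) the two onboarding requests, in call order."""
    headers = {
        "Authorization": f"Bearer {jwt}",  # assumption: Bearer scheme
        "Content-Type": "application/json",
    }
    create_business = Request(
        f"{base_url}/api/business",
        data=json.dumps(business).encode("utf-8"),  # hypothetical body
        headers=headers,
        method="POST",
    )
    generate_all = Request(
        f"{base_url}/api/onboarding/generate-all",
        data=json.dumps(generate_payload).encode("utf-8"),  # hypothetical body
        headers=headers,
        method="POST",
    )
    return [create_business, generate_all]
```

Each request can then be sent with `urllib.request.urlopen(req)`. The business must be created first, since generate-all operates on the entity that `POST /api/business` creates.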

Internal Services

The generate-all orchestrator calls these shared services directly (not via HTTP):

Service	Purpose	Location
Scraping	Custom mapper + Firecrawl batch scrape	`shared/scraping/`, `shared/mapping/`
AI Website	Deploy AI-optimized site to Vercel	`shared/ai_website/`
Prompts	Generate visibility prompts with Gemini	`shared/prompts/`
Products	Discover products via Shopify API	`shared/products/`
CloudFront	Create CloudFront distribution	`shared/cloudfront/`
Content Hasher	Store page hashes for change detection	`shared/content_hasher/`
Favicon	Fetch & store favicon (onboarding-only)	`onboarding/generate_all/tasks/favicon.py`
Scoring	Copy pre-payment ranking score (onboarding-only)	`onboarding/generate_all/tasks/scoring.py`
Competitors	Discover up to 10 competitors using Firecrawl agent	`onboarding/services/discover_competitors/`

Services in shared/ are used by multiple modules (onboarding, cron, domain). Services in onboarding/generate_all/ are only used during onboarding.

What Gets Created

After onboarding completes, the business has:

Asset	Description	Created By
Org Metadata	Clerk org details in database	`POST /api/business`
Business Entity	Entity record in `entities` table	`POST /api/business`
Favicon	Stored favicon URL	Favicon Service
AI Site	AI-optimized website at `*.searchcompany.dev`	AI Website Service
Markdown Replica Pages	1:1 markdown copies of source website pages	AI Website Service
Product Entities	Auto-discovered products from Shopify	Products Service
50+ Product Prompts	5+ prompts per product (min 50 total)	Product Prompts Service
Product LLMs Files	`/llms/{product-slug}.txt` for each product	Product LLMs Service
Visibility Score	Initial pre-payment score	Scoring Service
CloudFront Distribution	Pre-created proxy for DNS propagation	CloudFront Service
Competitors	Up to 10 auto-discovered competitors	Competitors Service

The business entity is created by POST /api/business before generate-all is called. All other assets are created by the backend orchestrator running in the background.

Prompt Generation Strategy

All prompts are tied to products:

Metric	Value
Prompts per product	5 (default)
Minimum total prompts	50 during onboarding
New products (via cron)	5 prompts each
Daily sampling	10 prompts for visibility scoring

If a store has fewer than 10 products, the number of prompts per product is increased so that the total is at least 50.
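One way to express this rule is with ceiling division. The sketch below assumes the increase is spread evenly across products, which this page does not spell out; the function and constant names are hypothetical.

```python
import math

DEFAULT_PROMPTS_PER_PRODUCT = 5
MIN_TOTAL_PROMPTS = 50


def prompts_per_product(product_count: int) -> int:
    """Prompts to generate per product so the total reaches the minimum."""
    if product_count <= 0:
        return 0  # nothing to generate prompts for
    # Smallest per-product count whose total is at least MIN_TOTAL_PROMPTS.
    needed = math.ceil(MIN_TOTAL_PROMPTS / product_count)
    return max(DEFAULT_PROMPTS_PER_PRODUCT, needed)
```

For example, a store with 4 products would get 13 prompts each (52 total), while a store with 10 or more products stays at the default of 5.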

Regenerating Prompts

If prompts need to be regenerated for a product:

curl -X POST https://searchcompany-main.up.railway.app/api/cron/generate-prompts \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "business_id": "my-business-abc123",
    "url": "https://mystore.com/products/my-product",
    "product_id": "product-entity-uuid",
    "product_name": "My Product"
  }'
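The same request can be built from Python with the standard library. This is a sketch that mirrors the curl command above; `build_regenerate_request` is a hypothetical helper, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request

# URL taken from the curl example above.
GENERATE_PROMPTS_URL = (
    "https://searchcompany-main.up.railway.app/api/cron/generate-prompts"
)


def build_regenerate_request(
    api_key: str, business_id: str, url: str, product_id: str, product_name: str
) -> Request:
    """Build the POST request that regenerates prompts for one product."""
    body = {
        "business_id": business_id,
        "url": url,
        "product_id": product_id,
        "product_name": product_name,
    }
    return Request(
        GENERATE_PROMPTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it. The response format is not documented on this page, so it is not parsed here.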

Testing

Run all onboarding tests:

cd Backend
uv run pytest src/pytests/onboarding/test_generate_all.py -v -s
