Overview
This is the main onboarding endpoint. It starts every onboarding task in the background and returns immediately with status "started", so the frontend can navigate away as soon as the call returns.
Request Body
url - The business website URL (e.g., https://example.com)
org_slug - The Clerk organization slug (e.g., my-business-abc123)
business_name - The business name for display purposes
Total number of visibility prompts to generate for the business (10 via Exa + 40 via Gemini)
Response
status - Always "started" on success
message - Human-readable status message
Example
curl -X POST https://searchcompany-main.up.railway.app/api/onboarding/generate-all \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "org_slug": "my-business-abc123",
    "business_name": "My Business"
  }'
{
  "status": "started",
  "message": "Onboarding tasks started for my-business-abc123. Check logs for progress."
}
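For callers working in Python, the same request can be sketched with the standard library. The helper name build_generate_all_request is hypothetical; the endpoint URL, headers, and body fields are copied from the curl example above. The helper only builds the request, so nothing is sent until the caller decides to.

```python
import json
import urllib.request

ENDPOINT = "https://searchcompany-main.up.railway.app/api/onboarding/generate-all"


def build_generate_all_request(
    url: str, org_slug: str, business_name: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper: the name and signature are illustrative only.
    """
    body = json.dumps(
        {"url": url, "org_slug": org_slug, "business_name": business_name}
    ).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends it. Because the endpoint responds before the tasks finish, a "started" status only confirms that onboarding was queued.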
Internal Services
The orchestrator calls these services directly (not via HTTP). All services are in src/app/services/:
GROUP 1a: Scrape (await)
Scrapes the website using Custom Website Mapper + Firecrawl Batch Scrape API. Returns pages for all subsequent tasks. GROUP 2 waits for this.
How It Works:
- Custom Website Mapper (free) - Multi-strategy URL discovery: robots.txt, sitemap.xml, HTML link extraction, and recursive crawling. Returns up to 5000 URLs.
- Firecrawl Batch Scrape (POST /v2/batch/scrape) - Scrapes all discovered URLs in parallel. Returns markdown content for each page.
This two-step approach is efficient: our custom mapper finds URLs (free), then Firecrawl fetches content in parallel (paid).
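To make the discovery step concrete, here is a minimal sketch of two of the four strategies (sitemap.xml parsing and HTML link extraction) plus deduplication under the 5000-URL cap. It is not the actual mapper: every function name here is hypothetical, and robots.txt parsing and recursive crawling are left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

MAX_URLS = 5000  # cap stated in the text above


class _LinkParser(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def links_from_html(base_url, html):
    """Return absolute same-host http(s) links, with fragments removed."""
    parser = _LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        absolute = urldefrag(urljoin(base_url, href)).url
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.append(absolute)
    return links


def links_from_sitemap(xml_text):
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def merge_urls(*sources, limit=MAX_URLS):
    """Deduplicate while preserving first-seen order, then apply the cap."""
    seen = {}
    for source in sources:
        for url in source:
            seen.setdefault(url, None)
    return list(seen)[:limit]
```

The merged list is what a batch scrape call would receive, so duplicates are removed before any paid request is made.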
GROUP 1b: Discover Competitors (parallel with 1a, non-blocking)
Runs in parallel with scraping but doesn't block GROUP 2. Uses Firecrawl Agent API to find competitors.
| Service | File | Purpose |
|---|---|---|
| Competitors Service | discover_competitors/ | Uses Firecrawl Agent API to find up to 10 competitors |
GROUP 1c: Exa Prompts + Visibility Check (parallel with 1a, non-blocking)
Runs in parallel with scraping but doesn't block GROUP 2. Generates 10 prompts using Exa Answer API and immediately runs visibility checks on them.
| Service | File | Purpose |
|---|---|---|
| Exa Prompts Service | exa_prompts.py | Uses Exa Answer API to generate 10 prompts, then runs visibility checks across 8 AI platforms |
Pre-tested Prompts: These 10 prompts are pre-tested during onboarding so users see real visibility data immediately when they first load the dashboard. The prompts will have last_tested_at set and visibility results populated.
GROUP 1d: Fetch Favicon (parallel with 1a, non-blocking)
Runs in parallel with scraping but doesn't block GROUP 2. Doesn't need scraped pages.
| Service | File | Purpose |
|---|---|---|
| Favicon Service | favicon.py | Downloads favicon, converts to PNG, uploads to Supabase Storage |
GROUP 1e: Materialize Score (parallel with 1a, non-blocking)
Runs in parallel with scraping but doesn't block GROUP 2. Doesn't need scraped pages.
| Service | File | Purpose |
|---|---|---|
| Scoring Service | scoring.py | Copies pre-payment ranking score to visibility_score_history table |
GROUP 1f: Setup CloudFront (parallel with 1a, non-blocking)
Runs in parallel with scraping but doesn't block GROUP 2. Doesn't need scraped pages.
| Service | File | Purpose |
|---|---|---|
| CloudFront Service | cloudfront.py | Creates CloudFront distribution for domain proxy |
GROUP 2a: Main Tasks (parallel, starts when 1a completes)
Runs simultaneously using asyncio.gather. Only includes tasks that need scraped pages:
| Service | File | Purpose |
|---|---|---|
| AI Website Service | ai_website.py | Generates llms.txt, robots.txt, sitemap, schema, and markdown replica pages; deploys to Vercel |
| Business Prompts Service | prompts.py | Generates 40 visibility prompts using Gemini 3 Flash (the other 10 come from GROUP 1c) |
GROUP 2b: Discover Products (parallel with 2a)
| Service | File | Purpose |
|---|---|---|
| Products Service | discover_products.py | Uses shared discover_products service to identify products |
GROUP 3a: Product Prompts (starts when 2b completes, parallel with 3b)
| Service | File | Purpose |
|---|---|---|
| Product Prompts Service | prompts.py | Generates 10 prompts per discovered product |
GROUP 3b: Generate Product LLMs (starts when 2b completes, parallel with 3a)
| Service | File | Purpose |
|---|---|---|
| Product LLMs Service | product_llms.py | Generates /llms/{product-slug}.txt files for each product |
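The /llms/{product-slug}.txt pattern implies a name-to-slug step. The sketch below shows one plausible rule (lowercase, with runs of non-alphanumeric characters collapsed to a hyphen). The function name and the exact slug rule are assumptions; the real service may slugify differently.

```python
import re


def product_llms_path(product_name: str) -> str:
    """Map a product name to its llms file path (hypothetical slug rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", product_name.lower()).strip("-")
    return f"/llms/{slug}.txt"
```

Under this rule, "Product A" maps to /llms/product-a.txt.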
Optimized Flow:
- GROUP 1b-1f run in parallel with GROUP 1a (scrape) but don't block GROUP 2.
- GROUP 2 starts as soon as GROUP 1a completes.
- GROUP 3a and 3b start as soon as products are discovered (GROUP 2b), and run in parallel with each other.
- GROUP 1c generates 10 pre-tested prompts; GROUP 2a generates the remaining 40 business prompts.
- Favicon, Scoring, and CloudFront run in GROUP 1 because they don't need scraped pages.
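The scheduling rules above can be sketched with asyncio. This is an illustration, not the orchestrator's code: run_onboarding and its three parameters are hypothetical, GROUP 3 is omitted for brevity, and asyncio.create_task for the background group is an assumption (the text only confirms asyncio.gather for GROUP 2).

```python
import asyncio


async def run_onboarding(scrape, background_tasks, page_tasks):
    """Hypothetical sketch of the GROUP 1 / GROUP 2 ordering.

    scrape:            coroutine function returning the scraped pages (GROUP 1a)
    background_tasks:  name -> coroutine function taking no arguments (GROUP 1b-1f)
    page_tasks:        name -> coroutine function taking the pages (GROUP 2)
    """
    # Start GROUP 1b-1f right away; they are not awaited yet.
    background = {
        name: asyncio.create_task(fn()) for name, fn in background_tasks.items()
    }

    # GROUP 1a is the only step GROUP 2 has to wait for.
    pages = await scrape()

    # GROUP 2 runs concurrently; a failing task is returned, not raised.
    names = list(page_tasks)
    results = await asyncio.gather(
        *(page_tasks[name](pages) for name in names), return_exceptions=True
    )
    outcome = dict(zip(names, results))

    # Collect the background tasks only after GROUP 2 has finished.
    bg_results = await asyncio.gather(*background.values(), return_exceptions=True)
    outcome.update(zip(background, bg_results))
    return outcome
```

Using return_exceptions=True means one failed task does not cancel its siblings, which matches the idea of reporting a per-task success count at the end.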
Prompt Generation Strategy
The system generates 50 business prompts total, split between two services:
| Source | Count | Method | Pre-tested |
|---|---|---|---|
| GROUP 1c (Exa) | 10 | Exa Answer API generates prompts, which are then visibility-checked across 8 platforms | Yes |
| GROUP 2a (Gemini) | 40 | Gemini 3 Flash analyzes website content | No (tested by daily cron) |
Additionally, 10 prompts per product are generated in GROUP 3a.
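As a quick check of the arithmetic, the expected total can be written out as below. The constant and function names are illustrative, and the figure is an upper bound that assumes every service returns its full quota (the Exa step may discard prompts that mention the business name).

```python
EXA_PROMPTS = 10          # GROUP 1c, pre-tested
GEMINI_PROMPTS = 40       # GROUP 2a
PROMPTS_PER_PRODUCT = 10  # GROUP 3a


def expected_prompt_count(product_count: int) -> int:
    """Upper bound on prompts created by one onboarding run."""
    return EXA_PROMPTS + GEMINI_PROMPTS + PROMPTS_PER_PRODUCT * product_count
```

For a business with 3 discovered products this gives 10 + 40 + 30 = 80 prompts.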
Service Details
Exa Prompts Service (onboarding/generate_all/exa_prompts.py)
async def run_group1c_exa_prompts(
    org_slug: str,
    business_name: str,
    url: str
) -> StepResult
- Calls Exa Answer API to generate 10 search queries people would use to find this type of business
- Filters out any prompts that contain the business name (we test if AI recommends them organically)
- Runs visibility check on each prompt across 8 AI platforms (ChatGPT, Claude, Gemini, Perplexity, Copilot, DeepSeek, Grok, Google AI)
- Saves prompts to entity_prompts_tracker with visibility results and last_tested_at set
- Returns count of generated/saved prompts and average visibility
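The filtering and averaging steps are simple enough to sketch. Both function names and the shape of the results dictionary are assumptions, and "average visibility" is read here as the mean number of platforms per prompt on which the business appeared.

```python
def filter_branded_prompts(prompts, business_name):
    """Drop prompts that mention the business name (case-insensitive)."""
    needle = business_name.casefold()
    return [p for p in prompts if needle not in p.casefold()]


def average_visibility(results):
    """Mean count of platforms with a hit per prompt.

    results: prompt -> {platform name -> True if the business was mentioned}
    (hypothetical structure, for illustration only)
    """
    if not results:
        return 0.0
    hits = sum(
        sum(1 for seen in platforms.values() if seen)
        for platforms in results.values()
    )
    return hits / len(results)
```

Removing branded prompts first keeps the score honest: a prompt that names the business would almost always surface it, which says nothing about organic recommendations.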
Favicon Service (services/favicon.py)
async def fetch_favicon(url: str, org_slug: str) -> Optional[str]
- Tries common favicon locations (/favicon.ico, apple-touch-icon, etc.)
- Falls back to Google's favicon service
- Converts to PNG (128x128) for browser compatibility
- Uploads to Supabase Storage via S3 protocol
- Returns public URL
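The lookup order can be pictured as an ordered list of candidate URLs. This sketch only builds that list; it does not download, convert, or upload anything. The function name, the exact candidate paths, and the Google URL format (the commonly used s2/favicons endpoint) are assumptions rather than details confirmed by the service.

```python
from urllib.parse import quote, urlparse


def favicon_candidates(site_url: str) -> list:
    """Return favicon URLs to try in order, ending with a Google fallback."""
    parsed = urlparse(site_url)
    origin = f"{parsed.scheme}://{parsed.netloc}"
    # Common on-site locations first (assumed paths).
    candidates = [
        f"{origin}/favicon.ico",
        f"{origin}/apple-touch-icon.png",
    ]
    # Fallback: Google's favicon endpoint (assumed URL format).
    domain = quote(parsed.hostname or "")
    candidates.append(f"https://www.google.com/s2/favicons?domain={domain}&sz=128")
    return candidates
```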
AI Website Service (services/ai_website.py)
async def create_ai_website(
    url: str,
    business_id: str,
    pages: Optional[List[dict]] = None,
    ...
) -> dict
- Checks if site already exists
- Uses provided pages OR scrapes website (via Custom Mapper + Firecrawl Batch Scrape)
- Hashes pages for future change detection
- Calls Gemini to organize content
- Generates AI-optimized files (llms.txt, robots.txt, sitemap.xml, schema.json)
- Generates markdown replica pages for each scraped page (at /{path})
- Deploys to Vercel
- Assigns a *.searchcompany.dev subdomain
- Stores page hashes for cron updates
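Hash-based change detection can be sketched as follows. SHA-256 and the page keys "url" and "markdown" are assumptions made for illustration; the service's actual hash function and page schema are not stated here.

```python
import hashlib


def hash_pages(pages):
    """Map each page URL to a SHA-256 digest of its markdown content."""
    return {
        page["url"]: hashlib.sha256(page["markdown"].encode("utf-8")).hexdigest()
        for page in pages
    }


def changed_urls(stored, current):
    """URLs that are new or whose digest differs from the stored one."""
    return sorted(url for url, digest in current.items() if stored.get(url) != digest)
```

A later cron run would then only need to regenerate the pages returned by changed_urls instead of redeploying the whole site.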
Business Prompts Service (services/prompts.py)
async def generate_prompts(
    url: str,
    business_id: str,
    prompt_count: int = 40,  # Adjusted since 10 come from Exa
    pages: Optional[List[dict]] = None,
    ...
) -> dict
- Uses provided pages OR scrapes website
- Calls Gemini 3 Flash to generate prompts
- Saves to entity_prompts_tracker table
- Returns count of generated/saved prompts
Products Service (shared/products/discover.py)
async def discover_products(
    business_id: str,
    pages: list,
    business_name: str,
    parent_entity_id: str,
    source_url: str,
    generate_prompts: bool = True
) -> dict
- Analyzes scraped content with Gemini
- Identifies products, services, or SaaS offerings
- Extracts source_urls for each product (for update tracking)
- Saves products to entities table
- Optionally generates prompts (disabled during onboarding GROUP 2b)
- Returns products list for GROUP 3a and 3b
Scoring Service (services/scoring.py)
async def materialize_score(url: str, org_slug: str) -> dict
- Fetches pre-payment score from ranking database
- Creates history entries for visualization
- Inserts/updates visibility_score_history table
CloudFront Service (services/cloudfront.py)
async def setup_cloudfront(url: str, org_slug: str, entity_id: str) -> dict
- Detects domain type (apex vs subdomain)
- Looks up origin CNAME
- Creates CloudFront distribution with Lambda@Edge
- Upserts configuration to ai_sites table
- Returns distribution ID and CloudFront domain
Competitors Service (services/discover_competitors/)
async def discover_competitors(
    business_url: str,
    business_name: str,
    entity_id: str,
    max_competitors: int = 10
) -> dict
- Uses Firecrawl Agent API (POST /v2/agent) to search for competitors
- Extracts competitor names, URLs, and descriptions from search results
- Stores competitors in competitors table
- Returns list of discovered competitors with names and descriptions
Firecrawl Agent API is an AI-powered endpoint that can browse the web and extract structured data based on a natural language prompt.
API Summary
| Service | API Used | Purpose |
|---|---|---|
| Custom Website Mapper | Internal (free) | Discover all URLs on a website (up to 5000) |
| Batch Scrape | Firecrawl POST /v2/batch/scrape | Scrape multiple URLs in parallel, returns markdown |
| Competitor Discovery | Firecrawl POST /v2/agent | AI agent that browses web and extracts structured data |
URL discovery uses our custom website mapper (free) instead of Firecrawl's Map API. The mapper combines multiple strategies: robots.txt parsing, sitemap.xml parsing, HTML link extraction, and recursive crawling.
Prerequisites
Before calling this endpoint, you must:
- Create a Clerk organization
- Call POST /api/business to create the entity
The entity must exist before generate-all runs, as several tasks depend on it.
Monitoring Progress
Check backend logs to monitor progress:
GENERATE ALL: Starting onboarding for my-business-abc123
URL: https://example.com
Business: My Business
GROUP 1a: Scraping website...
GROUP 1b: Discovering competitors (parallel, non-blocking)...
GROUP 1c: Generating Exa prompts + visibility (parallel, non-blocking)...
GROUP 1d: Fetching favicon (parallel, non-blocking)...
GROUP 1e: Materializing score (parallel, non-blocking)...
GROUP 1f: Setting up CloudFront (parallel, non-blocking)...
[GROUP 1c] Generating prompts via Exa + visibility check...
Step 1: Generating prompts via Exa Answer API...
✅ Generated 10 prompts
1. What payment gateway supports instant bank transfers...
2. How can I set up subscription billing with built-in tax...
...
Step 2: Checking visibility across 8 AI platforms...
✅ Saved 10 prompts with visibility data
Average visibility: 4.2/8 platforms
✅ GROUP 1a Complete: 15 pages scraped
(GROUP 1b-1f still running in background)
[GROUP 2] Running tasks that need pages (15 pages)...
[GROUP 2b] Discovering products...
✅ discover_products: 3 found, 3 saved
[GROUP 3a] Starting product prompts for 3 products...
✅ Product A: 10 prompts
✅ Product B: 10 prompts
✅ Product C: 10 prompts
✅ product_prompts: 30 for 3 products
[GROUP 3b] Generating product llms.txt files...
✅ Product A: /llms/product-a.txt
✅ Product B: /llms/product-b.txt
✅ Product C: /llms/product-c.txt
✅ generate_product_llms_txt: 3 files deployed
✅ create_ai_website: https://my-business-abc123.searchcompany.dev
✅ business_prompts: 40 saved
✅ [GROUP 2 & 3] Complete
Waiting for GROUP 1b-1f background tasks to complete...
✅ discover_competitors: 10 found
✅ exa_prompts: 10 (pre-tested)
✅ favicon: https://example.com/favicon.ico
✅ materialize_score: 72
✅ setup_cloudfront: d1234567890abc.cloudfront.net
GENERATE ALL: Complete for my-business-abc123
Success: 12/12 tasks
File Structure
src/app/
├── shared/  # Shared services (used by onboarding + cron)
│   ├── scraping/  # Custom mapper + Firecrawl batch scrape
│   ├── mapping/  # Custom website mapper (URL discovery)
│   ├── ai_website/  # AI Website Service
│   ├── products/  # Product discovery + llms generation
│   │   ├── discover.py  # discover_products service
│   │   └── generate_llms_txt.py  # generate_product_llms_txt service
│   ├── prompts/  # Prompts Service
│   ├── cloudfront/  # CloudFront Service
│   ├── hashing/  # Raw HTML hashing service
│   └── content_hasher/  # Markdown hash storage
│
└── apis/onboarding/
    ├── generate_all/
    │   ├── routes.py  # Main endpoint & orchestrator (GROUP 1a-1f)
    │   ├── scrape_website.py  # GROUP 1a: Single scrape
    │   ├── exa_prompts.py  # GROUP 1c: Exa prompts + visibility
    │   ├── task_orchestrator.py  # GROUP 2a, 2b, 3a, 3b: Calls services
    │   ├── models.py  # Pydantic models
    │   └── tasks/  # Task wrappers
    │       ├── ai_website.py  # GROUP 2a
    │       ├── discover_products.py  # GROUP 2b
    │       ├── prompts.py  # GROUP 2a + 3a
    │       ├── product_llms.py  # GROUP 3b
    │       ├── favicon.py  # GROUP 1d
    │       ├── scoring.py  # GROUP 1e
    │       └── cloudfront.py  # GROUP 1f
    │
    └── services/
        └── discover_competitors/  # GROUP 1b: Competitor discovery