POST https://searchcompany-main.up.railway.app/api/onboarding/generate-all
curl -X POST https://searchcompany-main.up.railway.app/api/onboarding/generate-all \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "org_slug": "my-business-abc123",
    "business_name": "My Business"
  }'
{
  "status": "started",
  "message": "Onboarding tasks started for my-business-abc123. Check logs for progress."
}

Overview

This is the main onboarding endpoint. It triggers all onboarding tasks and runs them in the background. The frontend can navigate away immediately after calling this endpoint.
This endpoint returns immediately with status "started". All tasks run asynchronously in the backend.
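
The fire-and-forget contract can be sketched with plain asyncio (a minimal illustration, not the actual route handler; `run_onboarding` is a hypothetical stand-in for the task pipeline):

```python
import asyncio

async def run_onboarding(org_slug: str) -> None:
    """Hypothetical stand-in for the real background task pipeline."""
    await asyncio.sleep(0.01)

async def generate_all(org_slug: str) -> dict:
    # Schedule the pipeline on the running loop WITHOUT awaiting it,
    # then return immediately -- the caller can navigate away.
    asyncio.get_running_loop().create_task(run_onboarding(org_slug))
    return {
        "status": "started",
        "message": f"Onboarding tasks started for {org_slug}. Check logs for progress.",
    }
```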

Request Body

url
string
required
The business website URL (e.g., https://example.com)
org_slug
string
required
The Clerk organization slug (e.g., my-business-abc123)
business_name
string
required
The business name for display purposes
prompt_count
integer
default: 50
Total number of visibility prompts to generate for the business (10 via Exa + 40 via Gemini)

Response

status
string
Always "started" on success
message
string
Human-readable status message


Internal Services

The orchestrator calls these services directly (not via HTTP). See the File Structure section below for where each service lives under src/app/:

GROUP 1a: Scrape (await)

Scrapes the website using Custom Website Mapper + Firecrawl Batch Scrape API. Returns pages for all subsequent tasks. GROUP 2 waits for this.
How It Works:
  • Custom Website Mapper (free) - Multi-strategy URL discovery: robots.txt, sitemap.xml, HTML link extraction, recursive crawling. Returns up to 5000 URLs.
  • Firecrawl Batch Scrape (POST /v2/batch/scrape) - Scrapes all discovered URLs in parallel. Returns markdown content for each page.
This two-step approach is efficient: our custom mapper finds URLs (free), then Firecrawl fetches content in parallel (paid).
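A minimal sketch of two of the mapper's strategies (sitemap parsing and HTML link extraction), assuming the sitemap and page text have already been fetched; the real mapper also parses robots.txt and crawls recursively:

```python
import re
from html.parser import HTMLParser

def sitemap_urls(xml_text: str) -> list[str]:
    # Pull <loc> entries out of a sitemap.xml document.
    return re.findall(r"<loc>\s*(.*?)\s*</loc>", xml_text)

class LinkExtractor(HTMLParser):
    # Collect href targets from <a> tags in an HTML page.
    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def discover_urls(sitemap_xml: str, html: str, limit: int = 5000) -> list[str]:
    # Merge both strategies, de-duplicate preserving order, cap at `limit`.
    extractor = LinkExtractor()
    extractor.feed(html)
    merged = dict.fromkeys(sitemap_urls(sitemap_xml) + extractor.links)
    return list(merged)[:limit]
```

The merged, capped URL list is what gets handed to Firecrawl's batch scrape.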

GROUP 1b: Discover Competitors (parallel with 1a, non-blocking)

Runs in parallel with scraping but doesn't block GROUP 2. Uses Firecrawl Agent API to find competitors.

Service | File | Purpose
Competitors Service | discover_competitors/ | Uses Firecrawl Agent API to find up to 10 competitors

GROUP 1c: Exa Prompts + Visibility Check (parallel with 1a, non-blocking)

Runs in parallel with scraping but doesn't block GROUP 2. Generates 10 prompts using Exa Answer API and immediately runs visibility checks on them.

Service | File | Purpose
Exa Prompts Service | exa_prompts.py | Uses Exa Answer API to generate 10 prompts, then runs visibility checks across 8 AI platforms
Pre-tested Prompts: These 10 prompts are pre-tested during onboarding so users see real visibility data immediately when they first load the dashboard. The prompts will have last_tested_at set and visibility results populated.

GROUP 1d: Fetch Favicon (parallel with 1a, non-blocking)

Runs in parallel with scraping but doesn't block GROUP 2. Doesn't need scraped pages.

Service | File | Purpose
Favicon Service | favicon.py | Downloads favicon, converts to PNG, uploads to Supabase Storage

GROUP 1e: Materialize Score (parallel with 1a, non-blocking)

Runs in parallel with scraping but doesn't block GROUP 2. Doesn't need scraped pages.

Service | File | Purpose
Scoring Service | scoring.py | Copies pre-payment ranking score to visibility_score_history table

GROUP 1f: Setup CloudFront (parallel with 1a, non-blocking)

Runs in parallel with scraping but doesn't block GROUP 2. Doesn't need scraped pages.

Service | File | Purpose
CloudFront Service | cloudfront.py | Creates CloudFront distribution for domain proxy

GROUP 2a: Main Tasks (parallel, starts when 1a completes)

These tasks run concurrently via asyncio.gather. The group includes only tasks that need scraped pages:

Service | File | Purpose
AI Website Service | ai_website.py | Generates llms.txt, robots.txt, sitemap, schema, and markdown replica pages; deploys to Vercel
Business Prompts Service | prompts.py | Generates 40 visibility prompts using Gemini 3 Flash (10 come from GROUP 1c)

GROUP 2b: Discover Products (parallel with 2a)

Service | File | Purpose
Products Service | discover_products.py | Uses shared discover_products service to identify products

GROUP 3a: Product Prompts (starts when 2b completes, parallel with 3b)

Service | File | Purpose
Product Prompts Service | prompts.py | Generates 10 prompts per discovered product

GROUP 3b: Generate Product LLMs (starts when 2b completes, parallel with 3a)

Service | File | Purpose
Product LLMs Service | product_llms.py | Generates /llms/{product-slug}.txt files for each product
Optimized Flow:
  • GROUP 1b-1f run in parallel with GROUP 1a (scrape) but don't block GROUP 2.
  • GROUP 2 starts as soon as GROUP 1a completes.
  • GROUP 3a and 3b start as soon as products are discovered (GROUP 2b), and run in parallel with each other.
  • GROUP 1c generates 10 pre-tested prompts, GROUP 2a generates the remaining 40 business prompts.
  • Favicon, Scoring, and CloudFront moved to GROUP 1 since they don't need scraped pages.
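The flow above can be sketched with asyncio primitives (illustrative only; `step` stands in for the real task functions, and the short sleeps simulate work):

```python
import asyncio

async def step(name: str, log: list[str]) -> str:
    # Simulated task; records its name in `log` when it finishes.
    await asyncio.sleep(0.01)
    log.append(name)
    return name

async def orchestrate() -> list[str]:
    log: list[str] = []
    # GROUP 1b-1f start immediately, but nothing waits on them yet.
    background = [asyncio.create_task(step(n, log))
                  for n in ("1b", "1c", "1d", "1e", "1f")]
    await step("1a", log)                      # GROUP 2 is gated only on 1a
    group_2a = asyncio.create_task(step("2a", log))
    await step("2b", log)                      # 3a/3b need discovered products
    await asyncio.gather(step("3a", log), step("3b", log))
    await group_2a
    await asyncio.gather(*background)          # join 1b-1f before the final report
    return log
```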

Prompt Generation Strategy

The system generates 50 business prompts total, split between two services:
Source | Count | Method | Pre-tested
GROUP 1c (Exa) | 10 | Exa Answer API generates prompts, then visibility checked across 8 platforms | ✅ Yes
GROUP 2a (Gemini) | 40 | Gemini 3 Flash analyzes website content | ❌ No (tested by daily cron)
Additionally, 10 prompts per product are generated in GROUP 3a.
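The arithmetic can be captured in a small helper (illustrative only; not a function in the codebase):

```python
def total_prompts(product_count: int) -> dict:
    # 10 pre-tested Exa prompts + 40 Gemini prompts per business,
    # plus 10 prompts per discovered product (GROUP 3a).
    exa, gemini, per_product = 10, 40, 10
    return {
        "business": exa + gemini,
        "products": per_product * product_count,
        "total": exa + gemini + per_product * product_count,
    }
```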

Service Details

Exa Prompts Service (onboarding/generate_all/exa_prompts.py)

async def run_group1c_exa_prompts(
    org_slug: str,
    business_name: str,
    url: str
) -> StepResult
  1. Calls Exa Answer API to generate 10 search queries people would use to find this type of business
  2. Filters out any prompts that contain the business name (we test if AI recommends them organically)
  3. Runs visibility check on each prompt across 8 AI platforms (ChatGPT, Claude, Gemini, Perplexity, Copilot, DeepSeek, Grok, Google AI)
  4. Saves prompts to entity_prompts_tracker with visibility results and last_tested_at set
  5. Returns count of generated/saved prompts and average visibility
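Step 2's brand-name filter might look like this (a sketch under the stated rule, not the service's actual code):

```python
def drop_branded_prompts(prompts: list[str], business_name: str) -> list[str]:
    # Keep only prompts that do NOT mention the brand, so the
    # visibility check measures organic recommendations.
    needle = business_name.lower()
    return [p for p in prompts if needle not in p.lower()]
```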

Favicon Service (services/favicon.py)

async def fetch_favicon(url: str, org_slug: str) -> Optional[str]
  1. Tries common favicon locations (/favicon.ico, apple-touch-icon, etc.)
  2. Falls back to Google's favicon service
  3. Converts to PNG (128x128) for browser compatibility
  4. Uploads to Supabase Storage via S3 protocol
  5. Returns public URL
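Steps 1-2 amount to an ordered fallback chain; a sketch (the exact candidate list the service tries may differ):

```python
from urllib.parse import urlparse

def favicon_candidates(url: str) -> list[str]:
    # Ordered fallback chain: common on-site locations first,
    # Google's public favicon service as the last resort.
    parts = urlparse(url)
    root = f"{parts.scheme}://{parts.netloc}"
    return [
        f"{root}/favicon.ico",
        f"{root}/apple-touch-icon.png",
        f"https://www.google.com/s2/favicons?domain={parts.netloc}&sz=128",
    ]
```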

AI Website Service (services/ai_website.py)

async def create_ai_website(
    url: str,
    business_id: str,
    pages: Optional[List[dict]] = None,
    ...
) -> dict
  1. Checks if site already exists
  2. Uses provided pages OR scrapes website (via Custom Mapper + Firecrawl Batch Scrape)
  3. Hashes pages for future change detection
  4. Calls Gemini to organize content
  5. Generates AI-optimized files (llms.txt, robots.txt, sitemap.xml, schema.json)
  6. Generates markdown replica pages for each scraped page (at /{path})
  7. Deploys to Vercel
  8. Assigns *.searchcompany.dev subdomain
  9. Stores page hashes for cron updates
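Steps 3 and 9 (hashing pages so the cron can detect changes later) can be sketched as follows; the page field names (`url`, `markdown`) are assumptions:

```python
import hashlib

def hash_pages(pages: list[dict]) -> dict[str, str]:
    # Map each page URL to a SHA-256 digest of its markdown content.
    return {p["url"]: hashlib.sha256(p["markdown"].encode()).hexdigest()
            for p in pages}

def changed_urls(stored: dict[str, str], current: dict[str, str]) -> list[str]:
    # URLs that are new, or whose content hash differs from the stored run.
    return [url for url, digest in current.items() if stored.get(url) != digest]
```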

Business Prompts Service (services/prompts.py)

async def generate_prompts(
    url: str,
    business_id: str,
    prompt_count: int = 40,  # Adjusted since 10 come from Exa
    pages: Optional[List[dict]] = None,
    ...
) -> dict
  1. Uses provided pages OR scrapes website
  2. Calls Gemini 3 Flash to generate prompts
  3. Saves to entity_prompts_tracker table
  4. Returns count of generated/saved prompts

Products Service (shared/products/discover.py)

async def discover_products(
    business_id: str,
    pages: list,
    business_name: str,
    parent_entity_id: str,
    source_url: str,
    generate_prompts: bool = True
) -> dict
  1. Analyzes scraped content with Gemini
  2. Identifies products, services, or SaaS offerings
  3. Extracts source_urls for each product (for update tracking)
  4. Saves products to entities table
  5. Optionally generates prompts (disabled during onboarding GROUP 2b)
  6. Returns products list for GROUP 3a and 3b

Scoring Service (services/scoring.py)

async def materialize_score(url: str, org_slug: str) -> dict
  1. Fetches pre-payment score from ranking database
  2. Creates history entries for visualization
  3. Inserts/updates visibility_score_history table

CloudFront Service (services/cloudfront.py)

async def setup_cloudfront(url: str, org_slug: str, entity_id: str) -> dict
  1. Detects domain type (apex vs subdomain)
  2. Looks up origin CNAME
  3. Creates CloudFront distribution with Lambda@Edge
  4. Upserts configuration to ai_sites table
  5. Returns distribution ID and CloudFront domain
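Step 1's apex-vs-subdomain check can be approximated with a naive heuristic (a sketch only; the real service may use a more robust rule):

```python
def is_apex_domain(host: str) -> bool:
    # Naive heuristic: "example.com" is apex, "shop.example.com" is a
    # subdomain. Production code should consult the Public Suffix List,
    # since e.g. "example.co.uk" would be misclassified here.
    return host.count(".") == 1
```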

Competitors Service (services/discover_competitors/)

async def discover_competitors(
    business_url: str,
    business_name: str,
    entity_id: str,
    max_competitors: int = 10
) -> dict
  1. Uses Firecrawl Agent API (POST /v2/agent) to search for competitors
  2. Extracts competitor names, URLs, and descriptions from search results
  3. Stores competitors in competitors table
  4. Returns list of discovered competitors with names and descriptions
Firecrawl Agent API is an AI-powered endpoint that can browse the web and extract structured data based on a natural language prompt.

API Summary

Service | API Used | Purpose
Custom Website Mapper | Internal (free) | Discover all URLs on a website (up to 5000)
Batch Scrape | Firecrawl POST /v2/batch/scrape | Scrape multiple URLs in parallel, returns markdown
Competitor Discovery | Firecrawl POST /v2/agent | AI agent that browses web and extracts structured data
URL discovery uses our custom website mapper (free) instead of Firecrawl's Map API. The mapper combines multiple strategies: robots.txt parsing, sitemap.xml parsing, HTML link extraction, and recursive crawling.

Prerequisites

Before calling this endpoint, you must:
  1. Create a Clerk organization
  2. Call POST /api/business to create the entity
The entity must exist before generate-all runs, as several tasks depend on it.

Monitoring Progress

Check backend logs to monitor progress:
🚀 GENERATE ALL: Starting onboarding for my-business-abc123
   URL: https://example.com
   Business: My Business

📡 GROUP 1a: Scraping website...
🔍 GROUP 1b: Discovering competitors (parallel, non-blocking)...
🔮 GROUP 1c: Generating Exa prompts + visibility (parallel, non-blocking)...
🎨 GROUP 1d: Fetching favicon (parallel, non-blocking)...
📊 GROUP 1e: Materializing score (parallel, non-blocking)...
☁️ GROUP 1f: Setting up CloudFront (parallel, non-blocking)...

🔮 [GROUP 1c] Generating prompts via Exa + visibility check...
   📝 Step 1: Generating prompts via Exa Answer API...
   ✅ Generated 10 prompts
      1. What payment gateway supports instant bank transfers...
      2. How can I set up subscription billing with built-in tax...
      ...
   🔍 Step 2: Checking visibility across 8 AI platforms...
   ✅ Saved 10 prompts with visibility data
   📊 Average visibility: 4.2/8 platforms

✅ GROUP 1a Complete: 15 pages scraped
   (GROUP 1b-1f still running in background)

🔥 [GROUP 2] Running tasks that need pages (15 pages)...
   🔍 [GROUP 2b] Discovering products...
   ✅ discover_products: 3 found, 3 saved
   📦 [GROUP 3a] Starting product prompts for 3 products...
      ✅ Product A: 10 prompts
      ✅ Product B: 10 prompts
      ✅ Product C: 10 prompts
   ✅ product_prompts: 30 for 3 products
   📄 [GROUP 3b] Generating product llms.txt files...
      ✅ Product A: /llms/product-a.txt
      ✅ Product B: /llms/product-b.txt
      ✅ Product C: /llms/product-c.txt
   ✅ generate_product_llms_txt: 3 files deployed
   ✅ create_ai_website: https://my-business-abc123.searchcompany.dev
   ✅ business_prompts: 40 saved
✅ [GROUP 2 & 3] Complete

⏳ Waiting for GROUP 1b-1f background tasks to complete...
   ✅ discover_competitors: 10 found
   ✅ exa_prompts: 10 (pre-tested)
   ✅ favicon: https://example.com/favicon.ico
   ✅ materialize_score: 72
   ✅ setup_cloudfront: d1234567890abc.cloudfront.net

🏁 GENERATE ALL: Complete for my-business-abc123
   Success: 12/12 tasks

File Structure

src/app/
├── shared/                     # Shared services (used by onboarding + cron)
│   ├── scraping/               # Custom mapper + Firecrawl batch scrape
│   ├── mapping/                # Custom website mapper (URL discovery)
│   ├── ai_website/             # AI Website Service
│   ├── products/               # Product discovery + llms generation
│   │   ├── discover.py         # discover_products service
│   │   └── generate_llms_txt.py # generate_product_llms_txt service
│   ├── prompts/                # Prompts Service
│   ├── cloudfront/             # CloudFront Service
│   ├── hashing/                # Raw HTML hashing service
│   └── content_hasher/         # Markdown hash storage
│
└── apis/onboarding/
    ├── generate_all/
    │   ├── routes.py           # Main endpoint & orchestrator (GROUP 1a-1f)
    │   ├── scrape_website.py   # GROUP 1a: Single scrape
    │   ├── exa_prompts.py      # GROUP 1c: Exa prompts + visibility
    │   ├── task_orchestrator.py # GROUP 2a, 2b, 3a, 3b: Calls services
    │   ├── models.py           # Pydantic models
    │   └── tasks/              # Task wrappers
    │       ├── ai_website.py   # GROUP 2a
    │       ├── discover_products.py # GROUP 2b
    │       ├── prompts.py      # GROUP 2a + 3a
    │       ├── product_llms.py # GROUP 3b
    │       ├── favicon.py      # GROUP 1d
    │       ├── scoring.py      # GROUP 1e
    │       └── cloudfront.py   # GROUP 1f
    │
    └── services/
        └── discover_competitors/  # GROUP 1b: Competitor discovery