Generate All

Overview

This is the main onboarding endpoint. It triggers all onboarding tasks and returns immediately with status "started". The tasks then run asynchronously in the backend, so the frontend can navigate away as soon as the call returns.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The Shopify store URL (e.g., https://mystore.com) |
| org_slug | string | Yes | The Clerk organization slug (e.g., my-business-abc123) |
| business_name | string | Yes | The business name for display purposes |

Response

| Field | Type | Description |
| --- | --- | --- |
| status | string | Always "started" on success |
| message | string | Human-readable status message |

Example

curl -X POST https://searchcompany-main.up.railway.app/api/onboarding/generate-all \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://mystore.com",
    "org_slug": "my-business-abc123",
    "business_name": "My Business"
  }'

{
  "status": "started",
  "message": "Onboarding tasks started for my-business-abc123. Check logs for progress."
}
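
For callers that prefer Python over curl, the same request can be sketched with the standard library. This is a minimal sketch, not a client shipped with the API: the helper name `build_generate_all_request` is hypothetical, and only the URL, headers, and body fields are taken from the curl example above.

```python
import json
from urllib.request import Request

BASE_URL = "https://searchcompany-main.up.railway.app"


def build_generate_all_request(
    token: str, url: str, org_slug: str, business_name: str
) -> Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "url": url,
        "org_slug": org_slug,
        "business_name": business_name,
    }
    return Request(
        f"{BASE_URL}/api/onboarding/generate-all",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Because the endpoint returns before the tasks finish, a "started" status only confirms that the work was queued.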

Internal Services

The orchestrator calls these services directly (not via HTTP):

GROUP 1: All Parallel Tasks

All GROUP 1 tasks run in parallel. None block each other.

| Group | Service | Purpose |
| --- | --- | --- |
| 1a | Discover Business Info | Uses Firecrawl Agent to extract what the company does |
| 1b | Scrape Website | Custom mapper + Firecrawl batch scrape. Returns pages for markdown replicas. |
| 1c | Discover Competitors | Uses Firecrawl Agent API to find up to 10 competitors |
| 1d | Discover Products | Fetches products from Shopify products.json API |
| 1e | Fetch Favicon | Downloads favicon, converts to PNG, uploads to storage |
| 1f | Materialize Score | Copies pre-payment ranking score to history table |
| 1g | Setup CloudFront | Creates CloudFront distribution for domain proxy |

GROUP 2: After 1a + 1b + 1d Complete (All Parallel)

| Group | Service | Purpose |
| --- | --- | --- |
| 2a | Create AI Website | Uses business_info (1a) for llms.txt/Q&A/data.json; pages (1b) for markdown replicas |
| 2b | Product Prompts | Generates 5+ prompts per product (min 50 total) |
| 2c | Generate Product LLMs | Generates `/llms/{product-slug}.txt` files for each product |
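
The dependency between the two groups can be illustrated with `asyncio`. This is a sketch of the scheduling pattern only, under the assumption that the orchestrator uses asyncio tasks. `run_task` and the delays are hypothetical stand-ins for the real services, and the actual code in `routes.py` may be structured differently.

```python
import asyncio

# Hypothetical durations in seconds; 1c and 1g are deliberately the slowest.
GROUP_1_DELAYS = {
    "1a": 0.01, "1b": 0.02, "1c": 0.2, "1d": 0.01,
    "1e": 0.01, "1f": 0.01, "1g": 0.2,
}
GROUP_2 = ("2a", "2b", "2c")


async def run_task(name: str, delay: float, finished: list) -> str:
    """Stand-in for a real service call; records its completion order."""
    await asyncio.sleep(delay)
    finished.append(name)
    return name


async def orchestrate() -> list:
    finished = []
    # Start every GROUP 1 task at once so none blocks another.
    group_1 = {
        name: asyncio.create_task(run_task(name, delay, finished))
        for name, delay in GROUP_1_DELAYS.items()
    }
    # GROUP 2 waits only for 1a, 1b, and 1d.
    await asyncio.gather(group_1["1a"], group_1["1b"], group_1["1d"])
    # GROUP 2 tasks run in parallel with each other and with the
    # still-running GROUP 1 tasks (1c, 1e, 1f, 1g).
    await asyncio.gather(*(run_task(name, 0.01, finished) for name in GROUP_2))
    # Finally wait for whatever is left of GROUP 1.
    await asyncio.gather(*group_1.values())
    return finished
```

Running `asyncio.run(orchestrate())` returns the ten task names in completion order, with 2a, 2b, and 2c always after 1a, 1b, and 1d.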

Key Architecture Change:

- GROUP 1a (Discover Business Info) uses Firecrawl Agent to extract business information
- This business info is used by GROUP 2a to generate llms.txt, Q&A pages, and data.json
- Scraped pages (GROUP 1b) are ONLY used for markdown replica generation
- This separation makes LLM content generation more focused and efficient

Prompt Generation Strategy

All prompts are now tied to products (no business-level prompts):

| Metric | Value |
| --- | --- |
| Prompts per product | 5 (default) |
| Minimum total prompts | 50 during onboarding |
| New products (via cron) | 5 prompts each |
| Daily sampling | 10 prompts for visibility scoring |

If a store has fewer than 10 products, the number of prompts per product is increased so that the total is at least 50.
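
The adjustment can be worked through in a few lines. This is a minimal sketch that assumes the count is rounded up with a ceiling division; the function name and the handling of an empty store are hypothetical, and the real calculation in `tasks/prompts.py` may differ.

```python
import math

DEFAULT_PROMPTS_PER_PRODUCT = 5
MIN_TOTAL_PROMPTS = 50


def prompts_per_product(
    product_count: int,
    default: int = DEFAULT_PROMPTS_PER_PRODUCT,
    min_total: int = MIN_TOTAL_PROMPTS,
) -> int:
    """Return how many prompts to generate per product.

    Assumption: small stores get ceil(min_total / product_count) prompts
    per product, and never fewer than the default.
    """
    if product_count <= 0:
        return 0  # nothing to generate for an empty store (assumed behaviour)
    return max(default, math.ceil(min_total / product_count))
```

For a store with 8 products this gives 7 prompts per product, for 56 in total. A store with 10 or more products stays at the default of 5.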

Service Details

Discover Business Info Service (`shared/discover_business_info/service.py`)

async def discover_business_info(
    url: str,
    business_name: str
) -> dict

- Calls Firecrawl Agent API with the business URL
- Extracts: description, products_services, target_market, key_features, value_proposition
- Returns structured dict for AI website generation
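
The shape of the returned dict can be sketched as a `TypedDict`. The five key names come from the list above; the value types, the `BusinessInfo` class name, and the `missing_keys` helper are assumptions made for illustration and are not taken from the service code.

```python
from typing import TypedDict


class BusinessInfo(TypedDict):
    # Key names are from the documentation; the value types are assumed.
    description: str
    products_services: list[str]
    target_market: str
    key_features: list[str]
    value_proposition: str


REQUIRED_KEYS = frozenset(BusinessInfo.__annotations__)


def missing_keys(info: dict) -> set:
    """Return the documented keys that are absent from `info`."""
    return set(REQUIRED_KEYS) - set(info)
```

A check like `missing_keys(result)` before GROUP 2a starts would catch an incomplete extraction early.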

Discover Products Service (`shared/products/discover.py`)

async def discover_products(
    business_id: str,
    source_url: str,
    business_name: str,
    parent_entity_id: str,
    generate_prompts: bool = False
) -> dict

- Fetches products from {store_url}/products.json
- Paginates through all pages of products
- Extracts product title, description, URL, handle, variants
- Filters out existing products
- Saves new products to entities table
- Returns products list for GROUP 2
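
The fetch, paginate, and filter steps above could look roughly like the following. This is a sketch under stated assumptions, not the code in `shared/products/discover.py`: it assumes the storefront accepts `limit` and `page` query parameters on `products.json`, and `fetch_json`, `fetch_all_products`, and `filter_new` are hypothetical names. The fetch function is injectable so the logic can be exercised without network access.

```python
import json
from typing import Callable
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Default fetcher: GET a URL and parse the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_all_products(
    store_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    limit: int = 250,
    max_pages: int = 100,
) -> list:
    """Collect products page by page until a short or empty page appears."""
    base = store_url.rstrip("/")
    products = []
    for page in range(1, max_pages + 1):
        data = fetch(f"{base}/products.json?limit={limit}&page={page}")
        batch = data.get("products", [])
        products.extend(batch)
        if len(batch) < limit:  # last page reached
            break
    return products


def filter_new(products: list, existing_handles: set) -> list:
    """Drop products whose handle is already stored."""
    return [p for p in products if p["handle"] not in existing_handles]
```

The `max_pages` cap is a defensive choice so that a misbehaving store cannot keep the loop running indefinitely.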

Product Prompts Service (`tasks/prompts.py`)

async def run_product_prompts(
    url: str,
    org_slug: str,
    discovered_products: list,
    pages: list,
    prompts_per_product: int = 5
) -> StepResult

- Calculates prompts per product to ensure minimum 50 total
- Generates prompts for each product using Gemini 3 Flash
- Saves to entity_prompts_tracker table
- Returns count of generated/saved prompts

AI Website Service (`shared/ai_website/`)

async def create_ai_website_from_business_info(
    url: str,
    business_id: str,
    business_info: dict,
    pages: List[dict]
) -> dict

- Uses business_info from GROUP 1a for LLM content generation
- Runs 3 parallel Gemini calls for llms.txt, Q&A pages, data.json
- Uses scraped pages from GROUP 1b for markdown replica generation only
- Deploys to Vercel
- Assigns *.searchcompany.dev subdomain
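
The three parallel generation calls can be pictured with `asyncio.gather`. This sketch does not call Gemini. `generate_artifact` is a hypothetical stand-in for a single LLM request, and the artifact keys are illustrative labels rather than names from the service.

```python
import asyncio

ARTIFACTS = ("llms.txt", "qa_pages", "data.json")


async def generate_artifact(name: str, business_info: dict) -> str:
    """Hypothetical stand-in for one LLM call; returns placeholder content."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{name} for {business_info['description']}"


async def generate_llm_content(business_info: dict) -> dict:
    """Run the three generation calls concurrently and key results by artifact."""
    results = await asyncio.gather(
        *(generate_artifact(name, business_info) for name in ARTIFACTS)
    )
    # gather preserves argument order, so zip pairs each name with its result.
    return dict(zip(ARTIFACTS, results))
```

Since the three outputs depend only on `business_info`, running them concurrently means the step takes about as long as the slowest call.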

Prerequisites

Before calling this endpoint, you must:

1. Create a Clerk organization
2. Call POST /api/business to create the entity

The entity must exist before generate-all runs.

Monitoring Progress

Check backend logs to monitor progress:

🚀 GENERATE ALL: Starting onboarding for my-business-abc123
   URL: https://mystore.com
   Business: My Business

🏢 GROUP 1a: Discovering business info (Firecrawl agent)...
📡 GROUP 1b: Scraping website (for replicas)...
🔍 GROUP 1c: Discovering competitors (parallel)...
🛍️ GROUP 1d: Discovering products from Shopify (parallel)...
🎨 GROUP 1e: Fetching favicon (parallel)...
📊 GROUP 1f: Materializing score (parallel)...
☁️ GROUP 1g: Setting up CloudFront (parallel)...

✅ GROUP 1a Complete: Business info discovered
✅ GROUP 1b Complete: 15 pages scraped
✅ GROUP 1d Complete: 8 products discovered

⚡ GROUP 2: Running AI Website + Product tasks in parallel...
   📊 [GROUP 2] 8 products × 7 prompts = 56 total (min 50)

   ✅ create_ai_website: https://my-business-abc123.searchcompany.dev
   ✅ product_prompts: 56 total for 8 products
   ✅ generate_product_llms_txt: 8 files deployed

⏳ Waiting for GROUP 1c, 1e, 1f, 1g to complete...
   ✅ discover_competitors: 10 found
   ✅ favicon: stored
   ✅ materialize_score: 72
   ✅ setup_cloudfront: d1234567890abc.cloudfront.net

🏁 GENERATE ALL: Complete for my-business-abc123
   ✅ Success: 10/10 tasks

File Structure

src/app/
├── shared/                     # Shared services (used by onboarding + cron)
│   ├── discover_business_info/ # NEW: Firecrawl agent for business info
│   │   ├── __init__.py
│   │   └── service.py          # discover_business_info function
│   ├── scraping/               # Custom mapper + Firecrawl batch scrape
│   ├── mapping/                # Custom website mapper (URL discovery)
│   ├── ai_website/             # AI Website Service
│   │   ├── service.py          # create_ai_website_from_business_info
│   │   └── llm_organize.py     # organize_with_llm_from_business_info
│   ├── products/               # Product discovery + llms generation
│   │   ├── discover.py         # discover_products service (Shopify API)
│   │   └── generate_llms_txt.py # generate_product_llms_txt service
│   ├── prompts/                # Prompts Service
│   ├── cloudfront/             # CloudFront Service
│   └── content_hasher/         # Markdown hash storage
│
└── apis/onboarding/
    ├── generate_all/
    │   ├── routes.py           # Main endpoint & orchestrator
    │   ├── scrape_website.py   # GROUP 1b: Scrape wrapper
    │   ├── models.py           # Pydantic models
    │   └── tasks/              # Task wrappers
    │       ├── business_info.py # GROUP 1a: NEW
    │       ├── ai_website.py   # GROUP 2a
    │       ├── discover_products.py # GROUP 1d
    │       ├── prompts.py      # GROUP 2b
    │       ├── product_llms.py # GROUP 2c
    │       ├── favicon.py      # GROUP 1e
    │       ├── scoring.py      # GROUP 1f
    │       └── cloudfront.py   # GROUP 1g
    │
    └── services/
        └── discover_competitors/  # GROUP 1c: Competitor discovery
