POST https://searchcompany-main.up.railway.app/api/onboarding/generate-all

Overview

This is the main onboarding endpoint. It triggers all onboarding tasks and runs them in the background: the endpoint returns immediately with status "started", all tasks run asynchronously in the backend, and the frontend can navigate away as soon as the call returns.
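
The fire-and-forget behavior can be sketched as below. This is a minimal illustration, not the actual handler: `run_all_onboarding_tasks` is a hypothetical stand-in for the real orchestrator.

```python
import asyncio

async def run_all_onboarding_tasks(url: str, org_slug: str, business_name: str) -> None:
    """Hypothetical stand-in for the real orchestrator (GROUP 1 / GROUP 2 work)."""
    await asyncio.sleep(0)

async def generate_all(url: str, org_slug: str, business_name: str) -> dict:
    # Schedule the orchestrator without awaiting it, so the response
    # returns immediately while the tasks continue in the background.
    asyncio.create_task(run_all_onboarding_tasks(url, org_slug, business_name))
    return {
        "status": "started",
        "message": f"Onboarding tasks started for {org_slug}. Check logs for progress.",
    }
```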

Request Body

url (string, required): The Shopify store URL (e.g., https://mystore.com)
org_slug (string, required): The Clerk organization slug (e.g., my-business-abc123)
business_name (string, required): The business name for display purposes

Response

status (string): Always "started" on success
message (string): Human-readable status message

Example

curl -X POST https://searchcompany-main.up.railway.app/api/onboarding/generate-all \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://mystore.com",
    "org_slug": "my-business-abc123",
    "business_name": "My Business"
  }'
{
  "status": "started",
  "message": "Onboarding tasks started for my-business-abc123. Check logs for progress."
}

Internal Services

The orchestrator calls these services directly (not via HTTP):

GROUP 1: All Parallel Tasks

All GROUP 1 tasks run in parallel. None block each other.
Group | Service | Purpose
------|---------|--------
1a | Discover Business Info | Uses Firecrawl Agent to extract what the company does
1b | Scrape Website | Custom mapper + Firecrawl batch scrape. Returns pages for markdown replicas.
1c | Discover Competitors | Uses Firecrawl Agent API to find up to 10 competitors
1d | Discover Products | Fetches products from Shopify products.json API
1e | Fetch Favicon | Downloads favicon, converts to PNG, uploads to storage
1f | Materialize Score | Copies pre-payment ranking score to history table
1g | Setup CloudFront | Creates CloudFront distribution for domain proxy

GROUP 2: After 1a + 1b + 1d Complete (All Parallel)

Group | Service | Purpose
------|---------|--------
2a | Create AI Website | Uses business_info (1a) for llms.txt/Q&A/data.json; pages (1b) for markdown replicas
2b | Product Prompts | Generates 5+ prompts per product (min 50 total)
2c | Generate Product LLMs | Generates /llms/{product-slug}.txt files for each product
Key Architecture Change:
  • GROUP 1a (Discover Business Info) uses Firecrawl Agent to extract business information
  • This business info is used by GROUP 2a to generate llms.txt, Q&A pages, and data.json
  • Scraped pages (GROUP 1b) are ONLY used for markdown replica generation
  • This separation makes LLM content generation more focused and efficient
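
The dependency ordering above can be sketched with asyncio. The task names and delays are illustrative stubs, not the real services; the point is that GROUP 2 waits only for 1a + 1b + 1d, while 1c/1e/1f/1g keep running:

```python
import asyncio

log: list[str] = []

async def task(name: str, delay: float) -> str:
    # Stand-in for a real service call; records completion order.
    await asyncio.sleep(delay)
    log.append(name)
    return name

async def orchestrate() -> None:
    # GROUP 1: launch everything in parallel.
    independent = [
        asyncio.create_task(task(name, 0.03))
        for name in ("1c_competitors", "1e_favicon", "1f_score", "1g_cloudfront")
    ]

    # GROUP 2 only depends on 1a + 1b + 1d, so wait for just those three.
    await asyncio.gather(
        task("1a_business_info", 0.01),
        task("1b_scrape", 0.01),
        task("1d_products", 0.01),
    )

    # GROUP 2 runs while 1c/1e/1f/1g may still be in flight.
    await asyncio.gather(
        task("2a_ai_website", 0.01),
        task("2b_product_prompts", 0.01),
        task("2c_product_llms", 0.01),
    )

    # Finally wait for the remaining GROUP 1 tasks.
    await asyncio.gather(*independent)

asyncio.run(orchestrate())
```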

Prompt Generation Strategy

All prompts are now tied to products (no business-level prompts):
Metric | Value
-------|------
Prompts per product | 5 (default)
Minimum total prompts | 50 during onboarding
New products (via cron) | 5 prompts each
Daily sampling | 10 prompts for visibility scoring
If a store has fewer than 10 products, prompts per product is increased to ensure at least 50 total.
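
The minimum-total rule works out to a simple ceiling calculation; a sketch (the function name is illustrative, not the actual implementation):

```python
import math

def prompts_per_product(num_products: int, default: int = 5, minimum_total: int = 50) -> int:
    """Raise the per-product count for small stores so onboarding
    always generates at least `minimum_total` prompts."""
    if num_products <= 0:
        return 0
    return max(default, math.ceil(minimum_total / num_products))

# 8 products -> 7 prompts each = 56 total; 20 products -> default 5 each = 100 total
```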

Service Details

Discover Business Info Service (shared/discover_business_info/service.py)

async def discover_business_info(
    url: str,
    business_name: str
) -> dict
  1. Calls Firecrawl Agent API with the business URL
  2. Extracts: description, products_services, target_market, key_features, value_proposition
  3. Returns structured dict for AI website generation
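
The returned dict might look like the following. The keys come from the extraction step above; the values here are invented for illustration:

```python
# Illustrative shape only: keys match the extracted fields, values are made up.
business_info = {
    "description": "Direct-to-consumer store selling handmade candles.",
    "products_services": ["Scented candles", "Gift sets"],
    "target_market": "Home-fragrance shoppers in the US",
    "key_features": ["Small-batch production", "Free shipping over $50"],
    "value_proposition": "Hand-poured candles at accessible prices.",
}
```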

Discover Products Service (shared/products/discover.py)

async def discover_products(
    business_id: str,
    source_url: str,
    business_name: str,
    parent_entity_id: str,
    generate_prompts: bool = False
) -> dict
  1. Fetches products from {store_url}/products.json
  2. Paginates through all pages of products
  3. Extracts product title, description, URL, handle, variants
  4. Filters out existing products
  5. Saves new products to entities table
  6. Returns products list for GROUP 2
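
The pagination loop (steps 1-2) can be sketched as below. To keep the sketch self-contained, `fetch_page` is an injected callable standing in for an HTTP GET of {store_url}/products.json with page and limit parameters; the stopping conditions (empty or short page) are an assumption about the implementation:

```python
from typing import Callable

def fetch_all_products(fetch_page: Callable[[int], dict], limit: int = 250) -> list[dict]:
    """Page through {store_url}/products.json until no more products come back.

    `fetch_page(page)` stands in for fetching page `page` of products.json
    and returning the decoded {"products": [...]} payload.
    """
    products: list[dict] = []
    page = 1
    while True:
        batch = fetch_page(page).get("products", [])
        if not batch:
            break
        products.extend(batch)
        if len(batch) < limit:  # short page: no further pages
            break
        page += 1
    return products
```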

Product Prompts Service (tasks/prompts.py)

async def run_product_prompts(
    url: str,
    org_slug: str,
    discovered_products: list,
    pages: list,
    prompts_per_product: int = 5
) -> StepResult
  1. Calculates prompts per product to ensure minimum 50 total
  2. Generates prompts for each product using Gemini 3 Flash
  3. Saves to entity_prompts_tracker table
  4. Returns count of generated/saved prompts

AI Website Service (shared/ai_website/)

async def create_ai_website_from_business_info(
    url: str,
    business_id: str,
    business_info: dict,
    pages: List[dict]
) -> dict
  1. Uses business_info from GROUP 1a for LLM content generation
  2. Runs 3 parallel Gemini calls for llms.txt, Q&A pages, data.json
  3. Uses scraped pages from GROUP 1b for markdown replica generation only
  4. Deploys to Vercel
  5. Assigns *.searchcompany.dev subdomain
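
The three parallel generation calls (step 2) can be sketched as below. The generator functions are stand-ins for the real Gemini calls, and the output keys are illustrative:

```python
import asyncio

async def generate_llms_txt(info: dict) -> str:
    return f"# {info['name']}\n"  # stand-in for a Gemini call

async def generate_qa_pages(info: dict) -> list[str]:
    return [f"What is {info['name']}?"]  # stand-in for a Gemini call

async def generate_data_json(info: dict) -> dict:
    return {"name": info["name"]}  # stand-in for a Gemini call

async def generate_site_content(info: dict) -> dict:
    # The three generation calls are independent, so run them concurrently.
    llms_txt, qa_pages, data_json = await asyncio.gather(
        generate_llms_txt(info),
        generate_qa_pages(info),
        generate_data_json(info),
    )
    return {"llms.txt": llms_txt, "qa": qa_pages, "data.json": data_json}
```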

Prerequisites

Before calling this endpoint, you must:
  1. Create a Clerk organization
  2. Call POST /api/business to create the entity
The entity must exist before generate-all runs.

Monitoring Progress

Check backend logs to monitor progress:
🚀 GENERATE ALL: Starting onboarding for my-business-abc123
   URL: https://mystore.com
   Business: My Business

🏢 GROUP 1a: Discovering business info (Firecrawl agent)...
📡 GROUP 1b: Scraping website (for replicas)...
🔍 GROUP 1c: Discovering competitors (parallel)...
🛍️ GROUP 1d: Discovering products from Shopify (parallel)...
🎨 GROUP 1e: Fetching favicon (parallel)...
📊 GROUP 1f: Materializing score (parallel)...
☁️ GROUP 1g: Setting up CloudFront (parallel)...

✅ GROUP 1a Complete: Business info discovered
✅ GROUP 1b Complete: 15 pages scraped
✅ GROUP 1d Complete: 8 products discovered

⚡ GROUP 2: Running AI Website + Product tasks in parallel...
   📊 [GROUP 2] 8 products × 7 prompts = 56 total (min 50)

   ✅ create_ai_website: https://my-business-abc123.searchcompany.dev
   ✅ product_prompts: 56 total for 8 products
   ✅ generate_product_llms_txt: 8 files deployed

⏳ Waiting for GROUP 1c, 1e, 1f, 1g to complete...
   ✅ discover_competitors: 10 found
   ✅ favicon: stored
   ✅ materialize_score: 72
   ✅ setup_cloudfront: d1234567890abc.cloudfront.net

🏁 GENERATE ALL: Complete for my-business-abc123
   ✅ Success: 10/10 tasks

File Structure

src/app/
├── shared/                     # Shared services (used by onboarding + cron)
│   ├── discover_business_info/ # NEW: Firecrawl agent for business info
│   │   ├── __init__.py
│   │   └── service.py          # discover_business_info function
│   ├── scraping/               # Custom mapper + Firecrawl batch scrape
│   ├── mapping/                # Custom website mapper (URL discovery)
│   ├── ai_website/             # AI Website Service
│   │   ├── service.py          # create_ai_website_from_business_info
│   │   └── llm_organize.py     # organize_with_llm_from_business_info
│   ├── products/               # Product discovery + llms generation
│   │   ├── discover.py         # discover_products service (Shopify API)
│   │   └── generate_llms_txt.py # generate_product_llms_txt service
│   ├── prompts/                # Prompts Service
│   ├── cloudfront/             # CloudFront Service
│   └── content_hasher/         # Markdown hash storage

└── apis/onboarding/
    ├── generate_all/
    │   ├── routes.py           # Main endpoint & orchestrator
    │   ├── scrape_website.py   # GROUP 1b: Scrape wrapper
    │   ├── models.py           # Pydantic models
    │   └── tasks/              # Task wrappers
    │       ├── business_info.py # GROUP 1a: NEW
    │       ├── ai_website.py   # GROUP 2a
    │       ├── discover_products.py # GROUP 1d
    │       ├── prompts.py      # GROUP 2b
    │       ├── product_llms.py # GROUP 2c
    │       ├── favicon.py      # GROUP 1e
    │       ├── scoring.py      # GROUP 1f
    │       └── cloudfront.py   # GROUP 1g

    └── services/
        └── discover_competitors/  # GROUP 1c: Competitor discovery