Overview
This is the main onboarding endpoint. It triggers all onboarding tasks and runs them asynchronously in the backend, returning immediately with status "started". The frontend can navigate away as soon as the call returns.
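The fire-and-forget behavior can be sketched as below. This is an illustrative stand-in, not the real orchestrator code (the function and task names here are hypothetical):

```python
import asyncio

async def run_onboarding_tasks(org_slug: str) -> None:
    await asyncio.sleep(0.05)  # stands in for the GROUP 1 / GROUP 2 work

async def generate_all(org_slug: str) -> dict:
    # Schedule the work without awaiting it, then respond immediately.
    asyncio.get_running_loop().create_task(run_onboarding_tasks(org_slug))
    return {
        "status": "started",
        "message": f"Onboarding tasks started for {org_slug}. Check logs for progress.",
    }

async def main() -> dict:
    response = await generate_all("my-business-abc123")
    await asyncio.sleep(0.1)  # keep the loop alive so the background task can finish
    return response

response = asyncio.run(main())
```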
Request Body
- `url`: The Shopify store URL (e.g., `https://mystore.com`)
- `org_slug`: The Clerk organization slug (e.g., `my-business-abc123`)
- `business_name`: The business name for display purposes
Response
- `status`: Always `"started"` on success
- `message`: Human-readable status message
Example
```shell
curl -X POST https://searchcompany-main.up.railway.app/api/onboarding/generate-all \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://mystore.com",
    "org_slug": "my-business-abc123",
    "business_name": "My Business"
  }'
```

Response:

```json
{
  "status": "started",
  "message": "Onboarding tasks started for my-business-abc123. Check logs for progress."
}
```
Internal Services
The orchestrator calls these services directly (not via HTTP):
GROUP 1: All Parallel Tasks
All GROUP 1 tasks run in parallel. None block each other.
| Group | Service | Purpose |
|---|---|---|
| 1a | Discover Business Info | Uses Firecrawl Agent to extract what the company does |
| 1b | Scrape Website | Custom mapper + Firecrawl batch scrape. Returns pages for markdown replicas. |
| 1c | Discover Competitors | Uses Firecrawl Agent API to find up to 10 competitors |
| 1d | Discover Products | Fetches products from Shopify products.json API |
| 1e | Fetch Favicon | Downloads favicon, converts to PNG, uploads to storage |
| 1f | Materialize Score | Copies pre-payment ranking score to history table |
| 1g | Setup CloudFront | Creates CloudFront distribution for domain proxy |
GROUP 2: After 1a + 1b + 1d Complete (All Parallel)
| Group | Service | Purpose |
|---|---|---|
| 2a | Create AI Website | Uses business_info (1a) for llms.txt/Q&A/data.json; pages (1b) for markdown replicas |
| 2b | Product Prompts | Generates 5+ prompts per product (min 50 total) |
| 2c | Generate Product LLMs | Generates /llms/{product-slug}.txt files for each product |
Key Architecture Change:
- GROUP 1a (Discover Business Info) uses Firecrawl Agent to extract business information
- This business info is used by GROUP 2a to generate llms.txt, Q&A pages, and data.json
- Scraped pages (GROUP 1b) are ONLY used for markdown replica generation
- This separation makes LLM content generation more focused and efficient
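The dependency graph above can be sketched with `asyncio`: every GROUP 1 task starts at once, GROUP 2 is gated only on 1a + 1b + 1d, and the remaining GROUP 1 tasks keep running in the meantime. Task names and bodies here are illustrative stand-ins:

```python
import asyncio

async def stub(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for the real service call
    return name

async def orchestrate() -> list:
    # GROUP 1: fan everything out in parallel.
    t1a = asyncio.create_task(stub("1a_business_info", 0.02))
    t1b = asyncio.create_task(stub("1b_pages", 0.02))
    t1d = asyncio.create_task(stub("1d_products", 0.02))
    independent = [
        asyncio.create_task(stub(name, 0.05))
        for name in ("1c_competitors", "1e_favicon", "1f_score", "1g_cloudfront")
    ]

    # GROUP 2 waits only on 1a + 1b + 1d, then itself runs in parallel.
    await asyncio.gather(t1a, t1b, t1d)
    group2 = await asyncio.gather(
        stub("2a_ai_website", 0.01),
        stub("2b_product_prompts", 0.01),
        stub("2c_product_llms", 0.01),
    )

    # Finally wait for the rest of GROUP 1.
    rest = await asyncio.gather(*independent)
    return list(group2) + rest

results = asyncio.run(orchestrate())
```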
Prompt Generation Strategy
All prompts are now tied to products (no business-level prompts):
| Metric | Value |
|---|---|
| Prompts per product | 5 (default) |
| Minimum total prompts | 50 during onboarding |
| New products (via cron) | 5 prompts each |
| Daily sampling | 10 prompts for visibility scoring |
If a store has fewer than 10 products, prompts per product is increased to ensure at least 50 total.
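The scaling rule can be expressed as a one-liner (the helper name is hypothetical). With 8 products it yields 7 prompts each, matching the "8 products × 7 prompts = 56 total" log line shown later:

```python
import math

def prompts_per_product(num_products: int, default: int = 5, minimum_total: int = 50) -> int:
    if num_products <= 0:
        return 0
    # Raise the per-product count until the store-wide minimum is met.
    return max(default, math.ceil(minimum_total / num_products))

few = prompts_per_product(8)    # small stores get a raised count: 8 x 7 = 56 >= 50
many = prompts_per_product(20)  # large stores keep the default of 5
```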
Service Details
Discover Business Info Service (shared/discover_business_info/service.py)
```python
async def discover_business_info(
    url: str,
    business_name: str,
) -> dict
```
- Calls Firecrawl Agent API with the business URL
- Extracts: description, products_services, target_market, key_features, value_proposition
- Returns structured dict for AI website generation
Discover Products Service (shared/products/discover.py)
```python
async def discover_products(
    business_id: str,
    source_url: str,
    business_name: str,
    parent_entity_id: str,
    generate_prompts: bool = False,
) -> dict
```
- Fetches products from `{store_url}/products.json`
- Paginates through all pages of products
- Extracts product title, description, URL, handle, variants
- Filters out existing products
- Saves new products to entities table
- Returns products list for GROUP 2
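The pagination loop can be sketched as below. The page fetcher is injected so the loop runs without network access; the real service's HTTP handling and query parameters may differ:

```python
def discover_all_products(store_url: str, fetch_page, limit: int = 250) -> list:
    products, page = [], 1
    while True:
        batch = fetch_page(f"{store_url}/products.json?limit={limit}&page={page}")
        if not batch:  # an empty page means every page has been walked
            break
        products.extend(batch)
        page += 1
    return products

# Fake fetcher: two pages of products, then an empty page.
fake_pages = {1: [{"handle": "mug"}, {"handle": "tee"}], 2: [{"handle": "cap"}]}
fetch = lambda url: fake_pages.get(int(url.rsplit("page=", 1)[1]), [])
items = discover_all_products("https://mystore.com", fetch)
```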
Product Prompts Service (tasks/prompts.py)
```python
async def run_product_prompts(
    url: str,
    org_slug: str,
    discovered_products: list,
    pages: list,
    prompts_per_product: int = 5,
) -> StepResult
```
- Calculates prompts per product to ensure minimum 50 total
- Generates prompts for each product using Gemini 3 Flash
- Saves to the `entity_prompts_tracker` table
- Returns count of generated/saved prompts
AI Website Service (shared/ai_website/)
```python
async def create_ai_website_from_business_info(
    url: str,
    business_id: str,
    business_info: dict,
    pages: List[dict],
) -> dict
```
- Uses business_info from GROUP 1a for LLM content generation
- Runs 3 parallel Gemini calls for llms.txt, Q&A pages, data.json
- Uses scraped pages from GROUP 1b for markdown replica generation only
- Deploys to Vercel
- Assigns a `*.searchcompany.dev` subdomain
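The three parallel content-generation calls can be sketched as below; `gen` is a hypothetical stand-in for a Gemini request and simply echoes what it would produce:

```python
import asyncio

async def gen(kind: str, business_info: dict) -> tuple:
    await asyncio.sleep(0.01)  # placeholder for the model call
    return kind, f"{kind} generated for {business_info['name']}"

async def generate_site_content(business_info: dict) -> dict:
    # Fan out the three artifacts concurrently, then key results by kind.
    pairs = await asyncio.gather(
        gen("llms.txt", business_info),
        gen("qa_pages", business_info),
        gen("data.json", business_info),
    )
    return dict(pairs)

content = asyncio.run(generate_site_content({"name": "My Business"}))
```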
Prerequisites
Before calling this endpoint, you must:
- Create a Clerk organization
- Call `POST /api/business` to create the entity
The entity must exist before generate-all runs.
Monitoring Progress
Check backend logs to monitor progress:
```text
🚀 GENERATE ALL: Starting onboarding for my-business-abc123
   URL: https://mystore.com
   Business: My Business
🏢 GROUP 1a: Discovering business info (Firecrawl agent)...
📡 GROUP 1b: Scraping website (for replicas)...
🔍 GROUP 1c: Discovering competitors (parallel)...
🛍️ GROUP 1d: Discovering products from Shopify (parallel)...
🎨 GROUP 1e: Fetching favicon (parallel)...
📊 GROUP 1f: Materializing score (parallel)...
☁️ GROUP 1g: Setting up CloudFront (parallel)...
✅ GROUP 1a Complete: Business info discovered
✅ GROUP 1b Complete: 15 pages scraped
✅ GROUP 1d Complete: 8 products discovered
⚡ GROUP 2: Running AI Website + Product tasks in parallel...
📊 [GROUP 2] 8 products × 7 prompts = 56 total (min 50)
✅ create_ai_website: https://my-business-abc123.searchcompany.dev
✅ product_prompts: 56 total for 8 products
✅ generate_product_llms_txt: 8 files deployed
⏳ Waiting for GROUP 1c, 1e, 1f, 1g to complete...
✅ discover_competitors: 10 found
✅ favicon: stored
✅ materialize_score: 72
✅ setup_cloudfront: d1234567890abc.cloudfront.net
🏁 GENERATE ALL: Complete for my-business-abc123
✅ Success: 10/10 tasks
```
File Structure
```text
src/app/
├── shared/                          # Shared services (used by onboarding + cron)
│   ├── discover_business_info/      # NEW: Firecrawl agent for business info
│   │   ├── __init__.py
│   │   └── service.py               # discover_business_info function
│   ├── scraping/                    # Custom mapper + Firecrawl batch scrape
│   ├── mapping/                     # Custom website mapper (URL discovery)
│   ├── ai_website/                  # AI Website Service
│   │   ├── service.py               # create_ai_website_from_business_info
│   │   └── llm_organize.py          # organize_with_llm_from_business_info
│   ├── products/                    # Product discovery + llms generation
│   │   ├── discover.py              # discover_products service (Shopify API)
│   │   └── generate_llms_txt.py     # generate_product_llms_txt service
│   ├── prompts/                     # Prompts Service
│   ├── cloudfront/                  # CloudFront Service
│   └── content_hasher/              # Markdown hash storage
│
└── apis/onboarding/
    ├── generate_all/
    │   ├── routes.py                # Main endpoint & orchestrator
    │   ├── scrape_website.py        # GROUP 1b: Scrape wrapper
    │   ├── models.py                # Pydantic models
    │   └── tasks/                   # Task wrappers
    │       ├── business_info.py     # GROUP 1a: NEW
    │       ├── ai_website.py        # GROUP 2a
    │       ├── discover_products.py # GROUP 1d
    │       ├── prompts.py           # GROUP 2b
    │       ├── product_llms.py      # GROUP 2c
    │       ├── favicon.py           # GROUP 1e
    │       ├── scoring.py           # GROUP 1f
    │       └── cloudfront.py        # GROUP 1g
    │
    └── services/
        └── discover_competitors/    # GROUP 1c: Competitor discovery
```