# Onboarding Overview

> All endpoints needed to onboard a new business for AI visibility tracking

The Onboarding API handles everything needed when a new business signs up. The **backend orchestrates** all tasks through a single endpoint: the frontend triggers it, gets an immediate response, and can navigate away while the work continues in the background.

## Flow

When a user completes payment and onboarding:

```
FRONTEND:
1. POST /api/business                     → Create org metadata + entity
2. POST /api/onboarding/generate-all      → Trigger backend orchestrator (returns immediately)

BACKEND (runs in background):
├── GROUP 1 - All Parallel:
│   ├── 1a: Scrape website (pages used by GROUP 2)
│   ├── 1b: Discover Competitors (Firecrawl agent)
│   ├── 1c: Discover Products (Shopify products.json API)
│   ├── 1d: Fetch Favicon
│   ├── 1e: Materialize Score
│   └── 1f: Setup CloudFront
│
├── GROUP 2 - After Scrape (1a):
│   └── 2a: Create AI Website (uses scraped pages)
│
└── GROUP 3 - After Products (1c) - Parallel:
    ├── 3a: Product Prompts (5+ prompts per product, min 50 total)
    └── 3b: Generate Product LLMs (/llms/{product-slug}.txt files)
```

<Info>
  **Optimized Flow**: All GROUP 1 tasks run in parallel. GROUP 2 starts as soon as the scrape (1a) completes.
  GROUP 3 starts as soon as products are discovered (1c). It does not wait for GROUP 2, so it can start first.
  Within GROUP 3, tasks 3a and 3b run in parallel with each other.
</Info>
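
The dependency structure above can be sketched with `asyncio`. This is a minimal illustration, not the actual orchestrator: `run_task`, `generate_all`, and `run_demo` are hypothetical names, the tasks are stubs that only record their names, and the delays are arbitrary values chosen for the demo.

```python
import asyncio


async def run_task(name: str, log: list[str], delay: float = 0.0) -> str:
    """Hypothetical stand-in for a real onboarding task: sleeps, then records its name."""
    await asyncio.sleep(delay)
    log.append(name)
    return name


async def generate_all(log: list[str]) -> None:
    # GROUP 1: every task is started at once.
    scrape = asyncio.create_task(run_task("1a scrape", log, 0.02))
    products = asyncio.create_task(run_task("1c products", log, 0.01))
    others = [
        asyncio.create_task(run_task(name, log))
        for name in ("1b competitors", "1d favicon", "1e score", "1f cloudfront")
    ]

    async def group_2() -> None:
        await scrape  # GROUP 2 waits only for the scrape (1a).
        await run_task("2a ai_website", log)

    async def group_3() -> None:
        await products  # GROUP 3 waits only for product discovery (1c).
        # 3a and 3b run in parallel with each other.
        await asyncio.gather(
            run_task("3a product_prompts", log),
            run_task("3b product_llms", log),
        )

    # GROUP 2 and GROUP 3 are independent, so neither blocks the other.
    await asyncio.gather(group_2(), group_3(), *others)


def run_demo() -> list[str]:
    """Run the sketch and return the order in which the stub tasks finished."""
    log: list[str] = []
    asyncio.run(generate_all(log))
    return log
```

Calling `run_demo()` returns the completion order of the nine stub tasks. The only ordering the sketch guarantees is the one in the diagram: 2a finishes after 1a, and 3a and 3b finish after 1c.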

## Endpoints

The frontend calls these endpoints:

| Endpoint                                                            | Purpose                                              | Auth Required |
| ------------------------------------------------------------------- | ---------------------------------------------------- | ------------- |
| [Product Names](/api-reference/endpoint/onboarding/product-names)   | Fetch product names for scanning UI                  | No            |
| [Create Business](/api-reference/endpoint/business/create-business) | Create org metadata + entity                         | Yes (JWT)     |
| [Generate All](/api-reference/endpoint/onboarding/generate-all)     | **Backend orchestrator** - runs all onboarding tasks | Yes (JWT)     |

<Note>
  All other onboarding tasks are **internal services** called by `generate-all`.
  They are not exposed as HTTP endpoints.
</Note>

## Internal Services

The `generate-all` orchestrator calls these shared services directly (not via HTTP):

| Service            | Purpose                                             | Location                                    |
| ------------------ | --------------------------------------------------- | ------------------------------------------- |
| **Scraping**       | Custom mapper + Firecrawl batch scrape              | `shared/scraping/`, `shared/mapping/`       |
| **AI Website**     | Deploy AI-optimized site to Vercel                  | `shared/ai_website/`                        |
| **Prompts**        | Generate visibility prompts with Gemini             | `shared/prompts/`                           |
| **Products**       | Discover products via Shopify API                   | `shared/products/`                          |
| **CloudFront**     | Create CloudFront distribution                      | `shared/cloudfront/`                        |
| **Content Hasher** | Store page hashes for change detection              | `shared/content_hasher/`                    |
| **Favicon**        | Fetch & store favicon (onboarding-only)             | `onboarding/generate_all/tasks/favicon.py`  |
| **Scoring**        | Copy pre-payment ranking score (onboarding-only)    | `onboarding/generate_all/tasks/scoring.py`  |
| **Competitors**    | Discover up to 10 competitors using Firecrawl agent | `onboarding/services/discover_competitors/` |

<Note>
  Services in `shared/` are used by multiple modules (onboarding, cron, domain).
  Services in `onboarding/generate_all/` are only used during onboarding.
</Note>

## What Gets Created

After onboarding completes, the business has:

| Asset                       | Description                                   | Created By              |
| --------------------------- | --------------------------------------------- | ----------------------- |
| **Org Metadata**            | Clerk org details in database                 | `POST /api/business`    |
| **Business Entity**         | Entity record in `entities` table             | `POST /api/business`    |
| **Favicon**                 | Stored favicon URL                            | Favicon Service         |
| **AI Site**                 | AI-optimized website at `*.searchcompany.dev` | AI Website Service      |
| **Markdown Replica Pages**  | 1:1 markdown copies of source website pages   | AI Website Service      |
| **Product Entities**        | Auto-discovered products from Shopify         | Products Service        |
| **50+ Product Prompts**     | 5+ prompts per product (min 50 total)         | Product Prompts Service |
| **Product LLMs Files**      | `/llms/{product-slug}.txt` for each product   | Product LLMs Service    |
| **Visibility Score**        | Initial pre-payment score                     | Scoring Service         |
| **CloudFront Distribution** | Pre-created proxy for DNS propagation         | CloudFront Service      |
| **Competitors**             | Up to 10 auto-discovered competitors          | Competitors Service     |

<Note>
  The business **entity** is created by `POST /api/business` before `generate-all` is called.
  All other assets are created by the backend orchestrator running in the background.
</Note>

## Prompt Generation Strategy

All prompts are tied to products:

| Metric                      | Value                             |
| --------------------------- | --------------------------------- |
| **Prompts per product**     | 5 (default)                       |
| **Minimum total prompts**   | 50 during onboarding              |
| **New products (via cron)** | 5 prompts each                    |
| **Daily sampling**          | 10 prompts for visibility scoring |

If a store has fewer than 10 products, the default of 5 prompts per product would yield fewer than 50 prompts, so the number of prompts per product is increased until the total reaches at least 50.
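
The arithmetic behind that rule can be sketched as follows. The function name `prompts_per_product` and the constants are hypothetical, and rounding up with a ceiling is an assumption: the documentation states only that the total must reach at least 50.

```python
import math

DEFAULT_PROMPTS_PER_PRODUCT = 5  # default from the table above
MIN_TOTAL_PROMPTS = 50           # onboarding minimum from the table above


def prompts_per_product(product_count: int) -> int:
    """Return how many prompts each product needs to reach the onboarding minimum.

    Assumption: the per-product count is rounded up so the total never
    falls below MIN_TOTAL_PROMPTS.
    """
    if product_count <= 0:
        raise ValueError("product_count must be positive")
    needed = math.ceil(MIN_TOTAL_PROMPTS / product_count)
    return max(DEFAULT_PROMPTS_PER_PRODUCT, needed)
```

Under this assumption, a store with 4 products gets 13 prompts per product (52 total), while a store with 10 or more products stays at the default of 5.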

## Regenerating Prompts

If prompts need to be regenerated for a product:

```bash
curl -X POST https://searchcompany-main.up.railway.app/api/cron/generate-prompts \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "business_id": "my-business-abc123",
    "url": "https://mystore.com/products/my-product",
    "product_id": "product-entity-uuid",
    "product_name": "My Product"
  }'
```
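
For scripted regeneration across several products, the same request can be built in Python. This is a sketch using only the standard library. The helper name `build_regenerate_request` is hypothetical; the URL, headers, and body fields are copied from the `curl` example above. The response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://searchcompany-main.up.railway.app/api/cron/generate-prompts"


def build_regenerate_request(
    api_key: str,
    business_id: str,
    url: str,
    product_id: str,
    product_name: str,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that regenerates prompts for one product."""
    payload = {
        "business_id": business_id,
        "url": url,
        "product_id": product_id,
        "product_name": product_name,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Pass the returned object to `urllib.request.urlopen(...)` to send it.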

## Testing

Run the onboarding tests for the `generate-all` orchestrator:

```bash
cd Backend
uv run pytest src/pytests/onboarding/test_generate_all.py -v -s
```
