# Generate All

> Backend orchestrator that runs all onboarding tasks in the background

## Overview

This is the **main onboarding endpoint**. It starts every onboarding task in the background, so the frontend can navigate away as soon as the call returns.

<Info>
  This endpoint returns immediately with status `"started"`. All tasks run asynchronously in the backend.
</Info>
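The return-immediately behavior can be illustrated with plain `asyncio`. This is a minimal sketch, not the actual route handler: `run_onboarding`, `generate_all`, and the `completed` list are hypothetical names, and the real backend may schedule its work differently.

```python
import asyncio

# Keep references so background tasks are not garbage-collected mid-run.
_background_tasks: set = set()
completed: list = []  # stand-in for the side effects of onboarding


async def run_onboarding(org_slug: str) -> None:
    """Hypothetical placeholder for the real orchestrator."""
    await asyncio.sleep(0.05)  # simulate slow work
    completed.append(org_slug)


async def generate_all(org_slug: str) -> dict:
    """Schedule the work and reply before it finishes."""
    task = asyncio.create_task(run_onboarding(org_slug))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return {
        "status": "started",
        "message": f"Onboarding tasks started for {org_slug}. Check logs for progress.",
    }


async def demo() -> tuple:
    response = await generate_all("my-business-abc123")
    finished_at_reply = bool(completed)  # work is still running at this point
    await asyncio.gather(*_background_tasks)  # only so the demo exits cleanly
    return response, finished_at_reply
```

The caller gets the `"started"` payload while the task is still running, which is why the response cannot report whether individual tasks succeeded.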

## Request Body

<ParamField body="url" type="string" required>
  The Shopify store URL (e.g., `https://mystore.com`)
</ParamField>

<ParamField body="org_slug" type="string" required>
  The Clerk organization slug (e.g., `my-business-abc123`)
</ParamField>

<ParamField body="business_name" type="string" required>
  The business name for display purposes
</ParamField>

## Response

<ResponseField name="status" type="string">
  Always `"started"` on success
</ResponseField>

<ResponseField name="message" type="string">
  Human-readable status message
</ResponseField>

## Example

<RequestExample>
  ```bash cURL
  curl -X POST https://searchcompany-main.up.railway.app/api/onboarding/generate-all \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "url": "https://mystore.com",
      "org_slug": "my-business-abc123",
      "business_name": "My Business"
    }'
  ```
</RequestExample>

<ResponseExample>
  ```json Success Response
  {
    "status": "started",
    "message": "Onboarding tasks started for my-business-abc123. Check logs for progress."
  }
  ```
</ResponseExample>
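The same request can be made from Python. The sketch below uses only the standard library and mirrors the cURL example; the helper names `build_generate_all_request` and `start_onboarding` are illustrative, not part of any published client.

```python
import json
import urllib.request

# Base URL taken from the cURL example.
BASE_URL = "https://searchcompany-main.up.railway.app"


def build_generate_all_request(token, url, org_slug, business_name, base_url=BASE_URL):
    """Build (but do not send) the POST request for generate-all."""
    payload = {"url": url, "org_slug": org_slug, "business_name": business_name}
    return urllib.request.Request(
        f"{base_url}/api/onboarding/generate-all",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_onboarding(token, url, org_slug, business_name):
    """Send the request and return the parsed JSON response."""
    request = build_generate_all_request(token, url, org_slug, business_name)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

On success, `start_onboarding(...)["status"]` should be `"started"`, matching the response example.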

## Internal Services

The orchestrator calls these services directly (not via HTTP):

### GROUP 1: All Parallel Tasks

All GROUP 1 tasks run in parallel. None block each other.

| Group  | Service                | Purpose                                                                      |
| ------ | ---------------------- | ---------------------------------------------------------------------------- |
| **1a** | Discover Business Info | Uses Firecrawl Agent to extract what the company does                        |
| **1b** | Scrape Website         | Custom mapper + Firecrawl batch scrape. Returns pages for markdown replicas. |
| **1c** | Discover Competitors   | Uses Firecrawl Agent API to find up to 10 competitors                        |
| **1d** | Discover Products      | Fetches products from Shopify products.json API                              |
| **1e** | Fetch Favicon          | Downloads favicon, converts to PNG, uploads to storage                       |
| **1f** | Materialize Score      | Copies pre-payment ranking score to history table                            |
| **1g** | Setup CloudFront       | Creates CloudFront distribution for domain proxy                             |

### GROUP 2: After 1a + 1b + 1d Complete (All Parallel)

| Group  | Service               | Purpose                                                                                |
| ------ | --------------------- | -------------------------------------------------------------------------------------- |
| **2a** | Create AI Website     | Uses business\_info (1a) for llms.txt/Q\&A/data.json; pages (1b) for markdown replicas |
| **2b** | Product Prompts       | Generates 5+ prompts per product (min 50 total)                                        |
| **2c** | Generate Product LLMs | Generates `/llms/{product-slug}.txt` files for each product                            |

<Note>
  **Key Architecture Change**:

  * GROUP 1a (Discover Business Info) uses Firecrawl Agent to extract business information
  * This business info is used by GROUP 2a to generate llms.txt, Q\&A pages, and data.json
  * Scraped pages (GROUP 1b) are ONLY used for markdown replica generation
  * This separation makes LLM content generation more focused and efficient
</Note>
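The dependency structure of the two groups can be sketched with `asyncio`. This is an illustration of the scheduling described in the tables, assuming each service is an awaitable. The `step` coroutine and its delays are hypothetical stand-ins, not the real task wrappers.

```python
import asyncio

events: list = []  # records completion order for the demo


async def step(name: str, delay: float, result=None):
    """Hypothetical stand-in for a real service call."""
    await asyncio.sleep(delay)
    events.append(name)
    return result


async def orchestrate() -> dict:
    # GROUP 1: start all seven tasks at once; none blocks another.
    group1 = {
        "1a": asyncio.create_task(step("1a", 0.02, {"description": "stub"})),
        "1b": asyncio.create_task(step("1b", 0.03, ["page"])),
        "1c": asyncio.create_task(step("1c", 0.20)),
        "1d": asyncio.create_task(step("1d", 0.01, ["product"])),
        "1e": asyncio.create_task(step("1e", 0.20)),
        "1f": asyncio.create_task(step("1f", 0.20)),
        "1g": asyncio.create_task(step("1g", 0.20)),
    }

    # GROUP 2 waits only for 1a, 1b and 1d.
    business_info, pages, products = await asyncio.gather(
        group1["1a"], group1["1b"], group1["1d"]
    )

    # GROUP 2: all three tasks run in parallel.
    await asyncio.gather(
        step("2a", 0.01, (business_info, pages)),
        step("2b", 0.01, products),
        step("2c", 0.01, products),
    )

    # Finally, wait for the GROUP 1 tasks that GROUP 2 did not depend on.
    await asyncio.gather(*(group1[key] for key in ("1c", "1e", "1f", "1g")))
    return {"tasks": len(events)}
```

Because 1c, 1e, 1f and 1g are never awaited before GROUP 2 starts, a slow competitor search or CloudFront setup does not delay AI website or prompt generation.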

## Prompt Generation Strategy

All prompts are now tied to products (no business-level prompts):

| Metric                      | Value                             |
| --------------------------- | --------------------------------- |
| **Prompts per product**     | 5 (default)                       |
| **Minimum total prompts**   | 50 during onboarding              |
| **New products (via cron)** | 5 prompts each                    |
| **Daily sampling**          | 10 prompts for visibility scoring |

If a store has fewer than 10 products, the number of prompts per product is raised above the default of 5 so that the total reaches at least 50.
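One way to express this rule is to round up the per-product count needed to hit the minimum. The rounding is an assumption, but it agrees with the sample log further down (8 products × 7 prompts = 56). The function and constant names are illustrative.

```python
import math

DEFAULT_PROMPTS_PER_PRODUCT = 5
MIN_TOTAL_PROMPTS = 50


def prompts_per_product(product_count: int) -> int:
    """Return how many prompts to generate for each product."""
    if product_count <= 0:
        return 0  # nothing to generate prompts for
    # Smallest per-product count whose total reaches the minimum.
    needed = math.ceil(MIN_TOTAL_PROMPTS / product_count)
    return max(DEFAULT_PROMPTS_PER_PRODUCT, needed)
```

For example, `prompts_per_product(8)` gives 7, while any store with 10 or more products stays at the default of 5.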

## Service Details

### Discover Business Info Service (`shared/discover_business_info/service.py`)

```python
async def discover_business_info(
    url: str,
    business_name: str
) -> dict:
    ...
```

1. Calls Firecrawl Agent API with the business URL
2. Extracts: description, products\_services, target\_market, key\_features, value\_proposition
3. Returns structured dict for AI website generation
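A caller may want to confirm that the returned dict carries all five extracted fields before handing it to GROUP 2a. The check below is a hypothetical helper. It assumes the dict keys match the field names listed above and treats empty values as missing.

```python
REQUIRED_FIELDS = (
    "description",
    "products_services",
    "target_market",
    "key_features",
    "value_proposition",
)


def missing_business_info_fields(info: dict) -> list:
    """Return the names of required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not info.get(field)]
```

An empty result means the business info is complete enough to pass on.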

### Discover Products Service (`shared/products/discover.py`)

```python
async def discover_products(
    business_id: str,
    source_url: str,
    business_name: str,
    parent_entity_id: str,
    generate_prompts: bool = False
) -> dict:
    ...
```

1. Fetches products from `{source_url}/products.json`
2. Paginates through all pages of products
3. Extracts product title, description, URL, handle, variants
4. Filters out existing products
5. Saves new products to entities table
6. Returns products list for GROUP 2
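The fetch, paginate, extract and filter steps can be sketched as follows. This is a synchronous illustration, not the service's code. The `limit` and `page` query parameters, the `body_html` field name, and the `/products/{handle}` URL pattern are assumptions about the storefront `products.json` response, and `fetch_all_products` and `filter_new` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Callable

PAGE_SIZE = 250  # assumed page size


def fetch_json(url: str) -> dict:
    """Default fetcher: GET a URL and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_all_products(
    store_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: int = 100,
) -> list:
    """Walk products.json page by page until an empty page is returned."""
    base = store_url.rstrip("/")
    products = []
    for page in range(1, max_pages + 1):
        data = fetch(f"{base}/products.json?limit={PAGE_SIZE}&page={page}")
        batch = data.get("products", [])
        if not batch:
            break  # no more pages
        for item in batch:
            products.append({
                "title": item.get("title"),
                "description": item.get("body_html"),
                "handle": item.get("handle"),
                "url": f"{base}/products/{item.get('handle')}",
                "variants": item.get("variants", []),
            })
    return products


def filter_new(products: list, existing_handles: set) -> list:
    """Drop products whose handle is already stored."""
    return [p for p in products if p["handle"] not in existing_handles]
```

Passing the fetcher as a parameter keeps the pagination logic testable without network access; `max_pages` is a safety cap in case a store never returns an empty page.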

### Product Prompts Service (`tasks/prompts.py`)

```python
async def run_product_prompts(
    url: str,
    org_slug: str,
    discovered_products: list,
    pages: list,
    prompts_per_product: int = 5
) -> StepResult:
    ...
```

1. Calculates the number of prompts per product needed to reach a minimum of 50 total
2. Generates prompts for each product using Gemini 3 Flash
3. Saves to `entity_prompts_tracker` table
4. Returns count of generated/saved prompts

### AI Website Service (`shared/ai_website/`)

```python
from typing import List

async def create_ai_website_from_business_info(
    url: str,
    business_id: str,
    business_info: dict,
    pages: List[dict]
) -> dict:
    ...
```

1. Uses business\_info from GROUP 1a for LLM content generation
2. Runs 3 parallel Gemini calls for llms.txt, Q\&A pages, data.json
3. Uses scraped pages from GROUP 1b for markdown replica generation only
4. Deploys to Vercel
5. Assigns `*.searchcompany.dev` subdomain

## Prerequisites

Before calling this endpoint, you must:

1. Create a Clerk organization
2. Call `POST /api/business` to create the entity

The entity must exist before `generate-all` runs.

## Monitoring Progress

Check backend logs to monitor progress:

```
🚀 GENERATE ALL: Starting onboarding for my-business-abc123
   URL: https://mystore.com
   Business: My Business

🏢 GROUP 1a: Discovering business info (Firecrawl agent)...
📡 GROUP 1b: Scraping website (for replicas)...
🔍 GROUP 1c: Discovering competitors (parallel)...
🛍️ GROUP 1d: Discovering products from Shopify (parallel)...
🎨 GROUP 1e: Fetching favicon (parallel)...
📊 GROUP 1f: Materializing score (parallel)...
☁️ GROUP 1g: Setting up CloudFront (parallel)...

✅ GROUP 1a Complete: Business info discovered
✅ GROUP 1b Complete: 15 pages scraped
✅ GROUP 1d Complete: 8 products discovered

⚡ GROUP 2: Running AI Website + Product tasks in parallel...
   📊 [GROUP 2] 8 products × 7 prompts = 56 total (min 50)

   ✅ create_ai_website: https://my-business-abc123.searchcompany.dev
   ✅ product_prompts: 56 total for 8 products
   ✅ generate_product_llms_txt: 8 files deployed

⏳ Waiting for GROUP 1c, 1e, 1f, 1g to complete...
   ✅ discover_competitors: 10 found
   ✅ favicon: stored
   ✅ materialize_score: 72
   ✅ setup_cloudfront: d1234567890abc.cloudfront.net

🏁 GENERATE ALL: Complete for my-business-abc123
   ✅ Success: 10/10 tasks
```
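When logs are collected as text, the final summary line can be checked programmatically. The sketch below assumes the summary always follows the `Success: N/M tasks` pattern shown in the sample output; `parse_summary` and `all_tasks_succeeded` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple

# Matches the summary line from the sample log, e.g. "Success: 10/10 tasks".
SUMMARY_RE = re.compile(r"Success:\s*(\d+)/(\d+)\s+tasks")


def parse_summary(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (succeeded, total) from the log, or None if no summary yet."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def all_tasks_succeeded(log_text: str) -> bool:
    """True only when a summary exists and every task succeeded."""
    summary = parse_summary(log_text)
    return summary is not None and summary[0] == summary[1]
```

A `None` result from `parse_summary` means the run has not finished, or the summary line has not reached the log output yet.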

## File Structure

```
src/app/
├── shared/                     # Shared services (used by onboarding + cron)
│   ├── discover_business_info/ # NEW: Firecrawl agent for business info
│   │   ├── __init__.py
│   │   └── service.py          # discover_business_info function
│   ├── scraping/               # Custom mapper + Firecrawl batch scrape
│   ├── mapping/                # Custom website mapper (URL discovery)
│   ├── ai_website/             # AI Website Service
│   │   ├── service.py          # create_ai_website_from_business_info
│   │   └── llm_organize.py     # organize_with_llm_from_business_info
│   ├── products/               # Product discovery + llms generation
│   │   ├── discover.py         # discover_products service (Shopify API)
│   │   └── generate_llms_txt.py # generate_product_llms_txt service
│   ├── prompts/                # Prompts Service
│   ├── cloudfront/             # CloudFront Service
│   └── content_hasher/         # Markdown hash storage
│
└── apis/onboarding/
    ├── generate_all/
    │   ├── routes.py           # Main endpoint & orchestrator
    │   ├── scrape_website.py   # GROUP 1b: Scrape wrapper
    │   ├── models.py           # Pydantic models
    │   └── tasks/              # Task wrappers
    │       ├── business_info.py # GROUP 1a: NEW
    │       ├── ai_website.py   # GROUP 2a
    │       ├── discover_products.py # GROUP 1d
    │       ├── prompts.py      # GROUP 2b
    │       ├── product_llms.py # GROUP 2c
    │       ├── favicon.py      # GROUP 1e
    │       ├── scoring.py      # GROUP 1f
    │       └── cloudfront.py   # GROUP 1g
    │
    └── services/
        └── discover_competitors/  # GROUP 1c: Competitor discovery
```
