# Overview

> How the Generate All orchestrator coordinates all onboarding services

## Architecture

The `generate-all` endpoint is a **thin orchestrator** that coordinates internal services. It returns immediately and runs all tasks in the background.
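The "return immediately, work in the background" behaviour can be sketched with plain `asyncio`. This is a minimal illustration, not the code in `routes.py`: `handle_generate_all`, `run_pipeline`, and the response shape are hypothetical names chosen for the example.

```python
import asyncio

# Hypothetical names throughout; the real handler lives in routes.py.
_background: set[asyncio.Task] = set()
completed: list[str] = []


async def run_pipeline(org_slug: str) -> None:
    """Stand-in for the real Group 1 and Group 2 work."""
    await asyncio.sleep(0.01)
    completed.append(org_slug)


async def handle_generate_all(org_slug: str) -> dict:
    """Schedule the pipeline and respond without awaiting it."""
    task = asyncio.create_task(run_pipeline(org_slug))
    # Keep a strong reference so the task is not garbage-collected mid-run.
    _background.add(task)
    task.add_done_callback(_background.discard)
    return {"status": "started", "org_slug": org_slug}


async def demo() -> tuple[dict, bool]:
    response = await handle_generate_all("acme")
    returned_before_done = "acme" not in completed
    await asyncio.gather(*_background)  # only so the demo exits cleanly
    return response, returned_before_done
```

Calling `asyncio.run(demo())` shows the handler's response is available before the pipeline has recorded any work.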

```mermaid
flowchart TD
    Frontend["Frontend"]
    GenerateAll["POST /api/onboarding/generate-all"]

    subgraph group1 [GROUP 1 - All Parallel]
        BusinessInfo["1a: Discover Business Info"]
        Scrape["1b: Scrape Website"]
        Competitors["1c: Discover Competitors"]
        Products["1d: Discover Products"]
        Favicon["1e: Fetch Favicon"]
        Score["1f: Materialize Score"]
        Cloudfront["1g: Setup CloudFront"]
    end

    subgraph group2 [GROUP 2 - After 1a + 1b + 1d]
        AIWebsite["2a: Create AI Website"]
        ProductPrompts["2b: Product Prompts"]
        ProductLlms["2c: Generate Product LLMs"]
    end

    Frontend --> GenerateAll
    GenerateAll --> BusinessInfo
    GenerateAll --> Scrape
    GenerateAll --> Competitors
    GenerateAll --> Products
    GenerateAll --> Favicon
    GenerateAll --> Score
    GenerateAll --> Cloudfront

    BusinessInfo -->|"business_info"| AIWebsite
    Scrape -->|"pages[] for replicas"| AIWebsite
    Products -->|"products[]"| AIWebsite

    Products -->|"products[]"| ProductPrompts
    Products -->|"products[]"| ProductLlms
    Scrape -->|"pages[]"| ProductLlms
```

## Execution Groups

| Group  | Services               | Waits For  | Purpose                                                 |
| ------ | ---------------------- | ---------- | ------------------------------------------------------- |
| **1a** | Discover Business Info | Nothing    | Firecrawl agent for business context                    |
| **1b** | Scrape Website         | Nothing    | Get pages for markdown replicas                         |
| **1c** | Discover Competitors   | Nothing    | Fire-and-forget, doesn't block                          |
| **1d** | Discover Products      | Nothing    | Uses Shopify products.json API                          |
| **1e** | Fetch Favicon          | Nothing    | Fire-and-forget, doesn't block                          |
| **1f** | Materialize Score      | Nothing    | Fire-and-forget, doesn't block                          |
| **1g** | Setup CloudFront       | Nothing    | Fire-and-forget, doesn't block                          |
| **2a** | Create AI Website      | 1a, 1b, 1d | Uses business\_info for LLM content, pages for replicas |
| **2b** | Product Prompts        | 1d         | Uses discovered products                                |
| **2c** | Generate Product LLMs  | 1b, 1d     | Uses products and pages                                 |

<Info>
  **Key Architecture Change**: Business info from Firecrawl agent (GROUP 1a) is used for LLM content generation (llms.txt, Q\&A pages, data.json). Scraped pages (GROUP 1b) are ONLY used for markdown replica generation.
</Info>

<Info>
  **GROUP 2 runs in parallel**: tasks 2a, 2b, and 2c run concurrently, each starting once its own dependencies are ready.
</Info>
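One way to express these dependencies is to wrap every Group 1 service in a task and have each Group 2 coroutine await only the tasks it needs. The following is a sketch under that assumption: the function names and stubbed return values are hypothetical, and the real orchestrator may gate Group 2 differently.

```python
import asyncio

started: list[str] = []  # records the order in which tasks begin


async def service(name: str, delay: float, result: object) -> object:
    """Stub for a Group 1 service; sleeps instead of doing real work."""
    started.append(name)
    await asyncio.sleep(delay)
    return result


async def group2(name: str, *deps: asyncio.Task) -> dict:
    """Wait only for this task's own dependencies, then run."""
    inputs = await asyncio.gather(*deps)
    started.append(name)
    return {"task": name, "inputs": list(inputs)}


async def orchestrate() -> list[dict]:
    # Group 1: everything is scheduled up front and runs concurrently.
    info = asyncio.create_task(service("1a", 0.06, {"name": "Acme"}))
    pages = asyncio.create_task(service("1b", 0.03, ["/", "/about"]))
    products = asyncio.create_task(service("1d", 0.01, ["widget"]))
    # 1c, 1e, 1f, 1g: nothing downstream waits on them.
    side = [asyncio.create_task(service(n, 0.01, None))
            for n in ("1c", "1e", "1f", "1g")]

    # Group 2: each task lists the dependencies from the "Waits For" column.
    results = await asyncio.gather(
        group2("2a", info, pages, products),
        group2("2b", products),
        group2("2c", pages, products),
    )
    await asyncio.gather(*side)  # the demo waits so it exits cleanly
    return list(results)
```

In this sketch, 2b begins as soon as 1d finishes, even while 1a is still running; 2a is the last to start because it needs all three inputs.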

## Internal Services

Each service is documented on its own page:

| Service                | Group | Input                                    | Output                  | Page                             |
| ---------------------- | ----- | ---------------------------------------- | ----------------------- | -------------------------------- |
| Discover Business Info | 1a    | `url`, `business_name`                   | `business_info{}`       | [View →](discover-business-info) |
| Scrape Website         | 1b    | `url`                                    | `pages[]`               | [View →](scrape-website)         |
| Discover Competitors   | 1c    | `url`, `business_name`                   | `competitors[]`         | [View →](discover-competitors)   |
| Discover Products      | 1d    | `url`, `org_slug`                        | `products[]`            | [View →](discover-products)      |
| Fetch Favicon          | 1e    | `url`, `org_slug`                        | favicon URL             | [View →](fetch-favicon)          |
| Materialize Score      | 1f    | `url`, `org_slug`                        | visibility score        | [View →](materialize-score)      |
| Setup CloudFront       | 1g    | `url`, `org_slug`                        | CloudFront distribution | [View →](setup-cloudfront)       |
| Create AI Website      | 2a    | `business_info`, `pages[]`, `products[]` | AI site URL             | [View →](create-ai-website)      |
| Product Prompts        | 2b    | `products[]`, `business_name`            | 5+ prompts/product      | [View →](product-prompts)        |
| Generate Product LLMs  | 2c    | `products[]`, `pages[]`, `business_name` | llms.txt files          | [View →](generate-product-llms)  |

## Prompt Generation

All prompts are now tied to products:

| Metric                      | Value          |
| --------------------------- | -------------- |
| **Prompts per product**     | 5 (default)    |
| **Minimum total prompts**   | 50             |
| **New products (via cron)** | 5 prompts each |
| **Daily sampling**          | 10 prompts     |

During onboarding, if fewer than 10 products are discovered, the number of prompts per product is raised above the default of 5 so that the total is at least 50 (10 products × 5 prompts is the break-even point).
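The arithmetic behind that rule can be written as a short sketch. The rounding is an assumption (the source does not say how a shortfall is distributed across products), and `prompts_per_product` is a hypothetical helper, not a function from the codebase.

```python
import math

DEFAULT_PROMPTS_PER_PRODUCT = 5
MIN_TOTAL_PROMPTS = 50


def prompts_per_product(product_count: int) -> int:
    """Return prompts per product so the total reaches the minimum.

    Assumption: round up, so every product gets the same number of prompts.
    """
    if product_count <= 0:
        return 0  # nothing to generate prompts for
    needed = math.ceil(MIN_TOTAL_PROMPTS / product_count)
    return max(DEFAULT_PROMPTS_PER_PRODUCT, needed)


for n in (1, 4, 10, 25):
    per = prompts_per_product(n)
    print(f"{n} products -> {per} prompts each, {n * per} total")
```

Under this rounding, 4 products get 13 prompts each (52 total), while 10 or more products keep the default of 5.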

## Data Flow

```
GROUP 1 (all parallel):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ business_info│  │   scrape     │  │ competitors  │  │   products   │
│   (1a)       │  │    (1b)      │  │    (1c)      │  │    (1d)      │
└──────┬───────┘  └──────┬───────┘  └──────────────┘  └──────┬───────┘
       │                 │                                    │
       │                 │                                    │
       ▼                 ▼                                    ▼
┌──────────────────────────────────────────────────────────────────────┐
│                          GROUP 2 (parallel)                          │
├──────────────────────────┬───────────────────┬───────────────────────┤
│      ai_website (2a)     │   prompts (2b)    │   llms_txt (2c)       │
│  uses: business_info     │  uses: products   │  uses: products       │
│        pages (replicas)  │                   │        pages          │
│        products          │                   │        business_name  │
└──────────────────────────┴───────────────────┴───────────────────────┘

Also running in parallel (GROUP 1e-1g):
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   favicon    │  │   scoring    │  │  cloudfront  │
└──────────────┘  └──────────────┘  └──────────────┘
```

## Code Location

```
src/app/apis/onboarding/
├── generate_all/
│   ├── routes.py              # Main orchestrator (all groups)
│   ├── models.py              # Pydantic models
│   ├── scrape_website.py      # GROUP 1b: Scrape service wrapper
│   └── tasks/                 # Individual task wrappers
│       ├── business_info.py   # GROUP 1a: NEW
│       ├── ai_website.py      # GROUP 2a
│       ├── cloudfront.py      # GROUP 1g
│       ├── discover_products.py # GROUP 1d
│       ├── favicon.py         # GROUP 1e
│       ├── prompts.py         # GROUP 2b
│       ├── product_llms.py    # GROUP 2c
│       └── scoring.py         # GROUP 1f
└── services/
    └── discover_competitors/  # GROUP 1c: Competitor discovery service

src/app/shared/
├── discover_business_info/    # NEW: Firecrawl agent for business info
│   ├── __init__.py
│   └── service.py
├── ai_website/                # AI Website service
│   ├── service.py             # create_ai_website_from_business_info
│   └── llm_organize.py        # organize_with_llm_from_business_info
└── products/                  # Shared services used by both onboarding and cron
    ├── discover.py            # Product discovery via Shopify products.json
    └── generate_llms_txt.py   # Product llms.txt generation service
```

## Testing Individual Services

Each service can be tested independently via pytest:

```bash
# Test the orchestrator
uv run pytest src/pytests/onboarding/test_generate_all.py -v

# Test shared services used by generate-all
uv run pytest src/pytests/cron/test_generate_prompts.py -v
```
