# Overview

> Technical overview of the Regenerate Fresh Website endpoint

# POST /api/cron/regenerate-fresh-website

Performs a full rebuild of an existing AI site using the **new onboarding flow**, in which the Firecrawl agent extracts the business info.

## Purpose

Use this endpoint when you want to:

* Regenerate content with fresh LLM output
* Fix issues with an existing AI site
* Test changes to the generation pipeline

This differs from `update-site`, which only applies incremental updates when the source website's content changes.

## Architecture

```mermaid theme={null}
flowchart TD
    A[Request] --> B[Get Entity Info]
    B --> C{Parallel Step 1}
    
    subgraph parallel [Step 1 - Parallel]
        C1[Discover Business Info]
        C2[Scrape Website]
    end
    
    C --> C1
    C --> C2
    
    C1 --> D[LLM Organize]
    C2 --> E[Hash Pages]
    C2 --> F[Markdown Replicas]
    
    E --> D
    D --> G[Generate All Files]
    F --> G
    G --> H[Deploy to Vercel]
    H --> I[Assign Domain]
    I --> J[Store Page Hashes]
    J --> K[Submit to IndexNow]
    K --> L[Return Success]
```

<Info>
  **Key Change**: Business info from the Firecrawl agent drives LLM content generation. Scraped pages are used only for markdown replicas and page hashing, not as LLM input.
</Info>

## Request Body

| Field         | Type    | Required | Description                                             |
| ------------- | ------- | -------- | ------------------------------------------------------- |
| `business_id` | string  | Yes      | Org slug (e.g., "the-ai-teddy-bear-company-1767082986") |
| `url`         | string  | No       | Source URL (fetched from DB if not provided)            |
| `max_pages`   | integer | No       | Max pages to scrape (default: 5000)                     |
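
The fields above could be modelled on the client side roughly as follows. This is a minimal sketch using only the standard library: the class name `RegenerateRequest`, the `to_payload` helper, and the positive-integer check on `max_pages` are assumptions made for illustration and are not taken from the service code. The default of 5000 mirrors the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegenerateRequest:
    """Hypothetical client-side model of the request body (not the server's schema)."""

    business_id: str               # required: org slug
    url: Optional[str] = None      # optional: the server fetches the URL from the DB if omitted
    max_pages: int = 5000          # optional: default taken from the table above

    def __post_init__(self) -> None:
        if not self.business_id:
            raise ValueError("business_id is required")
        # Assumption: a non-positive page limit is treated as a client error.
        if self.max_pages < 1:
            raise ValueError("max_pages must be a positive integer")

    def to_payload(self) -> dict:
        """Build the JSON body, leaving out `url` when it was not supplied."""
        payload = {"business_id": self.business_id, "max_pages": self.max_pages}
        if self.url is not None:
            payload["url"] = self.url
        return payload
```

Omitting `url` from the payload lets the server fall back to the stored source URL, as described in the table.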

## Pipeline Steps

| Step   | Service                                | Input                                     | Output              |
| ------ | -------------------------------------- | ----------------------------------------- | ------------------- |
| **1a** | `discover_business_info`               | url, business\_name                       | business\_info dict |
| **1b** | `scrape_website`                       | url                                       | pages\[]            |
| **2**  | `fetch_and_hash_batch`                 | page\_urls                                | page\_hashes        |
| **3**  | `organize_with_llm_from_business_info` | **business\_info**                        | organized\_data     |
| **4**  | `generate_ai_site`                     | organized\_data, **pages** (for replicas) | files\[]            |
| **5**  | `deploy_to_vercel`                     | files                                     | deployment\_url     |
| **6**  | `assign_domain`                        | deployment                                | ai\_site\_url       |
| **7**  | `store_page_hashes`                    | page\_hashes                              | -                   |
| **8**  | `submit_urls_to_indexnow`              | urls                                      | -                   |

<Note>
  Steps 1a and 1b run in **parallel** using `asyncio.gather()`.
</Note>
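
The parallel step can be sketched as below. The two coroutines are stand-ins that only share names and inputs with the services in the table; their bodies and return shapes are assumptions, and `run_step_1` is a hypothetical wrapper rather than the code in `routes.py`.

```python
import asyncio


async def discover_business_info(url: str, business_name: str) -> dict:
    """Stand-in for step 1a; the real service uses the Firecrawl agent."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"name": business_name, "source": url}


async def scrape_website(url: str) -> list:
    """Stand-in for step 1b; the real service returns the scraped pages."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return [{"url": url, "markdown": "# Home"}]


async def run_step_1(url: str, business_name: str) -> tuple:
    # gather() runs both coroutines concurrently and returns their results
    # in the order the awaitables were passed, regardless of which finishes first.
    business_info, pages = await asyncio.gather(
        discover_business_info(url, business_name),
        scrape_website(url),
    )
    return business_info, pages
```

Because `asyncio.gather()` preserves argument order, the results can be unpacked directly into `business_info` (for step 3) and `pages` (for steps 2 and 4). By default, an exception in either coroutine propagates to the caller.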

## Content Sources

| Content Type      | Source              | Why                                       |
| ----------------- | ------------------- | ----------------------------------------- |
| llms.txt          | business\_info (1a) | Focused, structured business context      |
| Q\&A Pages        | business\_info (1a) | Clean Q\&A from business understanding    |
| data.json         | business\_info (1a) | Accurate Schema.org from business context |
| Markdown Replicas | pages (1b)          | 1:1 copy of original website content      |

## Response Fields

| Field             | Type    | Description                      |
| ----------------- | ------- | -------------------------------- |
| `status`          | string  | "success", or an error value on failure |
| `ai_site_url`     | string  | Deployed AI site URL             |
| `source_url`      | string  | Source website URL               |
| `business_id`     | string  | Business identifier              |
| `business_name`   | string  | Business name                    |
| `pages_scraped`   | integer | Number of pages scraped          |
| `files_generated` | integer | Number of files generated        |
| `pages_hashed`    | integer | Number of page hashes stored     |
| `qa_pages`        | integer | Number of Q\&A pages generated   |
| `replica_pages`   | integer | Number of markdown replica pages |
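
A caller might check the response along these lines. This is a sketch only: `summarize_response` is a hypothetical helper, and it assumes that any `status` other than `"success"` should be treated as a failure, since the exact error values are not listed here.

```python
import json

# Count fields from the table above.
COUNT_FIELDS = (
    "pages_scraped",
    "files_generated",
    "pages_hashed",
    "qa_pages",
    "replica_pages",
)


def summarize_response(body: str) -> str:
    """Return a one-line summary of a JSON response body, or raise on failure."""
    data = json.loads(body)
    # Assumption: anything other than "success" is an error.
    if data.get("status") != "success":
        raise RuntimeError(f"regeneration failed: {data.get('status')!r}")
    counts = ", ".join(f"{name}={data.get(name, 0)}" for name in COUNT_FIELDS)
    return f"{data['ai_site_url']} ({counts})"
```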

## Example Request

```bash theme={null}
curl -X POST http://localhost:8000/api/cron/regenerate-fresh-website \
  -H "Content-Type: application/json" \
  -H "X-API-Key: search-company" \
  -d '{
    "business_id": "the-ai-teddy-bear-company-1767082986",
    "url": "https://new-supreme-3.myshopify.com"
  }'
```
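
The same request can be built from Python with the standard library. This is a sketch, not an official client: `build_request` is a hypothetical helper, and the base URL and API key defaults simply repeat the values from the curl example above.

```python
import json
import urllib.request
from typing import Optional


def build_request(
    business_id: str,
    url: Optional[str] = None,
    base_url: str = "http://localhost:8000",
    api_key: str = "search-company",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the endpoint."""
    body = {"business_id": business_id}
    if url is not None:
        body["url"] = url  # optional: the server fetches it from the DB if omitted
    return urllib.request.Request(
        f"{base_url}/api/cron/regenerate-fresh-website",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen()` and decode the JSON body, for example `with urllib.request.urlopen(req) as resp: result = json.load(resp)`.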

## Code Location

```
src/app/apis/cron/regenerate_fresh_website/routes.py
```

### Key Imports

```python theme={null}
from src.app.shared.discover_business_info import discover_business_info
from src.app.shared.ai_website import (
    organize_with_llm_from_business_info,
    generate_ai_site,
    deploy_to_vercel,
    assign_domain,
)
```