# Regenerate Fresh Website

> Fully regenerate an AI site from scratch using the new onboarding flow

Performs a complete regeneration of an existing AI site using the **new onboarding flow**, which relies on the Firecrawl agent for business info extraction.

<Warning>
  This endpoint performs a full rebuild. Use with caution in production.
</Warning>

## When to Use

* Regenerate content with fresh LLM output
* Fix issues with an existing AI site
* Test changes to the generation pipeline

This differs from `update-site`, which only performs incremental updates when the source website content changes.

## New Architecture

This endpoint now uses the **same flow as onboarding**:

```mermaid theme={null}
flowchart TD
    Request["POST /regenerate-fresh-website"]
    
    subgraph parallel [Step 1 - Parallel]
        BusinessInfo["Discover Business Info<br/>(Firecrawl Agent)"]
        Scrape["Scrape Website<br/>(for replicas)"]
    end
    
    subgraph sequential [Steps 2-8 - Sequential]
        Hash["Hash Pages"]
        LLM["LLM Organize<br/>(from business_info)"]
        Generate["Generate Files"]
        Deploy["Deploy to Vercel"]
        Domain["Assign Domain"]
        Store["Store Hashes"]
        IndexNow["Submit to IndexNow"]
    end
    
    Request --> BusinessInfo
    Request --> Scrape
    BusinessInfo --> LLM
    Scrape --> Hash
    Scrape -->|"pages for replicas"| Generate
    Hash --> LLM
    LLM --> Generate
    Generate --> Deploy
    Deploy --> Domain
    Domain --> Store
    Store --> IndexNow
```

<Info>
  **Key Change**: Business info from the Firecrawl agent drives LLM content generation (llms.txt, Q\&A pages, data.json). Scraped pages are used ONLY for markdown replica generation.
</Info>
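
The flow above can be sketched with `asyncio`. This is a minimal illustration and not the code in `routes.py`: every function below is a hypothetical stub, and only the ordering (two parallel tasks, then seven sequential steps) is taken from the diagram.

```python
import asyncio

# Hypothetical stubs: names, signatures, and return values are illustrative only.
async def discover_business_info_stub(url: str) -> dict:
    return {"url": url, "description": "stub description"}

async def scrape_website_stub(url: str, max_pages: int) -> list:
    pages = [{"url": url, "html": "<html></html>", "markdown": "# Home"}]
    return pages[:max_pages]

async def regenerate(url: str, max_pages: int = 5000) -> dict:
    steps = []

    # Step 1: business info discovery and scraping run concurrently.
    business_info, pages = await asyncio.gather(
        discover_business_info_stub(url),
        scrape_website_stub(url, max_pages),
    )
    steps.append("step_1_parallel")

    # Steps 2-8 run one after another; here each step is only recorded.
    for step in (
        "hash_pages",      # consumes pages
        "llm_organize",    # consumes business_info, not pages
        "generate_files",  # consumes LLM output plus pages for replicas
        "deploy",
        "assign_domain",
        "store_hashes",
        "indexnow",
    ):
        steps.append(step)

    return {
        "steps": steps,
        "pages_scraped": len(pages),
        "has_business_info": bool(business_info),
    }
```

Calling `asyncio.run(regenerate("https://example.com"))` returns the recorded step order, which makes the parallel-then-sequential shape easy to inspect.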

## Request Body

<ParamField body="business_id" type="string" required>
  The org\_slug / business ID (e.g. "the-ai-teddy-bear-company-1767082986")
</ParamField>

<ParamField body="url" type="string">
  Source website URL. Optional; falls back to the URL stored in the database when omitted.
</ParamField>

<ParamField body="max_pages" type="integer">
  Maximum number of pages to scrape. Default: 5000
</ParamField>

## Response

<ResponseField name="status" type="string">
  "success" or "error"
</ResponseField>

<ResponseField name="ai_site_url" type="string">
  The AI site URL (unchanged by regeneration)
</ResponseField>

<ResponseField name="business_name" type="string">
  The business name from the database
</ResponseField>

<ResponseField name="pages_scraped" type="integer">
  Number of pages scraped from the source website
</ResponseField>

<ResponseField name="files_generated" type="integer">
  Number of files generated
</ResponseField>

<ResponseField name="qa_pages" type="integer">
  Number of Q\&A pages generated
</ResponseField>

<ResponseField name="replica_pages" type="integer">
  Number of markdown replica pages generated
</ResponseField>

<ResponseField name="source_url" type="string">
  The source website URL that was scraped
</ResponseField>

<ResponseField name="business_id" type="string">
  The business ID from the request
</ResponseField>

<ResponseField name="pages_hashed" type="integer">
  Number of pages hashed for change detection
</ResponseField>

<RequestExample>
  ```bash cURL theme={null}
  curl -X POST https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website \
    -H "Content-Type: application/json" \
    -H "X-API-Key: search-company" \
    -d '{
      "business_id": "the-ai-teddy-bear-company-1767082986",
      "url": "https://new-supreme-3.myshopify.com"
    }'
  ```
</RequestExample>
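
A rough Python equivalent of the cURL call, using only the standard library. The endpoint and header values are copied from the example above; the `build_request` helper is a hypothetical name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint copied from the cURL example above.
ENDPOINT = "https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website"

def build_request(business_id, url=None, max_pages=None, api_key="search-company"):
    """Build (but do not send) the POST request. Hypothetical helper."""
    payload = {"business_id": business_id}
    # Optional fields are omitted so the server-side defaults apply.
    if url is not None:
        payload["url"] = url
    if max_pages is not None:
        payload["max_pages"] = max_pages
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

req = build_request(
    "the-ai-teddy-bear-company-1767082986",
    url="https://new-supreme-3.myshopify.com",
)
```

Passing `req` to `urllib.request.urlopen` would send it. Since the endpoint performs a full rebuild, a generous timeout is probably sensible, though the source does not state expected durations.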

<ResponseExample>
  ```json Response theme={null}
  {
    "status": "success",
    "ai_site_url": "https://the-ai-teddy-bear-company-1767082986.searchcompany.dev",
    "source_url": "https://new-supreme-3.myshopify.com",
    "business_id": "the-ai-teddy-bear-company-1767082986",
    "business_name": "The AI Teddy Bear Company",
    "pages_scraped": 15,
    "files_generated": 25,
    "pages_hashed": 15,
    "qa_pages": 8,
    "replica_pages": 15
  }
  ```
</ResponseExample>
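
A small sketch of how a caller might check the response body. It assumes that a failed run still returns JSON with `"status": "error"`; the shape of error responses beyond that field is not documented here, and `summarize_response` is a hypothetical helper.

```python
import json

def summarize_response(body: str) -> str:
    """Return a one-line summary, or raise if the run did not succeed."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(f"Regeneration failed: {data}")
    return (
        f"{data['business_name']}: {data['pages_scraped']} pages scraped, "
        f"{data['files_generated']} files generated "
        f"({data['qa_pages']} Q&A, {data['replica_pages']} replicas)"
    )

# Values taken from the response example above.
example_body = json.dumps({
    "status": "success",
    "business_name": "The AI Teddy Bear Company",
    "pages_scraped": 15,
    "files_generated": 25,
    "qa_pages": 8,
    "replica_pages": 15,
})
summary = summarize_response(example_body)
```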

## Process Steps

| Step   | Action                 | Details                                                                                                      |
| ------ | ---------------------- | ------------------------------------------------------------------------------------------------------------ |
| **1a** | Discover Business Info | Firecrawl agent extracts: description, products\_services, target\_market, key\_features, value\_proposition |
| **1b** | Scrape Website         | Custom mapper + Firecrawl batch scrape for markdown replicas                                                 |
| **2**  | Hash Pages             | Raw HTML hashing for future change detection                                                                 |
| **3**  | LLM Organize           | Three parallel Gemini calls using **business\_info** (not pages)                                             |
| **4**  | Generate Files         | Create all files including markdown replicas from **pages**                                                  |
| **5**  | Deploy                 | Push to existing Vercel project                                                                              |
| **6**  | Assign Domain          | Update domain records if needed                                                                              |
| **7**  | Store Hashes           | Save page hashes for change detection                                                                        |
| **8**  | IndexNow               | Submit all URLs for instant indexing                                                                         |

<Note>
  **Steps 1a and 1b run in parallel** for faster execution. The remaining steps run sequentially.
</Note>
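
Steps 2 and 7 (hashing raw HTML and storing the hashes for change detection) can be illustrated as follows. The hash algorithm and storage format are not specified in this document, so SHA-256 and a plain URL-to-digest dictionary are assumptions, as are the function names.

```python
import hashlib

def hash_pages(pages: list) -> dict:
    """Map each page URL to a hex digest of its raw HTML (SHA-256 is an assumption)."""
    return {
        page["url"]: hashlib.sha256(page["html"].encode("utf-8")).hexdigest()
        for page in pages
    }

def changed_urls(stored: dict, fresh: dict) -> set:
    """URLs that were added, removed, or whose HTML digest differs."""
    return {
        url
        for url in stored.keys() | fresh.keys()
        if stored.get(url) != fresh.get(url)
    }

stored = hash_pages([
    {"url": "/a", "html": "<p>old</p>"},
    {"url": "/b", "html": "<p>same</p>"},
])
fresh = hash_pages([
    {"url": "/a", "html": "<p>new</p>"},
    {"url": "/b", "html": "<p>same</p>"},
])
```

Comparing `stored` against `fresh` is the kind of check an incremental flow such as `update-site` could use to decide which pages need work.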

## Content Generation Sources

| Content               | Source                    | Why                                       |
| --------------------- | ------------------------- | ----------------------------------------- |
| **llms.txt**          | Business Info (Firecrawl) | Focused, structured business context      |
| **Q\&A Pages**        | Business Info (Firecrawl) | Clean Q\&A from business understanding    |
| **data.json**         | Business Info (Firecrawl) | Accurate Schema.org from business context |
| **Markdown Replicas** | Scraped Pages             | 1:1 copy of original website content      |
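
The three LLM-generated outputs in the table all derive from business info alone. A hedged sketch of running three calls in parallel is shown below; `fake_llm_call` stands in for a Gemini request, and the one-call-per-output split is an assumption rather than something this document confirms.

```python
import asyncio

async def fake_llm_call(kind: str, business_info: dict) -> dict:
    # Stand-in for a Gemini request; returns a tagged placeholder result.
    await asyncio.sleep(0)
    return {
        "kind": kind,
        "source": "business_info",
        "business": business_info["business_name"],
    }

async def organize_sketch(business_info: dict) -> dict:
    kinds = ("llms_txt", "qa_pages", "data_json")
    # gather() returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(
        *(fake_llm_call(kind, business_info) for kind in kinds)
    )
    return dict(zip(kinds, results))
```

Note that scraped pages never appear as an input here; they are reserved for the markdown replicas.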

## Files Generated

### LLM-Generated Content (from business\_info)

| File               | Source                  | Purpose                      |
| ------------------ | ----------------------- | ---------------------------- |
| `public/llms.txt`  | Gemini + business\_info | Primary AI-readable content  |
| `pages/index.js`   | Gemini + business\_info | Homepage with Q\&A structure |
| `public/data.json` | Gemini + business\_info | Schema.org structured data   |

### Q\&A Pages (from business\_info)

| File Pattern      | Purpose                                                  |
| ----------------- | -------------------------------------------------------- |
| `pages/{slug}.js` | AI-generated Q\&A pages (e.g., `/what-is-teddy-bear-ai`) |

### Markdown Replica Pages (from scraped pages)

| File Pattern                | Purpose                               |
| --------------------------- | ------------------------------------- |
| `pages/{path}.js`           | Next.js page for each scraped URL     |
| `public/markdown/{path}.md` | Markdown content for each scraped URL |
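
The two replica file patterns can be derived from a scraped URL roughly as follows. The helper name is hypothetical. This document does not say how the site root is handled (`pages/index.js` is already the generated homepage), so the `home` slug below is a placeholder assumption.

```python
from urllib.parse import urlparse

def replica_paths(page_url: str) -> tuple:
    """Return (Next.js page path, markdown file path) for a scraped URL."""
    # Assumption: the root URL maps to a placeholder "home" slug.
    path = urlparse(page_url).path.strip("/") or "home"
    return f"pages/{path}.js", f"public/markdown/{path}.md"

paths = replica_paths("https://new-supreme-3.myshopify.com/products/bear")
```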

### Static Templates

| File                        | Purpose                                    |
| --------------------------- | ------------------------------------------ |
| `public/robots.txt`         | Crawler permissions (allows all bots)      |
| `public/sitemap.xml`        | Site structure for search engines          |
| `middleware.js`             | Edge middleware for tracking AI bot visits |
| `package.json`              | Next.js dependencies                       |
| `next.config.js`            | Next.js configuration                      |
| `public/search-company.txt` | IndexNow key verification file             |
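
For context on the key verification file, an IndexNow submission is a JSON body with `host`, `key`, `keyLocation`, and `urlList` fields. The sketch below builds such a body; it assumes `public/search-company.txt` is served from the site root, and the key value is passed in because this document does not state it.

```python
from urllib.parse import urlparse

def indexnow_payload(site_url: str, key: str, paths: list) -> dict:
    """Build an IndexNow JSON body (hypothetical helper; nothing is sent)."""
    host = urlparse(site_url).netloc
    base = f"https://{host}"
    return {
        "host": host,
        "key": key,
        # Assumption: files under public/ are served from the site root.
        "keyLocation": f"{base}/search-company.txt",
        "urlList": [f"{base}/{p.lstrip('/')}" for p in paths],
    }

payload = indexnow_payload(
    "https://the-ai-teddy-bear-company-1767082986.searchcompany.dev",
    key="example-key",  # placeholder, not the real key
    paths=["/", "/llms.txt"],
)
```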

## Business Info Schema

The Firecrawl agent extracts this structured data:

```json theme={null}
{
  "description": "2-3 sentence description of what the company does",
  "products_services": "Overview of their main products or services",
  "target_market": "Who their target customers are",
  "key_features": "Key features, capabilities, or differentiators",
  "value_proposition": "The core value they provide to customers",
  "business_name": "The AI Teddy Bear Company",
  "url": "https://new-supreme-3.myshopify.com"
}
```
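
One way to represent this schema in Python is a `TypedDict` with a small completeness check. Treating every field as a required, non-empty string is an assumption; the names `BusinessInfo` and `missing_fields` are illustrative and may not match the backend.

```python
from typing import TypedDict

class BusinessInfo(TypedDict):
    description: str
    products_services: str
    target_market: str
    key_features: str
    value_proposition: str
    business_name: str
    url: str

REQUIRED_KEYS = frozenset(BusinessInfo.__annotations__)

def missing_fields(info: dict) -> list:
    """Return the sorted names of fields that are absent or blank."""
    return sorted(
        key for key in REQUIRED_KEYS
        if not str(info.get(key, "")).strip()
    )

partial = {
    "business_name": "The AI Teddy Bear Company",
    "url": "https://new-supreme-3.myshopify.com",
    "description": "   ",
}
gaps = missing_fields(partial)
```

A check like this could run before the LLM step so that an incomplete extraction is caught early.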

## Code Location

```
Backend/src/app/apis/cron/regenerate_fresh_website/routes.py
```

### Key Imports

```python theme={null}
from src.app.shared.discover_business_info import discover_business_info
from src.app.shared.ai_website import (
    organize_with_llm_from_business_info,
    generate_ai_site,
    deploy_to_vercel,
    assign_domain,
)
```
