Performs a complete regeneration of an existing AI site using the same
pipeline as initial onboarding, but deploys to the existing Vercel project.
This endpoint performs a full rebuild. Use with caution in production.
## When to Use
- Regenerate content with fresh LLM output
- Fix issues with an existing AI site
- Test changes to the generation pipeline
This differs from `update-site`, which performs only incremental updates when the source website content changes.
## Request Body

- `business_id`: the org_slug / business ID (e.g. `website-arena-1766312513`)
- Source website URL: optional; if omitted, the URL stored in the database is used
- Maximum pages to scrape: default is 5000
## Response

- `status`: `success` or `error`
- `ai_site_url`: the AI site URL (unchanged from before the rebuild)
- `source_url`: the source website URL that was scraped
- `business_id`: the business ID that was regenerated
- `pages_scraped`: number of pages scraped from the source site
- `files_generated`: number of files generated (always 11)
- `pages_hashed`: number of pages hashed for change detection
Example request:

```bash
curl -X POST https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_INTERNAL_API_KEY" \
  -d '{
    "business_id": "website-arena-1766312513"
  }'
```
Example response:

```json
{
  "status": "success",
  "ai_site_url": "https://website-arena-1766312513.searchcompany.dev",
  "source_url": "https://www.websitearena.dev",
  "business_id": "website-arena-1766312513",
  "pages_scraped": 15,
  "files_generated": 11,
  "pages_hashed": 15
}
```
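If you are calling the endpoint from a script instead of curl, a minimal Python equivalent might look like this (the URL, headers, and body mirror the curl example above; the timeout value is a guess, since a full rebuild can take a while):

```python
import requests

resp = requests.post(
    "https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website",
    headers={"x-api-key": "YOUR_INTERNAL_API_KEY"},  # same key as in the curl example
    json={"business_id": "website-arena-1766312513"},
    timeout=600,  # assumption: full rebuilds are slow, so allow several minutes
)
resp.raise_for_status()
print(resp.json()["ai_site_url"])
```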
## Process

- Scrape the source website using the custom mapper + Firecrawl batch scrape
- Hash all pages (raw HTML) for future change detection (see the sketch after this list)
- Three parallel Gemini 3 Flash calls for content generation
- Generate all files from scratch (including markdown replicas)
- Deploy to the same Vercel project
- Assign domain (if needed)
- Store page hashes in database
- Submit to IndexNow for instant search engine indexing
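The hashing step above stores a fingerprint of each scraped page so later runs can tell whether the source content changed. A minimal sketch of the idea, assuming SHA-256 over the raw HTML bytes (the actual digest and storage schema are not specified in this doc):

```python
import hashlib

def hash_page(raw_html: str) -> str:
    # Stable fingerprint of a page's raw HTML; SHA-256 is an assumption here.
    return hashlib.sha256(raw_html.encode("utf-8")).hexdigest()

# `scraped_pages` is a hypothetical {url: raw_html} mapping from the scrape step.
scraped_pages = {
    "https://www.websitearena.dev/": "<html>...</html>",
}
page_hashes = {url: hash_page(html) for url, html in scraped_pages.items()}
```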
**Product LLMs Not Regenerated:** Product-specific /llms/{product-slug}.txt files are NOT regenerated here. They are only created when new products are discovered by the discover-products-from-changes cron job.
## Files Generated
Files are generated dynamically based on the website content:
### LLM-Generated Content (from 3 Gemini calls)

| File | Source |
|---|---|
| public/llms.txt | Gemini Call 1 - Primary AI-readable content (Markdown) |
| pages/index.js | Gemini Call 2 - Homepage HTML wrapped in Next.js |
| public/data.json | Gemini Call 3 - Schema.org structured data (JSON-LD) |
### Markdown Replica Pages (Dynamic)

| File Pattern | Purpose |
|---|---|
| pages/{path}.js | Next.js page for each scraped URL |
| public/markdown/{path}.md | Markdown content for each scraped URL |
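To illustrate the pattern, here is a hypothetical helper showing how a scraped URL could map to its two replica files; the function name and slugging rules are illustrative, not the pipeline's actual code:

```python
from urllib.parse import urlparse

def replica_paths(url: str) -> tuple[str, str]:
    # Hypothetical mapping from a scraped URL to its Next.js page and markdown file.
    path = urlparse(url).path.strip("/")
    return f"pages/{path}.js", f"public/markdown/{path}.md"

# e.g. ("pages/pricing.js", "public/markdown/pricing.md")
print(replica_paths("https://www.websitearena.dev/pricing"))
```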
### Q&A Pages (Dynamic)

| File Pattern | Purpose |
|---|---|
| pages/{slug}.js | AI-generated Q&A pages (e.g., /what-is-teddy-bear-ai) |
### Static Templates

| File | Purpose |
|---|---|
| public/robots.txt | Crawler permissions (allows all bots) |
| public/sitemap.xml | Site structure for search engines |
| middleware.js | Edge middleware for tracking AI bot visits |
| package.json | Next.js dependencies |
| next.config.js | Next.js configuration |
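For reference, a permissive robots.txt of the kind described above (all crawlers allowed) typically looks like the following; the exact file the pipeline generates is not shown in this doc, and the sitemap URL below just reuses the example domain:

```text
User-agent: *
Allow: /

Sitemap: https://website-arena-1766312513.searchcompany.dev/sitemap.xml
```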
### Boosted Pages Index

| File | Purpose |
|---|---|
| public/boosted/index.txt | Boosted pages index (lists all boosted pages) |
| pages/boosted/index.js | Next.js boosted pages index page |
### IndexNow Verification

| File | Purpose |
|---|---|
| public/search-company.txt | IndexNow key verification file |
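IndexNow works by hosting a key file at the site root (here, /search-company.txt) and submitting changed URLs together with that key. A minimal sketch of the submission step from the Process section, assuming the public IndexNow endpoint (the key and URL list are placeholders; the pipeline's actual code may differ):

```python
import requests

host = "website-arena-1766312513.searchcompany.dev"
payload = {
    "host": host,
    "key": "YOUR_INDEXNOW_KEY",  # placeholder; must match the key file contents
    "keyLocation": f"https://{host}/search-company.txt",
    "urlList": [f"https://{host}/", f"https://{host}/llms.txt"],
}
# IndexNow accepts a JSON POST listing URLs to (re)index.
resp = requests.post("https://api.indexnow.org/indexnow", json=payload, timeout=30)
resp.raise_for_status()
```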
## Gemini LLM Calls (3 Parallel)

Three Gemini 3 Flash calls run in parallel, each generating its output directly (a sketch of the parallel structure follows the three descriptions below):
### 1. llms.txt Generation

Generates the primary AI-readable content file in Markdown format.
- No rigid structure; the layout adapts to the business type:
  - A restaurant gets Menu, Hours, Location sections
  - A SaaS gets Products, Features, Pricing sections
  - An artist gets Portfolio, Exhibitions, Commissions sections
- Includes links to product-specific llms files (if products exist)
### 2. index.html Generation

Generates the homepage HTML content + meta tags.
- Returns JSON with business_name, meta_title, meta_description, html_body
- Structure adapts to business type
### 3. data.json Generation

Generates Schema.org JSON-LD structured data.
- Picks the most appropriate @type for the business (Restaurant, SoftwareApplication, ProfessionalService, etc.)
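As referenced above, here is a minimal sketch of the parallel structure using the google-generativeai SDK and asyncio; the model name, prompts, and helper are placeholders for illustration, not the pipeline's actual code:

```python
import asyncio
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key
# Model name is an assumption for illustration; the doc says "Gemini 3 Flash".
model = genai.GenerativeModel("gemini-1.5-flash")

async def generate(prompt: str) -> str:
    # Each call produces one output file's content directly.
    response = await model.generate_content_async(prompt)
    return response.text

async def regenerate_content(site_text: str):
    # Hypothetical stand-ins for the three real prompts.
    llms_txt, homepage_json, data_json = await asyncio.gather(
        generate(f"Write an AI-readable llms.txt for this site:\n{site_text}"),
        generate(
            "Return JSON with business_name, meta_title, meta_description, "
            f"html_body for this site:\n{site_text}"
        ),
        generate(f"Return Schema.org JSON-LD for this site:\n{site_text}"),
    )
    return llms_txt, homepage_json, data_json
```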
## Code Location

`Backend/src/app/apis/cron/regenerate_fresh_website/routes.py`