Performs a complete regeneration of an existing AI site using the new onboarding flow, with the Firecrawl agent extracting business info.
This endpoint performs a full rebuild. Use with caution in production.
When to Use
- Regenerate content with fresh LLM output
- Fix issues with an existing AI site
- Test changes to the generation pipeline
This is different from update-site, which only performs incremental updates when the source website content changes.
New Architecture
This endpoint now uses the same flow as onboarding:
Key Change: Business info from Firecrawl agent is used for LLM content generation (llms.txt, Q&A pages, data.json). Scraped pages are ONLY used for markdown replica generation.
Request Body
- `business_id` (required): The org_slug / business ID (e.g. `the-ai-teddy-bear-company-1767082986`)
- `url` (optional): Source website URL. Falls back to the URL stored in the database if not provided.
- Maximum pages to scrape. Default: 5000.
Response
- `status`: `"success"` or `"error"`
- `ai_site_url`: The AI site URL (unchanged from before)
- `source_url`: The source website URL that was used
- `business_id`: The business ID that was regenerated
- `business_name`: The business name from the database
- `pages_scraped`: Number of pages scraped from the source site
- `files_generated`: Number of files generated
- `pages_hashed`: Number of pages hashed for change detection
- `qa_pages`: Number of Q&A pages generated
- `replica_pages`: Number of markdown replica pages generated
```bash
curl -X POST https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website \
  -H "Content-Type: application/json" \
  -H "X-API-Key: search-company" \
  -d '{
    "business_id": "the-ai-teddy-bear-company-1767082986",
    "url": "https://new-supreme-3.myshopify.com"
  }'
```
```json
{
  "status": "success",
  "ai_site_url": "https://the-ai-teddy-bear-company-1767082986.searchcompany.dev",
  "source_url": "https://new-supreme-3.myshopify.com",
  "business_id": "the-ai-teddy-bear-company-1767082986",
  "business_name": "The AI Teddy Bear Company",
  "pages_scraped": 15,
  "files_generated": 25,
  "pages_hashed": 15,
  "qa_pages": 8,
  "replica_pages": 15
}
```
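The curl request above can also be issued from Python. This is a minimal sketch using only the standard library; the helper names (`build_regenerate_request`, `regenerate`) are illustrative, not part of the backend, and the endpoint and API key are taken from the example above.

```python
import json
import urllib.request

API_BASE = "https://searchcompany-main.up.railway.app"  # from the curl example
API_KEY = "search-company"                              # X-API-Key header value

def build_regenerate_request(business_id, url=None):
    """Build the (endpoint, headers, body) triple for the regenerate call."""
    body = {"business_id": business_id}
    if url is not None:  # optional: backend falls back to the URL in the database
        body["url"] = url
    headers = {"Content-Type": "application/json", "X-API-Key": API_KEY}
    endpoint = f"{API_BASE}/api/cron/regenerate-fresh-website"
    return endpoint, headers, json.dumps(body).encode()

def regenerate(business_id, url=None):
    """POST the request and return the parsed JSON response."""
    endpoint, headers, data = build_regenerate_request(business_id, url)
    req = urllib.request.Request(endpoint, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A caller would then check `result["status"] == "success"` before reading the counts.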
Process Steps
| Step | Action | Details |
|------|--------|---------|
| 1a | Discover Business Info | Firecrawl agent extracts: description, products_services, target_market, key_features, value_proposition |
| 1b | Scrape Website | Custom mapper + Firecrawl batch scrape for markdown replicas |
| 2 | Hash Pages | Raw HTML hashing for future change detection |
| 3 | LLM Organize | Three parallel Gemini calls using business_info (not pages) |
| 4 | Generate Files | Create all files including markdown replicas from pages |
| 5 | Deploy | Push to existing Vercel project |
| 6 | Assign Domain | Update domain records if needed |
| 7 | Store Hashes | Save page hashes for change detection |
| 8 | IndexNow | Submit all URLs for instant indexing |
Steps 1a and 1b run in parallel for faster execution; the remaining steps run sequentially.
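The orchestration described above can be sketched with `asyncio`. The step functions here are illustrative stand-ins, not the actual backend implementations; only the parallel/sequential shape is taken from the docs.

```python
import asyncio

# Hypothetical stand-ins for steps 1a and 1b; real signatures may differ.
async def discover_business_info(url):
    """Step 1a: Firecrawl agent extracts structured business info."""
    return {"description": "...", "url": url}

async def scrape_website(url, max_pages=5000):
    """Step 1b: custom mapper + Firecrawl batch scrape for replicas."""
    return [{"path": "/", "html": "<html>...</html>"}]

async def regenerate(url):
    # Steps 1a and 1b run concurrently; everything after is sequential.
    business_info, pages = await asyncio.gather(
        discover_business_info(url),
        scrape_website(url),
    )
    # 2. hash pages  3. LLM organize (business_info only)  4. generate files
    # 5. deploy  6. assign domain  7. store hashes  8. IndexNow
    return business_info, pages

info, pages = asyncio.run(regenerate("https://new-supreme-3.myshopify.com"))
```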
Content Generation Sources
| Content | Source | Why |
|---------|--------|-----|
| llms.txt | Business Info (Firecrawl) | Focused, structured business context |
| Q&A Pages | Business Info (Firecrawl) | Clean Q&A from business understanding |
| data.json | Business Info (Firecrawl) | Accurate Schema.org from business context |
| Markdown Replicas | Scraped Pages | 1:1 copy of original website content |
Files Generated
LLM-Generated Content (from business_info)
| File | Source | Purpose |
|------|--------|---------|
| public/llms.txt | Gemini + business_info | Primary AI-readable content |
| pages/index.js | Gemini + business_info | Homepage with Q&A structure |
| public/data.json | Gemini + business_info | Schema.org structured data |
Q&A Pages (from business_info)
| File Pattern | Purpose |
|--------------|---------|
| pages/{slug}.js | AI-generated Q&A pages (e.g., /what-is-teddy-bear-ai) |
Markdown Replica Pages (from scraped pages)
| File Pattern | Purpose |
|--------------|---------|
| pages/{path}.js | Next.js page for each scraped URL |
| public/markdown/{path}.md | Markdown content for each scraped URL |
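The two file patterns above can be derived from a scraped URL path with a small helper. This function is hypothetical (the docs only state the output patterns, not the backend's actual path logic); the root path is excluded here since `pages/index.js` is the LLM-generated homepage, not a replica.

```python
def replica_files(url_path: str) -> tuple[str, str]:
    """Map a scraped URL path to the two replica files it produces.

    Illustrative only: mirrors the documented patterns
    pages/{path}.js and public/markdown/{path}.md.
    """
    path = url_path.strip("/")  # e.g. "/products/teddy-bear" -> "products/teddy-bear"
    return (f"pages/{path}.js", f"public/markdown/{path}.md")
```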
Static Templates
| File | Purpose |
|------|---------|
| public/robots.txt | Crawler permissions (allows all bots) |
| public/sitemap.xml | Site structure for search engines |
| middleware.js | Edge middleware for tracking AI bot visits |
| package.json | Next.js dependencies |
| next.config.js | Next.js configuration |
| public/search-company.txt | IndexNow key verification file |
Business Info Schema
The Firecrawl agent extracts this structured data:
```json
{
  "description": "2-3 sentence description of what the company does",
  "products_services": "Overview of their main products or services",
  "target_market": "Who their target customers are",
  "key_features": "Key features, capabilities, or differentiators",
  "value_proposition": "The core value they provide to customers",
  "business_name": "The AI Teddy Bear Company",
  "url": "https://new-supreme-3.myshopify.com"
}
```
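For type-checked handling of this payload, the schema above maps directly onto a `TypedDict`. This is a sketch for consumers of the data, not the backend's actual type; field names come from the schema shown above.

```python
from typing import TypedDict

class BusinessInfo(TypedDict):
    """Shape of the Firecrawl agent output (fields from the schema above)."""
    description: str
    products_services: str
    target_market: str
    key_features: str
    value_proposition: str
    business_name: str
    url: str

# Example instance mirroring the documented schema.
info: BusinessInfo = {
    "description": "2-3 sentence description of what the company does",
    "products_services": "Overview of their main products or services",
    "target_market": "Who their target customers are",
    "key_features": "Key features, capabilities, or differentiators",
    "value_proposition": "The core value they provide to customers",
    "business_name": "The AI Teddy Bear Company",
    "url": "https://new-supreme-3.myshopify.com",
}
```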
Code Location
`Backend/src/app/apis/cron/regenerate_fresh_website/routes.py`
Key Imports
```python
from src.app.shared.discover_business_info import discover_business_info
from src.app.shared.ai_website import (
    organize_with_llm_from_business_info,
    generate_ai_site,
    deploy_to_vercel,
    assign_domain,
)
```