POST https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website
curl -X POST https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website \
  -H "Content-Type: application/json" \
  -H "X-API-Key: search-company" \
  -d '{
    "business_id": "the-ai-teddy-bear-company-1767082986",
    "url": "https://new-supreme-3.myshopify.com"
  }'
{
  "status": "success",
  "ai_site_url": "https://the-ai-teddy-bear-company-1767082986.searchcompany.dev",
  "source_url": "https://new-supreme-3.myshopify.com",
  "business_id": "the-ai-teddy-bear-company-1767082986",
  "business_name": "The AI Teddy Bear Company",
  "pages_scraped": 15,
  "files_generated": 25,
  "pages_hashed": 15,
  "qa_pages": 8,
  "replica_pages": 15
}
Performs a complete regeneration of an existing AI site using the new onboarding flow, in which a Firecrawl agent extracts the business info.
This endpoint performs a full rebuild. Use with caution in production.

When to Use

  • Regenerate content with fresh LLM output
  • Fix issues with an existing AI site
  • Test changes to the generation pipeline
This differs from update-site, which only performs incremental updates when the source website's content changes.

New Architecture

This endpoint now uses the same flow as onboarding.
Key Change: Business info from Firecrawl agent is used for LLM content generation (llms.txt, Q&A pages, data.json). Scraped pages are ONLY used for markdown replica generation.

Request Body

business_id
string
required
The org_slug / business ID (e.g. "the-ai-teddy-bear-company-1767082986")
url
string
Source website URL. Optional - will use URL from database if not provided.
max_pages
integer
Maximum pages to scrape. Default: 5000
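
As a rough Python counterpart to the curl call, the sketch below builds the request using only the standard library. The helper name `build_regenerate_request` and its `api_key` argument are illustrative and not part of the API. Optional fields are omitted from the body when unset, which assumes the server then applies the documented fallbacks (database URL, default page limit).

```python
import json
import urllib.request
from typing import Optional

ENDPOINT = "https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website"


def build_regenerate_request(
    business_id: str,
    api_key: str,
    url: Optional[str] = None,
    max_pages: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a full regeneration."""
    body = {"business_id": business_id}
    # Optional fields are left out so the server can fall back to the
    # URL stored in the database and the default page limit.
    if url is not None:
        body["url"] = url
    if max_pages is not None:
        body["max_pages"] = max_pages
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Because the endpoint triggers a full rebuild, it is probably worth trying this against a non-production business first.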

Response

status
string
"success" or "error"
ai_site_url
string
The AI site URL (unchanged from before)
business_name
string
The business name from database
pages_scraped
integer
Number of pages scraped from source
files_generated
integer
Number of files generated
qa_pages
integer
Number of Q&A pages generated
replica_pages
integer
Number of markdown replica pages generated
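
A client will usually want to fail fast when `status` is not "success". The following is a minimal sketch of that check, assuming the response body is JSON with the documented fields. `RegenerateResult` and `parse_regenerate_response` are hypothetical names, and the shape of an error body is not documented, so the sketch surfaces it unchanged.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class RegenerateResult:
    status: str
    ai_site_url: str
    business_name: str
    pages_scraped: int
    files_generated: int
    qa_pages: int
    replica_pages: int


def parse_regenerate_response(raw: str) -> RegenerateResult:
    """Parse the JSON body and raise if the endpoint reported an error."""
    payload = json.loads(raw)
    if payload.get("status") != "success":
        # Error bodies are undocumented, so include the whole payload.
        raise RuntimeError(f"regeneration failed: {payload}")
    # Keep only the documented fields; any extra keys are ignored.
    names = [f.name for f in fields(RegenerateResult)]
    return RegenerateResult(**{name: payload[name] for name in names})
```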

Process Steps

| Step | Action | Details |
| --- | --- | --- |
| 1a | Discover Business Info | Firecrawl agent extracts: description, products_services, target_market, key_features, value_proposition |
| 1b | Scrape Website | Custom mapper + Firecrawl batch scrape for markdown replicas |
| 2 | Hash Pages | Raw HTML hashing for future change detection |
| 3 | LLM Organize | Three parallel Gemini calls using business_info (not pages) |
| 4 | Generate Files | Create all files including markdown replicas from pages |
| 5 | Deploy | Push to existing Vercel project |
| 6 | Assign Domain | Update domain records if needed |
| 7 | Store Hashes | Save page hashes for change detection |
| 8 | IndexNow | Submit all URLs for instant indexing |
Steps 1a and 1b run in parallel for faster execution. The remaining steps run sequentially.
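
The ordering described here can be sketched with `asyncio.gather`: the two discovery steps start together, and hashing follows once both have finished. The two `*_stub` coroutines are placeholders for the real Firecrawl calls, and SHA-256 is an assumption, since the hash function the backend uses is not stated.

```python
import asyncio
import hashlib


async def discover_business_info_stub(url: str) -> dict:
    """Stand-in for step 1a (the Firecrawl agent call)."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"url": url, "description": "..."}


async def scrape_pages_stub(url: str) -> dict:
    """Stand-in for step 1b; maps each page URL to its raw HTML."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return {url: "<html><body>Home</body></html>"}


def hash_pages(pages: dict) -> dict:
    """Step 2: one digest per page. SHA-256 is an assumed choice."""
    return {
        page_url: hashlib.sha256(html.encode("utf-8")).hexdigest()
        for page_url, html in pages.items()
    }


async def run_pipeline(url: str) -> dict:
    # Steps 1a and 1b are started together and awaited as a pair.
    business_info, pages = await asyncio.gather(
        discover_business_info_stub(url),
        scrape_pages_stub(url),
    )
    # Everything after this point runs one step at a time.
    hashes = hash_pages(pages)
    return {"business_info": business_info, "pages": pages, "hashes": hashes}
```

Calling `asyncio.run(run_pipeline(url))` drives the sketch. Steps 3 to 8 would follow the hashing call in sequence.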

Content Generation Sources

| Content | Source | Why |
| --- | --- | --- |
| llms.txt | Business Info (Firecrawl) | Focused, structured business context |
| Q&A Pages | Business Info (Firecrawl) | Clean Q&A from business understanding |
| data.json | Business Info (Firecrawl) | Accurate Schema.org from business context |
| Markdown Replicas | Scraped Pages | 1:1 copy of original website content |

Files Generated

LLM-Generated Content (from business_info)

| File | Source | Purpose |
| --- | --- | --- |
| public/llms.txt | Gemini + business_info | Primary AI-readable content |
| pages/index.js | Gemini + business_info | Homepage with Q&A structure |
| public/data.json | Gemini + business_info | Schema.org structured data |

Q&A Pages (from business_info)

| File Pattern | Purpose |
| --- | --- |
| pages/{slug}.js | AI-generated Q&A pages (e.g., /what-is-teddy-bear-ai) |

Markdown Replica Pages (from scraped pages)

| File Pattern | Purpose |
| --- | --- |
| pages/{path}.js | Next.js page for each scraped URL |
| public/markdown/{path}.md | Markdown content for each scraped URL |
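
To illustrate the two file patterns, the sketch below derives both output paths from a scraped URL. It assumes `{path}` is the URL path with leading and trailing slashes removed, which the documentation does not confirm. The helper `replica_files` and the example product URL are hypothetical. How the root URL is mapped is not documented, so the sketch refuses to guess.

```python
from urllib.parse import urlsplit


def replica_files(page_url: str) -> tuple:
    """Return (Next.js page path, markdown path) for one scraped URL."""
    # Assumption: {path} is the URL path without surrounding slashes.
    path = urlsplit(page_url).path.strip("/")
    if not path:
        # pages/index.js is listed as LLM-generated, so the root case is unclear.
        raise ValueError("mapping for the root URL is not documented")
    return (f"pages/{path}.js", f"public/markdown/{path}.md")
```

For example, `replica_files("https://new-supreme-3.myshopify.com/products/teddy/")` yields `pages/products/teddy.js` and `public/markdown/products/teddy.md` under these assumptions.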

Static Templates

| File | Purpose |
| --- | --- |
| public/robots.txt | Crawler permissions (allows all bots) |
| public/sitemap.xml | Site structure for search engines |
| middleware.js | Edge middleware for tracking AI bot visits |
| package.json | Next.js dependencies |
| next.config.js | Next.js configuration |
| public/search-company.txt | IndexNow key verification file |
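
The IndexNow protocol expects a key file served from the submitting host, which is the role `public/search-company.txt` plays. The sketch below builds a submission body in the JSON shape the public IndexNow specification describes (`host`, `key`, `keyLocation`, `urlList`). That the key equals the file's base name (`search-company`) is an assumption inferred from the filename, and `build_indexnow_payload` is a hypothetical helper, not the backend's own function.

```python
from typing import List
from urllib.parse import urlsplit


def build_indexnow_payload(site_url: str, urls: List[str], key: str = "search-company") -> dict:
    """Build the JSON body for an IndexNow bulk submission."""
    host = urlsplit(site_url).netloc
    return {
        "host": host,
        "key": key,
        # Files under public/ are served from the site root in Next.js.
        "keyLocation": f"https://{host}/{key}.txt",
        # Only URLs on the submitting host belong in the list.
        "urlList": [u for u in urls if urlsplit(u).netloc == host],
    }
```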

Business Info Schema

The Firecrawl agent extracts this structured data:
{
  "description": "2-3 sentence description of what the company does",
  "products_services": "Overview of their main products or services",
  "target_market": "Who their target customers are",
  "key_features": "Key features, capabilities, or differentiators",
  "value_proposition": "The core value they provide to customers",
  "business_name": "The AI Teddy Bear Company",
  "url": "https://new-supreme-3.myshopify.com"
}
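
On the Python side, this schema could be modelled as a `TypedDict` with a small completeness check before the data is handed to the LLM step. This is only a sketch: treating every field as a required, non-empty string is an assumption, and `BusinessInfo` and `missing_business_info_keys` are illustrative names rather than backend code.

```python
from typing import List, TypedDict


class BusinessInfo(TypedDict):
    description: str
    products_services: str
    target_market: str
    key_features: str
    value_proposition: str
    business_name: str
    url: str


def missing_business_info_keys(data: dict) -> List[str]:
    """Return schema keys that are absent or blank, in declaration order."""
    missing = []
    for key in BusinessInfo.__annotations__:
        value = data.get(key)
        # Assumption: every field must be a non-empty string.
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing
```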

Code Location

Backend/src/app/apis/cron/regenerate_fresh_website/routes.py

Key Imports

from src.app.shared.discover_business_info import discover_business_info
from src.app.shared.ai_website import (
    organize_with_llm_from_business_info,
    generate_ai_site,
    deploy_to_vercel,
    assign_domain,
)