POST https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website
curl -X POST https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_INTERNAL_API_KEY" \
  -d '{
    "business_id": "website-arena-1766312513"
  }'
{
  "status": "success",
  "ai_site_url": "https://website-arena-1766312513.searchcompany.dev",
  "source_url": "https://www.websitearena.dev",
  "business_id": "website-arena-1766312513",
  "pages_scraped": 15,
  "files_generated": 11,
  "pages_hashed": 15
}
Performs a complete regeneration of an existing AI site using the same pipeline as initial onboarding, but deploys to the existing Vercel project.
This endpoint performs a full rebuild. Use with caution in production.

When to Use

  • Regenerate content with fresh LLM output
  • Fix issues with an existing AI site
  • Test changes to the generation pipeline
This differs from update-site, which only performs incremental updates when the source website content changes.

Request Body

business_id
string
required
The org_slug / business ID (e.g. "website-arena-1766312513")
url
string
Source website URL. Optional; if omitted, the URL stored in the database is used.
max_pages
integer
Maximum pages to scrape. Default: 5000
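
The curl example above can also be issued programmatically. A minimal sketch using Python's standard library; the `build_payload` helper and variable names are illustrative, not part of the API:

```python
import json
import urllib.request

ENDPOINT = "https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website"

def build_payload(business_id, url=None, max_pages=None):
    """Build the JSON body; url and max_pages are optional per the docs."""
    payload = {"business_id": business_id}
    if url is not None:
        payload["url"] = url  # falls back to the database URL if omitted
    if max_pages is not None:
        payload["max_pages"] = max_pages  # server default: 5000
    return payload

def regenerate(api_key, business_id, **kwargs):
    """POST the regeneration request and return the parsed JSON response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(business_id, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    # Full rebuilds scrape and regenerate the whole site, so allow a long timeout.
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())
```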

Response

status
string
"success" or "error"
ai_site_url
string
The AI site URL (unchanged from before)
pages_scraped
integer
Number of pages scraped from source
files_generated
integer
Number of files generated (always 11)

Process

  1. Scrape source website using custom mapper + Firecrawl batch scrape
  2. Hash all pages (raw HTML) for future change detection
  3. Three parallel Gemini 3 Flash calls for content generation
  4. Generate all files from scratch (including markdown replicas)
  5. Deploy to the same Vercel project
  6. Assign domain (if needed)
  7. Store page hashes in database
  8. Submit to IndexNow for instant search engine indexing
Product LLMs Not Regenerated: Product-specific /llms/{product-slug}.txt files are NOT regenerated here. They are only created when new products are discovered by the discover-products-from-changes cron job.
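
Step 2 of the process above (hashing raw HTML for change detection) can be sketched as follows. The hash algorithm and storage shape are assumptions for illustration, not the service's actual implementation:

```python
import hashlib

def hash_pages(pages):
    """Map each scraped URL to a digest of its raw HTML (SHA-256 assumed)."""
    return {url: hashlib.sha256(html.encode("utf-8")).hexdigest()
            for url, html in pages.items()}

def changed_urls(stored_hashes, new_pages):
    """URLs whose content differs from the stored hashes -- the basis for
    the incremental update-site flow mentioned earlier."""
    new_hashes = hash_pages(new_pages)
    return [u for u, h in new_hashes.items() if stored_hashes.get(u) != h]
```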

Files Generated

Files are generated dynamically based on the website content:

LLM-Generated Content (from 3 Gemini calls)

File                  Source
public/llms.txt       Gemini Call 1 - Primary AI-readable content (Markdown)
pages/index.js        Gemini Call 2 - Homepage HTML wrapped in Next.js
public/data.json      Gemini Call 3 - Schema.org structured data (JSON-LD)

Markdown Replica Pages (Dynamic)

File Pattern                 Purpose
pages/{path}.js              Next.js page for each scraped URL
public/markdown/{path}.md    Markdown content for each scraped URL
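
A plausible mapping from a scraped URL to the two replica file paths, assuming the URL's path segment is reused directly (a hypothetical helper, not the service's code):

```python
from urllib.parse import urlparse

def replica_paths(page_url):
    """Derive the Next.js page and markdown replica paths for a scraped URL."""
    path = urlparse(page_url).path.strip("/") or "index"
    return f"pages/{path}.js", f"public/markdown/{path}.md"
```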

Q&A Pages (Dynamic)

File Pattern       Purpose
pages/{slug}.js    AI-generated Q&A pages (e.g., /what-is-teddy-bear-ai)

Static Templates

File                  Purpose
public/robots.txt     Crawler permissions (allows all bots)
public/sitemap.xml    Site structure for search engines
middleware.js         Edge middleware for tracking AI bot visits
package.json          Next.js dependencies
next.config.js        Next.js configuration

Boosted Pages Index

File                        Purpose
public/boosted/index.txt    Boosted pages index (lists all boosted pages)
pages/boosted/index.js      Next.js boosted pages index page

IndexNow Verification

File                         Purpose
public/search-company.txt    IndexNow key verification file
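
Step 8's IndexNow submission follows the public IndexNow protocol: a JSON POST listing the changed URLs together with the key that the verification file above proves ownership of. A sketch of the payload, assuming the key file is served at the site root (host and key values are illustrative):

```python
def indexnow_payload(host, key, urls):
    """Build the JSON body for POST https://api.indexnow.org/indexnow."""
    return {
        "host": host,
        "key": key,
        # Points search engines at the verification file generated above.
        "keyLocation": f"https://{host}/search-company.txt",
        "urlList": urls,
    }
```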

Gemini LLM Calls (3 Parallel)

Three Gemini 3 Flash calls run in parallel, each generating its output directly:

1. llms.txt Generation

Generates the primary AI-readable content file in Markdown format.
  • No rigid structure - adapts to business type
  • A restaurant gets Menu, Hours, Location sections
  • A SaaS gets Products, Features, Pricing sections
  • An artist gets Portfolio, Exhibitions, Commissions sections
  • Includes links to product-specific llms files (if products exist)

2. index.html Generation

Generates the homepage HTML content + meta tags.
  • Returns JSON with business_name, meta_title, meta_description, html_body
  • Structure adapts to business type

3. data.json Generation

Generates Schema.org JSON-LD structured data.
  • Picks the most appropriate @type for the business
  • Restaurant, SoftwareApplication, ProfessionalService, etc.
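
The fan-out of the three calls above can be sketched with `asyncio.gather`; the Gemini client and prompts are stubbed here, since only the parallel structure is documented:

```python
import asyncio

async def generate(kind, scraped_pages):
    """Stub for one Gemini 3 Flash call; a real implementation would send a
    kind-specific prompt containing the scraped site content."""
    await asyncio.sleep(0)  # placeholder for the network round-trip
    return {"kind": kind, "output": f"<{kind} from {len(scraped_pages)} pages>"}

async def generate_all(scraped_pages):
    # The three outputs are independent, so the calls run concurrently.
    return await asyncio.gather(
        generate("llms.txt", scraped_pages),
        generate("index.html", scraped_pages),
        generate("data.json", scraped_pages),
    )
```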

Code Location

Backend/src/app/apis/cron/regenerate_fresh_website/routes.py