Performs a complete regeneration of an existing AI site using the same
pipeline as initial onboarding, but deploys to the existing Vercel project.
This endpoint performs a full rebuild. Use with caution in production.
## When to Use
- Regenerate content with fresh LLM output
- Fix issues with an existing AI site
- Test changes to the generation pipeline
This differs from `update-site`, which performs only incremental updates when the source website content changes.
## Request Body

- `business_id`: the org_slug / business ID (e.g. `website-arena-1766312513`)
- Source website URL: optional; if omitted, the URL stored in the database is used
- Maximum pages to scrape: default is 5000
## Response

- `status`: `success` or `error`
- `ai_site_url`: the AI site URL (unchanged from before the rebuild)
- `source_url`: the source website URL that was scraped
- `business_id`: the business ID that was regenerated
- `pages_scraped`: number of pages scraped from the source site
- `files_generated`: number of files generated (always 11)
- `pages_hashed`: number of pages hashed for change detection
Example request:

```bash
curl -X POST https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_INTERNAL_API_KEY" \
  -d '{
    "business_id": "website-arena-1766312513"
  }'
```
Example response:

```json
{
  "status": "success",
  "ai_site_url": "https://website-arena-1766312513.searchcompany.dev",
  "source_url": "https://www.websitearena.dev",
  "business_id": "website-arena-1766312513",
  "pages_scraped": 15,
  "files_generated": 11,
  "pages_hashed": 15
}
```
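If you are calling the endpoint from a script instead of curl, a minimal Python equivalent might look like this (the URL, headers, and body mirror the curl example above; the timeout value is a guess, since a full rebuild can take a while):

```python
import requests

resp = requests.post(
    "https://searchcompany-main.up.railway.app/api/cron/regenerate-fresh-website",
    headers={"x-api-key": "YOUR_INTERNAL_API_KEY"},  # same key as in the curl example
    json={"business_id": "website-arena-1766312513"},
    timeout=600,  # assumption: full rebuilds are slow, so allow several minutes
)
resp.raise_for_status()
print(resp.json()["ai_site_url"])
```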
## Process

- Scrape the source website using the custom mapper + Firecrawl batch scrape
- Hash all pages (raw HTML) for future change detection (see the sketch after this list)
- Three parallel Gemini 3 Flash calls for content generation
- Generate all files from scratch (including markdown replicas)
- Deploy to the same Vercel project
- Assign domain (if needed)
- Store page hashes in database
- Submit to IndexNow for instant search engine indexing
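The hashing step above stores a fingerprint of each scraped page so later runs can tell whether the source content changed. A minimal sketch of the idea, assuming SHA-256 over the raw HTML bytes (the actual digest and storage schema are not specified in this doc):

```python
import hashlib

def hash_page(raw_html: str) -> str:
    # Stable fingerprint of a page's raw HTML; SHA-256 is an assumption here.
    return hashlib.sha256(raw_html.encode("utf-8")).hexdigest()

# `scraped_pages` is a hypothetical {url: raw_html} mapping from the scrape step.
scraped_pages = {
    "https://www.websitearena.dev/": "<html>...</html>",
}
page_hashes = {url: hash_page(html) for url, html in scraped_pages.items()}
```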
**Product LLMs Not Regenerated:** Product-specific /llms/{product-slug}.txt files are NOT regenerated here. They are only created when new products are discovered by the discover-products-from-changes cron job.
## Files Generated
Files are generated dynamically based on the website content:
### LLM-Generated Content (from 3 Gemini calls)

| File | Source |
|---|---|
| public/llms.txt | Gemini Call 1 - Primary AI-readable content (Markdown) |
| pages/index.js | Gemini Call 2 - Homepage HTML wrapped in Next.js |
| public/data.json | Gemini Call 3 - Schema.org structured data (JSON-LD) |
### Markdown Replica Pages (Dynamic)

| File Pattern | Purpose |
|---|---|
| pages/{path}.js | Next.js page for each scraped URL |
| public/markdown/{path}.md | Markdown content for each scraped URL |
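To illustrate the pattern, here is a hypothetical helper showing how a scraped URL could map to its two replica files; the function name and slugging rules are illustrative, not the pipeline's actual code:

```python
from urllib.parse import urlparse

def replica_paths(url: str) -> tuple[str, str]:
    # Hypothetical mapping from a scraped URL to its Next.js page and markdown file.
    path = urlparse(url).path.strip("/")
    return f"pages/{path}.js", f"public/markdown/{path}.md"

# e.g. ("pages/pricing.js", "public/markdown/pricing.md")
print(replica_paths("https://www.websitearena.dev/pricing"))
```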
### Q&A Pages (Dynamic)

| File Pattern | Purpose |
|---|---|
| pages/{slug}.js | AI-generated Q&A pages (e.g., /what-is-teddy-bear-ai) |
### Static Templates

| File | Purpose |
|---|---|
| public/robots.txt | Crawler permissions (allows all bots) |
| public/sitemap.xml | Site structure for search engines |
| middleware.js | Edge middleware for tracking AI bot visits |
| package.json | Next.js dependencies |
| next.config.js | Next.js configuration |
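For reference, a permissive robots.txt of the kind described above (all crawlers allowed) typically looks like the following; the exact file the pipeline generates is not shown in this doc, and the sitemap URL below just reuses the example domain:

```text
User-agent: *
Allow: /

Sitemap: https://website-arena-1766312513.searchcompany.dev/sitemap.xml
```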
### Boosted Pages Index

| File | Purpose |
|---|---|
| public/boosted/index.txt | Boosted pages index (lists all boosted pages) |
| pages/boosted/index.js | Next.js boosted pages index page |
### IndexNow Verification

| File | Purpose |
|---|---|
| public/search-company.txt | IndexNow key verification file |
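IndexNow works by hosting a key file at the site root (here, /search-company.txt) and submitting changed URLs together with that key. A minimal sketch of the submission step from the Process section, assuming the public IndexNow endpoint (the key and URL list are placeholders; the pipeline's actual code may differ):

```python
import requests

host = "website-arena-1766312513.searchcompany.dev"
payload = {
    "host": host,
    "key": "YOUR_INDEXNOW_KEY",  # placeholder; must match the key file contents
    "keyLocation": f"https://{host}/search-company.txt",
    "urlList": [f"https://{host}/", f"https://{host}/llms.txt"],
}
# IndexNow accepts a JSON POST listing URLs to (re)index.
resp = requests.post("https://api.indexnow.org/indexnow", json=payload, timeout=30)
resp.raise_for_status()
```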
## Gemini LLM Calls (3 Parallel)

Three Gemini 3 Flash calls run in parallel, each generating its output directly (a sketch of the parallel structure follows the three descriptions below):
### 1. llms.txt Generation

Generates the primary AI-readable content file in Markdown format.
- No rigid structure; the layout adapts to the business type:
  - A restaurant gets Menu, Hours, Location sections
  - A SaaS gets Products, Features, Pricing sections
  - An artist gets Portfolio, Exhibitions, Commissions sections
- Includes links to product-specific llms files (if products exist)
### 2. index.html Generation

Generates the homepage HTML content + meta tags.
- Returns JSON with business_name, meta_title, meta_description, html_body
- Structure adapts to business type
### 3. data.json Generation

Generates Schema.org JSON-LD structured data.
- Picks the most appropriate @type for the business (Restaurant, SoftwareApplication, ProfessionalService, etc.)
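As referenced above, here is a minimal sketch of the parallel structure using the google-generativeai SDK and asyncio; the model name, prompts, and helper are placeholders for illustration, not the pipeline's actual code:

```python
import asyncio
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key
# Model name is an assumption for illustration; the doc says "Gemini 3 Flash".
model = genai.GenerativeModel("gemini-1.5-flash")

async def generate(prompt: str) -> str:
    # Each call produces one output file's content directly.
    response = await model.generate_content_async(prompt)
    return response.text

async def regenerate_content(site_text: str):
    # Hypothetical stand-ins for the three real prompts.
    llms_txt, homepage_json, data_json = await asyncio.gather(
        generate(f"Write an AI-readable llms.txt for this site:\n{site_text}"),
        generate(
            "Return JSON with business_name, meta_title, meta_description, "
            f"html_body for this site:\n{site_text}"
        ),
        generate(f"Return Schema.org JSON-LD for this site:\n{site_text}"),
    )
    return llms_txt, homepage_json, data_json
```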
## Code Location

`Backend/src/app/apis/cron/regenerate_fresh_website/routes.py`