Internal Service: This is not an HTTP endpoint. It's called directly by the generate-all orchestrator.
Purpose
Uses the Firecrawl Agent API to extract what a business does. This information is used to generate the AI website content (llms.txt, Q&A pages, data.json).
Runs in GROUP 1a (parallel with all other GROUP 1 tasks).
Function Signature
async def discover_business_info(
    url: str,
    business_name: str
) -> dict
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The business website URL |
| business_name | str | required | The business name for context |
Returns
{
  "status": "success",
  "business_info": {
    "description": "2-3 sentence description of what the company does",
    "products_services": "Overview of main products or services",
    "target_market": "Who their target customers are",
    "key_features": "Key features, capabilities, or differentiators",
    "value_proposition": "The core value they provide to customers",
    "business_name": "The AI Teddy Bear Company",
    "url": "https://myteddybearai.com"
  }
}
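For callers that want type hints, the return shape above could be sketched with `TypedDict`. This is only an illustration, not the service's actual types: `BusinessInfo`, `DiscoverResult`, and `summarize` are hypothetical names, and the `error` key is taken from the error response shown under Error Handling.

```python
from typing import Optional, TypedDict


class BusinessInfo(TypedDict, total=False):
    # Field names mirror the documented "business_info" object.
    description: str
    products_services: str
    target_market: str
    key_features: str
    value_proposition: str
    business_name: str
    url: str


class DiscoverResult(TypedDict, total=False):
    status: str                            # "success" or "error"
    business_info: Optional[BusinessInfo]  # null (None) on error
    error: str                             # only present on error


def summarize(result: DiscoverResult) -> str:
    """Return 'name: description', or an empty string if discovery failed."""
    info = result.get("business_info")
    if result.get("status") != "success" or not info:
        return ""
    return f"{info.get('business_name', '')}: {info.get('description', '')}"
```

A helper like `summarize` keeps downstream code from indexing into `business_info` when it is null.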
How It Works
Firecrawl Agent Prompt
The service sends this prompt to Firecrawl Agent:
Analyze the website {url} for "{business_name}" and extract comprehensive business information.
Focus on understanding:
1. What the company does (core business)
2. Their main products or services
3. Who their target customers are
4. What makes them unique or valuable
5. Their core value proposition
Be thorough but concise.
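The `{url}` and `{business_name}` placeholders suggest plain string substitution. A minimal sketch of how the prompt might be assembled, assuming `str.format` is used (`PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, not taken from `service.py`):

```python
# Prompt text copied from the documentation above; the constant name is illustrative.
PROMPT_TEMPLATE = """Analyze the website {url} for "{business_name}" and extract comprehensive business information.
Focus on understanding:
1. What the company does (core business)
2. Their main products or services
3. Who their target customers are
4. What makes them unique or valuable
5. Their core value proposition
Be thorough but concise."""


def build_prompt(url: str, business_name: str) -> str:
    """Fill the two placeholders in the agent prompt."""
    return PROMPT_TEMPLATE.format(url=url, business_name=business_name)


if __name__ == "__main__":
    print(build_prompt("https://myteddybearai.com", "The AI Teddy Bear Company"))
```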
The agent is asked to return structured output matching this JSON schema:
{
  "type": "object",
  "properties": {
    "description": {
      "type": "string",
      "description": "2-3 sentence description of what the company does"
    },
    "products_services": {
      "type": "string",
      "description": "Overview of their main products or services"
    },
    "target_market": {
      "type": "string",
      "description": "Who their target customers or audience are"
    },
    "key_features": {
      "type": "string",
      "description": "Key features, capabilities, or differentiators"
    },
    "value_proposition": {
      "type": "string",
      "description": "The core value they provide to customers"
    }
  },
  "required": ["description", "products_services"]
}
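Only `description` and `products_services` are required by the schema, so the other three fields may be absent from an agent response. A small stdlib-only check along these lines could guard against incomplete output; `missing_required` and `with_defaults` are hypothetical helpers, and whether the real service performs such a check is an assumption.

```python
REQUIRED_FIELDS = ("description", "products_services")
OPTIONAL_FIELDS = ("target_market", "key_features", "value_proposition")


def missing_required(payload: dict) -> list[str]:
    """Return the required fields that are absent or not strings."""
    return [name for name in REQUIRED_FIELDS if not isinstance(payload.get(name), str)]


def with_defaults(payload: dict) -> dict:
    """Copy the payload, filling absent optional fields with empty strings."""
    filled = dict(payload)
    for name in OPTIONAL_FIELDS:
        filled.setdefault(name, "")
    return filled
```

Filling optional fields with empty strings is one possible design choice; it lets prompt templates reference every field without extra key checks.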
Used By
| Service | Why |
|---|---|
| Create AI Website (2a) | Uses business_info to generate llms.txt, Q&A pages, and data.json |
This is separate from the Scrape Website service (GROUP 1b). Scraped pages are only used for markdown replica generation, not for LLM content.
Why Separate from Scraping?
| Approach | Purpose |
|---|---|
| Discover Business Info (1a) | Extracts structured business context for LLM prompts |
| Scrape Website (1b) | Gets full page content for 1:1 markdown replicas |
Benefits:
- Focused prompts: LLM content generation gets clean, structured business info
- No token waste: Don't send full page markdown to Gemini for llms.txt generation
- Parallel execution: Both can run simultaneously since they're independent
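Because the two tasks are independent, the orchestrator can await them concurrently. The sketch below shows the general pattern with `asyncio.gather`; the two `*_stub` coroutines stand in for the real services, and `run_group_1` is a hypothetical name rather than the orchestrator's actual code.

```python
import asyncio


async def discover_business_info_stub(url: str, business_name: str) -> dict:
    # Stand-in for the real service; sleeps briefly instead of calling Firecrawl.
    await asyncio.sleep(0.01)
    return {"status": "success", "business_info": {"business_name": business_name, "url": url}}


async def scrape_website_stub(url: str) -> dict:
    # Stand-in for the Scrape Website service (GROUP 1b).
    await asyncio.sleep(0.01)
    return {"status": "success", "pages": []}


async def run_group_1(url: str, business_name: str) -> dict:
    """Run both GROUP 1 tasks concurrently and collect their results."""
    info, scrape = await asyncio.gather(
        discover_business_info_stub(url, business_name),
        scrape_website_stub(url),
    )
    return {"business_info": info, "scrape": scrape}


if __name__ == "__main__":
    print(asyncio.run(run_group_1("https://myteddybearai.com", "The AI Teddy Bear Company")))
```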
Code Location
src/app/shared/discover_business_info/
├── __init__.py    # Exports discover_business_info
└── service.py     # Main implementation
src/app/apis/onboarding/generate_all/tasks/
└── business_info.py    # Task wrapper for orchestrator
Error Handling
{
  "status": "error",
  "error": "Agent job timed out after 3 minutes",
  "business_info": null
}
Common errors:
- Timeout - Firecrawl agent takes too long (max 3 minutes)
- API error - Firecrawl API returns non-200 status
- Missing API key - FIRECRAWL_API_KEY not configured
If this service fails, the orchestrator will not proceed with GROUP 2a (Create AI Website) since business_info is required.
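The timeout and the GROUP 2a gate described above could look roughly like the following. This is a sketch under assumptions: `with_timeout` and `should_run_group_2a` are hypothetical names, and the real service may enforce the 3-minute limit differently (for example by polling the agent job).

```python
import asyncio
from typing import Any, Awaitable

AGENT_TIMEOUT_SECONDS = 180  # 3 minutes, per the documented limit


async def with_timeout(job: Awaitable[Any], timeout: float = AGENT_TIMEOUT_SECONDS) -> dict:
    """Await the job, converting a timeout into the documented error shape."""
    try:
        return await asyncio.wait_for(job, timeout=timeout)
    except asyncio.TimeoutError:
        # Message mirrors the example error response in this document.
        return {
            "status": "error",
            "error": "Agent job timed out after 3 minutes",
            "business_info": None,
        }


def should_run_group_2a(result: dict) -> bool:
    """GROUP 2a needs business_info, so proceed only on a successful result."""
    return result.get("status") == "success" and bool(result.get("business_info"))
```

Returning an error dict instead of raising keeps the result shape uniform, so the orchestrator can decide what to run next from `status` alone.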