Internal Service: This is not an HTTP endpoint. It is called directly by the generate-all orchestrator.

Purpose

Uses the Firecrawl Agent API to extract what a business does. This information is used to generate the AI website content (llms.txt, Q&A pages, data.json). Runs in GROUP 1a (parallel with all other GROUP 1 tasks).

Function Signature

async def discover_business_info(
    url: str,
    business_name: str
) -> dict: ...

Parameters

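| Parameter     | Type | Default  | Description                    |
|---------------|------|----------|--------------------------------|
| url           | str  | required | The business website URL       |
| business_name | str  | required | The business name for context  |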

Returns

{
  "status": "success",
  "business_info": {
    "description": "2-3 sentence description of what the company does",
    "products_services": "Overview of main products or services",
    "target_market": "Who their target customers are",
    "key_features": "Key features, capabilities, or differentiators",
    "value_proposition": "The core value they provide to customers",
    "business_name": "The AI Teddy Bear Company",
    "url": "https://myteddybearai.com"
  }
}
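For type-checked callers, the return shape above can be written down as `TypedDict`s. This is a minimal sketch, not the service's actual type definitions: the class names `BusinessInfo` and `DiscoverResult` are hypothetical, and treating `business_name` and `url` as always present is an assumption based on the example response.

```python
from typing import Literal, Optional, TypedDict


class _BusinessInfoRequired(TypedDict):
    # Marked "required" in the Firecrawl extraction schema.
    description: str
    products_services: str
    # Assumption: always echoed back by the service (as in the example response).
    business_name: str
    url: str


class BusinessInfo(_BusinessInfoRequired, total=False):
    # Optional in the extraction schema, so the agent may omit them.
    target_market: str
    key_features: str
    value_proposition: str


class DiscoverResult(TypedDict, total=False):
    # Hypothetical name for the dict returned by discover_business_info.
    status: Literal["success", "error"]
    business_info: Optional[BusinessInfo]
    error: str  # only present when status == "error"


example: DiscoverResult = {
    "status": "success",
    "business_info": {
        "description": "2-3 sentence description of what the company does",
        "products_services": "Overview of main products or services",
        "business_name": "The AI Teddy Bear Company",
        "url": "https://myteddybearai.com",
    },
}
```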

How It Works

Firecrawl Agent Prompt

The service sends this prompt to Firecrawl Agent:
Analyze the website {url} for "{business_name}" and extract comprehensive business information.

Focus on understanding:
1. What the company does (core business)
2. Their main products or services
3. Who their target customers are
4. What makes them unique or valuable
5. Their core value proposition

Be thorough but concise.
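Filling the `{url}` and `{business_name}` placeholders needs nothing more than `str.format`. A minimal sketch; `PROMPT_TEMPLATE` and `build_agent_prompt` are hypothetical names, and the real service may assemble the prompt differently.

```python
PROMPT_TEMPLATE = """Analyze the website {url} for "{business_name}" and extract comprehensive business information.

Focus on understanding:
1. What the company does (core business)
2. Their main products or services
3. Who their target customers are
4. What makes them unique or valuable
5. Their core value proposition

Be thorough but concise."""


def build_agent_prompt(url: str, business_name: str) -> str:
    # Only the template is parsed for {...} fields, so braces inside
    # business_name or url are inserted literally and cannot break formatting.
    return PROMPT_TEMPLATE.format(url=url, business_name=business_name)
```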

Schema

{
  "type": "object",
  "properties": {
    "description": {
      "type": "string",
      "description": "2-3 sentence description of what the company does"
    },
    "products_services": {
      "type": "string",
      "description": "Overview of their main products or services"
    },
    "target_market": {
      "type": "string",
      "description": "Who their target customers or audience are"
    },
    "key_features": {
      "type": "string",
      "description": "Key features, capabilities, or differentiators"
    },
    "value_proposition": {
      "type": "string",
      "description": "The core value they provide to customers"
    }
  },
  "required": ["description", "products_services"]
}
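Because the schema only requires `description` and `products_services`, the agent's output may lack the other three fields. One way to turn raw agent output into the `business_info` dict shown under Returns is sketched below. The helper name `normalize_business_info` is hypothetical, and defaulting missing optional fields to empty strings is an assumption, not documented behavior.

```python
from typing import Any

REQUIRED_FIELDS = ("description", "products_services")
OPTIONAL_FIELDS = ("target_market", "key_features", "value_proposition")


def normalize_business_info(raw: dict[str, Any], url: str, business_name: str) -> dict[str, str]:
    # Reject output that lacks a non-empty string for any schema-required field.
    missing = [
        f for f in REQUIRED_FIELDS
        if not isinstance(raw.get(f), str) or not raw[f].strip()
    ]
    if missing:
        raise ValueError(f"agent output missing required fields: {', '.join(missing)}")

    info = {f: raw[f].strip() for f in REQUIRED_FIELDS}
    # Assumption: absent or non-string optional fields become "".
    for f in OPTIONAL_FIELDS:
        value = raw.get(f)
        info[f] = value.strip() if isinstance(value, str) else ""

    # Echo the inputs back, matching the example response.
    info["business_name"] = business_name
    info["url"] = url
    return info
```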

Used By

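| Service                | Why                                                                |
|------------------------|--------------------------------------------------------------------|
| Create AI Website (2a) | Uses business_info to generate llms.txt, Q&A pages, and data.json |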
This is separate from the Scrape Website service (GROUP 1b). Scraped pages are only used for markdown replica generation, not for LLM content.

Why Separate from Scraping?

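| Approach                    | Purpose                                               |
|-----------------------------|-------------------------------------------------------|
| Discover Business Info (1a) | Extracts structured business context for LLM prompts |
| Scrape Website (1b)         | Gets full page content for 1:1 markdown replicas     |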
Benefits:
  • Focused prompts: LLM content generation gets clean, structured business info
  • No token waste: Full page markdown is not sent to Gemini for llms.txt generation
  • Parallel execution: Both can run simultaneously since they’re independent
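Since 1a and 1b only share the input URL, the orchestrator can await both at once. A minimal `asyncio.gather` sketch, assuming both services are coroutines; the service bodies here are hypothetical stubs, and `scrape_website` and `run_group_1` are illustrative names rather than the real task wrappers.

```python
import asyncio
from typing import Any


async def discover_business_info(url: str, business_name: str) -> dict[str, Any]:
    # Hypothetical stub for GROUP 1a; the real service calls the Firecrawl Agent API.
    await asyncio.sleep(0.01)
    return {"status": "success", "business_info": {"business_name": business_name, "url": url}}


async def scrape_website(url: str) -> dict[str, Any]:
    # Hypothetical stub for GROUP 1b (Scrape Website).
    await asyncio.sleep(0.01)
    return {"status": "success", "pages": []}


async def run_group_1(url: str, business_name: str) -> dict[str, Any]:
    # Both coroutines run concurrently; wall time is roughly that of the slower one.
    business, scraped = await asyncio.gather(
        discover_business_info(url, business_name),
        scrape_website(url),
    )
    return {"business_info": business, "scrape": scraped}
```

Usage: `asyncio.run(run_group_1("https://myteddybearai.com", "The AI Teddy Bear Company"))`. By default `gather` propagates the first exception raised, so a real orchestrator may prefer `return_exceptions=True` if each task already reports failures through its own status dict.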

Code Location

src/app/shared/discover_business_info/
├── __init__.py           # Exports discover_business_info
└── service.py            # Main implementation

src/app/apis/onboarding/generate_all/tasks/
└── business_info.py      # Task wrapper for orchestrator

Error Handling

{
  "status": "error",
  "error": "Agent job timed out after 3 minutes",
  "business_info": null
}
Common errors:
  • Timeout - Firecrawl agent takes too long (max 3 minutes)
  • API error - Firecrawl API returns a non-200 status
  • Missing API key - FIRECRAWL_API_KEY not configured
If this service fails, the orchestrator will not proceed with GROUP 2a (Create AI Website) since business_info is required.