Discover Business Info - The Search Company API

Internal Service — This is not an HTTP endpoint. It’s called directly by the generate-all orchestrator.

Purpose

Uses the Firecrawl Agent API to extract what a business does. This information is used to generate the AI website content (llms.txt, Q&A pages, data.json). Runs in GROUP 1a (parallel with all other GROUP 1 tasks).

Function Signature

async def discover_business_info(
    url: str,
    business_name: str
) -> dict

Parameters

Parameter	Type	Default	Description
`url`	`str`	required	The business website URL
`business_name`	`str`	required	The business name for context

Returns

{
  "status": "success",
  "business_info": {
    "description": "2-3 sentence description of what the company does",
    "products_services": "Overview of main products or services",
    "target_market": "Who their target customers are",
    "key_features": "Key features, capabilities, or differentiators",
    "value_proposition": "The core value they provide to customers",
    "business_name": "The AI Teddy Bear Company",
    "url": "https://myteddybearai.com"
  }
}
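The return shape above can be described with type hints, which may help callers handle missing fields. This is a minimal sketch, not the service's actual types: `BusinessInfo`, `DiscoverResult`, and `summarize` are hypothetical names, and the description string is placeholder text rather than real extracted output. It assumes that `business_name` and `url` are echoed back from the inputs, since they appear in the return value but not in the Firecrawl schema.

```python
from typing import Optional, TypedDict


class BusinessInfo(TypedDict, total=False):
    # total=False: the schema only requires description and products_services,
    # so the remaining keys may be absent.
    description: str
    products_services: str
    target_market: str
    key_features: str
    value_proposition: str
    business_name: str
    url: str


class DiscoverResult(TypedDict):
    status: str
    business_info: Optional[BusinessInfo]


def summarize(result: DiscoverResult) -> str:
    """Return a one-line summary, or a placeholder if discovery failed."""
    info = result.get("business_info")
    if result.get("status") != "success" or not info:
        return "(no business info)"
    name = info.get("business_name", "Unknown business")
    return f"{name}: {info.get('description', '')}"


# Placeholder values for illustration only.
example: DiscoverResult = {
    "status": "success",
    "business_info": {
        "description": "Makes AI-powered teddy bears.",
        "products_services": "Interactive plush toys",
        "business_name": "The AI Teddy Bear Company",
        "url": "https://myteddybearai.com",
    },
}
print(summarize(example))
```

Reading optional keys with `.get()` keeps downstream code from raising `KeyError` when the agent omits a non-required field.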

How It Works

Firecrawl Agent Prompt

The service sends this prompt to Firecrawl Agent:

Analyze the website {url} for "{business_name}" and extract comprehensive business information.

Focus on understanding:
1. What the company does (core business)
2. Their main products or services
3. Who their target customers are
4. What makes them unique or valuable
5. Their core value proposition

Be thorough but concise.
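The `{url}` and `{business_name}` placeholders suggest simple string substitution. A minimal sketch of how the prompt could be assembled follows. `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the real service may build the string differently.

```python
# Hypothetical template constant; the text mirrors the documented prompt.
PROMPT_TEMPLATE = """Analyze the website {url} for "{business_name}" and extract comprehensive business information.

Focus on understanding:
1. What the company does (core business)
2. Their main products or services
3. Who their target customers are
4. What makes them unique or valuable
5. Their core value proposition

Be thorough but concise."""


def build_prompt(url: str, business_name: str) -> str:
    """Fill the two placeholders in the prompt template."""
    return PROMPT_TEMPLATE.format(url=url, business_name=business_name)


prompt = build_prompt("https://myteddybearai.com", "The AI Teddy Bear Company")
print(prompt.splitlines()[0])
```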

Schema

{
  "type": "object",
  "properties": {
    "description": {
      "type": "string",
      "description": "2-3 sentence description of what the company does"
    },
    "products_services": {
      "type": "string",
      "description": "Overview of their main products or services"
    },
    "target_market": {
      "type": "string",
      "description": "Who their target customers or audience are"
    },
    "key_features": {
      "type": "string",
      "description": "Key features, capabilities, or differentiators"
    },
    "value_proposition": {
      "type": "string",
      "description": "The core value they provide to customers"
    }
  },
  "required": ["description", "products_services"]
}
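Because only `description` and `products_services` are required, a caller may want to check the agent's output before using it. The sketch below does this with the standard library alone. `validate_business_info` is a hypothetical helper, not part of the service, and the schema is abbreviated to field types.

```python
from typing import List

# Abbreviated copy of the schema above (descriptions omitted for brevity).
SCHEMA = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "products_services": {"type": "string"},
        "target_market": {"type": "string"},
        "key_features": {"type": "string"},
        "value_proposition": {"type": "string"},
    },
    "required": ["description", "products_services"],
}


def validate_business_info(data: dict, schema: dict = SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the data passes."""
    errors = []
    # Required fields must be present.
    for field in schema["required"]:
        if field not in data:
            errors.append(f"missing required field: {field}")
    # Any field that is present must be a string.
    for field, spec in schema["properties"].items():
        if field in data and spec["type"] == "string" and not isinstance(data[field], str):
            errors.append(f"field {field} should be a string")
    return errors


print(validate_business_info({"description": "a"}))
```

For full JSON Schema support, a dedicated validation library would be a more robust choice than this hand-rolled check.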

Used By

Service	Why
Create AI Website (2a)	Uses business_info to generate llms.txt, Q&A pages, and data.json

This is separate from the Scrape Website service (GROUP 1b). Scraped pages are only used for markdown replica generation, not for LLM content.

Why Separate from Scraping?

Approach	Purpose
Discover Business Info (1a)	Extracts structured business context for LLM prompts
Scrape Website (1b)	Gets full page content for 1:1 markdown replicas

Benefits:

Focused prompts: LLM content generation gets clean, structured business info
No token waste: Full page markdown is never sent to Gemini for llms.txt generation
Parallel execution: Both can run simultaneously since they’re independent
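The parallel execution point can be illustrated with `asyncio.gather`. This is a sketch under stated assumptions: both coroutines are stand-in stubs, `run_group_1` is a hypothetical name, and the `pages` key in the scrape result is invented for illustration. The real orchestrator may schedule its tasks differently.

```python
import asyncio


async def discover_business_info_stub(url: str, business_name: str) -> dict:
    """Stand-in for the real service (1a); sleeps to simulate network I/O."""
    await asyncio.sleep(0.01)
    return {
        "status": "success",
        "business_info": {"business_name": business_name, "url": url},
    }


async def scrape_website_stub(url: str) -> dict:
    """Stand-in for the Scrape Website service (1b); result shape is assumed."""
    await asyncio.sleep(0.01)
    return {"status": "success", "pages": []}


async def run_group_1(url: str, business_name: str) -> dict:
    # The two tasks share no state, so they can be awaited concurrently.
    info, scrape = await asyncio.gather(
        discover_business_info_stub(url, business_name),
        scrape_website_stub(url),
    )
    return {"business_info": info, "scrape": scrape}


results = asyncio.run(
    run_group_1("https://myteddybearai.com", "The AI Teddy Bear Company")
)
print(results["business_info"]["status"], results["scrape"]["status"])
```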

Code Location

src/app/shared/discover_business_info/
├── __init__.py           # Exports discover_business_info
└── service.py            # Main implementation

src/app/apis/onboarding/generate_all/tasks/
└── business_info.py      # Task wrapper for orchestrator

Error Handling

{
  "status": "error",
  "error": "Agent job timed out after 3 minutes",
  "business_info": null
}

Common errors:

Timeout - Firecrawl agent takes too long (max 3 minutes)
API error - Firecrawl API returns non-200 status
Missing API key - FIRECRAWL_API_KEY not configured

If this service fails, the orchestrator will not proceed with GROUP 2a (Create AI Website) since business_info is required.
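The timeout and gating behavior described above could look roughly like the following. This is a sketch, not the actual implementation: `run_with_timeout`, `can_run_group_2a`, and `slow_agent` are hypothetical names, and the demo uses a very short timeout so that the error path triggers quickly.

```python
import asyncio

AGENT_TIMEOUT_SECONDS = 180  # 3 minutes, the documented maximum


async def run_with_timeout(agent_call, timeout: float = AGENT_TIMEOUT_SECONDS) -> dict:
    """Await the agent call and map failures to the documented error payload."""
    try:
        info = await asyncio.wait_for(agent_call, timeout=timeout)
    except asyncio.TimeoutError:
        # Message copied from the documented error example.
        return {
            "status": "error",
            "error": "Agent job timed out after 3 minutes",
            "business_info": None,
        }
    except Exception as exc:  # e.g. an API error or a missing API key
        return {"status": "error", "error": str(exc), "business_info": None}
    return {"status": "success", "business_info": info}


def can_run_group_2a(result: dict) -> bool:
    """GROUP 2a needs business_info, so it runs only after a successful result."""
    return result.get("status") == "success" and bool(result.get("business_info"))


async def slow_agent() -> dict:
    """Stub that takes longer than the demo timeout below."""
    await asyncio.sleep(1)
    return {}


result = asyncio.run(run_with_timeout(slow_agent(), timeout=0.01))
print(result["error"], can_run_group_2a(result))
```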
