
# Discover Business Info

<Note>
  **Internal Service**: This is not an HTTP endpoint. The `generate-all` orchestrator calls it directly.
</Note>

## Purpose

Uses the **Firecrawl Agent API** to extract a structured summary of what a business does. This summary is the input for generating the AI website content (llms.txt, Q\&A pages, data.json).

Runs in **GROUP 1a** (parallel with all other GROUP 1 tasks).

## Function Signature

```python theme={null}
async def discover_business_info(
    url: str,
    business_name: str
) -> dict
```

## Parameters

| Parameter       | Type  | Required | Description                   |
| --------------- | ----- | -------- | ----------------------------- |
| `url`           | `str` | Yes      | The business website URL      |
| `business_name` | `str` | Yes      | The business name for context |

## Returns

```json theme={null}
{
  "status": "success",
  "business_info": {
    "description": "2-3 sentence description of what the company does",
    "products_services": "Overview of main products or services",
    "target_market": "Who their target customers are",
    "key_features": "Key features, capabilities, or differentiators",
    "value_proposition": "The core value they provide to customers",
    "business_name": "The AI Teddy Bear Company",
    "url": "https://myteddybearai.com"
  }
}
```

## How It Works

```mermaid theme={null}
sequenceDiagram
    participant Service as Discover Business Info
    participant Firecrawl as Firecrawl Agent API
    
    Service->>Firecrawl: POST /v2/agent
    Note over Firecrawl: Analyzes website content
    Firecrawl-->>Service: Job ID
    loop Poll for completion
        Service->>Firecrawl: GET /v2/agent/{job_id}
        Firecrawl-->>Service: Status + Data
    end
    Service-->>Service: Return business_info dict
```
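The start-then-poll flow in the diagram can be sketched as below. This is a minimal illustration, not the service's actual code: the two HTTP calls are passed in as callables so the loop can be read (and run) without network access, and the status strings (`"completed"`, `"failed"`), the `data` key, and the poll interval are assumptions. Only the 3-minute cap comes from this page.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

TIMEOUT_SECONDS = 180.0        # 3-minute cap, per the Error Handling section
POLL_INTERVAL_SECONDS = 2.0    # assumed; the real interval is not documented here


async def poll_agent_job(
    start_job: Callable[[], Awaitable[str]],
    get_job: Callable[[str], Awaitable[dict[str, Any]]],
    timeout: float = TIMEOUT_SECONDS,
    interval: float = POLL_INTERVAL_SECONDS,
) -> dict[str, Any]:
    """Start an agent job, then poll until it finishes or the timeout elapses."""
    job_id = await start_job()                 # stands in for POST /v2/agent
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = await get_job(job_id)            # stands in for GET /v2/agent/{job_id}
        status = job.get("status")
        if status == "completed":              # hypothetical status value
            return {"status": "success", "business_info": job.get("data")}
        if status == "failed":                 # hypothetical status value
            return {
                "status": "error",
                "error": job.get("error", "Agent job failed"),
                "business_info": None,
            }
        await asyncio.sleep(interval)
    return {
        "status": "error",
        "error": f"Agent job timed out after {timeout / 60:g} minutes",
        "business_info": None,
    }
```

Injecting `start_job` and `get_job` keeps the timeout logic independent of any particular HTTP client, which also makes it straightforward to exercise with fakes.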

### Firecrawl Agent Prompt

The service sends this prompt to Firecrawl Agent:

```
Analyze the website {url} for "{business_name}" and extract comprehensive business information.

Focus on understanding:
1. What the company does (core business)
2. Their main products or services
3. Who their target customers are
4. What makes them unique or valuable
5. Their core value proposition

Be thorough but concise.
```
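One plausible way to fill in the `{url}` and `{business_name}` placeholders is plain string formatting, as in this sketch. The names `PROMPT_TEMPLATE` and `build_agent_prompt` are hypothetical and not taken from the codebase.

```python
# Template text copied from the prompt above; the names are illustrative only.
PROMPT_TEMPLATE = """Analyze the website {url} for "{business_name}" and extract comprehensive business information.

Focus on understanding:
1. What the company does (core business)
2. Their main products or services
3. Who their target customers are
4. What makes them unique or valuable
5. Their core value proposition

Be thorough but concise."""


def build_agent_prompt(url: str, business_name: str) -> str:
    """Substitute the two placeholders into the agent prompt."""
    return PROMPT_TEMPLATE.format(url=url, business_name=business_name)
```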

### Schema

```json theme={null}
{
  "type": "object",
  "properties": {
    "description": {
      "type": "string",
      "description": "2-3 sentence description of what the company does"
    },
    "products_services": {
      "type": "string",
      "description": "Overview of their main products or services"
    },
    "target_market": {
      "type": "string",
      "description": "Who their target customers or audience are"
    },
    "key_features": {
      "type": "string",
      "description": "Key features, capabilities, or differentiators"
    },
    "value_proposition": {
      "type": "string",
      "description": "The core value they provide to customers"
    }
  },
  "required": ["description", "products_services"]
}
```
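Because only `description` and `products_services` are listed under `required`, the other three fields may be absent from a response. The sketch below shows one way a caller might check a returned object against the schema using only the standard library; the helper name `check_business_info` is hypothetical, and the service itself may validate differently or not at all.

```python
from typing import Any

# Field names mirror the schema above.
SCHEMA_FIELDS = (
    "description",
    "products_services",
    "target_market",
    "key_features",
    "value_proposition",
)
REQUIRED_FIELDS = ("description", "products_services")


def check_business_info(data: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the data conforms."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in data:
            problems.append(f"missing required field: {field}")
    for field in SCHEMA_FIELDS:
        if field in data and not isinstance(data[field], str):
            problems.append(f"{field} must be a string")
    return problems
```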

## Used By

| Service                | Why                                                                 |
| ---------------------- | ------------------------------------------------------------------- |
| Create AI Website (2a) | Uses business\_info to generate llms.txt, Q\&A pages, and data.json |

<Info>
  This is separate from the **Scrape Website** service (GROUP 1b). Scraped pages are only used for markdown replica generation, not for LLM content.
</Info>

## Why Separate from Scraping?

| Approach                        | Purpose                                              |
| ------------------------------- | ---------------------------------------------------- |
| **Discover Business Info (1a)** | Extracts structured business context for LLM prompts |
| **Scrape Website (1b)**         | Gets full page content for 1:1 markdown replicas     |

Benefits:

* **Focused prompts**: LLM content generation receives clean, structured business info instead of raw page text
* **No token waste**: Full page markdown is not sent to Gemini for llms.txt generation
* **Parallel execution**: The two services are independent, so they can run at the same time
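The parallel execution point can be illustrated with `asyncio.gather`. Both coroutines here are stand-in stubs with made-up return values; in particular, the shape returned by the scrape stub is an assumption, since this page does not document the Scrape Website service's output.

```python
import asyncio
from typing import Any


async def discover_business_info_stub(url: str, business_name: str) -> dict[str, Any]:
    """Stub for the 1a task; sleeps briefly instead of calling Firecrawl."""
    await asyncio.sleep(0.01)
    return {"status": "success", "business_info": {"business_name": business_name, "url": url}}


async def scrape_website_stub(url: str) -> dict[str, Any]:
    """Stub for the 1b task; the return shape is hypothetical."""
    await asyncio.sleep(0.01)
    return {"status": "success", "pages": []}


async def run_group_1(url: str, business_name: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Run both independent tasks concurrently and return their results in order."""
    info, scrape = await asyncio.gather(
        discover_business_info_stub(url, business_name),
        scrape_website_stub(url),
    )
    return info, scrape
```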

## Code Location

```
src/app/shared/discover_business_info/
├── __init__.py           # Exports discover_business_info
└── service.py            # Main implementation

src/app/apis/onboarding/generate_all/tasks/
└── business_info.py      # Task wrapper for orchestrator
```

## Error Handling

```json theme={null}
{
  "status": "error",
  "error": "Agent job timed out after 3 minutes",
  "business_info": null
}
```

Common errors:

* **Timeout**: The Firecrawl agent job does not complete within 3 minutes
* **API error**: The Firecrawl API returns a non-200 status
* **Missing API key**: `FIRECRAWL_API_KEY` is not configured

If this service fails, the orchestrator will not proceed with GROUP 2a (Create AI Website) since business\_info is required.
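A caller could gate the next step on the result shape shown above, roughly as follows. The function name `can_run_create_ai_website` is hypothetical; the orchestrator's real check is not shown on this page.

```python
from typing import Any


def can_run_create_ai_website(result: dict[str, Any]) -> bool:
    """True only when the service succeeded and returned a business_info object."""
    return result.get("status") == "success" and isinstance(
        result.get("business_info"), dict
    )
```

With the error payload above (`"status": "error"`, `"business_info": null`), this check returns `False`, so the dependent step would be skipped.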
