APM

>Agent Skill

@datadrivenconstruction/voice-to-report

skilldevelopment

Convert voice recordings to structured construction reports. Field workers speak, AI transcribes and formats. Supports daily reports, safety observations, progress updates.

apm::install
$apm install @datadrivenconstruction/voice-to-report
apm::skill.md
---
name: "voice-to-report"
description: "Convert voice recordings to structured construction reports. Field workers speak, AI transcribes and formats. Supports daily reports, safety observations, progress updates."
homepage: "https://datadrivenconstruction.io"
metadata: {"openclaw": {"emoji": "📱", "os": ["darwin", "linux", "win32"], "homepage": "https://datadrivenconstruction.io", "requires": {"bins": ["python3"], "env": ["OPENAI_API_KEY"]}, "primaryEnv": "OPENAI_API_KEY"}}
---
# Voice to Report

## Overview

Field workers prefer talking over typing. This skill converts voice recordings into structured construction reports using speech-to-text and LLM processing.

## Why Voice?

| Typing | Voice |
|--------|-------|
| Slow on mobile | Roughly 3× faster to speak |
| Requires attention | Hands-free |
| Limited in cold/rain | Works anywhere |
| Formal language | Natural expression |
| Short messages | Detailed descriptions |

## Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│                         VOICE TO REPORT PIPELINE                         │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  🎤 Voice       →   📝 Transcribe   →   🤖 Structure    →   📊 Report    │
│  Recording          Whisper API         GPT-4o              Formatted    │
│                                                                          │
│  "We finished       "We finished        {                   Daily Report │
│   the foundation     the foundation       "activity":       ──────────── │
│   pour today,        pour today,          "foundation",     Foundation   │
│   about 500          about 500            "quantity": 500,  pour: 500 m³ │
│   cubic meters"      cubic meters"        "unit": "m³"      Complete ✓   │
│                                         }                                │
└──────────────────────────────────────────────────────────────────────────┘
```

## Quick Start

```python
from openai import OpenAI
import json

client = OpenAI()

def voice_to_report(audio_path: str, report_type: str = "daily") -> dict:
    """Convert voice recording to structured report"""

    # Step 1: Transcribe audio
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language="en"
        )

    # Step 2: Structure with LLM
    # get_report_schema returns one of the schema dicts defined
    # under "Report Schemas" below, keyed by report_type
    schema = get_report_schema(report_type)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a construction report assistant.
                Convert the voice transcript into a structured report.
                Extract all relevant information and format as JSON.

                Report type: {report_type}
                Schema: {json.dumps(schema, indent=2)}

                Rules:
                - Extract quantities with units
                - Identify activities and locations
                - Note any issues or concerns
                - Capture weather if mentioned
                - List workers/trades if mentioned
                """
            },
            {
                "role": "user",
                "content": f"Transcript:\n{transcript.text}"
            }
        ],
        response_format={"type": "json_object"}
    )

    return {
        "transcript": transcript.text,
        "structured_report": json.loads(response.choices[0].message.content)
    }
```

## Report Schemas

### Daily Report Schema

```python
daily_report_schema = {
    "date": "YYYY-MM-DD",
    "project": "string",
    "weather": {
        "conditions": "string",
        "temperature": "number",
        "impact": "none|minor|major"
    },
    "workforce": [
        {
            "trade": "string",
            "count": "number",
            "hours": "number"
        }
    ],
    "activities": [
        {
            "description": "string",
            "location": "string",
            "quantity": "number",
            "unit": "string",
            "status": "in_progress|completed|delayed"
        }
    ],
    "equipment": [
        {
            "type": "string",
            "hours": "number"
        }
    ],
    "issues": [
        {
            "description": "string",
            "severity": "low|medium|high",
            "action_taken": "string"
        }
    ],
    "notes": "string"
}
```
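
The schema above is descriptive (type names written as strings) rather than a formal JSON Schema, so LLM output is worth a sanity check before it reaches a database. A minimal sketch of such a check, assuming the structured report uses the keys above (`validate_daily_report` is a hypothetical helper, not part of the skill):

```python
def validate_daily_report(report: dict) -> list[str]:
    """Return a list of problems found in a structured daily report."""
    problems = []

    # Required top-level keys from the daily report schema
    for key in ("date", "project", "activities"):
        if key not in report:
            problems.append(f"missing key: {key}")

    # Each activity needs a numeric quantity and a status from the enum
    for i, activity in enumerate(report.get("activities", [])):
        if not isinstance(activity.get("quantity"), (int, float)):
            problems.append(f"activities[{i}]: quantity is not a number")
        if activity.get("status") not in ("in_progress", "completed", "delayed"):
            problems.append(f"activities[{i}]: invalid status")

    return problems


report = {
    "date": "2024-05-01",
    "project": "Riverside Tower",
    "activities": [
        {"description": "Foundation pour", "location": "Zone A",
         "quantity": 500, "unit": "m³", "status": "completed"}
    ],
}
print(validate_daily_report(report))  # → []
```

When validation fails, a practical option is to feed the problem list back to the model in a retry prompt rather than rejecting the report outright.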

### Safety Observation Schema

```python
safety_schema = {
    "date": "YYYY-MM-DD",
    "time": "HH:MM",
    "location": "string",
    "observer": "string",
    "observation_type": "positive|concern|incident",
    "description": "string",
    "people_involved": ["list of names/roles"],
    "immediate_action": "string",
    "follow_up_required": "boolean",
    "photos_attached": "boolean"
}
```

### Progress Update Schema

```python
progress_schema = {
    "date": "YYYY-MM-DD",
    "area": "string",
    "activity": "string",
    "planned_quantity": "number",
    "actual_quantity": "number",
    "unit": "string",
    "percent_complete": "number",
    "on_schedule": "boolean",
    "variance_reason": "string or null",
    "next_steps": "string"
}
```
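
The `percent_complete` and `on_schedule` fields can be derived from the planned and actual quantities rather than trusted from the transcript. A small sketch (field names follow the schema above; the 5% tolerance is an illustrative assumption):

```python
def compute_progress(planned_quantity: float, actual_quantity: float,
                     tolerance: float = 0.05) -> dict:
    """Derive percent_complete and on_schedule from quantities.

    on_schedule allows a small shortfall (default 5%) before flagging.
    """
    percent = round(100 * actual_quantity / planned_quantity, 1)
    return {
        "percent_complete": percent,
        "on_schedule": actual_quantity >= planned_quantity * (1 - tolerance),
    }


print(compute_progress(500, 480))
# → {'percent_complete': 96.0, 'on_schedule': True}
```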

## n8n Workflow

```json
{
  "workflow": "Voice to Report",
  "nodes": [
    {
      "name": "Telegram Trigger",
      "type": "Telegram",
      "event": "voice_message"
    },
    {
      "name": "Download Voice",
      "type": "Telegram",
      "action": "getFile"
    },
    {
      "name": "Transcribe",
      "type": "OpenAI",
      "operation": "transcribe",
      "model": "whisper-1"
    },
    {
      "name": "Detect Report Type",
      "type": "OpenAI",
      "prompt": "Classify: daily_report, safety, progress, issue"
    },
    {
      "name": "Structure Report",
      "type": "OpenAI",
      "operation": "chat",
      "model": "gpt-4o"
    },
    {
      "name": "Save to Database",
      "type": "PostgreSQL"
    },
    {
      "name": "Confirm to User",
      "type": "Telegram",
      "action": "sendMessage"
    },
    {
      "name": "Generate PDF",
      "type": "HTTP Request",
      "url": "pdf-service/generate"
    }
  ]
}
```

## Multi-Language Support

```python
def transcribe_multilingual(audio_path: str) -> dict:
    """Transcribe in any language, output in English"""

    with open(audio_path, "rb") as audio_file:
        # Detect language automatically
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
            # language parameter omitted for auto-detection
        )

    # Translate to English if needed
    # (is_english is a helper you provide -- e.g. a language-detection
    # library or a simple character heuristic)
    if not is_english(transcript.text):
        translation = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Translate to English, preserve construction terminology."},
                {"role": "user", "content": transcript.text}
            ]
        )
        english_text = translation.choices[0].message.content
    else:
        english_text = transcript.text

    return {
        "original": transcript.text,
        "english": english_text
    }
```
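
The snippet above assumes an `is_english` helper. One option is a language-detection library; a dependency-free sketch that treats mostly-ASCII text as English (crude — it will misclassify ASCII-written languages such as Indonesian, so treat it as a placeholder):

```python
def is_english(text: str) -> bool:
    """Crude heuristic: mostly-ASCII letters suggests English.

    Good enough to skip an unnecessary translation call; replace with a
    real language detector (e.g. langdetect) for production use.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return True  # nothing to translate
    ascii_letters = sum(1 for c in letters if c.isascii())
    return ascii_letters / len(letters) > 0.9


print(is_english("We finished the foundation pour today"))  # → True
print(is_english("Мы закончили заливку фундамента"))        # → False
```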

## Mobile App Integration

```python
# Example: request shape for a Flutter / React Native client, shown as
# Python pseudocode. `api` stands for your async HTTP client
# (e.g. an httpx.AsyncClient configured with the server's base URL).

# Send voice to API
async def upload_voice_report(audio_bytes, project_id):
    response = await api.post(
        "/voice-report",
        files={"audio": audio_bytes},
        data={
            "project_id": project_id,
            "report_type": "daily"
        }
    )
    return response.json()

# Response includes:
# - transcript
# - structured_report
# - report_id
# - pdf_url (if generated)
```

## Cost Optimization

```python
# Use local Whisper for high volume
# (the local package installs as "openai-whisper" but imports as "whisper")
import whisper

model = whisper.load_model("base")  # or "small", "medium", "large"

def transcribe_local(audio_path: str) -> str:
    """Transcribe locally to save API costs"""
    result = model.transcribe(audio_path)
    return result["text"]

# Cost comparison (per hour of audio):
# - OpenAI Whisper API: $0.36
# - Local Whisper (base): $0 (compute only)
# - Local Whisper (large): $0 (compute only, slower)
```
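
Whether local Whisper pays off depends on volume; at the API's $0.006/min ($0.36/hour) rate, the monthly bill is easy to estimate. A quick sketch (the crew size and recording length below are made-up inputs):

```python
API_RATE_PER_MIN = 0.006  # OpenAI Whisper API pricing, USD per minute


def monthly_transcription_cost(workers: int, minutes_per_day: float,
                               workdays: int = 22) -> float:
    """Estimated monthly Whisper API spend in USD."""
    total_minutes = workers * minutes_per_day * workdays
    return round(total_minutes * API_RATE_PER_MIN, 2)


# 50 field workers, ~3 minutes of voice notes per day
print(monthly_transcription_cost(50, 3))  # → 19.8
```

At tens of dollars a month, the API is usually cheaper than maintaining GPU capacity; local Whisper starts to make sense at much higher volumes or where audio cannot leave the premises.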

## Requirements

```bash
pip install openai openai-whisper python-telegram-bot
```

## Resources

- OpenAI Whisper: https://platform.openai.com/docs/guides/speech-to-text
- Local Whisper: https://github.com/openai/whisper
- n8n Voice Processing: https://docs.n8n.io/integrations/