@datadrivenconstruction/image-to-data — Agent Skill

---
name: "image-to-data"
description: "Extract data from construction images using AI Vision. Analyze site photos, scanned documents, drawings."
homepage: "https://datadrivenconstruction.io"
metadata: {"openclaw":{"emoji":"📸","os":["darwin","linux","win32"],"homepage":"https://datadrivenconstruction.io","requires":{"bins":["python3"],"env":["OPENAI_API_KEY"]},"primaryEnv":"OPENAI_API_KEY"}}
---

# Image To Data

## Overview

Based on DDC methodology (Chapter 2.4), this skill extracts structured data from construction images using computer vision, OCR, and AI models to analyze site photos, scanned documents, and drawings.

**Book Reference:** "Преобразование данных в структурированную форму" / "Data Transformation to Structured Form"

## Quick Start

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Dict, Optional, Any, Tuple
from datetime import datetime
import json
import base64

class ImageType(Enum):
    """Types of construction images"""
    SITE_PHOTO = "site_photo"
    SCANNED_DOCUMENT = "scanned_document"
    FLOOR_PLAN = "floor_plan"
    ELEVATION = "elevation"
    DETAIL_DRAWING = "detail_drawing"
    PROGRESS_PHOTO = "progress_photo"
    SAFETY_PHOTO = "safety_photo"
    DEFECT_PHOTO = "defect_photo"
    MATERIAL_PHOTO = "material_photo"
    EQUIPMENT_PHOTO = "equipment_photo"

class ExtractionType(Enum):
    """Types of data extraction"""
    OCR_TEXT = "ocr_text"
    TABLE = "table"
    OBJECT_DETECTION = "object_detection"
    MEASUREMENT = "measurement"
    CLASSIFICATION = "classification"
    PROGRESS = "progress"

@dataclass
class BoundingBox:
    """Bounding box for detected region"""
    x: int
    y: int
    width: int
    height: int
    confidence: float = 1.0

@dataclass
class TextRegion:
    """Extracted text region from image"""
    text: str
    bbox: BoundingBox
    confidence: float
    language: str = "en"

@dataclass
class DetectedObject:
    """Detected object in image"""
    label: str
    bbox: BoundingBox
    confidence: float
    attributes: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ExtractedTable:
    """Extracted table from image"""
    headers: List[str]
    rows: List[List[str]]
    bbox: BoundingBox
    confidence: float

@dataclass
class ProgressMeasurement:
    """Progress measurement from image"""
    element_type: str
    total_count: int
    completed_count: int
    percent_complete: float
    area_sqft: Optional[float] = None
    volume_cuft: Optional[float] = None

@dataclass
class ImageAnalysisResult:
    """Complete image analysis result"""
    image_id: str
    image_type: ImageType
    text_regions: List[TextRegion]
    detected_objects: List[DetectedObject]
    tables: List[ExtractedTable]
    progress: Optional[ProgressMeasurement] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    processing_time: float = 0.0


class OCREngine:
    """OCR engine for text extraction"""

    def __init__(self, engine: str = "tesseract"):
        self.engine = engine
        self.supported_languages = ["en", "ru", "de", "fr", "es"]

    def extract_text(
        self,
        image_data: bytes,
        language: str = "en"
    ) -> List[TextRegion]:
        """Extract text from image"""
        # Simulated OCR extraction (use actual OCR library in production)
        # In production: pytesseract, EasyOCR, or cloud OCR services

        regions = []

        # Simulate detecting title block in drawing
        regions.append(TextRegion(
            text="PROJECT: OFFICE BUILDING",
            bbox=BoundingBox(x=100, y=50, width=300, height=30, confidence=0.95),
            confidence=0.95,
            language=language
        ))

        regions.append(TextRegion(
            text="DRAWING: A-101",
            bbox=BoundingBox(x=100, y=90, width=200, height=25, confidence=0.92),
            confidence=0.92,
            language=language
        ))

        regions.append(TextRegion(
            text="SCALE: 1:100",
            bbox=BoundingBox(x=100, y=120, width=150, height=20, confidence=0.88),
            confidence=0.88,
            language=language
        ))

        return regions

    def extract_structured_text(
        self,
        image_data: bytes,
        template: Optional[Dict] = None
    ) -> Dict[str, str]:
        """Extract structured text using template matching"""
        # Extract text regions
        regions = self.extract_text(image_data)

        # Match to template fields
        structured = {}

        if template:
            for field_name, field_config in template.items():
                # Find matching region
                for region in regions:
                    if field_config.get("keyword") in region.text.lower():
                        structured[field_name] = region.text
                        break
        else:
            # Default extraction
            for region in regions:
                if "PROJECT:" in region.text:
                    structured["project_name"] = region.text.split(":")[-1].strip()
                elif "DRAWING:" in region.text:
                    structured["drawing_number"] = region.text.split(":")[-1].strip()
                elif "SCALE:" in region.text:
                    structured["scale"] = region.text.split(":")[-1].strip()

        return structured


class ObjectDetector:
    """Object detection for construction images"""

    def __init__(self, model: str = "yolov8"):
        self.model = model
        self.construction_classes = self._load_construction_classes()

    def _load_construction_classes(self) -> Dict[str, Dict]:
        """Load construction-specific object classes"""
        return {
            # Equipment
            "excavator": {"category": "equipment", "safety_zone": 20},
            "crane": {"category": "equipment", "safety_zone": 30},
            "forklift": {"category": "equipment", "safety_zone": 10},
            "concrete_mixer": {"category": "equipment", "safety_zone": 5},
            "scaffolding": {"category": "equipment", "safety_zone": 5},

            # Safety
            "hard_hat": {"category": "ppe", "required": True},
            "safety_vest": {"category": "ppe", "required": True},
            "safety_glasses": {"category": "ppe", "required": False},
            "harness": {"category": "ppe", "required": False},

            # Materials
            "rebar_bundle": {"category": "material", "unit": "bundle"},
            "concrete_block": {"category": "material", "unit": "pallet"},
            "lumber_stack": {"category": "material", "unit": "bundle"},
            "pipe_stack": {"category": "material", "unit": "bundle"},

            # Workers
            "worker": {"category": "person", "track": True},

            # Building elements
            "column": {"category": "structure"},
            "beam": {"category": "structure"},
            "slab": {"category": "structure"},
            "wall": {"category": "structure"},
        }

    def detect(
        self,
        image_data: bytes,
        confidence_threshold: float = 0.5
    ) -> List[DetectedObject]:
        """Detect objects in image"""
        # Simulated detection (use actual model in production)
        # In production: YOLO, Faster R-CNN, etc.

        detected = []

        # Simulate detected objects
        sample_detections = [
            ("worker", 0.92, BoundingBox(200, 300, 80, 180, 0.92)),
            ("hard_hat", 0.88, BoundingBox(210, 300, 30, 25, 0.88)),
            ("safety_vest", 0.85, BoundingBox(210, 340, 60, 80, 0.85)),
            ("scaffolding", 0.78, BoundingBox(400, 100, 200, 400, 0.78)),
            ("concrete_block", 0.72, BoundingBox(50, 450, 100, 50, 0.72)),
        ]

        for label, conf, bbox in sample_detections:
            if conf >= confidence_threshold:
                class_info = self.construction_classes.get(label, {})
                detected.append(DetectedObject(
                    label=label,
                    bbox=bbox,
                    confidence=conf,
                    attributes=class_info
                ))

        return detected

    def detect_safety_compliance(
        self,
        image_data: bytes
    ) -> Dict:
        """Detect safety compliance in image"""
        objects = self.detect(image_data)

        workers = [o for o in objects if o.label == "worker"]
        hard_hats = [o for o in objects if o.label == "hard_hat"]
        vests = [o for o in objects if o.label == "safety_vest"]

        compliance = {
            "workers_detected": len(workers),
            "hard_hats_detected": len(hard_hats),
            "vests_detected": len(vests),
            "hard_hat_compliance": len(hard_hats) / len(workers) if workers else 1.0,
            "vest_compliance": len(vests) / len(workers) if workers else 1.0,
            "overall_compliance": "compliant" if len(hard_hats) >= len(workers) else "non-compliant",
            "violations": []
        }

        if len(hard_hats) < len(workers):
            compliance["violations"].append({
                "type": "missing_hard_hat",
                "count": len(workers) - len(hard_hats)
            })

        return compliance


class TableExtractor:
    """Extract tables from images"""

    def extract_tables(
        self,
        image_data: bytes,
        detect_headers: bool = True
    ) -> List[ExtractedTable]:
        """Extract tables from image"""
        # Simulated table extraction
        # In production: Camelot, Tabula, or custom CNN

        tables = []

        # Simulate a schedule table
        tables.append(ExtractedTable(
            headers=["Activity", "Start", "End", "Duration"],
            rows=[
                ["Foundation", "2024-01-01", "2024-01-15", "14 days"],
                ["Framing", "2024-01-16", "2024-02-28", "44 days"],
                ["MEP Rough-in", "2024-03-01", "2024-03-31", "31 days"]
            ],
            bbox=BoundingBox(50, 200, 500, 200, 0.85),
            confidence=0.85
        ))

        return tables

    def table_to_dataframe(self, table: ExtractedTable) -> Dict:
        """Convert table to dictionary (DataFrame-like)"""
        return {
            "columns": table.headers,
            "data": table.rows,
            "records": [
                dict(zip(table.headers, row))
                for row in table.rows
            ]
        }


class ProgressAnalyzer:
    """Analyze construction progress from images"""

    def __init__(self):
        self.reference_models = {}

    def analyze_progress(
        self,
        current_image: bytes,
        reference_image: Optional[bytes] = None,
        element_type: str = "general"
    ) -> ProgressMeasurement:
        """Analyze progress by comparing images"""
        # Simulated progress analysis
        # In production: Use semantic segmentation + comparison

        # Simulate progress detection
        return ProgressMeasurement(
            element_type=element_type,
            total_count=100,
            completed_count=65,
            percent_complete=65.0,
            area_sqft=15000.0,
            volume_cuft=None
        )

    def compare_with_plan(
        self,
        site_photo: bytes,
        plan_image: bytes
    ) -> Dict:
        """Compare site photo with plan"""
        return {
            "match_score": 0.78,
            "deviations": [],
            "completion_estimate": 65.0,
            "areas_of_concern": []
        }


class ConstructionImageAnalyzer:
    """
    Main class for construction image analysis.
    Based on DDC methodology Chapter 2.4.
    """

    def __init__(self):
        self.ocr = OCREngine()
        self.detector = ObjectDetector()
        self.table_extractor = TableExtractor()
        self.progress_analyzer = ProgressAnalyzer()

    def analyze_image(
        self,
        image_data: bytes,
        image_type: ImageType,
        image_id: str = "img_001",
        extract_types: Optional[List[ExtractionType]] = None
    ) -> ImageAnalysisResult:
        """
        Analyze a construction image.

        Args:
            image_data: Image data as bytes
            image_type: Type of image
            image_id: Unique image identifier
            extract_types: Types of extraction to perform

        Returns:
            Complete analysis result
        """
        start_time = datetime.now()

        if extract_types is None:
            extract_types = [ExtractionType.OCR_TEXT, ExtractionType.OBJECT_DETECTION]

        text_regions = []
        detected_objects = []
        tables = []
        progress = None

        # OCR extraction
        if ExtractionType.OCR_TEXT in extract_types:
            text_regions = self.ocr.extract_text(image_data)

        # Object detection
        if ExtractionType.OBJECT_DETECTION in extract_types:
            detected_objects = self.detector.detect(image_data)

        # Table extraction
        if ExtractionType.TABLE in extract_types:
            tables = self.table_extractor.extract_tables(image_data)

        # Progress analysis
        if ExtractionType.PROGRESS in extract_types:
            progress = self.progress_analyzer.analyze_progress(image_data)

        processing_time = (datetime.now() - start_time).total_seconds()

        return ImageAnalysisResult(
            image_id=image_id,
            image_type=image_type,
            text_regions=text_regions,
            detected_objects=detected_objects,
            tables=tables,
            progress=progress,
            metadata={"extraction_types": [e.value for e in extract_types]},
            processing_time=processing_time
        )

    def analyze_site_photo(
        self,
        image_data: bytes,
        image_id: str = "site_001"
    ) -> Dict:
        """Analyze site photo for progress and safety"""
        result = self.analyze_image(
            image_data,
            ImageType.SITE_PHOTO,
            image_id,
            [ExtractionType.OBJECT_DETECTION, ExtractionType.PROGRESS]
        )

        safety = self.detector.detect_safety_compliance(image_data)

        return {
            "image_id": result.image_id,
            "objects_detected": len(result.detected_objects),
            "progress": result.progress,
            "safety_compliance": safety,
            "equipment": [o.label for o in result.detected_objects if o.attributes.get("category") == "equipment"],
            "materials": [o.label for o in result.detected_objects if o.attributes.get("category") == "material"]
        }

    def extract_drawing_data(
        self,
        image_data: bytes,
        image_id: str = "dwg_001"
    ) -> Dict:
        """Extract data from scanned drawing"""
        result = self.analyze_image(
            image_data,
            ImageType.FLOOR_PLAN,
            image_id,
            [ExtractionType.OCR_TEXT, ExtractionType.TABLE]
        )

        # Extract title block info
        title_block = self.ocr.extract_structured_text(image_data)

        return {
            "image_id": result.image_id,
            "title_block": title_block,
            "text_regions": len(result.text_regions),
            "tables": [
                self.table_extractor.table_to_dataframe(t)
                for t in result.tables
            ],
            "all_text": [r.text for r in result.text_regions]
        }

    def batch_analyze(
        self,
        images: List[Tuple[bytes, ImageType, str]]
    ) -> List[ImageAnalysisResult]:
        """Analyze multiple images"""
        results = []
        for image_data, image_type, image_id in images:
            result = self.analyze_image(image_data, image_type, image_id)
            results.append(result)
        return results

    def export_results(
        self,
        result: ImageAnalysisResult,
        format: str = "json"
    ) -> str:
        """Export analysis results"""
        data = {
            "image_id": result.image_id,
            "image_type": result.image_type.value,
            "text_count": len(result.text_regions),
            "object_count": len(result.detected_objects),
            "table_count": len(result.tables),
            "texts": [
                {"text": r.text, "confidence": r.confidence}
                for r in result.text_regions
            ],
            "objects": [
                {"label": o.label, "confidence": o.confidence}
                for o in result.detected_objects
            ],
            "processing_time": result.processing_time
        }

        if format == "json":
            return json.dumps(data, indent=2)
        else:
            raise ValueError(f"Unsupported format: {format}")
```

## Common Use Cases

### Analyze Site Photo

```python
analyzer = ConstructionImageAnalyzer()

# Load image (in production, read from file)
with open("site_photo.jpg", "rb") as f:
    image_data = f.read()

result = analyzer.analyze_site_photo(image_data)

print(f"Objects detected: {result['objects_detected']}")
print(f"Safety compliance: {result['safety_compliance']['overall_compliance']}")
print(f"Progress: {result['progress'].percent_complete}%")
```

### Extract Drawing Data

```python
with open("floor_plan.png", "rb") as f:
    drawing_data = f.read()

data = analyzer.extract_drawing_data(drawing_data)

print(f"Drawing: {data['title_block'].get('drawing_number')}")
print(f"Project: {data['title_block'].get('project_name')}")
for table in data['tables']:
    print(f"Table with {len(table['records'])} rows")
```

### Detect Safety Violations

```python
detector = ObjectDetector()

with open("site_photo.jpg", "rb") as f:
    image_data = f.read()

safety = detector.detect_safety_compliance(image_data)

if safety['overall_compliance'] == 'non-compliant':
    for violation in safety['violations']:
        print(f"Violation: {violation['type']} - Count: {violation['count']}")
```

## Quick Reference

| Component | Purpose |
|-----------|---------|
| `ConstructionImageAnalyzer` | Main analysis engine |
| `OCREngine` | Text extraction |
| `ObjectDetector` | Object detection |
| `TableExtractor` | Table extraction |
| `ProgressAnalyzer` | Progress analysis |
| `ImageAnalysisResult` | Complete analysis result |

## Resources

- **Book**: "Data-Driven Construction" by Artem Boiko, Chapter 2.4
- **Website**: https://datadrivenconstruction.io

## Next Steps

- Use [cad-to-data](../cad-to-data/SKILL.md) for CAD/BIM extraction
- Use [defect-detection-ai](../../../DDC_Innovative/defect-detection-ai/SKILL.md) for defects
- Use [safety-compliance-checker](../../../DDC_Innovative/safety-compliance-checker/SKILL.md) for safety