cad-to-data
skillConvert CAD/BIM files to structured data. Extract element data from Revit, IFC, DWG, DGN files.
apm::install
apm install @datadrivenconstruction/cad-to-dataapm::skill.md
---
name: "cad-to-data"
description: "Convert CAD/BIM files to structured data. Extract element data from Revit, IFC, DWG, DGN files."
homepage: "https://datadrivenconstruction.io"
metadata: {"openclaw":{"emoji":"🗂️","os":["darwin","linux","win32"],"homepage":"https://datadrivenconstruction.io","requires":{"bins":["python3"]}}}
---
# CAD To Data
## Overview
Based on DDC methodology (Chapter 2.4), this skill converts CAD and BIM files to structured data, extracting element properties, quantities, and relationships from Revit, IFC, DWG, and DGN files.
**Book Reference:** "Преобразование данных в структурированную форму" / "Data Transformation to Structured Form"
## Quick Start
```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Dict, Optional, Any, Tuple, Generator
from datetime import datetime
import json
class CADFormat(Enum):
"""Supported CAD/BIM formats"""
IFC = "ifc"
RVT = "rvt"
DWG = "dwg"
DXF = "dxf"
DGN = "dgn"
NWD = "nwd"
STEP = "step"
class ElementCategory(Enum):
"""BIM element categories"""
WALL = "wall"
FLOOR = "floor"
ROOF = "roof"
CEILING = "ceiling"
DOOR = "door"
WINDOW = "window"
COLUMN = "column"
BEAM = "beam"
STAIR = "stair"
RAMP = "ramp"
FURNITURE = "furniture"
EQUIPMENT = "equipment"
PIPE = "pipe"
DUCT = "duct"
CABLE_TRAY = "cable_tray"
SPACE = "space"
GENERIC = "generic"
@dataclass
class Point3D:
"""3D point"""
x: float
y: float
z: float
@dataclass
class BoundingBox3D:
"""3D bounding box"""
min_point: Point3D
max_point: Point3D
@property
def width(self) -> float:
return abs(self.max_point.x - self.min_point.x)
@property
def depth(self) -> float:
return abs(self.max_point.y - self.min_point.y)
@property
def height(self) -> float:
return abs(self.max_point.z - self.min_point.z)
@property
def volume(self) -> float:
return self.width * self.depth * self.height
@dataclass
class MaterialInfo:
"""Material information"""
name: str
category: str
color: Optional[str] = None
area: float = 0.0
volume: float = 0.0
properties: Dict[str, Any] = field(default_factory=dict)
@dataclass
class CADElement:
"""Extracted CAD/BIM element"""
id: str
guid: str
name: str
category: ElementCategory
type_name: str
level: Optional[str] = None
bounding_box: Optional[BoundingBox3D] = None
properties: Dict[str, Any] = field(default_factory=dict)
quantities: Dict[str, float] = field(default_factory=dict)
materials: List[MaterialInfo] = field(default_factory=list)
relationships: Dict[str, List[str]] = field(default_factory=dict)
@dataclass
class CADLayer:
"""CAD layer information"""
name: str
color: Optional[str] = None
line_type: Optional[str] = None
visible: bool = True
element_count: int = 0
@dataclass
class CADExtractionResult:
"""Result of CAD extraction"""
file_path: str
file_format: CADFormat
elements: List[CADElement]
layers: List[CADLayer]
levels: List[str]
total_elements: int
categories: Dict[str, int]
extraction_time: float
metadata: Dict[str, Any] = field(default_factory=dict)
class IFCExtractor:
"""Extract data from IFC files"""
def __init__(self):
self.schema_version = "IFC4"
self.element_mapping = self._build_element_mapping()
def _build_element_mapping(self) -> Dict[str, ElementCategory]:
"""Map IFC types to categories"""
return {
"IfcWall": ElementCategory.WALL,
"IfcWallStandardCase": ElementCategory.WALL,
"IfcSlab": ElementCategory.FLOOR,
"IfcRoof": ElementCategory.ROOF,
"IfcCeiling": ElementCategory.CEILING,
"IfcDoor": ElementCategory.DOOR,
"IfcWindow": ElementCategory.WINDOW,
"IfcColumn": ElementCategory.COLUMN,
"IfcBeam": ElementCategory.BEAM,
"IfcStair": ElementCategory.STAIR,
"IfcRamp": ElementCategory.RAMP,
"IfcFurnishingElement": ElementCategory.FURNITURE,
"IfcPipeSegment": ElementCategory.PIPE,
"IfcDuctSegment": ElementCategory.DUCT,
"IfcCableCarrierSegment": ElementCategory.CABLE_TRAY,
"IfcSpace": ElementCategory.SPACE,
}
def extract(
self,
file_path: str,
categories: Optional[List[ElementCategory]] = None
) -> CADExtractionResult:
"""
Extract data from IFC file.
Args:
file_path: Path to IFC file
categories: Optional filter for categories
Returns:
Extraction result
"""
start_time = datetime.now()
# In production, use ifcopenshell:
# import ifcopenshell
# ifc_file = ifcopenshell.open(file_path)
# Simulated extraction
elements = self._simulate_ifc_elements()
# Filter by category if specified
if categories:
elements = [e for e in elements if e.category in categories]
# Build category counts
category_counts = {}
for element in elements:
cat = element.category.value
category_counts[cat] = category_counts.get(cat, 0) + 1
# Extract levels
levels = list(set(e.level for e in elements if e.level))
extraction_time = (datetime.now() - start_time).total_seconds()
return CADExtractionResult(
file_path=file_path,
file_format=CADFormat.IFC,
elements=elements,
layers=[], # IFC doesn't use layers in traditional sense
levels=levels,
total_elements=len(elements),
categories=category_counts,
extraction_time=extraction_time,
metadata={
"schema": self.schema_version,
"project_name": "Sample Project"
}
)
def _simulate_ifc_elements(self) -> List[CADElement]:
"""Simulate IFC element extraction"""
elements = []
# Sample walls
for i in range(10):
elements.append(CADElement(
id=f"wall_{i}",
guid=f"1234567890ABCDEF{i:04d}",
name=f"Basic Wall {i}",
category=ElementCategory.WALL,
type_name="Basic Wall:200mm Concrete",
level="Level 1",
bounding_box=BoundingBox3D(
min_point=Point3D(i * 5, 0, 0),
max_point=Point3D(i * 5 + 5, 0.2, 3)
),
properties={
"IsExternal": True,
"FireRating": "1 HR",
"LoadBearing": True
},
quantities={
"Length": 5.0,
"Height": 3.0,
"Width": 0.2,
"Area": 15.0,
"Volume": 3.0
},
materials=[
MaterialInfo(
name="Concrete",
category="Concrete",
area=15.0,
volume=3.0
)
]
))
# Sample doors
for i in range(5):
elements.append(CADElement(
id=f"door_{i}",
guid=f"DOOR0000000000{i:04d}",
name=f"Single Door {i}",
category=ElementCategory.DOOR,
type_name="Single Flush:900x2100",
level="Level 1",
properties={
"FireRating": "None",
"IsExternal": False
},
quantities={
"Width": 0.9,
"Height": 2.1,
"Area": 1.89
},
relationships={
"host_wall": [f"wall_{i}"]
}
))
# Sample spaces
for i in range(3):
elements.append(CADElement(
id=f"space_{i}",
guid=f"SPACE000000000{i:04d}",
name=f"Room {i+101}",
category=ElementCategory.SPACE,
type_name="Office",
level="Level 1",
quantities={
"Area": 25.0 + i * 5,
"Volume": 75.0 + i * 15,
"Perimeter": 20.0 + i * 2
},
properties={
"OccupancyType": "Office",
"DesignOccupancy": 4
}
))
return elements
def get_quantities(
self,
elements: List[CADElement],
quantity_type: str = "all"
) -> Dict[str, float]:
"""Aggregate quantities from elements"""
totals = {}
for element in elements:
for qty_name, qty_value in element.quantities.items():
if quantity_type == "all" or qty_name.lower() == quantity_type.lower():
key = f"{element.category.value}_{qty_name}"
totals[key] = totals.get(key, 0) + qty_value
return totals
class DWGExtractor:
"""Extract data from DWG/DXF files"""
def __init__(self):
self.supported_entities = ["LINE", "POLYLINE", "CIRCLE", "ARC", "TEXT", "MTEXT", "INSERT", "HATCH"]
def extract(
self,
file_path: str,
layers: Optional[List[str]] = None
) -> CADExtractionResult:
"""Extract data from DWG file"""
start_time = datetime.now()
# In production, use ezdxf:
# import ezdxf
# doc = ezdxf.readfile(file_path)
# Simulated extraction
elements, cad_layers = self._simulate_dwg_extraction()
# Filter by layers if specified
if layers:
elements = [e for e in elements if e.properties.get("layer") in layers]
extraction_time = (datetime.now() - start_time).total_seconds()
return CADExtractionResult(
file_path=file_path,
file_format=CADFormat.DWG,
elements=elements,
layers=cad_layers,
levels=[],
total_elements=len(elements),
categories={"generic": len(elements)},
extraction_time=extraction_time,
metadata={"units": "millimeters"}
)
def _simulate_dwg_extraction(self) -> Tuple[List[CADElement], List[CADLayer]]:
"""Simulate DWG extraction"""
elements = []
layers = [
CADLayer("Walls", "Red", "Continuous", True, 15),
CADLayer("Doors", "Blue", "Continuous", True, 8),
CADLayer("Windows", "Cyan", "Continuous", True, 12),
CADLayer("Dimensions", "Green", "Continuous", True, 50),
CADLayer("Text", "White", "Continuous", True, 25),
]
# Simulate polylines (walls)
for i in range(15):
elements.append(CADElement(
id=f"polyline_{i}",
guid=f"PL{i:08d}",
name=f"Polyline {i}",
category=ElementCategory.GENERIC,
type_name="POLYLINE",
properties={
"layer": "Walls",
"color": "Red",
"closed": True
},
quantities={
"Length": 10.5 + i * 0.5
}
))
return elements, layers
class CADDataConverter:
"""
Convert CAD/BIM files to structured data.
Based on DDC methodology Chapter 2.4.
"""
def __init__(self):
self.ifc_extractor = IFCExtractor()
self.dwg_extractor = DWGExtractor()
def convert(
self,
file_path: str,
output_format: str = "json"
) -> Dict[str, Any]:
"""
Convert CAD file to structured data.
Args:
file_path: Path to CAD file
output_format: Output format (json, csv, dataframe)
Returns:
Structured data
"""
# Detect file format
file_format = self._detect_format(file_path)
# Extract based on format
if file_format == CADFormat.IFC:
result = self.ifc_extractor.extract(file_path)
elif file_format in [CADFormat.DWG, CADFormat.DXF]:
result = self.dwg_extractor.extract(file_path)
else:
raise ValueError(f"Unsupported format: {file_format}")
# Convert to output format
return self._format_output(result, output_format)
def _detect_format(self, file_path: str) -> CADFormat:
"""Detect CAD file format"""
extension = file_path.lower().split(".")[-1]
format_map = {
"ifc": CADFormat.IFC,
"rvt": CADFormat.RVT,
"dwg": CADFormat.DWG,
"dxf": CADFormat.DXF,
"dgn": CADFormat.DGN,
"nwd": CADFormat.NWD,
}
return format_map.get(extension, CADFormat.IFC)
def _format_output(
self,
result: CADExtractionResult,
format: str
) -> Dict[str, Any]:
"""Format extraction result"""
output = {
"file": result.file_path,
"format": result.file_format.value,
"total_elements": result.total_elements,
"categories": result.categories,
"levels": result.levels,
"extraction_time": result.extraction_time,
"elements": []
}
for element in result.elements:
output["elements"].append({
"id": element.id,
"guid": element.guid,
"name": element.name,
"category": element.category.value,
"type": element.type_name,
"level": element.level,
"properties": element.properties,
"quantities": element.quantities,
"materials": [
{"name": m.name, "area": m.area, "volume": m.volume}
for m in element.materials
]
})
return output
def extract_quantities(
self,
file_path: str,
categories: Optional[List[ElementCategory]] = None
) -> Dict[str, Any]:
"""Extract quantity takeoff from CAD file"""
file_format = self._detect_format(file_path)
if file_format == CADFormat.IFC:
result = self.ifc_extractor.extract(file_path, categories)
else:
result = self.dwg_extractor.extract(file_path)
# Aggregate quantities by category
quantities = {}
for element in result.elements:
cat = element.category.value
if cat not in quantities:
quantities[cat] = {
"count": 0,
"totals": {}
}
quantities[cat]["count"] += 1
for qty_name, qty_value in element.quantities.items():
if qty_name not in quantities[cat]["totals"]:
quantities[cat]["totals"][qty_name] = 0
quantities[cat]["totals"][qty_name] += qty_value
return {
"file": file_path,
"quantities": quantities,
"summary": {
"total_elements": result.total_elements,
"categories": list(quantities.keys())
}
}
def extract_schedule(
self,
file_path: str,
category: ElementCategory,
fields: List[str]
) -> List[Dict]:
"""Extract schedule data for specific category"""
file_format = self._detect_format(file_path)
if file_format == CADFormat.IFC:
result = self.ifc_extractor.extract(file_path, [category])
else:
result = self.dwg_extractor.extract(file_path)
schedule = []
for element in result.elements:
if element.category == category:
row = {"id": element.id, "name": element.name, "type": element.type_name}
for field in fields:
if field in element.properties:
row[field] = element.properties[field]
elif field in element.quantities:
row[field] = element.quantities[field]
schedule.append(row)
return schedule
def export_to_json(
self,
result: CADExtractionResult,
output_path: str
):
"""Export extraction result to JSON file"""
output = self._format_output(result, "json")
with open(output_path, 'w') as f:
json.dump(output, f, indent=2)
def generate_report(self, result: CADExtractionResult) -> str:
"""Generate extraction report"""
report = f"""
# CAD Extraction Report
**File:** {result.file_path}
**Format:** {result.file_format.value}
**Total Elements:** {result.total_elements}
**Extraction Time:** {result.extraction_time:.2f}s
## Elements by Category
"""
for cat, count in result.categories.items():
report += f"- **{cat.title()}:** {count}\n"
if result.levels:
report += "\n## Levels\n"
for level in result.levels:
report += f"- {level}\n"
if result.layers:
report += "\n## Layers\n"
for layer in result.layers:
report += f"- {layer.name}: {layer.element_count} elements\n"
return report
```
## Common Use Cases
### Extract IFC Data
```python
converter = CADDataConverter()
# Convert IFC to structured data
data = converter.convert("building.ifc", output_format="json")
print(f"Total elements: {data['total_elements']}")
print(f"Categories: {data['categories']}")
# Access elements
for element in data['elements'][:5]:
print(f" {element['name']}: {element['type']}")
```
### Extract Quantities
```python
quantities = converter.extract_quantities(
"building.ifc",
categories=[ElementCategory.WALL, ElementCategory.FLOOR]
)
print(f"Wall count: {quantities['quantities']['wall']['count']}")
print(f"Total wall area: {quantities['quantities']['wall']['totals']['Area']}")
```
### Generate Schedule
```python
door_schedule = converter.extract_schedule(
"building.ifc",
category=ElementCategory.DOOR,
fields=["Width", "Height", "FireRating", "IsExternal"]
)
for door in door_schedule:
print(f"{door['name']}: {door.get('Width')}x{door.get('Height')}")
```
### Generate Report
```python
ifc_extractor = IFCExtractor()
result = ifc_extractor.extract("building.ifc")
report = converter.generate_report(result)
print(report)
```
## Quick Reference
| Component | Purpose |
|-----------|---------|
| `CADDataConverter` | Main conversion engine |
| `IFCExtractor` | IFC file extraction |
| `DWGExtractor` | DWG/DXF extraction |
| `CADElement` | Extracted element data |
| `CADExtractionResult` | Complete extraction result |
| `ElementCategory` | BIM element categories |
## Resources
- **Book**: "Data-Driven Construction" by Artem Boiko, Chapter 2.4
- **Website**: https://datadrivenconstruction.io
## Next Steps
- Use [image-to-data](../image-to-data/SKILL.md) for image extraction
- Use [qto-report](../../Chapter-3.2/qto-report/SKILL.md) for quantity reports
- Use [bim-validation-pipeline](../../Chapter-4.3/bim-validation-pipeline/SKILL.md) for validation