open-construction-estimate
skillAccess and utilize open construction pricing databases. Match BIM elements to standardized work items, calculate costs using public unit price databases with 55,000+ work items.
apm::install
apm install @datadrivenconstruction/open-construction-estimateapm::skill.md
---
name: "open-construction-estimate"
description: "Access and utilize open construction pricing databases. Match BIM elements to standardized work items, calculate costs using public unit price databases with 55,000+ work items."
homepage: "https://openconstructionestimate.com"
metadata: {"openclaw":{"emoji":"🌐","os":["darwin","linux","win32"],"homepage":"https://openconstructionestimate.com","requires":{"bins":["python3"]}}}
---
# Open Construction Estimate
## Overview
This skill leverages open construction pricing databases for automated cost estimation. Match project elements to standardized work items and calculate costs using publicly available unit prices.
**Data Sources:**
- OpenConstructionEstimate (55,000+ work items)
- RSMeans Online (subscription)
- Government pricing databases
- Regional cost indexes
> "Открытые базы данных расценок содержат более 55,000 позиций работ, что позволяет автоматизировать сметные расчеты для большинства проектов."
> — DDC LinkedIn
## Quick Start
```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Load work items database
work_items = pd.read_csv("open_construction_estimate.csv")
print(f"Loaded {len(work_items)} work items")
# Simple matching function
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
item_vectors = vectorizer.fit_transform(work_items['description'])
def find_matching_items(query, top_n=5):
query_vec = vectorizer.transform([query])
similarities = cosine_similarity(query_vec, item_vectors)[0]
top_indices = similarities.argsort()[-top_n:][::-1]
return work_items.iloc[top_indices][['code', 'description', 'unit', 'unit_price']]
# Find matches
matches = find_matching_items("reinforced concrete wall 300mm")
print(matches)
```
## Open Database Structure
### Database Schema
```python
# Standard work items database structure
WORK_ITEMS_SCHEMA = {
'code': 'Work item code (e.g., 03.31.13.13)',
'description': 'Full description of work',
'short_description': 'Abbreviated description',
'unit': 'Unit of measure (m³, m², ton, pcs)',
'unit_price': 'Base unit price',
'labor_cost': 'Labor component per unit',
'material_cost': 'Material component per unit',
'equipment_cost': 'Equipment component per unit',
'labor_hours': 'Labor hours per unit',
'crew_size': 'Typical crew size',
'productivity': 'Units per day',
'category_l1': 'Primary category (CSI Division)',
'category_l2': 'Secondary category',
'category_l3': 'Detailed category',
'region': 'Geographic region',
'year': 'Price year',
'source': 'Data source'
}
# CSI MasterFormat Divisions
CSI_DIVISIONS = {
'03': 'Concrete',
'04': 'Masonry',
'05': 'Metals',
'06': 'Wood, Plastics, Composites',
'07': 'Thermal and Moisture Protection',
'08': 'Openings',
'09': 'Finishes',
'10': 'Specialties',
'21': 'Fire Suppression',
'22': 'Plumbing',
'23': 'HVAC',
'26': 'Electrical',
'31': 'Earthwork',
'32': 'Exterior Improvements',
'33': 'Utilities'
}
```
## Work Item Matching Engine
### Semantic Matching System
```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
from typing import List, Dict, Optional, Tuple
import re
class WorkItemMatcher:
"""Match BIM elements to standardized work items"""
def __init__(self, database_path: str, use_embeddings: bool = True):
self.db = pd.read_csv(database_path)
# TF-IDF for fast initial filtering
self.tfidf = TfidfVectorizer(
ngram_range=(1, 3),
max_features=10000,
stop_words='english'
)
self.tfidf_matrix = self.tfidf.fit_transform(self.db['description'])
# Sentence embeddings for semantic matching
self.use_embeddings = use_embeddings
if use_embeddings:
self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
self.embeddings = self.embedder.encode(
self.db['description'].tolist(),
show_progress_bar=True
)
def match(self, query: str, top_n: int = 5,
category: str = None) -> List[Dict]:
"""Find matching work items for a query"""
# Filter by category if specified
if category:
mask = self.db['category_l1'].str.contains(category, case=False, na=False)
search_db = self.db[mask]
search_matrix = self.tfidf_matrix[mask]
else:
search_db = self.db
search_matrix = self.tfidf_matrix
if self.use_embeddings:
return self._semantic_match(query, search_db, top_n)
else:
return self._tfidf_match(query, search_db, search_matrix, top_n)
def _tfidf_match(self, query: str, db: pd.DataFrame,
matrix, top_n: int) -> List[Dict]:
"""TF-IDF based matching"""
query_vec = self.tfidf.transform([query])
similarities = cosine_similarity(query_vec, matrix)[0]
top_indices = similarities.argsort()[-top_n:][::-1]
results = []
for idx in top_indices:
row = db.iloc[idx]
results.append({
'code': row['code'],
'description': row['description'],
'unit': row['unit'],
'unit_price': row['unit_price'],
'similarity': float(similarities[idx]),
'category': row.get('category_l1', '')
})
return results
def _semantic_match(self, query: str, db: pd.DataFrame,
top_n: int) -> List[Dict]:
"""Semantic embedding based matching"""
query_embedding = self.embedder.encode([query])
# Get indices for filtered db
indices = db.index.tolist()
filtered_embeddings = self.embeddings[indices]
similarities = cosine_similarity(query_embedding, filtered_embeddings)[0]
top_indices = similarities.argsort()[-top_n:][::-1]
results = []
for i, idx in enumerate(top_indices):
row = db.iloc[idx]
results.append({
'code': row['code'],
'description': row['description'],
'unit': row['unit'],
'unit_price': row['unit_price'],
'similarity': float(similarities[idx]),
'category': row.get('category_l1', '')
})
return results
def match_bim_element(self, element: Dict) -> List[Dict]:
"""Match a BIM element to work items"""
# Build query from element properties
query_parts = []
if element.get('material'):
query_parts.append(element['material'])
if element.get('category'):
query_parts.append(element['category'])
if element.get('description'):
query_parts.append(element['description'])
# Add dimensions if available
if element.get('thickness'):
query_parts.append(f"{element['thickness']}mm thick")
if element.get('height'):
query_parts.append(f"{element['height']}m high")
query = ' '.join(query_parts)
# Determine category from element type
category = self._get_category_from_element(element)
return self.match(query, top_n=3, category=category)
def _get_category_from_element(self, element: Dict) -> Optional[str]:
"""Map BIM element type to CSI category"""
element_mapping = {
'IfcWall': 'Concrete|Masonry',
'IfcSlab': 'Concrete',
'IfcColumn': 'Concrete|Metals',
'IfcBeam': 'Concrete|Metals',
'IfcDoor': 'Openings',
'IfcWindow': 'Openings',
'IfcRoof': 'Thermal',
'IfcStair': 'Concrete',
'IfcPipeSegment': 'Plumbing',
'IfcDuctSegment': 'HVAC'
}
elem_type = element.get('ifc_type', '')
return element_mapping.get(elem_type)
```
## Cost Estimation Engine
### Automated Estimator
```python
class OpenConstructionEstimator:
"""Generate cost estimates using open databases"""
def __init__(self, matcher: WorkItemMatcher, region: str = 'default'):
self.matcher = matcher
self.region = region
self.regional_factors = self._load_regional_factors()
self.estimates = []
def _load_regional_factors(self) -> Dict[str, float]:
"""Load regional cost adjustment factors"""
return {
'default': 1.0,
'northeast_us': 1.15,
'southeast_us': 0.92,
'midwest_us': 0.95,
'west_us': 1.08,
'moscow': 1.20,
'spb': 1.10,
'regions_ru': 0.85
}
def estimate_element(self, element: Dict) -> Dict:
"""Estimate cost for a single element"""
# Get matching work items
matches = self.matcher.match_bim_element(element)
if not matches:
return {
'element_id': element.get('id'),
'status': 'no_match',
'estimated_cost': 0
}
best_match = matches[0]
quantity = element.get('quantity', 1)
unit_price = best_match['unit_price']
# Apply regional factor
regional_factor = self.regional_factors.get(self.region, 1.0)
adjusted_price = unit_price * regional_factor
# Calculate total
total_cost = adjusted_price * quantity
estimate = {
'element_id': element.get('id'),
'element_type': element.get('ifc_type'),
'element_description': element.get('description', ''),
'matched_code': best_match['code'],
'matched_description': best_match['description'],
'match_confidence': best_match['similarity'],
'unit': best_match['unit'],
'quantity': quantity,
'unit_price': unit_price,
'regional_factor': regional_factor,
'adjusted_unit_price': adjusted_price,
'total_cost': total_cost
}
self.estimates.append(estimate)
return estimate
def estimate_project(self, elements: List[Dict]) -> Dict:
"""Estimate entire project"""
for element in elements:
self.estimate_element(element)
df = pd.DataFrame(self.estimates)
# Summary by category
if not df.empty:
summary = df.groupby('element_type').agg({
'total_cost': 'sum',
'element_id': 'count',
'match_confidence': 'mean'
}).rename(columns={'element_id': 'count'})
else:
summary = pd.DataFrame()
total = df['total_cost'].sum() if not df.empty else 0
return {
'total_cost': total,
'element_count': len(elements),
'matched_count': len(df[df['match_confidence'] > 0.5]) if not df.empty else 0,
'summary_by_type': summary.to_dict() if not summary.empty else {},
'details': self.estimates
}
def export_estimate(self, output_path: str) -> str:
"""Export estimate to Excel"""
df = pd.DataFrame(self.estimates)
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
# Summary
summary = pd.DataFrame({
'Metric': ['Total Cost', 'Elements', 'Matched', 'Avg Confidence'],
'Value': [
df['total_cost'].sum() if not df.empty else 0,
len(df),
len(df[df['match_confidence'] > 0.5]) if not df.empty else 0,
df['match_confidence'].mean() if not df.empty else 0
]
})
summary.to_excel(writer, sheet_name='Summary', index=False)
# Details
if not df.empty:
df.to_excel(writer, sheet_name='Details', index=False)
# By type
by_type = df.groupby('element_type')['total_cost'].sum()
by_type.to_excel(writer, sheet_name='By_Type')
return output_path
def get_missing_items(self) -> List[Dict]:
"""Get elements that couldn't be matched"""
df = pd.DataFrame(self.estimates)
if df.empty:
return []
low_confidence = df[df['match_confidence'] < 0.5]
return low_confidence.to_dict('records')
```
## Database Management
### Creating and Updating Database
```python
class OpenDatabaseManager:
"""Manage open construction pricing database"""
def __init__(self, db_path: str):
self.db_path = db_path
self.db = self._load_or_create()
def _load_or_create(self) -> pd.DataFrame:
"""Load existing or create new database"""
try:
return pd.read_csv(self.db_path)
except FileNotFoundError:
return pd.DataFrame(columns=list(WORK_ITEMS_SCHEMA.keys()))
def add_items(self, items: List[Dict]):
"""Add new work items"""
new_df = pd.DataFrame(items)
self.db = pd.concat([self.db, new_df], ignore_index=True)
self.db.drop_duplicates(subset=['code'], keep='last', inplace=True)
def update_prices(self, updates: pd.DataFrame, year: int):
"""Update prices with new data"""
for _, row in updates.iterrows():
mask = self.db['code'] == row['code']
if mask.any():
self.db.loc[mask, 'unit_price'] = row['unit_price']
self.db.loc[mask, 'year'] = year
def apply_inflation(self, rate: float):
"""Apply inflation adjustment"""
self.db['unit_price'] = self.db['unit_price'] * (1 + rate)
def export_subset(self, category: str, output_path: str):
"""Export subset of database"""
subset = self.db[
self.db['category_l1'].str.contains(category, case=False, na=False)
]
subset.to_csv(output_path, index=False)
def save(self):
"""Save database"""
self.db.to_csv(self.db_path, index=False)
def get_statistics(self) -> Dict:
"""Get database statistics"""
return {
'total_items': len(self.db),
'categories': self.db['category_l1'].nunique(),
'avg_price': self.db['unit_price'].mean(),
'price_range': (self.db['unit_price'].min(), self.db['unit_price'].max()),
'latest_year': self.db['year'].max() if 'year' in self.db else None
}
```
## Quick Reference
| Category | CSI Division | Typical Items |
|----------|--------------|---------------|
| Concrete | 03 | Walls, slabs, columns, beams |
| Masonry | 04 | Brick, block, stone |
| Metals | 05 | Structural steel, misc metals |
| Finishes | 09 | Drywall, paint, flooring |
| MEP | 21-26 | Plumbing, HVAC, electrical |
| Sitework | 31-33 | Excavation, paving, utilities |
## Resources
- **OpenConstructionEstimate**: Open database initiative
- **CSI MasterFormat**: https://www.csiresources.org/standards/masterformat
- **DDC Website**: https://datadrivenconstruction.io
## Next Steps
- See `vector-search` for semantic item matching
- See `cost-prediction` for ML-based estimation
- See `qto-report` for quantity extraction