APM Agent Skill: `@microsoft/mcs-research`

Install:

```
apm install @microsoft/mcs-research
```
---
name: mcs-research
description: Full research pass — reads project documents, identifies agents, researches MCS components, designs architecture, enriches brief.json + generates evals. Uses Agent Teams for quality.
---

# MCS Research

Three-phase pipeline: **Fast Preview** (30-90s scan) -> **Deep Research** (3-10min enrichment) -> **Decisions & Build**. The fast preview gives the customer a plain-language behavior contract to confirm before investing in deep research. This skill absorbs the former mcs-analyze step -- there is no separate extraction pass.

## Input

```
/mcs-research {projectId} {agentId} --fast   # Fast Preview: scan docs, extract behavior contract (30-90s)
/mcs-research {projectId}                     # Project-level: full deep research for all agents
/mcs-research {projectId} {agentId}           # Agent-level: deep research scoped to one agent
```

### `--fast` Flag (Fast Preview Mode)

When `--fast` is specified, the skill runs in **preview-only mode**:

**Runs:**
- Phase 0: Smart routing (full -- same as standard)
- Phase A: Document comprehension (full -- same as standard)
- Phase A Step 4: Extract per-agent data -> write to brief.json
- Phase A Step 5: Generate open questions

**Writes to brief.json:**
- `overview.*` (name, description, problemStatement, targetUsers, challenges, benefits)
- `capabilities[]` (names + descriptions only, all marked `phase: "mvp"`, `implementationType: "prompt"` as default, each with `source: "from-docs"` or `source: "inferred"`)
- `boundaries.*` (handles, politelyDeclines, hardRefuses -- extracted from docs, each with `source` field)
- `openQuestions[]` (ambiguities found in docs, each with `source: "from-docs"` or `source: "inferred"`)
- `knowledge[]` (high-level source references -- doc names, not config)
- `workflow.phase = "preview"`, `workflow.previewGeneratedAt = <ISO timestamp>`

**Does not write:** `architecture`, `instructions`, `evalSets`, `integrations/tools`, `conversations.topics`, `decisions[]`

**Does not spawn teammates:** no Prompt Engineer (PE), QA Challenger (QA), Topic Engineer (TE), Flow Designer (FD), or Research Analyst (RA). Lead-only, single LLM pass.

**Target time:** 30-90 seconds for typical SDR package.

After preview generation, the customer reviews the Overview page in the dashboard and clicks "This looks right" (which sets `workflow.previewConfirmed = true`). The customer can edit items directly before confirming.
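The preview write can be pictured as a partial brief.json. The following sketch uses the field names listed above; the values (agent name, capabilities, questions) are made up purely for illustration:

```javascript
// Illustrative shape of the brief.json fields written in --fast mode.
// Field names follow this spec; all values are hypothetical examples.
const previewBrief = {
  overview: {
    name: "Order Status Agent",
    description: "Answers order-status questions for retail customers",
    problemStatement: "Support spends hours on routine status lookups",
    targetUsers: ["retail customers"],
    challenges: [],
    benefits: [],
  },
  capabilities: [
    // Names + descriptions only; all marked mvp/prompt by default.
    { name: "Look up order status", description: "Find an order by number",
      phase: "mvp", implementationType: "prompt", source: "from-docs" },
  ],
  boundaries: {
    handles: [{ item: "order status questions", source: "from-docs" }],
    politelyDeclines: [{ item: "refund approvals", source: "inferred" }],
    hardRefuses: [],
  },
  openQuestions: [
    { question: "Which order system is authoritative?", source: "inferred" },
  ],
  knowledge: [{ ref: "SDR-package.docx" }], // doc names, not config
  workflow: { phase: "preview", previewGeneratedAt: new Date().toISOString() },
};
```

Note that `architecture`, `instructions`, `evalSets`, and `decisions` are absent: those fields belong to deep research.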

### Standard Mode (Deep Research -- no `--fast` flag)

When `workflow.previewConfirmed = true`, deep research skips re-reading docs (already done in preview) and focuses on:

- Phase B: Component research (full -- connectors, MCP discovery, solution library check)
- Phase C: Architecture + instructions + evals + topics (full parallel teammate dispatch)

Deep research reads the confirmed preview data (capabilities, boundaries, open questions) as input constraints and preserves customer edits.

After deep research completes, it sets:
- `workflow.phase = "decisions"`, `workflow.researchCompletedAt = <ISO timestamp>`

The customer then reviews `decisions[]` in the dashboard, confirms choices, and the workflow advances to `ready_to_build`.

**Project-level** (no agentId):
- First run: reads all docs, identifies agents, deep research, creates brief.json with evalSets + evals.csv
- Subsequent runs: smart-detects new/changed docs, routes to full or incremental

**Agent-level** (with agentId):
- After project research: smart-detects new/changed docs relevant to this agent, incremental enrichment
- Manually created agent (no prior research): full deep research scoped to this agent
- Brief edited (open questions answered): re-enriches with new context even without new docs

## Output Files (per agent)

- `Build-Guides/{projectId}/agents/{agentId}/brief.json` -- Single source of truth (all fields populated including instructions + evalSets)
- `Build-Guides/{projectId}/agents/{agentId}/evals.csv` -- Evaluation test cases (flat CSV generated from evalSets for MCS native eval compatibility)

Two files only. No research report (future: on-demand export from dashboard). No working-paper files.
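The evals.csv export is a plain flattening of the nested eval sets into rows. A minimal sketch, assuming a hypothetical evalSet shape and illustrative column names (the real MCS column set may differ):

```javascript
// Hypothetical flattening of brief.json evalSets into a flat CSV.
// The evalSet/test field names and CSV columns here are assumptions.
function evalSetsToCsv(evalSets) {
  const header = "set,testId,utterance,expectedBehavior";
  // Quote fields containing commas, quotes, or newlines (RFC 4180 style).
  const escape = (v) =>
    /[",\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v;
  const rows = evalSets.flatMap((set) =>
    set.tests.map((t) =>
      [set.name, t.id, t.utterance, t.expected].map(escape).join(",")
    )
  );
  return [header, ...rows].join("\n");
}
```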

## Before Research -- Load Frameworks

The session startup protocol already checks cache freshness and refreshes stale Tier 1 files, so there's no need to re-check all 19 cache files here.

1. Read `knowledge/frameworks/component-selection.md` for the research protocol
2. Read `knowledge/frameworks/architecture-scoring.md` for scoring criteria
3. Read `knowledge/frameworks/solution-type-scoring.md` for the solution type pre-gate

Cache files are read on-demand in Phase A (for informed questions) and Phase B (for component research). Only read the specific files needed, not the entire cache.

## Microsoft-First Component Priority

Enterprise agents run on the Microsoft stack. When selecting components, follow this priority order:

| Priority | Source | Examples | Research Needed? |
|----------|--------|----------|-----------------|
| 1 | **MCS Built-In** | MCP servers, native knowledge, generative orchestration | Cache only |
| 2 | **Power Platform** | Power Automate flows, Dataverse, custom connectors | Cache only |
| 3 | **Azure Services** | Azure Functions, Azure AI, Azure Storage | Cache + quick verify |
| 4 | **M365 Connectors** | SharePoint, Outlook, Teams (Standard tier) | Cache only |
| 5 | **Certified Premium Connectors** | Dynamics 365, ServiceNow, Salesforce | Cache + verify availability |
| 6 | **Third-Party / Custom** | Custom MCP servers, HTTP endpoints, community tools | Full live research required |

**Fast path rule:** If all agent integrations map to Priority 1-4, skip live MCP catalog scan (Phase B Step 0) and skip Research Analyst spawn (Phase B Step 4). Resolve everything from cache because these are well-documented, enterprise-supported, and GA.

Only escalate to live research when the agent has Priority 5-6 integrations (external systems not in cache, or cache > 7 days stale for the specific system).
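The escalation rule reduces to a single predicate over the agent's integrations. A sketch, assuming a hypothetical integration shape (`priority`, `inCache`, `cacheAgeDays`):

```javascript
// Fast-path check: live research only for Priority 5-6 integrations that
// are missing from cache or whose cache entry is stale (> 7 days).
// The integration object shape is an assumption for illustration.
const CACHE_STALE_DAYS = 7;

function needsLiveResearch(integrations) {
  return integrations.some(
    (i) => i.priority >= 5 && (!i.inCache || i.cacheAgeDays > CACHE_STALE_DAYS)
  );
}
```

When this returns `false`, Phase B Step 0 (MCP catalog scan) and Step 4 (Research Analyst spawn) are both skipped.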

---

## Phase 0: Smart Research Routing

**Goal:** Determine the optimal processing path for any invocation -- project or agent level. Detects new/changed docs, brief edits, and manually created agents. This phase runs for all invocations because accurate routing prevents wasted work and missed updates.

> Full routing protocol with all 7 substeps: see `reference/smart-routing.md`

**Key steps:** Determine scope -> check preconditions (manifest + brief) -> diff docs against manifest -> map docs to agents -> check brief modifications -> detect drastic changes -> route and report.

### Routing Table (quick reference)

| Condition | `processingPath` | Phases |
|-----------|-----------------|--------|
| First project run (no manifest) | `full` | A -> B -> C (all docs, deep research) |
| First agent run (empty brief) | `full-agent` | A -> B -> C (scoped to agent, reads all project docs for relevance) |
| No changes, brief not edited | `none` | Exit with message |
| Brief edited, no new docs | `re-enrich` | B -> C (skip A, re-enrich with current brief context) |
| Changes exist, not drastic | `incremental` | A-inc -> B-inc -> C-inc |
| Changes exist, drastic | `full` | Warning -> A -> B -> C |
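The routing table collapses to a small decision function. A sketch, with illustrative flag names mirroring the table's conditions:

```javascript
// Phase 0 routing as a decision function. The boolean/count inputs are
// hypothetical names for the conditions in the routing table above.
function route({ hasManifest, agentScoped, briefEmpty, briefEdited,
                 changedDocs, drastic }) {
  if (!hasManifest) return "full";                    // first project run
  if (agentScoped && briefEmpty) return "full-agent"; // first agent run
  if (changedDocs === 0) return briefEdited ? "re-enrich" : "none";
  return drastic ? "full" : "incremental";
}
```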

**Output to user before proceeding:**

```
## Research: {projectId} [{agentId if scoped}]
**Scope:** {Project / Agent: agentName}
**New docs:** {N} | **Changed:** {N} | **Deleted:** {N}
**Mode:** {Full / Full-Agent / Incremental / Re-enrich / Nothing new}
{If incremental: doc->agent mapping table}
```

---

## Phase A: Document Comprehension & Agent Identification

**Goal:** Read all project documents, build a unified understanding, identify every agent to build, and create brief.json stubs with informed open questions. This is deep comprehension, not surface extraction.

> Full document comprehension protocol: see `reference/phase-a-details.md`

**Key steps:**
1. Read all documents (md, docx via pandoc, pdf, txt, images)
2. Cross-reference and build a unified picture (systems, personas, contradictions, themes)
3. Identify agents (explicit names, distinct domains, SDR sections)
4. Solution type assessment (5-factor scoring from `solution-type-scoring.md` -- routes to agent/flow/hybrid/not-recommended)
5. Extract per-agent data and generate informed open questions (cross-referenced against cache)
6. Create brief.json stubs, confirm with user, write document manifest

### Incremental Merge Rules Summary

When `processingPath == "incremental"`, Phase A operates on new/changed docs only:
- **Append-only:** `capabilities[]`, `boundaries.*`, `integrations[]`, `conversations.topics[]`, `knowledge[]`, `evalSets[].tests[]`
- **Preserve:** `instructions`, answered `openQuestions[].answer`
- **Resolve:** unanswered `openQuestions` if doc provides the answer
- **Flag conflicts:** `business.problemStatement`, `architecture.type` -> add to `_updateFlags`
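The merge rules above can be sketched for a single agent brief. Assumptions: `existing` and `delta` are brief.json fragments, and `delta.answers` maps question ids to answers found in the new docs (that mapping is a hypothetical helper shape, not part of the spec):

```javascript
// Incremental merge sketch: append-only arrays, preserve answered
// questions, resolve unanswered ones, flag singleton-field conflicts.
function mergeIncremental(existing, delta) {
  const merged = structuredClone(existing);
  // Append-only: never remove or rewrite prior entries.
  merged.capabilities.push(...(delta.capabilities ?? []));
  // Preserve answered questions; resolve unanswered ones if the new
  // docs supply an answer.
  merged.openQuestions = merged.openQuestions.map((q) => {
    if (q.answer) return q; // preserved as-is
    const found = (delta.answers ?? {})[q.id];
    return found ? { ...q, answer: found, resolvedBy: "new-docs" } : q;
  });
  // Flag conflicts on singleton fields instead of overwriting them.
  merged._updateFlags = merged._updateFlags ?? [];
  if (delta.problemStatement &&
      delta.problemStatement !== existing.business?.problemStatement) {
    merged._updateFlags.push({ field: "business.problemStatement",
                               proposed: delta.problemStatement });
  }
  return merged;
}
```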

### Source Fields on Extracted Items

Every capability, boundary, and open question extracted during `--fast` mode includes a `source` field: `"from-docs"` (explicitly stated), `"inferred"` (derived from context), or `"user-added"` (added by customer in dashboard). The UI shows confidence badges based on this field.

---

## Phase B: Component Research -- Targeted

**Goal:** Research MCS components and recommend the best tools, knowledge sources, model, triggers, and channels for each agent. Stable categories use cache directly; only dispatch live research for the agent's specific external integration systems.

> Full component research protocol: see `reference/phase-b-details.md`

**Fast path:** If all agent integrations map to Priority 1-4 (M365-native), skip live MCP catalog scan and skip Research Analyst spawn. Resolve everything from cache.

**Key decision points:**

1. **Solution library check (Step 0.5):** Read `knowledge/solutions/index.json` for prior builds with 2+ tag overlap. Fast local reads, runs for all processing paths.
2. **MCP discovery (Step 0, conditional):** Full scan only when Priority 5-6 integrations exist or cache > 7 days stale.
3. **Stable categories from cache (Step 1):** Channels, triggers, knowledge sources -- read cache directly, write to brief.
4. **Implementation reality check (Step 2.5):** Challenge every MVP capability against solution patterns (Part A) and 5 first-principles feasibility questions (Part B). Generates structured `decisions[]` when 2+ viable implementation tiers exist (Part C). Runs for all processing paths because patterns may be added after initial research.
5. **Learnings consultation (Step 3):** Read relevant `knowledge/learnings/` files for this agent's systems.
6. **Research Analyst dispatch (Step 4):** Only when external systems not in cache, or Step 2.5 flagged integrations as `needsRework`.
7. **Decision generation from RA results (Step 4.5):** When RA finds 2+ viable approaches, create structured decision. One clear winner -> auto-apply.

### Component Selection Rules

- Prefer MCP over individual connector actions because MCP provides richer multi-tool access
- Present options: recommend the best option but note alternatives
- Flag preview features: note GA vs preview status for each recommendation

---

## Phase C: Architecture, Instructions, Eval Sets & Topics (Parallel)

**Goal:** Score architecture, select model, write instructions, classify topics, generate eval sets, validate topic feasibility. Teammates run in parallel for speed (~8-12 min parallel vs ~20-25 min sequential).

> Full architecture + parallel dispatch protocol: see `reference/phase-c-details.md`

**Processing path variations:**
- **Re-enrich:** Skip Phase A, re-score architecture if answers affect scoring, generate instructionsDelta (or full instructions if empty), parallel QA + TE dispatch
- **Incremental:** Preserve existing architecture/instructions, generate delta, QA appends new tests, PE skipped unless instructions empty
- **Full/full-agent:** Full architecture scoring + parallel dispatch

### Lead Pre-Work (before teammate dispatch)

1. **Consult learnings:** Read `architecture.md`, `instructions.md`, `topics-triggers.md`, `eval-testing.md` from `knowledge/learnings/`
2. **Architecture decision (Step 1):** 6-factor scoring (domain, data sources, team ownership, reusability, instruction size, knowledge isolation). Score 0-2 -> single agent, 3+ -> multi-agent. Borderline (2-3) creates architecture decision entry.
3. **Model selection (Step 1.5):** Query `models.md` cache, evaluate fit (reasoning vs general-purpose vs cost), auto-apply obvious choice or create model decision.
4. **Topic classification (Step 1.5):** Classify each capability as generative or custom topic. Borderline creates topic-implementation decision.
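The architecture gate in step 2 above can be sketched as a function of the 6-factor score (the factor rubric itself lives in `architecture-scoring.md`; this only shows the thresholds):

```javascript
// Architecture gate: 0-2 -> single agent, 3+ -> multi-agent.
// Borderline scores (2-3) also surface a decision entry; the clear
// ranges (0-1, 4-6) auto-apply silently.
function architectureDecision(score) {
  const type = score >= 3 ? "multi-agent" : "single-agent";
  const needsDecision = score === 2 || score === 3; // borderline band
  return { type, needsDecision };
}
```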

### Parallel Teammate Dispatch (Step 2)

| Teammate | Input | Output | Skip When |
|----------|-------|--------|-----------|
| **Prompt Engineer** | brief.json, instructions-authoring cache, model selection | Instructions (up to 8K chars, self-verified) | Incremental + instructions exist |
| **QA Challenger** | brief.json, eval-scenarios library, learnings | 3 eval sets (safety/functional/resilience), 40-55 tests, coverage report | -- |
| **Topic Engineer** | brief.json topics, capabilities, integrations, adaptive-cards + conversation-design cache | Per-topic feasibility (OK / SPLIT / caveats) | No custom topics |
| **Flow Designer** | brief.json (flow capabilities), integrations, architecture | flow-spec.md | solutionType is not flow/hybrid |

All run simultaneously — they do not depend on each other's output. Every teammate uses GPT-5.4 co-generation internally (PE fires `generate-instructions`, QA fires `generate-evals`, TE fires `generate-topics` for 3+ node topics, FD fires `review-flow`). Teammates handle their own merging; the lead sees the merged output.
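Because the teammates are independent, the dispatch is a fan-out/await-all. A sketch, where `spawn()` stands in for the real teammate-dispatch mechanism and the skip conditions mirror the table:

```javascript
// Parallel Phase C dispatch sketch. spawn(role, brief) is a placeholder
// for the actual teammate mechanism; the skip logic follows the table.
async function dispatchPhaseC(brief, spawn) {
  const jobs = {
    pe: spawn("prompt-engineer", brief),
    qa: spawn("qa-challenger", brief),
  };
  if (brief.conversations?.topics?.some((t) => t.type === "custom")) {
    jobs.te = spawn("topic-engineer", brief); // only with custom topics
  }
  if (["flow", "hybrid"].includes(brief.solutionType)) {
    jobs.fd = spawn("flow-designer", brief);  // only for flow/hybrid
  }
  const results = await Promise.all(Object.values(jobs));
  return Object.fromEntries(Object.keys(jobs).map((k, i) => [k, results[i]]));
}
```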

### Lead Reconciliation (Step 3)

After teammates return: apply PE instructions with inline review (9-point checklist), apply QA eval sets + coverage review, apply TE split/caveat recommendations, generate the flat evals.csv export from the merged eval sets, write all build-ready data to brief.json.

### GPT Parallel Review (Step 3.5)

After reconciliation, fire GPT-5.4 in parallel on all research outputs:

```bash
node tools/multi-model-review.js review-brief --brief <path-to-brief.json>
node tools/multi-model-review.js review-instructions --brief <path-to-brief.json>
node tools/multi-model-review.js review-components --brief <path-to-brief.json>
# If hybrid/flow:
node tools/multi-model-review.js review-flow --file <flow-spec.md> --brief <path-to-brief.json>
```

GPT checks: brief completeness, instruction anti-patterns, capability-instruction alignment, Microsoft-first priority violations, MCP opportunities, preview risks. Merge protocol: union of findings, stricter wins, flag divergence. Apply fixes for actionable items before writing final output. If GPT is unavailable, proceed without it.
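The merge protocol (union, stricter wins, flag divergence) can be sketched over findings keyed by id. The finding shape (`id`, `severity`) and severity ranking are assumptions for illustration:

```javascript
// Review-merge sketch: union of findings from the lead model and GPT,
// keeping the stricter severity when both flag the same item and
// marking divergence. Finding shape is a hypothetical assumption.
const RANK = { info: 0, warn: 1, error: 2 };

function mergeFindings(leadFindings, gptFindings) {
  const byId = new Map();
  for (const f of [...leadFindings, ...gptFindings]) {
    const prev = byId.get(f.id);
    if (!prev) { byId.set(f.id, { ...f }); continue; }
    const divergent = prev.severity !== f.severity; // flag disagreement
    const stricter = RANK[f.severity] > RANK[prev.severity] ? f : prev;
    byId.set(f.id, { ...stricter, divergent });
  }
  return [...byId.values()];
}
```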

---

## Workflow Phase Updates

At the end of each research mode, update `workflow` in brief.json:

| Mode | Workflow Fields Written |
|------|----------------------|
| `--fast` (preview) | `workflow.phase = "preview"`, `workflow.previewGeneratedAt = now()` |
| Standard (deep research) | `workflow.phase = "decisions"`, `workflow.researchCompletedAt = now()` |

The `workflow.previewConfirmed` and `workflow.decisionsConfirmed` fields are set by the dashboard UI when the customer clicks confirmation buttons -- not by this skill.

---

## Final Output

After all phases complete for each agent:

1. **brief.json** -- All fields populated (business, agent, capabilities, integrations, knowledge, conversations, boundaries, architecture, evalSets, evalConfig, mvpSummary, openQuestions, instructions, workflow)
2. **evals.csv** -- Evaluation test cases in MCS-compatible flat CSV format (generated from evalSets)

### Terminal Output -- Incremental Mode

```
## Incremental Research Complete: {projectId}

**Mode:** Incremental ({N} new/changed docs processed)
**Agents updated:** {count}

| Agent | +Capabilities | +Integrations | +Tests | +Decisions | Flags |
|-------|--------------|---------------|--------|------------|-------|
| {name} | +{N} | +{M} | +{K} | +{D} | {F} |

{If _updateFlags exist: "Review flagged items in dashboard. Instructions delta in notes."}
{If new decisions: "New decisions added -- review in brief before building."}

**Next:** Review changes in dashboard. If instructions need updating, edit in dashboard or re-run with agentId.
```

### Terminal Output -- Full Mode

```
## Research Complete: {projectId}

**Agents:** {count} | **Open Questions:** {count} | **Decisions:** {count pending}

| Agent | Architecture | Tools | Evals | Decisions |
|-------|-------------|-------|-------|-----------|
| {name} | {Single/Multi} | {N} | {N} | {N pending} |

{If decisions exist:}
## Decisions Requiring Review: {count}
| # | Category | Decision | Recommended | Options |
|---|----------|----------|-------------|---------|
| d-001 | integration | How to extract web content? | Azure Function + Readability | 3 options |
| d-002 | model | Which AI model? | GPT-4.1 | 2 options |

Recommended defaults pre-applied to brief. Review and confirm before building.

Files: brief.json + evals.csv per agent

**Next:** Review brief in the dashboard. Resolve open questions and decisions. Then /mcs-build.
```

No report file is generated. The dashboard renders brief.json directly.

---

## Post-Research Learnings Capture

After the terminal output, check for research-phase discoveries worth capturing.

**What to capture:** New components not in cache, cache corrections, customer/industry patterns, non-obvious architecture insights.

**Tier 1 (auto, no user confirmation):** Bump `confirmed` count and `lastConfirmed` date in `index.json` when an approach matched a prior learning. Write cache corrections directly.

**Tier 2 (user confirms):** New discoveries not covered by existing entries, contradictions with existing learnings, non-obvious architecture insights. Present summary, wait for confirmation, then write to `knowledge/learnings/{category}.md` and update `index.json`.

Run the 4-step comparison engine (see CLAUDE.md "Learnings Protocol" section B) before writing any new entry to avoid duplicates and catch contradictions.

If nothing surprising was found, skip the Tier 2 summary -- run Tier 1 auto-check only.

### Update Document Manifest

After all phases complete, set `manifest.lastResearchAt` to the current timestamp. This lets incremental research know when the last full research was performed.

---

## Important Rules

- **brief.json is the source of truth** -- the dashboard reads it, the build skill reads it, reports are generated from it
- **No separate agent-spec.md** -- everything lives in brief.json including instructions and MVP scope
- **evals.csv is for MCS native eval compatibility** -- flat export from brief.json evalSets. The Eval skill reads evalSets directly.
- **Only 2 permanent output files per agent**: `brief.json` and `evals.csv`. No working-paper files -- consolidate teammate outputs into brief.json fields and delete working files before completing.
- **Targeted research, not exhaustive** -- only spawn RA for systems that need live lookup. Stable categories use cache.
- **Single-pass QA** -- no PE<->QA iteration loop. PE self-checks, QA reviews once, lead applies fixes.
- **TE validates feasibility in Phase C** (parallel with PE and QA) but does not generate YAML. Full YAML authoring is reserved for `/mcs-build`.
- **PE, QA, and TE use dual model co-generation internally** -- teammates handle their own merging; the lead sees the merged output.
- **Research components, present options** -- don't assume. Update cache after live research.
- **Don't stop between phases** -- this is a single-pass skill. Run A->B->C continuously.
- **Phase 0 runs for all invocations** -- project and agent level, because accurate routing prevents wasted work.
- **Document-to-agent mapping is auto-detected.** Ask user only when ambiguous.
- **Brief edits trigger re-enrichment** because answered questions provide new context even without new docs.
- **`full-agent` for manually created agents** -- empty brief + agent scope = full research scoped to that agent.
- **Incremental by default** -- when a manifest exists and docs changed without drastic thresholds, prefer incremental. Don't re-process unchanged documents.
- **brief.json is the context** -- during incremental processing, read the brief for context instead of re-reading unchanged docs.
- **Merge rules are fixed** -- during incremental processing, follow merge rules exactly. Append-only for arrays, preserve answered questions, flag conflicts.
- **Manifest consistency** -- after any path (full, full-agent, incremental, re-enrich), the manifest reflects current `docs/` state with accurate hashes and timestamps.
- **`--fast` generates preview only** -- runs Phase 0 + Phase A, writes overview/capabilities/boundaries/openQuestions with `source` tags, sets `workflow.phase = "preview"`. Does not run Phases B or C. Teammates are not spawned. Target: 30-90 seconds.
- **Deep research respects preview edits** -- when `workflow.previewConfirmed = true`, deep research reads confirmed data as input constraints and preserves customer edits.
- **Decisions are structured choices, not open questions** -- `decisions[]` stores ranked options when 2+ approaches are viable. `openQuestions[]` stores freeform unknowns. Keep them separate.
- **Only create decisions when genuinely needed** -- one clear winner = auto-apply, no decision entry. Too many decisions overwhelm the customer.
- **Pre-apply the recommended option** -- the brief should be buildable even if the user never reviews decisions, because the recommended option's `briefPatch` is written to brief fields as the default.

### Decision Generation Rules

| Trigger | Action |
|---------|--------|
| Step 2.5 pattern match with 2+ viable tiers | Create decision, one option per tier |
| Step 2.5 pattern match with 1 viable tier | Auto-apply, no decision |
| RA finds 2+ viable tools for a system | Create decision with ranked options |
| RA finds 1 clear winner | Auto-apply, no decision |
| Architecture score 2-3 (borderline) | Create architecture decision |
| Architecture score 0-1 or 4-6 (clear) | Auto-apply, no decision |
| Model choice has meaningful tradeoffs | Create model decision |
| Topic type is borderline (generative vs custom) | Create topic decision |
| Only one valid option exists | Auto-apply, no decision |
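Every row in the table reduces to the same rule: with ranked viable options, pre-apply the recommended one's `briefPatch` and create a decision entry only when alternatives remain. A sketch, assuming a hypothetical option shape (`id`, `briefPatch`):

```javascript
// Decision-generation sketch: one viable option -> auto-apply silently;
// 2+ -> pre-apply the recommended patch AND surface a pending decision.
// Option shape is an assumption for illustration.
function resolveOptions(category, viableOptions) {
  const [recommended, ...rest] = viableOptions; // assumed pre-ranked
  const patch = recommended.briefPatch;         // always pre-applied
  if (rest.length === 0) return { patch, decision: null };
  return {
    patch,
    decision: {
      category,
      recommendedId: recommended.id,
      options: viableOptions.map((o) => o.id),
      status: "pending",
    },
  };
}
```

Because the recommended patch is applied either way, the brief stays buildable even if the customer never reviews the decision.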

---

## Teammate Usage Summary

| Phase | Full | Full-Agent | Incremental | Re-enrich |
|-------|------|-----------|-------------|-----------|
| 0 | Lead | Lead | Lead | Lead |
| A | Lead (all docs, all agents) | Lead (all docs, one agent) | Lead (new docs only) | Skipped |
| B | Lead + **RA** (if external systems) | Lead + **RA** (if external) | Lead + **RA** (new external only) | Lead only |
| C | Lead + **PE** + **QA** + **TE** + **FD** (if flow/hybrid) (parallel) | Lead + **PE** + **QA** + **TE** + **FD** (if flow/hybrid) (parallel) | Lead + **QA** (PE skipped unless instructions empty) + **TE** (if new topics) | Lead + **QA** + **TE** (if topics affected) |

Parallel dispatch in Phase C: PE, QA, TE (and FD if flow/hybrid) run simultaneously -- not sequentially.
Maximum teammates per run: 5 (RA + PE + QA + TE + FD). RA runs in Phase B; PE + QA + TE + FD run in parallel in Phase C. FD only for flow/hybrid solutionType.
Microsoft-native agents: Often just 3 (PE + QA + TE) -- RA skipped when no external systems, FD skipped when solutionType is "agent".
Incremental runs: Often just 1-2 (QA alone, or QA + TE for new topics).