@fuenfgeld
Fuenfgeld
pydantic-evals
skill
Test and evaluate AI agents and LLM outputs using a code-first evaluation framework with strong typing. Use when the user wants to: (1) Create evaluation datasets with test cases for AI agents, (2) Define evaluators (deterministic, LLM-as-Judge, custom, or span-based), (3) Run evaluations and generate reports, (4) Compare model performance across experiments, (5) Integrate evaluations with Pydantic AI agents, (6) Set up observability with Logfire, (7) Generate test datasets using LLMs, (8) Implement regression testing for AI systems.
pydantic-ai-agents
skill
Build and debug Pydantic AI agents using best practices for dependencies, dynamic system prompts, tools, and structured output validation. Use when the user wants to: (1) Create a new Pydantic AI agent, (2) Debug or fix an existing agent, (3) Add features like tools, validators, or dynamic prompts, (4) Integrate OpenRouter for multi-model access, (5) Add Logfire for debugging/observability, (6) Structure agent architecture with dependency injection.