Category 2: Workflow Automation
Used for: Multi-step processes that benefit from consistent methodology, including coordination across multiple MCP servers.
Real example: skill-creator skill
"Interactive guide for creating new skills. Walks the user through use case definition, frontmatter generation, instruction writing, and validation."
Key techniques:
- Step-by-step workflow with validation gates
- Templates for common structures
- Built-in review and improvement suggestions
- Iterative refinement loops
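A workflow of this shape can be sketched as a list of steps, each followed by a validation gate that must pass before the next step runs. This is a minimal illustration, not the skill-creator's actual implementation; the `Step` class and the two example steps are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]      # advances the workflow state
    gate: Callable[[dict], bool]     # validation gate: must pass to continue

def run_workflow(steps: list[Step], state: dict) -> dict:
    for step in steps:
        state = step.run(state)
        if not step.gate(state):
            # Stop early rather than carrying a bad state forward.
            raise ValueError(f"Validation gate failed after step: {step.name}")
    return state

# Two illustrative steps loosely modeled on the skill-creator flow.
steps = [
    Step("define use case",
         run=lambda s: {**s, "use_case": "summarize PRs"},
         gate=lambda s: bool(s.get("use_case"))),
    Step("write frontmatter",
         run=lambda s: {**s, "frontmatter": {"name": "pr-summarizer"}},
         gate=lambda s: "name" in s.get("frontmatter", {})),
]

final = run_workflow(steps, {})
```

The gates are what make the methodology consistent: a step either produces state the next step can rely on, or the workflow stops and asks for refinement.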
Category 3: MCP Enhancement
Used for: Workflow guidance that enhances the tool access an MCP server provides.
Real example: sentry-code-review skill (from Sentry)
"Automatically analyzes and fixes detected bugs in GitHub Pull Requests using Sentry's error monitoring data via their MCP server."
Key techniques:
- Coordinates multiple MCP calls in sequence
- Embeds domain expertise
- Provides context users would otherwise need to specify
- Error handling for common MCP issues
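Sequencing MCP calls with error handling can be sketched as below. This is an assumption-laden illustration: `call_mcp`, the tool names, and the retry policy are stand-ins, not Sentry's actual MCP interface.

```python
import time

def call_mcp(tool: str, args: dict) -> dict:
    # Stand-in for a real MCP client invocation.
    return {"tool": tool, "ok": True, "args": args}

def call_with_retry(tool: str, args: dict, retries: int = 3) -> dict:
    for attempt in range(retries):
        try:
            return call_mcp(tool, args)
        except ConnectionError:       # a common transient MCP failure
            time.sleep(2 ** attempt)  # exponential backoff between attempts
    raise RuntimeError(f"MCP call failed after {retries} attempts: {tool}")

# The skill sequences calls the user would otherwise have to orchestrate:
issue = call_with_retry("sentry.get_issue", {"issue_id": "PROJ-123"})
events = call_with_retry("sentry.get_events", {"issue_id": "PROJ-123"})
fix = call_with_retry("github.create_review_comment",
                      {"pr": 42, "body": "Suggested fix based on Sentry data"})
```

The value is in the sequence and the recovery behavior: the skill knows which call comes next and how to react to a transient failure, so the user doesn't have to.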
Define success criteria
How will you know your skill is working?
These are aspirational targets - rough benchmarks rather than precise thresholds. Aim for rigor but accept that there will be an element of vibes-based assessment. We are actively developing more robust measurement guidance and tooling.
Quantitative metrics:
- Skill triggers on 90% of relevant queries
  - How to measure: Run 10-20 test queries that should trigger your skill. Track how many times it loads automatically vs. requires explicit invocation.
- Completes workflow in X tool calls
  - How to measure: Compare the same task with and without the skill enabled. Count tool calls and total tokens consumed.
- 0 failed API calls per workflow
  - How to measure: Monitor MCP server logs during test runs. Track retry rates and error codes.
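The trigger-rate measurement above can be automated with a small harness. This is a hypothetical sketch: `skill_triggered` is a placeholder for however you detect that the skill loaded (e.g. by inspecting the session transcript).

```python
def skill_triggered(query: str) -> bool:
    # Placeholder detection logic; replace with a real check against
    # your session transcript or logs.
    return "pdf" in query.lower()

test_queries = [
    "Extract the tables from this PDF",
    "Summarize quarterly-report.pdf",
    "What's the weather today?",   # control query: should NOT trigger
]
relevant = test_queries[:2]

hits = sum(skill_triggered(q) for q in relevant)
rate = hits / len(relevant)
print(f"Triggered on {hits}/{len(relevant)} relevant queries ({rate:.0%})")
```

Including a few control queries that should *not* trigger the skill helps catch over-broad descriptions as well as under-triggering.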
Qualitative metrics:
- Users don't need to prompt Claude about next steps
  - How to assess: During testing, note how often you need to redirect or clarify. Ask beta users for feedback.
- Workflows complete without user correction
  - How to assess: Can a new user accomplish the task on first try with minimal guidance?
- Consistent results across sessions
  - How to assess: Run the same request 3-5 times. Compare outputs for structural consistency and quality.
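The "run the same request 3-5 times" check can be partially scripted. A rough sketch follows, assuming the outputs are markdown and defining "structure" as the set of headings; both are illustrative choices, not a standard.

```python
def structure(output: str) -> list[str]:
    # Treat markdown headings as the output's structural skeleton.
    return [line for line in output.splitlines() if line.startswith("#")]

# Outputs from repeated runs of the same request (illustrative data).
runs = [
    "# Summary\nAll good.\n# Next steps\nShip it.",
    "# Summary\nLooks fine.\n# Next steps\nMerge.",
    "# Summary\nOK.\n# Next steps\nDone.",
]

baseline = structure(runs[0])
consistent = all(structure(r) == baseline for r in runs[1:])
print("Structurally consistent:", consistent)
```

A check like this only covers structure; judging the quality of each run still requires human review, which is where the vibes-based element comes in.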