CCA-F Section 5: Tool Design & MCP - Tool Design Fundamentals

Summary

(To be synthesized)

Key Ideas

L1: Claude reads tool descriptions and uses that text as the primary signal for routing decisions

L1: What the tool description defines
- Tool description → natural-language text that tells Claude what a tool does and when to use it
- Routing → how Claude matches a request to the best-fitting tool based on its description
- Misrouting → when a vague description makes Claude call the wrong tool

L1: The description is the decision
- Tool names are hints, not rules → the description does the work
- Claude performs semantic matching between the request and the description
- A strong description narrows the candidate set to one obvious choice → it states what the tool does, when to call it, what input it expects, and what it returns → Claude can select confidently with minimal ambiguity
- A weak or missing description makes every tool equally likely to be chosen → given only a name or a vague phrase, Claude has no basis for a confident selection → it misroutes or skips the tool

L1: How selection actually works → a clear, complete description is the single most important factor
- Claude assigns implicit relevance weights based on semantic fit to the request
- Higher-confidence matches win
- No single keyword triggers selection → the full description context matters

L1: 4 routing outcomes
- Correct call
- Misroute → wrong tool called because 2 descriptions read as too similar
- No call
- Fallback → ask the user to clarify rather than guessing between unclear options

L1: The name is not enough → every tool needs a description that removes the ambiguity the name leaves behind

L1: 3 description failure modes
- Vague intent → Claude can't separate it from similar alternatives
- Missing scope → omits what the tool does not handle → overlapping tools create selection ambiguity
- No output signal → doesn't say what the tool returns → Claude can't confirm it will produce what's needed
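
Sketch (my own illustration, not from the course material): the same tool defined twice, once with a weak description and once with a strong one. The dicts use the name / description / input_schema shape of a Claude tool definition; the tool name get_order, its fields, and the wording are all hypothetical.

```python
# Hypothetical tool definitions; names, fields and wording are illustrative.

ORDER_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {
            "type": "string",
            "description": "ID of the order to look up.",
        }
    },
    "required": ["order_id"],
}

# Weak: vague intent, no scope, no output signal.
weak_tool = {
    "name": "get_order",
    "description": "Gets order.",
    "input_schema": ORDER_INPUT_SCHEMA,
}

# Strong: what it does, when to call it, what it excludes, what it returns.
strong_tool = {
    "name": "get_order",
    "description": (
        "Retrieves a single customer order by its order ID. "
        "Use when the user asks about the status, items, or total of one "
        "specific order. Does not search across orders and does not modify "
        "them. Returns an object with status, line items, and total, or an "
        "error field if the order is not found."
    ),
    "input_schema": ORDER_INPUT_SCHEMA,
}
```

Both versions share a name and a schema; only the description differs, and that is the part the notes above say carries the routing decision.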

L2: A description needs
- What it does → a 1-2 sentence purpose statement → describes the tool's function without ambiguity
- When to use/call the tool → the conditions under which the tool should be selected → helps Claude choose among similar tools
- When not to use/call the tool → the boundary conditions that scope the tool

L2: Write the description in the 3 parts above

L2: Include input format expectations → the description should signal what kinds of input the tool expects → input hints help Claude form the call correctly (not just select the tool)
- Mention required parameter types
- If the input is structured, name the format
- Avoid restating the schema verbatim → summarize the input intent

L2: Include the output format in the description → what the tool returns shapes whether Claude considers the request fulfilled → lets Claude confirm the tool satisfies the request before calling it
- State whether the tool returns an object, list, boolean, or status
- Note whether the output needs further processing or is ready to use directly
- Signal how errors surface: a null return, an exception, or an error field

L2: Disambiguation strategies
- Anchor by domain → scope the tool to a specific data domain
- Name the trigger → state the exact request type that triggers it
- Exclude explicitly → list what it doesn't handle (e.g. "does not perform bulk updates or deletions")

L2: Overlapping vs separated tools
- Overlapping tools → 2 tools with similar scope and vague descriptions → Claude can't confidently tell them apart → it guesses → misroutes roughly half the time
- Separated tools → the same 2 tools, with explicit domain anchors, trigger conditions, and exclusion clauses → Claude selects the right one consistently

L2: Testing tool selection reliability → the goal is consistent, correct routing across a broad input distribution
- Try varied phrasings of the same underlying intent
- Try requests that could match several tools → check the selection
- Try edge cases → requests the tool should not handle

L2: 3 testing patterns for tools
- Positive test → send a request the tool should handle
- Negative test → send a request the tool should not handle
- Disambiguation test → send a request matching 2 tools → confirm the right one is selected on specificity (not order)
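
Sketch of the three testing patterns as a small harness (my own illustration). It takes any select_tool callable, so the model call itself is left out; stub_selector is a keyword stand-in so the example runs on its own. The tool names get_order and search_orders and the requests are hypothetical.

```python
from typing import Callable, Optional

# Each case: (kind, request text, expected tool name, or None for "no call").
# Tool names and requests are hypothetical.
CASES = [
    ("positive", "What's the status of order 1234?", "get_order"),
    ("positive", "Can you look up order 1234 for me?", "get_order"),
    ("negative", "Delete every order from last year", None),
    ("disambiguation", "Find all orders placed by alice@example.com", "search_orders"),
]


def run_routing_tests(select_tool: Callable[[str], Optional[str]]) -> dict:
    """Run every case through select_tool and collect failures by kind."""
    failures: dict = {}
    for kind, request, expected in CASES:
        actual = select_tool(request)
        if actual != expected:
            failures.setdefault(kind, []).append((request, expected, actual))
    return failures


def stub_selector(request: str) -> Optional[str]:
    """Keyword stand-in for a real model call; swap in your own client code."""
    text = request.lower()
    if "delete" in text:
        return None  # nothing in the palette handles deletions
    if "all orders" in text:
        return "search_orders"
    return "get_order"
```

Usage: run_routing_tests(stub_selector) returns an empty dict when every case routes as expected. A selector that always answers "get_order" fails the negative and disambiguation cases, which is the kind of regression these tests are meant to catch.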

L3: A well-designed schema is a contract → Claude knows exactly what to send and in what format

L3: Schema vocab
- Tool schema → the JSON Schema on a tool definition → declares accepted parameters, their types, constraints, and which are required
  - JSON Schema is the standard format for defining tool input
  - Claude reads the schema to understand which args are valid → invalid arg shapes are caught before the tool executes
- Input parameter → a named field in the schema that carries a value Claude must supply → required or optional → clear names reduce hallucinated or mismatched arg values
- Naming parameters clearly → the name is the first signal Claude uses to understand a field's purpose
  - Use specific, unambiguous names → snake_case, and avoid abbreviations the model might misread
  - Names should reflect domain meaning (not implementation details)
  - Anti-patterns
    - Too generic
    - Overloaded names → reusing one name for different purposes → causes wrong associations over a session
    - Cryptic abbreviations → cryptic short names force Claude to guess at meaning and raise arg errors
- Parameter descriptions as micro-prompts → treat every description as a mini-instruction to the model
  - Claude reads the description to understand the field's expected content
  - Descriptions guide arg generation the way prompt instructions do
  - Include the field's purpose, valid values, and edge-case behaviour
  - Omitting descriptions degrades argument quality even when the types are correct
  - Weak vs strong description
    - Weak: e.g. gives only the type, not the format, source, or constraints → Claude is left to guess at the specifics
    - Strong: e.g. "UUID of the authenticated user from the session token" → gives the source, format, and constraints → leaves no room for misinterpretation
- Schema validation → a check that confirms Claude's generated arguments match the declared schema before the call runs
- Schema constraint props → hints that catch malformed values before execution
  - Type
  - Pattern
  - Format
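
Sketch of a schema written along these lines, plus a deliberately tiny argument check (my own illustration). The field names, descriptions, and UUID pattern are hypothetical, and check_args covers only required fields, string type, and pattern; a real system would more likely use a full JSON Schema validator.

```python
import re

# Hypothetical input schema: names, descriptions and patterns are illustrative.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "user_id": {
            "type": "string",
            "description": "UUID of the authenticated user, taken from the session token.",
            "pattern": r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
        },
        "start_date": {
            "type": "string",
            "description": "First day to include, as an ISO 8601 date (YYYY-MM-DD).",
            "format": "date",
        },
    },
    "required": ["user_id"],
}


def check_args(args: dict, schema: dict) -> list:
    """Minimal subset of schema validation: required, string type, pattern."""
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # undeclared fields are ignored in this sketch
        if spec.get("type") == "string":
            if not isinstance(value, str):
                errors.append(f"{name}: expected string")
            elif "pattern" in spec and not re.search(spec["pattern"], value):
                errors.append(f"{name}: does not match pattern")
    return errors
```

Usage: check_args({"user_id": "abc"}, INPUT_SCHEMA) reports a pattern mismatch before the tool body ever runs, which is the "catch malformed values before execution" idea from the notes.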

L4: Not every parameter carries equal weight
- Required parameters: JSON Schema uses a top-level "required" array → lists which params Claude must supply → they must be present in every tool call
  - Designing → a lean required set produces more reliable tool calls across contexts
  - Only mark a field required if the tool cannot function without it
  - Ask: can the tool provide a sensible default or skip the field?
  - Over-requiring forces Claude to hallucinate values to satisfy the schema
- Optional parameters: fields not listed in "required" are optional, and Claude may omit them
- Default values: optional fields should have a default or handle absence gracefully

L4: Required field decision framework
- Can't function without it → required
- Useful but not critical → optional
- Rarely provided → optional + handle its absence

L4: Enum fields for constrained input → use when a parameter accepts only a fixed set of values
- Declare the enum as an array of the only acceptable string values
- Claude selects from the enum instead of writing free-form text
- Great for routing
- Compared with a free-form string:
  - Free-form → leaves Claude unlimited options → it may produce variants (urgent, high-priority) → breaks routing logic
  - Enum → forces Claude to select an exact value → routing code works predictably without a normalization step
- Use cases in practice
  - Routing input → directing actions to different handlers or workflows
  - Extraction fields
- Best when the valid set is stable and small
- Avoid when the valid values change often or number more than 100

L4: Schema constraint props
- Enum → listed set
- Minimum/maximum → numeric boundaries → limits and scores
- additionalProperties → if set to false, blocks fields not declared in the schema, for strict input
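
Sketch of a lean required set, an enum, a default, and additionalProperties: false working together (my own illustration). The ticket fields, enum values, and queue names are hypothetical. In JSON Schema, "default" is only an annotation, so normalize_args applies it by hand here.

```python
# Hypothetical ticket-routing schema; field names and values are illustrative.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {
            "type": "string",
            "description": "One-sentence statement of the problem.",
        },
        "priority": {
            "type": "string",
            "enum": ["low", "normal", "high"],
            "description": "Urgency of the ticket. Treated as normal when omitted.",
            "default": "normal",
        },
    },
    "required": ["summary"],        # lean: only what the tool cannot work without
    "additionalProperties": False,  # undeclared fields are rejected
}

# Each enum value maps to exactly one handler, so no normalization is needed.
HANDLERS = {"low": "backlog_queue", "normal": "support_queue", "high": "on_call_queue"}


def normalize_args(args: dict, schema: dict = TICKET_SCHEMA) -> dict:
    """Reject bad input, then fill defaults for omitted optional fields."""
    props = schema["properties"]
    extra = set(args) - set(props)
    if extra and schema.get("additionalProperties") is False:
        raise ValueError(f"undeclared fields: {sorted(extra)}")
    for name in schema["required"]:
        if name not in args:
            raise ValueError(f"missing required field: {name}")
    result = dict(args)
    for name, spec in props.items():
        if name not in result and "default" in spec:
            result[name] = spec["default"]
        if name in result and "enum" in spec and result[name] not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")
    return result


def route(args: dict) -> str:
    """Pick a handler directly from the validated enum value."""
    return HANDLERS[normalize_args(args)["priority"]]
```

Usage: route({"summary": "Login page times out"}) falls back to the default priority, while a free-form value such as "urgent" is rejected instead of silently breaking the handler lookup.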

L5: When a tool fails → the error response shapes what Claude does next

L5: 3 core error concepts
- Tool error response → a structured reply indicating failure: error type, readable message, optional machine-readable fields
- Error code → a short, stable identifier → code and model can route errors without parsing text
- Suggested action → an optional field telling Claude what to do next: retry, ask the user, escalate

L5: A good error response includes more than a status code
- Error type → stable, machine-readable code → field "error_code"
- Message → human-readable explanation of the failure → field "message"
- Context → which input or resource caused the failure → field "context"
- Suggested action → retry, escalate, ask the user → field "suggested_action"

L5: User error vs system error
- User error: the caller provided bad input (wrong format, missing field, value out of range) → Claude should surface this to the user and ask for a correction rather than retry
- System error: a downstream service is unavailable, timed out, or returned an unexpected response → Claude can retry, wait, or try an alternative without involving the user

L5: Designing error messages → error messages written only for humans often confuse model-driven recovery → structured errors let Claude take the right next action without extra prompting
- Keep codes stable across versions
- Include the input value that triggered the error
- Avoid a raw stack trace as the primary message

L5: When to include a suggested action
- rate_limited → retry with delay
- invalid_input → ask the user to correct the value
- service_unavailable → try an alternative or escalate

L5: Escalation vs recovery
- Recovery → Claude handles the error itself → retries, reformats, switches tools
- Escalation → the error needs a human → Claude surfaces it plainly instead
- Error routing → use error_code to choose: recover or escalate

L5: Avoid common error design mistakes → clear, specific errors save tool calls and reduce user frustration
- Don't return a success status with an error buried in the body
- Don't use vague codes like FAILED or ERROR with no subtype
- Don't omit context when the same code can mean different things
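
Sketch of a structured error and of routing on error_code (my own illustration). The four field names come from the notes above; the "status" field, the action strings, and the DEFAULT_ACTIONS table are assumptions made for the example.

```python
from typing import Optional

# Default next step per error code, following the mapping in the notes.
# The action strings themselves are hypothetical.
DEFAULT_ACTIONS = {
    "rate_limited": "retry_with_delay",
    "invalid_input": "ask_user",
    "service_unavailable": "try_alternative",
}


def make_error(error_code: str, message: str,
               context: Optional[dict] = None,
               suggested_action: Optional[str] = None) -> dict:
    """Build an error reply with a stable code, a message, and context."""
    return {
        "status": "error",  # never report success with an error in the body
        "error_code": error_code,
        "message": message,
        "context": context or {},
        "suggested_action": suggested_action or DEFAULT_ACTIONS.get(error_code),
    }


def next_step(error: dict) -> str:
    """Recover when the error says how; escalate when it does not."""
    return error.get("suggested_action") or "escalate"


bad_date = make_error(
    "invalid_input",
    "start_date must be an ISO 8601 date (YYYY-MM-DD).",
    context={"field": "start_date", "value": "31/01/2024"},
)
```

Usage: next_step(bad_date) yields "ask_user", so the caller asks for a corrected value instead of retrying unchanged input; an unknown code with no suggested action falls through to "escalate".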

L6: Not every failure is final → some errors are transient, some calls only partially complete

L6: Retry and reliability concepts
- Partial success → some items in a batch fail → the response says which items succeeded and which did not
- Idempotency → calling a tool repeatedly with the same input gives the same result → makes retries safe
- Circuit breaker → a pattern that halts calls to a failing service after repeated failures to prevent cascades

L6: When a retry is appropriate → the error code should tell Claude the category of failure
- Transient errors like timeouts/rate limits → good to retry
- User errors like invalid input → should not be retried unchanged
- System errors without a suggested action → need judgment before retrying

L6: Idempotent vs non-idempotent tools
- Idempotent → calling the tool twice with the same input is safe
- Non-idempotent → each call produces a new side effect

L6: Design tools for safe retry → lets Claude retry on transient failures without duplicates
- Accept a client-generated idempotency key with each request
- Store the key and return the same response for duplicate calls
- Use conditional writes that check state before modifying it

L6: 3 retry failure modes
- Duplicate side effects → retrying a non-idempotent call
- Infinite retry loop → retrying without backoff or a limit
- Stale state → a retry after partial completion re-applies steps that already succeeded

L6: Surfacing partial success to Claude → Claude can decide what to retry, skip, or surface to the user
- Return a partial_success status, not a plain error
- Include a succeeded list with the IDs or keys that completed
- Include a failed list with per-item error codes and context

L6: Circuit breaker states
- Closed → healthy, calls pass through → failures are counted → reaching the failure threshold trips the breaker to open
- Open → calls are blocked at once → a fallback or error is returned → after a wait, moves to half-open
- Half-open → a probe call tests recovery → if the probe succeeds, back to closed → if the probe fails, back to open

L6: When to use a circuit breaker
- Use one when a downstream tool has intermittent outages
- Set a failure threshold that triggers the open state
- Define a recovery probe interval to test for restoration
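
Sketch of two of the reliability ideas above: an idempotency-key store and a three-state circuit breaker (my own illustration, kept in memory and single-threaded). The class names, the charge example, the circuit_open code, and the default threshold and interval are all assumptions.

```python
import time


class IdempotentCharges:
    """Stores the response per idempotency key so a retry cannot charge twice."""

    def __init__(self):
        self._responses = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]  # duplicate call: replay
        self.charges_made += 1  # stands in for the real side effect
        response = {"status": "ok", "charge_id": self.charges_made,
                    "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


class CircuitBreaker:
    """Closed -> open after `threshold` consecutive failures; open -> half-open
    after `probe_interval` seconds; the half-open probe closes or re-opens it."""

    def __init__(self, threshold=3, probe_interval=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.probe_interval = probe_interval
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.probe_interval:
                return {"status": "error", "error_code": "circuit_open"}
            self.state = "half_open"  # wait elapsed: let one probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.threshold:
                self.state = "open"
                self.opened_at = self.clock()
            return {"status": "error", "error_code": "service_unavailable"}
        self.failures = 0
        self.state = "closed"
        return result
```

Usage: calling charge twice with the same key performs one charge and returns the same response both times. The clock parameter on CircuitBreaker exists so tests can advance time without sleeping.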

L7: Every tool you give an agent loads into its context on every turn (whether it's used or not)

L7: Tool distribution foundations
- Tool bloat → anti-pattern where an agent gets too many tools → increases context usage and routing confusion
- Specialist subagents → subagents with a narrow tool palette and a focused mission → used to isolate context and cut errors
- Context isolation → giving subagents only the tools and context they need → keeps the orchestrator's context clean

L7: Cost of too many tools → fewer tools keep the context window focused and the agent faster
- Each tool consumes tokens from the available context budget
- Large tool catalogs slow inference and increase cost per turn
- Irrelevant tools crowd out content the model actually needs

L7: Bloated agent vs specialist agent
- Bloated agent: receives the full tool catalog → context fills with descriptions of tools it will never call → routing errors rise as similar tools compete, and responses get slower and less predictable
- Specialist agent: receives only the tools its mission needs → context stays lean, routing is unambiguous, and each call has a single obvious candidate → behavior is consistent and auditable

L7: Tool overlap and routing ambiguity
- Overlapping tool descriptions cause split routing across similar tools
- The model may alternate between tools on identical inputs
- Ambiguity compounds when tool names are similar or generic

L7: Scoping principles
- One agent, one mission → an agent gets a single role and only the tools that role requires
- Tools earn their slot → add a tool only when no existing tool covers the need → otherwise remove it
- Audit the catalog → review tool assignments periodically → prototype tools often outlive their use
- Name precision matters → names and descriptions must be distinct enough to identify the right tool

L7: Specialist subagent pattern → specialist subagents each receive a narrow palette sized exactly for their role
- Specialization makes each subagent's behavior easier to predict and test

L7: Tool distribution vocab
- Tool routing → the model's choice of which tool to call from the palette → based on the request and the descriptions
- Tool description → the text explaining what a tool does → the primary signal Claude uses to route requests
- Cross-cutting tool → a tool that applies across roles (logging, status reporting) → valid on several agents

L7: When to consolidate tools → not every tool should be isolated → some functions genuinely span several agents
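
Sketch of scoping palettes per role with a shared cross-cutting tool and a simple catalog audit (my own illustration). The catalog entries, role names, and tool names are hypothetical.

```python
# Hypothetical catalog: tool name -> description. All names are illustrative.
CATALOG = {
    "get_invoice": "Retrieves one invoice by invoice ID.",
    "issue_refund": "Issues a refund against one paid invoice.",
    "search_docs": "Searches the product documentation by keyword.",
    "log_event": "Writes a structured audit log entry.",
    "legacy_export": "Prototype CSV export; no role uses it any more.",
}

# One agent, one mission: each role lists only the tools it needs.
ROLE_TOOLS = {
    "billing_agent": ["get_invoice", "issue_refund"],
    "docs_agent": ["search_docs"],
}

# Cross-cutting tools are valid on every agent.
CROSS_CUTTING = ["log_event"]


def palette_for(role: str) -> list:
    """Return the tool names a subagent with this role should receive."""
    return ROLE_TOOLS[role] + CROSS_CUTTING


def unassigned_tools() -> list:
    """Audit helper: catalog tools that no role and no shared list uses."""
    assigned = set(CROSS_CUTTING)
    for names in ROLE_TOOLS.values():
        assigned.update(names)
    return sorted(set(CATALOG) - assigned)
```

Usage: palette_for("docs_agent") gives that subagent two tools instead of the full catalog of five, and unassigned_tools() flags legacy_export as a candidate for removal during a periodic audit.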

L8: Tool selection is driven by description-text quality → understanding the failure modes builds reliable routing

L8: Selection failure modes
- No-Good-Match → no tool matches the request → the model fabricates, refuses, or misroutes
- Two-Good-Match → two tools both look valid → the model alternates or favors the longer description
- Disambiguation → writing distinct verbs and narrow scopes so each tool clearly wins
  - Writing descriptions that distinguish tools clearly is the highest-impact fix for routing errors
  - Use distinct verbs: retrieves vs writes vs deletes vs summarizes
  - Narrow the scope: list what the tool does AND what it does not do
  - Avoid generic names like 'process' or 'handle' that match everything

L8: How Claude selects a tool
- Claude reads the request and scores each tool's description for relevance
- The highest-scoring match wins (regardless of the tool's actual capability)
- Poorly written descriptions produce poor matches, even for capable tools

L8: No-Good-Match vs Two-Good-Match failure
- No-Good-Match → no strong candidate → the model invents a tool name, refuses to answer, or calls the least-wrong option → produces wrong output → silent error
- Two-Good-Match → 2 strong candidates → the model alternates between them across turns or picks whichever has the more detailed description → non-deterministic behavior → hard to audit

L8: System prompt guidance
- Explicit routing rules: name the tool and say when to use it
- Tie-breaking instructions: when 2 tools are close, pick one by a clear rule
- Negative examples: tell the model which tool NOT to call

L8: Logging tool selection in production → turns tool selection from a black box into a measurable property
- Log every tool selection: which tool was chosen and what the request was
- Track selection frequency to find tools that are never or rarely called
- Alert when a tool's call rate drops sharply from its baseline

L8: Architecture
- Tool description → the text explaining what a tool does → the main signal Claude uses when routing
- Tool routing → the model's choice of which tool to call from the palette, based on the descriptions
- Selection drift → gradual decay of routing accuracy as descriptions go stale or start to overlap → drift is invisible without logging

L8: Add a new tool or refine an existing one → default to refinement → add a tool only when scope expansion is clearly necessary
- Adding a tool increases catalog size and introduces potential overlap
- Refining a description improves routing without growing the palette
- A new tool is justified only when the use case is truly outside existing scope
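
Sketch of selection logging with a call-rate drift check (my own illustration). The SelectionLog class, the drifted_tools helper, the max_drop threshold of 50%, and the tool names are all assumptions; a production setup would write to a real log store.

```python
from collections import Counter


class SelectionLog:
    """In-memory record of which tool was chosen for which request."""

    def __init__(self):
        self.records = []

    def record(self, request: str, tool) -> None:
        # tool is None when the model made no call at all
        self.records.append({"request": request, "tool": tool})

    def call_rates(self, known_tools: list) -> dict:
        """Share of logged selections per tool, including never-called tools."""
        counts = Counter(r["tool"] for r in self.records)
        total = len(self.records) or 1  # avoid dividing by zero on an empty log
        return {tool: counts[tool] / total for tool in known_tools}


def drifted_tools(baseline: dict, current: dict, max_drop: float = 0.5) -> list:
    """Tools whose call rate fell by more than max_drop relative to baseline."""
    return sorted(
        tool for tool, base in baseline.items()
        if base > 0 and current.get(tool, 0.0) < base * (1 - max_drop)
    )


log = SelectionLog()
log.record("Status of order 1234?", "get_order")
log.record("Where is order 88?", "get_order")
log.record("Total for order 5?", "get_order")
log.record("All orders from alice@example.com", "search_orders")

TOOLS = ["get_order", "search_orders", "export_orders"]
BASELINE = {"get_order": 0.5, "search_orders": 0.3, "export_orders": 0.2}
alerts = drifted_tools(BASELINE, log.call_rates(TOOLS))
```

Usage: with the four records above, export_orders has a call rate of zero against a baseline of 0.2, so it is the one tool in alerts. That is the signal to check whether its description has gone stale or now overlaps with another tool.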

Quotes

My Take