CCA-F Section 5: Tool Design & MCP - Tool Design Fundamentals
Summary
(To be synthesized)
Key Ideas
L1: Claude reads tool descriptions and uses that text as the primary signal for routing decisions
L1: The tool description defines
- Tool description: natural-language text that tells Claude what a tool does and when to use it
- Routing: how Claude matches a request to the best-fitting tool, based on the descriptions
- Misrouting: when a vague description makes Claude call the wrong tool
L1: The description is the decision
- Tool names are hints, not rules; the description does the deciding
- Claude performs semantic matching between the request and each description
- A strong description narrows the candidate set to one obvious choice. It states what the tool does, when to call it, what input it expects, and what it returns, so Claude can select confidently with minimal ambiguity
- A weak or missing description makes every tool equally likely to be chosen. Given only a name or a vague phrase, Claude has no basis for a confident selection and may misroute or skip the tool
L1: How selection actually works (a clear, complete description is the single most important factor)
- Claude assigns implicit relevance weights based on semantic fit to the request
- Higher-confidence matches win
- No single keyword triggers selection; the full description context matters
L1: 4 routing outcomes
- Correct call
- Misroute: the wrong tool is called because 2 descriptions read as too similar
- No call
- Fallback: Claude asks the user to clarify rather than guessing between unclear options
L1: The name is not enough: every tool needs a description that removes the ambiguity the name leaves behind
L1: 3 description failure modes
- Vague intent: Claude can't separate the tool from similar alternatives
- Missing scope: the description omits what the tool does not handle, so overlapping tools create selection ambiguity
- No output signal: the description doesn't say what the tool returns, so Claude can't confirm it will produce what's needed
L2: A description needs
- What it does: a one- or two-sentence purpose statement that describes the tool's function without ambiguity
- When to use it: the conditions under which the tool should be selected; this helps Claude choose among similar tools
- When not to use it: the boundary conditions that scope the tool
L2: Write the description in the 3 parts above
L2: Include input format expectations: the description should signal what kinds of input the tool expects. Input hints help Claude form the call correctly, not just select the tool
- Mention required parameter types
- If the input is structured, name the format
- Avoid restating the schema verbatim; summarize the input intent
L2: Include the output format in the description: what the tool returns shapes whether Claude considers the request fulfilled, and lets Claude confirm the tool satisfies the request before calling it
- State whether the tool returns an object, list, boolean, or status
- Note whether the output needs further processing or is ready to use directly
- Signal how errors surface: a null return, an exception, or an error field
L2: Disambiguation strategies
- Anchor by domain: scope the tool to a specific data domain
- Name the trigger: state the exact request type that triggers it
- Exclude explicitly: list what it doesn't handle (e.g. "does not perform bulk updates or deletions")
L2: Overlapping vs separated tools
- Overlapping tools: 2 tools with similar scope and vague descriptions. Claude can't confidently tell them apart, so it guesses and misroutes roughly half the time
- Separated tools: the same 2 tools, with explicit domain anchors, trigger conditions, and exclusion clauses. Claude selects the right one every time
L2: Testing tool selection reliability: consistent, correct routing across a broad input distribution
- Try varied phrasings of the same underlying intent
- Try requests that could match several tools and check the selection
- Try edge cases: requests the tool should not handle
L2: 3 testing patterns for tools
- Positive test: send a request the tool should handle
- Negative test: send a request the tool should not handle
- Disambiguation test: send a request matching 2 tools and confirm the right one is selected on specificity, not order
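The three-part description plus input and output hints can be sketched as a tool definition. This is a minimal illustration, assuming the name / description / input_schema layout used for tool definitions; the tool get_order_status, its wording, and the helper build_description are hypothetical.

```python
# Hypothetical tool: the name, wording, and fields below are illustrative only.
def build_description(what, when, when_not, inputs, outputs):
    """Join purpose, trigger, exclusion, input hint, and output hint into one text."""
    parts = [
        what,                               # what it does
        f"Use when {when}.",                # when to call it
        f"Do not use for {when_not}.",      # explicit exclusion
        f"Input: {inputs}.",                # input intent, not the schema verbatim
        f"Returns: {outputs}.",             # output shape and how errors surface
    ]
    return " ".join(parts)


GET_ORDER_STATUS = {
    "name": "get_order_status",
    "description": build_description(
        what="Retrieves the current fulfillment status of a single customer order.",
        when="the user asks where one specific order is or whether it has shipped",
        when_not="bulk updates, cancellations, or refunds",
        inputs="one order ID as a string",
        outputs="an object with status and last_updated, or an error field if the order is not found",
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "ID of the order to look up."}
        },
        "required": ["order_id"],
    },
}

print(GET_ORDER_STATUS["description"])
```

Keeping the parts as separate arguments makes it easy to spot a description that is missing its exclusion clause or output signal.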
L3: A well-designed schema is a contract: Claude knows exactly what to send and in what format
L3: Schema vocab
- Tool schema: the JSON Schema on a tool definition. It declares the accepted parameters, their types, their constraints, and which are required
  - JSON Schema is the standard format for defining tool input
  - Claude reads the schema to understand which arguments are valid, so invalid argument shapes are caught before the tool executes
- Input parameter: a named field in the schema that carries a value Claude must supply. It is either required or optional; clear names reduce hallucinated or mismatched argument values
- Naming parameters clearly: the name is the first signal Claude uses to understand a field's purpose
  - Use specific, unambiguous names: snake_case, and avoid abbreviations the model might misread
  - Names should reflect domain meaning, not implementation details
  - Anti-patterns
    - Too generic
    - Overloaded names: reusing one name for different purposes leads to wrong links over a session
    - Cryptic abbreviations: cryptic short names force Claude to guess at meaning and raise argument errors
- Parameter descriptions as micro-prompts: treat every description as a mini-instruction to the model
  - Claude reads the description to understand the field's expected content
  - Descriptions guide argument generation the way prompt instructions do
  - Include the field's purpose, valid values, and edge-case behaviour
  - Omitting descriptions degrades argument quality even when the types are correct
  - Weak vs strong description
    - Weak: gives only the type, not the format, source, or constraints, so Claude is left to guess at the specifics
    - Strong: e.g. "UUID of the authenticated user from the session token". It gives the source, format, and constraints, leaving no room for misinterpretation
- Schema validation: a check that confirms Claude's generated arguments match the declared schema before the call runs
- Schema constraint props: hints that catch malformed values before execution
  - Type
  - Pattern
  - Format
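To make the schema-as-contract idea concrete, here is a minimal sketch of a schema with a strong parameter description and a pre-call validation step. The schema, the field names, and the validate_args helper are hypothetical, and the validator handles only required fields, type, and pattern; a real deployment would more likely use a full JSON Schema validator.

```python
import re

# Hypothetical schema; names and descriptions are illustrative only.
SCHEMA = {
    "type": "object",
    "properties": {
        "user_id": {
            "type": "string",
            # Anchored, because JSON Schema patterns are not implicitly anchored.
            "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
            "description": "UUID of the authenticated user, taken from the session token.",
        },
        "page_size": {
            "type": "integer",
            "description": "Number of records to return per page.",
        },
    },
    "required": ["user_id"],
}

PY_TYPES = {"string": str, "integer": int, "boolean": bool}


def validate_args(args, schema):
    """Return a list of problems; an empty list means the call may run."""
    problems = [f"missing required field: {name}"
                for name in schema.get("required", []) if name not in args]
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # undeclared fields are ignored in this sketch
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for integer fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool))
        if wrong_type:
            problems.append(f"{name}: expected {spec['type']}")
            continue
        pattern = spec.get("pattern")
        if pattern and isinstance(value, str) and not re.search(pattern, value):
            problems.append(f"{name}: does not match pattern")
    return problems


print(validate_args({"user_id": "not-a-uuid", "page_size": "10"}, SCHEMA))
```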
L4: Not every parameter carries equal weight
- Required parameters: JSON Schema uses a top-level "required" array to list the parameters Claude must supply; they must be present in every tool call
  - Design: a lean required set produces more reliable tool calls across contexts
  - Only mark a field required if the tool cannot function without it
  - Ask whether the tool can provide a sensible default or skip the field
  - Over-requiring forces Claude to hallucinate values to satisfy the schema
- Optional parameters: fields not listed in "required" are optional, and Claude may omit them
- Default values: optional fields should have a default or handle absence gracefully
L4: Required field decision framework
- Can't function without it: required
- Useful but not critical: optional
- Rarely provided: optional, and handle its absence
L4: Enum fields for constrained input: use when a parameter accepts only a fixed set of values
- Declare the enum as an array of the only acceptable string values
- Claude selects from the enum instead of producing free-form text
- Great for routing
- Compared with a free-form string:
  - Free-form: leaves Claude unlimited options, so it may produce variants (urgent, high-priority) that break routing logic
  - Enum: forces Claude to select an exact value, so routing code works predictably without a normalization step
- Use cases in practice
  - Routing input: directing actions to different handlers or workflows
  - Extraction fields
- Best when the valid set is stable and small
- Avoid when the valid values change often or number more than 100
L4: Schema constraint props
- enum: a listed set of allowed values
- minimum/maximum: numeric boundaries, e.g. for limits and scores
- additionalProperties: if set to false, blocks fields not declared in the schema, for strict input
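The required/optional split and the constraint props above can be sketched together. The example below is an assumption-laden illustration: the create_ticket style schema, the queue names, and the helpers find_problems and route are hypothetical, and value types are not checked here.

```python
# Hypothetical schema: a lean required set, an enum for routing, numeric bounds,
# and additionalProperties set to False for strict input.
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "normal", "urgent"]},
        "impact_score": {"type": "integer", "minimum": 1, "maximum": 5},
    },
    "required": ["title"],
    "additionalProperties": False,
}
DEFAULTS = {"priority": "normal"}  # optional field with a sensible default
QUEUES = {"low": "backlog_queue", "normal": "standard_queue", "urgent": "on_call_queue"}


def find_problems(args):
    """Check required fields, undeclared fields, enum membership, and bounds."""
    props = SCHEMA["properties"]
    problems = []
    if SCHEMA.get("additionalProperties") is False:
        problems += [f"unknown field: {k}" for k in sorted(args) if k not in props]
    problems += [f"missing required field: {k}" for k in SCHEMA["required"] if k not in args]
    for name, value in args.items():
        spec = props.get(name, {})
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            problems.append(f"{name}: below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            problems.append(f"{name}: above maximum {spec['maximum']}")
    return problems


def route(args):
    """Route on the exact enum value; no normalization step is needed."""
    problems = find_problems(args)
    if problems:
        raise ValueError("; ".join(problems))
    merged = {**DEFAULTS, **args}  # absent optional fields fall back to defaults
    return QUEUES[merged["priority"]]


print(route({"title": "Printer offline"}))
print(find_problems({"title": "Printer offline", "priority": "high-priority"}))
```

Because priority is an enum, a free-form variant such as "high-priority" is rejected before routing instead of silently missing every queue.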
L5: When a tool fails, the error response shapes what Claude does next
L5: 3 core error concepts
- Tool error response: a structured reply indicating failure, with an error type, a readable message, and optional machine-readable fields
- Error code: a short, stable identifier that lets code and the model route errors without parsing text
- Suggested action: an optional field that tells Claude what to do next: retry, ask the user, or escalate
L5: A good error response includes more than a status code
- Error type: a stable, machine-readable code (field "error_code")
- Message: a human-readable explanation of the failure (field "message")
- Context: which input or resource caused the failure (field "context")
- Suggested action: retry, escalate, or ask the user (field "suggested_action")
L5: User error vs system error
- User error: the caller provided bad input (wrong format, missing field, value out of range). Claude should surface this to the user and ask for a correction rather than retry
- System error: a downstream service is unavailable, timed out, or returned an unexpected response. Claude can retry, wait, or try an alternative without involving the user
L5: Designing error messages: error messages written only for humans often confuse model-driven recovery; structured errors let Claude take the right next action without extra prompting
- Keep codes stable across versions
- Include the input value that triggered the error
- Avoid a raw stack trace as the primary message
L5: When to include a suggested action
- rate_limited: retry with a delay
- invalid_input: ask the user to correct the value
- service_unavailable: try an alternative or escalate
L5: Escalation vs recovery
- Recovery: Claude handles the error itself by retrying, reformatting, or switching tools
- Escalation: the error needs a human, so Claude surfaces it plainly instead
- Error routing: use error_code to choose between recovering and escalating
L5: Avoid common error design mistakes: clear, specific errors save tool calls and reduce user frustration
- Don't return a success status with an error buried in the body
- Don't use vague codes like FAILED or ERROR with no subtype
- Don't omit context when the same code can mean different things
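The error fields and the routing rule above can be sketched as follows. The field names error_code, message, context, and suggested_action come from these notes; the "status" field, the action strings, and the helpers make_error, next_step, and needs_human are hypothetical choices for illustration.

```python
# Fallback actions per error code, following the notes: rate_limited -> retry with
# a delay, invalid_input -> ask the user, service_unavailable -> try an alternative.
DEFAULT_ACTIONS = {
    "rate_limited": "retry_with_delay",
    "invalid_input": "ask_user",
    "service_unavailable": "try_alternative",
}
HUMAN_ACTIONS = {"ask_user", "escalate"}  # actions that involve a person


def make_error(error_code, message, context, suggested_action=None):
    """Build a structured error with an explicit error status."""
    error = {
        "status": "error",  # never report success with an error buried in the body
        "error_code": error_code,
        "message": message,
        "context": context,
    }
    if suggested_action is not None:
        error["suggested_action"] = suggested_action
    return error


def next_step(error):
    """Prefer the tool's own suggestion, then the code's default, then escalate."""
    return error.get("suggested_action") or DEFAULT_ACTIONS.get(error["error_code"], "escalate")


def needs_human(error):
    """Error routing: True means surface to a person, False means recover."""
    return next_step(error) in HUMAN_ACTIONS


bad_date = make_error(
    "invalid_input",
    "start_date must be in YYYY-MM-DD format",
    {"field": "start_date", "value": "03/15"},
)
print(next_step(bad_date), needs_human(bad_date))
```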
L6: Not every failure is final: some errors are transient, and some calls only partially complete
L6: Retry and reliability concepts
- Partial success: some items in a batch fail, and the response says which items succeeded and which did not
- Idempotency: calling a tool repeatedly with the same input gives the same result, which makes retries safe
- Circuit breaker: a pattern that halts calls to a failing service after repeated failures, to prevent cascades
L6: When a retry is appropriate: the error code should tell Claude the category of failure
- Transient errors such as timeouts and rate limits: good to retry
- User errors such as invalid input: should not be retried unchanged
- System errors without a suggested action: need judgment before retrying
L6: Idempotent vs non-idempotent tools
- Idempotent: calling the tool twice with the same input is safe
- Non-idempotent: each call produces a new side effect
L6: Design tools for safe retry, so Claude can retry on transient failure without duplicating side effects
- Accept a client-generated idempotency key with each request
- Store the key and return the same response for duplicate calls
- Use conditional writes that check state before modifying it
L6: 3 retry failure modes
- Duplicate side effects: retrying a non-idempotent call
- Infinite retry loop: retrying without backoff or a limit
- Stale state: a retry after partial completion re-applies steps that already succeeded
L6: Surfacing partial success to Claude, so Claude can decide what to retry, skip, or surface to the user
- Return a partial_success status, not a plain error
- Include a succeeded list with the IDs or keys that completed
- Include a failed list with per-item error codes and context
L6: Circuit breaker states
- Closed: healthy; calls pass through and failures are counted. Reaching the failure threshold opens the breaker
- Open: calls are blocked at once and a fallback or error is returned. After a wait, the breaker moves to half-open
- Half-open: a probe call tests recovery. If the probe succeeds, the breaker closes; if the probe fails, it returns to open
L6: When to use a circuit breaker
- Use one when a downstream tool has intermittent outages
- Set a failure threshold that triggers the open state
- Define a recovery probe interval to test for restoration
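The circuit breaker states can be sketched as a small state machine. This is a minimal, single-threaded illustration, not a production implementation: the class name, the error codes circuit_open and downstream_failure, and the default thresholds are assumptions, and the clock is injectable only so the behavior can be exercised without waiting.

```python
import time


class CircuitBreaker:
    """Closed -> open after repeated failures -> half-open probe after a wait."""

    def __init__(self, failure_threshold=3, recovery_wait=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures that open the breaker
        self.recovery_wait = recovery_wait          # seconds before a probe is allowed
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.recovery_wait:
                # Blocked at once: return a structured error instead of calling.
                return {"status": "error", "error_code": "circuit_open"}
            self.state = "half_open"  # wait elapsed: let one probe call through
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            self._on_failure()
            return {"status": "error", "error_code": "downstream_failure", "message": str(exc)}
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed probe reopens immediately; otherwise open at the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0


def flaky():
    raise RuntimeError("service down")


now = [0.0]  # fake clock so the example runs instantly
breaker = CircuitBreaker(failure_threshold=2, recovery_wait=10.0, clock=lambda: now[0])
breaker.call(flaky)
breaker.call(flaky)
print(breaker.state)
```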
L7: Every tool you give an agent loads into its context on every turn, whether it's used or not
L7: Tool distribution foundations
- Tool bloat: an anti-pattern where an agent gets too many tools, which increases context usage and routing confusion
- Specialist subagents: subagents with a narrow tool palette and a focused mission, used to isolate context and cut errors
- Context isolation: giving subagents only the tools and context they need, which keeps the orchestrator's context clean
L7: Cost of too many tools: fewer tools keep the context window focused and the agent faster
- Each tool consumes tokens from the available context budget
- Large tool catalogs slow inference and increase cost per turn
- Irrelevant tools crowd out content the model actually needs
L7: Bloated agent vs specialist agent
- Bloated agent: receives the full tool catalog. Context fills with descriptions of tools it will never call, routing errors rise as similar tools compete, and responses get slower and less predictable
- Specialist agent: receives only the tools its mission needs. Context stays lean, routing is unambiguous, and each call has a single obvious candidate, so behavior is consistent and auditable
L7: Tool overlap and routing ambiguity
- Overlapping tool descriptions cause split routing across similar tools
- The model may alternate between tools on identical inputs
- Ambiguity compounds when tool names are similar or generic
L7: Scoping principles
- One agent, one mission: an agent gets a single role and only the tools that role requires
- Tools earn their slot: add a tool only when no existing tool covers the need; if a tool does not meet that bar, remove it
- Audit the catalog: review tool assignments periodically, because prototype tools often outlive their use
- Name precision matters: names and descriptions must be distinct enough to identify the right tool
L7: Specialist subagent pattern: specialist subagents each receive a narrow palette sized exactly for their role
- Specialization makes each subagent's behavior easier to predict and test
L7: Tool distribution vocab
- Tool routing: the model's choice of which tool to call from the palette, based on the request and the descriptions
- Tool description: the text explaining what a tool does; the primary signal Claude uses to route requests
- Cross-cutting tool: a tool that applies across roles (logging, status reporting) and is valid on several agents
L7: When to consolidate tools: not every tool should be isolated, because some functions genuinely span several agents
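The scoping principles can be sketched as a per-role palette built from a shared catalog. Everything named here is hypothetical (the catalog entries, the role names, the helpers), and description length in characters is used only as a rough stand-in for context cost, not as a real token count.

```python
# Hypothetical catalog: tool name -> description text.
CATALOG = {
    "get_invoice": "Retrieves one invoice by ID. Does not modify billing data.",
    "issue_refund": "Issues a refund for one paid invoice. Does not handle chargebacks.",
    "search_kb": "Searches help-center articles by keyword. Does not search customer data.",
    "create_ticket": "Creates a new support ticket. Does not update existing tickets.",
    "log_event": "Records an audit event. Cross-cutting: valid on every agent.",
}
ROLE_TOOLS = {
    "billing_agent": ["get_invoice", "issue_refund"],
    "support_agent": ["search_kb", "create_ticket"],
}
CROSS_CUTTING = ["log_event"]  # shared tools that genuinely span several agents


def palette_for(role):
    """One agent, one mission: role-specific tools plus cross-cutting ones."""
    names = ROLE_TOOLS[role] + CROSS_CUTTING
    return {name: CATALOG[name] for name in names}


def description_chars(palette):
    """Rough proxy for the context consumed by tool descriptions."""
    return sum(len(text) for text in palette.values())


billing = palette_for("billing_agent")
print(sorted(billing))
print(description_chars(billing), "of", description_chars(CATALOG), "characters")
```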
L8: Tool selection is driven by description-text quality; understanding the failure modes builds reliable routing
L8: Selection failure modes
- No-Good-Match: no tool matches the request, so the model fabricates, refuses, or misroutes
- Two-Good-Match: two tools both look valid, so the model alternates or favors the longer description
- Disambiguation: writing distinct verbs and narrow scopes so each tool clearly wins
  - Writing descriptions that distinguish tools clearly is the highest-impact fix for routing errors
  - Use distinct verbs: retrieves vs writes vs deletes vs summarizes
  - Narrow the scope: list what the tool does AND what it does not do
  - Avoid generic names like "process" or "handle" that match everything
L8: How Claude selects a tool
- Claude reads the request and scores each tool's description for relevance
- The highest-scoring match wins, regardless of the tool's actual capability
- Poorly written descriptions produce poor matches, even for capable tools
L8: No-Good-Match vs Two-Good-Match failure
- No-Good-Match: no strong candidate. The model invents a tool name, refuses to answer, or calls the least-wrong option, producing wrong output and silent errors
- Two-Good-Match: 2 strong candidates. The model alternates between them across turns or picks whichever has the more detailed description, producing non-deterministic behavior that is hard to audit
L8: System prompt guidance
- Explicit routing rules: name the tool and say when to use it
- Tie-breaking instructions: when 2 tools are close, pick one by a clear rule
- Negative examples: tell the model which tool NOT to call
L8: Logging tool selection in production: turns tool selection from a black box into a measurable property
- Log every tool selection: which tool was chosen and what the request was
- Track selection frequency to find tools that are never or rarely called
- Alert when a tool's call rate drops sharply from its baseline
L8: Architecture
- Tool description: the text explaining what a tool does; the main signal Claude uses when routing
- Tool routing: the model's choice of which tool to call from the palette, based on the descriptions
- Selection drift: gradual decay of routing accuracy as descriptions go stale or start to overlap; drift is invisible without logging
L8: Add a new tool or refine an existing one: default to refinement, and add a tool only when scope expansion is clearly necessary
- Adding a tool increases catalog size and introduces potential overlap
- Refining a description improves routing without growing the palette
- A new tool is justified only when the use case is truly outside existing scope
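The logging and drift ideas can be sketched with a selection log and a baseline comparison. The log format, the tool names, the baseline numbers, and the 50% drop threshold are all assumptions made for illustration.

```python
from collections import Counter


def selection_rates(log):
    """Share of logged selections that went to each tool."""
    counts = Counter(entry["tool"] for entry in log)
    total = len(log)
    return {tool: count / total for tool, count in counts.items()}


def never_called(catalog, log):
    """Tools in the catalog that never appear in the log."""
    seen = {entry["tool"] for entry in log}
    return [tool for tool in catalog if tool not in seen]


def drift_alerts(baseline, current, drop_ratio=0.5):
    """Tools whose current rate fell below drop_ratio times their baseline rate."""
    return sorted(tool for tool, base in baseline.items()
                  if current.get(tool, 0.0) < base * drop_ratio)


# Hypothetical log: one entry per tool selection, with the request that caused it.
LOG = [
    {"tool": "search_orders", "request": "find my last order"},
    {"tool": "search_orders", "request": "orders from March"},
    {"tool": "search_orders", "request": "refund order 1042"},
    {"tool": "refund_order", "request": "I want my money back for order 77"},
]
BASELINE = {"search_orders": 0.4, "refund_order": 0.6}  # assumed earlier rates

rates = selection_rates(LOG)
print(rates)
print(never_called(["search_orders", "refund_order", "cancel_order"], LOG))
print(drift_alerts(BASELINE, rates))
```

In this made-up log, a refund request routed to search_orders is the kind of misroute that a falling call rate for refund_order would help surface.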