The article explores how modern agentic applications powered by large language models rely on integrations with external services, and how the model context protocol has become a standard way to expose these tools and their instructions. Earlier approaches relied on manual function calling, where developers implemented authorization, application programming interface logic, and carefully crafted instructions for each function. The model context protocol simplifies this by bundling integration functions together with tool usage instructions, and it has become the default tool layer in many agentic pipelines, including those that also use skills and code execution features from Anthropic.
The author highlights two core issues with the current model context protocol design: all tool parameters are exposed to the model, forcing it to generate values the application often already knows, and there is no out-of-the-box control over the quality or style of tool and parameter descriptions. Since tools are passed to the model application programming interface and also injected into the system prompt, non-standardized and verbose tool schemas can directly affect model reasoning. The article notes that while approaches such as hiding arguments in FastMCP address unnecessary parameter exposure, many real-world model context protocol servers still push values such as user identifiers into the prompt for convenience; the focus here is on the second issue, the quality and inconsistency of descriptions across tools from different providers.
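The parameter-exposure workaround can be sketched without any particular framework: the server strips parameters it already knows from the schema shown to the model and injects their values itself at call time. A minimal illustration (the `get_orders` tool, its schema, and the `user_id` parameter are hypothetical, not from the article):

```python
import copy
import json

# Hypothetical tool schema as it might be registered on a server.
TOOL_SCHEMA = {
    "name": "get_orders",
    "description": "List recent orders for a user.",
    "parameters": {
        "type": "object",
        "properties": {
            "user_id": {"type": "string", "description": "Internal user identifier."},
            "limit": {"type": "integer", "description": "Maximum number of orders."},
        },
        "required": ["user_id"],
    },
}

# Values the application already knows; the model should never generate these.
SERVER_SIDE_VALUES = {"user_id": "u-123"}

def hide_known_params(schema: dict, hidden: set[str]) -> dict:
    """Return a copy of the schema with the hidden parameters removed,
    so the model is never asked to produce them."""
    public = copy.deepcopy(schema)
    params = public["parameters"]
    for name in hidden:
        params["properties"].pop(name, None)
    params["required"] = [p for p in params.get("required", []) if p not in hidden]
    return public

def call_tool(model_args: dict) -> dict:
    """Merge model-provided arguments with server-side values before execution."""
    return {**model_args, **SERVER_SIDE_VALUES}

public_schema = hide_known_params(TOOL_SCHEMA, set(SERVER_SIDE_VALUES))
print(json.dumps(public_schema["parameters"], indent=2))  # no user_id anywhere
print(call_tool({"limit": 5}))  # {'limit': 5, 'user_id': 'u-123'}
```

The model now only sees and fills `limit`, while `user_id` is supplied server-side, keeping it out of both the schema and the prompt.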
To study the impact of description wording, the author inspects real model context protocol tools and shows that different domains use very different formats and levels of detail, even when both are technically correct. To quantify the effect, the experiments use NVIDIA's When2Call dataset, selecting samples where the model must choose among multiple tools and only one is correct, and treating any no-tool-call response as incorrect. The language model under test is OpenAI's gpt-5-nano; alternative, more detailed tool and parameter descriptions are generated with OpenAI's gpt-5-mini, which is instructed to complicate the text with explanations and usage examples while keeping the original JSON structure intact.
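The structure-preservation constraint on the generated descriptions can be checked mechanically: strip every "description" field from both schemas and require whatever remains to be identical. A small sketch of such a check (the weather tool schema here is illustrative, not taken from the dataset):

```python
import copy

def strip_descriptions(node):
    """Recursively drop 'description' fields from a JSON-schema-like structure."""
    if isinstance(node, dict):
        return {k: strip_descriptions(v) for k, v in node.items() if k != "description"}
    if isinstance(node, list):
        return [strip_descriptions(v) for v in node]
    return node

def same_structure(original: dict, rewritten: dict) -> bool:
    """True if the rewritten schema changed only the description texts."""
    return strip_descriptions(original) == strip_descriptions(rewritten)

original = {
    "name": "get_weather",
    "description": "Get weather.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name."}},
    },
}

# A "complicated" rewrite: longer descriptions, same structure otherwise.
detailed = copy.deepcopy(original)
detailed["description"] = (
    "Get the current weather. Use this whenever the user mentions a location, "
    "e.g. 'weather in Paris'. Returns temperature and conditions."
)
detailed["parameters"]["properties"]["city"]["description"] = (
    "Name of the city, e.g. 'Paris' or 'New York'."
)

print(same_structure(original, detailed))  # True
```

Any edit outside description text, such as a renamed parameter or changed type, would make `same_structure` return False, so generated rewrites that break the schema can be rejected automatically.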
The benchmark includes four experiment settings: a baseline with the original descriptions, replacing only the correct tool's description with the generated detailed version, replacing only the incorrect tools' descriptions, and replacing all tools' descriptions. The results are:

Method                    Mean accuracy   Accuracy std   Maximum accuracy over 5 experiments
Baseline                  76.5%           0.03           79.0%
Correct tool replaced     80.5%           0.03           85.2%
Incorrect tool replaced   75.1%           0.01           76.5%
All tools replaced        75.3%           0.04           82.7%

Table 1. Results of the experiments.

The table shows that complicating tool descriptions introduces a bias: the selected large language model tends to choose the tool with the more detailed description, while excessive detail across all tools can also confuse the model. The author notes that the dataset uses a small number of tools per call, with an average of 4.35 tools per sample.
The findings suggest that tool descriptions provide a direct mechanism to manipulate and significantly adjust model behavior and accuracy, and that large language models can develop tool biases that model context protocol providers could misuse, similar to previously reported style biases. To address this in practice, the author presents a proof of concept called Master MCP, a proxy model context protocol server that can connect to any number of upstream model context protocol servers while presenting itself as a single server to an agent or model. By default, Master MCP ignores parameters whose names start with an underscore, allowing them to be filled in programmatically or via default values, and it lets users adjust tool and parameter descriptions through a simple user interface over the aggregated JSON schema. The article closes with an invitation for community contributions and outlines potential extensions such as logging, monitoring with advanced analytics, and tool hierarchies and orchestration that could incorporate machine-learning-based strategies.
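The underscore convention described above can be sketched as a thin aggregation layer: tools from all upstream servers are flattened into one list, underscore-prefixed parameters are hidden from the agent's view, and their values are filled from defaults when a call is forwarded. All names below (`aggregate_tools`, `dispatch`, the `send_email` tool) are illustrative, not the actual Master MCP API:

```python
def aggregate_tools(upstream_tools: list[dict]) -> list[dict]:
    """Present tools from many upstream servers as one flat list,
    hiding underscore-prefixed parameters from the agent."""
    public = []
    for tool in upstream_tools:
        params = tool["parameters"]
        visible = {k: v for k, v in params["properties"].items()
                   if not k.startswith("_")}
        required = [r for r in params.get("required", []) if not r.startswith("_")]
        public.append({**tool, "parameters": {**params,
                                              "properties": visible,
                                              "required": required}})
    return public

def dispatch(tool: dict, model_args: dict, defaults: dict) -> dict:
    """Fill hidden parameters programmatically before forwarding upstream."""
    hidden = {k: defaults[k]
              for k in tool["parameters"]["properties"] if k.startswith("_")}
    return {**model_args, **hidden}

# One hypothetical upstream tool with a secret the agent must never see.
upstream = [{
    "name": "send_email",
    "description": "Send an email on behalf of the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "_api_key": {"type": "string"},
            "to": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["_api_key", "to", "body"],
    },
}]

agent_view = aggregate_tools(upstream)
print(list(agent_view[0]["parameters"]["properties"]))  # ['to', 'body']
print(dispatch(upstream[0], {"to": "a@b.com", "body": "hi"}, {"_api_key": "secret"}))
```

The same layer is a natural place for the description-editing user interface: since every upstream schema passes through the proxy, descriptions can be rewritten there before the agent ever sees them.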
