Function calling with NVIDIA NIM for large language models

NVIDIA NIM supports function (tool) calling so large language models can return structured function arguments for external services. Enable and control tool calling with environment variables and the tool_choice and tools request parameters.

Function calling, also called tool calling, lets NVIDIA NIM connect large language models to external services by returning structured function arguments that your application can execute. In NIM, function calling is controlled at inference time with the tool_choice and tools request parameters, and server behavior is adjusted using environment variables for LLM NIMs. The feature allows the model to output a function call representation instead of or in addition to a text response, which can then be executed and fed back to the model to produce a final answer.
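The loop described above — the model emits a tool call, the application executes it, and the result is fed back so the model can produce a final answer — can be sketched roughly as follows. The response shape mirrors the OpenAI-compatible chat-completions format that NIM exposes; the `get_weather` helper and the mock response values are illustrative assumptions, not part of NIM itself.

```python
import json

# Mock chat-completion response shaped like the OpenAI-compatible output
# NIM returns when the model decides to call a tool (values are illustrative).
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }],
        }
    }]
}

def get_weather(city):
    # Hypothetical external service; a real application would call an API here.
    return {"city": city, "temp_c": 18}

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
msg = response["choices"][0]["message"]
messages.append(msg)  # keep the assistant's tool-call turn in the history

for call in msg.get("tool_calls", []):
    args = json.loads(call["function"]["arguments"])
    result = get_weather(**args)  # in general, dispatch on call["function"]["name"]
    # Feed the tool result back as a "tool" message so a follow-up request
    # to the model can produce the final natural-language answer.
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })
```

The resulting `messages` list would then be sent back to the chat-completions endpoint for the model's final response.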

To enable tool calling, set specific environment variables when launching LLM NIM servers. NIM_ENABLE_AUTO_TOOL_CHOICE=1 turns on tool calling. NIM_CHAT_TEMPLATE can override the default chat template by pointing to the absolute path of a .jinja file that helps the model format tool-call output. NIM_TOOL_CALL_PARSER defines how model output is post-processed into a tool call and accepts values such as "pythonic", "mistral", "llama3_json", "granite-20b-fc", "granite", "hermes", or "jamba", or a custom identifier. If you specify a custom parser name, provide the path to its Python file in NIM_TOOL_PARSER_PLUGIN. Note that LLM-specific NIM containers that ship with tool calling support enable the feature automatically, so you should not set these environment variables externally for those images.
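As a sketch, a launch command with these variables set might look like the following; the image name, port, cache details, parser value, and template path are deployment-specific placeholders, not values prescribed by the docs.

```shell
# Illustrative LLM NIM launch with tool calling enabled.
# Image, parser, and template path below are placeholders to adapt.
docker run --rm --gpus all \
  -p 8000:8000 \
  -e NGC_API_KEY \
  -e NIM_ENABLE_AUTO_TOOL_CHOICE=1 \
  -e NIM_TOOL_CALL_PARSER=llama3_json \
  -e NIM_CHAT_TEMPLATE=/opt/templates/tool_chat.jinja \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
```

For containers that ship with tool calling support built in, the three NIM_* variables above would be omitted.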

The documentation warns that if a chat completion response contains an empty tool_calls field while the function call appears only as freeform content, the post-processing step has failed; update the chat template or choose a different parser. Supported models include GPT-OSS-20B and GPT-OSS-120B, Llama 3.1/3.2/3.3, Mistral models, and the Llama Nemotron Nano, Super, and Ultra variants.

Inference requests must supply both the tool_choice and tools parameters together. tool_choice accepts "none" to disable tools, "auto" to let the model decide, or a named tool choice that forces a specific function by name. The docs include example workflows and code samples for basic function calling, multiple tools (including parameterless tools), named tool usage, and a sample tool parser plugin with the corresponding environment variable settings.
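The shape of these two request parameters can be illustrated with the payload below. The schema follows the OpenAI-compatible chat-completions format that NIM implements; the model name and the `get_weather` tool are assumptions made for the example.

```python
import json

# A hypothetical tool definition in the OpenAI-compatible "tools" schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not from the docs
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

request_body = {
    "model": "meta/llama-3.1-8b-instruct",  # substitute your NIM model
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    # "auto" lets the model decide and "none" disables tools entirely;
    # a named tool choice, as below, forces a specific function by name:
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}

print(json.dumps(request_body["tool_choice"]))
```

Sending this body to the NIM chat-completions endpoint would return either a normal text reply (under "auto") or a structured tool call for your application to execute.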


Micron samples 256 GB DDR5 9200 MT/s RDIMM server modules

Micron has begun sampling 256 GB DDR5 RDIMM server modules built on its 1-gamma technology to key ecosystem partners. The company positions the new modules as a higher-speed, more power-efficient option for scaling next-generation Artificial Intelligence and HPC infrastructure.

Microsoft emails show early doubts about OpenAI

Court emails show Microsoft executives were unconvinced by OpenAI’s early Artificial Intelligence progress in 2018 while also worrying that rejecting the lab could push it toward Amazon. The messages reveal internal tension between skepticism over technical claims and concern about competitive and public relations fallout.

Apple explores Intel chip manufacturing alliance

Apple has reached a preliminary agreement with Intel to manufacture some chips for its devices, reflecting mounting pressure on semiconductor supply chains as Artificial Intelligence demand absorbs advanced capacity. The move also aligns with Washington’s push to expand domestic chip production and revive Intel’s foundry business.
