Function calling, also called tool calling, lets NVIDIA NIM connect large language models to external services by returning structured function arguments that your application can execute. In NIM, function calling is controlled at inference time with the tool_choice and tools request parameters, and server behavior is adjusted using environment variables for LLM NIMs. The feature allows the model to output a function call representation instead of or in addition to a text response, which can then be executed and fed back to the model to produce a final answer.
To enable tool calling you must set specific environment variables when launching LLM NIM servers. NIM_ENABLE_AUTO_TOOL_CHOICE=1 turns on tool calling. NIM_CHAT_TEMPLATE can override the default chat template by pointing to an absolute path to a .jinja file to help the model format tool-call output. NIM_TOOL_CALL_PARSER defines how model output is post-processed into a tool call and accepts values such as ‘pythonic’, ‘mistral’, ‘llama3_json’, ‘granite-20b-fc’, ‘granite’, ‘hermes’, or ‘jamba’, or a custom identifier. If you specify a custom parser name, provide its python file path in NIM_TOOL_PARSER_PLUGIN. Note that for LLM-specific NIM containers that ship tool calling support, you should not set tool-calling environment variables externally because the feature is enabled automatically in those images.
The documentation warns that if the chat completion response contains an empty tool_call field while the function call appears only in freeform content, the post-processing step failed; you should update the chat template or choose a different parser. Supported models include GPT-OSS-20B and GPT-OSS-120B, Llama 3.1/3.2/3.3, Mistral models, and Llama Nemotron Nano, Super, and Ultra variants. Inference request parameters require both tool_choice and tools together. tool_choice accepts ‘none’ to disable tools, ‘auto’ to let the model decide, or a named tool choice that forces a specific function by name. The docs include example workflows and code samples for basic function calling, multiple tools including parameterless tools, named tool usage, and a sample tool parser plugin with the corresponding environment variable settings.
