Microsoft has introduced FIDES, short for Flow Integrity Deterministic Enforcement System, as an experimental security feature in Agent Framework. The release targets prompt injection, which it describes as the #1 risk on the OWASP LLM Top 10, by replacing heuristic defenses with middleware that enforces information-flow control. Content is labeled along two dimensions, trusted or untrusted for integrity, and public, private, or user_identity for confidentiality. Those labels propagate automatically through tool calls, messages, and context providers, and are checked before a sensitive tool is allowed to execute.
The core example is a GitHub issue triage agent that reads public issue bodies, posts follow-up comments, reads files, and writes patches. In the attack scenario, a malicious issue includes hidden instructions telling the agent to read .env and post its contents publicly. FIDES treats the entire issue body as untrusted as soon as read_issue(…) returns it. That allows the agent to summarize or classify the report, but it blocks privileged actions when policy rules are violated. A call to write_file(…) can be refused because the tool declares accepts_untrusted=False, and a call to post_comment(…) can be blocked because the tool limits output with max_allowed_confidentiality=”public” while private content is in scope. With approval_on_violation=True, blocked actions become human approval prompts instead of silent failures.
Microsoft says the main difference from defensive prompts, sanitization, or monitoring is determinism. Prompt injection works because a model cannot reliably distinguish developer instructions from instructions embedded inside data. FIDES moves the decision away from the model and into the framework. LabelTrackingFunctionMiddleware propagates security labels across tool outputs and downstream transformations, while PolicyEnforcementFunctionMiddleware checks the current context before each tool invocation. The result is a split where the model can decide what it wants to do, but the framework decides what it is allowed to do.
The system also includes a stricter isolation option for untrusted text. With auto_hide_untrusted=True, untrusted tool output is replaced by a var_<id> reference, stored separately, and processed by quarantined_llm using a separate tools-free model. In that mode, the main model never reads raw attacker-supplied text directly. Microsoft positions this as stronger defense-in-depth, though it adds another model call and means the main agent works from sanitized summaries rather than the original content. With auto_hide_untrusted=False, the main model can still read the raw content, but policy enforcement remains active.
FIDES ships in the core package in version 1.3.0 and later and is marked experimental. Microsoft says it is best suited for agents that ingest content from uncontrolled sources, operate privileged tools, or handle mixed-sensitivity data that must not flow into public outputs. The company also notes limitations, including opt-in labels per data source, conservative most-restrictive-wins propagation, coarse approval workflows, and a single-turn quarantined LLM design. Sample applications include email_security_example.py and repo_confidentiality_example.py, and Microsoft is directing broader feedback on the security model to discussion #5624.
