Simon Willison’s LLM is a Python library and command-line interface for interacting with large language models such as GPT-4, Llama, Claude, and Gemini. It started as a utility for interacting with services like ChatGPT and has grown, through a rapid series of releases, into a feature-rich tool that supports a wide range of workflows.
Key releases include LLM 0.5, which introduced plugin support for additional models, including self-hosted ones, extending the tool beyond commercial API providers. Version 0.9 added tools for working with embeddings, which support semantic search and retrieval-augmented generation (RAG). LLM 0.13 and later releases, several of them published with annotated release notes, refined compatibility with both local models and hosted APIs, making the tool suitable for research and production use alike.
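To make the embeddings use case concrete, the following is a minimal sketch of the retrieval step that underlies semantic search and RAG: rank stored vectors by cosine similarity to a query vector. It is not LLM's implementation. The helper names (`cosine_similarity`, `rank_by_similarity`), the document IDs, and the three-dimensional vectors are all invented for illustration; in practice the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, stored, top_n=2):
    """Return (id, score) pairs for the top_n stored vectors closest to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in stored.items()
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors, invented for illustration. Real embeddings typically have
# hundreds or thousands of dimensions.
stored_vectors = {
    "plugins-doc": [0.9, 0.1, 0.0],
    "embeddings-doc": [0.1, 0.9, 0.2],
    "schemas-doc": [0.0, 0.2, 0.9],
}

query = [0.2, 0.8, 0.1]
results = rank_by_similarity(query, stored_vectors)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a RAG workflow, the text behind the top-ranked IDs would then be added to the prompt sent to the language model.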
Recent milestones focus on multimodality and structured data workflows. LLM 0.17 let users send prompts with image, audio, and video inputs directly from the terminal, extending LLM to vision and audio applications. Versions 0.22 and 0.23 came with annotated release notes and introduced support for schemas, which let users extract structured information from unstructured content, alongside updates to key plugins such as llm-anthropic and llm-gemini. LLM 0.24 addressed long context windows through fragments and template plugins, making it easier to assemble inputs for models that can process extensive documents. Version 0.25 extended video processing: using new plugin capabilities, video files can be converted into JPEG frames for analysis by vision models.
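As an illustration of schema-driven extraction, the sketch below defines a small JSON schema and checks that a model's JSON reply conforms to it. This is an assumption-laden example, not LLM's own code: the schema, the `conforms` helper, and the hard-coded reply are hypothetical, and the checker handles only a small subset of JSON Schema (flat objects with `required` and basic `type` rules).

```python
import json

# Hypothetical schema for illustration: one person mentioned in a text.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Maps JSON Schema type names to Python types (only the subset used here).
_PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def conforms(data, schema):
    """Check a flat JSON object against a small subset of JSON Schema."""
    if not isinstance(data, dict):
        return False
    # Every required key must be present
    for key in schema.get("required", []):
        if key not in data:
            return False
    # Every present key with a declared type must match that type
    for key, rules in schema.get("properties", {}).items():
        if key not in data:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it for non-boolean fields
        if isinstance(value, bool) and rules["type"] != "boolean":
            return False
        if not isinstance(value, _PYTHON_TYPES[rules["type"]]):
            return False
    return True


# Invented example of the JSON text a schema-aware model might return.
model_output = '{"name": "Ada Lovelace", "age": 36}'
record = json.loads(model_output)
print(conforms(record, person_schema))
```

The point of supplying a schema up front is that downstream code can rely on the shape of the result, so a check like this becomes a safeguard rather than a parsing step.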
Together, these updates position LLM as a comprehensive open-source tool for developers, journalists, and data practitioners who want fine-grained control over language models, embeddings, multimodal inputs, context management, and schema-driven information extraction.