Researchers have introduced a framework that integrates large-language-model-based Artificial Intelligence agents with a robot operating system, aiming to make robot programming more flexible and accessible through natural language. The design targets a longstanding limitation in robotics: experts typically must break tasks into atomic actions and manually assemble behaviors. That approach remains effective in controlled settings, but it is less suited to dynamic environments such as homes or healthcare contexts, where capabilities may need to be updated quickly by non-experts.
The system divides responsibilities between experts and non-experts. Experts provide an initial library of pre-trained atomic actions such as picking and navigation, while non-experts interact through a chat interface without needing to write code. The framework centers on four connected parts: an atomic action library, an imitation learning module, an atomic action optimizer, and an Artificial Intelligence agent. Imitation learning allows users to expand the robot’s skill set by physically guiding the robot or demonstrating tasks, and the optimizer uses large language models plus Bayesian optimization to tune parameters in action code. The Artificial Intelligence agent then selects actions from user instructions and text-based environmental observations, supporting single-step execution, multi-step sequencing, custom code, and behavior trees for more complex logic.
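The dispatch idea described above can be sketched in a few lines of code. Everything here is a hypothetical illustration, not the framework's actual API: the class names, the example actions, and the keyword-matching "planner" (which stands in for the large language model) are all invented for this sketch.

```python
# Minimal sketch of an AI agent selecting atomic actions from an
# expert-provided library. All names and the keyword-matching planner
# (a stand-in for the LLM) are hypothetical illustrations.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class AtomicAction:
    name: str
    description: str
    run: Callable[..., str]

class ActionLibrary:
    """Expert-provided library of pre-trained atomic actions."""
    def __init__(self) -> None:
        self._actions: Dict[str, AtomicAction] = {}

    def register(self, action: AtomicAction) -> None:
        self._actions[action.name] = action

    def names(self) -> List[str]:
        return sorted(self._actions)

    def execute(self, name: str) -> str:
        if name not in self._actions:
            raise KeyError(f"unknown action: {name}")
        return self._actions[name].run()

def plan_from_instruction(instruction: str, library: ActionLibrary) -> List[str]:
    # Stand-in for the LLM planner: match library action names that
    # appear in the instruction, preserving their order of mention.
    lowered = instruction.lower()
    mentioned = [(lowered.index(n), n) for n in library.names() if n in lowered]
    return [n for _, n in sorted(mentioned)]

lib = ActionLibrary()
lib.register(AtomicAction("pick", "grasp an object", lambda: "picked"))
lib.register(AtomicAction("navigate", "move the base", lambda: "navigated"))
lib.register(AtomicAction("place", "set an object down", lambda: "placed"))

plan = plan_from_instruction("navigate to the table, then pick the cup and place it", lib)
results = [lib.execute(step) for step in plan]
```

In the real system the planner is a language model reasoning over user instructions and text-based observations, and the actions are pre-trained robot skills rather than stub functions; this sketch only shows the separation between an expert-curated library and an agent that sequences calls into it.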
Testing across several robots and environments showed the framework could handle both planning and adaptation. In a kitchen setup using a UR5 arm, the robot completed a 12-step coffee-making task from a single natural language prompt, demonstrating strong long-horizon planning without human intervention. Non-experts then added actions such as stirring and pouring through demonstration, enabling a later pasta-cooking task. In tabletop rearrangement experiments, performance dropped when relying only on the language model, but success rates remained consistently high when human corrections were added. The system also reused earlier feedback, applying prior corrections in later trials without being told again.
The framework also worked in remote and unstructured scenarios. An operator in Europe successfully controlled a robot in Asia using natural language, completing pick-and-place tasks despite a 2-3 second delay. In a laboratory setting, the system interpreted textbook-style instructions to conduct a pH test. Bayesian optimization improved air hockey performance from 30% to 52%, and a quadruped robot demonstrated real-time failure recovery in an office environment by resolving issues such as gripper obstructions.
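The optimizer's closed loop can be illustrated with a toy parameter-tuning sketch. To keep it self-contained, plain random search stands in for Bayesian optimization (which would instead fit a surrogate model and pick promising points), and the simulated success-rate objective and its parameter names are invented for illustration.

```python
# Toy sketch of closed-loop tuning of an atomic action's parameters.
# Random search stands in for Bayesian optimization; the simulated
# success-rate objective and parameter names are invented.
import random

def simulated_success_rate(strike_speed: float, strike_angle: float) -> float:
    # Hypothetical objective: success peaks at speed 1.2, angle 0.3.
    return max(0.0, 1.0 - (strike_speed - 1.2) ** 2 - (strike_angle - 0.3) ** 2)

def tune(n_trials: int, seed: int = 0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {"strike_speed": rng.uniform(0.0, 2.0),
                  "strike_angle": rng.uniform(-1.0, 1.0)}
        score = simulated_success_rate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

params, score = tune(n_trials=200)
```

A real Bayesian optimizer would need far fewer trials than random search, which matters when each evaluation is a physical robot rollout; the loop structure of propose, execute, score, and update is the same.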
Several reliability challenges remain. Performance was sensitive to prompt wording, and small phrasing changes could cause failures. The model could also be distracted by incidental examples or generate actions not present in the action library, though few-shot prompting reduced that behavior. Even with those limits, the framework showed that natural language control, imitation learning, and feedback-driven adjustment can make robotic systems more usable while still falling short of general-purpose autonomy.
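The hallucinated-action failure mode above suggests a simple guard: check every action in a proposed plan against the library before execution, and include few-shot examples in the prompt to steer the model toward known action names. The example tasks, action names, and prompt layout below are hypothetical.

```python
# Sketch of guarding against hallucinated actions: few-shot examples
# steer the model toward library actions, and proposed plans are
# validated before execution. All names here are hypothetical.
from typing import List

ACTION_LIBRARY = {"navigate", "pick", "place", "open_drawer", "stir", "pour"}

FEW_SHOT_EXAMPLES = [
    ("bring me the red cup", ["navigate", "pick", "navigate", "place"]),
    ("open the drawer", ["navigate", "open_drawer"]),
]

def build_prompt(instruction: str) -> str:
    # Prompt the model with the allowed actions and worked examples.
    lines = ["Only use these actions: " + ", ".join(sorted(ACTION_LIBRARY))]
    for task, plan in FEW_SHOT_EXAMPLES:
        lines.append(f"Task: {task}\nPlan: {' -> '.join(plan)}")
    lines.append(f"Task: {instruction}\nPlan:")
    return "\n".join(lines)

def validate_plan(proposed: List[str]) -> List[str]:
    # Reject any plan containing actions absent from the library.
    unknown = [a for a in proposed if a not in ACTION_LIBRARY]
    if unknown:
        raise ValueError(f"actions not in library: {unknown}")
    return proposed

validate_plan(["navigate", "pick", "pour"])   # passes
# validate_plan(["navigate", "saute"])        # would raise ValueError
```

Validation of this kind catches hallucinations after the fact; the few-shot examples reduce how often they occur in the first place, which matches the mitigation the study reports.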
