Y Combinator backs new wave of computer vision startups in 2026

Y Combinator’s 2026 computer vision cohort spans infrastructure, developer tools, and industry-specific applications from retail security to aquaculture and healthcare. Startups are increasingly pairing computer vision with large vision language models and foundation models to tackle real-time video, automation, and domain-specific analysis.

Y Combinator’s latest batch of computer vision startups reflects a shift toward vision systems that combine real-time video understanding, foundation models, and automation across both digital and physical environments. Overshoot targets developers building real-time vision applications: it connects live video feeds to what it describes as the largest collection of vision language models with 3 lines of code and returns responses in less than 200ms, which it claims is 10x faster than any existing inference platform. OnDeck positions itself as infrastructure for vision language models, enabling enterprises to find any object, behavior, or event in footage without training models or collecting data. It has also published a NeurIPS workshop paper showing new vision language model methods beating traditional computer vision even at niche tasks. Eventual and Encord focus on the data layer: Eventual’s Daft engine is built for petabytes of multimodal data, and Encord offers human-in-the-loop labeling, testing, and curation for AI teams shipping models to production.

Several startups are applying computer vision to operational problems in specific verticals. Lexius turns retail cameras into an AI security guard that detects shoplifting and organized crime and automates case generation. Kirana AI deploys an on-premise GPU to monitor grocery stores 24/7 for theft, safety, and customer service issues, and plans to automate ordering and pricing through point-of-sale integrations. OctaPulse uses vision to automate hatchery QA in fish farms, cutting inspection time from about 5 minutes to under 30 seconds per fish with more than 90 percent accuracy. It targets the $300B aquaculture industry, where farms can spend more than $200K a year on technicians and geneticists. In healthcare, Mecha Health builds x-ray foundation models that beat Microsoft, Google, and OpenAI on clinical accuracy metrics, helping radiologists go from reading 1 scan per hour to 1 scan every 5 minutes for what it frames as a 40B+ x-ray report generation market. MICSI offers AI software that doubles MRI resolution and halves scan time, and estimates an additional 2 million of revenue per MRI scanner.

Manufacturing and industrial automation emerge as another major theme. Allus AI develops vision foundation models that let factories see and improve production in real time, and Bucket Robotics converts CAD and sample data into production-ready models that adapt as parts and lines change, targeting what it calls a $700B automation push in American manufacturing. LineWise and Optifye.ai capture live video on the shop floor to either extract know-how into structured standard operating procedures or monitor worker performance in real time, while Cerrion uses standard CCTV cameras to detect bottlenecks, citing a Pepsi supplier producing 500 bottles per minute that now automatically reacts to fallen bottles.

Additional startups extend computer vision into public safety and infrastructure: EdgeTrace unifies camera networks for public safety teams, DeepNight builds next-generation night vision with AI, and Mach9 transforms geospatial imagery into insights for urban development.

Across this cohort, many teams bring experience from companies such as Uber, Meta, AssemblyAI, and self-driving car programs, and frequently combine computer vision with AI agents, generative models, and multimodal analytics to automate tasks that were previously manual, slow, or impossible at scale.
