Developers debate large language model coding in complex production codebases

Hacker News users shared detailed experiences using large language models inside messy, established codebases, from fully agent-driven workflows to strict bans on generated code. The discussion highlights productivity gains, testing strategies, and persistent limits around context, integration testing, and trust.

The original poster describes a startup that has deeply integrated large language models into everyday development across a monorepo that includes scheduled Python data workflows, two Next.js apps, Temporal workers and a Node worker. Each engineer receives Cursor Pro with Bugbot, Gemini Pro, OpenAI Pro, and optionally Claude Pro, and the poster estimates that large language models are worth about 1.5 excellent junior/mid-level engineers per engineer, which they argue easily justifies paying for multiple models. Heavy use of pre-commit hooks, type checkers, tests and auto-formatting lets models focus on producing types and tests, while coding standards and conventions are encoded in .cursor/rules and AGENT.md-style files to steer agents away from raw SQL and toward specific schema files.
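The thread does not include the startup’s actual hook code, but the “steer agents away from raw SQL” convention is exactly the kind of rule that can be enforced mechanically rather than by trust. Below is a minimal sketch of what such a pre-commit check might look like in Python; the schemas/ directory, the Python-only file filter, and the SQL regex are assumptions invented for this illustration, not details from the post.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit check: block raw SQL outside designated schema files.

Illustrative sketch only. The allowed directory ("schemas/"), the .py-only
filter, and the SQL heuristic are assumptions, not the startup's real hook.
"""
import re
import sys
from pathlib import Path

# Heuristic: string literals that start with a common SQL verb.
RAW_SQL = re.compile(r"""["']\s*(SELECT|INSERT|UPDATE|DELETE|CREATE\s+TABLE)\b""", re.IGNORECASE)

# Files under this directory are the sanctioned place for SQL.
ALLOWED_DIR = Path("schemas")


def main(filenames: list[str]) -> int:
    failed = False
    for name in filenames:
        path = Path(name)
        if path.suffix != ".py" or ALLOWED_DIR in path.parents:
            continue
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            if RAW_SQL.search(line):
                print(f"{path}:{lineno}: raw SQL found; use the schema files instead")
                failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    # pre-commit passes the staged filenames as arguments.
    sys.exit(main(sys.argv[1:]))
```

Wired in as a local pre-commit hook, a check like this applies the same rule to every diff, whether a human or an agent wrote it, which is the point of pushing conventions into tooling rather than prompts alone.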

The team leans on GitHub Enterprise primarily for its Copilot issue assignment feature: their rule is that whoever opens an issue must assign it to Copilot, which then opens a pull request. Roughly 25% of those Copilot pull requests are mergeable as-is, and about 50% get there after a few review comments. The poster says that overall, for roughly ?k/month, they are getting the equivalent of 1.5 additional junior/mid-level engineers per engineer, with these “large language model engineers” consistently writing tests, following standards, producing good commit messages, and working 24/7. They also report pain points: Copilot’s model choice cannot be controlled for issues or reviews, agents running in worktrees are fragile, and verifying changes often requires spinning up Temporal, two Next.js apps, several Python workers, a Node worker, and a browser, which makes integration testing slow and hard to automate.
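The verification pain point is essentially an orchestration problem: any end-to-end check needs the whole stack running at once. As a rough sketch of how such a check might be partially automated, here is a hypothetical Python smoke-test harness that starts services with Docker Compose, polls assumed health endpoints, and only then runs an integration suite; the ports, URLs, marker, and test path are invented for the example rather than taken from the thread.

```python
"""Hypothetical smoke-test harness: bring the stack up, wait for health, then test.

Sketch only. The compose file, service URLs, ports, and the pytest invocation
are assumptions; the thread does not describe the startup's actual setup.
"""
import subprocess
import time
import urllib.request

# Assumed health endpoints for the services the poster lists.
SERVICES = {
    "temporal-ui": "http://localhost:8233/",         # assumed port for the Temporal Web UI
    "web-app": "http://localhost:3000/api/health",   # first Next.js app
    "admin-app": "http://localhost:3001/api/health",  # second Next.js app
    "node-worker": "http://localhost:8080/healthz",
}


def wait_until_healthy(url: str, timeout: float = 120.0) -> None:
    """Poll a health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # not up yet; keep polling
        time.sleep(2)
    raise TimeoutError(f"{url} never became healthy")


def main() -> None:
    # Start everything defined in the (assumed) compose file in the background.
    subprocess.run(["docker", "compose", "up", "-d", "--build"], check=True)
    try:
        for name, url in SERVICES.items():
            wait_until_healthy(url)
            print(f"{name} is up")
        # Only now run the slow integration suite against the live stack.
        subprocess.run(["pytest", "tests/integration", "-m", "smoke"], check=True)
    finally:
        subprocess.run(["docker", "compose", "down"], check=False)


if __name__ == "__main__":
    main()
```

A harness like this does not make the suite fast, but it turns “spin everything up and click around in a browser” into a single command that an agent or a CI job can run unattended.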

Other commenters report a wide spectrum of experience and caution. Some developers find large language models highly effective for boilerplate, unit and integration test generation, one-off scripts, and refactoring in smaller or well-structured areas, treating tools such as Claude Code, Copilot, or Cursor as a junior pair programmer and insisting on small, incremental changes with plans and tests written first. Several teams describe elaborate guardrails: dockerized dev containers without production credentials, CONTRIBUTING.md or Claude.md files encoding rules, custom linting and test pipelines, feature or roadmap markdown files that act as persistent memory, and staged, stacked pull requests with multiple automated review agents. Others emphasize that context window limits, legacy code complexity, and long-range architectural concerns still defeat current models, arguing that they cannot replace a human mental model of a large, messy codebase and that they tend to duplicate code, miss subtle concurrency bugs, or fail on giant legacy files. At the far end, one open source maintainer says their project has banned all large language model generated code after repeated experiments produced plausible but fundamentally wrong suggestions, reflecting ongoing skepticism about relying on these tools in critical, long-lived systems.
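One of those guardrails, dev containers that simply cannot reach production, lends itself to a small illustration. The following is a hypothetical entrypoint check in Python that refuses to start if environment variable names look like production credentials; the naming patterns and the entrypoint wiring are assumptions, not any commenter’s actual setup.

```python
"""Hypothetical container guardrail: refuse to start if production-looking secrets are present.

Sketch of the "dev containers without production credentials" idea. The
variable-name heuristic and the entrypoint wiring are assumptions.
"""
import os
import re
import sys

# Environment variable names that suggest a production credential leaked in.
SUSPICIOUS = re.compile(r"(PROD|PRODUCTION|LIVE)_(.*_)?(KEY|TOKEN|SECRET|PASSWORD|DSN)$", re.IGNORECASE)


def main() -> int:
    leaked = [name for name in os.environ if SUSPICIOUS.search(name)]
    if leaked:
        print("Refusing to start dev container; production-looking credentials found:")
        for name in sorted(leaked):
            print(f"  - {name}")
        return 1
    return 0


if __name__ == "__main__":
    # Intended to run from the container entrypoint, before any agent gets a shell.
    sys.exit(main())
```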

Impact Score: 55

What businesses need to know about the EU Cyber Resilience Act

The EU Cyber Resilience Act is turning product cybersecurity into a legal requirement for companies that sell digital products into the European Union. A key compliance milestone arrives in September 2026, well before the full regulation takes effect in 2027.

Claude Mythos and cyber insurance’s next inflection point

Claude Mythos is being treated by governments and regulators as a potential systemic cyber risk with implications for financial stability and insurance markets. Its emergence is intensifying pressure on insurers to clarify whether Artificial Intelligence-enabled cyber losses are covered, excluded, or require new stand-alone products.

OpenAI expands ChatGPT ads with self-serve manager

OpenAI is widening its ChatGPT ads pilot with a beta self-serve Ads Manager, new bidding options and broader measurement tools. The push signals a deeper move into advertising as the company expands the program into several international markets.

OpenAI launches Artificial Intelligence deployment consulting unit

OpenAI has created a new consulting and deployment business aimed at helping enterprises build and roll out Artificial Intelligence systems. The move mirrors a similar push by Anthropic and signals a broader effort by model providers to capture more of the enterprise services market.

SK Group warns DRAM shortages could curb memory use

SK Group chairman Chey Tae-won warned that customers may reduce memory consumption through infrastructure and software optimization if DRAM suppliers fail to raise output. Demand from Artificial Intelligence data centers is keeping the market tight as memory makers weigh expansion against the long timelines for new fabs.
