What OpenAI Codex Missed in a Legacy .NET Codebase
AI code review tools are often marketed as near–senior-engineer replacements: point them at a repository and you can expect deep architectural insight. I wanted to see how true that is, so I tested OpenAI Codex’s web-based code review on a real-world legacy .NET C# application.
The result? Useful—but shallow in the ways that matter most.
The Setup
I pointed Codex directly at a GitHub-hosted legacy .NET solution using the web interface—no IDE plugins, no hand-holding. This is a non-trivial codebase that’s been evaluated by other AI tools before, making it a good benchmark.
The goal wasn’t to test syntax knowledge. It was to see whether Codex could reason about a system.
What Codex Gets Right
Out of the box, Codex quickly identified:
- Unused variables and redundant methods
- Overly static implementations
- Minor data access and code-structure issues
For code-level feedback, it’s fast and competent. This kind of review can absolutely save time during refactoring or cleanup.
Where It Falls Apart: Architecture
The problems started when I asked for a holistic review.
Even after I provided a detailed scorecard covering architecture, testing, and maintainability, Codex produced:
- A vague, overly positive executive summary
- Inflated scores for architecture and testing
- No serious discussion of systemic design flaws
The application looks layered, but in reality the domain is tightly coupled to Entity Framework and the database sits at the center of the system. This is a well-known architectural anti-pattern—and Codex largely missed it.
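To make the coupling concrete, here is a minimal, hypothetical sketch of the pattern (the `Order`/`OrderService` names are mine, not from the solution): the "domain" types are plain EF entities, and the business logic takes a direct dependency on the `DbContext`.

```csharp
// Hypothetical illustration of the pattern, not code from the reviewed solution.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// The "domain" entity is shaped entirely by persistence: a database identity,
// a status column, and an EF navigation property, with no behavior of its own.
public class Order
{
    public int OrderId { get; set; }
    public string Status { get; set; } = "New";
    public List<OrderLine> Lines { get; set; } = new();
}

public class OrderLine
{
    public int OrderLineId { get; set; }
    public decimal Amount { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();
}

// The "domain service" depends on the DbContext directly, so no business rule
// can be exercised without a database behind it.
public class OrderService
{
    private readonly AppDbContext _db;
    public OrderService(AppDbContext db) => _db = db;

    public async Task ApproveAsync(int orderId)
    {
        var order = await _db.Orders.Include(o => o.Lines)
                                    .SingleAsync(o => o.OrderId == orderId);
        order.Status = "Approved";   // rule and persistence interleaved
        await _db.SaveChangesAsync();
    }
}
```

Every piece of this compiles and "works", which is why a purely local review finds nothing to object to.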
Similarly, the solution contains only end-to-end tests, with no meaningful unit or domain testing. Yet the AI implied reasonable test discipline.
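For contrast, this is a sketch of the kind of in-memory domain test that was missing. It assumes a domain model with real behavior (an imaginary `Order.Approve` rule), which is exactly what a database-centric design makes hard to write:

```csharp
// Hypothetical sketch: the reviewed solution has no domain model like this.
// It shows the kind of in-memory, database-free test that was absent.
using Xunit;

// A domain type that owns its own rule, so the rule can be tested directly.
public class Order
{
    public string Status { get; private set; } = "New";
    public int LineCount { get; private set; }

    public void AddLine() => LineCount++;

    public bool Approve()
    {
        if (LineCount == 0) return false;   // the business rule lives here
        Status = "Approved";
        return true;
    }
}

public class OrderApprovalTests
{
    [Fact]
    public void Approving_an_empty_order_is_rejected()
    {
        var order = new Order();            // no DbContext, no test server

        Assert.False(order.Approve());
        Assert.Equal("New", order.Status);
    }

    [Fact]
    public void Approving_an_order_with_lines_succeeds()
    {
        var order = new Order();
        order.AddLine();

        Assert.True(order.Approve());
        Assert.Equal("Approved", order.Status);
    }
}
```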
The Real Limitation
Codex still reasons locally, not systemically.
It evaluates classes and methods well, but struggles to:
- Trace dependency flow across projects
- Identify architectural coupling
- Penalize designs that appear structured but are fundamentally flawed (see the sketch below)
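As an example of what "appears structured but is fundamentally flawed" can look like, here is a hypothetical, simplified layout, not the reviewed solution: two tidy "layers", but the dependency points from the business layer down to the data layer, so the domain never exists apart from the database.

```csharp
// Hypothetical layout, not the reviewed solution. Each class is unremarkable
// on its own, which is exactly what local, per-file review rewards.

// --- MyApp.Data: entities and persistence sit at the bottom of the stack ---
namespace MyApp.Data
{
    public class OrderEntity
    {
        public int Id { get; set; }
        public string Status { get; set; } = "New";
    }

    public class OrderRepository
    {
        // EF-backed in the real system; stubbed here to keep the sketch small.
        public OrderEntity Find(int id) => new OrderEntity { Id = id };
        public void Save(OrderEntity order) { /* SaveChanges() in the real thing */ }
    }
}

// --- MyApp.Business: the "domain" layer ---
namespace MyApp.Business
{
    using MyApp.Data;   // the business layer depends on the data layer, so every
                        // rule is expressed over persistence types

    public class OrderApproval
    {
        private readonly OrderRepository _repository = new OrderRepository();

        public void Approve(int id)
        {
            var order = _repository.Find(id);
            order.Status = "Approved";      // the "rule" is just a column update
            _repository.Save(order);
        }
    }
}
```

File by file there is nothing to flag; the problem only shows up when you follow the references between projects, which is exactly the step a local reviewer skips.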
This is especially dangerous in legacy systems, where bad design is often repeated consistently across the codebase—making it harder for AI to recognize as a problem.
Final Thoughts
AI code review isn’t useless—but it’s not a replacement for architectural judgment.
Use it for:
- Code-level feedback
- Cleanup and refactoring suggestions
Do not rely on it for:
- Architectural evaluation
- Assessing overall system health
- Numeric quality scores, which should not be taken at face value
AI is improving quickly, but for now, architecture still belongs to humans.