Code review used to be about catching bugs a tired developer missed at 4pm. Now it’s about catching bugs a tireless AI confidently introduced at any hour. The failure modes are completely different — and most review processes haven’t caught up.
AI-generated code presents a novel challenge for reviewers: it’s consistently formatted, syntactically correct, and confidently wrong in ways that human-written code rarely is. A junior developer’s mistakes are obvious — wrong variable names, missing null checks, confused logic. AI’s mistakes are subtle — correct-looking code that uses a deprecated API, implements a pattern that contradicts the project’s conventions, or handles nine out of ten edge cases while silently ignoring the tenth.
The review culture that worked for human-written code doesn’t work here. It needs to evolve.
## What AI-generated code gets wrong
Before we talk about how to review differently, let’s understand the specific failure modes:
### Hallucinated APIs and methods
AI will confidently call methods that don’t exist, use options that an API doesn’t support, or reference libraries at the wrong version. This is less common with modern AI tools that can read your package.json and source files, but it still happens — especially with less popular libraries or recently changed APIs.
```javascript
// AI generated this — looks reasonable
const result = await prisma.user.findUnique({
  where: { email },
  include: { preferences: { orderBy: { updatedAt: 'desc' } } }
});
// Problem: orderBy inside include isn't supported in this
// Prisma version. It compiles, passes type checking, and
// silently ignores the ordering.
```
### Convention drift
AI follows its training data’s conventions, not necessarily yours. You might use functional patterns throughout your codebase, but AI introduces a class because that’s how the training data did it. Each instance is small, but over time, the codebase loses coherence.
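A hypothetical illustration of the drift (the names and the "functions only" convention are invented for the example): the codebase formats prices with plain functions, and the AI introduces a class that does the same job.

```typescript
// Existing codebase convention (hypothetical): plain functions, explicit arguments.
export function formatPrice(cents: number, currency: string): string {
  return `${(cents / 100).toFixed(2)} ${currency}`;
}

// What AI often produces instead: a class wrapper the codebase never uses.
export class PriceFormatter {
  constructor(private currency: string) {}

  format(cents: number): string {
    return `${(cents / 100).toFixed(2)} ${this.currency}`;
  }
}

// Both are "correct" — but only one matches the project's conventions.
const viaFunction = formatPrice(1999, "USD");
const viaClass = new PriceFormatter("USD").format(1999);
```

Neither version is a bug, which is exactly why this slips through review: each instance looks fine in isolation.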
### Over-engineering
AI loves abstraction. Ask it to add a feature and you might get an interface, a factory, a strategy pattern, and a registry — when a simple function would do. This isn’t a bug in the AI; it’s a reflection of training data where “good code” is disproportionately represented by design-pattern-heavy examples.
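A sketch of the pattern, with invented names: asked to "apply a discount," the AI produces a strategy interface and a registry, when the requirement fits in one function. Both versions below compute the same result.

```typescript
// Over-engineered version AI might produce for "apply a discount":
interface DiscountStrategy {
  apply(total: number): number;
}

class PercentageDiscount implements DiscountStrategy {
  constructor(private pct: number) {}

  apply(total: number): number {
    return total * (1 - this.pct);
  }
}

class DiscountRegistry {
  private strategies = new Map<string, DiscountStrategy>();

  register(name: string, strategy: DiscountStrategy): void {
    this.strategies.set(name, strategy);
  }

  apply(name: string, total: number): number {
    const strategy = this.strategies.get(name);
    return strategy ? strategy.apply(total) : total;
  }
}

// What the requirement actually needed:
function applyDiscount(total: number, pct: number): number {
  return total * (1 - pct);
}
```

The registry version isn't wrong; it's just three layers of indirection that nobody asked for, and every layer is something the team now has to maintain.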
### Silent edge case omission
AI handles the cases it “thinks about” well. But it can miss edge cases that are specific to your system — unusual data shapes, race conditions in your specific infrastructure, or business rules that aren’t obvious from the code alone.
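A minimal sketch with a made-up `averageOrderValue` helper: the normal case works perfectly, and the empty-input case fails silently rather than loudly.

```typescript
// AI-generated helper: correct for the normal case.
function averageOrderValue(orders: { total: number }[]): number {
  const sum = orders.reduce((acc, order) => acc + order.total, 0);
  return sum / orders.length; // Missed edge case: empty array → NaN, not an error
}

// What a reviewer asking "what happens with empty input?" produces:
function averageOrderValueSafe(orders: { total: number }[]): number {
  if (orders.length === 0) return 0; // business decision: no orders → 0
  return orders.reduce((acc, order) => acc + order.total, 0) / orders.length;
}
```

The `NaN` doesn't crash anything; it propagates quietly into dashboards and reports, which is exactly the kind of failure that survives a superficial review.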
## The new review checklist
Here’s what reviewers should focus on when the code is AI-generated:
### 1. Intent alignment
The most important question: does this code do what we actually wanted? Not what the AI was asked to do — what the business requirement is.
AI can perfectly implement the wrong thing. The developer asks for user deactivation; the AI implements user deletion. The code is clean, tested, and completely wrong for the requirement. Start every review by confirming the PR actually addresses the ticket.
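The deactivation-versus-deletion gap, sketched against a hypothetical in-memory repository (the `users` map and both functions are invented for illustration):

```typescript
type User = { id: string; active: boolean };

// Hypothetical data store standing in for a real repository.
const users = new Map<string, User>([["u1", { id: "u1", active: true }]]);

// What the AI shipped: clean, testable — and it destroys the record.
function deleteUser(id: string): void {
  users.delete(id);
}

// What the ticket actually meant: keep the record, flip the flag.
function deactivateUser(id: string): void {
  const user = users.get(id);
  if (user) user.active = false;
}
```

Nothing in the code itself reveals the mismatch; only the reviewer comparing the PR to the requirement can catch it.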
### 2. Convention compliance
Does the code match your project’s patterns? Check:
- Are imports organized the way your project does it?
- Does error handling use your project’s error classes and patterns?
- Are naming conventions followed (your team’s conventions, not generic best practices)?
- Is the code where it should be in the project structure?
This is where a well-maintained CLAUDE.md or .cursorrules pays dividends — it reduces convention violations at the source. But review should still verify.
### 3. Dependency verification
Did the AI add new dependencies? Check package.json / requirements.txt changes carefully. AI sometimes adds packages for functionality that already exists in your project, or pulls in heavy dependencies for trivial tasks.
> Before approving: Does this new dependency duplicate
> something we already have? Is it actively maintained?
> Is the bundle size acceptable?
### 4. Edge case interrogation
Instead of looking for bugs line by line, ask yourself: what inputs or conditions would break this?
- What happens with empty input?
- What happens with null or undefined?
- What about concurrent access?
- What about extremely large inputs?
- What about the user who has been in the system since 2018 with legacy data?
AI tends to handle the “normal” cases well. Your job is to think about the abnormal ones.
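As a worked example, here is that interrogation applied to a hypothetical `slugify` helper a PR might introduce (the function and its behavior are invented for illustration):

```typescript
// Hypothetical PR helper: turns a title into a URL slug.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
}

// Normal case — fine:
slugify("Hello World"); // "hello-world"

// The abnormal cases a reviewer should probe:
slugify("");            // "" — is an empty slug acceptable downstream?
slugify("!!!");         // "" — punctuation-only input also collapses to empty
slugify("  Déjà Vu  "); // "d-j-vu" — non-ASCII letters are silently dropped
```

None of these edge cases throw; each one quietly produces output that may violate an assumption elsewhere in the system, which is why they have to be asked about explicitly.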
### 5. Test quality, not just test existence
AI is great at generating tests — and terrible at generating meaningful tests. Watch for:
- Tautological tests that test the implementation rather than the behavior
- Missing negative tests — AI tests the happy path thoroughly and skips failure scenarios
- Mocked-away reality — tests that mock so aggressively they don’t actually test anything
```javascript
// AI-generated test that tests nothing useful
it('should return the result', async () => {
  const mockService = { getUser: jest.fn().mockResolvedValue(mockUser) };
  const result = await mockService.getUser('123');
  expect(result).toBe(mockUser); // You just tested your mock
});
```
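A meaningful test, by contrast, exercises real logic. Sketched here without a test framework, with a hypothetical `getDisplayName` as the unit under test:

```typescript
// The unit under test: real logic, no mocks required.
function getDisplayName(user: {
  firstName: string;
  lastName: string;
  nickname?: string;
}): string {
  return user.nickname ?? `${user.firstName} ${user.lastName}`;
}

// Behavioral assertions: these would fail if the logic regressed,
// unlike the mocked test above, which can never fail.
console.assert(
  getDisplayName({ firstName: "Ada", lastName: "Lovelace" }) === "Ada Lovelace"
);
console.assert(
  getDisplayName({ firstName: "Ada", lastName: "Lovelace", nickname: "ada" }) === "ada"
);
```

The distinguishing question for any test: if someone broke the production code, would this test notice? A fully mocked test answers "no."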
## The reviewer’s evolving role
In a traditional code review, the reviewer is looking for mistakes. In an AI-assisted codebase, the reviewer is doing something more nuanced: they’re the quality architect.
Think of it this way:
| Traditional review | AI-era review |
|---|---|
| “You have a typo on line 42” | “This pattern contradicts our architecture decision from RFC-12” |
| “Missing null check” | “What happens when this runs against legacy data from before the 2024 migration?” |
| “Use const instead of let” | “This abstraction adds complexity without clear value — can we keep it simpler?” |
| “Add a test” | “This test mocks the core logic — can we write an integration test instead?” |
The review shifts from mechanical correctness (which AI handles well) to architectural judgment (which AI doesn’t).
## Practical process changes
### Add “AI provenance” to PRs
Knowing that code was AI-generated changes how you review it. Encourage developers to note which sections were AI-generated, which were human-written, and which were AI-generated then significantly modified. This helps reviewers allocate attention.
### Time-box reviews differently
AI-generated PRs are often larger because AI writes fast. But larger PRs need more review time, and reviewers face pressure to approve quickly. Set team expectations: review quality matters more than review speed, especially for AI-generated code.
### Use AI to review AI
There’s no shame in using AI tools to help review AI-generated code. The key is using them for different things: use AI to check for obvious issues (security vulnerabilities, performance problems, deprecated APIs), then use your human judgment for the architectural and intent-alignment questions that AI can’t answer well.
> Review this PR diff. Focus on: security issues,
> deprecated API usage, potential race conditions,
> and any patterns that don't match the conventions
> documented in CLAUDE.md.
The AI catches the mechanical issues. You catch the judgment issues. Together, you cover more ground than either alone.
## The culture shift
The hardest part isn’t changing the checklist — it’s changing the culture. Code review in an AI-assisted world requires reviewers to be more assertive, not less. It’s easy to see clean, well-formatted, passing-tests code and approve it. The discipline to slow down, question the architecture, and push back on unnecessary complexity is what separates good review cultures from rubber-stamp cultures.
Build that discipline into your team’s identity. The best code reviewers aren’t the ones who find the most bugs — they’re the ones who keep the codebase coherent, intentional, and maintainable. That role just got a lot more important.
## Evolve your review culture
Join the Coductor community for weekly discussions on code review practices, AI-era development workflows, and strategies from teams navigating this shift.