How to Review AI-Generated Code (Without Losing Your Mind)

A practical framework for reviewing code written by AI. What to check, what to skip, and how to catch the bugs that Claude and Cursor love to introduce.

By Keaton · 8 min read

You asked Claude to write a React component. It generated something that looks… fine? Tests pass. Looks clean. Ship it.

Three weeks later, it’s silently dropping user data in production.

This is the vibe coder’s dilemma: AI code is convincingly wrong. It compiles. It runs. It passes casual inspection. But there are usually 2–3 critical issues hiding in plain sight, and they’re exactly the kind of thing Claude writes with absolute confidence.

If you’re building with AI, you need a review framework that catches what AI misses without spending three hours per file. Here’s mine.

Why AI Code Review Is Different

Traditional code review is about style, architecture, and catching human carelessness. AI code review is different. You’re not looking for typos or simple logic slips (AI rarely makes those). You’re looking for:

  • Confident hallucinations: API methods that don’t exist, dependencies that aren’t installed, patterns that look right but are subtly broken
  • Happy-path thinking: AI tends to write for the sunny scenario, then forgets about null checks, error states, and edge cases
  • Spec drift: AI will “improve” your requirements without asking, adding features you didn’t want, making assumptions about your data model, or inventing patterns that don’t exist in your codebase
  • Security theater: Hardcoded API keys in comments, SQL string concatenation masquerading as parameterized queries, environment assumptions that will blow up in production

You’re not reviewing a human who had limited time or made a mistake. You’re reviewing a system that was trained on billions of lines of code, much of it mediocre or wrong, and is now confidently mixing best practices with legacy patterns and security vulnerabilities.

The 5-Point Review Framework

1. Does It Actually Work? (Run It First)

Read-only code review is a trap when AI is involved. Always run the code first.

  • Paste it into a test file or REPL
  • Actually execute it with real inputs (not just happy path)
  • Watch for import errors, missing dependencies, type mismatches
  • Check the output. Is it what you expected?

AI will generate code that looks syntactically perfect but imports from a package that doesn’t exist, or uses a method that was deprecated in v4, or assumes a runtime API that only exists in one environment.

Spend 2 minutes actually running it. It’ll save you hours.
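A minimal sketch of what that looks like, using a hypothetical AI-generated helper called `formatPrice`: paste it into a scratch file and feed it hostile inputs before reading a line of it.

```typescript
// Hypothetical AI-generated helper under review.
function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// Scratch-file smoke test: one real input, then the inputs AI forgot about.
console.log(formatPrice(1999)); // "$19.99" — happy path works
console.log(formatPrice(0));    // "$0.00"  — fine
console.log(formatPrice(-500)); // "$-5.00" — is that the sign placement you want?
console.log(formatPrice(NaN));  // "$NaN"   — a bug waiting to surface upstream
```

Two of those four lines raise questions the code never answers, and you found them without reading the implementation.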

2. Check the Edges

This is where AI fails hardest. It’s great at the main flow. It’s terrible at:

  • Null/undefined checks
  • Empty arrays and objects
  • Off-by-one errors
  • Race conditions in async code
  • Type coercion edge cases

If Claude generates a sorting function, it’ll work perfectly for most arrays but will probably break on mixed types or duplicate values. If it writes a date parser, it’ll handle happy-path inputs but miss leap-year logic, timezone edge cases, and RFC compliance.

Ask yourself: what would break this? What happens when the input is empty, null, undefined, or a type that’s technically valid but unexpected?

Then look at the code and see if it handles those cases. Usually it doesn’t.
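Here’s a sketch of the failure mode with a hypothetical `sortPrices` function: the happy path is clean, but runtime data that TypeScript can’t rule out slips straight through.

```typescript
// Hypothetical AI-generated sort: correct for clean number arrays.
function sortPrices(prices: number[]): number[] {
  return [...prices].sort((a, b) => a - b);
}

sortPrices([3, 1, 2]); // [1, 2, 3] — happy path
sortPrices([]);        // []        — empty input happens to be fine

// But API data often smuggles in values the type signature promised away.
// null coerces to 0 in `a - b`, so it is silently sorted as a zero price:
sortPrices([3, null, 1] as unknown as number[]); // [null, 1, 3] — quiet corruption

// Reviewed version: make the edge cases explicit instead of implicit.
function sortPricesSafe(prices: unknown[]): number[] {
  return prices
    .filter((p): p is number => typeof p === "number" && !Number.isNaN(p))
    .sort((a, b) => a - b);
}
```

Note the reviewed version makes a decision (drop non-numbers) rather than inheriting one by accident; whether dropping or throwing is right depends on your spec.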

3. Security Scan

This one keeps me up at night. AI will:

  • Hardcode secrets in config files, then generate comments explaining them
  • Write string concatenation queries and call them “parameterized”
  • Make assumptions about request origin, user authorization, or input validation
  • Generate regex patterns with ReDoS vulnerabilities
  • Create file paths by string concatenation without sanitization

Spend 30 seconds asking: could this code leak secrets, execute arbitrary code, or trust user input?

If it touches auth, database queries, file system operations, or external APIs, add another 30 seconds and actually trace the data flow. Where does user input go? Can it be manipulated? What stops it from being abused?
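As one example of tracing that flow, here’s a sketch of the file-path case (directory and function names are made up): the concatenated version trusts user input; the reviewed version resolves the path and verifies it stays inside the base directory.

```typescript
import * as path from "node:path";

const UPLOADS_DIR = "/srv/uploads"; // hypothetical base directory

// AI-style version: "../../etc/passwd" walks right out of the directory.
function unsafePath(filename: string): string {
  return UPLOADS_DIR + "/" + filename; // "/srv/uploads/../../etc/passwd"
}

// Reviewed version: resolve first, then check containment.
function safePath(filename: string): string {
  const resolved = path.resolve(UPLOADS_DIR, filename);
  if (!resolved.startsWith(UPLOADS_DIR + path.sep)) {
    throw new Error(`path traversal blocked: ${filename}`);
  }
  return resolved;
}
```

The point isn’t this specific guard; it’s the habit of asking “where did this string come from?” every time user input reaches the file system, a query, or a shell.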

4. Performance Red Flags

AI often writes code that works but is inefficient:

  • React components that re-render on every parent update (missing React.memo, useMemo, or useCallback)
  • N+1 query patterns (fetching user → fetching each user’s posts → fetching comments on each post)
  • Memory leaks from event listeners, timers, or subscriptions that never clean up
  • Unnecessary loops, map/filter chains that could be one pass, or string operations in hot paths
  • Creating new objects/arrays in render methods, breaking reference equality

Look for:

  • Fetching in loops (almost always wrong)
  • Event listeners added but never removed
  • Creating functions in render methods
  • Missing dependency arrays in useEffect
  • Requests made on every render
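The N+1 pattern is easiest to see with a call counter. A sketch using an in-memory stand-in for the database (all names hypothetical):

```typescript
let queryCount = 0;
const posts = [
  { id: 1, userId: 1 },
  { id: 2, userId: 1 },
  { id: 3, userId: 2 },
];

// N+1 style: one query per user — the pattern AI loves to generate in a loop.
function postsForUser(userId: number) {
  queryCount++;
  return posts.filter((p) => p.userId === userId);
}

// Batched style: one query for all users, grouped in memory.
function postsForUsers(userIds: number[]) {
  queryCount++;
  const wanted = new Set(userIds);
  const grouped = new Map<number, { id: number; userId: number }[]>();
  for (const p of posts) {
    if (!wanted.has(p.userId)) continue;
    grouped.set(p.userId, [...(grouped.get(p.userId) ?? []), p]);
  }
  return grouped;
}

[1, 2].forEach((id) => postsForUser(id)); // queryCount climbs with every user
queryCount = 0;
postsForUsers([1, 2]);                    // queryCount stays at 1
```

With a real database the counter is your query log; if it scales with the number of rows rather than the number of requests, you’ve found the loop.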

5. Does It Match the Spec?

Here’s the thing about AI: it gets bored with your spec and rewrites it.

You ask for “save the user’s preference to localStorage.” Claude decides to also add IndexedDB, a sync queue, offline detection, and a conflict resolution strategy. Most of it is good code. All of it is wrong.

Your spec was: store one value. Not: architect an offline-first persistence layer.

Read the original request. Does the generated code do exactly that, or did it take creative liberties?
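For contrast, here’s what spec-sized code looks like, sketched against a minimal storage interface so it runs outside the browser (in the browser you’d pass `window.localStorage`):

```typescript
// Minimal storage interface — a subset of the Web Storage API.
interface KVStore {
  setItem(key: string, value: string): void;
  getItem(key: string): string | null;
}

// The entire spec: store one value. No sync queue, no offline detection.
function savePreference(store: KVStore, key: string, value: string): void {
  store.setItem(`pref:${key}`, value);
}

function loadPreference(store: KVStore, key: string): string | null {
  return store.getItem(`pref:${key}`);
}
```

If the generated version is ten times this size, that’s not thoroughness; it’s scope the spec never asked for.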

Common AI Code Smells

These patterns should trigger a deeper look:

Over-engineering: Multiple layers of abstraction for a simple task. Factories, builders, decorators, and adapters when a function would work.

Mystery dependencies: The code brings in packages you didn’t ask for, installing complexity you don’t need.

Hallucinated APIs: Method names that sound right but don’t exist. Checking the actual library docs and finding nothing.

Copy-pasted patterns: Code that looks like it was pulled from a tutorial and grafted onto your codebase without adaptation.

Verbose error handling: Try-catch blocks that catch errors but don’t actually handle them, or catch-alls that swallow real problems.

Comment salad: More comments than code, often explaining what the code does rather than why it exists.

The 80/20 Rule

Here’s the hard truth: AI gets about 80% of the work right. Your job is catching the critical 20% — the issues that would break production, compromise security, or violate your spec.

Don’t try to perfect every line. Don’t argue about naming conventions or whether a loop should be a forEach or a map. Focus on:

  • Does it work?
  • Is it safe?
  • Does it match the spec?
  • Will it perform?

Everything else is negotiable.

Tools That Help

You can automate a lot of this.

TypeScript strict mode: Forces you to handle null/undefined. More configuration work, but fewer runtime surprises.
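A minimal sketch of the relevant compiler options (your real tsconfig will have more; `strict` is the flag that turns on `strictNullChecks` and friends):

```json
{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true
  }
}
```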

ESLint + Prettier: Catch obvious issues. Configure strict rules and let them fail the build if there are problems.

Snyk or similar: Scan dependencies for known vulnerabilities. AI will pull in packages with CVEs.

Test coverage: Write tests first, then review the code against them. If a test fails, the review catches it immediately.

Lighthouse/performance audits: For frontend code, run performance profiling. React DevTools Profiler is your friend.

The more you can automate, the more you can focus on the thinking parts: Does this make sense? Is it the right approach? Will it work at scale?

When to Rewrite vs. When to Patch

Not every issue requires a rewrite.

Patch if:

  • The core logic is sound but needs minor tweaks
  • It’s an edge case you can fix with a guard clause or try-catch
  • Performance can be improved by removing one loop or adding memoization
  • The approach is right but the implementation needs a line or two changed

Rewrite if:

  • The architecture is wrong (e.g., AI generated a client-side solution to a server-side problem)
  • Security is broken in a fundamental way
  • The approach doesn’t match your codebase’s patterns
  • You don’t understand what it’s doing (you should be able to explain it in one sentence)

A 50-line function that AI wrote and you don’t fully trust? Rewrite it as a 10-liner. It’s faster than debugging later.

The Bottom Line

Reviewing AI code is a skill. You’re not looking for perfection. You’re looking for: does it work, is it safe, does it match the spec, will it perform?

Run it. Check the edges. Scan for security issues. Benchmark performance. Verify the spec. Do that in order, spend maybe 5–10 minutes per file, and you’ll catch 95% of the problems before they hit production.

The goal isn’t to become an expert code reviewer. It’s to be fast enough that you can ship with confidence.


Keep Reading

  • Debugging AI-Generated Code — When the code passes review but fails in production, here’s how to track it down.
  • Production-Ready Vibe Coding — Moving from “works locally” to “runs in production” when AI writes half your codebase.
  • Vibe Coding Security Guide — Deep dive on AI-specific security issues and how to prevent them.
  • Prompts — Pre-written prompts for generating code that’s easier to review.
  • Tools — The linters, scanners, and profilers that make code review faster.
