Refactoring with AI: From Spaghetti to Clean in Minutes

Every codebase has that file. The one developers open, scroll through, mutter something unprintable, and close again. The 800-line controller that handles twelve different concerns. The utility module where functions go to die. The “temporary” workaround from 2019 that’s now load-bearing infrastructure.

You know it needs refactoring. You’ve known for months. But the task is so large and so risky that it never survives sprint planning. There’s always something more urgent.

AI changes this calculation. Refactoring that would take a developer two days of careful, tedious work can often happen in an afternoon. It can also end up with better test coverage than the manual version would have had, because generating the safety-net tests first is cheap too.

Why AI is uniquely good at refactoring

Refactoring is an unusually good fit for AI, for three reasons:

The intent is clear. You’re not designing something new. You’re restructuring existing code to be cleaner while preserving identical behavior. This clarity of purpose makes it easy to describe what you want.
The verification is straightforward. If the tests pass after refactoring, the behavior they cover is preserved. If they don’t, you know exactly what broke. There’s no ambiguity about success criteria, provided the tests exercise the code paths you’re changing.
The work is tedious but mechanical. Extracting a class, renaming variables across files, splitting a module, converting callbacks to async/await: these tasks require attention to detail but not creative problem-solving. AI doesn’t get bored on the fortieth call site, though you still review every change it makes.

Real refactoring patterns with AI

Pattern 1: The god function decomposition

Before: A 200-line function that handles validation, business logic, database operations, and response formatting.

def process_order(request):
    # 40 lines of input validation
    # 30 lines of inventory checking
    # 50 lines of payment processing
    # 30 lines of order record creation
    # 25 lines of notification sending
    # 25 lines of response building
    ...

The prompt:

This process_order function does too much. Break it into 
separate functions with single responsibilities:
1. validate_order_input(request) -> OrderInput
2. check_inventory(items) -> InventoryResult
3. process_payment(order_input, inventory) -> PaymentResult
4. create_order_record(order_input, payment) -> Order
5. send_notifications(order) -> None
6. process_order(request) should orchestrate these

Preserve all existing behavior. Keep error handling for 
each stage. Use our existing types from src/types/.

After: Six focused functions, each testable in isolation, with the orchestrator function reading like a table of contents.

This is a refactoring that a human developer would approach identically – but the mechanical work of extracting code, adjusting variable scopes, threading parameters through, and updating error handling would take an hour or more of careful editing. AI does it in seconds, and you spend your time reviewing instead of typing.
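Here is a minimal sketch of the decomposed result. It is written in TypeScript to match the later examples rather than the Python above. Everything in it is hypothetical: the types stand in for whatever lives in src/types/, and the stage bodies are stubs with made-up rules, not real order logic.

```typescript
// Hypothetical types standing in for the project's real ones (e.g. src/types/).
interface LineItem { sku: string; qty: number }
interface OrderRequest { items: LineItem[]; cardToken: string }
interface OrderInput { items: LineItem[]; cardToken: string }
interface InventoryResult { allInStock: boolean }
interface PaymentResult { paymentId: string }
interface Order { id: string; paymentId: string; items: LineItem[] }

// Stage 1: validation only.
function validateOrderInput(request: OrderRequest): OrderInput {
  if (request.items.length === 0) throw new Error("Order has no items");
  if (request.items.some((item) => item.qty <= 0)) {
    throw new Error("Quantities must be positive");
  }
  return { items: request.items, cardToken: request.cardToken };
}

// Stage 2: inventory only. Stub rule: at most 10 units per item are in stock.
function checkInventory(items: LineItem[]): InventoryResult {
  return { allInStock: items.every((item) => item.qty <= 10) };
}

// Stage 3: payment only. Stub: a real version would call the payment provider.
function processPayment(orderInput: OrderInput, inventory: InventoryResult): PaymentResult {
  if (!inventory.allInStock) throw new Error("Insufficient inventory");
  return { paymentId: `pay_${orderInput.cardToken}` };
}

// Stage 4: persistence only. Stub: a real version would insert a row.
function createOrderRecord(orderInput: OrderInput, payment: PaymentResult): Order {
  return { id: `order_${payment.paymentId}`, paymentId: payment.paymentId, items: orderInput.items };
}

// Stage 5: notifications only. Stub: records order IDs instead of sending messages.
const sentNotifications: string[] = [];
function sendNotifications(order: Order): void {
  sentNotifications.push(order.id);
}

// The orchestrator reads like a table of contents; each stage's errors propagate unchanged.
function processOrder(request: OrderRequest): Order {
  const orderInput = validateOrderInput(request);
  const inventory = checkInventory(orderInput.items);
  const payment = processPayment(orderInput, inventory);
  const order = createOrderRecord(orderInput, payment);
  sendNotifications(order);
  return order;
}
```

The prompt leaves one thing open. The original function's 25 lines of response building have no dedicated stage in its list, so in this sketch the orchestrator simply returns the Order. Decide explicitly where that formatting should live, in the orchestrator or in its own step, rather than letting it disappear in the shuffle.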

Pattern 2: The legacy pattern upgrade

Codebases accumulate outdated patterns like geological strata. Callbacks that should be promises. Promises that should be async/await. Class components that should be hooks. jQuery that should be vanilla JS.

Before:

function loadUserData(userId, callback) {
  db.query('SELECT * FROM users WHERE id = ?', [userId], 
    function(err, rows) {
      if (err) {
        callback(err, null);
        return;
      }
      var user = rows[0];
      db.query('SELECT * FROM orders WHERE user_id = ?', [userId],
        function(err, orders) {
          if (err) {
            callback(err, null);
            return;
          }
          user.orders = orders;
          callback(null, user);
        }
      );
    }
  );
}

The prompt:

Convert this callback-based function to async/await. 
Use our db.queryAsync() wrapper that returns promises. 
Add proper TypeScript types. Handle errors with try/catch 
and throw typed errors from our errors module. Keep the 
same return shape.

After:

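// db, User, Order, UserWithOrders, UserNotFoundError and DatabaseError
// come from the project's own modules (imports omitted here).
async function loadUserData(userId: string): Promise<UserWithOrders> {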
  try {
    const [user] = await db.queryAsync<User>(
      'SELECT * FROM users WHERE id = ?', 
      [userId]
    );
    
    if (!user) {
      throw new UserNotFoundError(userId);
    }

    const orders = await db.queryAsync<Order>(
      'SELECT * FROM orders WHERE user_id = ?', 
      [userId]
    );

    return { ...user, orders };
  } catch (error) {
    if (error instanceof UserNotFoundError) throw error;
    throw new DatabaseError('Failed to load user data', { cause: error });
  }
}

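Cleaner, typed, properly error-handled, and actually readable. The transformation is mechanical enough for AI to nail, but tedious enough that developers procrastinate on it.

One change is worth catching in review. When no user row came back, the original would throw a TypeError on user.orders, while the new version throws UserNotFoundError. That is an improvement, but it is still a behavior change, so make sure callers and tests expect it.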
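The prompt assumes two things already exist: a promise-returning db.queryAsync() and typed errors. If your project has neither, the sketch below shows one way to build them and runs the converted function end to end.

Everything here is hypothetical:
- CallbackDb mimics the callback-style db.query() above.
- queryAsync() is a free function rather than a method on db.
- makeFakeDb() is an in-memory stand-in for a real driver.
- loadUserData() takes the db as a parameter so the example is self-contained.

```typescript
// Hypothetical callback-style driver API, shaped like the db.query() calls above.
type QueryCallback = (err: Error | null, rows: unknown[]) => void;
interface CallbackDb {
  query(sql: string, params: unknown[], cb: QueryCallback): void;
}

// Hypothetical typed errors, standing in for "our errors module".
class UserNotFoundError extends Error {
  readonly userId: string;
  constructor(userId: string) {
    super(`User ${userId} not found`);
    this.name = "UserNotFoundError";
    this.userId = userId;
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

class DatabaseError extends Error {
  readonly originalError: unknown;
  constructor(message: string, options: { cause?: unknown } = {}) {
    super(message);
    this.name = "DatabaseError";
    this.originalError = options.cause; // stored explicitly; no reliance on ES2022 Error options
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Promise wrapper: the only code that still knows about callbacks.
function queryAsync<T>(db: CallbackDb, sql: string, params: unknown[]): Promise<T[]> {
  return new Promise<T[]>((resolve, reject) => {
    db.query(sql, params, (err, rows) => (err ? reject(err) : resolve(rows as T[])));
  });
}

interface User { id: string; name: string }
interface Order { id: string; user_id: string }
type UserWithOrders = User & { orders: Order[] };

// In-memory fake for illustration: routes on the table name in the SQL text.
function makeFakeDb(users: User[], orders: Order[], failWith?: Error): CallbackDb {
  return {
    query(sql, params, cb) {
      if (failWith) {
        cb(failWith, []);
        return;
      }
      const id = params[0];
      if (sql.includes("FROM users")) cb(null, users.filter((u) => u.id === id));
      else cb(null, orders.filter((o) => o.user_id === id));
    },
  };
}

async function loadUserData(db: CallbackDb, userId: string): Promise<UserWithOrders> {
  try {
    const [user] = await queryAsync<User>(db, "SELECT * FROM users WHERE id = ?", [userId]);
    if (!user) throw new UserNotFoundError(userId);
    const orders = await queryAsync<Order>(db, "SELECT * FROM orders WHERE user_id = ?", [userId]);
    return { ...user, orders };
  } catch (error) {
    if (error instanceof UserNotFoundError) throw error;
    throw new DatabaseError("Failed to load user data", { cause: error });
  }
}
```

Many drivers already ship a promise API. Node's util.promisify() can also wrap functions that follow the (err, result) callback convention, so check for either before writing your own wrapper.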

Pattern 3: The module split

A single file has grown to contain multiple loosely-related concerns. This is the refactoring developers dread most because it involves creating new files, moving code, updating imports across the project, and making sure nothing breaks.

The prompt:

src/utils/helpers.ts is 600 lines with unrelated functions 
grouped loosely by comments. Split it into:
- src/utils/string.ts (string manipulation functions)
- src/utils/date.ts (date formatting and calculation)
- src/utils/validation.ts (input validators)
- src/utils/array.ts (array helpers)

Keep src/utils/index.ts as a barrel export so existing 
imports from '@/utils' still work. Update any direct 
imports of helpers.ts across the project.

This is where Claude Code truly shines. It can read the source file, identify which functions belong to which category, and create the new files. It can then write the barrel export, search the project for imports to update, and modify them, all in one pass. A task that takes about 30 minutes by hand can finish in under a minute, leaving you to review the diff and run the type checker.
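Because the barrel keeps '@/utils' working, the only imports that must change are direct references to the old file. A hypothetical helper for rewriting them, or for checking what the AI changed, might look like this. It assumes imports use the '@/utils/helpers' alias, with or without a .ts extension. It deliberately leaves relative paths alone and reports them instead, since those need per-file resolution.

```typescript
// Matches import/export specifiers pointing at the old module through the
// '@/utils/helpers' alias (an assumption), optionally with a '.ts' extension.
// Covers `from '...'` and dynamic `import('...')`; the quote style must match.
const HELPERS_SPECIFIER = /(from\s+|import\s*\(\s*)(['"])@\/utils\/helpers(?:\.ts)?\2/g;

// Rewrites matching specifiers to the '@/utils' barrel, keeping the original quotes.
function rewriteHelpersImports(source: string): string {
  return source.replace(
    HELPERS_SPECIFIER,
    (_match: string, prefix: string, quote: string) => `${prefix}${quote}@/utils${quote}`,
  );
}

// Returns 1-based line numbers that still mention the old module,
// e.g. relative imports the regex intentionally skips.
function findRemainingHelperRefs(source: string): number[] {
  return source
    .split("\n")
    .map((line, i) => (line.includes("utils/helpers") ? i + 1 : -1))
    .filter((n) => n > 0);
}
```

Treat this as a review aid rather than the source of truth. After the split, a clean tsc --noEmit run is the real confirmation that every import still resolves.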

Pattern 4: Adding types to untyped code

Migrating a JavaScript project to TypeScript, or adding types to a loosely-typed codebase, is one of the highest-ROI refactoring tasks you can do – and one of the most tedious.

The prompt:

Add TypeScript types to this module. Infer types from 
usage where possible. For function parameters that could 
be multiple types, use the most restrictive type that 
doesn't break existing callers. Flag any places where 
the inferred type would be `any` so I can decide manually.

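AI handles most of the typing work correctly. The cases it flags for your attention are the genuinely ambiguous ones that require human judgment, which is exactly the distribution of work you want. Skim the unflagged types too, though, because a type that is too broad still compiles cleanly.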
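Here is what "the most restrictive type that doesn't break existing callers" means in practice. The example uses a hypothetical toCents() helper that existing code calls with both numbers and numeric strings. The union string | number accepts every current call. any would also compile, but it would switch off checking entirely.

```typescript
// Before (JavaScript); callers pass numbers and numeric strings:
//   function toCents(amount) {
//     return typeof amount === "string"
//       ? Math.round(parseFloat(amount) * 100)
//       : Math.round(amount * 100);
//   }

// After: string | number is the narrowest type that accepts every existing
// call. Behavior is unchanged; only the signature is new.
function toCents(amount: string | number): number {
  // The typeof check narrows amount, so each branch type-checks on its own.
  return typeof amount === "string"
    ? Math.round(parseFloat(amount) * 100)
    : Math.round(amount * 100);
}

// New mistakes now fail at compile time instead of at runtime, e.g.
// toCents({ value: 5 }) is rejected because an object is neither string nor number.
```

If a parameter genuinely accepts anything, unknown plus an explicit runtime check is usually a safer choice than any. That is the kind of decision the flagged cases leave to you.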

The refactoring safety net

Never refactor without tests. This isn’t new advice, but AI makes it easier to follow.

Before refactoring:

Write comprehensive tests for the current behavior of 
process_order(). Test every code path including error 
handling. These tests should pass with the current 
implementation -- they'll serve as a safety net during 
refactoring.

After refactoring:

Run the existing tests against the refactored code. 
If any fail, fix the refactored code to preserve the 
original behavior -- don't change the tests.

This sequence – generate tests for current behavior, then refactor, then verify tests still pass – is the gold standard for safe refactoring. AI makes each step fast enough that you can actually do it instead of just intending to.
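The whole sequence fits in a few lines when the function is small. In this hypothetical example, the cases table, including an error case, is recorded from shippingCostLegacy() before any changes. The same table is then run unchanged against the refactored shippingCost(). The function, its pricing rules, and the tiny runCases() harness are illustrative; in a real project the cases would live in your test framework.

```typescript
// Legacy implementation: the behavior to preserve (hypothetical pricing rules).
function shippingCostLegacy(weightKg: number, express: boolean): number {
  if (weightKg <= 0) throw new Error("weight must be positive");
  var cost = 5;
  if (weightKg > 1) { cost = cost + (Math.ceil(weightKg) - 1) * 2; }
  if (express == true) { cost = cost * 2; }
  return cost;
}

// Refactored implementation: same behavior, clearer structure.
const BASE_COST = 5;
const PER_EXTRA_KG = 2;
function shippingCost(weightKg: number, express: boolean): number {
  if (weightKg <= 0) throw new Error("weight must be positive");
  const extraKg = Math.max(0, Math.ceil(weightKg) - 1);
  const standard = BASE_COST + extraKg * PER_EXTRA_KG;
  return express ? standard * 2 : standard;
}

// Characterization cases recorded from the legacy version before refactoring.
type Case = { weightKg: number; express: boolean; expected: number | "throws" };
const cases: Case[] = [
  { weightKg: 0.5, express: false, expected: 5 },
  { weightKg: 1, express: false, expected: 5 },
  { weightKg: 2.3, express: false, expected: 9 },
  { weightKg: 2.3, express: true, expected: 18 },
  { weightKg: 0, express: false, expected: "throws" },
];

// Returns a description of each failing case; an empty array means behavior is preserved.
function runCases(impl: (weightKg: number, express: boolean) => number): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    let actual: number | "throws";
    try {
      actual = impl(c.weightKg, c.express);
    } catch {
      actual = "throws";
    }
    if (actual !== c.expected) {
      failures.push(`weight=${c.weightKg} express=${c.express}: expected ${c.expected}, got ${actual}`);
    }
  }
  return failures;
}
```

If runCases(shippingCost) ever reports failures, the fix goes in shippingCost(), never in the cases table.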

When not to let AI refactor

AI refactoring has limits. Be cautious with:

Performance-critical code. AI tends to optimize for readability, which sometimes means adding function call overhead, creating intermediate arrays, or using higher-level abstractions that are slower. If the code sits on a hot path, profile before and after.

Code with implicit behavior. If the code has side effects that aren’t obvious from reading it – global state mutations, event emissions, cache warming – AI might “clean up” the side effects and break downstream consumers it doesn’t know about.

Code you don’t have tests for and can’t easily test. Refactoring without tests is risky whether a human or AI does it. If the code is untestable in its current form, the first refactoring step should be making it testable, not making it clean.
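The performance caveat is easiest to see with a hypothetical hot-path function, before and after a typical readability-driven rewrite. Both versions return the same result, but the chained one allocates two intermediate arrays per call. Whether that matters depends on your runtime and data sizes, which is exactly why you measure instead of guessing. The timeIt() helper is a crude illustration, not a substitute for a real profiler.

```typescript
// Original hot-path code: one pass, no intermediate allocations.
function sumOfSquaredEvensLoop(values: number[]): number {
  let total = 0;
  for (const v of values) {
    if (v % 2 === 0) total += v * v;
  }
  return total;
}

// Typical "readable" rewrite: clearer intent, but filter() and map()
// each allocate a new array before reduce() runs.
function sumOfSquaredEvensChained(values: number[]): number {
  return values
    .filter((v) => v % 2 === 0)
    .map((v) => v * v)
    .reduce((acc, v) => acc + v, 0);
}

// Crude wall-clock timing: averages `runs` calls and logs the result.
// Returns the last result so callers can confirm both versions agree.
function timeIt(label: string, fn: () => number, runs = 20): number {
  const start = Date.now();
  let result = 0;
  for (let r = 0; r < runs; r++) result = fn();
  const avgMs = (Date.now() - start) / runs;
  console.log(`${label}: ~${avgMs.toFixed(2)} ms per run`);
  return result;
}
```

Run both through timeIt() with production-sized input, for example an array of a million numbers. Keep the rewrite only if the difference is acceptable for that code path.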

Start with the easy wins

You don’t need to tackle the scariest file first. Start with refactoring tasks that are:

Low risk: utility functions, formatters, validators
High visibility: code that multiple developers touch frequently
Well tested: existing test coverage gives you a safety net

One successful AI-assisted refactoring builds team confidence for the next one. Within a few weeks, that 800-line controller doesn’t look so intimidating anymore.

The technical debt in your codebase isn’t there because your team doesn’t know how to write clean code. It’s there because the cost of cleaning it up has always exceeded the immediate benefit. AI tips that equation. Refactoring that was too expensive to justify last quarter can now be an afternoon’s work. The only question is which mess you clean up first.