32blog by Studio Mitsu

Claude Code Test Generation: A Practical Guide to AI-Assisted Testing

Learn how to enforce AI-generated test quality with CLAUDE.md rules, Hooks automation, TDD subagent workflows, and Stryker mutation testing in Vitest projects.

by omitsu · 13 min read

To get useful tests from Claude Code, you need three layers: CLAUDE.md rules that define quality standards, Hooks that automate test execution after every edit, and mutation testing (Stryker) to verify the generated tests actually catch bugs — not just inflate coverage numbers.

Ask Claude Code to "write tests" and you'll get 20+ tests in minutes. The problem is that half of them might be meaningless. Run Stryker against Claude-generated tests and you may see mutation scores in the low 60s. Almost 40% of mutations surviving means the tests look good on paper but can't detect real code changes.

AI-generated tests are great at inflating coverage numbers but struggle to write tests that actually catch bugs. Duplicate tests covering the same code paths, happy-path-only assertions that skip edge cases, hallucinated API calls to methods that don't exist — these are the typical problems when you let Claude Code handle test generation unsupervised.

This article covers the system I built to fix this: CLAUDE.md rules for quality enforcement, Hooks for automated test execution, TDD workflows with subagents, and mutation testing to quantitatively verify test quality.

*Diagram: CLAUDE.md (test rule definitions, constrain) → Hooks (automated test runs, automate) → TDD enforcement (test-first workflow) → Quality verification (mutation testing, verify)*

Why Let Claude Code Write Your Tests

Testing is the most "tedious but necessary" part of development. Claude Code can dramatically accelerate this work.

Why Claude Code is strong at test generation:

  • Reads the entire codebase — it understands not just function signatures, but call sites and dependencies, writing tests with full context
  • Runs the test→fail→fix loop autonomously — it fixes failing tests and iterates until everything passes. This is fundamentally different from GitHub Copilot's inline completions
  • Learns from existing test patterns — it matches the style and conventions of tests already in your project

OpenObserve scaled from 380 to 700+ tests using Claude Code's automated testing, reducing flaky tests by 85%. Feature analysis time dropped from 45–60 minutes to 5–10 minutes.

But "write tests" as a prompt isn't enough. You need systems that enforce quality.


The Problems with AI-Generated Tests

When you let Claude Code generate tests without guardrails, these problems appear consistently.

1. Dead weight tests

Multiple tests cover the same code path. Coverage goes up but bug detection doesn't improve. Ask Claude to test a URL slug generator, and you may find tests that cover the exact same code path with trivially different inputs mixed in.

2. Happy path bias

Claude generates tests for normal inputs and skips error paths. HTTP 500 errors, empty arrays, null inputs, and boundary conditions get ignored. In one run I reviewed, every generated test passed, yet the first malformed input in production would have caused a crash.

3. Hallucinations

Tests assume a function returns a Promise when it doesn't, call methods that don't exist, or use incorrect import paths. Some generated tests fail to compile entirely.

4. Implementation-first by default

Without explicit instructions, Claude Code writes implementation code first and tests second. Even when asked for TDD, it drifts back to implementation-first as context fills up.

5. Framework confusion

Uses Jest APIs in a Vitest project, creates unnecessary mock files, or modifies test configuration files without permission. I once found jest.mock() calls in a project that had never installed Jest.

Trying to solve these with "careful prompting" has limits. Systems that enforce rules are the answer.


Define Test Rules in CLAUDE.md

Writing test rules in CLAUDE.md ensures consistent test quality from the start of every session. If you haven't set up CLAUDE.md yet, check out the CLAUDE.md design patterns guide first.

```markdown
# Testing Rules

## Framework
- Vitest 4.1 + React Testing Library + MSW for API mocking
- Test files: `__tests__/{module}.test.ts` (colocated with source)
- Config: vitest.config.ts (already configured, do not modify)

## Test Quality Rules
- NEVER write tests that only cover the happy path. Every test file must include edge cases
- ALWAYS test error paths: null input, empty arrays, network failures, validation errors
- NEVER mock what you don't own — use MSW for HTTP, real implementations for utilities
- ONE assertion focus per test. "should handle X" not "should handle X and Y and Z"
- ALWAYS include a descriptive test name that explains the expected behavior

## Test Structure
- Arrange-Act-Assert pattern for every test
- Use test.each() for parameterized tests instead of duplicating similar tests
- Group related tests with describe() blocks matching the function/component name

## Forbidden
- Do NOT modify vitest.config.ts or test setup files without asking
- Do NOT add snapshot tests (they pass trivially and catch nothing useful)
- Do NOT use jest.* APIs — this project uses Vitest (vi.*)
- Do NOT write tests after implementation. Write tests FIRST (TDD)
```

Apply rules per file pattern with .claude/rules/

You can load additional rules only when working with test files using path-scoped rules.

```markdown
<!-- .claude/rules/testing.md -->
---
paths:
  - "**/*.test.ts"
  - "**/*.test.tsx"
  - "**/__tests__/**"
---

# Test File Rules
- Import from vitest: describe, it, expect, vi, beforeEach, afterEach
- Import from @testing-library/react: render, screen, fireEvent, waitFor
- Import from msw: http, HttpResponse for API mocks
- Always clean up: afterEach(() => cleanup())
- Prefer userEvent over fireEvent for user interactions
```

This rule is automatically loaded into context only when Claude accesses matching test files. It doesn't consume tokens during normal development work.

Note: early versions of Claude Code had YAML parsing issues with the paths: frontmatter (#17204, #13905). These have been largely resolved — use the YAML list format shown above. If you also use Cursor, note that Cursor uses globs: instead of paths: in its rule files.


Automate Test Runs with Hooks

CLAUDE.md alone can't prevent rule violations. Hooks add mechanical enforcement by automating test execution.

Auto-run tests after file edits

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "npx vitest run --reporter=verbose 2>&1 | tail -20"
          }
        ]
      }
    ]
  }
}
```

Every file edit triggers Vitest automatically. Claude sees failures immediately and self-corrects.
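If running the whole suite on every edit is too slow, a narrower variant is possible. This sketch assumes your Claude Code version passes the tool's JSON payload to the hook on stdin (with the edited path at `.tool_input.file_path`) and that `jq` is installed; it uses Vitest's `related` command to run only the tests that import the edited file:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs npx vitest related --run 2>&1 | tail -20"
          }
        ]
      }
    ]
  }
}
```

On a large suite this cuts the per-edit feedback loop from the full run down to the handful of affected test files.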

Enforce all tests pass before session stops

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "agent",
            "prompt": "Run the full test suite with `npx vitest run`. If any test fails, fix it before stopping. Do not stop until all tests pass.",
            "timeout": 120
          }
        ]
      }
    ]
  }
}
```

The Stop hook with agent type runs a subagent before Claude stops. If tests fail, they get fixed automatically before the session ends.

Combined effect

| Hook | When | Effect |
| --- | --- | --- |
| PostToolUse + `Write\|Edit` | Every file edit | Claude sees failures immediately and self-corrects |
| Stop + agent | Before session ends | Guarantees all tests pass |

These two hooks together nearly eliminate the scenario of "untested code getting committed." Once this combo is in place, you'll start catching regressions that would otherwise slip through.


In Practice: Adding Tests to Existing Code

A practical workflow for adding tests to an existing Next.js project.

Step 1: Identify test targets

```text
Find functions in this project that have low test coverage.
Prioritize business logic (lib/, utils/, actions/).
```

Claude Code scans the codebase and lists modules without existing tests.

Step 2: Generate tests with priority

```text
Write tests for lib/auth/validate-session.ts.
Include edge cases: expired token, malformed format, null input.
```

Request tests one module at a time rather than all files at once. This keeps context focused and produces higher-quality tests.

Step 3: Review generated tests

Points to check in Claude Code's generated tests:

  • Each test is independent — no shared state between tests
  • Edge cases are covered — not just happy paths but error paths too
  • Assertions are specific — `toBe(true)` or `toEqual(expected)` instead of `toBeTruthy()`
  • Mocks are minimal — nothing mocked beyond what's necessary

Then ask Claude to prune the dead weight:

```text
Check if the generated tests have dead weight (duplicate tests).
Remove any that cover the same code paths.
```

TDD Workflow: Enforcing Test-First

Practicing TDD (Test-Driven Development) with Claude Code requires explicit instructions and subagent usage.

Basic TDD prompt

```text
Use TDD.

1. Write the test first (RED) — the test should fail
2. Write the minimum implementation to pass (GREEN)
3. Refactor (REFACTOR)

Run tests after each step and show me the results.
```

The context pollution problem

Running Red→Green→Refactor in a single session causes each phase's context to bleed into the next, degrading quality. This is called context pollution — a key factor in token management.

The fix is to isolate each phase in its own subagent.

```markdown
<!-- .claude/commands/tdd.md -->
Execute TDD workflow.

1. Test Writer agent (RED):
   - Write tests from the feature requirements in $ARGUMENTS
   - Confirm the tests fail

2. Implementer agent (GREEN):
   - Read only the generated tests
   - Write minimum code to make tests pass

3. Refactorer agent (REFACTOR):
   - Read both tests and implementation
   - Refactor while keeping tests passing
```

Each agent runs with its own independent context, preventing interference between phases. See the Claude Code commands cheatsheet for more on custom slash commands.


Verifying AI-Generated Test Quality

A test existing and a test catching bugs are different things. Mutation testing lets you quantitatively verify the quality of generated tests.

What is mutation testing

Mutation testing introduces small changes (mutations) to source code and checks whether tests detect them.

  • Example mutation: `if (a > b)` → `if (a >= b)`
  • Test fails → mutant "killed" (test is effective)
  • Test passes → mutant "survived" (test is insufficient)

Run mutation tests with Stryker

Install Stryker with the Vitest runner plugin:

```bash
npm install --save-dev @stryker-mutator/core @stryker-mutator/vitest-runner
npx stryker run
```

```javascript
// stryker.config.mjs
/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
export default {
  testRunner: "vitest",
  mutate: ["src/lib/**/*.ts", "!src/lib/**/*.test.ts"],
  reporters: ["html", "clear-text"],
  vitest: {
    configFile: "vitest.config.ts",
  },
};
```

Reading the results

```text
Mutation score: 78.5%
Killed: 51 | Survived: 14 | No coverage: 3
```

  • Above 85%: Test quality is solid
  • 70–85%: Some edge cases are missing
  • Below 70%: Tests need significant revision

Fix surviving mutants with Claude

```text
Read the Stryker mutation report (stryker-report/mutation.html).
Add test cases for surviving mutants.
```

Claude Code identifies where mutants survived and adds tests covering those code paths. Repeat until mutation score exceeds 85%.


Frequently Asked Questions

Does Claude Code support Vitest out of the box?

Yes. Claude Code detects your test framework from your project configuration. If you have vitest in your package.json and a vitest.config.ts, it will use Vitest APIs automatically. The problem is it sometimes confuses Jest and Vitest APIs — which is exactly why you should declare the framework explicitly in CLAUDE.md.

Can I use these patterns with Jest instead of Vitest?

Absolutely. The CLAUDE.md rules, Hooks configuration, and TDD workflow are framework-agnostic. Swap npx vitest run for npx jest in the Hook commands, and update the CLAUDE.md framework section. Stryker supports Jest as a test runner too.

How many tokens does the PostToolUse Hook consume?

The PostToolUse Hook runs Vitest and pipes the last 20 lines of output back to Claude. In a typical project with 50–100 tests, this adds roughly 200–400 tokens per file edit. The Stop Hook with type "agent" is heavier — budget around 2,000–5,000 tokens depending on how many fixes it needs to make.

Is the TDD subagent approach worth the extra token cost?

For small bug fixes or simple utilities, no — just write tests normally. For new features with complex logic (auth flows, data transformations, multi-step workflows), the subagent TDD approach pays for itself by producing better-structured code and fewer regressions. I reserve it for modules where getting the interface right matters more than speed.

What mutation score should I target?

85% is the practical sweet spot. Below 70% means your tests have real gaps. Above 90% often means you're writing tests for trivial code paths. Focus mutation testing on business logic (lib/, utils/, actions/) — don't waste time mutating UI components or configuration files.

Can Claude Code read Stryker HTML reports directly?

Claude Code can read the text-based output from npx stryker run, which includes surviving mutant details. For the HTML report, you'll need to describe the findings or paste the relevant sections. The clear-text reporter in the Stryker config gives Claude enough detail to target surviving mutants.

How do I prevent Claude from modifying my test configuration?

Add explicit rules to CLAUDE.md: "Do NOT modify vitest.config.ts or test setup files without asking." Claude Code respects these instructions reliably. For extra safety, add those file paths to your .claudeignore file.

Does this work with monorepos?

Yes, but you'll need separate CLAUDE.md test rules per package or use path-scoped rules in .claude/rules/. The Hooks configuration applies project-wide, so adjust the Vitest command to target the specific package: npx vitest run --project=packages/core.


Wrapping Up

Claude Code's test generation can't be quality-controlled by prompt engineering alone. Systems that enforce rules are the answer.

| Layer | Role | Setup |
| --- | --- | --- |
| CLAUDE.md | Define test rules (framework, quality standards, forbidden patterns) | 5 minutes |
| Hooks | Automate test execution (PostToolUse + Stop) | 5 minutes |
| TDD workflow | Enforce test-first (subagent isolation) | 10 minutes |
| Mutation testing | Verify generated test quality (Stryker) | 15 minutes |

Define rules in CLAUDE.md, enforce execution with Hooks, verify quality with mutation testing. With these three layers in place, Claude Code generates tests that catch bugs — not tests that only inflate coverage numbers. The total setup takes about 35 minutes — and it's saved me hours of debugging bad tests since.
