Skip to main content
Intermediate4 min read

Test-First Steering

Write failing tests first so the agent has a concrete, verifiable target instead of an ambiguous description.

Signals

  • Agent output is structurally correct but uses wrong values, types, or behavior
  • You find yourself writing detailed prose descriptions of exact return shapes
  • The agent makes reasonable but wrong assumptions about edge cases
TDDtest-firstfailing testsspecificationverifiable target

Relationship Map

3.3Scaffold Fi…1.2Safety Net3.2Checkpoint …4.1Test-First St…

Problem

You tell the agent: "Build a function that calculates shipping costs based on weight, destination, and shipping speed." The agent produces a function. It handles weight correctly but uses country codes instead of your internal region IDs. It implements three shipping speeds when you have four. It rounds to the nearest dollar when your system uses cents.

Every ambiguity in your description was a decision the agent made without asking. "Destination" could mean country code, zip code, region ID, or address object. "Shipping speed" could mean any set of tiers. "Calculates shipping costs" doesn't specify return type, error cases, or edge behavior.

Natural language descriptions are inherently ambiguous. The agent resolves ambiguity by guessing. Sometimes it guesses right. When it guesses wrong, you spend time debugging code that "works" but doesn't match your requirements.

Solution

Write tests that encode your requirements before asking the agent to implement. Tests are unambiguous specifications — they pass or fail with no room for interpretation.

Write the tests first:

// __tests__/shipping.test.ts
describe('calculateShipping', () => {
  it('returns cost in cents for standard domestic', () => {
    const result = calculateShipping({
      weightOz: 16,
      regionId: 'us-east',
      speed: 'standard',
    });
    expect(result).toEqual({ costCents: 599, estimatedDays: 5 });
  });
 
  it('applies heavy package surcharge over 48oz', () => {
    const result = calculateShipping({
      weightOz: 64,
      regionId: 'us-east',
      speed: 'standard',
    });
    expect(result.costCents).toBeGreaterThan(599);
    expect(result.surcharges).toContain('heavy-package');
  });
 
  it('rejects invalid region IDs', () => {
    expect(() =>
      calculateShipping({
        weightOz: 16,
        regionId: 'invalid',
        speed: 'standard',
      })
    ).toThrow('Unknown region: invalid');
  });
 
  it('supports all four shipping speeds', () => {
    const speeds = ['standard', 'express', 'overnight', 'freight'] as const;
    for (const speed of speeds) {
      const result = calculateShipping({
        weightOz: 16,
        regionId: 'us-east',
        speed,
      });
      expect(result.costCents).toBeGreaterThan(0);
    }
  });
});

Then hand the tests to the agent:

Implement calculateShipping in lib/shipping.ts.
The tests are already written at __tests__/shipping.test.ts.
Make all tests pass. Don't modify the tests.

Every ambiguity is resolved by the tests: cents not dollars, region IDs not country codes, four speeds not three, specific error messages, specific return shape. The agent has a verifiable target and no room to deviate.

The test file communicates more than prose:

AmbiguityProse DescriptionTest Specification
Currency unit"Returns the cost"costCents: 599
Location format"Based on destination"regionId: 'us-east'
Tier count"Different speeds"Explicit array of 4 speeds
Error behavior"Handle invalid input"toThrow('Unknown region: invalid')
Return shape"The shipping cost"{ costCents, estimatedDays, surcharges }

You don't need complete test coverage. Even 3-4 tests that capture the key behaviors dramatically improve output quality. Focus on:

  1. The happy path with realistic data
  2. One edge case that reveals the expected error handling
  3. One test that encodes a non-obvious business rule

The agent will infer the general pattern from these specific examples and produce implementation that's consistent with all of them.

Tests As Communication, Not Quality Assurance

In this pattern, tests aren't primarily about catching bugs after the fact. They're about eliminating ambiguity before implementation starts. The tests are a specification language that happens to be executable. You're trading imprecise English for precise code — and the agent understands code better than English anyway.

Signals

  • Agent output is structurally correct but uses wrong values, types, or behavior
  • You find yourself writing detailed prose descriptions of exact return shapes
  • The agent makes reasonable but wrong assumptions about edge cases
  • You spend more time describing the behavior than it would take to write a test

Consequences

Benefits:

  • Eliminates ambiguity — tests pass or they don't, no interpretation needed
  • Agent output is verifiable in seconds, not minutes of manual review
  • Tests survive the implementation — you get a test suite as a byproduct
  • The agent often produces better code when given tests, because the constraints narrow the solution space
  • Composes with Scaffold First for maximum constraint: types + tests + implement

Costs:

  • Requires you to write tests, which takes time upfront
  • Not all behaviors are easy to test (UI, async interactions, third-party integrations)
  • Brittle tests can over-constrain the implementation
  • The agent may "teach to the test" — passing tests while missing the intent