Unit Testing PHP Applications with PHPUnit: A Practical Guide

20 Jun 2026
Darwinbark Team
Web Development

Why Untested Code Is a Liability, Not Just a Gap

Code without tests is not neutral. It is a standing liability: every future change carries unverified risk, and every refactor means either manually re-verifying each affected behavior or hoping nothing broke. Tests turn that recurring manual burden into a fast, automated check that catches regressions when they are introduced, not after they have shipped and a user has found them.

The Anatomy of a PHPUnit Test

A test method asserts that a specific piece of code behaves as expected under specific conditions, following the common arrange-act-assert structure: set up the necessary state, perform the action being tested, and assert the result matches expectations.

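public function testDiscountIsAppliedCorrectly(): void
{
    // Arrange: an order totalling 10000 cents
    $order = new Order(totalCents: 10000);

    // Act: apply a 10% discount
    $order->applyDiscount(new PercentageDiscount(10));

    // Assert: 10% off 10000 cents leaves 9000 cents
    $this->assertEquals(9000, $order->totalCents());
}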

Unit Tests vs Integration Tests vs Feature Tests

A unit test exercises a single class or function in isolation, typically with dependencies replaced by test doubles, and runs extremely fast. An integration test exercises how multiple real components work together (a repository class actually hitting a real test database). A feature test exercises a complete user-facing flow end to end (an HTTP request through the full application stack to a response). All three have a place; relying solely on one tier — either only fast isolated unit tests that miss integration bugs, or only slow full feature tests that make the suite too slow to run frequently — leaves real gaps in what a test suite actually catches.

Mocking Dependencies Without Over-Mocking

Replacing a real dependency (an external API client, a payment gateway) with a test double lets you test your own code's logic without calling the real external system. The opposite risk is over-mocking: replacing so much of the system under test that the test verifies the mocks behave as configured instead of verifying your code's logic. The following test mocks only the external boundary:

public function testOrderIsMarkedPaidOnSuccessfulCharge(): void
{
    // Replace only the external gateway with a test double
    $gateway = $this->createMock(PaymentGateway::class);
    $gateway->method('charge')->willReturn(new ChargeResult(success: true));

    // The service and the order are real objects
    $service = new OrderService($gateway);
    $order = $service->processPayment(new Order(totalCents: 10000));

    $this->assertTrue($order->isPaid());
}

This test correctly mocks the external payment gateway (something genuinely external and slow/costly to call in a test) while exercising the real OrderService logic that decides what to do with the gateway's result — the actual logic under test remains real, only the genuinely external dependency is replaced.
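To make the split between real logic and replaced dependency concrete, here is a minimal sketch of the classes such a test might exercise. The internals of `Order`, `OrderService`, `ChargeResult` and `PaymentGateway` are assumptions made for illustration, and `StubGateway` is a hypothetical hand-written double used here so the example runs without PHPUnit installed.

```php
<?php

// All class bodies below are illustrative assumptions, not a real library's API.
final class ChargeResult
{
    public function __construct(public bool $success)
    {
    }
}

// The external boundary: the only thing a test should replace.
interface PaymentGateway
{
    public function charge(int $amountCents): ChargeResult;
}

final class Order
{
    private bool $paid = false;

    public function __construct(private int $totalCents)
    {
    }

    public function totalCents(): int
    {
        return $this->totalCents;
    }

    public function markPaid(): void
    {
        $this->paid = true;
    }

    public function isPaid(): bool
    {
        return $this->paid;
    }
}

// Real logic under test: decides what a charge result means for the order.
final class OrderService
{
    public function __construct(private PaymentGateway $gateway)
    {
    }

    public function processPayment(Order $order): Order
    {
        $result = $this->gateway->charge($order->totalCents());
        if ($result->success) {
            $order->markPaid();
        }
        return $order;
    }
}

// Hypothetical hand-written stub standing in for the external gateway.
final class StubGateway implements PaymentGateway
{
    public function __construct(private bool $succeeds)
    {
    }

    public function charge(int $amountCents): ChargeResult
    {
        return new ChargeResult(success: $this->succeeds);
    }
}

$paid = (new OrderService(new StubGateway(true)))->processPayment(new Order(totalCents: 10000));
$unpaid = (new OrderService(new StubGateway(false)))->processPayment(new Order(totalCents: 10000));
```

Only the gateway is swapped out; the branch inside `processPayment()` runs for real in both the success and the failure case, which is the part a bug could hide in.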

Test Database Strategy

Tests that touch a database need a clean, predictable starting state for every test, typically achieved by wrapping each test in a transaction that rolls back afterward, or by resetting and re-seeding a dedicated test database between test runs. Sharing mutable database state across tests without this isolation is a common source of flaky tests that pass or fail depending on execution order or what a previous test happened to leave behind — a class of bug that is often far more time-consuming to diagnose than the actual feature bugs tests are meant to catch.
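The transaction-per-test approach can be sketched as follows. This is a minimal illustration that assumes the `pdo_sqlite` extension and an in-memory SQLite database; `runIsolated()`, `countOrders()` and the `orders` table are hypothetical names. In a PHPUnit suite the begin and rollback steps would more likely live in `setUp()` and `tearDown()`.

```php
<?php

// In-memory SQLite keeps the sketch self-contained; a real suite would point
// at its dedicated test database instead.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)');

// Hypothetical helper: run one test body inside a transaction, then roll back.
function runIsolated(PDO $pdo, callable $testBody): void
{
    $pdo->beginTransaction();
    try {
        $testBody($pdo);
    } finally {
        $pdo->rollBack(); // discard everything the test wrote
    }
}

function countOrders(PDO $pdo): int
{
    return (int) $pdo->query('SELECT COUNT(*) FROM orders')->fetchColumn();
}

$countInsideTest = null;
runIsolated($pdo, function (PDO $pdo) use (&$countInsideTest): void {
    $pdo->exec('INSERT INTO orders (total_cents) VALUES (10000)');
    $countInsideTest = countOrders($pdo);
});

// The next test starts from the same empty table.
$countAfterTest = countOrders($pdo);
```

The row is visible while the test body runs and gone afterward, so no test can observe what a previous one left behind.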

What to Test, and What Not to Bother Testing

Test business logic with real branching and edge cases — discount calculations, permission checks, anything with conditions that could plausibly be wrong. Testing trivial getter/setter methods or framework-provided functionality with no custom logic of your own adds maintenance burden without meaningfully reducing real risk. The useful heuristic: would this test actually catch a real bug if someone introduced one, or would it just break harmlessly every time the implementation changes shape without the underlying behavior actually changing?

Test-Driven Development: When It Helps and When It Does Not

Writing a failing test before the implementation (red-green-refactor) is genuinely valuable for code with clear, well-understood requirements and meaningful logic, since it forces you to think through expected behavior and edge cases before writing the implementation. For exploratory work where requirements are still being discovered through experimentation, writing tests first can feel like premature commitment to a design that has not yet proven itself — in those cases, writing tests immediately after settling on a working approach, rather than strictly before, is often more practical without sacrificing the core value tests provide.

Keeping a Test Suite Fast Enough to Actually Run Often

A test suite that takes twenty minutes to run gets run far less often than one that takes twenty seconds, which defeats much of the value tests are meant to provide as a fast feedback loop. Keeping the bulk of a test suite as fast unit tests, with a smaller number of slower integration and feature tests reserved for genuinely cross-component behavior, keeps the everyday feedback loop fast while still catching the integration-level bugs that pure unit tests would miss entirely.

Closing Thought

A good test suite is not measured by raw test count or coverage percentage alone — it is measured by whether it actually catches real regressions before they reach users, and whether it stays fast and reliable enough that the team actually runs it constantly rather than treating it as a slow, occasionally-skipped formality. Building that kind of suite takes deliberate choices about what to test, how to isolate dependencies, and how to manage test database state, not just writing as many assertions as possible.


Data Providers: Testing Multiple Scenarios Without Duplicating Test Code

Many bugs hide specifically in edge cases — a discount of exactly 0%, a negative quantity, a boundary value right at a threshold. Writing a separate, nearly-identical test method for each scenario is repetitive and easy to let drift out of sync; PHPUnit data providers let one test method run against many input/expected-output pairs, keeping the test logic in one place while still covering every scenario explicitly:

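// PHPUnit 10+ also accepts the #[DataProvider('discountScenarios')] attribute in place of this annotation.
/** @dataProvider discountScenarios */
public function testDiscountCalculation(int $total, int $percent, int $expected): void
{
    $this->assertEquals($expected, applyDiscount($total, $percent));
}

// PHPUnit 10 and later require data providers to be public and static.
public static function discountScenarios(): array
{
    return [
        // scenario name => [total in cents, percent, expected total in cents]
        'no discount' => [10000, 0, 10000],
        'half off' => [10000, 50, 5000],
        'full discount' => [10000, 100, 0],
    ];
}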
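For readers who want to see what the provider mechanism amounts to, the sketch below runs the same scenario table through a plain loop. The body of `applyDiscount()` is an assumption (the article never shows it), and the extra 'odd cent' row is a hypothetical edge case added for illustration.

```php
<?php

// Hypothetical implementation of the function the test calls.
function applyDiscount(int $totalCents, int $percent): int
{
    if ($percent < 0 || $percent > 100) {
        throw new InvalidArgumentException('Percent must be between 0 and 100.');
    }
    // Integer arithmetic; any fractional cent is truncated.
    return intdiv($totalCents * (100 - $percent), 100);
}

// Same shape as a data provider: scenario name => [total, percent, expected].
$scenarios = [
    'no discount'   => [10000, 0, 10000],
    'half off'      => [10000, 50, 5000],
    'full discount' => [10000, 100, 0],
    'odd cent'      => [999, 50, 499],
];

// Roughly what the test runner does: one independent check per named row.
$failures = [];
foreach ($scenarios as $name => [$total, $percent, $expected]) {
    $actual = applyDiscount($total, $percent);
    if ($actual !== $expected) {
        $failures[] = "$name: expected $expected, got $actual";
    }
}
```

Naming each row pays off when something fails: the report points at 'odd cent' instead of an anonymous index.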

Testing Exceptions and Error Paths Deliberately

It is easy for a test suite to only ever exercise the happy path, leaving error-handling code — arguably the code most likely to contain real bugs, since it is exercised far less often in normal operation — completely untested. Explicitly asserting that the right exception is thrown under the right invalid conditions closes this gap and catches regressions in error handling specifically, a category of bug that tends to surface in production exactly when something has already gone wrong elsewhere, compounding the original problem.

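public function testThrowsWhenChargingNegativeAmount(): void
{
    // ValidatingGateway is a hypothetical concrete gateway; substitute your own implementation.
    $gateway = new ValidatingGateway();

    // Declare the expectation before the call that should throw
    $this->expectException(InvalidArgumentException::class);
    $gateway->charge(-500);
}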
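As a rough sketch of the code on the other side of such a test, the example below shows a gateway that rejects invalid amounts, plus a plain-PHP stand-in for what `expectException()` checks. `ValidatingGateway` and `throwsInvalidArgument()` are hypothetical names introduced for illustration.

```php
<?php

// Hypothetical gateway that validates its input before doing any work.
final class ValidatingGateway
{
    public function charge(int $amountCents): bool
    {
        if ($amountCents <= 0) {
            throw new InvalidArgumentException('Charge amount must be positive.');
        }
        return true; // a real gateway would call the payment provider here
    }
}

// Plain-PHP stand-in for expectException(): reports whether the action
// threw an InvalidArgumentException.
function throwsInvalidArgument(callable $action): bool
{
    try {
        $action();
    } catch (InvalidArgumentException $e) {
        return true;
    }
    return false;
}

$gateway = new ValidatingGateway();
$rejectsNegative = throwsInvalidArgument(fn () => $gateway->charge(-500));
$rejectsValid = throwsInvalidArgument(fn () => $gateway->charge(500));
```

Checking that a valid amount does not throw is as useful as checking that an invalid one does; otherwise a gateway that rejects everything would pass the error-path test.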

Continuous Integration: Making Tests a Gate, Not a Suggestion

A test suite that exists but is not actually required to pass before code merges provides far less real protection than one wired into CI as a hard gate — tests that are optional to run are, in practice, tests that eventually get skipped under deadline pressure, exactly the moment a regression is most likely to slip through unnoticed. Configuring CI to block merges on test failure, and keeping the suite fast enough that this gate does not become a frustrating bottleneck developers route around, is what actually converts a test suite from documentation into a real safety net.

# .github/workflows/tests.yml
- run: vendor/bin/phpunit --stop-on-failure

Code Coverage: A Useful Signal, Not a Target to Game

Coverage percentage tells you which lines were executed during tests, not whether the assertions about those lines are meaningful. A test that runs a function but asserts nothing about its output inflates coverage without adding protection. Use coverage reports to find untested, risky code paths worth addressing, not as a target to hit with superficial tests written only to move the percentage.
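The gap between executed and verified can be shown in a few lines. In this sketch, every name is hypothetical and the function is deliberately buggy; both checks execute the same lines, but only one of them notices the bug.

```php
<?php

// Deliberately buggy hypothetical function: it adds the discount instead of
// subtracting it.
function discountedTotal(int $totalCents, int $discountCents): int
{
    return $totalCents + $discountCents; // bug: should be minus
}

// Executes every line of discountedTotal() but checks nothing about the
// result, so it "passes" and still counts toward line coverage.
function coverageOnlyCheck(): bool
{
    discountedTotal(10000, 1000);
    return true;
}

// Executes the same lines but asserts on the computed value.
function assertingCheck(): bool
{
    return discountedTotal(10000, 1000) === 9000;
}

$coverageOnlyPasses = coverageOnlyCheck();
$assertingPasses = assertingCheck();
```

Both checks would report the function as fully covered, yet only the asserting one fails on the bug.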

Case Study: A Test Suite That Passed While the Feature Was Broken

A team building an invoicing feature wrote thorough-looking tests that mocked the entire database layer and asserted that specific methods were called with specific arguments, not that any computed result was correct. The tests passed consistently through several sprints. Meanwhile, a bug in the underlying date-range calculation silently generated invoices with an off-by-one day error in production. The suite could not see it, because it never executed or verified the calculation itself, only that certain methods had been invoked. Once the over-mocked tests were replaced with tests that exercised the real date logic and asserted on computed output, the new tests exposed the bug within minutes of being written. The broader lesson: a green test suite proves only that whatever was tested behaves as the tests describe. It says nothing about logic the tests never exercised, however comprehensive the suite looks by test count.
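For a sense of what an off-by-one in a date range can look like, here is a purely hypothetical sketch. It is not the team's code, and `billableDays()` is an invented name. The point is that only a check on the computed number can tell a correct inclusive count from one that is a day short.

```php
<?php

// Hypothetical illustration only; not the code from the case study.
// Counts billable days in a range where both endpoints are inclusive.
function billableDays(DateTimeImmutable $start, DateTimeImmutable $end): int
{
    // diff() counts the gaps between the two dates, so an inclusive range
    // needs one more day. Omitting the +1 is the classic off-by-one.
    return $start->diff($end)->days + 1;
}

$january = billableDays(
    new DateTimeImmutable('2026-01-01'),
    new DateTimeImmutable('2026-01-31')
);

$singleDay = billableDays(
    new DateTimeImmutable('2026-01-15'),
    new DateTimeImmutable('2026-01-15')
);
```

A test asserting that `billableDays()` was called with the right arguments would pass with or without the `+ 1`; a test asserting that January yields 31 would not.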

A Glossary for This Topic

Test double: any object that stands in for a real dependency during a test, including mocks, stubs, and fakes.

Arrange-act-assert: the common three-part structure of a unit test (set up state, perform the action, verify the outcome).

Flaky test: a test that passes or fails inconsistently without any actual code change, often caused by shared mutable state or timing dependencies.

Code coverage: a metric measuring what proportion of code executes during a test run, useful as a signal but not a guarantee of test quality.

Frequently Asked Questions

How much code coverage is "enough"? There is no universal number — coverage of genuinely risky business logic matters far more than a specific percentage target, and chasing a coverage number can produce low-value tests that inflate the metric without improving real protection.

Should I write tests for code I am about to delete or rewrite? Generally no — tests are an investment in code expected to persist; writing thorough tests for code with a known short remaining lifespan is rarely worth the time spent.

What is the difference between a stub and a mock? A stub simply returns canned data when called; a mock additionally lets you assert it was called with specific arguments a specific number of times — the distinction matters because over-relying on call-count assertions, as in the case study above, can produce tests that pass without verifying real behavior.
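The stub and mock distinction can be sketched with two hand-written doubles. Everything here is hypothetical and written in plain PHP; with PHPUnit the same roles are usually filled by `createStub()` and by `createMock()` combined with `expects()`.

```php
<?php

interface PaymentGateway
{
    public function charge(int $amountCents): bool;
}

// Stub: returns canned data and remembers nothing.
final class StubGateway implements PaymentGateway
{
    public function charge(int $amountCents): bool
    {
        return true;
    }
}

// Recording double (mock-like): also records each call for later checks.
final class RecordingGateway implements PaymentGateway
{
    /** @var int[] amounts passed to charge(), in call order */
    public array $chargedAmounts = [];

    public function charge(int $amountCents): bool
    {
        $this->chargedAmounts[] = $amountCents;
        return true;
    }
}

// Hypothetical code under test: charges the gateway once per checkout.
function checkout(PaymentGateway $gateway, int $totalCents): bool
{
    return $gateway->charge($totalCents);
}

// With a stub, the only thing to assert on is the outcome.
$stubResult = checkout(new StubGateway(), 10000);

// With a recording double, the interaction itself can be asserted on.
$recording = new RecordingGateway();
checkout($recording, 10000);
```

Asserting on `$recording->chargedAmounts` is reasonable here because the charge is the externally visible effect. It becomes a problem only when such interaction checks replace every assertion on computed results.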

Step-by-Step: Introducing Testing Into an Untested Codebase

Step one: do not attempt to write tests for the entire existing codebase at once. Start with the highest-risk, most business-critical logic (billing calculations, permission checks) where a bug would cause the most damage.

Step two: write characterization tests first for legacy code with unclear intended behavior, capturing what the code currently does rather than guessing what it should do, as a safety net before any refactor.

Step three: wire the test suite into CI as a hard merge gate from the start, not as an optional, easily-skipped step.

Step four: as new features are built, require tests for new logic as a standard part of the definition of done, growing coverage organically through ongoing work rather than through a separate, large retrofitting effort.

Step five: periodically review and delete tests that no longer provide real signal (testing removed features, redundant coverage of the same logic) to keep the suite lean and fast.
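A characterization test from step two might look roughly like this. The legacy function and its recorded values are invented for illustration; in practice the expected values come from running the existing code and writing down what it returns.

```php
<?php

// Hypothetical legacy function whose intended behaviour is undocumented.
function legacyShippingCents(int $weightGrams): int
{
    if ($weightGrams <= 0) {
        return 0;
    }
    return 500 + intdiv($weightGrams, 1000) * 150;
}

// Characterization cases: weight in grams => cost in cents. The expected
// values were recorded by running the current code, not derived from a
// specification.
$recordedBehaviour = [
    0    => 0,
    999  => 500,
    1000 => 650,
    2500 => 800,
];

// Any mismatch means a refactor changed behaviour that something may rely on.
$mismatches = [];
foreach ($recordedBehaviour as $weight => $expected) {
    $actual = legacyShippingCents($weight);
    if ($actual !== $expected) {
        $mismatches[$weight] = $actual;
    }
}
```

The cases make no claim that 650 cents is the right price for one kilogram, only that it is today's price, which is what a refactor must preserve until someone decides otherwise.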

A Comparison Table: Test Types at a Glance

Unit tests: fastest; isolate single units of logic; miss integration-level bugs between components.

Integration tests: moderate speed; catch real cross-component issues; more setup and maintenance overhead than pure unit tests.

Feature/end-to-end tests: slowest; most realistic coverage of actual user flows; the most expensive to maintain and the most prone to flakiness if overused as the primary testing tier.

Security Considerations Checklist

Never let test database credentials or test API keys for third-party services leak into production configuration, and conversely, never let real production credentials end up in a test environment by accident — keeping environment configuration strictly separated avoids both a test accidentally hitting a real external system and a test environment exposing real, sensitive credentials. Treat test fixtures and factories containing realistic-looking personal data (names, emails) as something to review for accidental real data, since copying a real production record into a test fixture for convenience can leak actual customer data into a less-protected test environment or version control history. Ensure CI runners do not have broader credentials or network access than the tests genuinely require, since a compromised CI pipeline with overly broad access is a real, underrated attack surface.

Accessibility Considerations

Automated accessibility checks (verifying ARIA attributes, color contrast, keyboard-navigable focus order) can be incorporated directly into a feature test suite, catching accessibility regressions the same way functional regressions are caught — treating accessibility as a tested property of the application rather than a manual, easily-skipped review step that happens inconsistently or not at all under deadline pressure.

How This Plays Out at Different Scales

A small project with a handful of contributors can often get by with a modest test suite and manual review discipline. A growing team with multiple contributors working in parallel needs the CI-as-a-gate discipline described earlier, since manual review alone cannot reliably catch regressions across simultaneous, independent streams of work. A large codebase with years of accumulated history needs deliberate investment in test suite speed and organization (parallelized test runs, clear separation between fast unit tests and slower integration suites) to keep the feedback loop fast enough to remain genuinely useful as the codebase and test count both continue to grow.

What to Do When You Inherit a Codebase With Zero Tests

Inheriting a sizable, untested codebase that has nonetheless run successfully in production for years is not a reason to panic or to demand a testing-first rewrite. It is a reason to start where the step-by-step plan in this guide points: characterization tests around the highest-risk logic first, wired into CI as a gate from day one, with coverage growing organically through normal feature work instead of a separate, large, and likely never-finished retrofitting project. A working untested codebase still represents real, validated business value; the goal is to reduce the risk of future changes, not to retroactively justify the absence of past tests.

Final Checklist Before Calling a Test Suite "Good Enough"

High-risk business logic (billing, permissions, anything with real consequences for a bug) has explicit, meaningful test coverage.

Tests assert on real computed outcomes, not just that certain methods were called.

CI blocks merges on test failure as a hard, non-optional gate.

The suite runs fast enough that the team actually runs it constantly, not occasionally under protest.

Flaky tests are treated as bugs to fix promptly, not tolerated as background noise.

Closing Thought, Revisited

A test suite's real value is measured by the production incidents it silently prevented, which are by definition invisible — nobody writes a postmortem for a bug that a test caught before it ever shipped. That invisibility makes testing easy to under-invest in relative to its actual value, which is exactly why the discipline of writing tests that verify real behavior, kept fast and genuinely enforced, pays for itself many times over across the life of a codebase, even though the payoff is rarely as visible or celebrated as the features tests quietly help ship more safely.

Snapshot Testing for Complex Output

For code producing large, structured output (a generated report, a complex API response) where asserting on every individual field is tedious and brittle, snapshot testing captures the full output once, reviewed and approved as correct, then flags any future test run where the output differs from that approved snapshot. This is not a replacement for targeted assertions on critical fields, but a useful complement for catching unintended changes in output shape that a narrower, field-by-field test might miss entirely.
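The mechanism is simple enough to sketch by hand. The helper below is a hypothetical minimal version and not the API of any snapshot-testing package; it assumes the output can be encoded as JSON and that a human reviews the file written on the first run.

```php
<?php

// Hypothetical helper: returns true when $actual matches the stored snapshot,
// creating the snapshot on the first run.
function matchesSnapshot(string $snapshotFile, array $actual): bool
{
    $encoded = json_encode($actual, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);

    if (!is_file($snapshotFile)) {
        file_put_contents($snapshotFile, $encoded); // first run: record for review
        return true;
    }

    return file_get_contents($snapshotFile) === $encoded;
}

// A temp file keeps the sketch self-contained; real snapshots are usually
// committed next to the tests.
$file = sys_get_temp_dir() . '/report_snapshot_' . uniqid() . '.json';

$report = [
    'customer' => 'ACME',
    'lines' => [['sku' => 'A1', 'qty' => 2]],
    'totalCents' => 9000,
];

$firstRun = matchesSnapshot($file, $report);  // records the snapshot
$unchanged = matchesSnapshot($file, $report); // same output as approved

$report['totalCents'] = 9001;
$changed = matchesSnapshot($file, $report);   // output drifted from snapshot

unlink($file);
```

The first run always passes, which is why the recorded file needs a review before it is trusted: a snapshot of wrong output protects the wrong output.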

Testing Time-Dependent Code Reliably

Code that behaves differently based on the current date or time (a discount that expires at midnight, a report scoped to "this month") is hard to test reliably if the test reads the real system clock, because its result then depends on exactly when it runs. Freezing time within the test to a fixed, known value removes this dependency, so the test asserts deterministic behavior no matter when it runs.

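Carbon::setTestNow('2026-01-15 23:59:00'); // freeze "now" at a fixed, known instant
$this->assertTrue($discount->isActive());
Carbon::setTestNow(); // clear the frozen time so later tests see the real clock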
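Where Carbon is not available, a different technique gives the same determinism: pass the current time in as an argument so the code never reads the clock itself. The sketch below uses PHP's built-in `DateTimeImmutable`; `ExpiringDiscount` and its expiry date are hypothetical.

```php
<?php

// Hypothetical discount that receives "now" as an argument instead of
// reading the system clock itself.
final class ExpiringDiscount
{
    public function __construct(private DateTimeImmutable $expiresAt)
    {
    }

    // Active strictly before the expiry instant.
    public function isActive(DateTimeImmutable $now): bool
    {
        return $now < $this->expiresAt;
    }
}

// An explicit timezone keeps the result independent of server configuration.
$utc = new DateTimeZone('UTC');
$discount = new ExpiringDiscount(new DateTimeImmutable('2026-01-16 00:00:00', $utc));

$beforeMidnight = $discount->isActive(new DateTimeImmutable('2026-01-15 23:59:00', $utc));
$atMidnight = $discount->isActive(new DateTimeImmutable('2026-01-16 00:00:00', $utc));
```

The boundary case at exactly midnight is the one worth pinning down in a test, since `<` versus `<=` is an easy mistake and impossible to hit reliably against a real clock.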
