
When Test Reports Stop Talking to People

  • Arek Frankowski
  • Nov 18, 2025
  • 10 min read


"Our test suite passed," the developer announced. The business analyst looked at the console output - pages of cryptic assertions, stack traces, and technical jargon. "Great," she replied. "But what did it actually test?" The developer scrolled through the logs, searching for something human-readable. Nothing.


This scene repeats itself everywhere. We've invested heavily in test automation, but the reports speak only to those who wrote the tests. Business analysts, project managers, and domain experts are locked out of understanding what testing actually covers. The very people who best understand the business rules can't verify that tests check the right things.


The usual solution is documentation - maintaining separate test plans, requirement matrices, and traceability documents. But documentation drifts from reality. Tests change, requirements evolve, and the documentation becomes fiction. Another common answer is Behavior-Driven Development, but that introduces its own complexity - multiple layers to write and maintain, constraints on test implementation.


What if there's a different approach? What if tests could remain technically straightforward while automatically generating human-readable narratives from their execution?


That's what CenterTest's Narrative feature does. It captures test execution and translates it into business-readable reports - both standard format and Gherkin-style Given/When/Then - without requiring BDD's three-layer architecture. The test author writes direct technical code. The framework generates the readable narrative. One layer of code, multiple views of execution.


Let's explore what this means for how we think about test communication.


The Translation Problem

The conventional answer to this communication gap is test reporting - dashboards showing pass/fail rates, coverage metrics, and execution times. And that's valuable information. It tells us whether tests passed. It shows trends over time. It highlights problem areas.


But here's what that misses: those metrics don't explain what the tests actually do. A passing test suite might check the wrong things. A failed test might represent a genuine bug or a stale assertion. The numbers tell us outcomes without explaining meaning.


Consider what a typical automated test report shows: "TestPolicyCreation.testPersonalAutoWithLiability - PASSED (12.3s)". For the automation engineer, that's enough. She knows what that test does, wrote it herself, can read the code if needed. But for the business analyst who needs to verify that liability coverage rules are properly tested? That single line is opaque.


The problem isn't just technical. When business stakeholders can't understand what tests do, they can't:

 

  • Verify that business rules are properly validated

  • Identify gaps in test coverage from a domain perspective

  • Participate meaningfully in quality discussions

  • Trust that automation actually protects what matters

 

This creates an invisible wall between technical and business people. Testing becomes a black box. Trust erodes or becomes blind faith. Either way, the organization loses collaborative intelligence from diverse perspectives.


So the real question isn't "How do we report test results?" but rather "How do we make test execution transparent to everyone who needs to understand it?"


The BDD Promise and Its Hidden Costs

The common solution to this communication problem is Behavior-Driven Development. Write tests in natural language using Given/When/Then syntax, they say. Business stakeholders can read the scenarios. Developers implement the steps. Everyone understands what's being tested. Problem solved.


Except it's not. BDD introduces its own complexity that teams discover too late.

Consider what BDD actually requires. You start with a natural language scenario:

Given a 35-year-old driver with a clean record
When they request a Personal Auto policy with basic liability
Then the premium should be calculated correctly 

That's the first layer - the business-readable specification. But this doesn't execute itself. You need a second layer - the step definitions that map natural language to code:

@Given("a {int}-year-old driver with a clean record")
public void driverWithCleanRecord(int age) {
    driver = new Driver(age);
    driver.setAccidentHistory(AccidentHistory.CLEAN);
}

@When("they request a Personal Auto policy with basic liability")
public void requestPolicy() {
    policy = policyService.createPolicy(driver, PolicyType.PERSONAL_AUTO);
    policy.setCoverage(CoverageType.BASIC_LIABILITY);
} 

But that's still not enough. Those step definitions need a third layer - the actual page objects, service wrappers, and technical implementation that interacts with the system:

public class PolicyService {
    private PolicyPage policyPage;

    public Policy createPolicy(Driver driver, PolicyType type) {
        policyPage.navigate();
        policyPage.setDriverInfo(driver);
        policyPage.selectPolicyType(type);
        return policyPage.submit();
    }
} 

Three layers. Each must be written, maintained, and kept synchronized. Change the UI? Update the page objects and potentially the step definitions. Change business terminology? Update the Gherkin and step definitions. Add a new scenario? Write all three layers.


What's really happening here? BDD optimized for one goal: making test specifications readable before execution. It assumed the cost of three layers was worth the benefit of readable scenarios. But that assumption deserves scrutiny.


The maintenance burden is real. Every change ripples through multiple layers. A simple modification - say, changing how we represent coverage amounts - might require updating:

 

  • The Gherkin scenario syntax

  • The step definition regex patterns

  • The parameter parsing logic

  • The underlying implementation calls

 

Teams discover this complexity too late. They invested weeks writing BDD scenarios and step definitions. Now they're maintaining three layers for every test. The business stakeholders still don't read the Gherkin files - they're stored in the codebase, require git access, and aren't inviting to non-technical readers.


But here's the deeper issue: BDD constrains how you write tests. The Given/When/Then structure is rigid. Step definitions must be reusable across scenarios, which forces generic implementations. You can't easily use programming constructs like loops or conditionals without breaking the natural language abstraction. Complex test logic becomes awkward to express in BDD format.
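
To make that constraint concrete, here is a minimal sketch of the kind of data-driven check that reads naturally as plain Java but fights the Gherkin abstraction. It reuses the Driver, policyService, and coverage names from the step-definition example above, and the "younger drivers pay more" rule is a hypothetical business rule chosen purely for illustration - not a claim about any real rating engine.

@Test
public void premiumIncreasesForYoungerDrivers() {
    // A plain loop over several ages, carrying state between iterations --
    // trivial in direct code, awkward to express as reusable Gherkin steps.
    double previousPremium = 0.0;
    for (int age : new int[] {55, 40, 25, 18}) {
        Driver driver = new Driver(age);
        driver.setAccidentHistory(AccidentHistory.CLEAN);

        Policy policy = policyService.createPolicy(driver, PolicyType.PERSONAL_AUTO);
        policy.setCoverage(CoverageType.BASIC_LIABILITY);

        // Each younger driver should be quoted a higher premium than the last.
        double premium = policy.calculatePremium();
        assertThat(premium).isGreaterThan(previousPremium);
        previousPremium = premium;
    }
}

Expressing the same comparison in Gherkin would mean either one scenario per age with duplicated steps, or a data table plus step definitions that smuggle the loop back in anyway.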


So teams face a dilemma. Write everything in BDD format and accept the maintenance burden plus technical limitations? Or write some tests in BDD and others in direct code, creating inconsistency? Or abandon BDD after investing in it, accepting the sunk cost?


There's another question hiding here: what if we've conflated two separate problems?


Making tests readable for business stakeholders is one problem. Writing tests efficiently is another problem. BDD tries to solve both simultaneously by making the test specification itself the business-readable artifact. But what if those problems don't need the same solution?


What if we could write tests in straightforward technical code - one layer, not three - and generate the business-readable narrative from execution? Test execution and test documentation serve different needs. Keeping them separate might serve both better than forcing them together.


Making Tests Tell Their Story

Consider a team testing an insurance policy system using CenterTest. They write tests in direct technical code - single layer:

@Test
public void testPersonalAutoPolicyWithBasicLiability() {
    driver.setAge(35);
    driver.setLicenseDate(LocalDate.of(2005, 3, 20));
    driver.setAccidentHistory(AccidentHistory.CLEAN);

    policy.setType(PolicyType.PERSONAL_AUTO);
    policy.setCoverage(CoverageType.BASIC_LIABILITY, 100000, 300000);
    policy.addDriver(driver);

    double premium = policy.calculatePremium();
    assertThat(premium).isEqualTo(1250.50);
} 

Clean, direct, maintainable. No Gherkin files. No step definitions. No regex patterns to update. The test does what it needs to do without ceremony. A developer can read this, modify it, debug it without navigating through abstraction layers.

But standard test output from this code looks like:

Setting field 'PolicyType' to 'Personal Auto'
Setting field 'LiabilityCoverage' to '100000/300000'
Assertion: PremiumAmount equals 1250.50 

This works for developers. But the underwriter reviewing test coverage sees technical field names and raw data. She can't quickly verify that the test captures the business scenario she cares about: "When a customer selects basic liability coverage for personal auto, the premium calculation includes the correct base rate and risk factors."


Here's where CenterTest's Narrative feature changes the equation. The framework captures execution events and automatically translates them into a readable narrative:

Running Scenario: Personal Auto Policy Creation with Basic Liability
Given: Environment (Test) as PolicyUser with role PolicyAgent
When: Setting Policy Type to Personal Auto
When: Setting Liability Coverage to $100,000/$300,000
When: Setting Driver Age to 35
When: Setting Years Licensed to 15
Then: Verifying Premium Amount equals $1,250.50 

Or in a more natural format:

Creating a Personal Auto policy
- Set coverage type to Basic Liability ($100k/$300k)
- Primary driver: 35 years old, licensed for 15 years
- Verify: Premium calculates to $1,250.50 

What changed? The test code remained a single technical layer - simple, direct, maintainable. But its execution generates a business-readable narrative automatically. No Gherkin files. No step definitions. No synchronization headaches. The test author writes efficient code. The business stakeholder reads a comprehensible narrative. Each gets what they need.


Here's what makes this approach powerful: it leverages what execution already knows. During test execution, the framework sees every action - fields being set, pages being navigated, assertions being verified. It already has the information needed for a narrative. It just needs translation rules to express that information in business terms.


This isn't artificial intelligence or magic. It's structured capture and templated translation. The templates are written once, work for all tests, and can be localized for different languages. Change a template, and every generated narrative updates - far simpler than BDD's scenario-by-scenario updates.
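
As a rough mental model - not CenterTest's actual API, just a sketch with invented names - the mechanism amounts to recording each action as a structured event and rendering it through a per-action template:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal sketch of structured capture plus templated translation.
// Event and template names are invented for illustration.
enum Action { SET_FIELD, VERIFY }

record TestEvent(Action action, String label, String value) {}

class NarrativeRenderer {
    // Templates are written once and reused by every test in the suite.
    private static final Map<Action, String> TEMPLATES = Map.of(
        Action.SET_FIELD, "When: Setting %s to %s",
        Action.VERIFY,    "Then: Verifying %s equals %s"
    );

    String render(List<TestEvent> events) {
        return events.stream()
            .map(e -> String.format(TEMPLATES.get(e.action()), e.label(), e.value()))
            .collect(Collectors.joining(System.lineSeparator()));
    }
}

Feeding this renderer the events captured from the policy test above - Policy Type, Liability Coverage, Premium Amount - would reproduce the When/Then lines shown earlier, and swapping the template map is all it takes to change phrasing across every report at once.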


The Two Audiences Problem

Consider another team that implemented narrative reporting. Initially, they generated detailed technical narratives - every field setting, every assertion, complete execution traces. The reports were comprehensive but overwhelming. Business analysts complained: "There's too much detail. I just need to understand the key scenarios and outcomes."


So the team simplified. They created high-level summaries: "Policy Creation Test - PASSED. Premium Calculation Test - PASSED." Now business analysts complained differently: "This tells me tests passed, but not what scenarios were covered or how the rules were validated."


What's really going on here? The team had fallen into a common trap: trying to serve two different audiences with the same report. Technical team members need execution details for debugging - which fields were set, what values were used, where assertions failed.


Business stakeholders need scenario comprehension - what business cases were tested, what rules were validated, whether coverage is adequate.


One report can't optimize for both needs. A debugging report clutters scenario understanding. A scenario summary omits troubleshooting details. The solution isn't finding the perfect middle ground - it's accepting that different people need different views of the same execution.


Consider how one team addressed this. They generated two report formats from each test run:


Technical Report - For engineers debugging failures:

Container: DriverInfo
  Field: DateOfBirth -> 1988-05-15 (Overridden from default)
  Field: LicenseDate -> 2005-03-20
  Field: AccidentHistory -> None
  Assertion: RiskScore = 72 (Expected: 72) ✓ 

Narrative Report - For business stakeholders validating scenarios:

Given: 35-year-old driver licensed since 2005
When: No accident history recorded
Then: Risk score calculated as Low Risk (72/100) 

Same execution, different stories. The technical report shows exactly what the test did for debugging. The narrative report explains what scenario was validated for business verification. Engineers use one; business analysts use the other. Both views remain synchronized because they're generated from the same test run.
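
Continuing the sketch from the previous section (same invented TestEvent and Action types, still not CenterTest's real API), the dual reports fall out of having two renderers over one captured event list:

// Imports and the TestEvent/Action types as in the earlier sketch.
// Two independent views over the same captured events -- one per audience.
interface ReportView {
    String render(List<TestEvent> events);
}

class TechnicalView implements ReportView {
    // Raw labels and values, useful when debugging a failure.
    public String render(List<TestEvent> events) {
        return events.stream()
            .map(e -> "  Field: " + e.label() + " -> " + e.value())
            .collect(Collectors.joining(System.lineSeparator()));
    }
}

class BusinessView implements ReportView {
    // Reuses the templated translation from the NarrativeRenderer sketch.
    private final NarrativeRenderer narrative = new NarrativeRenderer();
    public String render(List<TestEvent> events) {
        return narrative.render(events);
    }
}

Because both views consume the same event list, they cannot drift apart: regenerate the reports and each audience gets its own rendering of exactly the same run.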


Generating readable narratives isn't just about translation. It forces better test design. When test execution must produce coherent narratives, test authors think differently.


They consider: "Will this sequence of actions make sense to a business reader?" They organize tests around business scenarios rather than technical convenience. They use meaningful action descriptions instead of cryptic codes.


The narrative feature becomes a design constraint that improves test clarity. It's harder to write confusing tests when they must generate readable narratives. The feature serves double duty - making results accessible while encouraging better test structure.


Beyond Translation: Understanding Through Language

We often think of test reporting as an output problem - how to display results. But narrative generation is really a communication problem - how to help different people understand the same testing work.


Traditional test reports optimize for one thing: machine-readable status. Pass/fail, execution time, assertion counts. These metrics serve automated processes - CI/CD pipelines, trend analysis, alert triggers. They're designed for systems, not people.


Human understanding requires different information. People need:

 

  • Context: What business scenario was this?

  • Intent: What rules or behaviors was it validating?

  • Outcome: Did it work as expected?

  • Implications: If it failed, what does that mean?

 

A narrative bridges this gap. It takes machine-optimized test execution and translates it into human-optimized comprehension. Not by changing the test, but by presenting its execution in language that matches how people think about the domain.


But here's what makes this challenging: narrative generation requires human judgment embedded in the test framework. Someone must decide what information matters for human understanding, what technical details to hide, how to structure the flow. These decisions shape whether narratives actually help people understand or just add noise.

Consider language itself.


The CenterTest Narrative feature supports multiple languages - English, Polish, and others through configuration. Why does this matter? Testing happens in global organizations where domain experts speak different languages. An underwriter in Warsaw shouldn't need to decode English logs to verify Polish insurance rules are properly tested.
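
A plausible shape for that configuration - again an assumption for illustration, not the framework's actual format - is simply one template set per locale, chosen when the report is generated:

import java.util.Map;

// Hypothetical locale-keyed narrative templates; the Polish phrasing is
// illustrative, not taken from CenterTest's shipped translations.
class NarrativeLocales {
    static final Map<String, Map<String, String>> TEMPLATES_BY_LOCALE = Map.of(
        "en", Map.of(
            "set_field", "When: Setting %s to %s",
            "verify",    "Then: Verifying %s equals %s"),
        "pl", Map.of(
            "set_field", "Gdy: Ustawianie pola %s na %s",
            "verify",    "Wtedy: Weryfikacja, że pole %s wynosi %s"));

    // Pick the template set for the reader's locale, defaulting to English.
    static Map<String, String> forLocale(String locale) {
        return TEMPLATES_BY_LOCALE.getOrDefault(locale, TEMPLATES_BY_LOCALE.get("en"));
    }
}

The same run can then be reported in whichever language its readers actually work in.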


This seems like a minor implementation detail. But it reveals a deeper principle: making testing transparent to business stakeholders means meeting them in their language - both literally (Polish vs English) and conceptually (business scenarios vs technical assertions).

The most valuable features aren't always the most technically sophisticated.


Narrative generation doesn't change test execution or improve test coverage. It simply makes existing testing visible to people who couldn't see it before - without the architectural compromises that BDD requires. Tests remain technically straightforward with full programming flexibility. The readability comes from generation, not from constraining the source.


What This Means for Testing

So where does this leave us? Test execution can serve two purposes: machine validation and human understanding. We've optimized for the first while neglecting the second. Narrative generation addresses that neglect.


But it's not just about generating reports. It's about recognizing that testing is collaborative work. When business stakeholders can read test narratives, they verify that tests match business intent, identify missing scenarios from domain knowledge, and participate meaningfully in quality discussions. This changes testing from purely technical work to shared work - adding business verification as a dimension that code reviews alone can't provide.


Is this always necessary? Context matters. A team working on highly technical infrastructure might not need business-readable narratives. Their stakeholders are all engineers. The technical test reports serve everyone's needs. Adding narrative translation would be overhead without benefit.


But teams building business applications - policy systems, banking platforms, healthcare software - operate differently. Their domain experts aren't programmers. Their quality depends on correctly implementing complex business rules. For these teams, CenterTest's Narrative feature isn't a nice-to-have. It's a communication tool that enables collaboration between technical and business expertise without the architectural overhead of BDD.


The next time someone proposes BDD for business readability, pause and ask: "Do we need business-readable source code, or business-readable documentation?" These aren't the same thing. BDD gives you both, but at the cost of three-layer complexity. Narrative generation gives you efficient technical code and readable documentation separately - each optimized for its purpose.


And when someone proposes generating narrative reports from test execution, don't ask "How much work is this?" Ask instead: "Who needs to understand our testing but currently can't? What decisions would they make differently if they could read our test scenarios? Are we willing to maintain three layers of abstraction to achieve that, or could we generate the readability we need from technical tests?"


The answers might reveal that the most valuable testing improvement isn't adopting BDD or writing more tests. It's keeping test code technically clean while generating the human understanding that business collaboration requires.
