I’ve spent most of my career working on teams that cared deeply about automated testing. We wrote tests before we wrote code. We maintained coverage thresholds and argued about the merits of unit tests versus integration tests. We did the right things.
And yet, we still deployed bugs. We still got burned by flaky tests that eroded confidence until failures were routinely ignored. We still had “it works on my machine” moments that nobody properly investigated until a production incident forced us to. We had test suites that felt comprehensive but couldn’t tell us which tests were actually earning their place.
The problem wasn’t that we weren’t testing. The problem was that we were flying blind.
Every test run produces data. How long it took. Which tests passed and which failed. Whether it ran on a developer’s laptop or in CI. Which environment variables were set. That data, produced hundreds of times a day across a typical engineering team, was being thrown away. Or at best, buried in a CI log that nobody was reading.
Obvyr is what I built to change that.
What Obvyr is
Obvyr is a test execution tracking platform. It captures data from every automated test run, whether on a developer’s machine or in a CI/CD pipeline, and turns that data into insight over time.
The insight it surfaces answers questions that traditional testing dashboards can’t:
- Which tests fail inconsistently? Not just “did this test fail today” but “has this test ever failed without a genuine bug to justify it?”
- Where do local and CI environments diverge? Which configuration differences are causing tests to behave differently depending on where they run?
- Which tests are actually protecting you? Of your thousands of tests, which ones have caught real bugs, and which are noise?
- Is test quality improving or degrading? Not a snapshot from a single run, but a trend across weeks and months of executions.
These questions matter because test suites are not static. They degrade. Tests that were reliable six months ago become flaky. Coverage that felt sufficient develops gaps where bugs keep slipping through. Without data collected over time, you can’t see it happening until something goes wrong.
How it works
Obvyr has three parts.
The collectors are how execution data gets into Obvyr. Right now there are two: a CLI tool that wraps any test command directly, and a Gradle plugin for JVM projects. The CLI works with any language or framework; you run your tests through it and it handles the rest. The Gradle plugin integrates at the build level, which suits teams already running Gradle-based workflows. The goal is to support as many build systems and environments as possible, so more integrations are planned.
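To make the collector idea concrete, here is a minimal sketch of what wrapping a test command and capturing execution metadata could look like. Everything in it, including the function name and the event fields, is illustrative and assumed for this post; it is not Obvyr's actual CLI or schema.

```python
import os
import subprocess
import time


def run_and_capture(command):
    """Run a test command and capture execution metadata.

    Illustrative only: the event shape here is a hypothetical stand-in,
    not Obvyr's real collector format.
    """
    start = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    duration = time.monotonic() - start
    return {
        "command": command,
        "exit_code": result.returncode,
        "duration_seconds": round(duration, 3),
        # Most CI providers set CI=true; locally it is usually unset.
        "ran_in_ci": os.environ.get("CI") == "true",
        "captured_at": time.time(),
    }


event = run_and_capture(["python", "-c", "print('1 test passed')"])
print(event["exit_code"])  # 0 when the wrapped command succeeds
```

The point of the wrapper approach is that it is framework-agnostic: the collector only needs an exit code, timing, and environment context, which any test command provides.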
The API receives, stores, and processes that data. It uses an event-driven pipeline to extract structured results from each execution, build health profiles for individual tests over time, and produce aggregated metrics for each project. All data is scoped to your account.
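As an illustration of the kind of per-test profile such a pipeline might build, here is a sketch that folds a test's chronological pass/fail history into a pass rate and a flip count (consecutive runs where the outcome changed), a common rough signal of flakiness. The names and the metric are assumptions for this example, not Obvyr's actual model.

```python
from dataclasses import dataclass


@dataclass
class TestHealth:
    """Aggregated profile for one test. Field names are illustrative."""
    runs: int
    pass_rate: float
    flips: int  # outcome changes between consecutive runs


def build_profile(outcomes):
    """Fold a chronological list of pass/fail booleans into a profile.

    A test with a high pass rate but many flips is a flakiness
    candidate; a test that fails consistently points at a real bug.
    """
    passes = sum(outcomes)
    flips = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return TestHealth(
        runs=len(outcomes),
        pass_rate=passes / len(outcomes) if outcomes else 0.0,
        flips=flips,
    )


# True = pass, False = fail, ordered oldest to newest.
profile = build_profile([True, True, False, True, True, False, True])
print(profile.pass_rate)  # 5/7 ≈ 0.714
print(profile.flips)      # 4
```

The value comes from running this fold continuously over every execution, which is why the data has to be collected in the first place.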
The UI is where the insight lives. Dashboards show test health trends, surface flaky tests with their failure patterns, and let you filter by environment, tag, or time range. It’s where the raw data becomes something you can act on.
Why now
The timing feels important. AI coding tools are accelerating the rate at which engineers produce code, and test code is no exception. AI-generated tests can look thorough without being meaningful. Coverage numbers can look healthy while hiding gaps. The distance between how fast code is being written and how rigorously it’s being validated is widening.
Obvyr is, among other things, a response to that. When AI is writing your tests, you need evidence to validate that those tests are doing their job. Not a coverage percentage from a single run, but behavioural data from thousands of runs across real environments.
Who it’s for
Obvyr is for engineering teams that take automated testing seriously and want to move from assumption to evidence. If you run automated tests regularly and want to understand what that investment is actually worth, it’s for you.
It works best in teams that have adopted continuous integration, because CI is where the richest data lives. The more often your tests run, the clearer the patterns become.
Try it
Obvyr is in early access. If this sounds like something your team needs, sign up at obvyr.com. I’d love to hear what you think.