
Self-healing test automation has been getting a lot of attention lately. The promise is simple: tests that fix themselves when the UI changes, reducing maintenance and keeping pipelines green.
If you’ve spent time dealing with brittle selectors, that sounds like a win. And to be fair, it is.
But it also misses the bigger issue.
What self-healing actually improves
Most self-healing solutions operate at the interaction layer. When a locator changes or the DOM shifts, the test adapts and continues execution.
That’s useful. It cuts down on noisy failures and saves time that would otherwise go into updating selectors. For teams with large UI suites, this alone can make a noticeable difference.
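To make the mechanism concrete, here is a minimal sketch of what interaction-layer healing boils down to: if the primary locator stops matching, try progressively looser alternatives for the same control. It uses Playwright's Python API; the find_with_fallbacks helper and the selectors are illustrative assumptions, not any vendor's actual healing engine.

```python
# Minimal sketch of interaction-layer "healing": if the primary locator no
# longer matches, fall back to other ways of identifying the same control.
# Playwright sync API; helper and selectors are illustrative only.
from playwright.sync_api import Locator, Page


def find_with_fallbacks(page: Page, candidates: list[Locator]) -> Locator:
    """Return the first candidate locator that resolves to a visible element."""
    for locator in candidates:
        if locator.count() > 0 and locator.first.is_visible():
            return locator.first
    raise AssertionError("No candidate locator matched the current page")


def place_order(page: Page) -> None:
    button = find_with_fallbacks(page, [
        page.locator("#place-order"),                     # original id
        page.get_by_role("button", name="Place Order"),   # accessible name
        page.get_by_text("Place Order"),                  # visible text
    ])
    button.click()
```

Real tools typically do this with recorded element attributes and ranking heuristics rather than a hardcoded list, but the effect is the same: the click still lands.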
But it only addresses one slice of the problem.
Where brittleness really comes from
Tests don’t break just because an element moved. They break because they are written with rigid assumptions: a fixed sequence of steps, specific elements to interact with, and a precise expected outcome.
Even if the selectors are healed, everything else remains tightly coupled to how the system behaves today. That coupling becomes fragile as soon as the product evolves. UI flows change. Features get personalized.
AI-driven components introduce another layer of variability. Outputs can differ not just in wording, but in quality. A recommendation engine may surface relevant products for one user and irrelevant ones for another. An LLM-powered support response may be helpful in one run and confidently wrong in the next.
Consider a checkout test that verifies a user can complete a purchase. Self-healing keeps it running when the “Place Order” button gets restyled or moved. But suppose the team adds a new upsell modal between cart and payment, and the modal occasionally shows a misleading discount. The test still finds its way to the confirmation page. It still passes.
The test passed. The user experience didn’t.
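Roughly, such a test looks like the sketch below (Playwright's Python API again; the selectors, URLs, and modal copy are made up). Every assertion is about reaching the next page, so nothing in it can notice that the discount shown in the modal was misleading.

```python
# A checkout test that stays green even when the upsell modal misleads the
# user. Selectors and URLs are illustrative only.
import re
from playwright.sync_api import Page, expect


def test_user_can_complete_purchase(page: Page) -> None:
    page.goto("https://shop.example.com/cart")
    page.get_by_role("button", name="Checkout").click()

    # The new upsell modal: the test simply dismisses it and moves on.
    # Nothing here evaluates whether the advertised discount was honest.
    page.get_by_role("button", name="No thanks").click()

    page.get_by_role("button", name="Place Order").click()

    # The only real assertion: we reached the confirmation page.
    expect(page).to_have_url(re.compile(r"/order-confirmation"))
```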
Scenarios like this expose two distinct failure modes: structural brittleness and behavioral brittleness.
Why intent matters
To close that gap, tests need to move beyond scripts.
What matters is not whether a specific element was clicked. What matters is whether the system achieved the intended outcome. Did the user complete the purchase without friction? Did the search return useful results? Did the system respond in a safe and coherent way?
This is where intent starts to matter more than the script.
In practice, this shows up in a few ways:
- Assertions written against outcomes and the path to them, not individual steps
- LLM-as-judge evaluations for non-deterministic outputs
- Property-based checks that validate invariants across many inputs
- Visual or semantic diffing that flags meaningful UX regressions, not pixel noise
The shift is from “did the script execute” to “did the system meet its goal within acceptable bounds.”
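To make two of those bullets concrete, the sketch below pairs a property-based invariant (using Hypothesis) with an LLM-as-judge check. The pricing function, the support-reply generator, and the call_judge_model helper are stand-ins invented for illustration; the shape of the assertions is the point, not the implementations.

```python
# Intent-level checks: an invariant over many inputs, and a rubric-based
# judgement of a non-deterministic output. Domain logic is a stand-in.
from hypothesis import given, strategies as st


def apply_discount(subtotal: float, discount_pct: float) -> float:
    """Stand-in for the pricing logic under test."""
    return round(subtotal * (1 - discount_pct / 100), 2)


# Property-based check: the invariant must hold for *any* cart,
# not just one scripted example.
@given(
    subtotal=st.floats(min_value=0, max_value=10_000, allow_nan=False),
    discount_pct=st.floats(min_value=0, max_value=100, allow_nan=False),
)
def test_total_stays_between_zero_and_subtotal(subtotal, discount_pct):
    total = apply_discount(subtotal, discount_pct)
    assert 0 <= total <= subtotal + 0.01  # small tolerance for rounding


# LLM-as-judge: grade a non-deterministic output against a rubric instead of
# matching exact text. Both functions below are placeholders for the team's
# own model calls.
RUBRIC = (
    "Score 1-5: is this support reply grounded in the provided policy, "
    "and does it avoid promising anything the policy does not allow?"
)


def generate_support_reply(question: str) -> str:
    """Placeholder for the AI feature under test."""
    return "You can return an unused mattress within 30 days of delivery."


def call_judge_model(rubric: str, candidate: str) -> int:
    """Placeholder for a call to a judge model that returns a 1-5 score."""
    return 5


def test_support_reply_is_safe_and_grounded():
    reply = generate_support_reply("Can I return a used mattress?")
    score = call_judge_model(rubric=RUBRIC, candidate=reply)
    assert score >= 4, f"Judge scored reply {score}: {reply!r}"
```

Neither of these checks cares which button was clicked; both fail only when the outcome drifts outside acceptable bounds.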
It is tempting to frame this as a choice: either invest in self-healing or move to intent-based testing. In reality, the best results come from combining both.
Self-healing stabilizes the interaction layer. It reduces noise from UI changes and keeps automation usable at scale. Intent-based testing operates at a higher level. It evaluates whether the system is actually doing the right thing from a user and product perspective.
Together, they address two different kinds of brittleness:
- Structural brittleness, where tests fail because the UI changed
- Behavioral brittleness, where tests pass (or fail) based on assumptions about the product that no longer hold
Ignoring either one leaves gaps.
Rethinking what “stable” means
A stable test suite is not one where tests always pass. It is one where test results are meaningful.
If tests fail because a selector changed, they are not useful. If tests pass even when the user experience regresses, they are not useful either.
Stability comes from reducing noise at the bottom and improving signal at the top. Self-healing helps with the first. Intent-based testing helps with the second.
Where this leaves us
Automation is going through a shift. Systems are more dynamic, more personalized, and increasingly influenced by AI. The gap between test scripts and real user outcomes only grows wider in these environments.
Self-healing is a step forward, but it is not a complete answer. It makes existing tests easier to maintain. It does not make them better at measuring quality.
The teams that get this right will stop measuring their suites by how many tests pass, and start measuring them by how much they actually know about their product when the pipeline turns green.




