Beyond the Perfect Test: Building Confidence Through Strategic Testing

Strategic Insights, November 2025 ed.

Nov 25, 2025

Industries with high consequences (medical, aerospace, automotive) have evolved sophisticated testing strategies that balance regulatory requirements, budget constraints, and time-to-market.

Great engineers don’t find the perfect test. They stack imperfect evidence until confidence exceeds the decision threshold.

We introduced evidence stacking in this month’s AMA: “Investigate It: How to design tests that actually move your confidence needle, and when to stop testing.”

Let’s look at evidence stacking through a case study from medical devices.

A case study in evidence stacking

In the medical device industry, manufacturers must prove shelf-life stability. Sterilization affects the expiration date, yes. But there’s more to shelf-stable medical products than maintaining sterility.

Sterilization itself is a grueling process to put anything through. We’re exposing products to high temperatures, humidity, and cycling (when using EtO methods). Then there are similar cycles in storage. Add vibration (plus more temperature and humidity cycling) from shipping and handling, and this product is seeing a lot of stress. All of this is before the product is used, when it is exposed to even more stresses from the use environment: being wrenched, pulled, inserted into a body, connected to other things, removed from the body.

Then consider the packaging itself. It must maintain a sterile barrier between the device inside and the outside world until the expiration date. It follows that the longer the shelf life, the more beneficial it is to customers. A 5-year shelf life helps a hospital that is managing stock of medical devices across several sites, and gives ample time for planning and logistics. But waiting 5 years for real-time aging before giving this new, potentially life-saving product to patients is not a good option, either.

So how, then, do the engineers at medical device manufacturers ensure their product is still stable and safe after all of that? And get it to market in a reasonable time, to help people? And do it all under the scrutiny of the FDA and other regulatory bodies?

They use evidence stacking. Medical device manufacturers use a testing portfolio approach to establish shelf life claims when real-time testing would take too long.

Here’s how it works.

Each action informs the next one. Time and cost increase with each step, but each step also boosts our confidence that the product will perform as needed at the end of its shelf life.

Our Hypothetical Case Study Could Be:

Starting confidence: 35%. It’s low because we have no data on the shelf life of this new polymer that we’re molding.
After literature search on polymer degradation: 40% (+5%) A natural first step, and it’s fast and easy.
After material datasheet review: 45% (+5%) Also fast and easy. Now that we know a little more and have reviewed it with our team, our confidence is boosted.
After accelerated aging calculation/reliability life analysis: 55% (+10%) We’ve now run simulations on the information we have. We’re feeling better about it but really need data from our own parts.
After accelerated aging tests on components: 65% (+10%) Now, we’ve tested the new polymer as used on our parts. Even if it’s a prototype, it more closely represents how we’ll be using it. We run the simulations again, and the results still look promising.
After 3-month accelerated test passes: 80% (+15%) The data is following our model. We’re feeling really good about this. Plus, we’re showing that our safety margin is not being used up over time.
After first real-time data point (6 months): 90% (+10%) The real-time data is now also following our model. We’re feeling really good about our decisions.

Total time: 6 months vs. 3 years | Total cost: $180K vs. $500K (full real-time only)
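The running totals above are simple addition, and it can help to keep them in a small ledger. Here is a minimal Python sketch of that bookkeeping. The function names and the 80% decision threshold are assumptions for illustration only; the step names and percentage-point gains are taken from the hypothetical case study.

```python
# Hypothetical ledger of the case-study steps:
# (engineering action, confidence gain in percentage points).
STARTING_CONFIDENCE = 35

STEPS = [
    ("Literature search on polymer degradation", 5),
    ("Material datasheet review", 5),
    ("Accelerated aging calculation / reliability life analysis", 10),
    ("Accelerated aging tests on components", 10),
    ("3-month accelerated test passes", 15),
    ("First real-time data point (6 months)", 10),
]


def stack_evidence(start, steps):
    """Return (action, running confidence) pairs after each step."""
    running = start
    history = []
    for action, gain in steps:
        running += gain
        history.append((action, running))
    return history


def first_step_reaching(history, threshold):
    """Return the first action whose running confidence meets the threshold, or None."""
    for action, confidence in history:
        if confidence >= threshold:
            return action
    return None


history = stack_evidence(STARTING_CONFIDENCE, STEPS)
for action, confidence in history:
    print(f"{confidence:3d}%  after: {action}")

# The 80% threshold is an assumed example, not a recommendation.
print("Threshold of 80% first met after:", first_step_reaching(history, 80))
```

The numbers themselves are still engineering judgment. The ledger only makes it easy to see which step is expected to carry us past the decision threshold.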

With each step, we’re improving our confidence in our design decisions. I list reliability life analysis as an analysis/simulation step, but it can continue to be used after actual test results are available.

And look at the difference! Answers in 6 months for $180K, versus 3 years and $500K.

This is standard practice for medical device manufacturers. After a while, it becomes a routine part of design development and verification/validation. When a manufacturer creates a family of products with the same materials and basic manufacturing methods, the process becomes even more routine and standardized.

The caveat in this: manufacturers must follow up their accelerated aging testing with REAL TIME aging, setting aside product to sit on the shelf for 1 to 5 years in real time. There is a small probability that the real-time aging fails even though the accelerated aging passed. If the real-time aging contradicts the accelerated aging, recalls may be needed. But years of practice have shown that this evidence stacking is worth doing and is accepted practice.

“The primary reason for using accelerated-aging techniques in the qualification testing of a medical device is to bring the product to market at the earliest possible time... Applying accelerated-aging test techniques in conjunction with a comprehensive knowledge of the materials involved is a prudent method of doing business, with the benefits of early product introduction far outweighing the minimal risk of premature product failure.” (1)
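For readers who haven’t run an accelerated aging calculation, a common simplified approach is the Q10 temperature rule: the aging rate is assumed to multiply by a factor Q10 for every 10°C rise in temperature. The Python sketch below shows that arithmetic. The function names, the Q10 of 2.0, and the 55°C and 25°C temperatures are assumptions chosen for illustration; they are not values from the case study above, and a real protocol would justify them for the specific materials.

```python
def accelerated_aging_factor(q10, accelerated_temp_c, ambient_temp_c):
    """Q10 rule: aging rate multiplies by q10 for each 10 degC above ambient."""
    return q10 ** ((accelerated_temp_c - ambient_temp_c) / 10.0)


def accelerated_aging_days(real_time_days, q10, accelerated_temp_c, ambient_temp_c):
    """Days in the chamber needed to simulate the given real-time shelf life."""
    factor = accelerated_aging_factor(q10, accelerated_temp_c, ambient_temp_c)
    return real_time_days / factor


# Assumed example inputs, for illustration only.
Q10 = 2.0               # a commonly used, conservative assumption
CHAMBER_TEMP_C = 55.0   # accelerated aging temperature
AMBIENT_TEMP_C = 25.0   # assumed storage temperature
SHELF_LIFE_DAYS = 3 * 365

factor = accelerated_aging_factor(Q10, CHAMBER_TEMP_C, AMBIENT_TEMP_C)
days = accelerated_aging_days(SHELF_LIFE_DAYS, Q10, CHAMBER_TEMP_C, AMBIENT_TEMP_C)
print(f"Aging factor: {factor:.1f}x, chamber time: {days:.1f} days")
```

The calculation is only as good as its assumptions about the material, which is why it counts as one layer of evidence and why real-time aging still follows.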

It’s not just accelerated aging. It’s evidence stacking.

Evidence Stacking as a Portfolio Concept

We covered some reasons why we’d want a portfolio: to save time and money and increase our confidence in our decisions.

The other benefit is that a portfolio helps validate our assumptions. If we “put all of our eggs in one basket” with one big test, that brings into question the assumptions we made in designing and performing that one test.

Did we overlook a significant source of real-life stress in our test design?
When we performed the test, did we account for the variables and repeatability of our results?
How does our test process relate to the field?

By developing a test portfolio, we increase our confidence in our decisions by having a varied look at the problem we’re trying to solve. This also helps insulate us from our mistakes in our confidence assignment: either our over-confidence or our under-confidence.

Notice, too, that we do the cheap and fast thing first, saving the expensive actions for last. This is strategic sequencing. First we learn, then we act.

Medical device manufacturers use this portfolio today. Aerospace and automotive industries are adding another layer: digital twins.

Adding to the portfolio: digital twins

Digital twins are the emerging middle layer between analysis and testing. Digital twins collect sensor data from what’s in use or in the field. Aerospace and automotive industries are capitalizing on these digital twins. Airbus, for example, is “building each aircraft twice: first in the digital world, then in the real one”. (2)

What does this give the engineers?

Earlier testing and validation of hardware and software. In fact, Siemens shows a 75% improvement in first-pass yield for engineering designs (fewer design revisions). (3)
Optimized initial design, manufacturing, and operations. Again, Siemens shows a 25% reduction in physical test programs by using virtual testing. (3)
Predictive maintenance. Reliability life models and real-time data can help predict when maintenance is needed.

Stopping Rules

The other side of the coin is that we can test too much!

Again, this is where defining the ‘problem’ and our confidence boost concept can come in handy. Do a reality check on the gap between what your problem requires and where you are now, given whatever steps you’ve taken in the “engineering actions” portfolio.

What will one more test or analysis really get you? We’ll never reach 100% confidence. Use a risk approach to determine what level of confidence you really need to move forward with a decision. If our decision carries a lot of risk (to the users, the project, or other measures we identified in “Frame-It”), then we’ll need more confidence in our decision. Not all decisions carry the same weight, and not all decisions have to be vetted to 99% confidence.

Another way we can evaluate when to stop is to base it on cost. Will the cost of the next test be greater than the information value that it adds? We’ll look at that scenario, and a simple way to compare choices, in next month’s AMA post, “Choose-it”. Next month, we’ll conclude our 3-month series about late-stage design decisions.
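The two stopping checks above (have we reached the confidence the risk calls for, and is the next test worth its cost?) can be written down as a simple rule. This is a minimal Python sketch, not a formal method: the function name, the dollar figures, and the idea of putting a single dollar value on information are all assumptions for illustration.

```python
def should_run_next_test(current_confidence, required_confidence,
                         test_cost, information_value):
    """Return True only if we still need confidence AND the test is worth its cost.

    Confidences are in percent; test_cost and information_value share one
    currency. All inputs are engineering estimates, not measured quantities.
    """
    # Stopping rule 1: we already meet the risk-based confidence threshold.
    if current_confidence >= required_confidence:
        return False
    # Stopping rule 2: the next test costs more than the information it adds.
    # A False here means "pick a different action", not "we are done".
    return information_value > test_cost


# Assumed example numbers, for illustration only.
print(should_run_next_test(80, 90, test_cost=40_000, information_value=120_000))  # True
print(should_run_next_test(92, 90, test_cost=40_000, information_value=120_000))  # False
print(should_run_next_test(80, 90, test_cost=150_000, information_value=120_000))  # False
```

A higher-risk decision simply raises required_confidence, which keeps the first check from stopping us too early.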

What design decision is keeping you up at night? Hit reply and let me know—your challenges shape my content. And if you need hands-on help: Book a discovery call

(1) “General Aging Theory and Simplified Protocol for Accelerated Aging of Medical Devices.” Medical Device + Diagnostic Industry, 22 Nov. 2023, https://www.mddionline.com/design-engineering/general-aging-theory-and-simplified-protocol-for-accelerated-aging-of-medical-devices.

(2) “Digital Twins: Accelerating Aerospace Innovation from Design to Operations.” Airbus, 24 Apr. 2025, https://www.airbus.com/en/newsroom/stories/2025-04-digital-twins-accelerating-aerospace-innovation-from-design-to-operations.

(3) Careless, James. “Digital Twinning: The Latest on Virtual Models.” Aerospace Tech Review, 29 Aug. 2021, https://aerospacetechreview.com/digital-twinning-the-latest-on-virtual-models.

Quality during Design
