MTBF, MTTR, MTTF, and FIT: Design Reliability Measures for Electronics

January 16, 2020

Cadence PCB Solutions

Picture of the North Pole with a thumbtack at its center on a map

The year 1897 gave us Edison’s Kinetoscope, a precursor to the videos that occupy everyone’s time, and the discovery of the electron by English physicist J.J. Thomson. That same year, Swedish engineer S.A. Andrée innovated with then state-of-the-art transportation technologies.

Andrée, Nils Strindberg, and Knut Fraenkel set out on a real mission of scientific discovery by attempting to float over the North Pole in a 97-foot-tall, varnished silk balloon filled with hydrogen. Unfortunately for the crew, their beloved Örnen could not achieve the buoyancy needed to attain cruising altitude or to maintain any speed. After bumping along the ice in the suspended basket, the Swedes landed the balloon halfway between their launch point and the North Pole. All three died of exposure.

Design Reliability in an Era of Predictability

Despite all the inventions of the era, the crew of the Örnen could not rely on sensors connected to the IoT or predictive maintenance. Nor could they consider reliability predictions that calculated and forecast the design feasibility, potential failure areas, system design factors, and reliability improvement issues for hydrogen balloon components functioning in very cold conditions.

Different measures help evaluate the likelihood of success and help determine whether a system requires redundancy in the form of back-up systems, sub-systems, assemblies, and components. Reliability, or R(t), is the probability that a component or system remains operable up to time t. In this context, reliability is expressed as a probability from zero to one.

The concept of reliability varies slightly between repairable items such as an aerospace guidance system and non-repairable items such as semiconductors that we happily throw away after the first failure. We define a system as repairable if we can restore the system to its normal operating point through component replacement or through repairs when a failure happens. For non-repairable items, reliability is the probability that the item will perform its desired function without failure for a stated period of time under specific conditions. For repairable items, we see reliability as the probability that the component or system will not fail during the time interval zero to t1.
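To make the idea of reliability as a probability between zero and one concrete, here is a minimal sketch. It assumes a constant failure rate, in which case R(t) = exp(-λt). That assumption, the function name, and the sample numbers are illustrative only and do not come from any particular datasheet.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Return R(t) = exp(-lambda * t).

    Assumption: the failure rate is constant over the interval 0..t,
    which is a simplification and not true of every part.
    """
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


# Hypothetical part: 2 failures per million hours, evaluated over
# one year of continuous operation (8,760 hours).
r_one_year = reliability(2e-6, 8760)
print(f"R(8760 h) = {r_one_year:.4f}")
```

The result always falls between zero and one, and it shrinks as either the failure rate or the time interval grows.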

Feet Don’t Fail Me Now!

Every reliability prediction has a basis in failure rates. A conditional failure rate tells us about the anticipated number of times that a component or system will fail within a specific time period. Calculations based on complex models measure the reliability of the item. A reliability prediction model may include temperature, environmental, mechanical stress, and other types of data.

Let’s consider how these predictors work for repairable and non-repairable items. Although the definitions differ, both types of items have decreasing, constant, and increasing failure rates.

Failure Patterns	Repairable Items	Non-Repairable Items
Decreasing Failure Rate	Reliability improves with progressive repair	Item becomes less likely to fail as the survival time increases
Constant Failure Rate	Externally induced failures	Application of loads at a constant average rate in excess of design specifications
Increasing Failure Rate	Component or equipment has aged beyond useful life	Failure rate increases because of material fatigue or mechanical weakness caused by cyclic loading
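
One common way to model the three failure patterns in the table is the Weibull hazard function, where a shape parameter below one gives a decreasing failure rate, a shape of exactly one gives a constant rate, and a shape above one gives an increasing rate. The article does not name a specific distribution, so treat the Weibull model, the parameter values, and the function name below as assumptions made for illustration.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative shape values only; the scale is an arbitrary 1,000 hours.
patterns = {"decreasing": 0.5, "constant": 1.0, "increasing": 3.0}

for name, shape in patterns.items():
    early = weibull_hazard(100.0, shape, 1000.0)
    late = weibull_hazard(900.0, shape, 1000.0)
    print(f"{name:>10}: h(100 h) = {early:.6f}, h(900 h) = {late:.6f}")
```

Comparing the early and late hazard values for each shape shows the failure rate falling, holding steady, or rising over time.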

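Determining the patterns becomes more problematic when we consider complex systems that consist of repairable and non-repairable items. Because of those factors, tracking reliability involves arranging assembled components in a logical series structure, in which the failure of any one component causes the assembly to fail. That is, assuming constant and independent failure rates, the failure rate of an assembly equals the sum of the individual component failure rates, and the reliability of the assembly equals the product of the individual component reliabilities.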
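
For a logical series structure, the arithmetic can be sketched as follows. The sketch assumes constant, independent component failure rates, so the assembly failure rate is the sum of the component rates and the assembly reliability is the product of the component reliabilities. The component rates and function names are hypothetical.

```python
import math


def series_failure_rate(component_rates):
    """Failure rate of a series assembly: the sum of its component rates."""
    return sum(component_rates)


def series_reliability(component_rates, hours):
    """Product of exp(-rate_i * t), which equals exp(-sum(rate_i) * t)."""
    return math.exp(-series_failure_rate(component_rates) * hours)


# Hypothetical component failure rates, in failures per hour.
rates = [1e-6, 5e-7, 2.5e-6]

total_rate = series_failure_rate(rates)
print(f"assembly failure rate = {total_rate:.2e} per hour")
print(f"assembly R(10,000 h)  = {series_reliability(rates, 10_000):.4f}")
```

Because every component must survive for the assembly to survive, the assembly is never more reliable than its weakest component.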

However, the complexity of the system also tells us that two sub-systems probably will not enter a failed state simultaneously. As a result, engineering teams build stochastic life models of a complex repairable system based on the accelerated life models of components. Stochastic models describe random events occurring within a continuum. With all this in mind, the system life model includes algorithms and software tools that determine stress on the components and the average availability and availability distribution for any number of failure modes regardless of complexity or size.

A Mean, Mean World

Rather than assume that a balloon will transport us across the Arctic, we use a range of measures for checking the reliability (the functional versus the non-functional state) of repairable products, hardware modules, non-repairable systems, and devices. Those measures include:

Mean Time Between Failure (MTBF)
Mean Time to Repair (MTTR)
Mean Time to Failure (MTTF)
Failure in Time (FIT)

Below you’ll find a synopsis of each of these measures.

Mean Time Between Failure (MTBF)

Mean Time Between Failure (MTBF) measures the average operating time that passes between failures of a repairable component, assembly, or system. Why do we care? In brief, MTBF can tell us when condition-based or preventive maintenance should occur. With the amount of time usually given in hours, MTBF is derived from actual failures in a large group of repairable products. “Mean time” represents the statistical mean over a long period of time and a large number of units. Rather than showing the typical life of a single product, MTBF represents a statistical measure over a large family of products.

We can view this in terms of the expected amount of time between two consecutive failures.

MTBF = Number of hours of operational time / Total number of failures.
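
As a quick worked example of the formula above, the sketch below computes MTBF for a made-up fleet. The fleet size, hours, and failure count are hypothetical.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean Time Between Failure: operating hours / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return operating_hours / failures


# Hypothetical fleet: 50 units, each run for 2,000 hours,
# with 4 failures observed across the whole fleet.
fleet_hours = 50 * 2000
print(f"MTBF = {mtbf(fleet_hours, 4):,.0f} hours")  # MTBF = 25,000 hours
```

Note that the 25,000-hour result exceeds the 2,000 hours any single unit actually ran, which is why MTBF describes a population and not the expected life of one product.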

Mean Time to Repair (MTTR)

Mean Time to Repair (MTTR) applies only to repairable items and equals the total amount of time used to perform all corrective or preventive maintenance repairs divided by the total number of repairs. In effect, MTTR represents the expected span of time from a failure to the completed repair, or:

MTTR = Total maintenance time / Total number of repairs.

Given this calculation, MTTR measures the efficiency of repair programs and the ability of organizations to respond to a repair issue. MTTR can serve as a factor in deciding whether to repair or replace assets, establishing repair inventories, and making rental/purchase decisions.
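
The MTTR formula can be sketched the same way. The repair log below is hypothetical, with each entry giving the hours spent on one repair.

```python
def mttr(repair_hours) -> float:
    """Mean Time to Repair: total maintenance time / number of repairs."""
    repairs = list(repair_hours)
    if not repairs:
        raise ValueError("MTTR is undefined without at least one repair")
    return sum(repairs) / len(repairs)


# Hypothetical repair log, in hours per repair.
repair_log = [1.5, 4.0, 2.5, 2.0]
print(f"MTTR = {mttr(repair_log)} hours")  # MTTR = 2.5 hours
```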

Mean Time to Failure (MTTF)

Mean Time to Failure (MTTF) evaluates the reliability of non-repairable items and equals the mean time expected until the first failure of a component, assembly, or system. For repairable items, MTTF equals the expected span of time from repair to the first or next failure.

MTTF = Total hours of operation / Total number of units.
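
Here is a minimal sketch of the MTTF formula. It assumes every unit in the sample ran until it failed, so the total hours of operation equal the sum of the unit lifetimes. The lifetimes are made-up values.

```python
def mttf(hours_to_failure) -> float:
    """Mean Time to Failure: total hours of operation / number of units.

    Assumption: every unit in the sample was run until it failed.
    """
    lifetimes = list(hours_to_failure)
    if not lifetimes:
        raise ValueError("MTTF is undefined without at least one unit")
    return sum(lifetimes) / len(lifetimes)


# Hypothetical lifetimes, in hours, for five non-repairable units.
unit_lifetimes = [9500, 10200, 11000, 9800, 10500]
print(f"MTTF = {mttf(unit_lifetimes):,.0f} hours")  # MTTF = 10,200 hours
```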

Failure in Time (FIT)

The Failure in Time (FIT) measure aligns with how organizations report MTBF information. A FIT analysis shows the number of expected failures per one billion device-hours of operation for a semiconductor device. Both FIT and MTBF provide information about performance as well as the availability and reliability of components.
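
Because FIT counts failures per one billion hours, it converts directly to and from MTBF when the failure rate is assumed constant. The sketch below shows that conversion. The constant-rate assumption, the function names, and the 200 FIT example value are illustrative.

```python
HOURS_PER_FIT_BASE = 1_000_000_000  # FIT is failures per one billion hours


def fit_to_mtbf_hours(fit: float) -> float:
    """Convert a FIT rate to MTBF in hours (assumes a constant failure rate)."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_FIT_BASE / fit


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Convert MTBF in hours to a FIT rate (assumes a constant failure rate)."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_FIT_BASE / mtbf_hours


# Hypothetical device rated at 200 FIT.
mtbf_hours = fit_to_mtbf_hours(200)
print(f"200 FIT -> MTBF of {mtbf_hours:,.0f} hours")  # 5,000,000 hours
print(f"round trip -> {mtbf_hours_to_fit(mtbf_hours):.0f} FIT")  # 200 FIT
```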

Two robotic assemblages comprised of circuits and wires

Systems, despite how carefully they are assembled, can still surprise us with their fallibility.

Someone Has to Pay

We can also consider reliability in terms of populations and life-cycle costs. In terms of population, the operation and maintenance of systems may differ by system or component type. For example, robotic systems include different types of repairable components that have different operations and maintenance needs than the systems used for aerospace vehicles. The size of the system population, the time needed to replace components, and the number of maintenance channels and routines all affect the life-cycle costs for the systems. Establishing a designed MTBF and an MTTR that meet design parameters assists with predicting and lessening life-cycle costs.

With any system design working towards improved reliability and more intentional lifetime management, Cadence's suite of design and analysis tools can help you achieve your means. Furthermore, using an industry-standard, customizable layout tool such as OrCAD is a great first step in getting your design out the door appropriately.

If you’re looking to learn more about how Cadence has the solution for you, talk to us and our team of experts.