Design Reliability

 

Image Credit: Shutterstock

 

Reliability Matters

 

The sun rises every day. It is reliable. In practical terms, it has always risen and will continue to do so. On a long enough timeline, however, our star will burn out. That's what the astrophysicists tell us. We humans and our Internet will be long gone by then. In the meantime, we deal with things breaking and refusing to work for us. 

 

On Aging Well

 

Time and heat conquer all electronics eventually. In many cases, the thermal energy itself isn't the culprit; cycling between running and resting temperatures is what ages the devices. Take two standard incandescent bulbs rated for 1,200 hours each. Most such bulbs will outlast that number but will fall short of the 10,000 hours you can expect from a CFL or the astonishing 25,000 hours of an LED. These numbers come from a comparison that is worth a look if you like your interiors and exteriors illuminated at night at minimum ongoing cost.

 

Meanwhile, these two incandescent bulbs are going to be part of a thought experiment. As background, there is a bulb at a fire station in Livermore that was switched on about 114 years ago and has not been turned off, failed, or been replaced in all that time. The lightbulb has its own website and Wikipedia page. It is reportedly burning at the brightness of a 4 Watt bulb, though it was initially a 30 or 60 Watt bulb. That is a degradation to useless levels, but it gives an upper limit for lifespan. Replicas are available!

 

We'll start our light-a-thon experiment by switching our two bulbs on right now. One will remain on (if the grid holds up) and the other will be switched off and on in one-minute cycles. The second one will only burn 50% of the time. I'm betting that it fails before the continuously burning light. I'll revisit this article in the year 2132 to see which one lasted longer.
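
For a rough sense of scale, the arithmetic behind the wager can be sketched as follows. The 1,200-hour rating and the 50% duty cycle come from the text above. Reading "one-minute cycles" as one complete on/off cycle per minute is an assumption, and the function name is only illustrative.

```python
RATED_HOURS = 1200      # rated life of each bulb, from the article
DUTY_CYCLE = 0.5        # the cycled bulb burns 50% of the time
CYCLE_MINUTES = 1.0     # assumption: one full on/off cycle per minute


def days_to_rated_hours(rated_hours, duty_cycle):
    """Calendar days needed to accumulate the rated burn hours."""
    return rated_hours / (24 * duty_cycle)


continuous_days = days_to_rated_hours(RATED_HOURS, 1.0)
cycled_days = days_to_rated_hours(RATED_HOURS, DUTY_CYCLE)
# Number of cold starts the cycled bulb has endured by that point.
cycles_by_then = cycled_days * 24 * 60 / CYCLE_MINUTES

print(f"Continuous bulb reaches its rated hours after {continuous_days:.0f} days")
print(f"Cycled bulb reaches its rated hours after {cycled_days:.0f} days")
print(f"By then it has been switched on {cycles_by_then:,.0f} times")
```

The cycled bulb takes twice as long on the calendar to reach its rated burn time, but it goes through a cold start every minute along the way. That count of thermal cycles is what the bet is really about.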

 

The Bathtub Curve

 

There are two periods over the course of a product lifetime when defective units can be expected to occur. One is the so-called “infant mortality” period, where the product was doomed from the start and will not work without repair. The other is when the weakest part of the product has worn out from regular use. In between these two times, there is (hopefully) a period with very little in the way of failure.
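
One common way to picture this shape is to add up three failure mechanisms, each modeled with a Weibull hazard rate: a falling rate for infant mortality, a constant rate for random failures, and a rising rate for wear-out. The sketch below is only an illustration. The Weibull model is a standard textbook choice rather than something the article specifies, and every shape and scale value is a made-up number.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Sum of three failure mechanisms; all parameters are illustrative."""
    infant = weibull_hazard(t, shape=0.5, scale=5_000.0)    # rate falls with time
    random_ = weibull_hazard(t, shape=1.0, scale=50_000.0)  # rate is constant
    wearout = weibull_hazard(t, shape=5.0, scale=60_000.0)  # rate rises with time
    return infant + random_ + wearout


for hours in (10, 100, 1_000, 10_000, 40_000, 80_000):
    print(f"{hours:>6} h  hazard = {bathtub_hazard(hours):.2e} failures/hour")
```

With these numbers the printed rate drops steeply at first, levels out in the middle, and climbs again near the end, which is the bathtub profile.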

 

Early detection of infant mortality takes place at the factory. Stress testing the units on a shaker table will reveal mechanical concerns. The burn-in room will be a true test of the electronics. Beyond the "shake and bake" tests, accelerated aging techniques include a salt-fog chamber, microwaves, and over-powering and under-powering the components. Basically, we're amplifying the effects of anything the unit is likely to encounter in the field. Make the unit operate outside of its limits and fix whatever breaks first. Repeat until it meets the requirements.
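
To get a feel for how much field time a hot burn-in can stand in for, one widely used model for thermally driven failure mechanisms is the Arrhenius relation. The article does not name a model, so treat this as a sketch under stated assumptions: the 0.7 eV activation energy and both temperatures are placeholder values, and the real numbers depend on the failure mechanism being accelerated.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(use_temp_c, stress_temp_c, activation_energy_ev=0.7):
    """Acceleration factor between a use temperature and a stress temperature.

    The default activation energy is an assumed placeholder, not a measured value.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1 / use_k - 1 / stress_k)
    return math.exp(exponent)


# Hypothetical case: product normally runs at 55 C, burn-in chamber at 125 C.
factor = arrhenius_acceleration(use_temp_c=55.0, stress_temp_c=125.0)
burn_in_hours = 48
print(f"Acceleration factor: {factor:.1f}")
print(f"{burn_in_hours} h of burn-in ~ {burn_in_hours * factor:,.0f} h at use temperature")
```

The model only covers temperature. Vibration, salt fog, and electrical overstress each need their own acceleration models.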

 

 

Image credit: Tech Designs Forum

 

The first part of the testing is to winnow out any products that were a failure right off the assembly line. The second part is to postpone, as much as possible, the time when the gadget becomes failure prone. The long flat bottom of the tub on this timeline is where the warranty applies. This is also where the metric of Mean Time Between Failures (MTBF) is calculated. MTBF is the key measure of reliability for components. The MTBF of the whole is determined by the failure rates of the individual pieces: every part added to the design adds its own failure rate to the total. An item that cannot be designed to last as long as the rest of the product is usually excluded from the warranty. Filters and other routine maintenance parts fall into this category.
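
A minimal sketch of how part-level failure rates roll up into a system MTBF, assuming a series system (any single part failure takes the unit down) and constant failure rates, which is the flat bottom of the tub. The part names and FIT values are hypothetical, not taken from any datasheet.

```python
import math

# Hypothetical failure rates in FIT (failures per 10^9 device-hours).
PART_FIT = {
    "microcontroller": 50.0,
    "dc_dc_converter": 120.0,
    "ceramic_capacitors": 30.0,
    "connector": 200.0,
}


def system_mtbf_hours(fit_rates):
    """MTBF of a series system: constant failure rates simply add."""
    total_fit = sum(fit_rates)
    return 1e9 / total_fit


def survival_probability(hours, mtbf_hours):
    """Fraction of units expected to survive to a given time, R(t) = exp(-t / MTBF)."""
    return math.exp(-hours / mtbf_hours)


mtbf = system_mtbf_hours(PART_FIT.values())
one_year = 24 * 365
print(f"System MTBF: {mtbf:,.0f} hours")
print(f"Expected survivors after one year: {survival_probability(one_year, mtbf):.2%}")
```

Note that an MTBF of millions of hours does not mean any single unit lasts that long. It describes the failure rate across a population during the flat part of the curve, before wear-out sets in.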

 

Some guru (small g) advice: Look at the fine print on the back of a warranty card. More words means more exclusions. Warranty Engineering is the financial side of Reliability Engineering. The game looks good but is tilted in the producer’s favor. Otherwise, they are doing it wrong.

 

Component Derating

 

If that Century Lightbulb had been rated as an over-achieving 4 Watt bulb, it would still be in spec. Other electronic components follow a similar, but compressed, timeline of slowly becoming less than they were at the beginning. As a musician, I take listening to music as seriously as anyone. When the head-unit in my car refused to eject the CD, the dealership replaced the stereo with a new one of the same type. There was no mistaking how much clearer the treble was when I got the car out of the shop. Same speakers, same settings, but the fresh electronics added more presence. It is not noticeable day-to-day but baking in the sun every day and pounding out my tunes every commute took a toll on the piece parts.

 

As a rule, the Department of Defense prefers equipment that works as intended for a very long time through very tough conditions. Wherever a 25 Volt capacitor is required for safe operation, we use a 50 V cap. A part running at half of its rated voltage, twice as good as it has to be, is the answer when you have Uncle Sam's deep pockets. Also required is a production methodology where each component can be traced back to the source. The record keeping is necessary because if a particular date code of a particular part is found to be unreliable, we have to have a way to find every single example in the field or the stock-room or wherever and bring those units up to par.
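
The 25 V on a 50 V part rule amounts to a voltage stress ratio of 0.5. A check like that is easy to run across a parts list. The sketch below assumes a flat 50% limit for every capacitor and uses made-up reference designators and voltages. Real derating guidelines vary by part type and program.

```python
def stress_ratio(applied, rated):
    """Applied stress as a fraction of the part's rating."""
    return applied / rated


def meets_derating(applied, rated, max_ratio=0.5):
    """True when the part runs at or below the allowed fraction of its rating."""
    return stress_ratio(applied, rated) <= max_ratio


# Hypothetical capacitors: (reference designator, applied volts, rated volts).
CAPS = [
    ("C1", 25.0, 50.0),
    ("C2", 25.0, 35.0),
    ("C3", 12.0, 25.0),
]

for ref, applied_v, rated_v in CAPS:
    verdict = "OK" if meets_derating(applied_v, rated_v) else "UNDER-RATED"
    print(f"{ref}: {applied_v:g} V on a {rated_v:g} V part, "
          f"ratio {stress_ratio(applied_v, rated_v):.2f} -> {verdict}")
```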

 

Class 3 Design Practices

 

When we think of highly reliable printed circuit boards, the IPC Class 3 standard is in play. A large chunk of the Class 3 burden is on the fabricators, who have to reserve space on every PCB panel for a number of test coupons. The list of tests required on those coupons is deep. The Certificate of Compliance is the required souvenir of the trip through the PCB examination gauntlet. Someone once asked the entire Internet about the difference(s) between Class 2 and Class 3. The answer? A lot.

 

 

Image credit: US Coast Guard

 

Field service is fine when you are in a comfortable repair lab. It is not so fine when the weather is too rough for sane boating while your hero has to go out. The equipment cannot be the reason for mission failure.  We go that extra mile in aerospace and similar industries based not on product price but instead based on the cost of potential failure. How much is riding on this object doing what it is supposed to do?

 

Take that piece of information into account when you decide if you require a mainstream or a high-reliability solution. You cannot sprinkle reliability on afterward. Leo Lambert, a Guru (big G) in the field of educating PCB Designers, described high-reliability Class 3 PCB work this way:

 

As an anecdotal example, the product has to be designed from the ground up, you cannot put Pirelli Tires on a Volkswagen and expect it to be a Ferrari, it won't work.

 

Before we get to any of that drama, we are well served to make our most educated guess on where the pain points might show up on a design. Beyond the eyeball test, we have Signal Integrity (SI) and Power Integrity (PI). These disciplines go hand-in-hand with design in helping us see the future as we move through the layout stages.

 

SI/PI teams can pre-discover unintentional antennas or hot spots through simulation.  With some effort and cooperation with these teams, we can model different scenarios and arrive at a suitable trade-off. There is always some compromise and risk-taking in PCB Layout. You want to get to the best compromise with the lowest number of iterations.

 

These simulation cycles are like free board spins. Think of a layout iteration in terms of a PCB tape-out, quote, DFM, fab, stencil, XY programming, and the Bill of Materials cost, plus all of the Fab, Assembly, Test, Debug, and Rework time. You can never recover time. It is a huge leg up on the competition to skip as many iterations as possible while ending up with a solid design that lasts for as long as intended. The hours or days that the simulation adds to the overall cycle are insignificant in comparison to the weeks or months trial-and-error can add. The final heat maps showing everything in control make a nice wallpaper as a memento of a job well done.

 

Reliability begins as early as the pad-stack geometry and doesn’t end for about 114 years if you are doing it right. Getting into the first year is the easy part. Going the distance takes discipline and a fair amount of forethought, not to mention record keeping. The more we rely on machines, the more the machines depend on us to perfect the process.

 

References:

https://learn.eartheasy.com/guides/led-light-bulbs-comparison-charts/

http://www.centennialbulb.org/

https://www.guru99.com/reliability-testing.html

http://www.techdesignforums.com/practice/technique/ensuring-the-reliability-of-non-volatile-memory-in-soc-designs/

https://www.ipc.org/4.0_knowledge/4.1_Standards/free/J-STD-003C-Amendment-1.pdf

http://www.circuitnet.com/experts/86649.html

http://www.sixsigmaconcept.com/reliability-engineering.html


 

 

About the Author

John Burkhert Jr is a career PCB Designer experienced in Military, Telecom, Consumer Hardware and lately, the Automotive industry. Originally an RF specialist, he is compelled to flip the bit now and then to fill the need for high-speed digital design. John enjoys playing bass and racing bikes when he's not writing about or performing PCB layout. You can find John on LinkedIn.