This is the eighth section in the back-to-school series for PCB Designers and those who may want to know more about it.
- Memory Routing
- What is Random Access Memory?
- Learning About Double Data Rate Dynamic Random Access Memory
- High Bandwidth Memory
- How to Approach Routing of DDR DRAM
Memory routing can easily fill a book but I’ll try to boil it down to a chapter so you don’t fall asleep. There are two basic types that we’ll call volatile and nonvolatile. Volatility has nothing to do with the personality of the chip although some types can be more finicky when it comes to routing rules. Volatile memory is a bit more humanoid in that it forgets everything when it goes to sleep. Non-volatile memory chips remember everything when you wake them up in the morning. Go get a cup of coffee if your eyelids start getting heavy. Here we go!
Tape drives, hard disk drives, and thumb drives are examples of nonvolatile memory. Hard disk drives are still relevant but a lot of their sockets are getting filled with solid state memory chips that are smaller and more efficient. In any case, these devices are marketed based on the number of Megabytes, Gigabytes, or even Terabytes of storage.
In addition to SATA, the devices may converse with the SoC or microcontroller using a “scuzzy” (SCSI) interface. Like any interface that has been around for a while, the Small Computer System Interface has been given a few makeovers. Serial Attached SCSI (SAS) is a common protocol.
Just as with Ethernet and everything else, serializing the data cuts down on the number of wires. Fewer wires make it easier to keep the entire data stream in sync with the clock. We like it easy! You will be routing to a connector for a disk drive and to a chip for a solid state drive. Routing to a connector means that there are more links in the chain so managing the timing budget can be tricky.
When you shop for a new device, they talk up the storage but then they always have to mention the RAM. Random-Access Memory is that other kind and it comes in smaller increments. Bill Gates is famously (and perhaps apocryphally) credited with saying that “640K ought to be enough for anybody.” That was back when a good hard drive was 40 Megs.
These days, the first Gigabyte of RAM is free and you start paying extra for multiple Gigs. Of course, RAM comes in many flavors. Dynamic RAM and Static RAM are two variants. DRAM is the marketing tool mentioned above and SRAM uses a different architecture that doesn't require periodic refresh cycles. SRAM is also lower in density, faster, and more expensive per bit. You’ll find SRAM used on the same silicon as the CPU.
DRAM cells are based on a capacitor and a transistor. So we know DRAM and SRAM, what about SDRAM? Trick question; Synchronous Dynamic Random-Access Memory has better efficiency because of the multitasking during read and write cycles. You can consider the rest of this chapter to lean towards the Graduate School Zone but read on if you’re curious.
Want more? What about Double Data Rate DRAM? Yeah, DDR DRAM is the stuff. Again with the efficiency but this time it’s using both of the clock edges. The Data group is the one that switches on the rising and falling parts of the clock cycle. Numerous improvements come in each generation. There are a lot of pins on a DDR chip relative to most memory. Access to more registers gives DDR serious bandwidth. The dimensions of the registers are set in the firmware.
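To put a number on the both-edges trick, here is a minimal sketch of the usual peak-bandwidth arithmetic. The function name and the example figures (a 1600 MHz clock on a 64-bit bus, which is what DDR4-3200 works out to) are picked for illustration. Real throughput is lower once refresh and command overhead are counted.

```python
def peak_bandwidth_gb_per_s(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 models double data rate: one transfer on the
    rising edge and one on the falling edge of every clock cycle.
    """
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# Same clock and bus width, with and without the second edge.
ddr = peak_bandwidth_gb_per_s(1600, 64)
sdr = peak_bandwidth_gb_per_s(1600, 64, transfers_per_clock=1)
print(f"double data rate: {ddr:.1f} GB/s, single data rate: {sdr:.1f} GB/s")
```

Using both edges doubles the transfers without touching the clock frequency, which is why the data group is the part of the bus that gets the tightest routing rules.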
Figure 1. Image Credit: Author - Spacing rules for memory busses are just as complex as length and width constraints. They are often the most difficult aspect to meet. Localized regional spacing constraints are common in dense pin fields.
The address section includes separate connections that describe the bank (BA), the row (RAS), and the column (CAS) of each bit of data. Various other hooks are used to monitor and control the memory banks: write enable, clock enable, reset, chip select and the synchronization pin come to mind.
Each bit of the data bus must arrive and settle into its state ahead of the capturing clock edge. This is the setup time. The entire bus must also remain steady for a short while after that edge while it is being read in or out. They call this hold time. The little cap in each memory cell only holds its charge for so long. Even if the memory isn’t in read or write mode, the information decays as the cap discharges. To solve this little problem, there are refresh cycles going on quite often. That’s another pin.
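Setup and hold are easier to reason about as margins around the capturing edge. The sketch below is only an illustration: the function name and every picosecond value are made up, and real requirements come from the device datasheet.

```python
def timing_margins(valid_start_ps, valid_end_ps, edge_ps, setup_ps, hold_ps):
    """Return (setup_margin, hold_margin) in picoseconds.

    The data is stable from valid_start_ps to valid_end_ps and is captured
    at edge_ps. A negative margin means the requirement is violated.
    """
    setup_margin = (edge_ps - setup_ps) - valid_start_ps
    hold_margin = valid_end_ps - (edge_ps + hold_ps)
    return setup_margin, hold_margin


# Hypothetical numbers: a bit that arrives on time and one that arrives late.
on_time = timing_margins(100, 500, edge_ps=300, setup_ps=75, hold_ps=75)
late = timing_margins(260, 500, edge_ps=300, setup_ps=75, hold_ps=75)
print("on time:", on_time)
print("late:   ", late)
```

A trace that is too long pushes the arrival later and eats the setup margin, while a trace that is too short relative to its strobe eats the hold margin. That is why both ends of the length window matter.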
Another wrinkle and the new hotness is High Bandwidth Memory (HBM). They’ve learned to stack up the dice and join them up with through-silicon vias. All of that happens right next to the CPU which takes a load off of the board. If you’re doing those substrates, look forward to 1000+ pins between the processor and the memory stack. You’ll probably have fond memories of the good old DDR5 days.
If those DDR5 days are also your potential future days, then listen up. DDR 3, 4, and 5 are cut from similar cloth at the PHY level. That’s our domain so let's cut it up into chunks of PCB Design. The beauty is that while no two memory implementations are the same, the design guidelines cross over pretty well. Here’s a general step-through for a typical DDR bus routing.
Figure 2. Image Credit: Author - A typical DDR memory bus routing using three layers for signals. Other layers not shown for clarity.
1. Color code the byte lanes: brown, red, orange, yellow, green, blue, violet, grey, etc. These colors correspond with the color code for resistors. Clock pairs (CLK) set the length of the Address (AD) and Command (CMD) single-ended lines. Clocks also regulate the strobes (DQS) but normally to a lesser extent. The strobes, in turn, regulate the data. You’ll wind up using different shades of the same colors to keep up with all of these interrelationships.
There are a lot of data lines but they are broken up into sub-groups of eight. Each octet of data lines (D0-D7, D8-D15, D16-D23, etc.) forms a byte lane and each byte lane has its own strobe. There is tight matching within each byte lane and relatively loose matching between lanes. So, D0-D7 is a tightly knit unit and D8-D15 is another tightly knit unit but there is more latitude from lane to lane. Data is the bulk of the routing and tuning and would be the first priority.
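The lane bookkeeping is simple integer arithmetic. Here is a small sketch under a few assumptions: the net names (`D0`, `DQS0`) are hypothetical and will differ in your schematic, and lane 0 is given brown simply because that is where the color list starts.

```python
# Resistor color code order, starting at brown (assumed to be lane 0).
LANE_COLORS = ["brown", "red", "orange", "yellow",
               "green", "blue", "violet", "grey"]


def byte_lane(bit):
    """Lane index for a data bit: eight bits per lane."""
    return bit // 8


def lane_members(lane):
    """Hypothetical net names for the eight data bits in a lane."""
    first = lane * 8
    return [f"D{n}" for n in range(first, first + 8)]


# Print the grouping for a 32-bit bus.
for lane in range(4):
    nets = lane_members(lane)
    print(f"lane {lane} ({LANE_COLORS[lane]}): "
          f"{nets[0]} to {nets[-1]}, strobe DQS{lane}")
```

Each group returned by `lane_members` would become one tight match group in the constraint manager, with its strobe as the reference.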
The actual length-matching numbers within and among all of these groups are outside the scope of this guide. It will depend on the memory device and how it is being used. As you get into newer versions of DDR, the margins shrink. They shrink so much that you have to start accounting for delays inside the packages.
The routing on the substrates becomes part of the length calculation. We add a tailored amount of pin delay to each of the processor pins and memory device pins in the constraint editor. The routing lengths on the board will be unequal but still in tune when we consider the entire time of flight.
Note that there is also a difference in propagation delay depending on whether the traces are routed on an outer layer or an inner layer. Capturing all of this information in the electrical constraint manager is no small feat. Get all of that done and it’s time for step 2. Are you ready for this?
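The time-of-flight bookkeeping can be sketched as follows. Everything numeric here is an assumption for illustration: the effective dielectric constants (about 3 for an outer layer, about 4 for an inner layer) depend on your stackup, the pin delays come from the vendor's package report, and via delay is ignored.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond

# Assumed effective dielectric constants; take real values from the stackup.
ER_EFF = {"outer": 3.0, "inner": 4.0}


def delay_ps_per_mm(er_eff):
    """Propagation delay per millimetre for a given effective dielectric constant."""
    return math.sqrt(er_eff) / C_MM_PER_PS


def flight_time_ps(segments, pin_delays_ps):
    """Total delay of one net.

    segments: list of (layer, length_mm) tuples for the board routing.
    pin_delays_ps: package delays at each end of the net, in picoseconds.
    """
    board = sum(length * delay_ps_per_mm(ER_EFF[layer])
                for layer, length in segments)
    return board + sum(pin_delays_ps)


# Two hypothetical nets with different lengths, layers and pin delays.
net_a = flight_time_ps([("inner", 30.0)], [40.0, 10.0])
net_b = flight_time_ps([("outer", 34.0)], [45.0, 9.0])
print(f"net A: {net_a:.1f} ps, net B: {net_b:.1f} ps")
```

The two nets differ by several millimetres on the board yet land within a fraction of a picosecond of each other, which is the sense in which unequal lengths can still be in tune.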
Figure 3. Image Credit: Author - The microcontroller was designed to use the outer layer for routing of certain byte lanes. The spacing is tighter than I’d like but it worked.
2. Fan-out the entire memory chip(set) and memory section of the processor. DDR devices are packaged with generous pin pitch. In many cases, you can do yourself some favors with creative fan-out directions rather than uniformly fanning outward from the center of the device. Look to improve the crossovers and lengthy connections during the fan-out stage. All of the members of each group should use the same number of vias. Vias are such a pain for signal integrity that they need to be shared equally.
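The equal-via rule is easy to audit once you can export a via count per net. The following sketch assumes that export is a plain dictionary, and the net names are hypothetical. It flags any net whose via count differs from the most common count in its group.

```python
from collections import Counter


def via_count_outliers(via_counts):
    """Return the nets whose via count differs from the group's most common count.

    via_counts: dict mapping net name to number of vias on that net.
    """
    if not via_counts:
        return {}
    common_count, _ = Counter(via_counts.values()).most_common(1)[0]
    return {net: n for net, n in via_counts.items() if n != common_count}


# Hypothetical byte lane where one data line picked up an extra layer change.
lane0 = {"D0": 2, "D1": 2, "D2": 4, "D3": 2, "DQS0": 2}
print(via_count_outliers(lane0))
```

Catching the odd one out during fan-out is a lot cheaper than discovering it after the lane has been tuned.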
3. Get the decoupling caps and terminations and everything else placed. Connect the caps to power and ground with utmost concern for short, low-inductance loops in the cap’s power and ground paths. Make the most of good design practices for the power distribution. There are not that many power pins so take good care of them.
The termination resistors form the boundary of the chip’s area. In addition to the shunt resistor at the end of the line, the technology may require a damping resistor in series somewhere along the lines between the processor and the memory. Resistor packs with four or eight individual resistors are popular for this application.
The resistors are becoming less common as On Die Termination (ODT) has integrated the terminations onto the processor. That’s a nice development. It takes a lot of 49.9 ohm resistors for all of those terminations. Why not 50 ohm? When you use 1% tolerance on the resistors, the nearest standard value is 49.9 ohm. The consistency of the resistor value is good so deal with the price of the 1% tolerances.
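The 49.9 ohm value falls out of the preferred-number series used for 1% parts. As a sketch, the E96 values can be generated from the usual geometric formula (96 steps per decade, rounded to three significant figures) and searched for the one closest to a target. The function names are made up for this example.

```python
from decimal import Decimal


def e96_mantissas():
    """The 96 values of one decade, as integers from 100 to 976."""
    return [round(10 ** (i / 96) * 100) for i in range(96)]


def nearest_e96(target_ohms):
    """Closest E96 resistor value to target_ohms."""
    target = Decimal(str(target_ohms))
    decade = target.adjusted()  # power of ten of the leading digit
    candidates = [Decimal(m).scaleb(d - 2)
                  for d in (decade - 1, decade, decade + 1)
                  for m in e96_mantissas()]
    return float(min(candidates, key=lambda value: abs(value - target)))


print(nearest_e96(50))
```

The neighbours of 50 in the series are 49.9 and 51.1, so 49.9 ohm is as close as a stock 1% part gets.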
4. Route all of the connections keeping the various colors together on the same layer. Leave some extra space for step 5. If you have two primary routing layers, then the aim would be to route even-numbered byte lanes on one layer and the odd-numbered byte lanes on the other. This will tend to distribute the pins and vias that use each layer.
The idea is to increase the launching options that you already optimized during fan-out. Revisiting fan-out during routing is a common occurrence. Sometimes placement also gets a second thought. Setting aside the auto-router demonstrations, a properly routed memory bus takes time. The length-matching tools are helpful but the results will be more compact if you drive the bus yourself.
5. Tune one byte lane at a time - make the longest member as short as possible. There’s guidance on sharp corners but no other reason to stay on 45-degree angles if going off-angle helps cut corners. Imagine stretching a rubber band around all of the routing obstacles.
Can anything be done by looking at the inflection points and moving a via or a passive part? If it's no longer the longest, shorten the new longest one. Anything you can do to shorten the longest member of the group is a good thing.
Have you ever looked at a strobe light for any length of time? It can be nauseating. The rest of the signals feel the same way about being routed near the strobes so keep some extra distance between the DQS traces and everything else. They are essentially clock nets.
6. When you can't shorten any more, lengthen the clock diff-pair until it is long enough to meet the spec (not necessarily as long as the longest line in the group). Phase-match the pair very closely. The closer the Positive and Negative sides of the diff-pair are to matching, the more that is left as tolerance for tuning the single-ended connections.
It may seem more intuitive to tune the clocks first. The thing is that they will be too long for some lines and too short for others. Where do you stop? Knowing the length of the longest tent-pole allows you to stretch the clocks until they are too long but not too short. You solve the too-long errors by lengthening the too-short data lines. When all of those meet the clock length minus the tolerance, then the clocks will be clear of length matching errors.
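One conservative way to see why a tight phase match pays off is to require every single-ended line to sit within tolerance of both legs of the pair. That is an assumption for this sketch, and your constraint tool may measure against the pair differently. The lengths and the tolerance are hypothetical.

```python
def allowed_window_mm(clk_p_mm, clk_n_mm, tolerance_mm):
    """Length window that is within tolerance of both legs of the clock pair.

    Any mismatch between the P and N legs shrinks the window by that amount.
    """
    low = max(clk_p_mm, clk_n_mm) - tolerance_mm
    high = min(clk_p_mm, clk_n_mm) + tolerance_mm
    return low, high


# A perfectly matched pair versus one with 0.2 mm of skew.
matched = allowed_window_mm(50.0, 50.0, 0.5)
skewed = allowed_window_mm(50.0, 50.2, 0.5)
print("matched window:", matched)
print("skewed window: ", skewed)
```

With the matched pair the full plus or minus 0.5 mm is available. With 0.2 mm of skew the same rule leaves a window that is 0.2 mm narrower, and that is tuning room you never get back.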
7. Meander the short ones to get within spec of the clock/strobe tolerance, again, not necessarily as long as the longest or the clock. The clock/strobe target should be the middle ground. This can be a soothing process or a painful one.
The meanders should be laid out to maximize the use of space. Route one trace along the edges of the previous one, hugging that trace and adding the wrinkles for tuning in ways that preserve the routing area for other functions. It’s something else to learn.
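Steps 5 through 7 can be sketched as a small planning function. It assumes a symmetric tolerance around the clock or strobe and works in millimetres. The net names, lengths and tolerance are hypothetical, and a real design would work in delay rather than raw length once pin delays are involved.

```python
def tuning_plan(lengths_mm, tolerance_mm):
    """Plan the tuning of one group after its longest member has been shortened.

    lengths_mm: dict mapping net name to its current routed length.
    Returns (clock_target_mm, meander_to_add_mm per net).
    """
    longest = max(lengths_mm.values())
    # Shortest clock that still keeps the longest net inside the tolerance.
    clock_target = longest - tolerance_mm
    # Every other net has to come up to at least the clock minus the tolerance.
    floor = clock_target - tolerance_mm
    meander = {net: round(max(0.0, floor - length), 3)
               for net, length in lengths_mm.items()}
    return clock_target, meander


# Hypothetical lane with a 1.0 mm tolerance.
lane = {"D0": 42.0, "D1": 40.2, "D2": 38.5, "D3": 41.4}
clock_target, meander = tuning_plan(lane, tolerance_mm=1.0)
print("clock target:", clock_target)
print("meander to add:", meander)
```

Only the nets below the floor get meanders, and the clock ends up in the middle of the window rather than at the length of the longest line. Every millimetre shaved off the longest member lowers the whole plan.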
8. Highlight the lane that meets timing in a special color and don't touch it unless you have to. Sometimes, it works out and sometimes the other lane needs a little more space. Bite the bullet and make the change and make sure that group is still tuned. Move on to the next one.
9. All done with tuning? Route VREF and other non-length-matched lines. VREF requires a thick line and a wide air-gap. It is a partner of VTT which would use copper planes. It is typical to bury the high-speed connections on the inside layers of the board. That gives you the outer layers for the voltage distribution and miscellaneous signals. Every pin on a memory chip can be considered critical in some way.
10. Submit to SI/PI before continuing with the layout but leave that area alone. Will a memory device work at all without doing all of this length and impedance matching? Maybe, but it will bog down when the cyclic redundancy check computes a checksum from the bits received and compares it to the checksum that was sent along with the data.
If those numbers don’t match, it means that the data was corrupted in transit, so the chip says, “What?” and the whole set of instructions has to be sent again. The result is glitches in the data stream that can cause the video to stutter. If it’s bad enough, the program or even the whole system can crash. Then you have to go back to step one; coffee!
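The check-and-resend idea can be shown in a few lines. Python's `zlib.crc32` is used here purely as a stand-in. It is not the CRC that any particular memory interface implements, and the payload and helper names are invented for the example.

```python
import zlib


def send_with_crc(payload):
    """Sender side: return the payload together with its checksum."""
    return payload, zlib.crc32(payload)


def received_ok(payload, crc):
    """Receiver side: recompute the checksum and compare."""
    return zlib.crc32(payload) == crc


def flip_bit(payload, bit_index):
    """Simulate a signal integrity problem by flipping one bit in transit."""
    corrupted = bytearray(payload)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


data, crc = send_with_crc(b"one burst of memory data")
print("clean transfer accepted:", received_ok(data, crc))
print("corrupted transfer accepted:", received_ok(flip_bit(data, 3), crc))
```

A single flipped bit is enough to fail the comparison and force a retry. Each retry costs bandwidth, which is how marginal routing turns into stutter instead of an outright failure.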