"She was mostly immensely relieved to think that virtually everything that anybody had ever told her was wrong"
Douglas Adams, "So Long and Thanks for all the Fish"
I'm in the process of teaching myself Ruby on Rails at the moment. There's no great reason for this, other than the fact that I kept hearing people talk about it and curiosity got the better of me. That's not immediately relevant though. What is relevant is that in parallel, I'm learning JavaScript, and one of the cool new things I learned was this - white space, commenting, and descriptive variable names are bad. Think about it. All your JavaScript, including your comments, white space and big variable names, has to move from the server to the user's browser, consuming bandwidth (think time and money) along the way. Wow. Ponder the implications of that for a moment. Some of that indisputably good software advice you were given, such as GOTO's being evil, is just plain wrong.
That's bad news for people who just accept what they're told, turn their brains off, and treat guidelines as unbreakable rules. Actually, it's probably bad news for those who follow behind, dealing with the results. But anyway...
The reason I'm writing anything here is that one of the big "rules" that's mentioned all the time in Ruby on Rails is "DRY" - Don't Repeat Yourself. Don't duplicate code or information, because that's always bad. Right? Actually, no. It's wrong.
In some contexts.
Which is all very fortuitous for me, because I get to rehash a blog post I wrote internally in Sept 2005 ("Colouring outside the lines" for any Verilaber who wants to check how much reuse I managed to achieve here). One of the many "rules" I looked at was "You should never duplicate code" because this bugs the hell out of me. In testbench design, there are sometimes very good reasons for duplicating code, yet I've seen engineers mindlessly removing all duplication from a working testbench. By unthinkingly applying rules they didn't really understand, they wasted time swapping probable advantages for improbable advantages, and risked injecting bugs into working code. Like we don't have enough to do already in verification!
So, why is duplicating code bad? Well let's be clear. It's not bad. It's only bad in some contexts, and to understand which ones, it's worth understanding why not duplicating code is good.
You might think that an advantage of not duplicating code is that it's faster to just write the code once, but that's not always true. Making specific code generic takes time and effort, so what commonly happens is that you find that you are repeating yourself, so you do a refactoring session to replace the duplicated code with a shared version. This means that you have already spent time writing the code multiple times, and on top of that, you then have to write a version that can be shared, remove the original code, and then fix any issues. It's not going to be faster than just duplicating, that's for sure.
"Always program as if the person who will be maintaining your program is a violent psychopath that knows where you live"
The advantage really comes during maintenance when you have to change the code. Rather than change it in 100 places, you only have to change it in 1 place. That's a great thing to have. But it's only a great thing to have if the cost of removing the duplication is smaller than the cost of updating the code in P places. When P = 100, it's a no-brainer. When P = 2, it's more difficult to call. Now, it depends on how often you'll have to change the code. If you have to change it N times, and if N is large, then removing duplication is probably good. So basically, if N*P is large, then removing duplication is probably a good thing.
Probably. It's time to consider context now. We write testbenches, and a lot of the time, these don't need to be maintained. We verify the RTL, the RTL ships, and we move on to new designs. Testbench maintenance only really occurs when we need multiple releases (respins or phased FPGA releases) of the design, or if we want one testbench to work with multiple derivative designs. For many testbenches, N is only large if the design is unstable, so we're constantly modifying the testbench to keep up. That brings something else to consider though. We remove duplicated code because the code is doing the same thing in P places. However, what if that becomes false after you've removed the duplication? What if you were doing FOO in two places, but now, because of a last-minute, badly thought-out design change, you have to do FOO in one of those places and BAR in the other? In that case, you'd have been far better off just keeping the duplicated code, because now you have one block of code that needs two different behaviours. Ouch.
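To make the FOO/BAR problem concrete, here's a minimal sketch in Python. Every name in it (`do_foo`, `do_bar`, `site_a`, `site_b`) is made up for illustration, and a real testbench would of course be written in an HDL or verification language rather than Python. It just shows where the special case ends up in each approach.

```python
def do_foo(log):
    log.append("FOO")


def do_bar(log):
    log.append("BAR")


# Option 1: duplication was removed, so both sites call one shared routine.
# After the design change it needs a special case to serve both sites.
def shared_step(site, log):
    if site == "site_b":  # special case bolted on after the late change
        do_bar(log)
    else:
        do_foo(log)


# Option 2: duplication was kept, so each site owns its own copy.
def site_a_step(log):
    do_foo(log)  # untouched by the design change


def site_b_step(log):
    do_bar(log)  # this copy used to call do_foo(log)


def run_shared():
    log = []
    shared_step("site_a", log)
    shared_step("site_b", log)
    return log


def run_duplicated():
    log = []
    site_a_step(log)
    site_b_step(log)
    return log
```

Both versions end up doing the same thing. The difference is that the shared routine now carries a branch that every future reader has to understand, while the duplicated version took a one-line edit in one copy.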
So if N*P is high and D (the amount or potential amount of divergence) is low, then removing duplication is good. Otherwise, you might be better off just allowing code to be duplicated (while keeping a close eye on what N, P and D do during the project).
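If it helps to see the N, P and D tradeoff as arithmetic, here's a rough sketch in Python. The linear cost model and all the default cost figures (`edit_cost`, `refactor_cost`, `special_case_cost`) are my assumptions, in made-up units, purely to show the shape of the comparison. Don't read them as measured numbers.

```python
def duplicated_cost(n_changes, p_places, edit_cost):
    """Cost of leaving the duplication: every change is repeated in every place."""
    return n_changes * p_places * edit_cost


def shared_cost(n_changes, d_divergences, edit_cost, refactor_cost, special_case_cost):
    """Cost of a shared version: one refactor up front, one edit per change,
    plus a penalty each time the call sites stop wanting the same behaviour."""
    return refactor_cost + n_changes * edit_cost + d_divergences * special_case_cost


def removing_duplication_pays_off(n_changes, p_places, d_divergences,
                                  edit_cost=1.0, refactor_cost=8.0,
                                  special_case_cost=4.0):
    """True when the shared version is cheaper under this toy model.

    All default costs are hypothetical units chosen for illustration.
    """
    shared = shared_cost(n_changes, d_divergences, edit_cost,
                         refactor_cost, special_case_cost)
    duplicated = duplicated_cost(n_changes, p_places, edit_cost)
    return shared < duplicated
```

With these made-up costs, `removing_duplication_pays_off(10, 100, 0)` comes out true and `removing_duplication_pays_off(1, 2, 0)` comes out false, which matches the "no-brainer" and "difficult to call" ends of the scale. Crank D up and the answer flips even when N*P looks respectable.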
Time for a real example. I have one DUT that can be targeted at an ASIC or an FPGA, and in either case, it can be an RTL or a gate-level version. How many testbenches should I have? Someone blindly applying the DRY rule might say one. You should instantiate the DUT once in just one testbench, and use `defines (or similar) to deal with any differences that come up. It would just be pure evil to have "DUT dma(.clock(clk) ..." appear in different places.
Someone who thinks about it a bit deeper might say...
- P = 4 (e.g. we connect the clock and reset in four places)
- N might be around 10. We have four FPGA releases planned, and we'll probably get six gate level releases
- D will be pretty large because of signal name changes. That is, the clock connection might remain constant across all releases, but the port map is going to change like crazy to deal with FPGA targeting and gate level renaming
...and go with four testbenches. Sure, we're probably going to have to tinker slightly every time we make a new FPGA release, or generate a new gate level design (port changes), but the growing differences between the four design types will mean that a single testbench will become a massive headache of special case handling dealing with differences between the nominally identical versions of the design. Any common code that needs to get changed will only need to be changed in four places, and as it's not expected to change much anyway, it's not a major headache. Someone going through this process might decide that the flexibility offered by maintaining separate testbenches is more useful than the benefits offered by removing duplication.
"Part of the problem with brittle design is due to overgeneralization. Good programmers tend to like to factor out the common aspects of their code, incorporating widely-used functionality into a single subroutine or class. [...] These kinds of mechanisms tend to break when a platypus is encountered"
And that's really the tradeoff we're making here. Being DRY means reducing your flexibility to deal with divergences in the functionality, but it means that maintenance will be easier if it doesn't diverge. You have to think about that before declaring that duplication is good or evil. Things are never that black or white. My experience has been that flexibility has always been more useful to me than maintenance when doing testbench design. Flexibility means I can deal with a change on the day of code freeze. That's more important to me than saving a couple of hours during a more leisurely and unlikely maintenance phase. So anytime I see duplicated code, and I feel my fingers start to itch to "fix" it, I take a moment to think about the context. It might save some headaches later to just leave it as it is.
