"She was mostly immensely relieved to think that virtually everything that anybody had ever told her was wrong"
Douglas Adams, "So Long and Thanks for all the Fish"
I'm in the process of teaching myself Ruby on Rails at the moment. There's no great reason for this, other than the fact that I kept hearing people talk about it and curiosity got the better of me. That's not immediately relevant though. What is relevant is that in parallel, I'm learning Javascript, and one of the cool new things I learned was this - white space, commenting, and descriptive variable names are bad. Think about it. All your Javascript, including your comments, white space and big variable names, has to move from the server to the user's browser, consuming bandwidth (think time and money) along the way. Wow. Ponder the implications of that for a moment. Some of that indisputably good software advice you were given, such as GOTO's being evil, is just plain wrong.
That's bad news for people who just accept what they're told, turn their brains off, and treat guidelines as unbreakable rules. Actually, it's probably bad news for those who follow behind, dealing with the results. But anyway...
The reason I'm writing anything here is that one of the big "rules" that's mentioned all the time in Ruby on Rails is "DRY" - Don't Repeat Yourself. Don't duplicate code or information, because that's always bad. Right? Actually, no. It's wrong.
In some contexts.
Which is all very fortuitous for me, because I get to rehash a blog post I wrote internally in Sept 2005 ("Colouring outside the lines" for any Verilaber who want's to check how much reuse I managed to achieve here). One of the many "rules" I looked at was "You should never duplicate code" because this bugs the hell out of me. In testbench design, there are sometimes very good reasons for duplicating code, yet I've seen engineers mindlessly removing all duplication from a working testbench. By unthinkingly applying rules they didn't really understand, they wasted time swapping probable advantages for improbably advantages, and risked injecting bugs into working code. Like we don't have enough to do already in verification!
So, why is duplicating code bad? Well let's be clear. It's not bad. It's only bad in some contexts, and to understand which ones, it's worth understanding why not duplicating code is good.
You might think that an advantage of not duplicating code is that it's faster to just write the code once, but that's not always true. Making specific code generic takes time and effort, so what commonly happens is that you find that you are repeating yourself, so you do a refactoring session to replace the duplicated code with a shared version. This means that you have already spent time writing the code multiple times, and on top of that, you then have to write a version that can be shared, remove the original code, and then fix any issues. It's not going to be faster than just duplicating, that's for sure.
"Always program as if the person who will be maintaining your program is a violent psychopath that knows where you live"
The advantage really comes during maintenance when you have to change the code. Rather than change it in 100 places, you only have to change it in 1 place, That's a great thing to have. But it's only a great thing to have if the cost of removing the duplication is smaller than the cost of updating the code in P places. When P = 100, it's a no-brainer. When P = 2, it's more difficult to call. Now, it depends on how often you'll have to change the code. If you have to change it N times, and if N is large, then removing duplication is probably good. So basically, if N*P is large, then removing duplication is probably a good thing.
Probably. It's time to consider context now. We write testbenches, and a lot of the time, these don't need to be maintained. We verify the RTL, the RTL ships, and we move on to new designs. Testbench maintenance only really occurs when we need multiple releases (respins or phased FPGA releases) of the design, or if we want one testbench to work with multiple derivative designs. For many testbenches, N is only large if the design is unstable, so we're constantly modifying the testbench to keep up. That brings something else to consider though. We remove duplicated code because the code is doing the same thing in P places. However, what if that becomes false after you've removed the duplication? What if you were doing FOO in two places, but now because of a last minute, badly thought out design change, you have to do FOO in one of those places, and BAR in another. In that case, you'd have been far better off just keeping the duplicated code, because now you have one block of code that needs two different behaviours. Ouch.
So if N*P is high and D (the amount or potential amount of divergence) is low, then removing duplication is good. Otherwise, you might be better off just allowing code to be duplicated (while keeping a close eye on what N, P and D do during the project).
Time for a real example. I have one DUT that can be targeted at an ASIC or an FPGA, and in either case, it can be in RTL or gate version. How many testbenches should I have? Someone blindly applying the DRY rule might say one. You should instantiate the DUT once in just one testbench, and use `defines (or similar), to deal with any differences that come up. It would just be pure evil to have "DUT dma(.clock(clk) ..." appear in different places.
Someone who thinks about it a bit deeper might say...
- P = 4 (e.g. we connect the clock and reset in four places)
- N might be around 10. We have four FPGA releases planned, and we'll probably get six gate level releases
- D will be pretty large because of signal name changes. That is, the clock connection might remain constant across all releases, but the port map is going to change like crazy to deal with FPGA targeting and gate level renaming
...and go with four testbenches. Sure, we're probably going to have to tinker slightly every time we release a new FPGA release, or generate a new gate level design (port changes), but the growing differences between the four design types will mean that a single testbench will become a massive headache of special case handling dealing with differences between the nominally identical versions of the design. Any common code that needs to get changed will only need changed in four places, and as it's not expected to change much anyway, it's not a major headache. Someone going through this process might decide that the flexibility offered by maintaining separate testbenches is more useful than the benefits offered by removing duplication.
"Part of the problem with brittle design is due to overgeneralization. Good programmers tend to like to factor out the common aspects of their code, incorporating widely-used functionality into a single subroutine or class. [...] These kinds of mechanisms tend to break when a platypus is encountered"
And that's really the tradeoff we're making here. Being DRY means reducing your flexibility to deal with divergences in the functionality, but it means that maintenance will be easier if it doesn't diverge. You have to think about that before declaring that duplication is good or evil. Things are never that black or white. My experience has been that flexibility has always been more useful to me than maintenance when doing testbench design. Flexibility means I can deal with a change on the day of code freeze. That's more important to me than saving a couple of hours during a more leisurely and unlikely maintenance phase. So anytime I see duplicated code, and I feel my fingers start to itch to "fix" it, I take a moment to think about the context. It might save some headaches later to just leave it as it is.

Davie, a few comments. I agree with the main point of your post that rules should be considered in context.
1. The Do Not Repeat Yourself rule in Ruby on Rails is really a consequence of the Model-View-Controller architecture ( more accurately, Database-Model-View-Controller ) that it shares with other frameworks. The main thing that this rule guards against is repeating code on both the client and server side of the application. In its context, a good rule, I hope you'll agree.
2. I think your post down plays the benefits of refactoring for its own sake. Suppose instead of stopping the urge to refactor, you just went with it. Then sometime later you come to realise that the refactoring needs undoing. Have you completely wasted your time ? I would argue not.
First of all, during the initial "merge common code into a single struct/class/method" refactoring you probably tidied the code up, taking the more elegant of the two or more implementations. Secondly, bugs in the common implementation get fixed for all implementations, not just one. If it's a class we're talking about here, you might well write up some specific unit tests for this class just to stop these bugs from being reintroduced.
Now, at a certain point, you realise that at least one of the uses of this code is in fact sufficiently different from the others that it really does need splitting out from the others. This new implementation will be higher quality than it would have been if you had never merged it in with the others in the first place. Although structurally you have ended up in the place that you would have if you never did the refactoring in the first place, quality wise you have made a big leap forward.
Now I don't know about you, but I'll let you into a secret - sometimes my testbenches have bugs in them ! The process of refactoring and then rerefactoring reduces the liklihood of those "15 minute before tapeout" bugs and is therefore nearly always a good investment.
So my advice is "if you have refactoring itch, scratch it. If it itches again, scratch it again".
3. Please cut out the pseudo science ! I've had reviews done of my code along the lines of "the more code you write, the bugs there are. So please reduce the length of your variable names".
The guiding concept in all this shouldn't be P,N or whatever other variables you want to introduce. It should be elegance. Is the code more elegant after this refactoring ? If so, go ahead and do it. Now of course elegance is inherently unquantifiable, and varies from person to person. What Mozart did shocked his predecessors and contemporaries, but he would have been equally shocked by say John Coltraine. There are personal opinions and schools of thought in programming as well as Music. Nevertheless, elegance is a much better quide than pseudo science - which I would argue, in this case, is "out of context".
Posted by: Adam Rose | 02/09/2008 at 02:19 PM
Hi Adam,
thanks for taking the time to read this, and for leaving a comment. I suspected that I wouldn’t receive any short “I agree” comments for this one ;-)
On point 1:
Ruby-on-Rails only appeared in the blog because that’s what caused DRY to be in my head. I have no real comment on whether DRY is good all the time in Rails.
On point 2:
“I think your post down plays the benefits of refactoring for its own sake”
I didn’t mean to, although I’d argue that doing something “for it’s own sake” isn’t the best argument for doing it. I’d expect to see some net benefit to doing it.
“Suppose instead of stopping the urge to refactor, you just went with it. Then sometime later you come to realise that the refactoring needs undoing. Have you completely wasted your time?”
If the code is in someway “better” than it was, AND the original code wasn’t fit for whatever purpose, then no, you haven’t wasted your time. However, if the code was already fit for purpose, and after refactoring is buggy, then yes, you have wasted your time.
Before spending time on something, I like to have a stab at working out what the cost/benefit will be. It’s difficult to do accurately (and probably impossible to know if you got it right), but my “pseudo science” describes the things that run through my head when I try and decide if removing duplication would be a better use of my time, than say, writing a new checker, improving the stimuli, or seeing my family.
In the case of this kind of refactoring, I think about how bad the maintenance chore will be if I don’t do it (P*N), and how likely it is I’ll either have to undo it or add special case handling (D).
“This new implementation will be higher quality than it would have been if you had never merged it in with the others in the first place. ”
Ahh, but that’s an assumption, and one I’ve seen go badly wrong in real projects. It would be wrong to assume that refactoring code to remove duplication can’t introduce bugs. It can.
“quality wise you have made a big leap forward”
And that’s another assumption. You might have made a big leap in /any/ direction. All we can really know is that we have spent precious project time changing code. It may or may not be better than it was, and it may, or may not, have been a good use of time. As you say, testbenches have bugs, so changing code increases the chances of making mistakes.
“The process of refactoring and then rerefactoring reduces the likelihood of those "15 minute before tapeout" bugs and is therefore nearly always a good investment”.
Alternatively, I could say, “The process of refactoring and then rerefactoring reduces the likelihood of getting anything finished before tape-out” or “The process of refactoring and then rerefactoring increases the likelihood of those ‘15 minute before tapeout’ bugs”.
Who can say which is correct? The people involved in the work could make a good stab, if they keep their brain engaged, which is really the only point I wanted to make.
On point 3:
An interesting approach. I’m always wary of “elegance” as a measure. How do I define it, measure it, explain it, teach it or review for it? If I’m going to justify time on making a change, and risk introducing a mistake while I’m at it, I’d like to have an idea what this “elegance” thing is, and how I’d measure its value.
“Is the code more elegant after this refactoring ? If so, go ahead and do it. ”
Can I take it from this that the following is also true? “Is the code less elegant after this refactoring ? If so, don’t do it”.
If so, then they will make good guidelines if we can just define and measure that pesky “elegance” thing.
“Nevertheless, elegance is a much better guide than pseudo science”.
Elegance is up there with “smells nice”. I know what both mean to me, and taking three clean pieces of code and merging them into a stream of if/else blocks to deal with the bits that /weren’t quite/ duplicated isn’t elegant in my book. It can be time wasting and error prone.
Given the choice between real science (which we don’t have for this), pseudo science (call it useful guidelines) and elegance (just make it pretty, and, oh, pretty means whatever you want), I think I’ll stick with my “pseudo science” for just a bit longer.
Cheers
David
Posted by: David Robinson | 02/09/2008 at 03:43 PM
This reminded me of Wil Shipley's advice on writing inflexible code: "Pimp My Code, Part 14: Be Inflexible!".
If time was infinite then we could spend all the time we wanted making our code perfect, but deadlines need to be met and trade-offs made. When things (mostly) work you can think about performance and refactoring, but don't spend extra time on early code that might be modified later or thrown away. Of course, that is not an invitation to right ugly code the first time. Think ahead, but don't over-think.
Posted by: Scott Roland | 09/09/2008 at 10:20 PM