What the hell does DRY mean, really?
Most programming maxims are not particularly useful as stated, and, in truth, I believe that the way these principles are delivered and espoused is often actively harmful. Worse, the harm seems to be greatest for those who, like me, are invested in understanding them deeply.
The problem is one of context and intuition. If we give someone a particular piece of code and say, "This feels too repetitive. It could be rewritten as ..." then they can compare apples to apples and glean some knowledge. On the other hand, if we cite some generic programming principle, then it can't really convey any value without further discussion. So, the best use of such principles seems to be as mnemonics, and even as mnemonics, their use is dubious because they're superficial and therefore tend to be misleading. I've long thought that DRY, especially, is egregiously misleading.
"Don't repeat yourself" doesn't really mean anything without more context. Should we avoid repeating specific letters? What about basic programming structures like loops? If we need a program in Python that does the same thing as another program in Ruby should we avoid porting it? Is it a sin to have multiple applications or even operating systems that attempt to solve similar problems? If a person could use email instead of a messaging app, then should we abstain from building the latter since an alternative is available?
One could argue that this isn't repetition because specifics clearly differ, but then again, don't they always? If we use an email client today and then again tomorrow, can we really say that it's the same thing experientially? If we copy and paste a piece of code to a different place, then isn't that actually not the same as the original because of how it interacts with its environment?
At this point, one might be thinking, "Why don't you go read The Pragmatic Programmer, the original source for the principle?" And, one would have a point, and maybe there's good advice in there that makes the principle abundantly clear.
However:
- In my experience, telling people to go read the book is never how DRY is given as advice.
- When I first thought through all of this, I did not own The Pragmatic Programmer, nor did I know where this advice originated. One might also be thinking, "It would have been wise for you to figure out where the advice originated and interpret it in that context." And again, one would be right. I can only say that when I first received this advice, I was not particularly wise; therefore, an honest attempt at interpreting DRY in its intended context is not the topic of this post. Instead, I will provide some useful knowledge that came from shaking the tree that DRY once led me to.
- I suspect that most of the people who receive this advice are young and not particularly wise (they are the target audience for sage wisdom, after all) and do not interpret it in its original context. Corollary: Sage advice that requires the recipient to possess an abundance of categorically similar wisdom is probably not good advice. Furthermore, I suspect that even if I were to read The Pragmatic Programmer, I would find that most of the following arguments still apply, because unless the original definition, "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system," uses some very restricted definition of the word "knowledge," the advice is still pretty far off the mark.
In the process of poking this thing, we can see that what we're really talking about is abstraction; we have to decide what exactly it is that we're not repeating, and that means deciding on a level of abstraction. If we assume that, at a minimum, we're talking about something that could be represented by the abstractions that our programming language provides, then we can get some leverage in our discussion. Let's attempt to reinterpret the rule as "If a thing can be represented by a single abstraction in our programming language, then don't use more than one."
This proves to be valuable advice in certain situations. For example, writing N functions that don't accept any parameters would often be worse than writing a single function that accepts a parameter that ranges over N. Unfortunately, this interpretation of DRY still breaks down pretty quickly. For example, what if we were to write a function that simply sends its input to an interpreter? With our new restricted definition of DRY, we must do that if the option is available, since anything else would involve needless repetition of functions that could be reduced to a single function with an argument (the input script). But, of course, we'll never get anything done that way; for one thing, this problem will just recurse into the language that we're interpreting. But from this, we can observe that simple designs that push more complexity into our data are not necessarily a win, which is useful knowledge. We're simply trading apples for oranges, which is only useful if we'd rather have oranges than apples.
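To make those two extremes concrete, here is a minimal Python sketch. The names (`greet_alice`, `greet_bob`, `greet`, `run`) are hypothetical and exist only for illustration, and Python's built-in `eval` is used as a stand-in for "an interpreter."

```python
# Repetitive: one parameterless function per case.
def greet_alice() -> str:
    return "Hello, Alice!"


def greet_bob() -> str:
    return "Hello, Bob!"


# Less repetitive: a single function whose parameter ranges over the cases.
def greet(name: str) -> str:
    return f"Hello, {name}!"


# The degenerate end point: one function that forwards its input to an
# interpreter. Nothing is "repeated" in this function, but the complexity
# has only moved into the script strings that every caller must now write.
def run(script: str) -> object:
    return eval(script)  # illustration only; never eval untrusted input


assert greet("Alice") == greet_alice()
assert run("'Hello, ' + 'Bob' + '!'") == greet_bob()
```

The step from the first pair of functions to `greet` is a clear win. The step from `greet` to `run` removes even more functions, yet the caller's script is now at least as long as the code it replaced.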
So, it seems that we must expand our definition: "If a thing can be represented by a single abstraction in any programming language, then don't represent it with more than one." But, wait, that's crazy, right?! That means that in order to follow DRY, we have to know every conceivable programming language. And this isn't just some absurd example; it is in the spirit of the principle, which wants us to pursue the least repetitive means to write any particular piece of code. But from this, we can observe that the appropriate choice of abstractions depends on the author, which is certainly useful knowledge.
Even if we try to go the other way and say, "Okay, no interpreters, always work within a single language," we're still vulnerable to obscene data-driven patterns that are just shy of writing our own DSL to solve every problem, since there's no hard line between arbitrary input data and an interpreted script. But from this, we can observe that there's always a balance between how sophisticated our inputs are and how repetitive our code is, which is useful knowledge, too.
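As a rough illustration of how blurry that line is, the Python sketch below computes the same result twice: once as ordinary code and once from a table of rules. Everything in it (the shipping-cost scenario, the `RULES` format, and the function names) is made up for this example.

```python
# Direct version: a little repetitive, but it reads as ordinary code.
def shipping_cost_direct(weight_kg: float, express: bool) -> float:
    cost = 5.0
    if weight_kg > 10:
        cost += 7.5
    if express:
        cost *= 2
    return cost


# Data-driven version: the rules are "just data", yet they already have
# operators, operands, and an evaluation order. That is a tiny language.
RULES = [
    {"when": ("weight_kg", ">", 10), "do": ("add", 7.5)},
    {"when": ("express", "==", True), "do": ("mul", 2)},
]


def shipping_cost_rules(order: dict, rules: list = RULES) -> float:
    cost = 5.0
    for rule in rules:
        field, op, value = rule["when"]
        # Only the two comparison operators used above are supported.
        matched = order[field] > value if op == ">" else order[field] == value
        if matched:
            action, amount = rule["do"]
            cost = cost + amount if action == "add" else cost * amount
    return cost


order = {"weight_kg": 12, "express": True}
assert shipping_cost_rules(order) == shipping_cost_direct(12, True)
```

The second version has no repeated `if` statements, but `shipping_cost_rules` is an interpreter in all but name, and each new kind of rule means extending both the data format and the code that evaluates it.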
That's why DRY, in particular, is less a piece of advice and more a way to flagellate oneself and others, since I can literally look at any piece of code and argue that it's not DRY. But we have stumbled onto a more useful discussion which unifies our previous observations: Clearly, abstractions are useful in programming. How and when should we use them? And if we keep pulling that thread, we'll find that it's very much a skill and an art, and learning how to make those decisions certainly isn't something that can be conveyed in a sentence, or even a book.
So, I can't give you a prescriptive guiding light, like DRY, that's not silly, but here are some principles that I think do a better job of expressing the spirit of DRY:
- Abstractions that are useful in one context are not necessarily useful in another. For example, when throwing together a rough prototype, the motivations behind DRY often don't apply: There's probably no future reader who has to understand the code; by the time the code is ready to be rewritten, the plan will be to change most or all of it; the overhead of abstracting everything nicely eats up time. However, the situation is clearly different when working on a ten million line body of mature code.
- What makes sense in one language doesn't necessarily make sense in another. In languages with strong static type systems, it's often useful to explicitly enumerate variations on data structures to satisfy the type checker; doing so tends to provide a lot of value and to scale well enough in those languages. And language constructs are going to look a lot different between Haskell and C.
- While there are certain situations where product trajectories are predictable, betting on how code will need to change in the future is often a poor bet. It's frequently better to omit sophisticated abstractions (such as object-oriented polymorphism) in favor of solutions that are simple and comprehensible; this approach can enable us to make cross-cutting code changes without needing to rework abstractions, and often comes at the cost of only a little bit of extra repetition. In other words, KISS is frequently (but not always) right.
- Our choice of abstractions ought to be considerate of other people who work with the code. If only one person on a team can understand the code, then they had better be the only one who needs to work with it (hint: this literally never happens).
- When we do decide to use more abstractions, it's useful to prefer abstractions that look like DSLs that cover a whole domain (Haskell's type class polymorphism is really good at defining these kinds of abstractions). When we can ask and answer arbitrary questions within a domain, we can mitigate much of the risk that comes from overcommitting to a particular strategy while still increasing our level of abstraction. Consider how useful and flexible the DOM is in JavaScript, and compare that to how brittle it would be to directly expose C++ or Rust interfaces that accept a bunch of user-supplied hooks. However, this flavor of abstraction often comes at a higher cost than the ad-hoc variety, so we need to strike a reasonable balance. The extra cost of building abstractions that aren't brittle is why DRY is frequently bad advice. Often, it's better to be a little bit repetitive but work at a level of abstraction that's well understood than to try to elevate the level of abstraction in a way that's brittle. The former results in a little bit more code, but it's code that's easy to work with. The latter often results in big, difficult-to-debug, monolithic classes that developers actively avoid working with (which results in more rot over time).
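The trade-off between a little repetition and a brittle, hook-driven abstraction can be sketched in Python as follows. The report-rendering scenario and all names here (`ROWS`, `render_text`, `render_html`, `Renderer`) are hypothetical, and the HTML is not escaped because this is only an illustration.

```python
ROWS = [{"name": "apples", "total": 3.5}, {"name": "oranges", "total": 2.0}]


# Option A: two small functions that repeat the same loop shape.
def render_text(rows):
    lines = []
    for row in rows:
        lines.append(f"{row['name']}: {row['total']:.2f}")
    return "\n".join(lines)


def render_html(rows):
    items = []
    for row in rows:
        items.append(f"<li>{row['name']}: {row['total']:.2f}</li>")
    return "<ul>" + "".join(items) + "</ul>"


# Option B: one "DRY" class assembled from user-supplied hooks.
class Renderer:
    def __init__(self, format_row, join, wrap):
        self.format_row = format_row
        self.join = join
        self.wrap = wrap

    def render(self, rows):
        return self.wrap(self.join(self.format_row(row) for row in rows))


text_renderer = Renderer(
    format_row=lambda row: f"{row['name']}: {row['total']:.2f}",
    join="\n".join,
    wrap=lambda body: body,
)
html_renderer = Renderer(
    format_row=lambda row: f"<li>{row['name']}: {row['total']:.2f}</li>",
    join="".join,
    wrap=lambda body: f"<ul>{body}</ul>",
)

assert text_renderer.render(ROWS) == render_text(ROWS)
assert html_renderer.render(ROWS) == render_html(ROWS)
```

Both options produce identical output today. The difference shows up with a cross-cutting change, such as a footer that needs to see every row: in Option A, that is a couple of lines in one function, while in Option B, it has to be squeezed through the existing hooks or the hook set has to be redesigned for every caller.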
P.S. If you're looking for a way to start building web apps that strikes a nice balance between being concise and being flexible, check out Sapling.