Archive for category Programming

Step Away From The Clipboard

We’ve all done it.  And we’ve almost all regretted doing it.  So it’s time to talk about a subject that is uncomfortable for many.

Copying and pasting code.

The temptation is constantly there.  You see some code here that works (or at least appears to work).  You obviously don’t want to reinvent the wheel, and maybe some aspect of the code makes re-factoring it out difficult.  Maybe you agonize about it a little, or maybe you blissfully ignore the dangers.  In the end, a Control-C and Control-V later, that block of code has reproduced itself.  The world doesn’t tragically end, everyone you know doesn’t die of a horrible disease, and your system still continues to function.  So you figure, hey, copying and pasting code isn’t so bad, and so you copy and paste that same line of code again.  And again.  And again.  And before you know it, your computer’s clipboard has become an indispensable tool.  Maybe you even go so far as to push others to follow in your copy-and-paste footsteps.  I’ve seen people put in their documentation something along the lines of “if you want to do x, copy this code from file y”.  And the world still continues to go on.

Then, suddenly things change.

You find a bug in that original code you copied.  It’s an easy fix, except that single bug has now reproduced like a virus, infecting your entire system.  Or maybe that code was golden, but the requirements do what they love to do so much and completely change.  Again, a quick fix to the original code will fix it, but the copying and pasting has, best case, increased the difficulty of the fix by a factor of the number of times you copied that code.  Worst case, different copies will have subtle differences (maybe some are still on an older revision of the change), and maybe some are so different that you can no longer recognize their lineage, even though under layers of re-factoring the old functionality still lives on.

We all know this risk, but the problem is actually more severe than what I’ve described.

Copying and pasting the same code, the same functionality, the same patterns; it all means you’ve missed a chance to abstract out some part of your system.  If you have dozens of classes that follow nearly the same pattern, there is something important about that pattern that needs to be captured at a higher level.  Not only will that make your code more maintainable, abstracting out these patterns makes it easier to reason about.  You can make better deductions about code that is made up of higher abstractions than code that simply looks similar to that other code over there.
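
To make that concrete, here is a minimal sketch in Java.  Every name in it (Validation, requireNonBlank, Customer) is invented for illustration.  Imagine the null-and-blank check below had been pasted into every constructor that accepts user input.  Pulling it into one helper means a bug fix or a requirement change lands in exactly one place.

```java
// Hypothetical example: all names are invented for illustration only.
final class Validation {

    // The pattern that used to be pasted into every class now lives here.
    static String requireNonBlank(String fieldName, String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(fieldName + " must not be blank");
        }
        return value.trim();
    }
}

final class Customer {
    final String name;
    final String email;

    Customer(String name, String email) {
        // Each field reuses the single shared check instead of its own copy.
        this.name = Validation.requireNonBlank("name", name);
        this.email = Validation.requireNonBlank("email", email);
    }
}
```

If the rule later changes (say, a maximum length is added), only requireNonBlank needs to be touched, and every caller picks up the change.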

And of course in many cases, copying and pasting code is a symptom of a larger problem: a lack of understanding of the original code.  It is too easy to find code that does something similar to what you want it to do, and then copy it verbatim.  But is that “something similar” exactly what you want?  Maybe it’s doing something subtly different, something you might not need or even want.  Maybe there were valid assumptions made when that code was originally written that you have no business making.  But if you took the easy route and just copied it without understanding it, you will have no idea that is the case.

In other cases it’s a sign of laziness.  Not the type of laziness that avoids work; in fact, re-factoring to prevent the need for a copy and paste job is usually less work.  It is a more intellectual type of laziness.  Work that just involves repeating something someone else did is easy, whereas once you get past that easy part you are stuck with the more challenging work.  Moving code around is easy; solving problems is the hard part.

Of course it’s not always your fault.  Maybe you are working in a language that makes certain kinds of patterns difficult to abstract, maybe even to the point it appears it’s actively resisting the concept of you being productive (I suppose it could be worse for Lambda, considering what is happening with Jigsaw).  And there are plenty of times when what you are copying actually is too small to be successfully reused.  I’m not saying never copy and paste code, or never reuse the same patterns or functionality.  Just next time you catch yourself doing it, please stop and ask yourself the following question:

Is there a better way to do this?



Agile: Principles vs Practice

Years ago at one of my previous jobs, I remarked to a coworker when we were trying to move the organization to agile that “Before, we were about as agile as an 80 year old man with arthritis.  Now we are about as agile as an 80 year old man with arthritis wearing a leotard.”  We had done a lot of work dressing ourselves up with agile methodologies, but our core way of operating did not change.  And when that happens, the result is about as pretty as the analogy implies.

Agile, it seems, is yet another example of a promising technique that has degraded into a technology buzzword.  As an abstract concept it is easy to wrap your head around, but attempts to implement it seem to have mixed results.  But I think there is still quite a bit of usefulness in it, if you know where to look.  Let’s start by pulling out the thing that began the whole movement, the Agile Manifesto.

Manifesto for Agile Software Development

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

I find a few things about this interesting. Let’s start with the first value, “Individuals and interactions over processes and tools”. Despite this ‘value’, think about how many agile tools are in existence. Tools that let you draw up story boards, generate burn down charts, calculate velocity, etc., all without needing to directly interact with other individuals in the project (and don’t get me started about ‘agile processes’). As far as the second one goes, I’ve heard it remarked how ironic it is that a philosophy that de-emphasizes documentation has so many books written about it. The third one is a nice idea, but largely depends on your customer. Often customers see the whole point of paying someone to build a software product for them as not having to be involved, and to be frank, I’m not certain they are wrong.

But it’s the last value that I think is both the most core to agile development and the most often misunderstood: being able to quickly respond to change, such as new requirements or new discoveries about existing requirements. Because unless you are working on a tiny project you can hack out in a weekend, it’s unlikely that what you finish building looks much like what you had in your head when you started. This idea has often been interpreted as meaning you should skimp on the design phase and just go through endless cycles of code, test, repeat until you get something that meets your requirements.

In fact, I think this tends to accomplish the exact opposite of the stated goal of being able to respond to change (unless by change you mean throw everything out and start from scratch). Without considering design principles such as encapsulation and separation of concerns, changing the direction of your codebase is going to be a painful exercise of pushing mounds of unmaintainable spaghetti code around. And not going through a design phase requires you to solidify assumptions in your codebase, which then become hard to fix should those assumptions turn out to be false (which, as agilists like to remind us, they likely will).

The best way to write code that can respond to uncertainties and changing requirements is to break the problem down. Then should a requirement change that renders one part of it obsolete, you don’t have to rewrite your entire application, just that single module. And if each module can be made small enough that its risks can be better understood, you will have a much better chance at mitigating those risks. Of course this breakdown requires some up front design and thought about the problem. You can’t just divide everything up willy nilly and expect to get pieces that will eventually fit together. For instance, if you have requirements that the application be both reliable and scalable, you can’t separate those two out as distinct requirements. Not only do those requirements tend to be impacted by every part of your application, but if they are implemented incorrectly they can conflict with each other.
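
As a rough sketch of what that kind of breakdown can look like in Java (all of the names below are hypothetical, not taken from any real project), the rest of the application talks only to a small interface.  If the storage requirement changes, only a new implementation of that interface has to be written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical module boundary, invented for illustration.
interface MessageStore {
    void save(String message);
    List<String> all();
}

// One possible implementation. Swapping in a database-backed store later
// would mean writing another MessageStore, not rewriting the application.
final class InMemoryMessageStore implements MessageStore {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void save(String message) {
        messages.add(message);
    }

    @Override
    public List<String> all() {
        return new ArrayList<>(messages); // defensive copy
    }
}

// The rest of the application depends only on the interface.
final class MessageBoard {
    private final MessageStore store;

    MessageBoard(MessageStore store) {
        this.store = store;
    }

    void post(String author, String text) {
        store.save(author + ": " + text);
    }

    int count() {
        return store.all().size();
    }
}
```

Note that cross-cutting requirements such as reliability and scalability would not fit behind a single interface like this, which is exactly why the breakdown needs some thought up front.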

Nowhere in the Agile Manifesto’s right side items is the word ‘design’ mentioned as a lesser value, so it’s odd that it’s so often assumed agile dismisses it. In fact, much of what agile preaches applies to design just as much as code. Work with your stakeholders when designing software, don’t just throw it in a modeling tool never to be seen again. Make sure your design is focused on working software, not just building up documentation.  Make sure your design is not so rigid that it can never respond to change. And always, always remember to test your design decisions, which typically requires working with your customers to make sure what you are designing is indeed what they want.



Agile Estimation

You are a member of an agile development team planning out your next sprint.  You have estimated your velocity at 33.  You currently have a load of 32.  There is a remaining story that has been estimated as having 2 story points (using a Fibonacci sequence).  Would it be a mistake to try to fit it in?

If you find yourself asking this, you are doing it wrong.

One premise of most agile techniques is that we are really bad at estimating.  Story points do not try to correct that fact, they simply work around it.  Unless you have the gift of clairvoyance, there is no point in attempting high precision estimates because any such estimate will be wrong.  Hence the use of sizes that increase either exponentially or through a Fibonacci sequence.  Assuming you were reasonably accurate, that story you estimated at 13 points might be as little as 12 points.  Or it might be closer to 15.

Remember high school chemistry when you learned about significant digits?  Story points are so low precision they don’t even have one significant digit.  And in a calculation involving low precision measurements, claiming a higher precision result is misleading at best.  It’s downright fraudulent at worst.

So back to our above scenario, claiming your story point load is “32” is wrong. You don’t have enough precision in your measurements to say that.  In reality, your load without that extra story is better expressed as “around 30”.  And with the extra story, it is also “around 30”.  If your current load is dominated by a couple 13 point stories, those are what will determine whether or not you meet your goal.  If it is dominated by many small 1, 2, or 3 point stories, you are misleading yourself if you argue you can predict exactly how many you are going to finish.
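
The arithmetic can be sketched in a few lines of Java.  This assumes we (generously) grant the total a single significant digit, which is my own simplification for illustration and not any formal agile practice; the class and method names are made up.

```java
// Hypothetical helper, invented for illustration.
final class StoryPoints {

    // Rounds a positive total to one significant digit (half rounds up).
    static long roundToOneSignificantDigit(long total) {
        if (total <= 0) {
            return 0;
        }
        long magnitude = 1;
        while (total / magnitude >= 10) {
            magnitude *= 10; // find the power of ten of the leading digit
        }
        return ((total + magnitude / 2) / magnitude) * magnitude;
    }

    public static void main(String[] args) {
        // The load without and with the extra 2 point story.
        System.out.println(roundToOneSignificantDigit(32));
        System.out.println(roundToOneSignificantDigit(32 + 2));
    }
}
```

Both the load of 32 and the load of 34 round to the same value as the velocity of 33, so at this level of precision the question of whether the extra story "fits" has no meaningful answer.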

Is my point that you should give up on estimating?  Of course not.  Just don’t obsess over getting all your numbers to line up.  Commit to enough that you feel comfortable with, and then give yourself plenty of stretch goals. That way you can meet your commitment if your estimates were too low, and you will have enough to do for the entire sprint if your estimates were too high.  Because all you really know is that it is unlikely your estimates are spot on.



Programming in a keyboard-less world

Just the other day, my brand new Transformer Prime tablet arrived.  Aside from being a high quality tablet (quad core processor, one of the very first to offer Android 4.0), it is well known for having a docking station accessory, complete with a keyboard, that essentially transforms it from a tablet to a 10 inch net-book.  My phone, an HTC G2, also has a fold out keyboard, as did its predecessor, my old G1 phone.  So I think it’s safe to say I am a fan of physical keyboards.  Sure, voice recognition can be good for some things, and Swype produces a nice on screen keyboard, but if I want to type anything of substance, I’m much more comfortable typing it on a nice hard QWERTY keyboard with actual buttons I can press.

Which makes the fact that I’m writing this post a bit ironic.

There was an interesting podcast last week from the IEEE about the keyboard going the way of the typewriter.  Of course I was rather dismayed by the thought.  It’s not just that onscreen touch keyboards will replace them, but that new input devices, such as a stylus with handwriting recognition or a microphone with voice recognition, will be the computer input of the future.

I would argue we are nowhere close to that with today’s technology, at least from where I stand.  Since my Transformer keyboard hasn’t arrived yet, I originally tried to “write” this blog post with my tablet’s “voice keyboard”, and I couldn’t get through the first paragraph without getting frustrated and giving up.  I haven’t really tried using a stylus recently, but my handwriting is typically so bad even I have trouble reading it, so I won’t begrudge any computer program which can’t read it (and before you try to say that’s just because I’m so used to keyboards I’ve lost the ability to write neatly, my grade school teachers would be quick to point out I never had that skill, even before I learned to type).  On the other hand, I can type at a reasonably fast pace with pretty good accuracy, so there is no debate on which method is more efficient for me.

But one argument made on the podcast was that kids today will likely grow up so used to voice recognition and handwriting recognition that they may view keyboards as obsolete.  That keyboards may offer a technically superior method of writing fast will not matter to them.  After all, one could easily argue that command line interfaces can be much more productive than GUIs for many tasks, but outside of hard core hackers, the world has largely moved away from them.  Even software developers have largely embraced tools such as Eclipse as an alternative to hacking on the command line.

And I can’t deny that there are some areas which keyboards are not very good at.  For instance, look at writing math problems.  Math is typically full of Greek letters, superscript/subscripts, and other things which are just plain hard to type.  Sure, there are usually obscure keyboard shortcuts for them, and specialized software for it (such as Mathematica), but no real general purpose solution.  When I was trying to take notes for the Stanford Machine Learning class on Evernote last year, I can’t tell you how much time I wasted trying to come up with notations for random symbols that kept on coming up.

And of course there are more creative endeavors, such as building “mind maps”, which are just hard to do without a more free flowing input format.  That’s why many still argue that pen and paper is a superior note taking device.  Keyboards are great for writing lines of text using a small set of well known characters, but are rather limited beyond that.

So as keyboard-less input becomes more and more mainstream, how will that affect computer programming?  Today, programming is a perfect example of lines of text optimized for keyboard input.  Using voice recognition to write a Java program?  How insane would that be?  “For, begin paren, double, no capital ‘D’ double input colon input-capital-V-vals, end paren, open bracket” instead of just typing:

for (Double input: inputVals) {

Case sensitivity, the frequency of special characters and common symbols, terse variable names, camelCase: none of that will work with voice recognition input.  Computer programming is clearly not a place where you want creative, free form input; rather, you want to heavily restrict it to legal values.

Or is it?

Will computer languages evolve to utilize the advantages of newer input methods?  Will they start to incorporate more free-form writing rather than just plain text?  Will it even be possible to come up with languages like that?  Or will future freshman computer science students have to spend hours learning ancient typing techniques that have become obsolete outside of writing programs?

I suppose time will tell.



Clojure Conj Day 1 Wins and Losses

So a quick recap of the good and bad during the first day of the 2011 Clojure Conj.

Let’s start with the good:

  • A fantastic talk by Arnoldo Muller-Molina on using Clojure in some very interesting bioinformatics problems.
  • Two very interesting talks on logic programming, one by Ambrose on Clojure’s core.logic, another less formal talk by William Byrd and Dan Friedman (two of the authors of The Reasoned Schemer, which I really need to read) on miniKanren, which included writing a program to write 50 functions that return 6 (though I think it would have been more useful had it found functions that returned 42…).
  • I finally got my print copy of Clojure In Action (which I ordered as part of Manning’s early access program when the book was still in Beta… just over 18 months ago).
  • A great talk by the precocious Anthony Grimes on Clojail.
  • A lot of people I talked to are using Clojure not just for hobby development, but in their day jobs as well.
  • The Sheraton seems very capable of handling the size of the crowd.

But of course there were a few negatives as well.

  • Ragweed is in season and I really should have started taking allergy pills a few days ago.  And by the sneezes I heard behind me, I wasn’t the only one.
  • Parking in the lot is a tad bit expensive for those of us not staying at the hotel (though of course I didn’t have to travel, so I can’t complain about cost too much).
  • Lunch was, well, less than inspired.  I mean come on, make your own sandwiches?  I can make those at home and bring it with me…
  • No bagpipes (yet…).

All in all, a very good first day.



Big Data at Strange Loop

Ok, time to finally review the talks I attended at last month’s Strange Loop conference in St Louis.  The last two weeks were a tad bit busy (a bit more on that later), so this post was delayed a bit.  But let’s start with the sessions on big data, starting with the first keynote of the conference.  Data was a common theme at the conference and was one of the conference’s dedicated tracks.  That shouldn’t be much of a surprise to anyone following the software industry these days, as the need to analyze huge amounts of data is becoming more essential for businesses.  So Erik Meijer (architect at Microsoft) kicked things off with his talk, “Category Theory, Monads, and Duality in (Big) Data.”  I can’t find a link to the slides of the talk, but here is the paper it is based on.

Even though the title contained references to category theory and monads, you didn’t need a PhD in Mathematics to get what Erik Meijer was getting at.  And that was a very good thing, since his talk was very useful.  The essence of it was a comparison between traditional table-based SQL databases and the new breed of so-called NoSQL object databases, specifically that they are not as different as we tend to think.  In fact, he proposes replacing the ambiguous term NoSQL with CoSQL, to show how the two have a mathematical duality between them.  Basically, in table based databases you have entities (each of which can stand on its own) using foreign-primary key relationships to point to their parents.  Meanwhile, in object based CoSQL databases you have parent entities pointing to their children, which really have no context outside of their parents.
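
Here is a small Java sketch of how I picture that reversal of the arrows.  It is my own simplified reading, not the formal construction from the talk or the paper, and the order and line item data model is invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical order/line-item model, invented purely for illustration.
final class DualityDemo {

    // Table style: the child row stands on its own and points at its
    // parent through a foreign key.
    static final class LineItemRow {
        final int id;
        final int orderId; // foreign key to the parent order
        final String product;

        LineItemRow(int id, int orderId, String product) {
            this.id = id;
            this.orderId = orderId;
            this.product = product;
        }
    }

    // Object style: the parent points at its children, which have no
    // identity of their own outside of it.
    static final class OrderDocument {
        final int id;
        final List<String> products = new ArrayList<>();

        OrderDocument(int id) {
            this.id = id;
        }
    }

    // Reverses the direction of the pointers by grouping child rows
    // under the parent they refer to.
    static Map<Integer, OrderDocument> toDocuments(List<LineItemRow> rows) {
        Map<Integer, OrderDocument> byOrder = new LinkedHashMap<>();
        for (LineItemRow row : rows) {
            byOrder.computeIfAbsent(row.orderId, id -> new OrderDocument(id))
                   .products.add(row.product);
        }
        return byOrder;
    }
}
```

The same information is present in both shapes; what differs is which side holds the reference, and therefore which side can exist without the other.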

It was a really interesting talk, and not just because it had a lot of abstract math in it (I guess that may be an odd phrase for most of the world to hear).  He finished with a plea for developers to make their design decisions not on emotion or on what appears hot today, but on which design better models their data.  Both have advantages.  While object based CoSQL databases are more open and composable and tend to be horizontally scalable, the rigidity of table based SQL databases offers plenty of advantages as well, when the problem domain calls for it.


Odd Circle-like-things Day 0

So today was, let’s say, Day 0 of Strange Loop.  Aside from some hiccups on the plane to St Louis regarding a very important part of the plane (we were delayed for half an hour while they fixed the flushing mechanism in the lavatory), things have gone well.  The actual conference begins tomorrow, but today had the optional workshops and, of course, was the day most people arrived for the conference proper.  Let me preface this post by stating that the rooftop bar is very nice.  Where else can you drink Scotch while discussing the uptake of Clojure within business software companies with some of the smartest people in the industry?  But as a warning, that might impact the quality of this post…

Anyway, I had signed up for only one workshop, Nathan Marz’s Cascalog workshop.  Cascalog is a Clojure based library for data processing with Hadoop.  Think SQL on steroids.  I have interests in both Clojure and Hadoop, and yet don’t have too much experience with databases, so it was an obvious session for me to sign up for.  It was a three hour workshop, but at the start Nathan warned he was used to giving similar workshops for around 9 hours so this one might be a little rushed.  I had initially laughed, thinking 3 hours sounded like a long time, but it definitely was rushed.  Yet we learned quite a lot.  Cascalog definitely is a powerful library which I will certainly play around with later.  I was very impressed with the quality of the queries that could be made with what seemed very intuitive Clojure code, not the convoluted nonsense many SQL queries end up as.

Tomorrow comes the hard part though, choosing between several very different, yet very interesting talks that are scheduled at the same time.  In fairness to the conference planners, with the number of talks that look interesting it would have been mathematically impossible for them to schedule everything so I could attend everything that I wanted.  But still, as Alan Dipert of Relevance said, the thing I am least looking forward to is having to choose between talks.  I just hope enough will be recorded that I will be able to watch the ones I miss later.


ClojureScript announced

Rich Hickey and the Clojure/core team just announced (ok, they announced it last week, but I was on vacation then, so I actually have an excuse to be blogging late) a new project, ClojureScript, a Clojure to JavaScript compiler.  With Clojure being a great language to develop in and JavaScript being nearly ubiquitous for web programming (though not always that great to program in), it does certainly look interesting.  I was a little disappointed, as when I heard rumors of it at the TriClojure meetup earlier this month I was hoping it would be an Android library (one that wasn’t slow and didn’t add a few megs to the app), but I will have to play with it some.  I haven’t done much with JavaScript in several years (for which I am thankful), but this should give me an excuse to get back into it.

The “Clojure”/”Google Closure” thing is going to get a bit confusing though.


Engineer, Developer, or Gardener?

What’s the difference between an engineer, a scientist, and a mathematician?  An engineer sees his equations as an approximation of reality.  A scientist sees reality as an approximation of his equations.  A mathematician just doesn’t care.

There was discussion in a relatively recent Basement Coders podcast (ok, it went out mid May and it’s not even the most recent podcast they have released, which tells you how often I post to this blog) about whether or not “Software Engineering” is a good title for professionals writing software.  It was sparked by a blog post suggesting that a better title would be “Software Gardeners”.  Reading the blog post, I can’t agree with much that he says.  While I have never liked the term “Software Engineering” (though less so recently, in fact the subtitle of this blog is “A blog exploring the world of software engineering”), “Software Gardener” is much worse.  I don’t want to diminish what gardeners do, as it certainly does involve a lot of work.  But what gardening involves is mostly preparing the environment and letting the plants grow.  That might be a good metaphor for, say, management.  But not software development.  There you need to do the work to build the product.

Craig from the podcast, however, seems to disagree with the term “software engineering” as well, but for different reasons.  His argument is that writing software is not deterministic, while mechanical engineering (such as building a bridge or skyscraper) is.  I’ll have to disagree with that as well.  Software can be deterministic, it just usually isn’t.  And similarly, engineering in the physical world is rarely deterministic.  You can build a nuclear reactor that is safe from any imaginable disaster, only to have it melt down after being hit by both a magnitude 9.0 earthquake and a resulting tsunami.  That kind of real world interference is not something we have to think much about in the world of software.

So why does software seem so nondeterministic?  There are many reasons for this, but the biggest is that comparing the development of a web application for posting messages to friends with the building of a bridge really does not work.  In the web application, you can sacrifice quality for speed of development, which you can’t do with the bridge since human safety is involved.  This results in more attention being spent on detail for the bridge, giving the impression that it is more deterministic.  If you spent as long writing your web application as bridge engineers spend on a bridge, it would probably be very stable (but of course you would be the last one to the market).  Instead, compare something like developing the software behind a car vs engineering the car’s engine.  When your car breaks down, how often is it a software failure vs a problem with the physical engine?

So why do I have issues with the term software engineering, and generally prefer “software developer”?  I did graduate with a degree in engineering (though just barely, I started in Virginia Tech’s College of Arts and Sciences, but about a year before I graduated the Computer Science department got moved to the College of Engineering).  But I do still see a difference between developing software and the more physical engineering disciplines.  And that’s where the quote at the top of this post comes in (which I first read on VT’s Mathematics department webpage back when I was in school).  Based on that description, developing software is neither a science nor an engineering field.  It’s a branch of mathematics.  We spend huge amounts of resources trying to distance our code from the real world of the computer electronics.  From assembly languages abstracting out the processor to the operating system abstracting out the assembly languages to the virtual machines abstracting out the OS, we’ve been moving further and further away from the real world with each year.  We don’t see our programs as an approximation of the real world, nor do we see the real world as an approximation of our programs.  We simply do not care about the real world.

And to be honest, that’s why I love developing software.  I probably should update my blog subtitle though.



Clojure Tip of the Day

Avoid writing test cases that compare data structures that include infinite lazy sequences.  It kinda does what you would expect…

And yes, this is speaking from experience.
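
The same trap can be shown outside of Clojure.  Below is a sketch in Java (not Clojure, and with invented class and method names) using an infinite Stream: an element by element comparison of two equal infinite sequences can never finish, so the test has to compare a finite prefix instead.  In Clojure the analogous move would be to take a fixed number of elements from each sequence before comparing.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, invented for illustration.
final class LazyCompare {

    // An infinite lazy sequence: 0, 1, 2, ...
    static Stream<Integer> naturals() {
        return Stream.iterate(0, n -> n + 1);
    }

    // Compares only the first `count` elements, so the check terminates
    // even when both streams are infinite.
    static boolean samePrefix(Stream<Integer> a, Stream<Integer> b, int count) {
        List<Integer> left = a.limit(count).collect(Collectors.toList());
        List<Integer> right = b.limit(count).collect(Collectors.toList());
        return left.equals(right);
    }
}
```

A bounded check like this only proves the prefixes match, of course, but that is usually all a test case can reasonably assert about an infinite sequence.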
