Simon Harriyott

Regular Expressions: sorry, have we met?

Every now and again, I come across a little coding task that seems to be best solved with a nifty regular expression. I usually come to that conclusion after contemplating a string mangling exercise involving at least two SubStrings, a couple of Lengths and a Replace.

The problem is that even though I only used regular expressions a couple of months ago, I cannot remember the first thing about them. I have to start right from the beginning. Is it star dot, or dot star to match any number of any character? How do I make it lazy, not greedy? What is the "or" character?
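For the record, those three questions do have short answers. Below is a minimal sketch in Python; the language choice and the sample string are assumptions made for illustration, although most regex flavours use the same syntax for these three things.

```python
import re

# Sample text invented for this illustration.
text = "<b>one</b> and <b>two</b>"

# Dot star: "." is any character (except a newline by default),
# and "*" repeats it zero or more times. Greedy by default.
greedy = re.search(r"<b>.*</b>", text).group()

# A "?" after the quantifier makes it lazy, so it stops at the first "</b>".
lazy = re.search(r"<b>.*?</b>", text).group()

# The "or" character is the pipe.
either = re.findall(r"one|two", text)

print(greedy)  # <b>one</b> and <b>two</b>
print(lazy)    # <b>one</b>
print(either)  # ['one', 'two']
```

The greedy version swallows everything up to the last closing tag, while the lazy one stops at the first.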

So I find the last regular expression I wrote, hoping that will help. It doesn't. There is a long line of characters, which starts to look familiar, but it takes so long to read, parse and understand that it isn't much use. I then Google "regular expressions" and read up until I understand them, have a go at my coding task (while still flicking back to the reference website), and it's done.

As I know that I'll look at it again in a few weeks and be baffled by it, I try and comment it. It's a really hard thing to comment, as I now know all about regular expressions, and any dimwit can do it, so I just need to explain what the expression is for. Then I realise that loads of people don't use them, and an hour ago I couldn't remember how they work, so I think I actually need to explain what the regular expression does, character by character.
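One way to write that kind of character-by-character explanation is to put it inside the pattern itself. The sketch below assumes Python and its `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments; the date pattern and the sample sentence are hypothetical, made up purely for illustration.

```python
import re

# Hypothetical pattern for dates written like "12 May 2007".
DATE = re.compile(
    r"""
    (?P<day>\d{1,2})      # one or two digits for the day
    \s+                   # at least one whitespace character
    (?P<month>[A-Za-z]+)  # the month name, letters only
    \s+                   # whitespace again
    (?P<year>\d{4})       # a four-digit year
    """,
    re.VERBOSE,
)

match = DATE.search("Posted on 12 May 2007 by Simon")
print(match.group("day"), match.group("month"), match.group("year"))
```

Each piece of the expression sits next to its own comment, so whoever edits one is at least looking at the other.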

Of course, a comment isn't linked to the expression it describes the way related rows are in a database. If someone changes the regular expression, the two screens' worth of comments aren't automatically updated, and the next developer won't update them (they never do). The comments are then wrong, so it hardly seems worth writing them in the first place.

Now, I do like regular expressions, but they seem to be the only coding activity that possesses the property of instant forgettability. Everything else that is used infrequently, such as threading or reflection, doesn't get forgotten so completely. Sure, a couple of method names need searching for in IntelliSense, but the basics are still there.

Imagine walking up a hill, and then having to stop for a while to do something (like phoning a restaurant to book a table), and then continue up the hill. You start off from the same place you stopped. That's like normal coding. With regular expressions, it's skateboarding up the hill. When you finish the phone call, you look round and find you've rolled to the bottom again.

12 May 2007