Stop Rolling Your Own CSV Parser!

September 12, 2006

Would you write your own XML Parser? Only if you're f***ing crazy.

Yet developers constantly write their own "little" CSV parsers.

How does this madness occur?

Step 1 -- Ignorance

"Oh this will be easy, I'll just read the file one line at a time, calling String.Split(',') to break each line into an array.

"Then I'll be able to refer to each item by number."

(You're already headed for stormy water... anything you do from now on will only drive you into the rocks harder and faster...)
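To see where the rocks are, here is a minimal sketch of that plan. It is written in Python rather than .NET, on the assumption that str.split(',') is a fair stand-in for String.Split(',') for this purpose. The sample record is made up for illustration.

```python
# Naive approach: split each line on commas.
# The sample record is hypothetical, made up for illustration.
line = '42,"Smith, John",Sydney'

fields = line.split(',')

print(len(fields))  # 4, not the 3 fields the record actually has
print(fields[1])    # "Smith  (the quoted name was cut in half)
```

One quoted comma and every field after it is off by one, so "item number 2" quietly stops meaning what you thought it meant.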

Step 2 -- First Doubts

"Oops. I need to handle commas, which are either escaped (by prefixing them with a special symbol) or contained inside quotes."

(So you decide to use regular expressions. After a bit of tinkering you've got a nice little regular expression that seems to work.)

(That ringing in your ears is Jamie Zawinski saying "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems.")

Step 3 -- Uh oh

"The quotes worked good at first... but quotes need to be escaped too. And sometimes there's double quotes, sometimes single quotes. Easy -- I'll just fix my regular expression."
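For reference, here is roughly what those inputs look like and what a correct parse should return. This is only a sketch: it uses Python's standard csv module as the referee, the sample data is invented, and it assumes the common convention of escaping a quote by doubling it. The last record also has a line break inside a quoted field, a case that defeats "read the file one line at a time" before any regular expression gets involved.

```python
import csv
import io

# Hypothetical sample data covering three awkward cases:
#   - a comma inside a quoted field
#   - a quote escaped by doubling it ("")
#   - a line break inside a quoted field
data = (
    'id,name,note\r\n'
    '1,"Smith, John","said ""hi"""\r\n'
    '2,Jane,"line one\nline two"\r\n'
)

# newline='' leaves line endings untouched so the csv module can
# decide which ones end a record and which ones belong to a field.
rows = list(csv.reader(io.StringIO(data, newline='')))

for row in rows:
    print(row)
# ['id', 'name', 'note']
# ['1', 'Smith, John', 'said "hi"']
# ['2', 'Jane', 'line one\nline two']
```

Three records, three fields each. Those rows are the target your regular expression has to hit, on every file, forever.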

Step 4 -- The Descent into Chaos

You start to adopt a 'test-driven' approach, only it's more of a 'panic-driven' approach. You write numerous test cases for your unwieldy CSV parser. It behaves nicely.

You test it on more real world examples... it breaks your existing code and you need a new test case or two.

You keep adding new test cases, always trying to do the simplest thing that will get the code to work.

It's now eight weeks since you said "I know! I'll just use String.Split(... ". You have grown a long beard, which is particularly annoying as you are a woman. You have lost all boundaries in regard to personal hygiene. Managers circle your desk like vultures circling a wounded leopard.

Step 5 -- Enough!

You lift your head from the keyboard for just a moment when a thought strikes you. The problems you are facing have been faced before. You are re-inventing the wheel.

You download a code sample from the internet, and use your test cases to try it out. The downloaded code is much worse than what you've written yourself.

You download more samples from the internet. They're all broken. In. Different. Ways.

When you try to contact the developers of each library to see how they work, you find that the developers have generally retired and/or passed away and/or quit working in the IT industry. You consider how fortunate they are.

Step 6 -- Help me!

You go to the blog of someone you know and trust. You email that person. That person writes back and says, in big letters:

Just Use Marcos Meli's File Helpers.

The great thing about File Helpers is not just that it works, but that it is actively developed by Marcos, and if you need an improvement to it, you can contact Marcos (marcosdotnet at yahoo.com.ar). He's a real person who cares about getting his library to work properly. He's not just stopping at a 'good-enough' solution.

The other, and perhaps even greater, advantage is one you never dreamed of. Now you don't have to refer to fields by number. No more " myArray[4] " -- you can now say " myCustomer.Id ".
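To make the difference concrete, here is a small sketch of the same idea in Python. To be clear, this is not the File Helpers API (that is a .NET library). It is an analogue built on Python's standard csv.DictReader, and the Customer class, its fields, and the sample data are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical record type; the field names are assumptions.
    id: int
    name: str
    city: str


# Hypothetical sample data with a header row and one record.
data = 'id,name,city\r\n42,"Smith, John",Sydney\r\n'

# DictReader maps each record to a dict keyed by the header row,
# so fields are looked up by name instead of by position.
customers = [
    Customer(id=int(row['id']), name=row['name'], city=row['city'])
    for row in csv.DictReader(io.StringIO(data, newline=''))
]

my_customer = customers[0]
print(my_customer.id)    # 42
print(my_customer.name)  # Smith, John
```

If someone reorders the columns in the file next month, my_customer.id still points at the id.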

The resulting code is so readable that you'll survive your next code inspection without getting your arms and legs torn off by Terry (Head Code Nazi and leader of the local chapter of The Programming Gestapo).

You can stop re-inventing the wheel and get on with your day job: cranking out more bugs, faster.

(But it's a good thing this experience gave you a chance to try out test driven development!)
