"The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information." —Alan Perlis

When you switch between "Compose" and "Edit HTML" views, some amount of whitespace (although not all of it) is destroyed.

Even when posting through the Atom API, the posted HTML is mangled in semi-arbitrary ways.

Properly-quoted "<" and ">" (i.e. "&lt;" and "&gt;") are quoted again. Additional line breaks are added. "&nbsp;" is converted to whitespace, and then whitespace is collapsed.
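To make the mangling above concrete, here is a minimal Python sketch of the two failure modes: escaping text that is already escaped, and turning non-breaking spaces into ordinary whitespace that then gets collapsed. It only illustrates the symptoms; how Blogger actually processes posts is not known to me, and the sample strings and the `collapse_whitespace` helper are made up for the example.

```python
import html
import re

# A snippet of Python source, as an author would want it displayed.
raw = "if a < b:\n    print(a)"

# Escaping once is correct: the browser will show "<" again.
once = html.escape(raw)

# Escaping a second time is the bug: the reader now sees "&lt;" literally.
twice = html.escape(once)


def collapse_whitespace(text):
    """Hypothetical helper: squash every run of whitespace to one space."""
    return re.sub(r"\s+", " ", text)


# Collapsing whitespace destroys the line break and the indentation.
collapsed = collapse_whitespace(raw)

# Non-breaking spaces used for alignment are lost the same way once
# they have been converted to ordinary spaces.
aligned = "x&nbsp;&nbsp;=&nbsp;1"
as_spaces = html.unescape(aligned).replace("\xa0", " ")
squashed = collapse_whitespace(as_spaces)

print(once)
print(twice)
print(collapsed)
print(squashed)
```

Unescaping `twice` a single time gives back `once`, not `raw`, which is why double-quoted entities show up as literal "&lt;" in the published post.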



"If you are going to produce real XHTML in a tool usable by ordinary users, then you cannot do it by string concatenation. You need to assemble your content by serializing an XML DOM tree. If you want to allow plugins, then your plugin API cannot allow plugin authors to stick arbitrary strings in the output. Rather, they should be allowed to add nodes to the DOM tree, or to manipulate existing ones."
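As a rough illustration of the quoted advice, the sketch below builds the same paragraph twice in Python: once by string concatenation and once by assembling a tree and serializing it with the standard library's `xml.etree.ElementTree`. The `add_footer` function stands in for a hypothetical plugin hook; it is not any real blogging tool's API, and the sample text is invented.

```python
import xml.etree.ElementTree as ET

# Text as an ordinary user might type it, markup characters and all.
user_text = 'A "frustrated" smiley >.< & friends'

# String concatenation: the "<" and "&" leak straight into the markup.
concatenated = "<p>" + user_text + "</p>"


def parses(markup):
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Tree assembly: the text is stored as data and escaped on serialization.
body = ET.Element("div")
paragraph = ET.SubElement(body, "p")
paragraph.text = user_text


def add_footer(tree, note):
    """Hypothetical plugin hook: plugins add nodes, never raw strings."""
    footer = ET.SubElement(tree, "p", {"class": "footer"})
    footer.text = note


add_footer(body, "Posted with <3")
serialized = ET.tostring(body, encoding="unicode")

print(parses(concatenated))  # the concatenated version is not well-formed
print(parses(serialized))    # the serialized tree is
print(serialized)
```

Because the plugin only ever hands over nodes and text, nothing it does can make the output ill-formed, and the user's text round-trips unchanged when the serialized output is parsed again.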

I use ScribeFire as my HTML editor. It manages OK, except that it doesn't insert line breaks, <p>s or <div>s to separate lines. So, leave the "Convert Line Breaks" option on in your blog's settings. In "Settings -> Basic -> Global Settings", disable "show compose mode for all your blogs". The compose view is destructive: switching between it and "Edit HTML" eats whitespace each time you do it, and it also seems to sometimes eat bits of formatting when you publish, even if it's just on the page. Edit your post in ScribeFire. To save drafts, use the "save as note" functionality. This doesn't publish the post as a Blogger draft, but there's no way to get the data into Blogger directly. You can use the HTML tab as you normally would, to add tags that aren't supported (such as "<pre>").

Switch to the HTML ("<A>") tab in ScribeFire. Select all. Copy. Click "New Post" in the Blogger web UI. Click in the text field. Paste.

I switched to Blogger recently expecting a more "professional" blogging experience. I thought I'd be able to use a GUI editor and not concern myself with the details of the blog engine. Apparently I was wrong.

Writing that last post, I had some pretty serious problems getting the formatting to come out right. Blogger does a couple of really terrible things, described above.

This is one of the reasons that I'm such a stickler for treating data as structured data, and not making arbitrary heuristic guesses about it. It's not just a matter of handling obscure, nerdy edge cases that average users won't run into. In fact, it's the opposite. Nerds (like myself) can figure out whether you're double-quoting your HTML entities or doing improper whitespace conversions. But what does a regular Joe do when a "frustrated" smiley (">. [...] a page on the Habari wiki. Strangely enough, this page concludes that the important thing is not to build their next-generation blogging tool on a technology that lets them produce valid output (serializing DOM trees); the important thing, it decides, is string concatenation rather than valid output. They very clearly put an implementation technique above a good experience for users.

(This is your brain. This is your brain on PHP. Any questions?)

I don't want to pick on the Habari developers overmuch. After all, the problem that inspired this post was with Blogger, and Wordpress has the same issue. In fact, the Habari guys are mostly notable for having considered the implications of their decision so carefully; it's just a surprise to me that they walked all the way up to the right answer, looked at it, made sure it was right, and then decided to ignore it and keep on going.

Here's the surprise for the Habari developers, and basically everyone else who writes web applications that process HTML: [...] It is a general principle of software development.
The only reason you [...] when you're doing XHTML is that the browser isn't correcting for hundreds of minor mistakes, and rather than screwing up immediately it screws up one time in a thousand when a user managed to type a " [...] ultra large, the different pieces need to be able to talk to each other using clear and unambiguous formats. These points of integration, the places where system A talks to system B (a blogging system talks to a web browser or a blogging client, for example), are absolutely the most critical pieces to test, test, and test again. If you have a bug in your system, you can find it and fix it; but if you have a bug which only arises from an interaction between your system and two others, your test environment needs to be 3 times bigger, and the error is at least 3 times harder to catch. But it gets worse. If you're dealing with 4 systems, then your test environment is 4 times bigger, but the bug is 6 times harder to catch. And so on.

Fred Brooks observed that adding more programmers to a project running behind schedule makes it later. This is because of the additional channels of communication. Now imagine that one of your developers has a curious speech defect: when he says "lasagna" he actually means "critical bug", and vice versa. When he hears one, he understands it as the other. Working alone, this is a harmless eccentricity, but as soon as you put other developers into the mix, strange effects start taking place. He desperately tries to tell them about the delicious lasagna he had last night, and they can't understand why he's losing sleep over it. Or, he is sanguine as his fellow engineers tell him about all the Italian food they're eating, while the business is losing millions of dollars. It's sort of like if every time he said " [...] my own blogging platform and I'll know that it can handle HTML correctly. Until then, though, I've worked out a strategy for posting to Blogger which seems to mostly preserve the formatting that I want to see.
I figure that other Python developers might be interested in this, since I frequently see posts to Blogger which eat indentation.

The presence of numerous properly-escaped HTML characters in this post should be an indication that it works.