I have been asked recently to write an article, somewhat along the lines of this one but longer, and with a somewhat different angle, asking slightly different questions: What makes a science blog? Who were the first science bloggers and how long ago? How many science blogs are there? How does one differentiate between science blogs and pseudo-science, non-science and nonsense blogs? The goal of the article is to try to delineate what is and what isn't a science blog, what are the overlaps between the Venn diagram of science blogging and some other circles, and what out of all that material should be archived and preserved forever under the heading of "Science Blogging".

We've had these kinds of discussions for years now... but I'll give it my best shot. And I need your help - let's crowdsource this a little bit. I was active on Usenet in the mid-90s, started political blogging in 2003, but only joined the science blogosphere somewhere around late 2004 or early 2005. I am much more familiar with the biology and neuroscience corners of the blogosphere than, for example, math, space or psychology circles (though I increased my breadth as I was assembling this network). There were several science bloggers before me, posting their stuff for several years before I discovered them. They will know stuff I don't. I hope bloggers, old and new, join me in this project, fix my errors, add missing information, and more, in the comments (and perhaps someone can put the final result on Wikipedia later on).

Defining a science blog

Defining a science blog - heck, just defining a blog - is difficult. After all, a blog is just a piece of software that can be used in many different ways.

What is considered a science blog varies, and has changed over the years. Usually it is meant to be a blog that satisfies one or more of these criteria: blog written by a scientist, blog written by a professional science writer/journalist, blog that predominantly covers science topics, blog used in a science classroom as a teaching tool, blog used for more-or-less official news and press releases by scientific societies, institutes, centers, universities, publishers, companies and other organizations. But is a blog written by a scientist that never covers science really a science blog? Is a blog by a PhD in dentistry who spews climate denialism in every post a science blog?

What is considered a science blog also changes with the advances in technology. There is now a fine-grained division of blogging into macro-, meso- and microblogging. Initially, this distinction was made by technology. Macroblogging happened on platforms like Wordpress or Blogger, mesoblogging on sites like Posterous or Tumblr, and microblogging on social media like Twitter and Facebook. But technology moves, and now it is possible to do all three "sizes" (or is it "speeds"?) on any of those platforms - and some people do.

Is a one-liner posted on a blog the same as a one-liner posted on Twitter? Some posts on Facebook and Google Plus are longer and more thorough than some others that use the more traditional blogging platforms like Wordpress, Blogger or Drupal. Yet G+ is very new and Facebook, until recently, had quite a short word-limit. Many people used blogging software to do very brief updates back when that was the only game in town. Today, quick updates, links etc. are done mainly on social media and many bloggers use the traditional blogging software only for longer, more thorough, one could even say more "professional" writing.

Finally, blogging is not just about text. There is photoblogging, videoblogging, podcasting etc. And for each of these specialized types of blogging, one can potentially use traditional blog software, or instead choose to do it on social networks, or on specialized sites, e.g., Flickr, Picasa, Instagram, Pinterest, Tumblr, YouTube, DeviantArt etc. Does all of that count?

The beginnings of science blogging

Pin-pointing the exact date when the first science blog started is a fool's errand. Blogs did not spring out of nowhere overnight. The first bloggers were software developers who experimented with existing software, then made some new software, fiddling around until they gradually hit on the format that we now think of as a 'blog'. The evolution was gradual in the world of blogging, and it was also gradual in the more specific world of science blogging.

The earliest science bloggers were those who started out doing something else online - updating their websites frequently, or participating in Usenet groups - then moving their stuff to blogging software once it became available in the late 1990s and early 2000s.

As much of the early online activity focused on countering anti-science claims, e.g., the groups battling against Creationism on Usenet, it is not surprising that many of the early science bloggers came out of these fora and were hardly distinguishable in form, topics and style from political bloggers. They brought a degree of Usenet style into their blogs as well: combative and critical of various anti-science forces in society. And certainly, their online activity had real-world consequences and successes, for example the Dover trial for which a decade of resources accumulated by the bloggers and their community, in some cases presented at the trial itself by those same bloggers, helped defeat a Creationism bill in a resounding manner that, in effect, makes all future efforts to introduce such bills relatively easy to defeat.

Phil Plait, Chad Orzel, Razib Khan, Derek Lowe, David Appell, Sean Carroll, P.Z.Myers (whose blog started as a classroom teaching tool), Tim Lambert, Chris Mooney, and Carl Zimmer were some of those early science bloggers. Panda's Thumb blog and Larry Moran's Sandwalk are for all practical purposes direct descendants of the old Usenet groups. Real Climate has, I believe, similar origins. Among early adopters of blogging software, rare are the exceptions of people who instantly started using it entirely for non-political (and non-policy) purposes, just to comment on cool science, or life in the lab etc., e.g., Jacqueline Floyd, Eva Amsen, Jennifer Ouellette, Zen Faulkes and Grrrlscientist.

In those early days, we pretty much all knew, read, linked, blogrolled and responded to each other, despite a wide range of interests, backgrounds, topics, etc. As the blogosphere grew, nodes appeared in it, concentrating people with shared interests. Those nodes then grew into their own blogospheres. The medical blogosphere, skeptical blogosphere, atheist blogosphere and nature (mostly birding) blogosphere all used to be part of the early science blogosphere, but as it all grew, these circles became separate with only a few connecting nodes. Those connecting nodes tend to be veteran, popular bloggers with large readerships, as well as bloggers on networks like this one which tend to want to have representatives from many areas, e.g., medical bloggers mixed in with paleontology bloggers mixed in with space bloggers, etc.

Some key moments in the evolution of science blogging

I will now try to identify some of the events and developments in the history of science blogging that, in my opinion (and please disagree in the comments), were especially important in the direction science blogging evolved: the changes in styles, the growth in size, and the rise in respectability.

Tangled Bank, and other science blog carnivals

What is a blog carnival?

It is a crowd-sourced online magazine, occurring at a regular interval, usually rotating hosting blogs for each edition. Bloggers submit their best posts from a particular period or on a particular topic to the next edition's host who accepts (or rejects) the entries, and edits a blog post that contains nicely arranged and introduced links to all the entered posts. Thus, it is a well-defined, well-archived, regular, rotating linkfest. Usually all the included bloggers link back to the carnival from their blogs (as well as other online sites, e.g., social networks) thus bringing attention and traffic to the host, as well as to all the bloggers whose work is included in that edition.

The very first such "rotating blog magazine" was started in 2002 under the name "Carnival of Vanities" (from which the phenomenon got its name) and the concept quickly spread like wildfire.

One of the very first carnivals was started by P.Z. Myers. This was Tangled Bank (unfortunately, the archive appears to be gone). This weekly rotating linkfest helped science bloggers discover each other, promote themselves and each other, encourage new people to start blogging, and start building a community. Several spin-offs showed up later, e.g., Grand Rounds (medicine), Skeptics' Circle (countering pseudoscience), I and the Bird (birds), Circus of the Spineless (invertebrates), Berry Go Round (plants), Change of Shift (nursing), Friday Ark (animals, mostly photos), Encephalon (neuroscience), The Accretionary Wedge (earth science), Carnival of the Blue (marine science), The Giant’s Shoulders (history of science), Festival of the Trees, Carnival of Mathematics, Carnival of Space, and a few dozen others. Some of those are still around, but most have closed after a good multi-year run.

I have written quite a lot about blog carnivals before, what they are, why people should participate, and how carnivals affect journalism and science.

With the more recent development of social media, the carnivals are no longer seen as being as important for community building as they once were. First came the feed readers, and feed aggregators (especially FriendFeed) that made it easier for one to track and filter blog posts and other content by topic or some other criteria. The primary function of the carnivals - to build community - could easily be done in these new spaces. Then Twitter came along, though it took some time for people to figure out how to use it, to invent various Twitter norms (e.g., RT, hashtags, @reply), and to build apps that make Twitter more useful (though this is now endangered).

A little bit later, Facebook bought FriendFeed and imported all of its good functionalities (e.g., "Like" button, "Share" button, "Friend of Friend", "Pages", video embed, toggling between "Top stories" and "Most recent" on the homepage feed, etc.), lifted the word-limit on status updates, made importing other feeds easy, and made long-form blogging easy as well. Finally, a year ago, Google Plus was launched - essentially FriendFeed on steroids, linked more and more intimately to all the other Google stuff, from your Gmail to Google Docs to YouTube to Picasa. Give them another year, and G+ will become what FriendFeed would have been had it not been sold and had it continued to be developed.

All of those platforms make community-building easier than traditional carnivals. It is easier to do. It is easier for newbies to join in and get noticed. It is easier for one to individualize a degree of engagement with that community. But the easier community-building gets, the harder it is to perform the second key role of carnivals - as archives. Each edition of a carnival is a magazine, a snapshot of the moment, and a repository of pieces that both their authors (by submitting) and hosts (by accepting) thought were good and important. And when a carnival dies, and the archives' host subscription expires, all those historically important links are gone!

In place of carnivals, what people tend to like these days are linkfests done by individuals who serve as trusted filters. I started doing it myself a couple of months ago, picking perhaps a third of the links I tweet over a period of a week and organizing those links in a single blog post.

In the very first installment of my Scienceblogging Weekly, I wrote:

These one-editor carnivals seem to be the fashion of today. But old-style carnivals were, in my opinion, better both at community building and as historical archives.

Research Blogging

The second important moment was the start of a new blog, Cognitive Daily, written by Dave and Greta Munger. They pioneered the form of blogging that was later dubbed 'researchblogging' - discussing a particular scientific paper (which is referenced at the bottom), usually in a way that lay audiences can understand.

At the time, science blogging was developing its own norms. There is no such thing as a "word limit" online, so blog posts tend to be much longer than traditional news articles and do not cut out any relevant context. Bloggers instinctively understand the value of links, which forces them to research much more thoroughly than the usual daily news article requires. Blogs tend to have a more chatty and personal style, yet most science bloggers are either experts in their fields (thus no need to interview other experts just to get the quotes) or have acquired expertise by covering a topic for decades (e.g., Carl Zimmer on evolution), and thus can speak with authority.

Even today, but especially in the early days, bloggers usually did not care to cover brand new papers the moment the embargo lifts. In the early days, coverage of papers was quite rare. Apart from debunking pseudoscience, much of early blogging was more educational than journalistic - covering decades of research on a topic, or explaining the basics. If they covered a paper, bloggers were just as likely to cover an old, historical paper as a new one.

But when Dave and Greta started their blog, others took note. With the researchblogging style, not only can the blogger report on a paper, but there is also a way to embed videos, polls, animations, etc, to make the readers engage much more actively - which their readers did. In many a post they did a sort of quick-and-dirty replication of studies online, with readers as volunteer subjects.

This format of blogging rapidly took off - many bloggers started emulating it, and especially new bloggers immediately started doing this style of blogging, probably vastly outnumbering the anti-pseudoscience bloggers today. Formation of the ResearchBlogging.org site (more about it below), with its icon, code and aggregator, also made this type of blogging attractive to newcomers. Probably the best example is Ed Yong, who instantly took to the format, blogging about at least one paper per day, often covering nifty papers that the rest of the media missed. And Ed covered new papers. The moment embargo lifted. This was obviously journalism even to the most traditional eyes. This was something that other journalists, or people hoping to get into journalism, could also do. So they did. In droves.

Blog Networks

The third important moment in the history of science blogging was the start of science blogging networks. The first one was NPG's Nature Network. It was essentially an accident - the site was supposed to do something else, but ended up inviting people to write blogs instead. Unfortunately, due to technical architecture, it is not well connected to the rest of the world (for example: posts, if they show up on Google Blogsearch at all, show up with several days of delay). One had to remember to go there instead of having the links thrown in one's face wherever one may be online. Also, the initial strategy of the network was to ask researchers to blog, but very few of them took to the format very well - most of their blogs had one post and then died. Those few who did start blogging well found themselves isolated, not knowing who was reading them, or even how many were. After several years, the network has undergone some changes, the bloggers have rotated in and out with some excellent writers there now, and it appears to be more visible now than it was when it first started.

The second network (launched in January 2006), Seed Media Group's Scienceblogs.com, was what really made a difference. Here was a media organization vouching for the quality of bloggers they hired to write on their site. And they picked bloggers who already had large readership and traffic, as well as clout online, the likes of P.Z.Myers, Orac, Grrrlscientist, Tara Smith, the Mungers, Revere, David Kroll, Tim Lambert, Ed Brayton, Razib, etc. This gave the network's bloggers respectability, and the rest of the mainstream media got into a habit of checking Scienceblogs.com as their source of science news online.

A couple of other networks started relatively early in the history (Scientificblogging.org which was later renamed Science2.0, Discover, Discovery News, Psychology Today, Smithsonian...), but mainly dwelled in the shadow of Scienceblogs.com until the infamous #Pepsigate (more about that below). I wrote quite a lot about the role of networks at the time of Pepsigate, in my farewell post at Scienceblogs.com and a couple more posts immediately after.

Open Laboratory

The fourth important moment was the first edition of the Open Laboratory, an annual crowdsourced anthology of the best writing on science blogs. After five years of getting published at Lulu.com, the sixth edition is about to get published by FSG, an imprint of Scientific American at MacMillan. Here was, as early as January 2007, a collection of some amazing blog writing about science, in traditional book format, built by the community itself. It really helped the community define itself. Gaining an entry into the anthology became a big deal. The Open Laboratory was a project designed to go together with the first ScienceOnline conference, and although the publication date is now completely different from the date of the meeting, the books are still a project of the ScienceOnline organization. The conference itself added to the feeling and spirit of the community in a way that gatherings of techie, skeptical, atheist or political bloggers could never accomplish.

For many people, seeing words printed on paper still carries a certain dose of respectability. After all, the real estate of the paper is expensive. A book is a result of a large investment of time, money and effort - either bottom-up, by the author (sometimes perceived as a result of a big ego), or top-down, with an editor choosing what material is worth the investment.

Open Laboratory turned that on its head. Authors submit what they think is their best work, trusting that a jury of peers will fairly assess them, choose the best pieces, perhaps improve them a little bit (more this year than in previous years), and that the entire community will help promote the final product. Inclusion of a blog post in #openlab is not just a result of the whim of an editor, but a result of two or three rounds of judging by multiple people all of whom are also science bloggers and writers. This mutual trust matters.

Awards

Early on there were Koufaxes, later Webbies, and all sorts of other blogging awards. Some of those had awards for science blogging. But if the managers of the award allow bloggers who only pretend to be scientists and use seemingly-scientific language to push pseudoscience (e.g., global warming denialism) into the Science section of the awards, then real science bloggers react with disdain, then ignore that particular award in the future. When the award is set up essentially as a popularity contest, and when such anti-science bloggers, due to hordes of followers, win such contests, then there is no real reputation linked to that victory, thus there is no need for science bloggers to expend their energies or in any way promote such awards.

Fortunately, over the last few years, a reputable award for science blogging emerged (the fifth important moment in the evolution of science blogging), the 3 Quarks Daily Award, with three rounds: one with reader voting, one with jury voting, and a final judgement by a prominent judge who declares the final winners out of ten or so finalists. The winners get money, and proudly sport the 3QD buttons on the sidebars of their blogs.

The aftermath of #Pepsigate

The sixth important moment was #Pepsigate, when Scienceblogs.com broke up and about a quarter of the bloggers left. The time was ripe for it - there were too many science bloggers around, yet only blogs at Scienceblogs.com got any traffic or respect. That was an unstable situation. So many good bloggers were out there, writing wonderfully, but were essentially invisible under the shadow of "The Borg".

In the wake of #Pepsigate, existing networks (e.g., Discover, Nature Network) redesigned their sites and brought in some of the bloggers fleeing Scienceblogs.com. New networks sprang up almost instantly to lure in more of these blogging veterans. There were new networks started by organizations like Wired, The Guardian, PLoS, NatGeo, AGU, ACH as well as self-organized science blogging collectives like Scientopia, Field Of Science, Science3point0 and Lab Spaces. The last one to launch was the Scientific American network, which just celebrated its first anniversary last week.

Being on one of these networks became a stamp of approval for the bloggers, and we quickly built Scienceblogging.org site (which is about to undergo a thorough rebuild and redesign, also a project of ScienceOnline organization) to help people find all of the networks, collectives and key group blogs all in one place. While the inclusion there is not as stringent a process as it is on ScienceSeeker.org, this site is also a proxy for quality in some ways, as most of the blogs appearing there wear the imprimatur of traditional organizations, be it the media, publishers, or scientific societies, or the warranty by their colleagues who invited them to join their collectives. This site has, to many in the mainstream media as well as bloggers and readers, replaced scienceblogs.com as the "homepage" where they start their day.

Aggregators

I have already mentioned above that an important moment in the history of science blogging was the start, by Dave Munger, of the website ResearchBlogging.org which aggregates blog posts from science blogs but only if the posts contain the code indicating that the post is covering a paper. The code also renders the citation correctly in the post itself. As the site has editors who decide which applicants can be accepted (or rejected), this became an unofficial stamp of approval, the first method of distinguishing who is and who is not a science blogger.

A couple of years later, when PLoS started accepting bloggers onto their press list, being a member of ResearchBlogging.org was the criterion used for acceptance to the press list (I should know - as I was the one doing the approval at the time as their blog/online manager). A little later, PLoS introduced its Alt-metrics on all of their papers. One of those metrics counts the number of blog posts written about the paper. Going through Google Blogsearch and Technorati brings in all sorts of spamblogs, or people who use blogging software to post copies of press releases, instead of genuine science bloggers. Thus PLoS used ResearchBlogging.org as a filter on their papers.

As ResearchBlogging.org is owned by Seed Media Group, now controlled by NatGeo, and as there seems to be no technical support, financial support, or development of the site any more, people who are using it are advised to switch instead to the successor site, ScienceSeeker.org - another project of the ScienceOnline organization, a much better site that serves the same purpose but also does much more, has some funding (and is asking for more) and is in constant development. Dave Munger is, again, one of the key people involved in the development of this site. At ScienceSeeker.org, one can filter by discipline, or only show posts that have the ResearchBlogging.org code in them, or only show posts that ScienceSeeker editors have flagged as especially good. Both ResearchBlogging.org and ScienceSeeker.org now count (as far as I know) around 1200 blogs on their listings (with much, but not total, overlap). More blogs need to be added for the site to become a more comprehensive collection, but blogs that are on there are a pretty good snapshot of the core of the scientific blogosphere today.

Size of the science blogosphere

It is relatively easy to count science blogs in "smaller" languages, e.g., German, Italian, French, Spanish or Portuguese, with several dozen each at most. It is much more difficult to count science blogs written in English, Russian, Chinese or Japanese - those most likely count in multiples of thousands. But it is impossible to make a good estimate as it depends on one's definition.

Searching Google or Technorati brings up many blogs with a "science" tag that have nothing to do with science - or worse (spam blogs, anti-science blogs, etc). Researchblogging.org and ScienceSeeker.org are still too small to be useful for counting the total size of the blogosphere.

How does one count blogs that have not been updated in six months - on hiatus or dead? How does one count multiple blogs by the same person, perhaps not even updated simultaneously but successive editions of the blog (e.g., as the person moves from one network to another)? One blog or many? Does one count classroom blogs, at least those that are not set on 'private'? How about institutional news blogs? Are they "real blogs" or just easy software to use to push press releases? And do press releases count? We can fight over this forever, I guess, so I'd rather concede that blogs are uncountable and leave it at that.

Rising power and respect

I have written recently, much more briefly than here, about the history of science blogging and the problem of delineation of who is in and who is out. In that article I also mentioned some events that added to the respect of science blogs, e.g., the Tripoli 6 affair, the George Deutsch affair, the PRISM affair, and the #arseniclife affair (finally concluded last night!), though there have been many other cases in which science bloggers uncovered wrongdoing, or forced media to pay attention to something, or forced action on something important. Some of those cases involved clearing the record within science, others had an effect on broader society or policy.

Each one of these cases strengthened the respect for science bloggers. In some cases they did a much better job reporting than the mainstream media did. In others, they tenaciously persisted on a story until they finally forced the mass media to pick up the story and broadcast it to bigger audiences that, in turn, could effect a change (e.g., by calling their representatives in Washington). In many ways, science bloggers shocked the old system and built a new system in its place.

Increased reputation also came from cases in which bloggers solved scientific problems online, in public, for everyone to see. The most famous case is, of course, the Polymath Project, in which Tim Gowers and his readers solved an old mathematical problem in the long comment section of his blog post. The details of the project, as well as why it was so important for open science, were wonderfully described in Michael Nielsen's book Reinventing Discovery: The New Era of Networked Science.

The best such example to date is the #Arseniclife affair because it did two things simultaneously. First, the scientists with relevant expertise took to their blogs to critique, criticize and debunk the infamous paper about the uptake of arsenic instead of phosphorus by the DNA of a strange bacterium living in a Californian lake. That is not so new - bloggers criticize studies all the time, with expertise and diligence and thoroughness.

But importantly, the second thing also happened - the attempt at replication of the experiment was live-blogged by Rosie Redfield, describing in painstaking detail day-to-day lab work, getting technical feedback from the commenters, resulting in the Science paper demonstrating that the experiment could not be replicated. This was a powerful demonstration of the process of Open Notebook Science as one of the things that scientists these days can do with their blogging software.

Professionalization of science bloggers

You may have noticed a few weeks ago the so-called Lehrer affair (scroll all the way down here for several representative links). In the aftermath, Seth Mnookin used his blog to further explore the professionalization of blogs and the blurring of the lines between blogging and mainstream journalism: Part 1, Part 2, Part 3.

One of the most interesting reactions by some of the Scienceblogs.com bloggers during #Pepsigate was "we are not journalists, I am not the media". But they were. If your blog is indexed by Google News, hosted by a media company, you are the media. New media perhaps, but still media. More personal, more conversational, but still media.

The issue with Jonah Lehrer was something people called "self-plagiarism", i.e., re-using one's own old words in a new article. This is the clash between old media ("our content is exclusive!") and new media ("my blog is my writing lab where I develop my ideas over time"). Judging from all the discussions, journalists, bloggers and readers are all over the place regarding this issue. Is it OK to re-use one's old words if one is not paid? Is it OK if one is transparent (perhaps using links to old posts, or quotes - I am all for it and do it myself a lot)? Is it OK on a blog but not in an article (and how does a reader know what is what)? Is it OK to reuse one's own tweet or Facebook update (because it is not always thought of as "blogging", an attitude which I find silly), but not OK to reuse words that occurred on a Wordpress platform? What is the real difference here?

Obviously, the times are in flux. Some science bloggers would rather not be considered media, and not asked to write the way journalists write. Some prefer to use their blogs as writing labs, often repeating and reiterating ideas and words and sometimes entire passages in new contexts, with a new angle or twist, gradually adding and changing their own thinking over the years, introducing new readers to old ideas (after all, who digs through the years of archives?), with no intention of ever turning that material into commercial fare, e.g., a magazine article or a book.

If your beat is debunking anti-vaccination misinformation, how many ways can you do that if you post every day? And getting a couple of hundred dollars per month for editor-free posting on someone else's site is not really "professional writing" in a traditional sense. Writing under the banner of a well known media organization, while it confers respectability by virtue of being chosen to be there, does not automatically mean that blogging is the same as reporting news or writing professional op-eds. There is much more freedom guaranteed. More editorial control would require much more money in exchange.

On the other hand, some science bloggers see their blogs as potential marketing tools for themselves as writers. Their blogs are a different kind of a "writing lab" - a place to write more fine-tuned kinds of pieces, more 'journalistic', in hope of being seen and then getting gigs and jobs in the media. They tend to cover new papers, rather than write broader educational pieces. They try to proofread and polish their posts better. And why not? Nothing wrong with that. Just like there is nothing wrong with NOT wanting to do that either. Many scientist-bloggers really have no journalistic ambitions. Others do. Each has different goals, thus different writing styles and forms, slightly different ethics (neither one of them wrong, just different), and a different understanding of what their blogs are all about.

During those debates about professionalization of science bloggers, I sometimes heard a sentiment that bloggers with no journalistic ambitions should not confuse everyone by being on networks hosted by media organizations. As an editor of one of those networks, I beg to differ. I want all kinds of bloggers, all styles and formats, because I want to diversify our offering, I want to have something for every kind of reader - from kids to postdocs, from teachers to researchers and more. I want to blur the line between old and new media, make it so new, more Web-native forms of stories become a norm, not just the old tired inverted pyramid.

The world of media is rapidly changing and, in many ways, returning to the many-to-many communication that we are used to, the 20th century broadcast model being the only weird exception in history. Mixing and matching various styles of communication in one place, especially a highly visible place, is a good thing for science, as each piece will be interesting to a different subset of the potential audience, which will keep coming back for more, looking around, learning how to appreciate other styles as well.

I want cool science to be everywhere in the media ecosystem - from movies and television, to theater and music, to newspapers and magazines, to books and blogs and tweets. I want the science communicators to practice the new journalistic workflow which assumes, almost by definition, that a lot of what one says will be repeated over and over again in various places in various contexts. Self-plagiarism does not make sense as a concept in this model. Self-plagiarism IS the new model - that is how good ideas get pushed (as opposed to pulled) to as many audiences, in as many places, over as many years as possible.

On one hand, bloggers need to adjust. Moving from indie blogs to Scientific American put a lot of our bloggers into a phase of self-reflection. They sometimes try to write perfect posts (and sometimes need encouragement to just throw things up on their blogs even if they are not entirely perfect). But blog posts are not supposed to be, with occasional exceptions, polished, self-contained pieces. A blog post is usually one of many in that person's series of posts on the same topic, reflecting personal learning and growth over the years. Or a post on something new to the person, a way to organize one's own thoughts about a very new topic. That post is also a part of an ongoing conversation the blogger has with regular readers and commenters. That post is also part of a broader online (and sometimes also offline) conversation.

A blog post is just a ginormous tweet in a series of other ginormous tweets, usually, but an occasional polished diamond is certainly welcome as well. It is a writing lab, after all, so occasionally a perfect article may appear. But focusing on that goal is misguided - a blog is a place to think in public. And if the media host understands that, then there is no question or problem of "self-plagiarism".

On the other hand, readers also need to adjust. When they arrive at a media site, they should learn not to expect a self-contained inverted pyramid every time. Blogs have been around for fifteen years and are not so novel any more. It is easy to see if a place is a blog, if it reads like a blog, and one should know what to expect on a blog. I think that most complaints in the comments are really trolling - people who dislike what scientific research concluded complain about typos, or format, or length, in order to divert the discussion that makes them personally uncomfortable. Our bloggers have full moderation powers to deal with such comments in any way they see fit.

Saving science blogs forever

A couple of weeks ago I was at a meeting at the Library of Congress about archiving and preserving all the science that is happening online - from data to journal papers to discussions. This includes blogs and social media as well. Here is my own personal summary of what I learned there.

- Capacity. Apparently, this is not a problem. LoC has as much space as needed to save everything forever.

- Technical difficulties and link rot. Saving plain text is easy. But many formats, especially multimedia, will require some tough technical gymnastics. There are so many formats out there that it will be hard to make a repository that is easily searchable, browsable, complete, and usable. But it is not impossible.

I am a total technological Luddite - apart from HTML (and heavy use of the Web) I do not know anything about computers, code, internet and how it all works. But I know that if Dave Winer puts a lot of effort and time into a project and thinks it is important, one is wise not to ignore it. It may not work, or it may, but his track record suggests one should pay attention. After all, he picked up an abandoned old project and from it developed RSS (no, not RSS readers, the actual RSS infrastructure underneath it) - yes, the stuff that all of the Web runs on right now, how do you think you get all those articles brought to you, listed, automatically tweeted, etc.? Via RSS, of course. Dave also developed the first blogging software, promoted it, blogging took off, and now blogs are ubiquitous. Dave invented podcasting, and now it's all the rage.

So I am watching carefully what he is doing with Radio2 and River2. I still have to play with it, see if I can figure out how to do it myself, but my first impression is that RSS, a super-simple blogging platform and something like open source Twitter had a wild orgy and this is their offspring. This looks like an easy, simple and open way for anyone to put any kind of content anywhere online, to curate one's own and others' content, and to easily move stuff from one place to another. And this last piece is, I think, the key. One can move a blog post, or entire blog, from one place to another and that does not change the URL and does not break the links. If something like this takes off and everyone uses it, the problem of link rot will become very minor.

And link rot is a big problem. After #Pepsigate, many bloggers feel the freedom to move from one network to another, or on and off networks, with considerable ease and speed. What happens to the archives? A couple of weeks ago, someone at National Geographic flipped the wrong switch and years of archives from almost 100 science blogs were gone. Completely gone, even blocked from viewing at Wayback Machine and Internet Archive and Google Cache and what not. It took a dozen tweets to get the attention of some of their bloggers who contacted the relevant person who flipped the switch back on Monday morning, making all those historically very important archives accessible again. See how easy it is to erase history? Perhaps with Radio2+River2, if it is universally used, this would not be a problem. Wait and see.

- Curation. For a huge archive to be useful to users - and that's what such an archive is for - it has to be organized in a meaningful way. Should it be by topic? Or by person? By narrow area, or by a whole discipline (human genome or entire genetics)? Or by technological platform (tweets to the left, datasets to the right, blog posts straight ahead)? Or separate independent blogs from network and institutional blogs? If all of the stuff all of the science bloggers in the world have ever posted on all of their blogs is to be archived and preserved, how should that material be organized? Chronologically, minute by minute? Or in chunks akin to blog carnivals? Or sorted by topic? Should papers be connected to blog posts that discuss those papers? Should #arseniclife be its own "unit"?

Another problem is privacy. Facebook has many privacy settings. Tweets, and some blogs, occasionally switch from private to public to private - what is a repository to do with stuff whose status, private or public, is uncertain at any given time? Should the archiving be opt-in? In that case, how does one ensure that enough people opt in for the repository to be reasonably complete?

Also, many blog posts are reactions to other sites. A blog post may debunk a claim from a creationist, or anti-vax or GW-denialist blog, linking to it and quoting from it. If science blogs are preserved, but anti-science blogs are not, there will be link rot right there, preserving reactions without the context of the reactions. So perhaps all those antiscience and pseudoscience blogs should also be preserved - they may be bad science, but they are an important aspect of today's society and will be interesting to future historians. In which case, how does one label them? They are clearly not science blogs (although some of them pretend to be), so they should not be just thrown into the same bag. Which is why this delineation between "real" science blogs and other stuff has to be made.

And how will this decision be made and by whom? Should something like ScienceSeeker be used as an edited, peer-reviewed collection of respected science bloggers? If so, how does one get more bloggers to know about this and apply to it?