It is now common for researchers to post original materials, data, and/or code behind their published research. That’s obviously great, but open research is often difficult to find and understand.

In this post I discuss 8 things I do in my papers, code, and data files to address that.

Paper

1) Before the first methods section, I include a paragraph summarizing the open-research practices behind the paper. Like this:

2) Just before the end of the paper, I include the supplement’s table of contents. The accompanying text reads something like: “An online supplement is available; Table 1 summarizes its contents.”

3) In table and figure captions, I include links to the code that reproduces them.

Code

4) I start my code by indicating authorship, date of last update, and contact info.

5) I then provide an outline of its structure.

Like this:

Then, throughout the code, I use those same numbers so people can navigate it easily [ ].
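A minimal sketch of items 4 and 5, written in Python for illustration: a header with authorship, last update, and contact info, then an outline, then section numbers in the code that match the outline. The bracketed header fields are placeholders, and the tiny inlined dataset, section names, and function names are hypothetical.

```python
"""
Reproduces the analyses in: [paper title]    <- placeholder
Written by:    [author name]                 <- placeholder
Last updated:  [date]                        <- placeholder
Contact:       [email]                       <- placeholder

OUTLINE
  1. Load data
  2. Clean data
  3. Table 1: means by condition
"""

import csv
import io
from statistics import mean

# Hypothetical raw data, inlined so the sketch runs without any files
RAW_CSV = """id,condition,score
1,control,4
2,treatment,6
3,control,5
4,treatment,
5,treatment,8
"""

# ---------------------------------------------------------------
# 1. Load data
# ---------------------------------------------------------------
def load_data(text):
    # Parse CSV text into a list of dicts, one per participant
    return list(csv.DictReader(io.StringIO(text)))

# ---------------------------------------------------------------
# 2. Clean data
# ---------------------------------------------------------------
def clean_data(rows):
    # Drop participants with a missing score; store score as an int
    return [
        {**row, "score": int(row["score"])}
        for row in rows
        if row["score"].strip() != ""
    ]

# ---------------------------------------------------------------
# 3. Table 1: means by condition
# ---------------------------------------------------------------
def table1(rows):
    # Mean score in each condition, keyed by condition name
    conditions = sorted({row["condition"] for row in rows})
    return {
        c: mean(row["score"] for row in rows if row["condition"] == c)
        for c in conditions
    }

if __name__ == "__main__":
    print(table1(clean_data(load_data(RAW_CSV))))
```

A reader who wants Table 1 can jump from the outline straight to the block labeled "3." without reading sections 1 and 2 first.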

6) Rule of thumb: at least one comment for every 3 lines of code.

Even if something is easy to figure out, a comment will make reading code more efficient and less aversive. But most things are not so easy to figure out. Moreover, nobody understands your code as well as you do when you are writing it, including yourself 72 hours later.

When writing comments, it is useful to keep in mind who may actually read the code (see footnote for a longer discussion [ ]).
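To check a script against this rule of thumb, here is a minimal Python sketch. It uses the standard-library `tokenize` module to count lines that contain code and lines that contain a `#` comment. The function names `lines_per_comment` and `meets_rule_of_thumb` are hypothetical, and counting docstrings as code rather than as comments is an assumption of this sketch.

```python
import io
import tokenize

# Token types that are layout, not code
LAYOUT = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
          tokenize.DEDENT, tokenize.ENDMARKER}

def lines_per_comment(source):
    """Lines with code divided by lines with a '#' comment (lower is better)."""
    code_lines, comment_lines = set(), set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])   # line number of the comment
        elif tok.type not in LAYOUT:
            code_lines.add(tok.start[0])      # a multi-line string counts once
    if not comment_lines:
        return float("inf")                   # no comments at all
    return len(code_lines) / len(comment_lines)

def meets_rule_of_thumb(source, max_lines=3):
    # At least one comment for every `max_lines` lines of code
    return lines_per_comment(source) <= max_lines

GOOD = """\
# Read the scores
scores = [4, 5, 6]
# Average them
avg = sum(scores) / len(scores)
print(avg)  # show the result
"""

BAD = """\
# Average the scores
scores = [4, 5, 6]
total = sum(scores)
n = len(scores)
avg = total / n
"""

if __name__ == "__main__":
    print(lines_per_comment(GOOD), meets_rule_of_thumb(GOOD))  # 1.0 True
    print(lines_per_comment(BAD), meets_rule_of_thumb(BAD))    # 4.0 False
```

An inline comment, as on the last line of `GOOD`, counts toward both totals, since that line holds code and a comment.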

Data

7) Codebook (very important). It is best to have a simple stand-alone text file that looks like this: each variable name followed by a description that includes its possible values and relevant collection details.
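A plain-text codebook in this format is also easy to check by machine. The Python sketch below assumes one line per variable, with the name first and the description after the first run of whitespace; the variables `id`, `condition`, and `score` and the helper names are hypothetical. `undocumented_columns` lists any column in a data file that the codebook does not describe.

```python
import csv
import io

# Hypothetical codebook: variable name, then possible values and collection details
CODEBOOK_TXT = """\
id         participant identifier, integer starting at 1
condition  experimental condition: 'control' or 'treatment' (random assignment)
score      self-reported liking, 1-10 scale; blank if the item was skipped
"""

def parse_codebook(text):
    # Map each variable name to its description, skipping blank lines
    codebook = {}
    for line in text.splitlines():
        if line.strip():
            name, description = line.split(maxsplit=1)
            codebook[name] = description
    return codebook

def undocumented_columns(data_csv, codebook):
    # Columns in the data's header row that have no codebook entry
    header = next(csv.reader(io.StringIO(data_csv)))
    return [col for col in header if col not in codebook]

if __name__ == "__main__":
    cb = parse_codebook(CODEBOOK_TXT)
    print(undocumented_columns("id,condition,score,age\n1,control,4,30\n", cb))
```

Running such a check before posting catches variables that were added to the data after the codebook was written.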

8) I post the rawest form of the data that I am able (and allowed) to. All data cleaning is then done in code that is posted as well. When cleaning is extensive, I post both the raw and the cleaned data files.
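A minimal Python sketch of this workflow, assuming the raw data are a CSV file: the raw file is only ever read, every cleaning step lives in code, and the cleaned data are written to a separate file so both versions can be posted. The file names, the two cleaning rules, and the function names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

def clean_rows(rows):
    # Hypothetical cleaning rules, for illustration only
    cleaned = []
    for row in rows:
        # Rule 1: exclude participants with a missing score
        if row["score"].strip() == "":
            continue
        # Rule 2: standardize condition labels (" Control " -> "control")
        cleaned.append({**row, "condition": row["condition"].strip().lower()})
    return cleaned

def clean_file(raw_path, clean_path):
    # Read the raw file; it is never modified
    with open(raw_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    kept = clean_rows(rows)
    # Write the cleaned data to a separate file, so both can be posted
    with open(clean_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(rows) - len(kept)  # number of excluded rows, worth reporting

def run_example():
    # Round trip on a tiny hypothetical raw file in a temporary folder
    raw_text = "id,condition,score\n1, Control ,4\n2,treatment,\n3,TREATMENT,6\n"
    with tempfile.TemporaryDirectory() as folder:
        raw = Path(folder) / "study1_raw.csv"
        cleaned = Path(folder) / "study1_clean.csv"
        raw.write_text(raw_text)
        dropped = clean_file(raw, cleaned)
        return dropped, raw.read_text() == raw_text, cleaned.read_text()

if __name__ == "__main__":
    print(run_example())
```

Because the raw file is never overwritten, anyone can rerun `clean_file` and confirm that the posted cleaned file follows from the posted raw one.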

Note: writing this post helped me realize I don’t always do all 8 in every paper. I will try to, going forward.

In sum:

1. In paper: open-research statement

2. In paper: supplement’s table of contents

3. In table and figure captions: links to code that reproduces them

4. In code: contact info and description

5. In code: outline of program below

6. In code: at least one comment for every three lines

7. Data: post codebook (text file, variable name, description)

8. Data: post (also) the rawest version of the data possible