Update: 09-07-14: Due to the interest this post has generated, I’ve put a sign-up form at the bottom of the post for people who are interested in being part of a collaborative effort to build a PKB solution.

As a graduate student, much of my time is spent acquiring knowledge from various sources. Indeed, this is a quintessential feature of academic work, which I enjoy very much.

Since so much knowledge is now digital, there is no shortage of material from which I can learn. On the contrary, I’m usually drowning in too much information. But that’s a discussion for another day. For me, right now, the major problem is that I lack an easy and effective system for capturing and recording my learning. My memory alone will not suffice. What I need is a personal knowledgebase, which I define as an external, integrated digital repository for the things I learn and the resources from which they come.

Many have tried to solve the problem I’m encountering now, and numerous digital solutions exist. Some of the most popular options include Evernote, Devonthink, and VoodooPad. Over the course of my graduate studies, I’ve tried many of these programs, but all have fallen short of what I really need, given my own workflow.

Therefore, I’m writing this post to sketch out what my ideal personal knowledgebase would look like. I hope that, by doing so, I might get some clarity on what I really need and how I might craft my solution. And I hope that, if you, dear readers, find my design interesting and valuable, we might work together on a prototype.

Requirements of my Personal Knowledgebase

The features of my ideal PKB are the following:

Minimal Effort to Capture and Maintain

This is probably the most important feature of the entire thing. I want the process to be as seamless as possible. Input needs to be super easy. Whether it’s grabbing extracts from sources or doing manual input (i.e. typing stuff), the process needs to be fast and easy. A good example of the ease I’m talking about is nvALT, a simple plain-text note program. Working just from the keyboard, whenever a thought occurs to me, I can instantly open up a dialog box, write a few lines, then close out and return to my work. That’s how my head works, and so I need tools that work the same way. In the same vein, once information gets into my system, I want to be able to manipulate it quickly and easily, either with a graphical interface (see below) or with simple command-line functions. And, of course, I want to access and use my PKB with similar speed and ease. There is no point in having a PKB if it is so bloated that I don’t want to use it.

All in one place

Right now, different aspects of my makeshift PKB are in different locations and applications. This is a messy situation. In my ideal PKB, everything would be in one place – the resources, the extracts, the pages, etc.

Non-proprietary (open source) file formats

I don’t want to use proprietary file formats, mostly because I want my knowledgebase to be future-proof. I don’t want to rely on a particular piece of software to use my system. Moreover, by using non-proprietary, open file formats, I can access and use my PKB across all platforms. For example, I would (and already do) use plain text files for my text input, since this format can be read and accessed on any computer, and it is never going away.

Ability to accept various kinds of input (text, video, audio, etc.)

I mostly work with text, but increasingly my learning includes video and image files, especially in science. So, I want to integrate all these different source types seamlessly.

Linkability (is that even a word?)

One of the most powerful features of the internet is hyperlinking. In the same way that the web is a dense network of pages, I want my PKB to be richly connected. Wikipedia is a prime example of a knowledgebase with abundant hyperlinking.

Semi-automated input and organization

Gathering information is fairly easy. It is the extraction and organization of the information that take time. It’s true that all good things take work, but if the effort needed to create and maintain a PKB is too high, I likely won’t use it much. So, I need my ideal PKB to have functions that automate extraction and some elements of organization. That’s why I say ‘semi-automated’. I know full automation is not feasible, since imposing meaning onto the raw information is something that I must do, not the computer. But there are a lot of things the computer can do, and so I’d like to make it work for me as much as I can.

Searchable

The ability to search is one of the major advantages of a digital repository, as we all understand.

Multiple organization schemes

Organization is the difference between a masterpiece and a mess. You can have the best raw ingredients, but if you don’t impose structure on those components, their value is minimal. Meaning is about organization and relationships. Thus, I need my PKB to have multiple modes of organization. A folder (directory) structure is what we’re accustomed to, but there are other ways of relating information, such as outlines and mind maps. I’d like to move seamlessly between different ways of viewing the same information.

Web-based (plus or minus)

Where will my PKB live? It could be on my computer. I’m OK with that. But then no one else would be able to take advantage of my efforts. I’d like my knowledgebase to be accessible to other people, and the web allows that. Also, if I will be using hyperlinks frequently, I need to keep my PKB in the same place; otherwise all the links in the system will break if I ever move anything. Thus, it would be wise to start building my PKB in a place where I know I will never move it. Hosting on the web would seem like a good option, or maybe a personal server.

Accessible and operable via Graphical (GUI) and Command-Line interfaces

I want to be able to interact with my PKB either graphically (using a mouse to move stuff) or via the command line for more niche tasks. I find using a GUI to ‘drag and drop’ things ideal for imposing idiosyncratic organization (I’ll say more below).

Cross platform

We all have multiple devices now. I work from my laptop, my iPad, and sometimes my iPhone, and I want to be able to access my PKB from all of them. This is another reason a web-based PKB would be good, since all devices, Mac or PC, mobile or desktop, can access the web through a browser.



A Personal Wiki – The Closest Thing to my Ideal

Many of the features that I’ve outlined here are available through existing tools. The closest thing to my ideal PKB is a personal Wiki, built on a platform like MediaWiki or DokuWiki. The problem with these tools, though, is that they lack the organizational features I’m seeking, and they also lack strong automation. You can create a rich web of hyperlinked information, but it all has to be curated and organized by hand (lots of typing, lots of time). Wikipedia works well because the effort is crowdsourced. But I’m just one person, and I can’t spend all day pecking away to create a record of my learning; otherwise I’ll spend all my time documenting rather than actually using my PKB.

I’ve used Devonthink Pro Office (DTP) for my PKB for a while, and it is quite powerful and has many of the features I desire. Academics, particularly those in the humanities, have used DTP to good effect. DTP only got me so far, though. I could extract information, form hyperlinks, and use ‘Smart Groups’ to automate organization, but beyond sorting into those Smart Groups, I could not add any further organization, such as hierarchical outlining. It was just a jumble of knowledge nuggets that I couldn’t do anything with. Moreover, DTP is a proprietary database with its own linking scheme. If I ever decided to stop using DTP, much of my effort would be lost. I’d rather not rely on that.

My ideal PKB would be a hybrid of existing tools. I’d like the open and easy format of a Wiki, the smart automation of Devonthink, and the organizational power of OmniOutliner or Tinderbox.

I’ve seen some very interesting homebrew PKBs that also approximate the features of my ideal PKB. Prominent among these is the researchr platform created by Stian Håklev, a graduate student at the University of Toronto. Long before I was thinking about this topic, Stian was tinkering with his researchr Wiki, which aims to integrate all the different sources he reads. This prototype inspired me to pursue my own PKB solution. Several spin-offs of the researchr wiki also inspired me, such as this one from a former graduate student at Carnegie Mellon. Still, impressive as they are, these personal wikis require significant effort to create and maintain, which would be a deterrent for me. Nothing will be completely effortless, but I want to minimize the extra energy I must dedicate to capturing and documenting my learning. Thus, in its current form, researchr is not the right tool for me. But I really like the brilliant vision of what it could be.

Alternative homebrew solutions include simply using the file system on your computer with additional support from tools like nvALT and Devonthink. Jason Heppler, a history PhD student, describes this setup over at GradHacker. I like the simplicity of the file system, and now that Mac OS X incorporates tagging, systems like these make sense. But key components are still missing for me.

Workflow-Inspired Design

What I’m doing right now

So what do I actually want? To answer that question, I need to take stock of my current workflow.

The basic processes and products in my academic workflow are the following:

Input

Learning (read/watch/listen) from resources

Engaging with resources: Filtering and annotation

Capturing (extracting the best bits from resources)

Processing: Adding metadata and organizing

Using

Output

Peer-reviewed publications

Teaching materials

To help visualize this, I’ve diagrammed how I work.

Resources

My workflow begins with resources. These are the raw materials from which I learn. I gather information mostly from texts in the form of PDFs. Occasionally I’ll read a paper book, and increasingly I’m watching videos on sites like Coursera. When I’m in a meeting or other face-to-face situation, I also jot notes using my iPad.

Where do the resources reside?

Right now, I keep most of my resources in my file system. Scientific articles I keep in Papers. All other documents I keep either in Devonthink or on my desktop. Videos I mostly watch on the web.

This situation is not ideal. I’d rather have all my resources in one place. The best case would be to house the resources in my PKB. But right now, I don’t do that.

Engaging with resources

Only a fraction of the information I’m exposed to is important to me. In the case of text (PDF), I’ll indicate that something is important and worth capturing with a highlight. A certain piece of information might also spur me to add my own thoughts or comments via an annotation.

If I do nothing else at this point, my highlights and comments are locked within the PDF. This greatly diminishes the value of this information. What I need to do is free them from the PDF prison. For that, I need to capture.

Capture

In my current scheme, I capture by extracting highlights using a custom AppleScript. It takes each highlight from the PDF viewer Skim and sends it to Devonthink Pro as an individual Markdown-formatted text file.

An individual extract looks something like this. I took this highlight from a recent paper in the journal Neuron.

Each extract has:

A paraphrase

The original quoted content

A hyperlink to the highlight in the PDF

My own comments

A formatted reference

This is all the information I would need when using this extract in a downstream application like writing a publication.
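For the coders out there, here is a minimal Python sketch of what one extract might look like as a record that gets written out as a Markdown note. It is not my actual AppleScript; the field names, the Markdown layout, and the file-naming rule in `filename()` are hypothetical choices, made only to show how the five parts of an extract could travel together in one plain-text file.

```python
import re
from dataclasses import dataclass


@dataclass
class Extract:
    """One captured highlight plus what I need to reuse it later (hypothetical fields)."""
    paraphrase: str  # my one-line restatement of the point
    quote: str       # the original highlighted text
    link: str        # URL pointing back to the highlight in the PDF
    comments: str    # my own annotation; may be empty
    reference: str   # formatted citation

    def to_markdown(self) -> str:
        """Render the extract as a small Markdown note."""
        # Prefix every line of the quote so a multi-line quote stays one blockquote.
        quoted = "\n".join("> " + line for line in self.quote.splitlines())
        parts = [
            f"# {self.paraphrase}",
            quoted,
            f"[Open highlight in PDF]({self.link})",
            f"**Comments:** {self.comments or '(none)'}",
            f"**Reference:** {self.reference}",
        ]
        # Blank line between blocks, trailing newline at the end of the file.
        return "\n\n".join(parts) + "\n"

    def filename(self) -> str:
        """Derive a plain-text file name from the paraphrase."""
        slug = re.sub(r"[^a-z0-9]+", "-", self.paraphrase.lower()).strip("-")
        return f"{slug or 'extract'}.md"


if __name__ == "__main__":
    ex = Extract(
        paraphrase="Example paraphrase of a finding",
        quote="Original highlighted sentence.",
        link="file:///path/to/paper.pdf",
        comments="",
        reference="Author (Year). Title. Journal.",
    )
    print(ex.filename())  # example-paraphrase-of-a-finding.md
    print(ex.to_markdown())
```

Because the note is ordinary Markdown, any text editor can open it, which fits the non-proprietary requirement above.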

Processing

Once the extracts are inside my Devonthink database, I can add other relevant information, such as tags.

Also, using the Smart Group function, I can replicate a single extract to multiple folders. So, for example, the above extract might be placed in Smart Groups called “Wnt”, “TRPV1”, and “DRG Neurons” based on its specific internal contents, or in higher-level groups like “Pain” or “Neurobiology” based on tags. This sorting based on pre-determined criteria is what I mean by ‘Semi-Automation’. Using Smart Groups and tagging, I don’t have to duplicate information in multiple locations. The computer does that for me. This is a huge time saver, and in my ideal system, I want to keep these features.
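To make ‘semi-automation’ a bit more concrete, here is a rough Python sketch of Smart Group-style sorting. Each group is a rule over an extract’s text and tags, and groups store extract ids rather than copies, so one extract can belong to many groups. This only imitates the behaviour I described; it is not how Devonthink implements Smart Groups, and all class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Extract:
    id: str
    text: str
    tags: set[str] = field(default_factory=set)


@dataclass
class SmartGroup:
    """A group defined by criteria rather than by hand-placed members (hypothetical model)."""
    name: str
    any_terms: set[str] = field(default_factory=set)  # match if the text contains any term
    any_tags: set[str] = field(default_factory=set)   # ...or if the extract has any of these tags

    def matches(self, ex: Extract) -> bool:
        text = ex.text.lower()
        return any(term.lower() in text for term in self.any_terms) or bool(self.any_tags & ex.tags)


def sort_into_groups(extracts: list[Extract], groups: list[SmartGroup]) -> dict[str, list[str]]:
    """Map each group name to the ids of the extracts that match it.

    Groups hold ids, not copies, so one extract can sit in many groups
    without its text being duplicated.
    """
    membership: dict[str, list[str]] = {g.name: [] for g in groups}
    for ex in extracts:
        for g in groups:
            if g.matches(ex):
                membership[g.name].append(ex.id)
    return membership


if __name__ == "__main__":
    groups = [
        SmartGroup("Wnt", any_terms={"wnt"}),
        SmartGroup("TRPV1", any_terms={"trpv1"}),
        SmartGroup("Pain", any_tags={"pain"}),
    ]
    extracts = [
        Extract("x1", "Example sentence mentioning Wnt and TRPV1.", {"pain"}),
        Extract("x2", "An unrelated note."),
    ]
    print(sort_into_groups(extracts, groups))
    # {'Wnt': ['x1'], 'TRPV1': ['x1'], 'Pain': ['x1']}
```

The rules here are deliberately simple (any term or any tag); a real system would probably want AND/OR combinations and saved searches.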

But this is where my processing stops. Beyond grouping text files in folders, I can’t impose any more structure. What I’m left with is just a bunch of text files whose relationships to each other I cannot express, given the constraints of Devonthink.

So I’m at an impasse.

Future vision: What I want to do, but can’t do with my current tools

I’ll now describe what I would like to do with my ideal PKB.

My workflow would largely be the same all the way to the point of extraction.

Instead of using Devonthink, imagine that my PKB has the structure of a Wiki, composed of interlinked pages.

Highlights from a PDF source would be sent to a kind of holding place, a ‘global inbox’ (to steal a term from Devonthink), where they await my attention. Here, I will apply additional processing such as adding comments and tags.

Then, based on tags and content, the extracts would be distributed to myriad ‘topic pages’ according to pre-determined criteria. The distribution mechanism I’m thinking about is pretty much what a Smart Group does now. Extracts that meet a page’s search criteria will be sent to all the appropriate topic pages. Importantly, a single extract can be represented on more than one topic page automatically. With a conventional wiki, since all the content is handwritten, information must be replicated across several topic pages if it is relevant to more than one. This is tedious, both on the input side and during updating, since you have to go to every page where some piece of information is represented and amend it one by one. I want to avoid this.

Using the same extract as above, this is what it would look like inside a topic page entitled “Wnt Signaling”. Each topic page has its own inbox, which is where extracts sit until I place them into the body of the topic page.

To move an extract into the body, I simply drag and drop with my mouse. Applications such as OmniOutliner (which I used to make this mockup) already have this kind of functionality. On the web, Workflowy is an exemplar of a sophisticated outliner that allows for drag and drop of content.

Then, once the extract is in place, I can choose to expand it to see the additional information that I care about. Since I don’t need to see that information all the time, I can collapse it and see just the paraphrase, which is usually all I’m interested in. The metadata are only really relevant when it comes time to use this topic page for writing a publication.

For me, extracts are movable parts, or, as Daniel Wessel at Organizing Creativity calls them, Lego Blocks. Any extract can be moved within a topic page, and also between pages. At the end of the day, there is just one extract, but it is replicated among the numerous topic pages where it is relevant. Any change I make will be propagated throughout my PKB, unless otherwise specified (there are times when you don’t want to propagate a change).
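The ‘one extract, many pages’ idea boils down to storing each extract once and letting topic pages hold references to it. Here is a minimal Python sketch, assuming a hypothetical in-memory `KnowledgeBase` class: editing the shared extract changes every page that references it, while `fork()` covers the case where I don’t want a change to propagate. A real system would persist this to files or a database.

```python
import copy
from dataclasses import dataclass


@dataclass
class Extract:
    paraphrase: str
    quote: str


class KnowledgeBase:
    """Stores each extract once; topic pages hold only extract ids (hypothetical design)."""

    def __init__(self) -> None:
        self.extracts: dict[str, Extract] = {}
        self.pages: dict[str, list[str]] = {}
        self._counter = 0

    def add_extract(self, ex: Extract) -> str:
        self._counter += 1
        ex_id = f"x{self._counter}"
        self.extracts[ex_id] = ex
        return ex_id

    def place(self, page: str, ex_id: str) -> None:
        """Put a reference to the extract on a topic page (no copy is made)."""
        self.pages.setdefault(page, []).append(ex_id)

    def render(self, page: str) -> list[str]:
        return [self.extracts[i].paraphrase for i in self.pages.get(page, [])]

    def edit(self, ex_id: str, paraphrase: str) -> None:
        """Change the single stored extract; every page referencing it sees the change."""
        self.extracts[ex_id].paraphrase = paraphrase

    def fork(self, page: str, ex_id: str) -> str:
        """Give one page its own copy, so later edits to that copy stay local."""
        new_id = self.add_extract(copy.copy(self.extracts[ex_id]))
        ids = self.pages[page]
        ids[ids.index(ex_id)] = new_id
        return new_id


if __name__ == "__main__":
    kb = KnowledgeBase()
    x = kb.add_extract(Extract("First paraphrase", "Original quote."))
    kb.place("Wnt Signaling", x)
    kb.place("Pain", x)
    kb.edit(x, "Revised paraphrase")
    print(kb.render("Wnt Signaling"), kb.render("Pain"))
    # ['Revised paraphrase'] ['Revised paraphrase']
```

Storing references instead of copies is what removes the update chore of a conventional wiki: there is only one place to amend.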

An additional way to move extracts into particular locations on a topic page would be to give subheadings or subsections their own ‘smart folder’ search criteria. So, I envision that a subheading would have its own criteria based on content and tag information, and new extracts would automatically move under that subheading without my having to move them manually from the topic page’s inbox. If I knew that very specific kinds of information would go into certain places in my knowledgebase, this method would be incredibly useful. In this way, I could build dynamic ‘smart outlines’ that are updated as new information comes into my knowledgebase. Konrad Lawson at the blog Muninn describes a similar idea for a smart outline, which inspired this part of my design. This kind of page would be incredibly useful for writing publications, since as I’m reading and capturing, I could also be sending knowledge nuggets to one or many topic pages that would support my later writing.
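A smart outline could be sketched the same way: each subheading on a topic page carries its own criteria, and a new extract is filed under the first subheading it matches, or else lands in the page’s inbox to wait for me. The `TopicPage` class, the first-match rule, and the `has_tag`/`mentions` helpers are all hypothetical; a real system might instead let an extract appear under several subheadings.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Extract:
    text: str
    tags: set[str] = field(default_factory=set)


# A rule decides whether an extract belongs under a given subheading.
Rule = Callable[[Extract], bool]


def has_tag(tag: str) -> Rule:
    return lambda ex: tag in ex.tags


def mentions(term: str) -> Rule:
    return lambda ex: term.lower() in ex.text.lower()


@dataclass
class TopicPage:
    """A topic page whose subheadings carry their own search criteria (hypothetical)."""
    title: str
    sections: dict[str, Rule]  # subheading -> rule, checked in insertion order
    body: dict[str, list[Extract]] = field(default_factory=dict)
    inbox: list[Extract] = field(default_factory=list)

    def receive(self, ex: Extract) -> str:
        """File a new extract under the first matching subheading, else the page inbox."""
        for heading, rule in self.sections.items():
            if rule(ex):
                self.body.setdefault(heading, []).append(ex)
                return heading
        self.inbox.append(ex)
        return "inbox"


if __name__ == "__main__":
    page = TopicPage("Wnt Signaling", {"Receptors": mentions("receptor"), "Methods": has_tag("method")})
    print(page.receive(Extract("A note about receptor binding.")))  # Receptors
    print(page.receive(Extract("Protocol details.", {"method"})))   # Methods
    print(page.receive(Extract("Something else.")))                 # inbox
```

Anything the rules don’t catch still ends up in the inbox, so nothing is lost; it just waits for manual drag and drop.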

Another interesting feature would be the ability to hover over an extract and see all the other pages where that extract resides. I could also imagine a feature that visualizes these lateral relationships as a network graph. Getting a big-picture view of all the connections would support creativity and idea generation.
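Both the hover feature and the network graph could come from the same data. Invert the page-to-extract mapping to get, for each extract, the set of pages where it resides; then count shared extracts between each pair of pages to get weighted graph edges. A minimal Python sketch, assuming pages are stored as a hypothetical dict of page title to a list of extract ids:

```python
from collections import defaultdict
from itertools import combinations


def pages_by_extract(pages: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert {page: [extract ids]} into {extract id: {pages}} for a hover lookup."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for page, ids in pages.items():
        for ex_id in ids:
            index[ex_id].add(page)
    return dict(index)


def page_graph(pages: dict[str, list[str]]) -> dict[tuple[str, str], int]:
    """Weighted edges between pages that share extracts: {(page_a, page_b): shared count}."""
    edges: defaultdict[tuple[str, str], int] = defaultdict(int)
    for where in pages_by_extract(pages).values():
        # Sort so each unordered pair of pages gets one consistent key.
        for a, b in combinations(sorted(where), 2):
            edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    pages = {
        "Wnt Signaling": ["x1", "x2"],
        "Pain": ["x1"],
        "DRG Neurons": ["x1", "x2"],
    }
    print(sorted(pages_by_extract(pages)["x1"]))  # ['DRG Neurons', 'Pain', 'Wnt Signaling']
    print(page_graph(pages))
```

The edge weights could drive line thickness in whatever graph-drawing tool ends up rendering the big-picture view.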

The Payoff – How Would I Use my PKB?

So what do I get for my fiddling? Why do I even want this elaborate structure of information and how would it benefit me?

My number one use case would be when I want to write a research article or review and ask myself, “What do I already know about X?” Right now, I rely on my memory to direct my efforts. I can search my Papers database, but that usually just takes me to whole articles rather than to specific content within them. I don’t want to re-read an article; I want to go right to the meat. Without a good PKB, those of us in academic work duplicate our efforts and waste time. Moreover, with movable extracts, I could drag and drop the best parts into an outline for the paper, and this would support my writing. There is no need to type everything out again if I’ve already invested that effort. Walton Jones describes how he uses his own custom PKB to support his learning and his writing of academic papers. My system would perform similarly for me.

Another important use case would be to use my PKB pages to support or template some kind of activity I want to do. For example, I’ve recently been learning to use R to do statistical analysis of some biological data. Learning by doing is my favorite way to learn. As I’m going through the rigors of working up my data, I’m learning many useful tips and procedures, ones that I’m sure I’ll have use for in the future. I glean these nuggets predominantly from PDF books, websites and videos. When I return to the same kind of work in the future, I’m going to want to quickly locate the right resources and go directly to the relevant parts, without having to wade through a whole book or website. Having an ‘R Analysis’ topic page with all my extracts, insights and comments would be incredibly useful. I’ve informally been keeping a kind of topic page in nvALT with some links or notes to myself about where the good info lies (e.g. see ‘Learning R page 14’), but this is not really scalable or sustainable. I need something more robust and systematic, such as the PKB I’ve described.

I would also like to use my PKB to share information with other people. Using my R example again, I know I’m not the only person in the world who would like a crib sheet full of R tips from around the web and from books. I’d love to send my PKB pages to colleagues who might also benefit. I’ve experienced this firsthand with the Learnstream wiki: I came across an informative page on concepts, and it helped me write a paper I was working on. I’d love it if other people could benefit from my research efforts in the same way.

Conclusion

So that’s my ideal system: a kind of hybrid of a Wiki, Devonthink and OmniOutliner. Given that existing tools already perform some of the functions I seek, I know what I’m proposing is in the realm of possibility. It’s now a matter of bringing it all together. The workflow I’ve described is my own, but I imagine that many other academics work in a similar manner and could benefit from such a tool.

I’ve described my ideal PKB from the user’s perspective, since I don’t have the coding knowledge to conceive of what the backend would look like. For the coders out there, what would be necessary to build something like this? Is it feasible to adapt existing Wikis and add an OmniOutliner-esque interface? Is it reasonable to want to store all my resources (PDFs, images, videos, etc.) on the web? Maybe with my own server? I’m revealing my ignorance here, but I’d love to know how I could craft the system I’ve described, and I’m not sure where to start.

Anyway, that’s what I’ve got. As always, I’d love to hear comments, criticism, feedback, etc.

For those who are curious to try out existing tools, I’ve compiled some resources below.

Resources

Off-the-Shelf Tools

Homebrew Solutions

Random Articles and Blogs on Capturing Knowledge

Devonthink for Historians

How to Start Using Devonthink – This site has a lot of insights on academic work and how to be systematic and organized in that pursuit.

Hackademic – Tools: A great general resource on crafting an academic workflow to capture knowledge. The author of this blog knows his stuff.

Want to help build an awesome Personal Knowledgebase?

Sign up here