I’m going to get on a soapbox, here. I hope that this elicits some debate/discussion/verbal thinking aloud by others.

I was recently forwarded a posting from LinkedIn,[1] which, like many, discusses the coming wave of demand for data scientists. The amount of data generated is increasing exponentially. The person who forwarded me the link wondered why archaeology is rarely included in the list of ‘big data’ fields – fields which have developed capacities to organize, manage, mine, and analyze large sets of information to extract meaning and insights about questions of relevance for a myriad of societal needs. I wonder that as well. I think it may have to do with some misconceptions by those outside the field and missed opportunities on the part of the humanities and social sciences.

To break down the post’s argument, data is increasing, and we need more data scientists – people who understand the power of large datasets and can derive meaning from them. We therefore need more STEM (Science, Technology, Engineering, Math). There are more articles, blogposts, and reports using this argument than you or I can shake a stick at. It is a common rallying cry – more STEM. We hear this cry from the White House, Congress, state houses, and the business world. More STEM.

I would disagree with the perceived solution. Don’t get me wrong – I have nothing against STEM. I love STEM. Some of my best friends are involved with STEM. I’ve published in STEM. The number of advances in STEM research is incredible, with new discoveries daily. The voice of Carl Sagan has been joined by Bill Nye and Neil deGrasse Tyson, persuasively arguing for a love and appreciation for STEM topics. My own work in STEM has opened up incredible avenues for research and discovery. STEM is awesome.

Understanding the complexities of data bits and being able to pull and synthesize and analyze information are important skills to have. We need to develop more people with them. However, where does one generate the questions that need the big data – the questions that are not answerable by the sums of ones and zeros, but are soft and malleable – the questions that involve the complex, nuanced, and multivariate elements which constitute our integrated, globally-connected world? Where does one go to understand context, predict implications and impacts?

The social sciences and humanities. As much as there is a need for the skills in data management and manipulation, there is a need for understanding what the data means.[2] As argued by the chief analytics officer at SAS Australia:

The software can do the crunching of information but the talent to interpret lies with the data scientists who know how to apply that number-crunching capability.[3]

Consistently, it appears that those from within the world of data science see the need for skills in application, creativity, and synthesis. I DO NOT mean to imply that critical thinking, synthesis, and application are lacking or not emphasized in STEM subjects. This clearly happens. I DO NOT mean to imply that there aren’t real-world questions and answers emanating from these areas. What I do see, typically, is that proponents of STEM often miss that these skills are developed equally – and for some situations most effectively – in the social sciences and humanities. Within society, there seems to be a popular sense that combining English and math is somehow an ill fit.

Bollocks.

There are a myriad of examples of the ‘digital humanities.’ The NEH has had a DH division for some time now, and the number of conferences, symposia, books, and articles is quite large and growing. Within archaeology, one can point to a number of blogs[4], research centers/clusters[5], and data repositories[6] – the ones referenced are just examples that immediately pop into my head. There are others (and, dear reader, if you wish to share your favorites, please feel free). There is clearly a substantial and growing number of people involved in this type of research.

So, why doesn’t this permeate through to a popular understanding of humanities and social sciences as being a part of ‘big data’?

Simply put (which should always indicate that what is to follow is not at all simple), some of us missed the boat, and it’s easier to create new data than transfer old data.

As humans, we’ve been recording information for millennia: on caves, tablets, papyri, books, and now as 1s and 0s. Just as it’s a pain to move from vinyl to tapes to CDs to MP3s to digital streaming, it takes energy to move information from previous versions to the new. This takes time, but we are well down this road – Project Gutenberg and Perseus come easily to mind. Part of the issue, then, is that much of our data hasn’t been in digital format, so there’s been no need to go there.

Oftentimes, these are projects that involve humanists who have picked up skills in data science on the side, or in some cases hybrid individuals who have taken the plunge and gained formal training in humanistic and computational fields. Formalized programs or degrees in DH are appearing, but I get the sense that these are seen by some as ‘soft’ on the humanities – somehow not true to the traditional rigor of humanistic inquiry and put in place to ‘serve’ those with real questions.

What I would hope to see is a shift in perception – inside and outside of academia. Data is data – whether stored on a tablet, scroll, book, or in a table. What I see as happening is that when the digital revolution hit, we didn’t see it as the new means of information storage as we should have. We didn’t retool, and we certainly didn’t alter our training of students to incorporate this new system. Instead, we wagged our heads and dismissed these approaches as incompatible and ill-fitting to our pursuits. The proponents of technology didn’t assist by seeing the easy association of “T” with “S,” “E,” and “M.” Today, our information is in analog and digital form. Understanding how the digital is organized not only unlocks interesting approaches in the humanistic and social sciences, but increases the capability to combine these elements with information coming out of STEM.

And that is where some awesomely insane things can happen.

I would like to see a future that consists of informatically trained humanists and humanistically trained informaticists. In a perfect world, informatics should be driven across the curriculum. Information is used by all aspects of society. Increasingly, information is digital. Acquiring, organizing, and analyzing this information, therefore, is a need for anyone – it is a skill that should not be relegated to one sector of society. We have a term for a period when information and the keys to it are restricted to a confined group of people – ‘Dark Age.’

This is a call for greater engagement. A call for an end to the notion of ‘digital humanities’ in favor of a construct where ‘digital’ is a given. A construct where it is expected that anyone coming out of an institution of higher learning can be assumed to have literacy in data mining and processing. Computational skills should be released from the view that they are the domain of specialists. Those skills should be seen as basic to any form of inquiry. Certainly, there’s room for programs in Computer and Data Science, but all should have an understanding of how these tools come to bear on their own areas of interest.

This will require those of us in the universities to think about what and how we teach: what are the principal building blocks of research skills required for furthering the progress of inquiry, and what are the elements that have resonance and meaning for the pursuit of knowledge but also transfer to other careers and opportunities? Answers/suggestions to these questions are welcome.

Until the revolution happens, I would note that archaeology is a place within the social sciences and humanities where the nature of the work deals with ‘big data’ – much of it fragmentary, drawn from a variety of sources across a wide array of disciplines, and rarely in the same format or scale. This exploration is not a unique occurrence – archaeology routinely requires generating and engaging with these wide-ranging data types. Archaeology develops questions that require people to collect, organize, process, and synthesize data to develop models and interpretations about complex natural and human interactions – questions that regularly cut across the disciplinary boundaries of the humanistic, social, natural, mathematical, and computational sciences. It is a mode of inquiry that routinely requires practitioners to retool their skill sets and toolboxes based upon the question at hand.

That sounds like an exciting application of data science.