



A simple n-gram comparison between philosophical texts and scientific texts. The philosophical texts were scraped from the Notre Dame Reviews, which publishes reviews of well-known philosophical books (4813 reviews in total, from 2002 to 2018). The scientific texts are abstracts of articles on atmospheric/climate science scraped from the American Meteorological Society journals; the 44886 abstracts span the period from 1944 to 2018.





N-grams are sequences of n adjacent words in a text (e.g., "I am the" is a 3-gram and "my own" is a 2-gram). I considered only 1- and 4-grams in this comparison; the 2- and 3-gram comparisons were not very interesting.
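Here is a minimal sketch of this definition. The function slides a window of n words over a text. It assumes a plain lowercase whitespace split, which is simpler than scikit-learn's default tokenizer: that tokenizer also drops punctuation and single-character tokens such as "I". The name `ngrams` is a hypothetical helper, not a library function.

```python
def ngrams(text, n):
    """Return the n-grams of `text` as space-joined strings.

    Assumption: tokens come from a lowercase whitespace split, which is
    a simplification of what scikit-learn's vectorizers do.
    """
    tokens = text.lower().split()
    # Slide a window of width n over the tokens; a text shorter than
    # n words yields no n-grams at all.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams("I am the one on my own", 3))
print(ngrams("I am the one on my own", 2))
```

The first call starts with "i am the" and the second ends with "my own", matching the examples above.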

I scraped these texts with another project in mind (my next post, on finding similar texts using machine learning). As a result, the philosophical texts cover a variety of subjects, while the scientific texts come from a single field, which makes this a bit of an apples-to-oranges comparison. Still, as I worked on the texts I noticed a few interesting differences even in this non-ideal setting, so I decided to share them. Abstracts from journals that cover a wide range of fields, such as Science or Nature, would make for a better comparison, though I am sure scraping the content of those high-profile journal websites wouldn't be trivial (I already got a temporary ban from the American Meteorological Society website, and I think I made them change their server settings to disallow frequent connections).





The web scraping was done with the Python BeautifulSoup library. The n-grams were extracted with the text vectorizers in Python's scikit-learn package. Each n-gram was weighted by its TF-IDF score (term frequency-inverse document frequency: how often an n-gram occurs in a document, down-weighted if it occurs in many documents), also computed with scikit-learn.
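To make the weighting concrete, here is a from-scratch sketch of the TF-IDF score computed by scikit-learn's `TfidfVectorizer` with its documented defaults: raw counts, smoothed idf of ln((1 + N) / (1 + df)) + 1, and L2-normalized rows. The whitespace tokenizer and the function names are assumptions for illustration. This is not the code used for the plots. With the library itself, 4-gram scores would come from something like `TfidfVectorizer(ngram_range=(4, 4)).fit_transform(docs)`.

```python
import math
from collections import Counter


def ngrams(text, n):
    # Assumed tokenizer: lowercase whitespace split (simplified).
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Return one {n-gram: TF-IDF score} dict per document.

    Mirrors TfidfVectorizer's default formula (smooth_idf=True,
    norm="l2", raw term counts), but with the simplified tokenizer above.
    """
    counts = [Counter(ngrams(d, n)) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents containing each n-gram.
    df = Counter(g for c in counts for g in c)
    # Smoothed inverse document frequency.
    idf = {g: math.log((1 + n_docs) / (1 + k)) + 1 for g, k in df.items()}
    scores = []
    for c in counts:
        row = {g: tf * idf[g] for g, tf in c.items()}
        # L2-normalize each document's vector; empty documents stay empty.
        norm = math.sqrt(sum(v * v for v in row.values())) or 1.0
        scores.append({g: v / norm for g, v in row.items()})
    return scores


print(tfidf(["the good life", "the good"], 1))
```

In this toy example, "life" appears in only one document, so it gets a higher idf and outweighs "the" and "good" in that document.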










1-grams





Gender bias in philosophy? That's what I heard. We don't see it in the scientific texts because journal articles are not written in a way that reveals the authors' gender (they are usually written in the passive voice). It would be interesting to identify the gender of the authors of the scientific articles and see what their texts look like.

The most common words in the scientific texts include more field-specific terms.

Philosophical texts give the message: "life is good" (selective misperception).

4-grams

4-grams say more about differences in articulation and command of language. The philosophical texts use more sophisticated language, which is harder to understand. (Another thing I had heard before: bad writing is promoted in the field of philosophy.)

I think the 4-grams paint the picture: science is evidence-based and philosophy is mental.

Three of the four most common 4-grams of philosophical texts:

on the one hand
on the other hand
at the same time

... philosophers just love ambiguity, don't they?
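The post does not say how the per-document scores were turned into a single "most common" ranking. Summing each n-gram's TF-IDF score over all documents is one common choice, and the sketch below assumes it. The input format (one `{n-gram: score}` dict per document) and the toy scores are hypothetical, not values from the corpus.

```python
from collections import defaultdict


def top_ngrams(doc_scores, k):
    """Rank n-grams by their TF-IDF score summed over all documents.

    Assumption: `doc_scores` holds one {n-gram: score} dict per document,
    and summation is the aggregation used (one option among several).
    """
    totals = defaultdict(float)
    for row in doc_scores:
        for gram, score in row.items():
            totals[gram] += score
    # Highest total first; ties broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy scores for illustration only.
scores = [
    {"on the one hand": 0.6, "at the same time": 0.8},
    {"on the other hand": 0.7, "on the one hand": 0.7},
]
print([gram for gram, _ in top_ngrams(scores, 2)])
```

Averaging instead of summing would favor n-grams that score highly in a few documents over those that appear modestly in many.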

One thing that jumped out at me from the scientific texts was how beloved the El Niño phenomenon has been over the last 5-6 decades. The northern hemisphere, however, has been a more popular subject than the southern, possibly because of the greater land area up north and its effects.

When I combine the 1- and 4-grams of philosophy, the message changes: "that there is no such thing as good life".



