Exploring Nursing Ghost Stories through Machine Learning: Evidence of Haunted Hospitals or End-of-Life Experiences?




As reported in earlier posts, the Allnurses.com web site hosts a long-running moderated discussion thread called “Nursing Ghost Stories” (NGS). The NGS collection spans over a decade (2005-2017), amounting to 199 pages at the time of this writing. As a dataset, NGS contains multiple first- and second-hand accounts of, and commentary on, paranormal-type experiences.

The archive contains some classic examples of ghosts and hauntings phenomena. Patients were generally the percipients in ghost experiences. Sometimes the ghosts in question appeared to be former nurses in period dress, or former doctors and patients, or former area residents.

Surprisingly, however, these kinds of ghost encounters did not dominate the collection. An earlier wordcloud analysis had revealed that the word “haunted” and its variants seldom appeared anywhere in the archive.

In actuality, the NGS archive conveys several varieties of psi and post-mortem survival phenomena. The archive contains several examples of extrasensory perception and presentiment in particular. The archive also contains some reports of near-death experiences (NDEs)

There were also examples of after-death communication (ADC), which are sensed-presence or apparitional experiences involving deceased family members or friends. Unlike hauntings which are place-centered, ADC encounters are person-centered. ADC encounters are more common among widows and widowers

However, the more representative apparitional encounters involved nearing death awareness (NDA) type experiences. In NDA situations, terminally ill patients experiencing death-bed visions perceive welcoming apparitions of deceased relatives or loved ones.

Terminal patients will also appear to hold conversations with persons who are not physically present in their room. Sometimes nurses described these aspects of NDA experiences as dementia

It is also not uncommon for gravely-ill patients to be alert and conversant in their final hours before death, a phenomenon called “terminal lucidity”

Provided below are examples of exchanges regarding NDA situations as characterized by nurses working in long-term care and palliative care settings

I’ve been a hospice nurse for 5 years. I have been with hundreds of people at the time of their death & I can tell you first hand that if the patient is alert enough to speak, you’ll hear them talking to loved ones that have already passed over

That is so true. I, too am a hospice nurse and when pts. start talking to their dead relatives, you know that they have about a week MAX before they are gone

From experience I’ve learned that when a pt tells you they’re going to die…they usually do…and if they start talking to dead family members…they usually die…it’s like the family members have come to take them…..



As a follow-on to the earlier wordcloud project, we wondered whether unsupervised machine learning, in particular topic generation models, could discover the above-mentioned themes in the NGS archive.



Generative topic models view documents as having a latent semantic structure of topics that can be inferred from co-occurrences of words in documents



For this project, the Latent Dirichlet Allocation (LDA) topic model was employed. LDA views documents as probability distributions over topics and topics as probability distributions over words

All documents share the same collection of topics, but each document contains those topics in different proportions. The LDA algorithm samples words across topics until it arrives at topics and word selections that most likely generated the documents
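The generative story behind LDA can be sketched in a few lines of Python. This is only an illustration of the idea that documents mix shared topics in different proportions; the vocabulary, the two hand-made topics, and the Dirichlet parameter are all hypothetical values chosen for the example, not anything fitted to the NGS archive.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, vocab, alpha, n_words, rng):
    """Generate one document under LDA's generative story."""
    # 1. Each document gets its own mixture over the shared topics.
    doc_topic = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. Pick a topic for this word position from the document's mixture.
        topic = rng.choices(range(len(topic_word)), weights=doc_topic)[0]
        # 3. Pick a word from that topic's distribution over the vocabulary.
        words.append(rng.choices(vocab, weights=topic_word[topic])[0])
    return doc_topic, words

# Hypothetical toy vocabulary and two hand-made topics (not fitted values).
vocab = ["patient", "family", "talking", "dream", "sense", "feeling"]
topic_word = [
    [0.40, 0.30, 0.30, 0.00, 0.00, 0.00],  # an "apparition"-like topic
    [0.00, 0.00, 0.00, 0.40, 0.30, 0.30],  # a "presentiment"-like topic
]

rng = random.Random(0)
doc_topic, words = generate_document(
    topic_word, vocab, alpha=[0.5, 0.5], n_words=12, rng=rng
)
print([round(p, 2) for p in doc_topic])
print(" ".join(words))
```

Fitting an LDA model runs this story in reverse: given only the words, it searches for the topic mixtures and topic-word distributions that would most plausibly have produced them.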

Several Python packages and methods for natural language processing were used: the Natural Language ToolKit (NLTK) for processing the data set; scikit-learn to prepare and fit the LDA model; pyLDAvis to display the results; and t-Distributed Stochastic Neighbor Embedding (t-SNE) to map topic distances.

The project pipeline involved: data set processing; conversion of words and documents into a document-term matrix and vector space; fitting the LDA models; and displaying the results

Processing. The data set was split into 199 documents, one for each of its constituent web pages. In contrast to the wordcloud project, the set of stopwords was enlarged so that more meaningful insights could be found in the NGS archive.

The core set of stopwords consisted of commonly used prepositions, conjunctions, and contractions. Stopwords from the wordcloud application were used as a starting point for this purpose.

Since the archive consisted of first- or second-hand accounts, words related to stories and storytelling were added to the stopwords, along with words related to the maintenance of the thread.

Since spontaneous experiences can occur at any moment, words conveying times were removed. Since many experiences were singular events, numeric references, whether cardinal (e.g. one, two) or ordinal (e.g. first, second), were removed as well.

Titles of persons were removed (e.g. Mr., Mrs., etc.); however, person and gender types (e.g. man, woman, etc.) and interpersonal relationships (e.g. family, friends, or strangers) were preserved

Domain-related words relating to patient care or standard procedures were removed (e.g. hospital, unit, shift, staff, work, station, monitor, code)
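The stopword strategy described above can be sketched as a union of word groups applied to tokenized text. The groups below are short hypothetical samples that mirror the categories in the text; the actual lists used in the project are not reproduced here, and a plain regular expression stands in for the NLTK tokenizer.

```python
import re

# Hypothetical stopword groups mirroring the categories described above.
CORE = {"the", "a", "and", "of", "to", "in", "was", "it", "she", "he", "her", "at"}
STORYTELLING = {"story", "stories", "thread", "post", "told"}
TIME_AND_NUMBERS = {"night", "morning", "one", "two", "first", "second"}
TITLES = {"mr", "mrs", "dr"}
DOMAIN = {"hospital", "unit", "shift", "staff", "work", "station", "monitor", "code"}

STOPWORDS = CORE | STORYTELLING | TIME_AND_NUMBERS | TITLES | DOMAIN

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(text, stopwords=STOPWORDS):
    """Return the tokens of `text` that are not in the stopword set."""
    return [tok for tok in tokenize(text) if tok not in stopwords]

sample = "One night shift, Mrs. Smith told the staff she saw her family at the station."
print(remove_stopwords(sample))  # ['smith', 'saw', 'family']
```

Note that relationship words such as "family" survive the filter, which is what allows them to surface later as topic words.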

Conversion. Vector transformations converted the data set into a document-term matrix for mathematical processing. Each row of the matrix corresponds to a document and each column to a term; each cell holds the frequency (or weight) of that term in that document.

Count vectorizers record raw word frequencies. Term Frequency-Inverse Document Frequency (TF-IDF) vectorizers rescale those counts according to how many documents contain each word, so that words appearing in most documents carry less weight.
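To make the contrast concrete, here is a small hand-rolled comparison of the two weightings on three hypothetical, already tokenized documents. It uses the textbook inverse document frequency, log(N / df); scikit-learn applies a smoothed and normalized variant, so its numbers would differ from these.

```python
import math
from collections import Counter

# Three tiny hypothetical "documents", already tokenized.
docs = [
    ["patient", "talking", "family", "patient"],
    ["patient", "dream", "sense"],
    ["patient", "family", "voice"],
]

vocab = sorted({term for doc in docs for term in doc})

# Count vectorization: one row per document, one column per term.
counts = []
for doc in docs:
    tally = Counter(doc)
    counts.append([tally[term] for term in vocab])

# Document frequency: in how many documents does each term appear?
doc_freq = {term: sum(term in doc for doc in docs) for term in vocab}

# Textbook inverse document frequency (an assumption; not scikit-learn's formula).
n_docs = len(docs)
idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}

# TF-IDF: each count is scaled by the term's inverse document frequency.
tfidf = [[row[j] * idf[term] for j, term in enumerate(vocab)] for row in counts]

print(vocab)
for count_row, tfidf_row in zip(counts, tfidf):
    print(count_row, [round(x, 2) for x in tfidf_row])
```

In this toy example "patient" appears in every document, so its TF-IDF weight drops to zero even though it has the highest raw count.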



Both vectorizers converted words to lower case and removed non-word expressions. The vectorizers were also parameterized to look for bigrams (pairs of adjacent words that were often used together).



Model Fit/Display. The LDA model was fitted using ten topics. Words within topics were sorted and ranked with respect to their frequency in and relevance within a topic

Separate LDA models were fitted on the Count and TF-IDF matrices, each running for a maximum of 100 iterations. LDA model results were displayed using pyLDAvis, with t-SNE used to map topic distances.

Results. Although topics produced from the model are unlabeled, words within topics usually can be woven into a coherent theme

The first four pyLDAvis graphs provide the top 30 words and bigrams in Topics 1 through 4 using Count vectorization

Topic 1 is the most representative of the body of stories in the thread and generated around 86% of the content. Words in Topic 1 included: “nurse” and “patient”; both nurses and patients were percipients and sometimes sources of “ghost” experiences. If apparitions represented unrecognized persons, patients had “asked” whom they “saw.” Many apparitional encounters involved patients who were “heard” “talking” to deceased “family” members or a “friend.” These telepathic types of apparitions were often described as “sitting” near the bedsides of patients, or transiting their rooms or into an adjacent “hall” on their “floor.” Overall, this could be considered an apparitional experiences topic

Topic 2 is derived from user commentary and seems reflective of internal varieties of psi functioning. Words in Topic 2 included: “dreams”, “feel(ings)” and a “sense” of awareness or presentiment of events that were happening or about to “happen”, usually in connection with the deaths of family members. In other cases the dreams were possible telepathic connections with lost “loved” ones. Overall, this can be considered an extrasensory perception topic and it generated 7% of the content

Topic 3 appears reflective of external forms of psi and survival phenomena to include auditory and physical encounters commonly associated with hauntings and poltergeists. Words in Topic 3 included: “haunted”, “voice(s)”, and other imitative sounds such as “music.” There were also reported instances of anomalous telephone contact involving phone calls from the dead and various “creepy” experiences in connection with affected televisions, call lights and other electrical appliances. The topic also included accounts involving perceptions of animal psi. Overall, this could be loosely considered a hauntings and poltergeists topic and it generated around 4% of the content

Topic 4 is also derived from user commentary and seems reflective of general discussions on the paranormal, religious and exceptional experiences. Discussions included: “paranormal” television, “movie” and “radio” entertainment; and “photo” or other evidence from paranormal investigations. Discussions also involved ghost stories outside a nursing context; some were urban legends and a few were probably larks. Overall, this could be loosely considered a paranormal discussions topic and it generated around 3% of the content
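The content percentages quoted for Topics 1 through 4 describe how much of the corpus each topic accounts for. One common way to obtain such shares is a length-weighted average of the document-topic proportions; whether the figures above were computed exactly this way is an assumption, and the proportions and document lengths below are hypothetical.

```python
def topic_shares(doc_topic, doc_lengths):
    """Length-weighted share of the corpus attributed to each topic."""
    n_topics = len(doc_topic[0])
    # Expected number of tokens each topic contributes across all documents.
    token_mass = [
        sum(row[k] * length for row, length in zip(doc_topic, doc_lengths))
        for k in range(n_topics)
    ]
    total = sum(token_mass)
    return [mass / total for mass in token_mass]

# Hypothetical document-topic proportions and document lengths (in tokens).
doc_topic = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],
]
doc_lengths = [1000, 800, 200]

shares = topic_shares(doc_topic, doc_lengths)
print([f"{share:.0%}" for share in shares])  # ['79%', '21%']
```

Weighting by length means that a long page dominated by one topic moves the overall share more than a short page does.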

The fifth pyLDAvis graph provides the top 30 words in Topic 1 using TF-IDF vectorization.



The findings were close to those for Topic 1 under Count vectorization. However, the TF-IDF version of Topic 1 appears to be a combined apparitional experiences and extrasensory perception topic accounting for 94% of the content. This consolidation arises because TF-IDF vectorization lowers the contribution weight of commonly used words.

This project again demonstrates the usefulness of topic generation models for finding meaningful patterns in masses of unlabeled or unstructured data. The LDA topic discovery method indicated several varieties of psi and survival experiences that went beyond haunting stories



Many apparitional encounters described in the archive represented the union of nearing death awareness (involving death-bed visions of welcoming apparitions) and after-death communication experiences (involving apparitions of deceased family members and friends)

Even though the algorithm knows nothing intrinsically about the above experiences, the model was able to infer topics and words corresponding to the most representative kinds of encounters

Greater insights might be gained by structuring the NGS dataset and labeling the experiential elements within it. Follow-on research could employ semi-supervised methods to train models to classify types of psi and survival experiences and to find correlates within them

Specifically, deep learning models could be trained on the semantics around typologies of apparitions with tagged documents. Parapsychology categorizes apparitions along four lines: living agent; crisis; post-mortem; and haunting

If an apparition is seen within ±12 hours of a person’s death, that represents a crisis apparition

If an apparition is seen 24 hours or more after a person’s death, that apparition is post-mortem

If the apparition is of a long-deceased person and has a location affinity, that is a haunting apparition
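As a starting point for tagging documents, the time and place rules above could be written as a simple labeling function. This is a sketch only: the argument names are hypothetical, treating a missing time of death as a living-agent case is an assumption, and the rules as stated leave some intervals undefined, which the function reports rather than guesses.

```python
def classify_apparition(hours_from_death=None, long_deceased=False, location_bound=False):
    """Label an apparition report using the time and place rules quoted above.

    hours_from_death: hours between the death and the sighting (negative if the
    sighting came first); None if the person seen was alive and not near death.
    """
    if hours_from_death is None:
        # Assumption: no associated death means a living-agent apparition.
        return "living agent"
    if long_deceased and location_bound:
        return "haunting"
    if abs(hours_from_death) <= 12:
        return "crisis"
    if hours_from_death >= 24:
        return "post-mortem"
    # The rules as stated do not cover the 12-to-24-hour window or sightings
    # more than 12 hours before a death.
    return "unclassified"

print(classify_apparition(hours_from_death=3))    # crisis
print(classify_apparition(hours_from_death=48))   # post-mortem
print(classify_apparition(hours_from_death=18))   # unclassified
```

Labels produced this way could serve as weak tags for the semi-supervised training suggested above, with the unclassified cases set aside for manual review.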



However, what emerges from machine learning is that the reported encounters do not typically involve haunted hospitals, but rather primarily reflect end-of-life experiences



The apparitional experiences in NGS also appear roughly consistent with survey results elsewhere. Apparitional experiences rarely occur in the general population, but when they do, the apparitions are likely to represent recognized persons, known to the individuals who are perceiving them.





REFERENCES

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.



Gauld, A., & Cornell, A. D. (1979). Poltergeists. Routledge & Kegan Paul.



Guggenheim, B., & Guggenheim, J. (2017). After-Death Communication (ADC) Experiences – ADCs. The ADC Project, www.after-death.com

Guggenheim, B., & Guggenheim, J. (1997). Hello from Heaven!: A new field of research, after-death communication, confirms that life and love are eternal. Bantam.

“Hello From Heaven!”. (Mar 26, 2013). Documentary on After-Death Communication: Featuring Bill and Judy Guggenheim. ABC 20/20. (Aired: April 12, 1996). YouTube

Kircher, P. and Callanan, M. (2017, Dec 14). NDEs and Nearing Death Awareness in the Terminally Ill. International Association for Near Death Studies (IANDS).



Natural Language Toolkit: NLTK 3.2.5 documentation. (2017, Sep 24). NLTK Project.



Pearson, P. (2014). Opening Heaven’s Door: What the Dying May be Trying to Tell Us about where They’re Going. Random House Canada.



Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830.



Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces (pp. 63-70).

What’s Your Best Nursing Ghost Story? (2017, Oct 30). AllNurses.com







IMAGES

pyLDAvis Graph of Topic 1 (Count Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®. All rights reserved.

pyLDAvis Graph of Topic 2 (Count Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®. All rights reserved.

pyLDAvis Graph of Topic 3 (Count Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®. All rights reserved.

pyLDAvis Graph of Topic 4 (Count Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®. All rights reserved.

pyLDAvis Graph of Topic 1 (TF-IDF Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®. All rights reserved.