Updated twice, last on 21 December 2016. Notes below.

tl;dr - So, does it work?!

NeuroOn, the self-proclaimed "world's first smart sleep mask", isn't a medical-grade device, but it's much better than a coin toss.

Its total accuracy in detecting sleep stages is 65%.

One of the biggest problems with NeuroOn is that when used as an alarm clock almost every third time (31.6%) it will choose the worst possible moment for waking, assuring lack of energy and grogginess after awakening.

Comparing NeuroOn's sleep stage results to a professional polysomnography scored by a human expert:

When polysomnography detects a sleep phase suitable for waking up, NeuroOn agrees in 73.8% of the cases. The remaining 26.2% isn't a big deal, since the mask will just wait until the next good opportunity to wake you up.

When polysomnography says not to wake the user up, NeuroOn agrees in 68.4% of the cases. In the remaining 31.6% - nearly a third of all cases - it could try to wake the user, resulting in grogginess and complete lack of energy till the end of the day. This is a big deal and may defeat the purpose of NeuroOn's alarm clock altogether.

And for the people who would like to understand what actually happened here...

Long awaited results

This post is part of a series dedicated to NeuroOn sleep mask and its scientific viability. The full code and all signal samples are available on Github.

Months after the first estimates I would like to present the results of the NeuroOn experiment in a form accessible to laymen. The analysis took nearly four months because I stubbornly insisted on making it scientifically sound, completely open and reproducible on different machines. After roughly two months of dedicating every weekend to it, I realized that my own understanding of mathematics, statistical analysis and the SciPy / NumPy stack fell short of this challenge and asked Ryszard Cetnarski for help. Together we were able to create a coherent Jupyter Notebook and open it for peer review on the Internet. In November 2016 we presented a scientific poster at the Aspects of Neurosciences conference in Warsaw. Until the publication of this blog post the only feedback we received concerned the small sample size and graph descriptions. If you have any additional suggestions, please send them to me.

I'd like to thank Intelclinic for providing us a test unit of NeuroOn, Ryszard Cetnarski for his tremendous help, as well as everybody who contributed to this analysis (in random order): Michal Kawalec, Adam Golinski, Bartosz Krol, Jaroslaw Hirniak, Karolina Stosio, Karol Benq Siek, Dawid Laszuk, Lorenzo Braschi and Piotr Migdal.

What did the experiment measure?

Our initial experimental hypothesis - "Signal gathered by the NeuroOn mask is of a good enough quality to detect a sleep stage in real time, given processing power of an average smartphone" proved to be too complex for a simple analysis, and while we could infer a lot from the signal's quality (discussed later), we were forced to change it.

Our final hypothesis was "NeuroOn achieves medical-grade results in sleep staging as compared to a human expert working on a PSG signal", as claimed by IntelClinic (even though they backed out of that at some point).

EEG-based polysomnographs are the best medical- and scientific-grade devices for analyzing sleep and sleep stages, used widely in hospitals and research. We used an Aura PSG, a device used in clinical trials in Poland, and we treat it as our single source of truth, being as close to the original brain signals as possible.

We analyzed signal from A1-F1 electrodes of the EEG (detailed description and electrode placement is available in the analysis blogpost), pre-cleaned by the Aura PSG amplifier and sleep staging performed by a human technician.

It is worth noting that there are only two nights (roughly 16 hours) of recordings, captured on a healthy 25-year-old Caucasian male. To achieve any significant results we should conduct experiments on a more varied population (n > 30) for more than 14 nights, including people with known sleep disorders.

For NeuroOn we used the signal gathered by the three electrodes (a single differential channel) on the device, pressed to the patient's head so firmly that they left marks the next day. The sleep staging was performed by an offline (not real-time) algorithm executed on an external machine afterwards. The software used to do it was provided to us by Intelclinic on 08.03.2016 under the condition that we would not try to reverse engineer it, with which we complied.

We do not have any information about algorithm implementations on mobile devices used with end-user NeuroOn masks or their possible limitations.

NeuroOn's time delay

First, we assumed that NeuroOn's and PSG's signals correlate and compared them. It turns out that the devices' clocks were desynchronized, with NeuroOn's running roughly 160.5 seconds late on the first night and the delay slowly growing over the course of the 8-hour recording. On the second night the device's clock was 160.7 seconds late. Both of these results were obtained using cross-correlation between the signals, as discussed in a Jupyter Notebook.

After finding the delays for both nights we assumed that the hypnograms (sleep staging graphs) from both devices also correlate and decided to analyze their time shift. It turns out that in addition to the 160 seconds of signal delay, the NeuroOn hypnogram had an additional 90-second delay in detecting a sleep phase. This hypnogram was obtained by running Intelclinic's algorithms offline, using the developer's scripts - we currently have no data on real-time delays on mobile devices, as intended for end users.
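The delay estimate can be illustrated with a minimal sketch. It is not the notebook's code: the function name `estimate_delay_seconds`, its parameters and the use of plain `numpy.correlate` are assumptions made for illustration. The idea is to slide one signal against the other and pick the lag at which they overlap best.

```python
import numpy as np


def estimate_delay_seconds(reference, delayed, sampling_rate_hz):
    """Return how many seconds `delayed` lags behind `reference`.

    Both inputs are 1-D signals sampled at the same rate
    (hypothetical helper, not taken from the original notebook).
    """
    ref = np.asarray(reference, dtype=float)
    dly = np.asarray(delayed, dtype=float)
    # Remove the mean so a constant offset doesn't dominate the correlation.
    ref = ref - ref.mean()
    dly = dly - dly.mean()
    # Full cross-correlation: one value per possible lag.
    xcorr = np.correlate(dly, ref, mode="full")
    # Lags run from -(len(ref) - 1) to len(dly) - 1 samples.
    lags = np.arange(-(len(ref) - 1), len(dly))
    # A positive lag means `delayed` is late relative to `reference`.
    return lags[np.argmax(xcorr)] / sampling_rate_hz
```

For recordings lasting a whole night a direct correlation like this is slow; an FFT-based correlation (for example `scipy.signal.correlate` with `method="fft"`) would be the more practical choice.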
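Hypnograms are sequences of labels rather than continuous signals, so one simple way to look for their time shift is to try a range of shifts and keep the one with the highest agreement. The sketch below is only an assumption about how such a search could look; `best_hypnogram_shift` and its arguments are hypothetical names, and the notebook may use a different method.

```python
def best_hypnogram_shift(psg_stages, device_stages, max_shift):
    """Return (shift, agreement) for the shift that best aligns two hypnograms.

    `shift` is the number of samples by which `device_stages` lags behind
    `psg_stages`; `agreement` is the fraction of matching labels at that shift.
    """
    best_shift, best_agreement = 0, -1.0
    for shift in range(max_shift + 1):
        # Drop the first `shift` device labels and compare the overlap.
        pairs = list(zip(psg_stages, device_stages[shift:]))
        if not pairs:
            break
        agreement = sum(p == d for p, d in pairs) / len(pairs)
        if agreement > best_agreement:
            best_shift, best_agreement = shift, agreement
    return best_shift, best_agreement
```

Multiplying the returned shift by the duration of one hypnogram sample gives the delay in seconds.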

Total accuracy in detecting sleep phases

With the clock synchronization no longer an issue, we could start comparing sleep staging between the two sources. The Jupyter Notebook is a good read for anyone interested in the code itself.

Since EEG-based polysomnography and human-conducted sleep staging are, at the moment of writing, both the academic and the industry standard, we assumed that PSG sleep stages are our single source of truth, to which we compared NeuroOn's hypnograms.

We used Cohen's kappa coefficient analysis. Heatmaps represent confusion matrices normalized by rows ("Given a sleep stage detected by PSG, what was the probability of NeuroOn detecting it as...?") and joint probability matrices, which can give insight into the frequency of the respective sleep phases.
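For readers unfamiliar with these measures, here is a minimal pure-Python sketch of a confusion matrix, its row normalization and Cohen's kappa. The function names are made up for this example; the notebook itself may rely on library implementations instead.

```python
from collections import Counter


def confusion_counts(truth, predicted, labels):
    """Rows are the true (PSG) stages, columns the predicted (device) stages."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in labels] for t in labels]


def normalize_rows(matrix):
    """Turn each row into 'given the true stage, how was it classified?'."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def cohens_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa of 0 means agreement no better than chance, and 1 means perfect agreement.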

              precision    recall  f1-score   support

        rem        0.70      0.60      0.64      4033
         N1        0.00      0.00      0.00      2190
         N2        0.57      0.91      0.70     10050
         N3        0.62      0.50      0.56      6690
       wake        0.28      0.01      0.03      2238

avg / total        0.53      0.60      0.53     25201

accuracy: 0.60

And the second night:

              precision    recall  f1-score   support

        rem        0.80      0.85      0.82      6030
         N1        0.00      0.00      0.00      1181
         N2        0.70      0.70      0.70     10640
         N3        0.64      0.76      0.70      6750
       wake        0.31      0.14      0.19       600

avg / total        0.67      0.70      0.68     25201

accuracy: 0.70

Interestingly, NeuroOn's staging algorithm never detected the N1 sleep stage, which lowered its total score.

Accuracy compares all sleep stages detected by NeuroOn with those scored by a human from the PSG signal. NeuroOn's mean accuracy over both nights is 0.65, which puts it far below any requirement for medical use. That doesn't have to disqualify NeuroOn from personal use, however.
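As a sanity check, overall accuracy equals the support-weighted mean of the per-stage recall, so it can be roughly recomputed from the two reports. The sketch below uses the rounded recall and support values published for both nights; because of that rounding the results only approximate the reported 0.60 and 0.70. The dictionary layout and function name are assumptions made for this example.

```python
# (recall, support) per stage, copied from the two reports.
night1 = {"rem": (0.60, 4033), "N1": (0.00, 2190), "N2": (0.91, 10050),
          "N3": (0.50, 6690), "wake": (0.01, 2238)}
night2 = {"rem": (0.85, 6030), "N1": (0.00, 1181), "N2": (0.70, 10640),
          "N3": (0.76, 6750), "wake": (0.14, 600)}


def accuracy_from_recall(report):
    """Accuracy = correctly classified samples / all samples."""
    total = sum(support for _, support in report.values())
    # recall * support approximates the correctly classified count per stage.
    correct = sum(recall * support for recall, support in report.values())
    return correct / total


acc1 = accuracy_from_recall(night1)
acc2 = accuracy_from_recall(night2)
mean_accuracy = (acc1 + acc2) / 2
print(round(acc1, 2), round(acc2, 2), round(mean_accuracy, 2))
```

The zero recall for N1 contributes nothing to the numerator while its support still counts in the denominator, which is how the missing N1 detection drags the total down.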

To illustrate how different NeuroOn's results are from a purely random "coin toss", here's a bootstrapped analysis from the second night:

If NeuroOn's sleep staging was truly random, the score would fall much closer to the randomly permuted sleep scores.
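The comparison against randomly permuted scores can be sketched as follows. This is an illustration under assumptions, not the notebook's procedure: the function names, the number of permutations and the fixed seed are all made up here. Shuffling the device's labels keeps the share of each stage but destroys any relation to the PSG timeline, which gives a "chance" baseline.

```python
import random


def accuracy(truth, predicted):
    """Fraction of samples where both label sequences agree."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def permutation_accuracies(truth, predicted, n_permutations=1000, seed=0):
    """Accuracies obtained after randomly shuffling the predicted labels."""
    rng = random.Random(seed)
    shuffled = list(predicted)
    scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        scores.append(accuracy(truth, shuffled))
    return scores
```

If the real accuracy lies far above the whole range of shuffled scores, the staging carries real information about the sleep stages.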

Specific tests and current promises

After the initial campaign, which marketed NeuroOn as a medical-grade device that tracks multiple sleep scores and helps with polyphasic sleep, the company has backed off from its promises, replacing them with something much more manageable. Maybe it doesn't need overall accuracy to deliver them?

Most users may be interested in these two questions:

Will NeuroOn wake me up when it has an opportunity to?

Will NeuroOn not wake me up when it shouldn't?

Based on Tassi, P., & Muzet, A. (2000). Sleep inertia. Sleep Medicine Reviews, 4(4), 341–353, we can select the WAKE, N1 and N2 sleep stages as those that allow a wake-up call, and N3 and REM as those during which NeuroOn's user should not be disturbed.

We aggregated the results of these sleep stages, allowing NeuroOn to misidentify stages within families - WAKE/N1/N2 and N3/REM, since the errors shouldn't be detectable by an end-user.
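The aggregation step can be sketched like this. The label spellings and the function names are assumptions for the example; the grouping itself (WAKE/N1/N2 versus N3/REM) follows the text above.

```python
# Stages during which a wake-up call is acceptable, per the grouping above.
WAKE_OK = {"wake", "N1", "N2"}
FAMILIES = ("wake_ok", "do_not_wake")


def family(stage):
    """Map a sleep stage to its family; N3 and REM fall into 'do_not_wake'."""
    return "wake_ok" if stage in WAKE_OK else "do_not_wake"


def family_agreement(psg_stages, device_stages):
    """Row-normalized 2x2 table: given the PSG family, what did the device say?"""
    counts = {t: {p: 0 for p in FAMILIES} for t in FAMILIES}
    for psg, device in zip(psg_stages, device_stages):
        counts[family(psg)][family(device)] += 1
    table = {}
    for t, row in counts.items():
        total = sum(row.values())
        table[t] = {p: (c / total if total else 0.0) for p, c in row.items()}
    return table
```

In such a table, `table["do_not_wake"]["wake_ok"]` is the risky cell: the share of N3/REM samples that the device treats as a good moment to wake the user.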

Normalized and aggregated results from the first night:

Second night:

And mean:

There are two indicators of specific significance:

There is a 26.2% chance that NeuroOn will not detect a stage which allows an easy wake up. This isn't harmful to an end-user, since only consecutive misidentification of several stages might cause the alarm clock to go off too late.

There is, however, a 31.6% chance of misidentifying a stage which doesn't allow an easy wake up in a healthy person. This may be the single disqualifying feature of NeuroOn. If a person is woken up in N3 or REM (which NeuroOn interprets as N1, N2 or WAKE), they will suffer from sleep inertia and grogginess.

This means that using NeuroOn's alarm clock - in perfect conditions, keeping it well pressed against one's forehead and while not having any sleep disorders - may result in an extremely bad awakening nearly a third of the time.

Since lucid dream induction is quite complex and still discussed by many researchers, we don't feel that discussing its application in NeuroOn's app is within the scope of this analysis. What we can assess is NeuroOn's ability to detect REM sleep - roughly 72.3% of PSG-detected REM stages are detected as REM by NeuroOn (mean from both nights).

Beyond sleep staging - NeuroOn's signal quality

Our initial goal was not only to analyze NeuroOn's staging quality, but also the signal it gathers with just 3 dry electrodes on the forehead. Is it possible to create a real-time sleep staging algorithm based on that signal?

Answering that question fully would require us to build a perfect and much more advanced version of NeuroOn, de-facto taking on IntelClinic's role in developing the device. We could conduct a much simpler analysis instead, looking for well known EEG indicators within the signal.

EEG waves, defined as the respective frequency bands of the EEG signal, differ between sleep stages and are one of the most important indicators used in polysomnography. Slow waves between 1 Hz and 3 Hz are called delta waves and are used for discriminating deep non-REM sleep phases. It is reasonable to assume that NeuroOn's staging algorithms use these indicators to create their own hypnograms.

With that knowledge we resorted to spectral analysis, studying delta power in NeuroOn's single-channel signal. Full analysis with more details, code and signal samples can be found in our Jupyter Notebook.

Data from the first night:

We examined how NeuroOn-recorded delta wave amplitude differs between (PSG-defined) N2 and N3 sleep stages. The results indicate that it is possible to differentiate between those two phases with approximately 75% accuracy, based on a box-plot distribution.

The delta band powers are similarly distinct in both NeuroOn and PSG, which may imply that the gathered signal can be used for advanced and precise sleep staging - maybe even more precise than NeuroOn's current staging. This invalidates the assumption I initially approached NeuroOn with - that it's impossible to gather a signal of good enough quality to reliably discern sleep stages from just 3 electrodes.
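To give an idea of what "delta power" means in practice, here is a minimal sketch that measures which share of a signal's power falls into the 1-3 Hz band. It is a simplified assumption of the approach (a single Hann-windowed FFT over the whole segment, with a made-up function name), not the notebook's implementation.

```python
import numpy as np


def relative_band_power(signal, sampling_rate_hz, low_hz=1.0, high_hz=3.0):
    """Share of total spectral power between low_hz and high_hz (delta by default)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # drop the DC offset
    windowed = x * np.hanning(len(x))  # reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sampling_rate_hz)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    return power[in_band].sum() / power.sum()
```

Computed over consecutive segments of a night, a value like this is expected to be higher in deep sleep (N3) than in N2, which is the difference the box plots compare.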

Contrast with current claims

The initial Kickstarter campaign was full of unfounded claims, neuro-buzzwords and outright misinformation. The team even promoted NeuroOn with "Wanna sleep 2 hours/day ASK ME :)" t-shirts. After years in development and Facebook battles with skeptics IntelClinic was forced to back off from many of them.

NeuroOn's Final Press Release reads:

Inteliclinic is a Polish startup whose Neuroon crowdfunding campaign on Kickstarter was a spectacular success. Initially, the project aimed at creating a device that would analyze users’ sleeping patterns and provide tips to people who want to sleep polyphasically (take a few shorter naps instead of a single nightly sleep episode.) After over a year of consultations with leading authorities on sleep medicine, including Christopher Drake, PhD, Director of Sleep Research, Henry Ford Hospital and former Chairman of the Board, National Sleep Foundation, Project Neuroon grew beyond sleep monitoring to include pulse tracking and light therapy. The core functionality of the device is nearly medical-grade sleep measurements and helping people who work shifts, suffer jet lags or have problems falling asleep. The device does not support polyphasic sleep.

I wouldn't say it was growing beyond, but rather realizing that the previous promises made by IntelClinic were completely unfounded in contemporary scientific knowledge. Without spending significant amounts of money on research the company wasn't able to deliver, so the startup pivoted and changed scope.

After backing off from "medical grade" device, "near-medical grade" might mean virtually anything.

Intelclinic did register two patents: "System for polyphasic sleep management, method of its operation, device for sleep analysis, method of current sleep phase classification and use of the system and the device in polyphasic sleep management" and "System, apparatus and method for treating sleep disorder symptoms". While both of them contain a technical overview of how the mask works, I am not aware of any whitepapers demonstrating the feasibility of the respective functions.

Evaluating the effectiveness of light therapy or jet lag adjustment would require a separate experiment, which should be conducted (and released together with all the data) by IntelClinic itself in order to prove that effectiveness.

Where I was wrong, where I was right

Over two years ago I wrote:

The NeuroOn sleep mask cannot work exactly as advertised - it cannot utilize a proper EEG signal. While it can detect a REM phase in sleep very roughly, it's very far from reliable sleep analysis. The majority of the population isn't able to achieve polyphasic sleep, since their brains aren't capable of that. A similar thing goes with lucid dreaming. NeuroOn at its best would be not too useful of a gadget to be put away after several uses.

WRONG Looking at the spectral analysis of NeuroOn's signal above it seems that I have been wrong in saying that it's impossible to reliably discern sleep stages from 3 electrodes located on the forehead. It looks like it should be possible, but requires much more research than IntelClinic has put in NeuroOn.

RIGHT It holds true that NeuroOn can conduct only a rough (not medical-grade) analysis of the sleep phases.

RIGHT Polyphasic Sleep isn't supported by NeuroOn, as confirmed by the IntelClinic.

? We don't know much about lucid dreaming yet, but the research required to change that might be quite costly.

Addressing possible replies

NeuroOn staging software you used was several months old. Here's a new version, and look, now the accuracy is better than 95%!

I agree that the software I was sent by IntelClinic was several months old at the time of analysis, but from my understanding it was the version which eventually landed in the consumer units.

Providing any version issued after we published our initial experiment description and sources will not give us any meaningful results, since it could have been tweaked to match our signals exactly.

The only way to prove that NeuroOn's algorithms have gotten better is to conduct a new experiment at a third party's lab (like the Sleep Disorders Center at the Institute of Psychiatry and Neurology in Warsaw) and release the hypnograms immediately afterwards.

It'd be a good idea to test it on a patient suffering from sleep disorders, or on anyone other than a 25-year-old Caucasian male. Preferably several people.

Our code is completely open and should work with newly acquired signals.

Alarm clock isn't the main functionality now, we're using only REM anyway!

Addressing lucid dreaming and light therapy is beyond the scope of this experiment.

Startup marketing vs research-based development

(this is my personal opinion)

Winding up several months of research, tweaking the code, trying to make sense of the data, wondering if every method is statistically sound - I can say I'm happy I was able to do it. No one paid me - quite the opposite, I rented the hospital lab and PSG with my own money - yet still, it was worth it.

I'd like everyone to form their own opinion on NeuroOn by reading this pretty detailed analysis. If you don't trust it, feel free to re-check all my computations in the Jupyter Notebook.

Personally I consider NeuroOn to be a failed project, not researched enough from the start, running mostly on daring marketing promising the impossible.

Real innovation requires research. It's tedious, takes much more time than the startup community promises. But it's honest - and it's the only way that yields any results.

I view startups similar to Intelclinic as deeply harmful for everyone - customers don't get what they pay for, investors are misled about what they support, researchers see their work abused for the sake of a marketing campaign, and finally society is manipulated into seeing some kind of progress and hope in all that.

At the same time as NeuroOn, another neuro-device was put on Kickstarter - OpenBCI. It's a small open hardware EEG amplifier which makes it possible to conduct experiments far more cheaply than with university equipment. It didn't promise to make everyone's life better and it wasn't marketed as well as IntelClinic's product. Despite earning much less money, OpenBCI delivered a device fulfilling all their promises.

When it comes to real progress and innovation, I'm much more inclined to believe researchers, hackers and makers showing open whitepapers and working prototypes first.

Updates

In my original blog post I wrongly assumed that NeuroOn's sleep stage detection accuracy may be compared to actigraphy sleep/non-sleep accuracy, which is a much simpler indicator of sleep state. Its current limits are far below any device using EEG signals. These paragraphs are now removed.

I also clarified that sleep inertia is perceptible right after awakening and doesn't necessarily last all day (even though it might affect a person's mood).

Added IntelClinic's patents to footnotes.

The previous version of this blog post can be found on my Github.

Footnotes