Ear infections are typically diagnosed using specialized equipment to assess eardrum mobility: The presence of fluid in the middle ear, indicative of a likely ear infection, limits eardrum mobility. Chan et al. developed a smartphone system that uses the phone's microphone and speaker to emit sound and analyze its reflection (echo) from the eardrum to detect middle ear fluid. The smartphone system outperformed a commercial acoustic reflectometry system in detecting middle ear fluid in 98 pediatric patient ears, and the system could be easily operated by patients' parents without formal medical training. This proof-of-concept screening tool could aid in the diagnosis of ear infections.

The presence of middle ear fluid is a key diagnostic marker for two of the most common pediatric ear diseases: acute otitis media and otitis media with effusion. We present an accessible solution that uses speakers and microphones within existing smartphones to detect middle ear fluid by assessing eardrum mobility. We conducted a clinical study on 98 patient ears at a pediatric surgical center. Using leave-one-out cross-validation to estimate performance on unseen data, we obtained an area under the curve (AUC) of 0.898 for the smartphone-based machine learning algorithm. In comparison, commercial acoustic reflectometry, which requires custom hardware, achieved an AUC of 0.776. Furthermore, we achieved 85% sensitivity and 82% specificity, comparable to published performance measures for tympanometry and pneumatic otoscopy. Similar results were obtained when testing across multiple smartphone platforms. Parents of pediatric patients (n = 25 ears) demonstrated similar performance to trained clinicians when using the smartphone-based system. These results demonstrate the potential for a smartphone to be a low-barrier and effective screening tool for detecting the presence of middle ear fluid.

Here, we describe a system that uses the microphone and speaker of existing smartphones to detect middle ear fluid by assessing eardrum mobility. The system sends a soft acoustic chirp into the ear canal using the smartphone speaker, detects reflected sound from the eardrum using the smartphone microphone, and uses a logistic regression machine learning model to classify these reflections and predict middle ear fluid status. No additional attachments are required beyond a paper funnel, which acts as a speculum and can be constructed from printer paper, scissors, and tape. Real-time implementation and data processing are performed entirely on the smartphone, compatible with both iPhone and Android devices. The system demonstrated comparable performance across multiple smartphone platforms and when used by parents versus clinicians. Given the ubiquity of smartphones, this system may hold potential as a middle ear screening tool for parents as well as health care providers in resource-limited regions.

The presence of middle ear fluid is the key diagnostic marker for the two most common pediatric ear diseases, acute otitis media (AOM) and otitis media with effusion (OME) (1). AOM, known commonly as an “ear infection,” is characterized by the presence of infected fluid in the middle ear and results in symptoms of fever and ear pain. It is a leading cause of pediatric health care visits, and although many cases can resolve without antibiotics, complications may include eardrum perforation, mastoiditis, facial nerve palsy, or meningitis (2–4). OME is the presence of middle ear fluid without signs of an acute infection and affects up to 80% of children (5, 6). Although OME has few overt symptoms, making diagnosis more difficult, it is associated with speech delay, sleep disruption, poor school performance, balance issues, and a higher likelihood of developing AOM (5).

Diagnosis of OME or AOM requires detecting middle ear fluid using either pneumatic otoscopy or tympanometry (1). Pneumatic otoscopy is used by only 7 to 33% of primary care providers and is not designed for home screening purposes (5). Tympanometry necessitates a referral to an audiologist and the use of expensive equipment (7, 8). For these reasons, in 2016, the American Academy of Otolaryngology called for research into a brief, reliable, and objective method to detect middle ear fluid as well as new in-home strategies to help parents and caregivers monitor fluid after initial physician evaluation (5).

RESULTS

Concept and prototype

Our system uses the smartphone speaker to play audible, 150-ms frequency-modulated continuous wave chirps from 1.8 to 4.4 kHz into the patient’s ear canal. The microphone remains active during the chirp, collecting both incident waves from the speaker and reflected waves from the eardrum. Sound reflected from the eardrum destructively interferes with the incident chirp, causing a dip in sound pressure along a range of frequencies. A normal eardrum resonates well at multiple sound frequencies, creating a broad-spectrum, soft echo; as a result, the acoustic dip is broad and shallow in the frequency domain. In contrast, a fluid- or pus-filled middle ear, as found in OME and AOM, restricts the vibrational capacity of the eardrum; sound energy that would have vibrated the eardrum is instead reflected back along the ear canal, creating more destructive interference and resulting in a narrower and deeper acoustic dip. The acoustic dip occurs at the resonant frequency of the ear canal, where the quarter-wavelength of the chirp equals the length of the canal (9). Thus, although individual differences in ear canal length affect the location of the dip along the frequency domain, the shape of the dip primarily depends on eardrum mobility.

Our system builds upon existing acoustic reflectometry methods in three ways (10–14). First, it is a predominantly software-based solution that takes advantage of existing smartphone hardware rather than requiring a separate device. Current acoustic reflectometers require a microphone and speaker in close proximity to produce and measure sound waves along the ear canal. Many modern smartphones have a similar configuration, with a co-located speaker and microphone on their bottom edge for noise cancellation (Fig. 1A). This includes all versions of the iPhone, Samsung Galaxy phones after the S5, and other Android phones, including the Google Pixel.
Second, we use a paper funnel as a speculum to direct sound into the ear canal. The funnel (Fig. 1B) can be assembled using a printed paper template (fig. S1), scissors, and tape. Without this attachment, the resulting waveform can be highly variable because sound can reflect off different structures of the pinna. Third, the system uses a logistic regression machine learning model to classify the waveforms received by the microphone. To identify whether a patient has middle ear fluid, we first preprocessed the raw waveform to locate and isolate the acoustic dip (Fig. 1, C and D). We then used logistic regression to determine whether the shape of the dip was more indicative of a normal or fluid-filled ear. A text-based message is presented to the user indicating the result: “suggestive of middle ear fluid” or “middle ear fluid unlikely” (fig. S2). On an iPhone 5s and a Galaxy S6, data processing and classification took 771.98 ms and 1.2 s, respectively.

Fig. 1 Using a smartphone to detect middle ear fluid. (A) Location of the speaker and microphone on the bottom of an iPhone 5s, without and with the paper funnel attached. (B) Process of assembling the smartphone funnel. (C) Proper placement of the smartphone and funnel at the ear canal entrance. (D) Raw acoustic waveform obtained when chirps are played into an ear with middle ear fluid (red) and without fluid (blue). The SD (gray) is computed across 10 chirp instances on a patient’s ear.
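As a back-of-the-envelope check of the physics described above, the expected dip location can be computed from the quarter-wavelength relation, and the dip's depth and width give intuition for what the classifier later weighs. The sketch below is illustrative only: the function names, the 3 dB width criterion, and the example spectrum are our own assumptions, not the authors' pipeline.

```python
# Sketch (not the authors' code): the acoustic dip is expected near the
# quarter-wavelength resonance of the ear canal, f = c / (4 * L).

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 degrees C

def quarter_wavelength_resonance(canal_length_m: float) -> float:
    """Frequency (Hz) at which the canal's quarter wavelength equals its length."""
    return SPEED_OF_SOUND / (4.0 * canal_length_m)

def dip_features(freqs_hz, intensities_db):
    """Toy features of an acoustic dip: its depth and its width within 3 dB
    of the minimum. Inputs are parallel lists describing the spectrum around
    the dip (illustrative, not the real preprocessing)."""
    floor = min(intensities_db)
    baseline = max(intensities_db)
    depth = baseline - floor
    # Width: span of frequencies within 3 dB of the dip minimum.
    in_dip = [f for f, i in zip(freqs_hz, intensities_db) if i <= floor + 3.0]
    width = max(in_dip) - min(in_dip) if in_dip else 0.0
    return depth, width

# A typical pediatric ear canal of ~2.5 cm puts the dip near 3.4 kHz,
# inside the 1.8 to 4.4 kHz chirp band.
print(round(quarter_wavelength_resonance(0.025)))  # 3430
```

A fluid-filled ear would yield a larger depth and smaller width than a normal ear under this toy measure, mirroring the "narrower and deeper" dip described in the text.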

Clinical testing

We tested system performance for detecting middle ear fluid in two separate cohorts. First, we conducted a clinical study on patients between 18 months and 17 years of age. We used this population to train the algorithm and obtain cross-validated performance measures. Second, we recruited a separate cohort of patients under 18 months of age and evaluated performance using the algorithm trained in the first clinical study.

The first clinical study was conducted at Seattle Children’s Hospital surgical centers using a cohort of 98 patient ears from patients between 18 months and 17 years of age, drawn from two subgroups: patients undergoing ear tube placement, a common surgery performed on patients with chronic OME or recurrent AOM (n = 48 ears), and patients undergoing a different surgery, such as a tonsillectomy, with no recent symptoms of AOM or OME and no signs of middle ear fluid on physical examination (n = 50 ears). The median age of recruited patients was 5.0 [interquartile range (IQR), 2.0] years, height was 113.2 (IQR, 19.0) cm, weight was 20.0 (IQR, 9.2) kg, and the female-to-male ratio was 0.6 (table S1). A trained clinician performed all patient testing in a private waiting room just before surgery, with the patient awake and held or sitting upright (movie S1). Soft chirps were played into the ear canal using multiple smartphone models and a new paper funnel for each patient. All patients were also tested in parallel with commercial acoustic reflectometry hardware (7, 12, 15). After surgery, we prospectively assigned each ear its actual middle ear fluid status. A patient was considered positive for middle ear fluid if, during ear tube placement, an incision into the eardrum (myringotomy) yielded fluid (n = 24) or if the patient had a red, bulging eardrum consistent with AOM (n = 2).
A patient was considered negative for middle ear fluid if, during ear tube placement, myringotomy yielded no fluid (n = 24) or if the patient did not receive ear tubes, did not have ear-related symptoms, and was negative for fluid on pneumatic otoscopy performed by the otolaryngologist (n = 48).

For classification, we used a logistic regression algorithm on preprocessed microphone acoustic data. The sound intensity (in decibels) at each frequency along the acoustic dip was input as a separate feature. The algorithm was trained with iPhone 5s data collected from patients. Its classification accuracy was evaluated with leave-one-out cross-validation (LOOCV), a rigorous method for validating machine learning models (16). During each iteration of LOOCV, 97 of the 98 patient ears are used to train a model, which then outputs a prediction for the remaining ear. This process is repeated for all 98 ears to estimate the accuracy of a model trained on all 98 ears when tested on unseen data. A receiver operating characteristic (ROC) curve was generated from the cross-validation step, with an area under the curve (AUC) of 0.898 (Fig. 2A). The operating point was chosen to give an overall sensitivity and specificity of 84.6% [95% confidence interval (CI), 65.1 to 95.6%] and 81.9% (95% CI, 71.1 to 90.0%), respectively. With k-fold (k = 10) cross-validation (17), we obtained a comparable AUC of 0.906. To address potential bias from training on the same patient’s opposite ear, we repeated LOOCV but, during each iteration, also excluded the contralateral ear from the training set, achieving an AUC of 0.899. We also downsampled the frequency response curve to 100 samples and obtained a similar AUC of 0.888. The fluid type was recorded as serous (n = 7), mucoid (n = 11), or purulent (n = 4) for 22 of the 36 ears that had middle ear fluid.
The algorithm correctly classified 86% (6 of 7) of ears with serous fluid, 91% (10 of 11) of ears with mucoid fluid, and 100% (4 of 4) of ears with purulent fluid. These estimates predict the real-world clinical performance on unseen data of our final algorithm, which is trained on all 98 ears from the iPhone 5s dataset (Fig. 2B). Across male patients, for the iPhone 5s, 15 of 17 positive ears and 31 of 40 negative ears were classified correctly. Across female patients, 7 of 9 positive ears and 25 of 30 negative ears were classified correctly. For two ears, we did not record gender.

Fig. 2 Classification of patient ears from clinical testing. (A) ROC curve for our middle ear fluid detection algorithm, cross-validated on data collected from patients using an iPhone 5s (n = 98), with the operating point denoted by the red circle. (B) Comparison of performance for smartphone-based detection, acoustic reflectometry, and spectral angle–only classification during parallel clinical testing (n = 98). (C and D) Mean acoustic dip classified by the algorithm as with middle ear fluid (red) and without middle ear fluid (blue). Shaded region represents one SD from the mean. (E) Feature analysis indicating the weight that the classifier places on each frequency around the acoustic dip.

Post hoc, we examined how the algorithm classified acoustic waveforms. Figure 2 (C and D) plots the mean sound intensities at each frequency for all ears classified by the model. The algorithm predicted that ears with narrower and deeper acoustic dips were more likely to have middle ear fluid. Consistent with this, on univariate analysis, the sound intensities at the top and bottom of the waveform, which determine the depth of an acoustic dip, were given the most weight by the predictive model (Fig. 2E). This result indicates that the algorithm can independently identify an acoustic pattern for middle ear fluid that is consistent with the known acoustic response of the eardrum (7, 12, 15).
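The LOOCV procedure described above can be sketched as follows. This is not the authors' code: a toy one-feature midpoint-threshold classifier stands in for the actual logistic regression over per-frequency intensities, so the sketch shows only the hold-one-out structure. Function names and the dip-depth data are hypothetical.

```python
# Sketch of leave-one-out cross-validation: each ear is held out once,
# a model is trained on the remaining ears, and the held-out ear is scored.

def train(features, labels):
    """Toy stand-in for model training: threshold halfway between class means."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def predict(threshold, feature):
    """Classify as fluid-positive (1) if the feature exceeds the threshold."""
    return 1 if feature >= threshold else 0

def loocv(features, labels):
    """Hold out each ear in turn, train on the rest, predict the held-out ear."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        predictions.append(predict(model, features[i]))
    return predictions

# Hypothetical data: dip depth (dB) as the single feature; 1 = fluid present.
depths = [24.0, 26.0, 25.0, 9.0, 11.0, 10.0]
labels = [1, 1, 1, 0, 0, 0]
preds = loocv(depths, labels)
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

Because each prediction comes from a model that never saw the held-out ear, the pooled predictions approximate performance on unseen data, which is why the text reports the cross-validated AUC as the headline figure.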
The smartphone-based system demonstrated improved clinical performance compared to acoustic reflectometry (18), which uses custom hardware to assess middle ear fluid status. Head-to-head testing (Fig. 2B) across the 98 patient ears demonstrated an AUC of 0.898 for the smartphone-based approach compared to an AUC of 0.776 for commercial acoustic reflectometry (EarCheck Middle Ear Monitor, Innovia Medical). The smartphone algorithm’s improved clinical performance may result from applying machine learning over the full waveform rather than relying on the hand-selected features used by acoustic reflectometers (7). When classifying patient waveforms obtained from smartphones using only the spectral angle, as described in previous literature (7), the AUC fell to 0.687.

We evaluated test-retest reliability in the pediatric patients enrolled in our clinical study. Each ear was tested twice per smartphone; between attempts, the funnel was fully removed from the ear and reinserted. Of the 66 ears tested twice, 94% were classified the same on both attempts. When a discrepancy occurred, we used the positive result to minimize false negatives. Each testing attempt consisted of 10 chirps, and we tested the consistency across these chirps. In the clinical study, 93 of 98 ears showed no difference in classification among all 10 chirps. When taking a majority vote across the first three chirps, 96 of 98 ears showed no difference in classification compared to using a single chirp. Across all 98 ears, there was no difference when the classification result was taken as the majority (more than 5) of the 10 chirps (table S2).

Last, using the algorithm trained in the first study, we evaluated the system’s performance in a separate cohort of patients under 18 months of age to assess accuracy in a younger population.
We again recruited surgical patients at Seattle Children’s Hospital, using the same criteria described in the first study with the exception of age. This cohort included 15 patient ears and had a median age of 1.1 (IQR, 0.3) years, height of 76.0 (IQR, 7.8) cm, weight of 9.3 (IQR, 2.1) kg, and a female-to-male ratio of 1.5 (Fig. 3A). The youngest patient in this cohort was 9 months old. All 5 ears that were positive for fluid and 9 of 10 ears that were negative for fluid were classified correctly (Fig. 3B). The shape of the acoustic dips paralleled those in the first clinical study: Ears with fluid had a deeper and narrower acoustic dip than ears without fluid (Fig. 3, C and D, and fig. S3). This shows that an algorithm trained on patients over 18 months of age can properly classify patients under 18 months. We also trained and tested the algorithm using the patients under 18 months as the training cohort. When running LOOCV across these 15 patient ears, 5 of 5 ears positive for fluid and 10 of 10 ears negative for fluid were correctly classified, similar to the performance of the algorithm trained on the 98 patient ears over 18 months of age.

Fig. 3 Classification of patient ears under 18 months. (A) Demographic table of patients under 18 months. (B) Confusion matrix of the algorithm’s performance for patients under 18 months. (C and D) Mean acoustic dip of ears of patients under 18 months (n = 15) classified by the algorithm as with middle ear fluid (red) and without fluid (blue). Shaded region represents one SD from the mean.
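The per-test decision rules reported in the test-retest analysis above (a majority vote across the 10 chirps of one attempt, and keeping the positive result when two attempts disagree, to minimize false negatives) are simple to state in code. A minimal sketch, with hypothetical per-chirp labels:

```python
# Sketch of the aggregation rules described in the test-retest analysis.

def majority_vote(chirp_predictions):
    """Classify a test attempt as positive if most of its chirps are positive."""
    return 1 if sum(chirp_predictions) > len(chirp_predictions) / 2 else 0

def combine_attempts(attempt_a, attempt_b):
    """If the two attempts disagree, keep the positive result."""
    return max(attempt_a, attempt_b)

chirps = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # hypothetical per-chirp classifications
print(majority_vote(chirps))              # 1
print(combine_attempts(0, 1))             # 1
```

Resolving discrepancies toward the positive class trades some specificity for sensitivity, which is the appropriate bias for a screening tool meant to prompt further evaluation rather than replace it.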

Performance across other mobile platforms

All patients in the first cohort (n = 98 ears) were tested in parallel with both the iPhone 5s and the Samsung Galaxy S6. Using LOOCV, we estimated the performance of the iPhone 5s–trained system on unseen Galaxy S6 data. Specifically, the entire iPhone 5s dataset was used for training except for one patient ear, which was “held out” for testing. The trained algorithm was then tested on the Galaxy S6 data from the held-out ear. This was repeated for all patient ears in the cohort to generate an AUC of 0.851, as shown in Fig. 4A. In the same manner, we also tested a subset of this cohort using an iPhone 6s (n = 10 ears), Samsung Galaxy S7 (n = 12), and Google Pixel (n = 8). The algorithm correctly classified 80% (8 of 10) of iPhone 6s data, 91.7% (11 of 12) of Galaxy S7 data, and 83.3% (7 of 8) of Pixel data (Fig. 4B). The low sample size in these subgroups precluded generation of meaningful AUC values. Processed waveforms for a given test ear across phone models are shown in fig. S4, and waveforms for the remaining test ears are shown in fig. S5.

Fig. 4 Classification performance across other mobile platforms. (A) ROC curve for our middle ear fluid detection algorithm, cross-validated on data collected from patients using a Samsung Galaxy S6 (n = 98). (B) Confusion matrices comparing performance on three other smartphones.
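The cross-device evaluation scheme can be sketched in the same hold-one-out style: train on iPhone 5s features from all ears but one, then score the held-out ear using its Galaxy S6 recording. A toy midpoint-threshold classifier stands in for the actual logistic regression, and the dip-depth data are hypothetical.

```python
# Sketch (assumed structure, not the authors' code) of cross-device LOOCV:
# training features come from one phone, the held-out test feature from another.

def fit_threshold(features, labels):
    """Toy stand-in for training: midpoint between the two class means."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def cross_device_loocv(train_device, test_device, labels):
    """Hold out each ear; train on the other device's data for the rest,
    then predict from this device's recording of the held-out ear."""
    preds = []
    for i in range(len(labels)):
        thr = fit_threshold(train_device[:i] + train_device[i + 1:],
                            labels[:i] + labels[i + 1:])
        preds.append(1 if test_device[i] >= thr else 0)
    return preds

# Hypothetical dip depths (dB) recorded in parallel on the two phones.
iphone = [25.0, 27.0, 24.0, 10.0, 9.0, 12.0]
galaxy = [23.0, 26.0, 25.0, 11.0, 8.0, 13.0]
labels = [1, 1, 1, 0, 0, 0]
preds = cross_device_loocv(iphone, galaxy, labels)
```

Agreement between the two columns of hypothetical measurements is what makes this transfer work; the reported AUC of 0.851 suggests the real acoustic features are similarly stable across phone hardware.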

Performance testing with nonclinicians

In a clinical setting, we evaluated the system’s performance when used by parents. Trained clinicians briefly demonstrated proper testing technique, and the parent of a pediatric study participant subsequently performed unaided testing on their child. The parent’s results were then compared to those of the trained clinician. This cohort included 25 patient ears and had a median age of 4.0 (IQR, 6.0) years, height of 105.0 (IQR, 38.1) cm, weight of 16.4 (IQR, 13.9) kg, and a female-to-male ratio of 1.1 (Fig. 5A). All 6 ears positive for fluid were classified the same by clinicians and parents, and 18 of 19 ears negative for fluid were classified the same (Fig. 5B). In addition, the mean acoustic dip was similar between clinicians and parents (Fig. 5, C and D). Individual curves for each patient are shown in fig. S6.

Fig. 5 Performance testing with trained clinicians versus untrained parents. (A) Demographic table of patients that were tested by parents. (B) Confusion matrix of the algorithm’s performance for patient ears (n = 25) tested by parents. (C and D) Mean acoustic dip of ears tested by parents (black) and clinicians, classified by the algorithm as with middle ear fluid (red) and without fluid (blue).

We tested the usability of funnel construction with a separate cohort of 10 untrained adults. After playing a short instructional video (see movie S2), we first measured the time it took participants to create and mount the funnel using a paper template, tape, and scissors. The average time was 2.8 (±0.93) min. We then queried participants about the usability of the entire system; they gave an average usability rating of 8.9 (±1.1) on a scale of 1 (unusable) to 10 (extremely usable) (table S3).

Effect of confounding ear pathologies

In the above studies, we excluded patients with ear pathologies that affect eardrum mobility. Next, we evaluated the algorithm’s performance in the presence of ear pathologies such as cholesteatoma (n = 1), ossicular chain discontinuity (n = 1), acute eardrum inflammation (n = 1), and previous tympanoplasty surgery (n = 3) (fig. S7). The algorithm produced false positives for middle ear fluid in all of these patients. Similarly, patients undergoing myringotomy but lacking middle ear fluid may have abnormal middle ear pressure that can affect eardrum mobility. In our cohort, seven patients were reported by the surgeon as having acutely inflamed eardrums. Only one of these patients presented without fluid on myringotomy; this patient’s ear was classified as positive by the algorithm (fig. S7). Thus, in the event that a patient presents with an inflamed eardrum but has not yet developed middle ear fluid, the algorithm would likely test positive and appropriately prompt further evaluation. The other acutely inflamed eardrums (n = 6 ears) had middle ear fluid and were appropriately classified as positive by the algorithm.