Significance Currently available smell testing methods can be confounded by the lack of prior experience or insensitivity to the odorants used in the test. This introduces a source of bias into clinical tests aimed at detecting patients with olfactory dysfunction. We have developed smell tests that use mixtures of 30 molecules that average out the variability in sensitivity to individual molecules. Because these mixtures have an unfamiliar smell and the tests are nonsemantic, their use eliminates differences in test performance due to familiarity with the smells or the words used to describe them. SMELL-S and SMELL-R facilitate smell testing in different populations, without the need to adapt test stimuli to account for differences in familiarity with the test odors.

Abstract Smell dysfunction is a common and underdiagnosed medical condition that can have serious consequences. It is also an early biomarker of neurodegenerative diseases, including Alzheimer’s disease, where olfactory deficits precede detectable memory loss. Clinical tests that evaluate the sense of smell face two major challenges. First, human sensitivity to individual odorants varies significantly, so test results may be unreliable in people with low sensitivity to a test odorant but an otherwise normal sense of smell. Second, prior familiarity with odor stimuli can bias smell test performance. We have developed nonsemantic tests for olfactory sensitivity (SMELL-S) and olfactory resolution (SMELL-R) that use mixtures of odorants that have unfamiliar smells. The tests can be self-administered by healthy individuals with minimal training and show high test–retest reliability. Because SMELL-S uses odor mixtures rather than a single molecule, odor-specific insensitivity is averaged out, and the test accurately distinguished people with normal and dysfunctional smell. SMELL-R is a discrimination test in which the difference between two stimulus mixtures can be altered stepwise. This is an advance over current discrimination tests, which ask subjects to discriminate monomolecular odorants whose difference in odor cannot be quantified. SMELL-R showed significantly less bias in scores between North American and Taiwanese subjects than conventional semantically based smell tests that need to be adapted to different languages and cultures. Based on these proof-of-principle results in healthy individuals, we predict that SMELL-S and SMELL-R will be broadly effective in diagnosing smell dysfunction.

Smell dysfunction manifests itself primarily in the reduced ability to detect or distinguish volatile chemicals. It ranges from the complete inability to smell any odors, to a partial reduction in olfactory sensitivity, to smell distortion, for instance, a condition in which a large number of volatiles smell like cigarette smoke. The prevalence of smell dysfunction in the general adult population is about 20% in Europe and the United States (1–3). This condition is potentially dangerous because those affected are unable to detect fire, spoiled food, hazardous chemicals, and leaks of odorized natural gas (4, 5). Smell loss may give rise to health problems, including mental health symptoms such as depression, anxiety, and social isolation. It affects quality of life by altering food preferences and the amount of food ingested (5). Food is often perceived as bland or tasteless by patients with smell disorders, leading to loss of appetite or overeating (4, 5).

The most frequent causes of smell dysfunction are sinonasal diseases, upper respiratory tract infection, and head trauma. Smell loss can be congenital (6, 7), and in many cases, the cause is unknown (5, 8). Importantly, smell dysfunction is an early sign of Alzheimer’s disease (9), the most common cause of dementia in the United States that is projected to affect an estimated 1 in every 45 individuals by 2050 (10). It is well established that diminished olfactory function arises early in the progression of Alzheimer’s disease and is highly predictive of future cognitive decline (9, 11). Because of the high prevalence and dramatic consequences of smell loss, accurate diagnosis of olfactory dysfunction is important. While self-reported hearing loss tends to be accurate (12), self-reporting of olfactory dysfunction is notoriously unreliable (13, 14). Therefore, accurate diagnostic tests for smell dysfunction that can be deployed worldwide are critically important. Following a diagnosis, therapeutic options and counseling can be offered to patients suffering from smell loss (15).

In clinical smell testing, patients are presented with odor stimuli in a variety of formats, including scratch ‘n’ sniff strips, glass vials or jars, felt-tip pens, or paper scent strips used in perfume shops, and asked to answer questions about what they smell. Smell tests assess the ability of subjects to detect, discriminate, or identify odors. Olfactory threshold tests measure the lowest concentration of an odor stimulus that a patient can perceive, while discrimination tests assess the ability of subjects to distinguish two different smells. Finally, odor identification tests evaluate whether a patient can detect and match odors to standard words that describe the smell (16).

There are two major challenges to reliably testing a patient’s sense of smell. First, sensitivity to monomolecular odorants varies greatly even among subjects with a normal sense of smell (17–19). If a patient has a low score on a test that assesses olfactory sensitivity with the rose-like odor phenylethyl alcohol (20), it is difficult to know whether the patient suffers from general smell dysfunction or is merely insensitive to phenylethyl alcohol with an otherwise normal sense of smell.

The second challenge is to develop a test that is not influenced by the patient’s personal or cultural experiences with the test odorants. This has an obvious influence on the results of odor identification tests, such as the University of Pennsylvania Smell Identification Test (UPSIT), for which subjects are given a booklet with 40 scratch ‘n’ sniff items and asked to select one of four words (for example, “gingerbread,” “menthol,” “apple,” or “cheddar cheese”) that best describes what the odor smells like. Whether a patient can correctly identify the smell of gingerbread depends not only on the patient’s sense of smell but also on whether the patient has previously encountered the smell of gingerbread. This in turn depends on many factors, such as the cultural and age group to which the person belongs. To address this familiarity problem, the UPSIT has been adapted for use in a number of countries worldwide by replacing unfamiliar items and adapting the answers on the multiple-choice test. For instance, the North American UPSIT was adapted for Taiwanese subjects by replacing “clove,” “cheddar cheese,” “cinnamon,” “gingerbread,” “dill pickle,” “lime,” “wintergreen,” and “grass” with “sandalwood,” “fish,” “coffee,” “rubber tire,” “jasmine,” “grapefruit,” “magnolia,” and “baby powder” (21). The strong influence of culture on such test results limits the utility of odor identification tests. Even performance on nonsemantic odor discrimination tasks depends on prior experience with the odorants (22, 23), and it is therefore important to avoid stimuli having differential familiarity in the test population.

We have developed two nonsemantic smell tests that meet both challenges by using mixtures of odorous molecules that subjects perceive as unfamiliar. We call the odor of these mixtures unfamiliar because subjects cannot readily describe what they are smelling. SMELL-S measures olfactory sensitivity, the ability to detect increasing dilutions of a mixture of odorants. SMELL-R is an olfactory resolution test that measures the ability of subjects to discriminate the smell of two mixtures that become progressively more similar as the test proceeds. Neither of the tests requires that subjects match words with a smell percept. We show that SMELL-S and SMELL-R are highly reliable olfactory tests that overcome problems with odor-specific insensitivity and that can be applied without adaptation to subjects in a different country. We expect that these tests, applied in combination, will provide the sensitivity and specificity required for early diagnosis of smell dysfunction in different populations.

Discussion In this study, we addressed current limitations in clinical testing for olfactory dysfunction by developing effective smell tests that overcome issues with odor-selective insensitivity and that can be utilized with different populations across the world. The first objective of this work was to eliminate the problems inherent in olfactory sensitivity tests that rely on a single molecule. Although it is well known that specific insensitivity to individual odorant molecules is common in normal human subjects (17–19), commercial threshold tests use monomolecular stimuli such as butanol or phenylethyl alcohol to test olfactory sensitivity (26, 32). Our data suggest that this approach confounds specific and general olfactory sensitivity. We show here that the solution to this problem is to use mixtures of molecules instead of single molecules. We and other authors have shown that the inter- and intraindividual variability in threshold scores was reduced, and test–retest reliability was increased by testing olfactory sensitivity with odor mixtures rather than single molecules (33, 34). One previous study compared thresholds for single molecules to those for mixtures of 3, 6, or 12 components and concluded that the intra- and interindividual variability of the threshold decreases with increasing number of molecules in the mixture (33). A recent study came to a similar conclusion, comparing the threshold for phenylethyl alcohol to the threshold for a mixture of three molecules (34). Although the rate of specific anosmia to phenylethyl alcohol is low, interindividual variability in sensitivity to this molecule is large (20, 30). It follows that diminished sensitivity could lead to false positive results, and therefore misdiagnosis. The SD we found for phenylethyl alcohol in 75 healthy subjects was 2.75, which is consistent with previous studies that reported SDs of 2.88 (30) and 2.78 (20).
SMELL-S had much lower variability, with an SD of 1.6 for SMELL-S v1 and 1.7 for SMELL-S v2. We conclude that SMELL-S is a reliable, accurate, and effective method for measuring olfactory function without conflating general loss of smell sensitivity and specific insensitivity to an odorant. A second objective of this project was to introduce a test of olfactory resolution (SMELL-R) that quantifies olfactory discrimination ability. Auditory and visual stimuli used in the clinic differ by tone frequency or letter size, leading to quantitative and standardized diagnostic tests such as the audiogram and the eye chart. In olfaction, it is more complicated to quantify similarity between olfactory stimuli. Currently available discrimination tests consist of several pairs of odorants that must be discriminated by the patient. There currently is no method to quantify how difficult the individual discrimination tests are. Is distinguishing “rose” from “leather” more or less difficult than discriminating “pineapple” and “licorice”? To overcome this problem, we used a physical scale based on the number of shared components between two mixtures. The more components two mixtures share, the more difficult it is to discriminate them (25). By using this physical scale, a patient’s olfactory resolution can be reliably determined. A third objective was to develop smell tests that utilize stimuli that have not been previously encountered by patients to minimize the influence of cultural and personal differences in prior olfactory experiences on the test results (22, 23). We accomplished this by using mixtures of 30 different molecules. These exact mixtures are very unlikely to be encountered outside the laboratory and are perceived as unfamiliar smells. Furthermore, before mixing them, the chemicals were diluted so that they had approximately equal intensity to ensure that the percept of the mixture is not dominated by a single odorant. 
The resulting smells of such mixtures have been described as “olfactory whites” (24). Using these stimuli is an improvement over the use of odorants that can be readily linked to their usual source but only by those who have prior experience with it (21, 35, 36). The proof-of-principle results with SMELL-S and SMELL-R presented here suggest that these tests will be useful in diagnosing smell dysfunction, but it is important to note that our sample sizes were comparatively small. Future studies with larger groups of patients with known olfactory dysfunction will be necessary to fully validate the tests. It will also be necessary to formulate the tests in a compact delivery system that automatically delivers stimuli and records subject responses. Modern advances in digital technology for odor delivery and data capture will enable this goal. Moving from these initial studies to a standardized clinical test will need to take into account the optimal solvents to assure odor stability (37) and the effect that the delivery system has on test performance (38). Finally, although the prototype test discussed here was self-administered by healthy volunteers with minimal training in about 30 min, we recognize that the use of SMELL-S and SMELL-R in geriatric patients, especially those suffering from neurodegenerative disease, will require further adaptation. Developing a universal olfactory test to reliably diagnose smell dysfunction is of great clinical importance not only because of the negative effects of smell dysfunction on quality of life but because olfactory dysfunction is frequent, can be clinically managed, and may be an effective biomarker for predicting Alzheimer’s disease, Parkinson’s disease, and other neurodegenerative diseases (9, 39).

Materials and Methods General, Subjects. All behavioral testing with human subjects took place between March 2015 and December 2016 and was approved and monitored by the Institutional Review Board of The Rockefeller University in New York, except the Taiwanese arm of experiment 3, which was approved by the Institutional Review Board of Taichung Veterans General Hospital in Taichung, Taiwan. North American subjects were recruited by The Rockefeller University Clinical Research Recruitment and Outreach Support Service (40). Taiwanese subjects were recruited by the nursing staff of the Department of Otorhinolaryngology at the Taichung Veterans General Hospital (Taiwan). All subjects gave their written informed consent to participate in these experiments and were compensated for their time. All North American and Taiwanese subjects were able to understand and follow instructions in English or Mandarin, respectively. Subjects were aged 18 or over and agreed to refrain from using perfume or cologne and ingesting anything except water 1 h before the study visit. At the beginning of each visit, subjects washed their hands with fragrance-free soap. For subjects reporting a normal sense of smell and taste, we excluded subjects who presented with current or past history of conditions that might be related to smell loss (acute or chronic rhinosinusitis, nasal tumor, upper respiratory tract infection or head trauma that altered the sense of smell for more than 1 mo, history of brain or sinonasal surgery, asthma, stroke, neurodegenerative disease, radiation therapy or chemotherapy, active smoking, or consumption of medication affecting the sense of smell during the study). Participants with self-reported smell dysfunction were not subject to these exclusion criteria. All raw data in the paper, including details about the demographics of the subjects, odorants, and composition of the test stimuli are in Dataset S1. General, Tests. 
To allow for self-administration and automatic data collection, we designed a custom computer application that was used for the phenylethyl alcohol and butanol threshold tests and also the SMELL-S and SMELL-R tests. The testing station comprised a computer, wireless mouse, barcode scanner, and trays with numbered stimulus containers labeled with bar codes. Triangle tests were set up so that subjects were never tested with the same set of stimuli twice in a row, to avoid the situation where subjects remembered their answers from the previous trial. Subjects used a barcode scanner to register test data automatically. Subjects took between 20 and 35 min to complete each smell test, with the exception of the UPSIT, which took 10–15 min. A standard intertrial interval was imposed to avoid odor adaptation by requiring subjects to play a computer game for 20 s. SMELL-S and SMELL-R were created with four different mixtures of 30 molecules drawn from a panel of 109 monomolecular, intensity-matched chemicals. These odorants were selected from stimuli utilized in previous psychophysical studies (24, 41). We used only molecules that minimally activated the trigeminal system, because such stimuli can be detected by anosmic subjects (42, 43). A characteristic of trigeminal activation by a molecule is a fresh, cold, burning, eucalyptus, pungent, or tickling sensation. We used a lateralization task in which an odorant is applied into only one nostril to assign a lateralization score to each molecule. It is possible to localize the stimulated nostril if it activates the trigeminal system. In contrast, it is much harder to localize an olfactory stimulus (44). Lateralization tasks were self-administered by one investigator. Two disposable squeeze bottles were placed in a device facilitating simultaneous squeezing and stimulus delivery in each nostril. Only one bottle was filled with an odor stimulus. 
The tip of each bottle was fitted with a foam piece that conformed to the investigator’s nostril and was placed at the entrance of each nostril. The investigator squeezed both bottles simultaneously and attempted to localize which nostril had received the stimulus. After each task, the device was spun on a rotating platform to randomize the odor-stimulus side. The final score corresponded to the number of correct tasks. There were a total of 20 tasks (45). As a control experiment, we found that the lateralization score of the trigeminal stimulus eucalyptol [PubChem compound identification (CID): 2758] at pure concentration was high (median, 20; interquartile range, 19.25–20; four trials). The lateralization score of the olfactory stimulus vanillin (CID: 1183) at pure concentration was low (median, 6.5; interquartile range, 5–12.5; six trials). The difference between the lateralization scores of eucalyptol and vanillin was statistically significant (P = 0.0009, Mann–Whitney test). Each candidate for the mixtures was tested once. We included candidates with a score of 11 and below in the design of the mixtures (Dataset S1). To intensity-match molecules to be used in mixtures, odorants were diluted and three investigators individually classified them as “too weak,” “well matched,” or “too strong.” The concentration of too weak stimuli was increased and that of too strong stimuli decreased by a factor of 10. Weak components that could not be intensity-matched even at pure concentrations were excluded from the pool of odorants. We repeated this process until most of the components fell into the optimal intensity range. For 18 components investigators could not reach a consensus about intensity, but these were nevertheless used in the mixtures (CID: 1068, 7969, 31244, 9589, 17898, 104721, 3314, 14491, 62144, 7583, 7983, 60999, 251531, 7799, 61151, 9609, 8118, and 89440). With these components, we created four mixtures of 30 components. 
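To give a sense of what the lateralization cutoff implies, the following minimal sketch computes how often pure guessing would exceed the inclusion cutoff of 11 correct answers out of 20 tasks. It assumes that each task is an independent 50/50 guess for a purely olfactory stimulus. That binomial model and the function name `prob_score_above` are illustrative assumptions, not part of the screening procedure itself.

```python
from math import comb

N_TASKS = 20   # lateralization tasks per molecule (from the text)
CUTOFF = 11    # candidates scoring 11 or below were included (from the text)


def prob_score_above(cutoff: int, n_tasks: int = N_TASKS, p: float = 0.5) -> float:
    """P(score > cutoff) if every task is an independent guess with success rate p.

    The independence and p = 0.5 assumptions are illustrative only.
    """
    return sum(
        comb(n_tasks, k) * p**k * (1 - p) ** (n_tasks - k)
        for k in range(cutoff + 1, n_tasks + 1)
    )


p_excluded_by_chance = prob_score_above(CUTOFF)
print(f"P(score > {CUTOFF} by guessing) = {p_excluded_by_chance:.3f}")
```

Under this simplified model, roughly a quarter of purely olfactory candidates would score above 11 on a single run by chance alone, so a single-run cutoff of 11 errs on the side of excluding candidates.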
The SMELL-S (v1) mixture was used as the ODD odor in SMELL-R (v1), and the SMELL-S (v2) mixture was used as the CONTROL odor in SMELL-R (v2). The mixtures for SMELL-R (v1) CONTROL odor and SMELL-R (v2) ODD odor were unique to these tests. Details of all mixtures are in Dataset S1. Stimuli for the threshold tests and SMELL-S were presented to subjects with amber glass vials (height, 95 mm; diameter, 28 mm). Stimuli for SMELL-R were presented to subjects with amber glass jars (height, 51 mm; diameter, 55 mm). The complete list of stimuli used in this study is in Dataset S1. Threshold Tests: Phenylethyl Alcohol and Butanol. Threshold tests were administered as a series of triangle tests. Subjects were presented with three vials: two contained 1 mL solvent (paraffin oil) and one contained either phenylethyl alcohol or butanol diluted in solvent in a total volume of 1 mL. Tests comprised 16 different concentrations generated by serial dilutions (1:2) of either odorant in paraffin oil, with the starting concentrations at 0.0313% for phenylethyl alcohol and 0.25% for butanol. The subject was prompted to sniff each vial and select the one with the strongest perceived odor using an adaptive staircase procedure commonly used in smell testing (26). If they were unable to detect any difference among the three vials, they were prompted to choose one at random. The procedure started at the lowest concentration. If they identified an incorrect vial, the second next higher concentration was presented and so on, until they identified the correct vial. If the subjects identified the correct vial, they were retested at the same concentration. If they identified the correct vial in this retest, they were tested at the next lower concentration. If they identified an incorrect vial, they were tested at the next higher concentration. A reversal is when the direction in which the concentration is changed reverses.
The procedure ended after the seventh reversal, or after the subject failed the level with the highest concentration twice in a row, or succeeded with the lowest concentration level five times in a row. The threshold was defined as the average of the concentrations at which the last two reversals occurred. If the highest concentration was not correctly identified twice in a row, the score was 1. If the lowest was identified five times in a row, the score was 16. SMELL-S Olfactory Sensitivity Test (v1 and v2). For SMELL-S (v1) and SMELL-S (v2), we prepared 19 serial dilutions in paraffin oil (1:2) of two different mixtures of 30 monomolecular odorants and used the last 16 dilutions, such that the tests ranged from easiest (level 1, 1:8 dilution) to most difficult (level 16, 1:262,144 dilution). Subjects were asked to sniff three vials, one of which was filled with 1 mL of a mixture of 30 components and the other two were filled with 1 mL of solvent (paraffin oil). Subjects were instructed to pick out the one vial with the strongest perceived odor. If they were unable to detect any difference among the three vials, they were prompted to choose one at random. The procedure started at the lowest concentration (level 16). We calculated the SMELL-S sensitivity score following the same adaptive staircase procedure described above. For each subject, we measured the olfactory sensitivity with two versions of the test, SMELL-S (v1) and SMELL-S (v2), which differed only by the chemical composition of the mixtures. SMELL-R Olfactory Resolution Test (v1 and v2). For SMELL-R (v1) and SMELL-R (v2), we prepared 16 pairs of mixtures of 30 monomolecular odorants that differed in how many components the two mixtures in the pair share from 0% (easiest; level 1) to 96.7% (most difficult; level 16).
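The SMELL-R difficulty scale can be written down explicitly. The minimal sketch below assumes, as stated in this section, that each level adds two shared components between the ODD and CONTROL mixtures and that the final step from level 15 to level 16 adds only one. The function names are illustrative and do not come from the test software.

```python
N_COMPONENTS = 30  # molecules per mixture (from the text)
N_LEVELS = 16      # difficulty levels in SMELL-R (from the text)


def shared_components(level: int) -> int:
    """Number of molecules the ODD and CONTROL mixtures share at a given level."""
    if not 1 <= level <= N_LEVELS:
        raise ValueError("level must be between 1 and 16")
    # Two more shared molecules per level; the last step (15 -> 16) adds only one.
    return 2 * (level - 1) if level < N_LEVELS else N_COMPONENTS - 1


def overlap_percent(level: int) -> float:
    """Shared fraction of the 30 components, in percent."""
    return 100 * shared_components(level) / N_COMPONENTS


# Print level, shared molecules, and percent overlap for a few levels.
for level in (1, 8, 15, 16):
    print(level, shared_components(level), round(overlap_percent(level), 1))
```

With this rule, level 1 shares no components and level 16 shares 29 of 30, which reproduces the 0% and 96.7% endpoints quoted above.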
To create 16 levels of increasing overlapping components, we progressively replaced components of a mixture of 30 molecules (we termed this the ODD odor) with components from another mixture of 30 components that did not change in composition across the levels (we termed this the CONTROL odor). Increasing the level of difficulty by one point corresponds to an addition of two overlapping molecules between both mixtures, except from level 15 to 16, where we added only one shared molecule. Stimuli (8 mL) were introduced into jars containing absorbent cotton pads. Subjects were asked to sniff the contents of three jars, one of which was filled with 8 mL of a mixture of 30 components and the other two were filled with 8 mL of a mixture of 30 components with different degrees of overlap with the first jar. Subjects were instructed to pick out the odd jar. If they were unable to detect any difference among the three jars, they were prompted to choose one at random. Triangle tests started at a medium difficulty (level 8). If they identified the incorrect jar, the next easier level was presented. We calculated the SMELL-R resolution score following the same adaptive staircase procedure described above. For each subject, we measured the olfactory resolution with two versions of the test, SMELL-R (v1) and SMELL-R (v2), which differed only in the chemical constituents of the two sets of mixtures. Sniffin’ Sticks Phenylethyl Alcohol Threshold Test. The Sniffin’ Sticks (26) phenylethyl alcohol threshold test is a commercial product that uses felt-tip pens filled with odorant instead of ink for odor presentation. In this study, we used the threshold module (2-phenyl ethanol) of the extended Burghart Sniffin’ Sticks test (item LA-13-00015; Burghart Messtechnik). The test comprises pens containing 16 serial dilutions of phenylethyl alcohol (1:2) in solvent (propylene glycol) with a starting concentration of 4%. The test was administered as a triangle test.
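The adaptive staircase shared by the threshold tests, SMELL-S, and SMELL-R can be sketched in code. This is one possible reading of the rules described above, not the study's own software. It assumes that a reversal is recorded at the level where the answer changed the direction of travel, that levels are clamped to the range 1 to 16, and that the step size drops to one level after the first correct answer. The `respond` callback, the parameter names, and the `max_trials` safety limit are hypothetical.

```python
from typing import Callable, List


def run_staircase(
    respond: Callable[[int], bool],  # hypothetical: True if the subject picks the right item
    n_levels: int = 16,              # level 1 = easiest, level 16 = hardest
    start_level: int = 16,           # SMELL-R would use start_level=8, initial_step=1
    initial_step: int = 2,           # step toward easier levels before the first correct answer
    max_reversals: int = 7,
    n_avg: int = 2,                  # number of final reversals averaged into the score
    max_trials: int = 200,           # safety limit, not part of the described procedure
) -> float:
    level, step = start_level, initial_step
    last_move = 0        # +1 = moved harder, -1 = moved easier, 0 = no move yet
    streak = 0           # consecutive correct answers at the current level
    fails_easiest = 0    # consecutive misses at level 1
    hits_hardest = 0     # consecutive hits at the hardest level
    reversals: List[int] = []

    for _ in range(max_trials):
        if len(reversals) == max_reversals:
            return sum(reversals[-n_avg:]) / n_avg
        if respond(level):
            step, fails_easiest = 1, 0
            if level == n_levels:
                hits_hardest += 1
                if hits_hardest == 5:
                    return float(n_levels)   # ceiling score
            streak += 1
            if streak < 2:
                continue                     # retest at the same level
            streak, move = 0, +1
        else:
            streak, hits_hardest = 0, 0
            if level == 1:
                fails_easiest += 1
                if fails_easiest == 2:
                    return 1.0               # floor score
            move = -1
        if last_move and move != last_move:
            reversals.append(level)          # direction changed at this level
        last_move = move
        level = min(n_levels, max(1, level + move * step))
    raise RuntimeError("staircase did not terminate within max_trials")
```

For example, an idealized subject who always succeeds at levels 1 to 8 and always fails above that, `run_staircase(lambda level: level <= 8)`, alternates between levels 8 and 9 and receives a score of 8.5.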
Three pens were presented to the subjects by the investigator in a randomized order. Two pens contained the solvent only, and the third pen contained the diluted odorant. Subjects were blindfolded with a disposable mask because the color code of the Sniffin’ Sticks reveals which pen contains the odor, and subjects were asked to identify the pen with the strongest perceived odor. The procedure started at the lowest or second lowest concentration of odorant (level 16 or 15, respectively). We calculated the threshold score following the same adaptive staircase procedure described above except that the threshold was defined as the average of the last four reversals. UPSIT. The UPSIT (marketed as the Smell Identification Test by Sensonics International) is a well-validated and self-administered smell identification test widely used in the United States (46). The test consists of four different 10-page booklets, with a total of 40 stimuli. On each page, there is a different “scratch and sniff” strip that is coated with a microencapsulated odorant and four words to choose from to describe the smell. Subjects used the tip of a pencil to release the smell of the stimuli. Subjects sniffed the odorant and selected the one word among the four options (for example, “paint thinner,” “cherry,” “coconut,” or “cheddar cheese”) that most closely matched their perception of the smell. Subjects entered their answers to the 40 multiple-choice questions manually into a booklet, and investigators transferred the data manually into a spreadsheet. UPSIT performance was scored as the number of correct answers out of 40. We used the same North American UPSIT (46) on subjects at Rockefeller University and Taichung Veterans General Hospital. The Taiwanese subjects were given a reference sheet on which the English multiple-choice questions in the UPSIT booklets were translated into Chinese by R.-S.J. (21) (Fig. 5B). Experiment 1, Design. 
In this protocol (Rockefeller University IRB Protocol JHS-0862), we studied the test–retest reliability of SMELL-S and SMELL-R. We invited volunteers with self-reported normal sense of smell and taste to the Rockefeller University Hospital for six visits (Fig. 2A). During these six visits, six olfactory tests were performed, each of them once during a test session (visit 1–3) and then again during a retest session (visit 4–6). There was a gap of at least 1 wk between the last test visit (visit 3) and the first retest visit (visit 4) and a gap of at least 24 h between each of the other visits. At each visit, two of the six tests were performed. Although the order of the tests was randomized, in any visit where SMELL-R tests were administered, they were always administered after the SMELL-S or the threshold tests. This experiment was done between March and June 2015. Experiment 1, Subjects. Seventy-five subjects (43 female) participated in this experiment, with a mean age of 44 (range, 21–74). Thirty-four subjects self-identified as White, 26 as Black, 6 as Asian, 2 as mixed race, and 7 as Other. Eleven subjects self-identified as Hispanic. It took an average of 21 d (range, 14–38 d) for subjects to complete all six visits in this experiment. Experiment 1, Statistical Analysis. The ICC was used to measure absolute agreement between test and retest measures for the whole cohort. A sample of n = 75 subjects provided 95% confidence that the ICC in the population was larger than 0.67 based on a sample distribution that is centered on 0.8 (47). Bland–Altman plots were used as an auxiliary tool if significant differences in interindividual variability were found between compared tests (27) (Fig. 2B). We used the nonparametric Conover squared ranks test to assess equality of variance across threshold tests. Statistical significance was reached when P < 0.05 (Fig. 3A). Experiment 2, Design. 
This experiment was carried out under Rockefeller University IRB Protocol JHS-0922 and was designed to evaluate the accuracy of our tests and whether SMELL-S can distinguish between subjects with specific anosmia to phenylethyl alcohol but an otherwise normal sense of smell and subjects with smell dysfunction. During a single visit in December 2016, subjects performed four smell tests. The first two tests were either SMELL-S (v2) or the Sniffin’ Sticks phenylethyl alcohol threshold test. The order of these first two tests was randomized. It was followed by SMELL-R (v2) and finally the UPSIT, as a validated commercial reference test. The investigators enforced a break of at least 3 min between tests. During some of the breaks, participants filled out a questionnaire to provide demographic information and answer questions about their sense of taste and smell (Dataset S1). In seven cases in the UPSIT tests in experiment 2, subjects did not provide an answer to a given item, and this was scored as an incorrect answer. The missing data correspond to three subjects who missed one item each and two subjects who missed two items each. Experiment 2, Subjects. This experiment included 33 subjects (22 female), with a mean age of 48 (range, 21–76). Seventeen subjects self-identified as White, eight as Black, three as Asian, two as mixed race, one as other. Two subjects opted out of self-reporting race. Four subjects self-identified as Hispanic. We re-enrolled 23 subjects from experiment 1 who self-reported a normal sense of smell and taste. These 23 were selected based on their threshold test results to have approximately even representation of subjects with low, medium, and high sensitivity to phenylethyl alcohol. In addition, we recruited 10 subjects with self-reported smell dysfunction. The self-reported etiologies are reported in Dataset S1. Experiment 2, Statistical Analysis. 
We performed a power analysis and determined that a study with 32 subjects (8 with smell loss and 24 with a normal sense of smell) provides 80% power at 5% significance to detect an area under the ROC curve greater than 0.78. Since our actual study included 33 subjects, we carried out a post hoc power analysis using the parameters above to show that we can detect an area under the ROC curve greater than 0.79. We employed Youden’s Index (31) to find the best cutoff score for SMELL-S and SMELL-R to maximize correct classification of the olfactory sensitivity and resolution of a subject, respectively (Fig. 4 C and F). We used a two-sided unpaired t test with Welch’s correction to test for differences in SMELL-S and SMELL-R scores between the normal and dysfunctional groups (Fig. 4 A and D). Experiment 3, Design. In this experiment, we investigated how SMELL-R performs on different populations by comparing Taiwanese (Taichung Veterans General Hospital IRB Protocol TCVGH CE16119B) and North American (Rockefeller University IRB Protocol JHS-0901) subjects. The North American subjects were tested at The Rockefeller University Hospital, and the Taiwanese subjects were tested in the Department of Otolaryngology at Taichung Veterans General Hospital. The experimental design was the same in both institutions. Each subject came to the test site for a single visit, during which subjects performed the SMELL-R (v2) and UPSIT, separated by a 10-min break, in randomized order (Fig. 5A). Experiment 3, Subjects. Thirty-six subjects were recruited at each site. All subjects were born and raised in their respective country, had never traveled to the other country, and had a self-reported normal sense of smell and taste. In the North American group, the mean age was 25 (range, 19–30), 23 of 36 subjects were female, and 8 self-identified as White, 14 as Black, 4 as Asian, 9 as mixed race, and 1 as American Indian or Alaska native. Six self-identified as Hispanic.
In the Taiwanese group, the mean age was 26 (range, 19–30), and 26 of 36 subjects were female. Although we recruited subjects with a self-reported normal sense of smell, two of the North American subjects had UPSIT and SMELL-R (v2) scores below the cutoff for olfactory dysfunction (Fig. 5 C and D). Experiment 3, Statistical Analysis. We used the unpaired t test with Welch’s correction to test for differences in smell test performance between North American and Taiwanese subjects (Fig. 5 C and D). Statistical Analysis. Normality of the data was tested throughout using the Kolmogorov–Smirnov test, and the appropriate statistics were used according to the distribution of the data. SPSS (IBM) and Prism (GraphPad) were used for all statistical analyses.
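The cutoff selection used in experiment 2 can be illustrated with a small sketch of Youden’s index, J = sensitivity + specificity - 1, maximized over candidate cutoffs. The analyses in this study were run in SPSS and Prism, so this is only an illustration of the principle. The scores below are made-up placeholder values, not data from this study, and the convention that a score at or below the cutoff counts as dysfunctional is an assumption.

```python
from typing import Sequence, Tuple


def youden_cutoff(
    normal: Sequence[float], dysfunctional: Sequence[float]
) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Assumed convention: a subject is classified as dysfunctional
    when score <= cutoff.
    """
    best_cutoff, best_j = float("nan"), -1.0
    for cutoff in sorted(set(normal) | set(dysfunctional)):
        sensitivity = sum(s <= cutoff for s in dysfunctional) / len(dysfunctional)
        specificity = sum(s > cutoff for s in normal) / len(normal)
        j = sensitivity + specificity - 1
        if j > best_j:  # keep the first cutoff that reaches the maximum
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Placeholder scores for illustration only (NOT data from the study).
normal_scores = [9.5, 10.0, 11.5, 12.0, 8.0, 13.0]
dysfunctional_scores = [3.0, 5.5, 6.0, 8.5]

cutoff, j = youden_cutoff(normal_scores, dysfunctional_scores)
print(cutoff, round(j, 3))
```

Choosing the cutoff by maximizing J weights sensitivity and specificity equally; a clinical deployment might reasonably weight them differently.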

Acknowledgments We thank our research volunteers for their time and interest in the study and the staff of The Rockefeller University Hospital Outpatient Clinic for invaluable support. Chris Vancil provided custom programming for the Rockefeller University Smell Study smell test computer interface, and Joel M. Correa da Rosa and Caroline Jiang provided expert biostatistical guidance. Yuanbo Wang provided a script to compute test scores in experiment 1. We thank Barry Coller, Ashutosh Kacker, Kevin Lee, and members of the L.B.V. laboratory for discussion and comments on the manuscript. This work was funded by the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), and Clinical and Translational Science Award (CTSA) Program UL1 TR000043. L.B.V. is an investigator of the Howard Hughes Medical Institute.

Footnotes This contribution is part of the special series of Inaugural Articles by members of the National Academy of Sciences elected in 2015.

Author contributions: J.W.H., A.K., R.-S.J., and L.B.V. designed research; J.W.H. and M.W. performed research; J.W.H., A.K., and M.W. analyzed data; and J.W.H., A.K., and L.B.V. wrote the paper.

Reviewers: A.G., Synesthetics, Inc.; and T.H., Technical University of Dresden.

Conflict of interest statement: J.W.H., A.K., and L.B.V. are inventors on US provisional patent application 62/528,420, filed July 3, 2017, by The Rockefeller University, relating to the smell test methods in this manuscript.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1711415114/-/DCSupplemental.