Between 2000 and 2015, insecticide-based control interventions targeting mosquito vectors averted an estimated 537 million malaria cases1. Nevertheless, malaria still kills hundreds of thousands of people each year (445,000 in 2016), mainly in sub-Saharan Africa2. Additionally, there is concern that progress may have stalled after more than a decade of success in global malaria control2. Of major concern is the increase in insecticide resistance among mosquito populations throughout Africa3, which is degrading the lethality and effectiveness of vector control tools, notably indoor residual spraying (IRS) and long-lasting insecticide-treated nets (LLINs), which have been the cornerstones of malaria control in the past decades4. Indeed, much of the effectiveness of LLINs and IRS comes from community-wide reductions in vector population size, not merely from preventing people from getting bitten5.

Female mosquito vector survival is an important biological determinant of malaria transmission intensity6,7. This is because malaria parasites (Plasmodium spp.) require more than 10 days of incubation inside female mosquito vectors (extrinsic incubation period, EIP) before they become infectious8–11. While there is uncertainty about mosquito survival in the field, crude estimates suggest the median lifespan of African malaria vectors is 7–10 days12. Thus, only relatively old mosquitoes can transmit the parasite13. As a result, even minor reductions in mosquito survival can have exponential impacts on pathogen transmission10,14. Consequently, accurate and high-resolution estimation of both mosquito abundance and longevity is essential for assessing the impact of LLINs, IRS, and other control measures.
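The sensitivity of transmission to survival can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a constant daily survival probability p, so that the probability of surviving an EIP of n days is p^n. This constant-mortality model and the example values (0.90 and 0.85 per day) are illustrative assumptions, not figures taken from the studies cited above.

```python
def prob_survive_eip(daily_survival: float, eip_days: int = 10) -> float:
    """Probability of surviving the whole EIP.

    Assumes a constant daily survival probability (a simplification);
    the default of 10 days reflects the minimum EIP quoted in the text.
    """
    if not 0.0 <= daily_survival <= 1.0:
        raise ValueError("daily_survival must be a probability in [0, 1]")
    return daily_survival ** eip_days


# Hypothetical daily survival probabilities before and after an intervention.
baseline = prob_survive_eip(0.90)
reduced = prob_survive_eip(0.85)

# Relative drop in the fraction of mosquitoes living long enough to transmit.
relative_drop = 1.0 - reduced / baseline

print(f"P(survive EIP) at p=0.90: {baseline:.3f}")
print(f"P(survive EIP) at p=0.85: {reduced:.3f}")
print(f"Relative drop: {relative_drop:.1%}")
```

Under these assumptions, lowering daily survival from 0.90 to 0.85 cuts the fraction of mosquitoes that outlive the EIP from roughly 0.35 to roughly 0.20, a far larger proportional change than the change in daily survival itself.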

Despite the crucial importance of mosquito demography to vector control, there are few reliable tools for rapid, high-throughput monitoring of mosquito survival in the wild. Conventionally, mosquito age has been approximated by classifying females (the only sex that transmits malaria) into groups based on their reproductive status as assessed through observation of their ovarian tracheoles15. This widely employed technique distinguishes females that have not yet laid eggs (nulliparous) from those that have laid at least one egg batch (parous), with the latter group assumed to be older than the former because the gonotrophic cycle between blood feeding and oviposition takes ~4 days. While useful for approximating general patterns of survival16, this method is crude and cannot distinguish females that have laid eggs once from those that have done so multiple times. Alternatively, more refined methods have been developed to estimate the number of gonotrophic cycles a female mosquito has gone through based on follicular relics or dilatations formed during each oviposition17, although the conversion between gonotrophic cycles and actual age is imprecise (especially now that LLINs are limiting regular access to blood-meals)18. While an improvement on the simple parity classification method, this approach is extremely technically demanding and time-consuming19. Additionally, it is unsuitable for analysis of the large sample sizes necessary for estimating mosquito population structure20.

Given these problems with ovary-based assessment, there has been significant investigation of alternative, molecular-based approaches to estimate mosquito age. These methods include: counting cuticle rings representing daily growth layers of the mosquito skeletal apodemes21, chromatographic analysis of cuticular hydrocarbon chains22, assessment of pteridines using fluorescence techniques23, transcriptomic profiling24, and mass spectrometric analysis of mosquito protein expression25. However, thus far their limited accuracy, high cost, and/or need for highly trained users suggest that they might not be suitable for application in the field.

In addition to age, identification of mosquito species is crucial for estimation of malaria transmission dynamics. In Africa, the bulk of malaria transmission is carried out by members of the Anopheles gambiae sensu lato and Anopheles funestus sensu lato species complexes26. The An. gambiae s.l. complex includes several cryptic species that can only be distinguished by molecular analysis27–29. Despite being morphologically identical, members of this group vary significantly in behaviour, transmission potential, and response to vector control measures30. For example, two major vectors in the An. gambiae s.l. group, An. arabiensis and An. gambiae, can differ in their propensity to enter and rest in houses, their host species choice, breeding conditions, resistance to insecticides, and tolerance to dry climates6,31,32. Currently, An. gambiae s.l. species are best distinguished by polymerase chain reaction (PCR) methods33,34, which are time-consuming and expensive, and can thus only be carried out on a subsample of mosquitoes collected during entomological surveillance. Alternative techniques have been developed, such as isoenzyme electrophoresis35 or chromatography of cuticular components23, but these are also very laborious and have weak discriminatory power36.

As in the case of age determination, non-PCR-based methods often rely on structural and chemical differences in the cuticle between species. In particular, near-infrared spectroscopy (NIRS) has been evaluated as a general strategy for the discrimination of insects according to their species and other traits, since it does not require reagents and holds promise as a fast, practical, non-destructive, and cost-effective method for entomological surveillance. The results obtained to date have shown that the chemical composition of mosquitoes and other insects varies not only between species37–39 but also with age38,40–43, with resistance to insecticides44, and with the presence of an infectious agent45,46. While promising, the typical NIRS approach has certain drawbacks. As it employs the most energetic portion of the infrared spectrum, the absorption bands are generated by two indirect processes: overtones (a vibration excited at a multiple of the fundamental frequency) and combinations (two or more fundamental vibrations excited simultaneously). Both processes are less coherent and less frequent than the absorption of light by fundamental vibrations, so their absorption bands are wide and weak. As a result, the NIR spectrum of a mosquito, formed by a combination of dozens of these bands, consists of a few features standing out against a background of continuous absorption47. Also, most NIRS analyses use a dispersive method to collect the absorption spectra from insects, so the reflectivity of the sample is not controlled and the intensity of the bands of the spectrum depends on how the mosquito is placed in the spectrometer. In addition, the results are normally analysed using Partial Least Squares (PLS) regression, which is prone to over-fitting (i.e. the production of a model that corresponds too closely to a particular set of data and may therefore fail to predict future observations reliably)48. This problem commonly arises when the number of samples is relatively small and the number of variables is large.
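The origin of near-infrared bands as overtones and combinations, described above, can be sketched numerically. The example below uses the harmonic approximation, in which an overtone sits at an integer multiple of a fundamental and a combination band at the sum of two fundamentals. The band positions (a C-H stretch near 2,900 cm⁻¹ and an amide I band near 1,650 cm⁻¹) are approximate textbook values chosen for illustration and are not taken from the text. Real overtones fall at somewhat lower wavenumbers because of anharmonicity.

```python
from itertools import combinations

# Approximate fundamental band centres in cm^-1 (illustrative assumptions).
FUNDAMENTALS_CM1 = {
    "C-H stretch": 2900.0,
    "amide I (C=O stretch)": 1650.0,
}

# Near-infrared range as quoted in the text (cm^-1).
NIR_RANGE_CM1 = (4000.0, 25000.0)


def overtones(fundamental_cm1, max_order=3):
    """Harmonic-approximation overtones: n * fundamental for n = 2..max_order."""
    return [n * fundamental_cm1 for n in range(2, max_order + 1)]


def combination_bands(fundamentals_cm1):
    """Harmonic-approximation binary combinations: sum of each pair."""
    return [a + b for a, b in combinations(fundamentals_cm1, 2)]


def in_nir(wavenumber_cm1):
    """True if the wavenumber lies inside the quoted near-infrared range."""
    low, high = NIR_RANGE_CM1
    return low <= wavenumber_cm1 <= high


for name, centre in FUNDAMENTALS_CM1.items():
    print(f"{name}: fundamental {centre:.0f} cm^-1, in NIR: {in_nir(centre)}")
    for band in overtones(centre):
        print(f"  overtone {band:.0f} cm^-1, in NIR: {in_nir(band)}")

for band in combination_bands(list(FUNDAMENTALS_CM1.values())):
    print(f"combination {band:.0f} cm^-1, in NIR: {in_nir(band)}")
```

In this sketch the fundamentals themselves lie below 4,000 cm⁻¹, so only their overtones and combinations reach the near-infrared range, which is consistent with the indirect character of NIR absorption described above.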

Here we tested whether these limitations can be overcome by shifting the measurement range from the near-infrared (25,000–4,000 cm⁻¹) to the mid-infrared region (4,000–400 cm⁻¹), employing an attenuated total reflectance (ATR) device to assess the mosquitoes, and modelling the results with supervised machine learning. The mid-infrared absorption spectrum of a mosquito contains a set of discrete, well-delineated bands that depend on the fundamental vibrations of the molecules present in the cuticle. This provides a wealth of information not present in the near-infrared range, where it is not possible to capture the contributions of different biochemical components of the mosquito to the spectrum and their variations among mosquitoes with different attributes, as shown in Aedes aegypti and the dipteran Culicoides sonorensis49,50. However, since the mid-infrared spectral bands are affected in non-trivial ways by the development of a mosquito and the changing composition of the cuticle, it is not possible to predict traits by simply monitoring changes in band intensities49.
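The spectral ranges quoted above are expressed in wavenumbers. As a convenience, the following minimal sketch converts them to wavelengths using the standard relation wavelength (nm) = 10^7 / wavenumber (cm⁻¹) and labels a wavenumber by region. The function names are hypothetical, and assigning the shared 4,000 cm⁻¹ boundary to the mid-infrared is an arbitrary choice made here.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nanometres."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1e7 / wavenumber_cm1


def infrared_region(wavenumber_cm1: float) -> str:
    """Label a wavenumber using the ranges quoted in the text.

    The 4,000 cm^-1 boundary is assigned to the mid-infrared (assumption).
    """
    if 400.0 <= wavenumber_cm1 <= 4000.0:
        return "mid-infrared"
    if 4000.0 < wavenumber_cm1 <= 25000.0:
        return "near-infrared"
    return "outside quoted ranges"


for wn in (25000.0, 4000.0, 400.0):
    wavelength = wavenumber_to_wavelength_nm(wn)
    print(f"{wn:>7.0f} cm^-1 = {wavelength:>7.0f} nm ({infrared_region(wn)})")
```

With this conversion, the quoted near-infrared range corresponds to wavelengths of 400 to 2,500 nm and the mid-infrared range to 2,500 to 25,000 nm.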

Here, we show that the use of supervised machine learning51 allows the determination of the age and species of two major malaria vectors, An. arabiensis and An. gambiae, from the information contained in their mid-infrared spectra. This is possible because machine learning, unlike standard statistical approaches, can recognise the complex spectral signatures of these traits (mosquito species and mosquito age) and disentangle them from other irrelevant variation52–54. Using this approach, we are able to reconstruct simulated age distributions of mosquito populations with unprecedented reliability. The technique we propose here is time-efficient (an analysis takes less than one minute per mosquito), economical, and requires neither reagents nor highly trained operators. It also represents a novel approach to the analysis of insects using spectroscopic techniques, overcoming some drawbacks of earlier approaches and accelerating progress towards the establishment of infrared spectroscopy as a routine approach for mosquito surveillance and evaluation of interventions.