Dissolving the embryos for viable eggs

Head louse eggs were treated overnight with 12 M urea, 74 mM Tris base, and 78 mM dithiothreitol (DTT), a mixture that has been used to dissolve and study animal keratin (ref. 20). When the urea-treated eggs were observed under the microscope, they were devoid of the developing embryos (Fig. S1). Furthermore, the clear pale-brownish supernatant of the urea-treated eggs contained proteins of louse origin, indicating that the treatment successfully dissolved the embryos (Table S1). Nevertheless, all attempts to dissolve the remaining nit sheath failed, whether with organic solvents such as DMSO, ethanol, and cyclohexane or with detergents such as sodium dodecyl sulfate (SDS), Triton™ X-100, and N,N-dimethyldodecylamine N-oxide. Thus, the insolubility of the nit sheath precluded both SDS-PAGE analysis of the nit sheath protein and the relatively straightforward identification of the protein by mass spectrometry.

Infrared micro-spectroscopic analysis of the head louse nit

Fourier transform infrared (FTIR) spectroscopy, by contrast, is an established tool for characterizing the secondary structure of proteins and has the advantage of working on solid samples. Thus, an FTIR micro-spectroscopic analysis was performed on a single nit that was devoid of the developing embryo (embryo-cleared nit) to determine the secondary structure states of the proteins in the nit. The spectrum indicated that all the infrared bands were of protein origin (ref. 21) (Fig. 1A), with amide I peaks at 1610 cm−1, 1626 cm−1, 1651 cm−1, and 1691 cm−1 (Fig. 1B). Unfortunately, the exact secondary structure content of the nit, expected to be a mixture of α-helices, β-strands and even denatured aggregates, could not be determined because of the complexity of the spectrum and the overlap of peaks representing different secondary structure elements.

Figure 1 FTIR analysis of the head louse nit. (A) FTIR micro-spectroscopic analysis was performed on a single nit devoid of the embryo, which indicated that all the peaks are of protein origin. (B) The deconvoluted amide I bands are centered around 1610 cm−1, 1626 cm−1, 1651 cm−1, and 1691 cm−1; however, they could not be used to determine the exact secondary structure content of the nit protein.

Amino acid composition analysis of the head louse nit

Because the insolubility of the nit sheath precluded protein identification by mass spectrometry, the tissue-cleared nits were subjected to amino acid composition analysis by HCl hydrolysis, phenylisothiocyanate (PITC) derivatization of the amino acids, and subsequent HPLC analysis (Figs 2 and S2). The result indicated that the nit sheath consists mostly of Gly (25.4%), Glx (Glu or Gln, 24.4%), Ala (20.2%) and Val (10.0%) residues (Table 1). The contents of cysteine and tryptophan could not be determined because the harsh HCl hydrolysis condition degrades these amino acids.
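
The quantification step of this analysis, in which the peak areas of the sample chromatogram are compared with those of amino acid standards, can be sketched as follows. This is only an illustration under stated assumptions: a linear detector response, a single-point calibration per amino acid, and a hypothetical function name (`mole_percent`) and hypothetical input values; it is not the processing pipeline actually used for Table 1.

```python
def mole_percent(sample_areas, standard_areas, standard_pmol):
    """Convert HPLC peak areas into relative mole percentages.

    sample_areas   -- peak area per amino acid in the sample chromatogram
    standard_areas -- peak area per amino acid in the standard chromatogram
    standard_pmol  -- known amount (pmol) of each amino acid in the standard
    All three arguments are dicts keyed by amino acid name (assumed layout).
    """
    amounts = {}
    for aa, area in sample_areas.items():
        # Response factor: area units produced per pmol of this amino acid.
        response = standard_areas[aa] / standard_pmol[aa]
        amounts[aa] = area / response
    total = sum(amounts.values())
    if total == 0:
        raise ValueError("no amino acid was detected in the sample")
    # Express each amount as a percentage of all quantified residues.
    return {aa: 100.0 * amount / total for aa, amount in amounts.items()}


# Hypothetical two-residue example (values are made up for illustration).
example = mole_percent(
    sample_areas={"Gly": 200.0, "Ala": 100.0},
    standard_areas={"Gly": 100.0, "Ala": 100.0},
    standard_pmol={"Gly": 50.0, "Ala": 50.0},
)
```

Because the percentages are normalized to the residues that were actually quantified, amino acids destroyed by hydrolysis (cysteine and tryptophan) simply do not enter the total.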

Figure 2 Amino acid composition analysis of the head louse nits. The amino acid composition analysis was performed on nits that are devoid of the embryos. The elution volumes and the areas under the amino acid peaks of the sample chromatogram were compared with those of the standards to determine the amino acid types and the relative mole percentages of the amino acids composing the nit. The contents of cysteine and tryptophan were undetermined due to the harsh HCl hydrolysis condition.

Table 1 Experimentally determined amino acid composition of the head louse nit sheath.

Bioinformatic search for candidate nit sheath proteins using the body louse genome

Due to the incompleteness of the human head louse genome sequence, the whole genome of the human body louse was used as the reference in our bioinformatic search for candidate proteins that comprise the head louse nit sheath. The sequence-derived compositions, using 18 amino acids, of all protein products deduced from the body louse genome were compared with the experimental contents obtained from the amino acid composition analysis of the tissue-cleared nit. For each of the 10,773 human body louse proteins, a root sum of squares (R) of the offsets between its contents of the 18 amino acids (excluding cysteine and tryptophan) and the experimental values of the nit amino acid composition was calculated; the R-values ranged from 0.083 to 0.506 (Fig. 3). From this analysis, the candidate proteins of the louse nit sheath were narrowed down to the four gene products of PHUM596000 (R = 0.083), PHUM595880 (R = 0.095), PHUM403440 (R = 0.138), and PHUM595890 (R = 0.139) (Fig. 3 and Table S2). Other louse proteins with R-values greater than 0.2 showed significant discrepancies in the contents of Gln, Gly, Ala and Val residues. Of these four candidate genes, PHUM596000, PHUM595880 and PHUM595890 are reported to be complete sequences, whereas PHUM403440 is a partial sequence. In genomic context, PHUM595880 and PHUM596000 are present in a tandem array, whereas PHUM595890 is positioned in the reverse orientation between PHUM595880 and PHUM596000. Close inspection of PHUM595890 indicated multiple undetermined bases in the original shotgun reference sequence. PHUM596000, PHUM595880, and PHUM595890 are expected to encode proteins of 569, 434 and 127 amino acids, respectively (Table S2). Because PHUM403440 is a partial sequence, its location in the body louse genome has not been determined. The partial sequence available for PHUM403440 is expected to encode a protein of at least 169 amino acids (Table S2).

Consistent with the high Gln content found in the amino acid analysis, the protein products of all four genes had several polyglutamine (polyQ) regions of consecutive Gln residues. The functions of these four genes, however, are all unknown, and their products have hence been annotated as hypothetical proteins. NCBI-BLAST protein searches using these protein sequences found no similar protein from other organisms in the database.
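
The composition-based search described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, the sequence-derived fractions are assumed to be renormalized after excluding cysteine and tryptophan, and the experimental vector is left as an input because this section does not specify how the measured Glx value is apportioned between Glu and Gln.

```python
import math
from collections import Counter

# The 18 residue types compared (cysteine C and tryptophan W are excluded).
RESIDUES = "ADEFGHIKLMNPQRSTVY"


def composition(sequence):
    """Mole fraction of each of the 18 residue types in a protein sequence."""
    counts = Counter(r for r in sequence.upper() if r in RESIDUES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains none of the 18 residue types")
    return {r: counts[r] / total for r in RESIDUES}


def root_sum_squares(predicted, experimental):
    """R = sqrt(sum over residues of (predicted - experimental)^2)."""
    return math.sqrt(
        sum((predicted[r] - experimental.get(r, 0.0)) ** 2 for r in RESIDUES)
    )


def rank_candidates(proteins, experimental):
    """Return (R, name) pairs sorted so the best compositional match is first."""
    scored = [
        (root_sum_squares(composition(seq), experimental), name)
        for name, seq in proteins.items()
    ]
    return sorted(scored)


# Toy data only: made-up sequences and a made-up "experimental" composition.
toy_experimental = composition("GQAV" * 5)
toy_proteins = {"toy_a": "GQAVGQAV", "toy_b": "KKKKLLLL"}
ranked = rank_candidates(toy_proteins, toy_experimental)
```

With fractions rather than percentages as input, R is bounded and directly comparable to the 0.083 to 0.506 range reported in the text; a protein whose composition matches the experimental vector exactly gives R = 0.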

Figure 3 Bioinformatic search for candidate proteins. The root sum of squares (R) of the offsets in 18 amino acids (excluding cysteine and tryptophan) between each of the 10,773 proteins encoded by the body louse genome and the amino acid analysis result of the nit sheath was calculated. The R values ranged from 0.083 to 0.506, and the four proteins with the lowest R (indicated by arrows) were identified as the candidate proteins of the nit. The values on the horizontal axis follow the order of body louse genes listed in NCBI.

DNA amplifications and sequence analysis of the candidate nit sheath genes using the body and head lice cDNA

To confirm that the candidate nit sheath genes are actually transcribed into mRNA in the body louse, PCR amplifications of the four genes were performed using body louse cDNA generated from female lice mRNA. Although the relatively small PHUM595890 and PHUM403440 failed to amplify, the DNAs of PHUM595880 and PHUM596000 were obtained (Fig. S3), strongly suggesting that PHUM595880 and PHUM596000 are transcribed genes of the human body louse. Because a genome sequence of the head louse, although incomplete, is also available, the head louse homologs of the four candidate nit sheath genes were searched for, and those of PHUM403440, PHUM595880 and PHUM596000 were found (Fig. S4). The PCR products of PHUM595880 and PHUM596000 from both head and body lice were successfully sequenced using the Sanger method. The sequencing results also confirmed that there are no introns in the PHUM595880 and PHUM596000 genes. Interestingly, our overall nucleotide sequence of body louse PHUM595880 suggested that its protein sequence is 100% identical to the partial sequence of body louse PHUM403440 (Fig. S4A). Also, the protein product of head louse PHUM595880 is 95.8% identical to that of head louse PHUM403440 over the aligned regions (Fig. S4B). Moreover, the protein product of body louse PHUM595890, with a total of 127 amino acids (Table S2), differs from body louse PHUM596000 mostly through alterations in the N-terminal 37 residues that harbor the signal sequence (Fig. S4C). Otherwise, it is nearly identical to PHUM596000, with only two of the remaining 90 residues in the C-terminal region differing (Fig. S4C). For these reasons, we regard PHUM403440 and PHUM595890 as partial copies of PHUM595880 and PHUM596000, respectively, and hereafter analyze only the gene products of PHUM595880 and PHUM596000, which we call louse nit sheath protein (LNSP) 1 and LNSP2, respectively.

Comparative molecular characterization of deduced amino acid sequences of head and body lice LNSP1/2

The overall amino acid sequences of head and body lice LNSP1/2 deduced from the Sanger sequencing were aligned for comparative characterization of the proteins (Fig. 4). The N-terminal regions of 18 amino acids in all four proteins are predicted as signal sequences. Subsequent regions of the proteins can be divided into three domains which hold characteristic repeating sequences (Figs 4 and S5). It is noteworthy that the three domains are connected by non-repeating sequences that are also conserved between LNSP1 and LNSP2, and that all three domains are more elongated in LNSP2 than in LNSP1 (Fig. 4).

Figure 4 Protein sequence alignment of candidate louse nit sheath proteins (LNSP1 for the protein product of PHUM595880, and LNSP2 for the protein product of PHUM596000). LNSP1 and LNSP2 are 69% identical in deduced amino acid sequence. Both proteins contain N-terminal sequences that are predicted to be signal sequences (colored in blue). Subsequent regions can be divided into three domains, each holding characteristic repeat sequences: the polyQA sequence (N-terminal domain), the polyGA sequence (middle domain), and the polyQ sequence (C-terminal domain). Conserved residues of Gln (capitalized in red) and Gly (in yellow background) are shown. The protein sequences are translated from our DNA sequencing results.

The N-terminal domain is represented by the polyQA (also partly QS) sequence and the middle domain by the polyGA (also partly GV, GS or GG) sequence. The characteristic two-residue repeats of these two domains are indicative of β-strand folding, and the strands may further pack laterally into interacting β-sheets. In the N-terminal domain, the alternating Gln-Ala (or Ser) residues would place the side chains of Gln on one side and those of Ala (or Ser) on the other. When the four protein sequences of head and body lice LNSP1/2 are compared, the N-terminal polyQA domain holds interesting insertion and deletion polymorphisms (i.e. only head louse LNSP1 has an extra 15-residue insertion, and head louse LNSP2 has a 21-residue deletion; Fig. 4). In general, the longer stretch of QA repeats in LNSP2 (65–70 residues) contributes to its larger size compared with LNSP1.

Similar to the N-terminal polyQA domain, the two-residue repeats in the middle polyGA domain would place the small hydrogen side chain of Gly on one side, with Ala (also Val, Ser or Gly) projecting on the other. Such repeating sequences would allow efficient interaction between the β-strands, as observed in the Gly-Ala or Gly-Ser repeats of insect and spider silk fibroins. This domain also contains two regions of 4–7 Ala residues in tandem. After this region, a 24-residue insertion is observed only in LNSP2.

In contrast to the two-residue repeats of the N-terminal and middle domains, multiple polyQ sequences of 5–10 Gln residues in tandem are found in the C-terminal domain. The aligned sequences of the four proteins suggest at most nine of these polyQ regions (termed polyQ1–Q9) in LNSP1 and LNSP2 (Fig. 4). The most dramatic differences between LNSP1 and LNSP2 occur in this C-terminal polyQ domain. For instance, the exact numbers of Gln residues in these polyQ regions are not conserved between LNSP1 and LNSP2. Also, the polyQ3 and polyQ4 regions of LNSP2 are combined into one in LNSP1, and polyQ7 is seen only in LNSP2. More interestingly, the region harboring polyQ9 is deleted only in body louse LNSP1.
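
Runs of consecutive Gln residues such as those described above can be located with a simple regular expression. The sketch below is illustrative only: the function name `find_polyq` is hypothetical, the minimum run length of five follows the 5–10 residue range quoted in the text, and the toy sequence is invented rather than taken from LNSP1 or LNSP2.

```python
import re


def find_polyq(sequence, min_len=5):
    """Return (start, end, length) for each run of >= min_len Gln residues.

    Positions are 1-based and inclusive, as is usual for protein residues.
    """
    pattern = re.compile(r"Q{%d,}" % min_len)
    return [
        (m.start() + 1, m.end(), m.end() - m.start())
        for m in pattern.finditer(sequence.upper())
    ]


# Invented toy sequence: a 5-Q run, a 3-Q run (too short), and a 7-Q run.
toy_sequence = "MA" + "Q" * 5 + "GA" + "Q" * 3 + "A" + "Q" * 7 + "S"
polyq_regions = find_polyq(toy_sequence)
```

Note that such a scan finds runs within a single sequence; assigning them to the aligned polyQ1–Q9 regions additionally requires the multiple alignment shown in Fig. 4.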

Head louse LNSP1 and LNSP2 show 68% identity in their overlapping amino acid sequences, which is similar to the 69% identity observed between body louse LNSP1 and LNSP2 (Fig. 4). However, considerable differences in the amino acid sequences of LNSP1 (Fig. S6A) and LNSP2 (Fig. S6B) were observed between the body and head lice. The sequence variations between head and body lice LNSP1s (or LNSP2s) resulted in 3–4% differences in amino acids (for detailed comparisons, see SI Results and Fig. S6). Moreover, some single-residue variants as well as deletions and insertions were identified in body louse LNSP1 and LNSP2 when our sequence of the San Francisco (BL_SaFr) strain was compared with the previously reported 2010 sequence (ref. 12) of the Culpepper (BL_Culp) strain (see SI Results and Fig. S6). The results indicated a high level of polymorphism in the two proteins. Overall, 1.6% of amino acids differed in both LNSP1 and LNSP2 between the BL_Culp sequence and our BL_SaFr sequence. Hence, the inter-subspecies sequence variations (3–4%) between head and body lice LNSP1s (or LNSP2s) were higher than the intra-subspecies differences (1.6%) between the LNSP1s (and LNSP2s) of the two body louse strains (BL_Culp vs. BL_SaFr). In any case, these overall changes in the two proteins had only a small impact on the amino acid content determined; hence the previous bioinformatic search is still valid.
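
Identity figures of this kind are computed from a pairwise alignment. The sketch below shows one common definition, assumed here because the section does not state the denominator used: matches are counted only over columns in which both sequences have a residue (the "overlapping" or "aligned" regions), so gapped columns are ignored. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences contribute a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("the sequences share no aligned residues")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Invented toy alignment: 7 gap-free columns, 6 of which are identical.
identity = percent_identity("QAQA-GAGA", "QAQSQGA-A")
```

Under this definition, insertions and deletions do not lower the identity value, which is why the text reports them separately from the percentage differences.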

Accessory gland-specific expression of LNSP1 and LNSP2 in head louse gravid females

To quantify the transcription levels of LNSP1 and LNSP2 in different developmental stages of the head louse, qPCR experiments were conducted (Fig. 5A). Both LNSP1 and LNSP2 were most predominantly transcribed in the 5-day old females, followed by the 1-day old females, neonates, 1-day old males, 5-day old males and 5-day old nymphs. The relative transcription levels of LNSP1 and LNSP2 in the 5-day old females were 5,900- and 13,800-fold higher than those in the 5-day old nymphs, respectively. When the transcription levels in the 5-day old females were compared with those in males, the fold differences were more than 3 orders of magnitude for both LNSP1 and LNSP2. In addition, the 5-day old gravid female exhibited significantly higher transcription levels of LNSP1 and LNSP2 (7.1- and 10.1-fold, respectively, p < 0.0001, ANOVA in conjunction with Tukey’s test) than the 1-day old female, suggesting that the expression of LNSP1 and LNSP2 is specifically associated with the egg-laying stage. The transcription levels of LNSP1 and LNSP2 in the 5-day old females were almost identical (p = 0.565, ANOVA in conjunction with Tukey’s test). Interestingly, the overall transcription levels of LNSP1 and LNSP2 in the 5-day old female were 52–65-fold higher than that of actin-5c, an internal reference gene that typically shows a high expression level, suggesting that these two genes are expressed in large amounts during the oviposition period.
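
The fold differences quoted above are ratios of relative transcription levels. As a rough illustration only, the sketch below assumes the widely used 2^(-ΔCt) normalization against a reference gene with 100% amplification efficiency; this section does not state the formula actually used, and the function names and Ct values are hypothetical.

```python
def relative_level(ct_target, ct_reference):
    """Transcript level of a target gene relative to a reference gene.

    Assumes perfect doubling per PCR cycle, i.e. level = 2 ** -(Ct_target - Ct_ref).
    """
    return 2.0 ** (ct_reference - ct_target)


def fold_difference(level_a, level_b):
    """How many times higher level_a is than level_b."""
    if level_b <= 0:
        raise ValueError("the comparison level must be positive")
    return level_a / level_b


# Hypothetical Ct values: the target crosses threshold 2 cycles before the
# reference in sample A and 2 cycles after the reference in sample B.
level_a = relative_level(ct_target=18.0, ct_reference=20.0)
level_b = relative_level(ct_target=22.0, ct_reference=20.0)
fold = fold_difference(level_a, level_b)
```

A relative level above 1 means the target is more abundant than the reference gene in that sample, which is the sense in which the text compares LNSP1 and LNSP2 with actin-5c.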

Figure 5 Temporal and spatial transcription profiles of LNSP1 and LNSP2 in various developmental stages (A) and different female organs (B) of head lice. The **** mark indicates a statistically significant (p < 0.0001) mean value as judged by one-way ANOVA in conjunction with Tukey’s test. Abbreviations on the horizontal axis represent the following: Neo, neonate; 5D-N, 5-day old nymph; 1D-F, 1-day old female; 1D-M, 1-day old male; 5D-F, 5-day old female; 5D-M, 5-day old male; AG, accessory gland; Ov, ovary; AT, alimentary tract. The transcription levels of both LNSP1 and LNSP2 in 5D-N, 1D-M, 5D-M and AT were too low (0.005–0.022) to be seen in the graph.

To identify the tissues expressing LNSP1 and LNSP2 in gravid females, the spatial transcription levels of LNSP1 and LNSP2 in the accessory gland, ovary and alimentary tract were determined by qPCR (Fig. 5B). Both LNSP1 and LNSP2 were transcribed almost exclusively in the accessory gland. Only low levels of transcription were observed in the ovary. Transcription levels of LNSP1 and LNSP2 in the accessory gland were 170- and 240-fold higher than those in the ovary, respectively (p < 0.0001, ANOVA in conjunction with Tukey’s test). Hardly any transcription was detected in the alimentary tract (0.011 and 0.004 relative transcription levels for LNSP1 and LNSP2, respectively). These findings demonstrate that both LNSP1 and LNSP2 are specifically expressed in the accessory gland, and thus constitute a major part of the nit sheath protein.

Recombinant expression of partial body louse LNSP1 and its characterization

To characterize the function of the identified nit protein, the gene encoding body louse LNSP1 was cloned into an E. coli expression vector and tested for expression in a water-soluble form. While the full LNSP1 (residues 19–438) without the putative signal sequence (residues 1–18) remained within the bacterial inclusion body during expression and hence could not be purified in a soluble form (results not shown), a partial LNSP1 without the polyQ C-terminal domain (residues 19–303, containing only the N-terminal and middle domains) was obtainable in a soluble form. Although the protein was initially soluble at low concentration, it became tacky as it was concentrated during the purification step. This concentrated partial LNSP1 solidified into a thin film upon evaporation of water when applied to a solid surface, and was used to glue human hair onto plastic or to stick together pieces of laboratory plasticware (Fig. 6A). Moreover, the adhesive strength of the partial LNSP1 on polypropylene (PP) films was measured using a universal testing machine (UTM) to pull apart the glued PP films. The adhesive property of the partial LNSP1 was compared with that of chymotrypsin, bovine serum albumin (BSA), and the commercially available fibrin-based biological adhesive (Tisseel®). The partial LNSP1 showed a stronger adhesive property than chymotrypsin and BSA (Fig. 6B). Also, the adhesive property of the partial LNSP1 was comparable to that of the fibrin-based adhesive even when the amount applied on the PP film was ~500-fold less by mass.