Electron density in Foldit

To facilitate work with electron density data in Foldit, new visualizations and tools, along with a tutorial puzzle to introduce them, were developed and distributed to Foldit players in periodic software updates. Electron density maps in Foldit are displayed as a visual guide in the form of an isosurface. Players control parameters of the density isosurface, such as the contour level, surface texture, transparency and colour, and can tag regions of the density with notes. After initial testing, it was clear that density visualization alone was insufficient to improve model building by Foldit players: players simply ignored the density, finding that their existing, familiar strategies were the most competitive on Foldit leaderboards. In response, we adapted the Rosetta fit-to-density score term elec_dens_fast into the Foldit score function6. This not only provides a competitive incentive to match the density, but also allows Foldit automated tools such as structure minimization to be guided by electron density, similar to crystallographic real-space refinement. Under this configuration, players were able to fit models to several experimental electron density maps with high accuracy. An important feature added later allowed players to trim from the visualization any excess density lying far from their model. According to Foldit player testimony, this feature has proved invaluable on certain experimental density maps, where it allowed a clearer interpretation of the relevant density. To protect the integrity of unpublished crystallographic work, electron density data were obfuscated before online distribution to Foldit players.
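As a rough illustration of what a fit-to-density term rewards, the sketch below scores a model by the Pearson (real-space) correlation between an observed map and a density calculated from the model on the same grid. This is a simplified stand-in that assumes both maps are already sampled on identical grids; it is not how Rosetta's elec_dens_fast is implemented, and the function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def real_space_correlation(rho_obs, rho_calc):
    """Pearson correlation between two density grids of identical shape.

    Simplified illustration only; NOT Rosetta's elec_dens_fast.
    Assumes rho_calc was computed from the model on the same grid as rho_obs.
    """
    obs = np.asarray(rho_obs, dtype=float).ravel()
    calc = np.asarray(rho_calc, dtype=float).ravel()
    # Remove the mean so that only the shape of the density is compared
    obs = obs - obs.mean()
    calc = calc - calc.mean()
    denom = np.sqrt((obs ** 2).sum() * (calc ** 2).sum())
    return float((obs * calc).sum() / denom) if denom > 0 else 0.0


def density_score(rho_obs, rho_calc, weight=1.0):
    """Energy-like term (lower is better): the negated, weighted correlation.

    `weight` is a hypothetical scaling against the other score terms.
    """
    return -weight * real_space_correlation(rho_obs, rho_calc)
```

Negating the correlation yields an energy-like term, so a minimizer that lowers the total score is pushed towards models whose calculated density matches the map, in the spirit of real-space refinement.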

The competition

Phenix Autosolve7,8, with model building disabled, was used to create density-modified maps of selenomethionine (SeMet) YPL067C. To make the map manageable for the Foldit program, density more than 5 Å from the initial solution available at the start of the competition was masked out. This map was given to Foldit players, MCDB411 students and the experienced crystallographers for model building. The individual responsible for model building and refining the initial structure solution of YPL067C before the contest was initiated had no contact with any of the competitors.
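Masking a map beyond a fixed distance from a model can be sketched as follows. The sketch assumes an orthogonal grid with uniform spacing, no crystallographic symmetry, and atom coordinates in the same Cartesian frame as the map; `mask_map_near_model` is a hypothetical helper, and real crystallographic tools work on unit-cell grids and account for symmetry.

```python
import numpy as np


def mask_map_near_model(density, origin, spacing, atom_xyz, radius=5.0):
    """Zero every voxel farther than `radius` Å from all model atoms.

    density  : 3D array indexed [i, j, k] along x, y, z (orthogonal grid, assumed)
    origin   : Cartesian coordinates (Å) of voxel [0, 0, 0]
    spacing  : grid spacing in Å, assumed equal along all three axes
    atom_xyz : (N, 3) array of atom coordinates in Å
    """
    density = np.asarray(density, dtype=float)
    atoms = np.asarray(atom_xyz, dtype=float)
    # Cartesian coordinates of every voxel, shape (nx, ny, nz, 3)
    idx = np.indices(density.shape).transpose(1, 2, 3, 0)
    voxels = np.asarray(origin, dtype=float) + idx * spacing
    flat = voxels.reshape(-1, 3)
    keep = np.zeros(flat.shape[0], dtype=bool)
    r2 = radius ** 2
    # Brute force: one pass per atom keeps memory bounded
    for atom in atoms:
        keep |= ((flat - atom) ** 2).sum(axis=1) <= r2
    return np.where(keep.reshape(density.shape), density, 0.0)
```

The cost scales with voxels × atoms; for a full protein map a spatial index such as a k-d tree would be the more practical choice.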

Sixty-one students in the University of Michigan undergraduate class MCDB411 (Introduction to Protein Structure and Function) were introduced to the assignment through a description of the previous iteration of the assignment in class4, together with a 1.5 h lecture on X-ray crystallography. Students then had two in-class computer laboratory sessions in which features of Coot were presented. In the first 1.5 h lab session, the students were given basic instructions on opening electron density maps and molecules, changing map levels, scrolling and changing map size, finding secondary structure elements, converting Cα representations to all-atom molecules, placing helices and strands, adding terminal residues, real-space refinement, controlling regularization and refinement, rotating and translating atoms and residues, viewing the skeleton, mutating residues and changing rotamers. The instructors suggested that changing the weighting of the real-space refinement from the default value of 60 to 10, and making subsequent changes to this value as needed, could help in the building process. In the second 1.5 h lab session, the students were taught how to merge molecules, look for grouped tryptophans, phenylalanines and/or tyrosines as starting places for building, and use validation tools such as density fit analysis, geometry fit analysis and unmodelled blobs. Four instructors were present in the first lab session and three in the second to answer questions on the operation of Coot. Starting from the initial lab session, students were given a total of 1 month to complete the assignment. During this period, the instructor held walk-in help sessions twice a week for 1.5 h each and answered questions on the operation of Coot as well as on general model building. Common questions included how to identify density for specific sequences, how to correctly merge molecules and how to approach gaps in electron density.
Regarding gaps in density, students were told to model through gaps only if they were confident that the modelling would be correct, given the size of the gap and the number of residues to be modelled. Students were not told what to do in specific cases of building through disordered segments. They were informed that water molecules would not be included in grading. One student asked whether there were external validation tools that could help and was told that the MolProbity server might be useful. Students were allowed to discuss the project and ask each other questions, but were required to do their own model building.

A Foldit puzzle was posted online with the masked electron density map and a model of the target polypeptide in a fully extended conformation. Players were challenged to fold the extended polypeptide into the electron density map to achieve a good fit to density. Any advice given to MCDB411 students by the instructors on how to begin model building was also posted on the Foldit message board. After 4 weeks, the puzzle was closed and 900,000 player models were scored and ranked according to the Rosetta energy function. The top-scoring models were clustered into a set of 1,000 such that no two were within 1.0 Å Cα r.m.s.d. of each other after alignment. To this clustered set, we added the 50 best unique models produced by Foldit teams or soloists, as well as any models flagged by Foldit players for special consideration, giving 1,094 Foldit models in total. The puzzle was open to Foldit players for 28 days. Members of the winning team began playing the day after the puzzle opened and produced the winning structure ∼23 days later, 4 days before the puzzle closed.
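The clustering step can be sketched as a greedy selection in score order: a model is kept only if its Cα r.m.s.d. (after optimal superposition with the Kabsch algorithm) to every model already kept is at least 1.0 Å. This is one plausible way to obtain such a set, not necessarily the procedure used here; `greedy_cluster` and its arguments are hypothetical, and all models are assumed to share the same residue numbering.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """Cα r.m.s.d. (Å) between two (N, 3) arrays after optimal superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # If the best orthogonal fit is a reflection, flip the smallest singular value
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]
    e = (a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(e, 0.0) / len(a)))


def greedy_cluster(models, scores, threshold=1.0, max_size=1000):
    """Indices of best-scoring models such that no two lie within `threshold` Å.

    models : list of (N, 3) Cα coordinate arrays with matching residues
    scores : energies, lower is better (as for the Rosetta energy function)
    """
    order = sorted(range(len(models)), key=lambda i: scores[i])
    kept = []
    for i in order:
        if all(kabsch_rmsd(models[i], models[j]) >= threshold for j in kept):
            kept.append(i)
            if len(kept) == max_size:
                break
    return kept
```

Because each candidate is compared with every model already kept, the cost grows as (number of models) × (cluster size); for ∼900,000 models a score pre-filter or a faster r.m.s.d. routine would be needed in practice.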

Two trained crystallographers were given the same number of days for model building as the students and Foldit players. They were given specific instructions not to use tools outside of Coot or MolProbity and not to interact with each other during model building. The trained crystallographers spent ∼8 and 14 h, respectively, working on the puzzle. The trained crystallographers reported using the following approach to the puzzle, which corresponds well with what the instructor observed with many of the undergraduate students. First, they looked for large density blobs that might correspond to large aromatic side chains such as Trp, Tyr or Phe. Working forwards and backwards from the Trp–Phe–Val–Asn sequence proved particularly useful. Modelling in a few of these large residues led to the assignment of density to sequence location. The direction of the polypeptide chain was reversed on a few occasions, but was fixed by looking at the carbonyl density. The Find Secondary Structure tool in Coot was used, especially for regions where the density was poor. Real Space Refine Zone was used with the refinement weight set to 20 or 10, based on the instructor’s suggestion for building in an unrefined map. Regions where the density was very poor, and where decisions had to be made about whether to keep trying to build, proved to be the hardest part of the task. The trained crystallographers reported that at first they did build in these sections of poor electron density. However, when they realized the extent of the guessing involved, they subsequently removed most of the model in these areas. After modelling in the residues, the trained crystallographers used the validation tools in Coot, including Ramachandran plot, Rotamer analysis and Density Fit analysis, which flagged areas with poor geometry. They also ran the structure through MolProbity, which gave similar results to the Coot validation tools.
Finally, the crystallographers fixed problem areas as best as possible with the Coot modelling tools, such as Flip Peptide, Rotamers, Regularize Zone and Real Space Refine Zone. When asked to describe the difficulty level of this assignment, the trained crystallographers rated it as somewhat difficult (on a scale of: very difficult, somewhat difficult, neither easy nor difficult, somewhat easy and very easy).

Phenix Autosolve7,8 was run with default parameters (using phase_and_build) to produce the Autosolve model. The MR-Rosetta model was obtained by relaxing and rebuilding the Autosolve model in the same electron density map provided to the human groups, using Rosetta mr_protocols9 with nstruct=10 and selecting the model with the lowest Rfree. ARP/wARP19 and Phenix Autobuild20 did not create models of as high quality as Phenix Autosolve or MR-Rosetta, and were therefore not analysed in the competition.

After completion of the competition, all structures were automatically refined using Phenix to analyse the results. The refinement strategy included refinement of XYZ coordinates and temperature factors, and updating of waters. Notably, the best structures from Foldit, as measured by Rfree, came from the group of highest-scoring Foldit structures according to Foldit score.
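R and Rfree use the same formula, R = Σ | |Fobs| − k|Fcalc| | / Σ |Fobs|, summed over the working reflections and over the test-set reflections excluded from refinement, respectively. The sketch below assumes a single overall scale factor k fitted by least squares on the working set, a simplification of the bulk-solvent and anisotropic scaling that Phenix applies; the function name is hypothetical.

```python
import numpy as np


def r_factors(f_obs, f_calc, free_flags):
    """Return (R_work, R_free) from structure-factor amplitudes.

    free_flags : True for test-set reflections never used in refinement.
    A single least-squares scale k on the working set is assumed (simplified).
    """
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    free = np.asarray(free_flags, dtype=bool)
    work = ~free
    # k minimizes sum((fo - k*fc)^2) over the working set
    k = (fo[work] * fc[work]).sum() / (fc[work] ** 2).sum()

    def r(mask):
        return float(np.abs(fo[mask] - k * fc[mask]).sum() / fo[mask].sum())

    return r(work), r(free)
```

Choosing the model with the lowest Rfree among several candidates then reduces to taking the minimum of the second value returned for each candidate.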

Bioinformatics

YPL067C sequence conservation was analysed using a four-iteration PSI-BLAST search of the UniRef50 database with an E-value cutoff of 0.005. No homologues with structures in the PDB were found. Sequence conservation was projected onto the structure of YPL067C using the ConSurf server21. The top DALI16 match to the crystal structure of YPL067C was a HIT protein of unknown function from Clostridium difficile (PDB: 4EGU), with a Z-score of 4.9. The top 47 hits were all HIT proteins with Z-scores ranging from 4.9 to 4.2 (Z-scores >2.0 are considered significant). Secondary structure predictions for the competition were generated using PSIPRED22.

Protein expression and purification

The gene for YPL067C was amplified from yeast strain Y2HGold (Clontech) and cloned into a pET28-sumo plasmid using primer 1 (5′- AAATATGGATCCATGCAACAAGATATCGTCAACGATCACCAG -3′) and primer 2 (5′- AAATATCTCGAGTCAGGCAAGTGGCTCGAAACC -3′). pET28-sumo-ypl067C was transformed into Escherichia coli BL21(DE3) cells.

Cells were grown at 37 °C overnight in 100 ml Luria Broth (containing 100 μg ml−1 kanamycin), and 10 ml was used to inoculate 1 litre Luria Broth (containing 100 μg ml−1 kanamycin). At early log phase, the temperature was reduced to 20 °C and 0.1 mM isopropyl β-D-1-thiogalactopyranoside was added to induce expression overnight. Cells were collected by centrifugation and resuspended in 100 ml lysis buffer (40 mM Tris, 10 mM sodium phosphate, 400 mM NaCl, 10% glycerol, 10 mM imidazole, pH 8.0) supplemented with 1 mg ml−1 DNaseI, 1 mM MgCl2 and two tablets of complete EDTA-free protease inhibitor (Roche). Cells were lysed by two French press cycles at 1,300 p.s.i. and centrifuged at 37,000g for 30 min at 4 °C. The supernatant was run through a Ni-HisTrap 5 ml column (GE Healthcare) pre-equilibrated with lysis buffer at a rate of 1.5 ml min−1. Following binding, the column was washed with 60 ml lysis buffer. The protein was eluted with 20 ml lysis buffer supplemented with 500 mM imidazole. To cleave the SUMO–His6 tag, 10 μl ULP1 protease (from a 50 mg ml−1 stock) was added to the eluted solution. β-Mercaptoethanol (10 μl) was added and the solution was dialysed overnight against 40 mM Tris, 10 mM sodium phosphate, 400 mM NaCl, 10% glycerol, pH 8.0. To remove the tag, the solution was run through a Ni-HisTrap 5 ml column (GE Healthcare) pre-equilibrated with dialysis buffer at a rate of 1.5 ml min−1, and the flowthrough was saved and diluted in eight volumes of 20 mM Tris, pH 8.0. The protein was then run through a HiTrap Q HP 5 ml column (GE Healthcare), and the flowthrough contained >95% pure YPL067C as judged by SDS–polyacrylamide gel electrophoresis. Before each experiment, YPL067C was exchanged into the appropriate buffer. Expression and purification of SeMet YPL067C were performed with the same protocol, except that a methionine-auxotrophic variant of E. coli BL21(DE3) and SelenoMethionine Medium Complete (Molecular Dimensions) were used.
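The buffer arithmetic in this protocol follows from C1V1 + C2V2 = Cfinal(V1 + V2). The sketch below assumes that "diluted in eight volumes" means eight volumes of 20 mM Tris per volume of flowthrough and that solute contributions simply add; it estimates the approximate composition loaded onto the HiTrap Q column and the amount of ULP1 added. The helper `mix` is hypothetical.

```python
def mix(c1, v1, c2, v2):
    """Concentration after mixing volume v1 at c1 with volume v2 at c2 (ideal mixing)."""
    return (c1 * v1 + c2 * v2) / (v1 + v2)


# Assumed: 1 volume of flowthrough (dialysis buffer) + 8 volumes of 20 mM Tris, pH 8.0
nacl_mM = mix(400.0, 1.0, 0.0, 8.0)      # ≈ 44.4 mM NaCl
tris_mM = mix(40.0, 1.0, 20.0, 8.0)      # ≈ 22.2 mM Tris
glycerol_pct = mix(10.0, 1.0, 0.0, 8.0)  # ≈ 1.1% glycerol

# ULP1 added: 10 μl (0.010 ml) of a 50 mg/ml stock
ulp1_mg = 0.010 * 50.0                   # 0.5 mg
```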

α-Synuclein was expressed and purified using the protocol described previously23 with minor modifications. In brief, 1% of the overnight culture was transferred into fresh medium and induced with 0.8 mM isopropyl β-D-1-thiogalactopyranoside for 4 h after the optical density of the culture reached 0.6. The induced cells were pelleted at 4,000 r.p.m. and resuspended in 25 ml lysis buffer (10 mM Tris, 1 mM EDTA, pH 8). The lysed cells were then boiled at 95 °C for 15–20 min and centrifuged at 11,000 r.p.m. for 20 min. The supernatant was thoroughly mixed with 10% streptomycin sulfate (136 μl ml−1) and glacial acetic acid (228 μl ml−1), then centrifuged at 11,000 r.p.m. for 30 min. To the clear supernatant, an equal volume of saturated ammonium sulfate was added, and the solution was incubated at 4 °C for 1 h with intermittent mixing. The precipitated protein was separated by centrifugation at 11,000 r.p.m. for 30 min. The pellet was dissolved in equal volumes of chilled absolute ethanol and 100 mM ammonium acetate. Finally, the pellet was washed (twice; optional) with absolute ethanol, dried at room temperature and resuspended in 10 mM Tris, pH 7.4. The protein solution was filtered through a 50 kDa cutoff column (AMICON, Millipore) followed by ion-exchange chromatography (Q-Sepharose) with a NaCl gradient. The fractions of pure protein eluting at ∼300 mM NaCl were checked by SDS–polyacrylamide gel electrophoresis, and the molecular weight was confirmed by mass spectrometry. The pure fractions were pooled and dialysed overnight against buffer (10 mM Tris and 50 mM NaCl, pH 7.4). The concentration of α-synuclein was determined using ε280 = 5,600 M−1 cm−1. The purified α-synuclein was stored at −80 °C at a concentration of ∼100 μM until use.
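The concentration determination uses the Beer–Lambert law, c = A280 / (ε280 · l). A minimal sketch, assuming a 1 cm path length and a blank-corrected absorbance; the absorbance of 0.56 below is a hypothetical reading chosen to illustrate the ∼100 μM storage concentration, not a measured value.

```python
def molar_concentration(a280, epsilon=5600.0, path_cm=1.0):
    """Beer–Lambert law: c (mol/l) = A / (ε · l), with ε in M^-1 cm^-1 and l in cm."""
    return a280 / (epsilon * path_cm)


# Hypothetical blank-corrected A280 reading of 0.56 in a 1 cm cuvette
c_uM = molar_concentration(0.56) * 1e6   # ≈ 100 μM
```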

Aβ1–40 peptide was purchased from AlexoTech AB (Umeå, Sweden) and prepared as previously described24. Aβ1–40 peptide was dissolved in 10 mM NaOH to a peptide concentration of 1 mg ml−1 and then sonicated for 1 min in an ice bath before dilution in the assay buffer. The preparations were kept on ice. α-Lactalbumin (aLA) from bovine milk (cat: L6010) and porcine citrate synthase (cat: C3260-5KU) were purchased from Sigma Inc. Reduced, carboxymethylated aLA (RCMaLA) was prepared as previously described25. aLA (500 μM; freshly prepared in water) was incubated with 1 mM dithiothreitol in 0.5 M Tris and 1 mM EDTA, pH 7.0, for 10 min; 3 mM iodoacetic acid (from a 1 M stock solution in water) was then added and the solution was incubated for another 30 min. aLA was then dialysed into 50 mM phosphate buffer, pH 7.0, 100 mM KCl, 10 mM MgCl2.

Protein crystallization

Native and SeMet YPL067C crystals were grown at 20 °C by vapour diffusion using both sitting-drop (1 μl drops) and hanging-drop (2 μl drops) methods. Drops were prepared by mixing YPL067C (25 mg ml−1) and reservoir solution (5.6–8.1% glycerol, 1.6–2.1 M ammonium sulfate and 0.1–0.2 M Tris) in a 1:1 ratio. Crystals were cryoprotected by gradually supplementing the drop with glycerol up to 25% and were flash-frozen in liquid nitrogen.

X-ray crystallography

Data were collected at the Life Sciences Collaborative Access Team (LS-CAT) beamlines at the Argonne National Laboratory’s Advanced Photon Source at 100 K. The data were integrated and scaled using HKL2000. Phases and initial model building of the SeMet derivative were obtained using Phenix AutoSolve7,8. Native YPL067C was solved by molecular replacement with the initial SeMet structure. Iterative refinement and model building were performed using Phenix Refine12 and Coot11. Channel size was analysed using the 3V server26. Data collection and modelling statistics are shown in Table 1, and a section of the structure in its 2mFo−DFc map is shown in Supplementary Fig. 10.

Fibrillar aggregation assays

Fibrillar aggregation was monitored by a thioflavine T (ThT) fluorescence assay. ThT is a benzothiazole dye that exhibits enhanced fluorescence specifically on binding to amyloid fibrils. For RCMaLA aggregation experiments, solutions containing 100 μM RCMaLA, YPL067C at varying concentrations and 20 μM ThT were prepared in 50 mM potassium phosphate buffer, pH 7.0, 100 mM KCl and 10 mM MgCl2 (ref. 25). The ThT fluorescence assays with Aβ1–40 peptide were performed with 2.5 μM Aβ1–40 peptide, YPL067C at varying concentrations and 20 μM ThT in PBS, pH 7.4, 1% dimethylsulphoxide. The fibrillar aggregation of α-synuclein was tested in a solution of 70 μM α-synuclein, YPL067C at the desired concentrations and 20 μM ThT in PBS, pH 7.4. For α-synuclein assays, four glass beads were added to each well to induce aggregation.

ThT fluorescence assays were performed with a final volume of 100 μl of the prepared solution in black 96-well microplates (Costar, UV Plate, 96 well) that were sealed to prevent evaporation. ThT fluorescence was measured in a Synergy HT Multi-Mode Microplate Reader (Biotek) at 37 °C with constant medium shaking. Excitation and emission wavelengths were 440 and 490 nm, respectively. All samples were assayed in triplicate and the assay was repeated twice. Incubation of YPL067C with ThT alone produced no fluorescence increase.
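Replicate handling can be illustrated with a short sketch that subtracts a blank, averages the triplicate wells at each time point and reports the time at which the averaged signal first reaches half its maximum. The blank (a ThT-only control well) and the half-time read-out are assumptions for illustration; the protocol above does not state how the kinetic traces were summarized, and both helper names are hypothetical.

```python
from statistics import mean, stdev


def summarize_triplicates(readings, blank):
    """Blank-subtract replicate ThT readings and return (mean, sample SD) per time point.

    readings : list of time points, each a list of replicate fluorescence values
    blank    : fluorescence of a ThT-only control well (assumed)
    """
    out = []
    for reps in readings:
        corrected = [r - blank for r in reps]
        out.append((mean(corrected), stdev(corrected)))
    return out


def half_time(times, signal):
    """First time the signal reaches half its maximum, by linear interpolation.

    Returns None if the trace starts at or above half-maximum.
    """
    half = max(signal) / 2.0
    for i in range(1, len(signal)):
        if signal[i] >= half > signal[i - 1]:
            t0, t1 = times[i - 1], times[i]
            s0, s1 = signal[i - 1], signal[i]
            return t0 + (half - s0) * (t1 - t0) / (s1 - s0)
    return None
```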

Docking of α-synuclein and HTC1

HTC1 was docked against a 200-member NMR ensemble of α-synuclein27 using ZDOCK 3.0.2 (ref. 28). The top five scoring poses of HTC1 bound to each member of the ensemble were used to generate a contact frequency map of the HTC1:α-synuclein interaction. To determine the contact map, an interaction was assigned to a given residue pair $(i, j)$ if their Cα–Cα distance satisfied $d_{ij} \le \lambda \bar{d}_{ij}$, where $\lambda = 1.2$ and the $\bar{d}_{ij}$ are taken from the mean Cα–Cα distance for the corresponding residue pairs that form intermolecular contacts in the PDB29. For each intermolecular residue pair, we reported the contact probability averaged over the extracted binding poses. To project the contact maps onto the structures of α-synuclein and HTC1 on the same scale, the contact frequency for each residue pair was averaged over all residues.
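The contact criterion and the averaging over poses can be sketched as below. The per-residue-type mean distances are placeholders to be filled from a PDB-derived table, which is not reproduced here; keying that table by sorted residue-type pairs, the function name and the data layout are all hypothetical choices.

```python
import numpy as np


def contact_frequency(poses, seq_a, seq_b, mean_dist, lam=1.2):
    """Fraction of docking poses in which each inter-chain residue pair is in contact.

    poses     : list of (ca_a, ca_b) Cα coordinate arrays, shapes (Na, 3) and (Nb, 3)
    seq_a/b   : one-letter residue types of chains a and b
    mean_dist : dict mapping a sorted residue-type pair, e.g. ('A', 'K'), to the
                mean intermolecular-contact Cα–Cα distance in Å (placeholder values)
    A pair (i, j) is a contact when d_ij <= lam * mean_dist[pair type].
    """
    cutoff = np.array([[lam * mean_dist[tuple(sorted((a, b)))] for b in seq_b]
                       for a in seq_a])
    freq = np.zeros_like(cutoff)
    for ca_a, ca_b in poses:
        a = np.asarray(ca_a, dtype=float)
        b = np.asarray(ca_b, dtype=float)
        # All pairwise Cα–Cα distances, shape (Na, Nb)
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        freq += d <= cutoff
    return freq / len(poses)
```

Averaging the resulting matrix along one axis (for example `freq.mean(axis=1)` for chain a) gives one value per residue that can be mapped onto a structure; this is one reading of the per-residue averaging described above.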

Analytical ultracentrifugation

Sedimentation velocity experiments of HTC1 (Supplementary Fig. 9) were performed using a Beckman ProteomeLab XL-I analytical ultracentrifuge (Beckman Coulter). YPL067C was first dialysed against 20 mM HEPES, pH 7.5, then diluted to a concentration of 20 or 200 μM using the dialysis buffer. Samples were loaded into cells containing standard sector-shaped two-channel Epon centrepieces with 1.2 cm path length (Beckman Coulter) and equilibrated to 22 °C in the centrifuge for at least 1 h before sedimentation. All samples were spun at 48,000 r.p.m. in a Beckman AN-50 Ti rotor (167,431.7g at the centre of the cell), and the sedimentation of the protein was monitored continuously using the interference optics. Data analysis was conducted with SEDFIT (version 14.1)30 using the continuous c(s) distribution model. The confidence level for the maximum entropy (ME) regularization was set to 0.7. Buffer density and viscosity were calculated using SEDNTERP (http://sednterp.unh.edu/).
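The quoted g-force follows from the standard conversion RCF = 1.118 × 10−5 × r × rpm², with r in centimetres (the constant is (2π/60)² divided by standard gravity in cm s−2). The radius is not given in the text; r = 6.5 cm is an assumption that reproduces the stated value.

```python
def relative_centrifugal_force(rpm, radius_cm):
    """RCF in multiples of g: 1.118e-5 × r (cm) × rpm², the usual rotor-speed conversion."""
    return 1.118e-5 * radius_cm * rpm ** 2


# Assumed radius of 6.5 cm at the centre of the cell (not stated in the text)
rcf = relative_centrifugal_force(48_000, 6.5)   # ≈ 167,431.7 × g
```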

Data availability

The final model of HTC1 is deposited in the PDB under the code 5KCI. Other models and raw data are available on request.