Last updated January 29th, 2020.

Albums available in the Boston Public Library Vinyl LP Collection on Archive.org.

Imagine if your favorite song or a nostalgic recording from childhood were lost forever. This could be the fate of hundreds of thousands of recordings stored on vinyl, except that the Internet Archive is now expanding its digitization project to include LPs.

Earlier this year, the Internet Archive began working with the Boston Public Library (BPL) to digitize more than 100,000 audio recordings from their sound collection. The recordings exist in a variety of historical formats, including wax cylinders, 78 rpms, and LPs. They span musical genres including classical, pop, rock, and jazz, and contain obscure recordings like this album of music for baton twirlers, and this record of radio’s all-time greatest bloopers.

Unfortunately, many of these recordings were never transferred to digital formats and are therefore locked in their physical media. To prevent them from disappearing forever when the vinyl is broken, warped, or lost, the Internet Archive is digitizing these at-risk recordings so that they will remain accessible to future listeners.



“The LP was our primary musical medium for over a generation. From Elvis, to the Beatles, to the Clash, the LP was witness to the birth of both Rock & Roll and Punk Rock. It was integral to our culture from the 1950s to the 1980s and is important for us to preserve for future generations.” – CR Saikley, Director of Special Projects, Internet Archive

Since all of the information accompanying an LP is printed rather than digital, the digitization process must begin with cataloging. High-resolution scans are taken of the cover art, the disc itself, and any inserts or accompanying materials. The record label, year recorded, track list, and other metadata are supplemented and cross-checked against various external databases.



High-resolution imaging of album cover art. The boxed area is shown at high resolution at right.



“We’re really trying to capture everything about this artifact, this piece of media. As an archivist, that’s what we want to represent, the fullness of this physical object.” – Derek Fukumori, Internet Archive Engineer

Once cataloged, the LPs are digitized. The Internet Archive partners with Innodata Knowledge Services, an organization focused on machine learning and digital data transformation, to complete the digitization process at their facilities in Cebu, Philippines. An Innodata worker digitizes 12 LPs at a time, setting turntables to play and record by hand, then turning each record over to the next side. Since each LP is digitized in real time, it takes a full 20 minutes to record an average LP side. By operating 12 turntables simultaneously, the team expects to be able to digitize ten LPs per hour.
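The throughput figures above can be checked with some quick arithmetic. The sketch below is illustrative only: the function name and the assumption of two sides per LP are made for this example, and the result is an upper bound that ignores any handling time between sides.

```python
def theoretical_lps_per_hour(turntables: int,
                             minutes_per_side: float,
                             sides_per_lp: int = 2) -> float:
    """Upper bound on LPs digitized per hour when recording in real time.

    Assumes every turntable runs continuously with no time lost to
    flipping, cleaning, or setup (a simplification for illustration).
    """
    minutes_per_lp = minutes_per_side * sides_per_lp
    lps_per_turntable_per_hour = 60 / minutes_per_lp
    return turntables * lps_per_turntable_per_hour


# 12 turntables, 20 minutes per average side (figures from the article)
upper_bound = theoretical_lps_per_hour(turntables=12, minutes_per_side=20)
```

With the article's figures, the upper bound works out to 18 LPs per hour. The expected rate of ten per hour presumably leaves room for turning records over and other manual steps, though the article does not break that down.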



Audio stations complete with turntables & recording equipment set up in Cebu, Philippines.

Recording produces one large FLAC file for each side of the LP, which needs to be segmented into tracks so listeners can easily begin at the desired song. Two different algorithms are used for segmenting. The first looks at images of the vinyl disc to locate gaps in its grooves, which usually line up with gaps between songs. The second listens to the audio file to find the silent spaces between songs. When these two algorithms agree, our engineers have a good measure of confidence that the machine has found the proper tracks.
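To make the idea concrete, here is a minimal sketch of the audio-based approach and of the agreement check between the two methods. It is not the Internet Archive's actual code: the function names, the RMS loudness threshold, the window size, and the tolerance are all hypothetical choices, and the groove-image analysis is represented only by a list of timestamps it is assumed to have produced. The samples are assumed to be already decoded from FLAC into floats between -1 and 1.

```python
from typing import List, Sequence, Tuple


def find_silent_gaps(samples: Sequence[float],
                     sample_rate: int,
                     threshold: float = 0.01,
                     min_gap_seconds: float = 1.0,
                     window_seconds: float = 0.05) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of quiet stretches.

    A window counts as quiet when its RMS level is below `threshold`;
    consecutive quiet windows form a gap if they last long enough.
    """
    window = max(1, round(sample_rate * window_seconds))
    n_windows = len(samples) // window  # a trailing partial window is ignored
    gaps: List[Tuple[float, float]] = []
    run_start = None

    for w in range(n_windows):
        chunk = samples[w * window:(w + 1) * window]
        rms = (sum(x * x for x in chunk) / len(chunk)) ** 0.5
        t = w * window / sample_rate
        if rms < threshold:
            if run_start is None:
                run_start = t  # a quiet run begins here
        elif run_start is not None:
            if t - run_start >= min_gap_seconds:
                gaps.append((run_start, t))
            run_start = None

    # Close a quiet run that lasts until the end of the audio.
    if run_start is not None:
        end = n_windows * window / sample_rate
        if end - run_start >= min_gap_seconds:
            gaps.append((run_start, end))
    return gaps


def agreeing_boundaries(audio_gaps: Sequence[Tuple[float, float]],
                        groove_gap_times: Sequence[float],
                        tolerance_seconds: float = 2.0) -> List[float]:
    """Keep audio gap midpoints that a groove-image gap confirms.

    `groove_gap_times` stands in for the output of the image-based
    algorithm (hypothetical here): times where a groove gap was seen.
    """
    confirmed = []
    for start, end in audio_gaps:
        midpoint = (start + end) / 2
        if any(abs(midpoint - g) <= tolerance_seconds for g in groove_gap_times):
            confirmed.append(midpoint)
    return confirmed
```

For a synthetic signal with three seconds of sound, two seconds of silence, and three more seconds of sound, `find_silent_gaps` reports a single gap from 3.0 to 5.0 seconds. `agreeing_boundaries` keeps its midpoint only if the image-based list contains a time within the tolerance. A simple threshold like this would struggle with exactly the cases that are hard in practice, such as applause between live tracks or deliberate silences inside a piece.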

These algorithms currently find track boundaries with about 85% to 95% accuracy, but some recordings are more difficult. For example, recordings of live music fill the spaces between songs with applause, while classical music uses silence as part of a piece. To account for these anomalies, digitized LP files are always checked manually before being added to the online database.

Identifying the empty spaces between songs for segmenting.

Currently, there are more than 5,800 LPs from the Boston Public Library LP collection available on Archive.org. The Internet Archive continues to digitize the remainder of the BPL collection in addition to more than 285,000 LPs that have been donated by others. The organization aims to engage a greater community of LP and 78 rpm enthusiasts by welcoming contributions and improvements to the recorded metadata. Many of the audio files online can be listened to in full, but some of the albums are only available in 30-second snippets due to rights issues.



“The complexity of properly digitizing LPs has been an evolving challenge, but thanks to the help of friends of the Archive, our in-house expertise, and the dedication of Innodata, I’m confident we’ve nailed it.” – Merlijn Wajer, Internet Archive Developer

For decades, vinyl records were the dominant storage medium for every type of music and are ingrained in the memories and culture of several generations. Despite the challenges, the Internet Archive is determined to preserve these at-risk records so that they can be heard online by new audiences of scholars, researchers, and music lovers around the world.