A storage cloud with 10 Gigabit Ethernet speed and scalability to hundreds of petabytes has been launched to provide virtually unlimited storage capacity to supercomputing customers.

Built by the San Diego Supercomputer Center at UC San Diego, the SDSC Cloud has 5.5PB to begin with, but “is scalable by orders of magnitude to hundreds of petabytes, with aggregate performance and capacity both scaling almost linearly with growth,” the SDSC says.

The supercomputing center believes this is the largest academic cloud storage system in the United States, designed for researchers, students, academics, and industry users who need secure, cost-effective storage for data sets of any size. Each stored object has a unique URL for sharing.
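Since the SDSC Cloud is built on OpenStack Swift (discussed below), each object's shareable URL follows Swift's standard path convention: an API version, the storage account, the container, and the object name. A minimal sketch of how such a URL is composed — the hostname, account, and object names here are hypothetical placeholders, not SDSC's actual endpoint:

```python
# Sketch: composing an OpenStack Swift-style object URL.
# Endpoint, account, container, and object names are hypothetical.
from urllib.parse import quote

def object_url(endpoint, account, container, obj):
    """Build the canonical Swift URL for a stored object."""
    return f"{endpoint}/v1/{account}/{quote(container)}/{quote(obj)}"

url = object_url("https://cloud.example.edu", "AUTH_lab42",
                 "genomics", "run-007/reads.fastq")
print(url)
# https://cloud.example.edu/v1/AUTH_lab42/genomics/run-007/reads.fastq
```

Because the URL is deterministic, sharing a data set is as simple as handing out the link (plus read access on the container).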

“SDSC’s new Web-based system is 100% disk-based and interconnected by high-speed 10 Gigabit Ethernet switching technology, providing extremely fast read and write performance,” the center said in its launch announcement last week. “The SDSC Cloud has sustained read rates of 8 to 10 gigabytes (GB) per second that will continually improve as more nodes and storage are added. That’s akin to reading all the contents of a 250GB laptop drive in less than 30 seconds.”

Supercomputing in the cloud

SDSC’s project is another example of cloud computing expanding the accessibility of high-performance computing (HPC) functionality once reserved for an exclusive set of institutions. Instead of building out huge clusters in their own data centers, customers can outsource supercomputing needs to cloud vendors. Amazon offers special cluster compute instances for just such a purpose, and even built a supercomputer on the Elastic Compute Cloud that ranked among the Top 500 supercomputing sites in the world. Another project recently featured by Ars used the Amazon compute cloud to build a 30,000-core cluster for a pharmaceutical company that ran for about seven hours at a peak cost of $1,279 per hour.

SDSC’s storage cloud is accepting new customers and already has users including several UCSD departments and “federally funded research projects from the National Science Foundation, National Institutes for Health, and Centers for Medicare and Medicaid Services,” the SDSC says. Storage costs begin at $3.25 per month for 100GB. “Condo” pricing is available for more cost-effective long-term use. A HIPAA- and FISMA-compliant option launches Oct. 1.

The SDSC Cloud is based on the OpenStack cloud infrastructure software, with the ability to use Rackspace and Amazon Simple Storage Service APIs, allowing applications built on those platforms access to stored data. The cloud uses “Arista Networks 7508 switches, providing 768 total 10 gigabit (Gb) Ethernet ports for more than 10Tbit/s of non-blocking, IP-based connectivity,” and will soon feature both AES 256-bit encryption and off-site replication to partner UC Berkeley. High-bandwidth wide area connectivity is provided by direct connections to the CENIC, ESNet and XSEDE networks.

The cloud can be accessed through a Web-based interface, a GUI application called Cyberduck, or swift, a Python command-line client; the command line is the best option for uploading files larger than 5GB.
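The 5GB figure reflects OpenStack Swift's per-object size cap: larger files are uploaded as fixed-size segments (the swift client exposes this via a segment-size option) and reassembled on download. A scaled-down sketch of the segmentation idea, using small in-memory sizes in place of multi-gigabyte files:

```python
# Illustrative sketch of segmented uploads: files above a size cap are split
# into fixed-size chunks, as the swift command-line client does for >5GB
# objects. Sizes here are scaled down for the demo.
import io

def split_into_segments(stream, segment_size):
    """Yield fixed-size byte segments from a file-like object."""
    while True:
        chunk = stream.read(segment_size)
        if not chunk:
            break
        yield chunk

data = b"x" * 10_500  # stand-in for a file larger than the cap
segments = list(split_into_segments(io.BytesIO(data), 4_096))
print(len(segments))  # 3 segments: 4096 + 4096 + 2308 bytes
assert b"".join(segments) == data  # segments reassemble losslessly
```

Each segment is stored as its own object, with a manifest tying them together so the file still reads back as one URL.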

Naturally, this isn’t the only supercomputing service offered by SDSC. A 10,000-core supercomputer called Trestles launched earlier this year as a tool for researchers in fields such as astrophysics and molecular dynamics, ranking 151st on the list of the world’s fastest supercomputers with a speed of 67 teraflops. The SDSC storage cloud will work in tandem with other systems like SDSC’s Data Oasis, a parallel file system capable of moving a terabyte of data in less than 20 seconds.

But perhaps just as important as speed is the infrastructure that makes scientific data easily sharable.

“The SDSC Cloud marks a paradigm shift in how we think about long-term storage,” SDSC deputy director Richard Moore says in the cloud announcement. “We are shifting from the ‘write once and read never’ model of archival data, to one that says, ‘if you think your data is important, then it should be readily accessible and shared with the broader community.’”