HPC, Clouds & Big Data Converge at ISC Cloud 2012 - Part One
Last week EGI held its Fall Technical Forum in Prague, co-located with GlobusEUROPE. The honor of delivering the opening keynote went to Bob Jones of CERN, who started off with an overview of CERN's big data requirements and the progress of the EU Science Cloud, Helix Nebula.
During the EGI Fall Technical Forum in Prague, representatives from Helix Nebula partner institutions CERN and EMBL delivered presentations.
According to Jones, Helix Nebula is shaping up nicely as its current two-year pilot phase continues. CERN is one of three flagship users, alongside the European Molecular Biology Laboratory (EMBL) and the European Space Agency (ESA). These three demand-side partners were picked expressly for the scope of their research and computing requirements. If the pilot succeeds, the next set of users will have the assurance of a solution that has been vetted on some of the most data-intensive problems in science. The machine at CERN is capable of generating 1 petabyte of data per second, although only 1 percent of that goes through to the next stage. In 2011, about 22 PB of data were written, and in 2012 the figure is expected to jump to 30 PB.
Jones noted that science will continue to push against processing boundaries, with the upper limits defined more by economic realities and budgets than by anything else. Starting in 2013, the accelerator will shut down for 13 months to be upgraded, and afterwards it will generate even more data. The processing demand is "basically limitless," according to Jones.
The second speaker was EMBL's Rupert Lueck, who discussed the needs of systems biology scientists and the study of DNA and life on Earth. While next-generation technologies have made genetic sequencing more affordable, the process of reading and assembly requires a lot of computing infrastructure and expertise, Lueck noted. To give an idea of the big-picture scope of next-gen sequencing, there are an estimated 8.7 million species in the world. Existing sequencing capacity worldwide can easily generate exabytes of new data each year. EMBL's flagship project for Helix Nebula will implement a novel cloud service to simplify large-scale genome analysis. Tailor-made, on-demand HPC and bioinformatics resources will help scientists, inside and outside EMBL, to better meet the big data challenge.
The scheduled ESA representative was not able to make it to the event, but both Jones and Lueck provided a sense of the challenges involved in getting multiple commercial service providers to work with each other. From speaking with many of the project participants, however, one message stands out: there is a strong willingness among all of them to work together to find solutions. Consensus-building is key, and enabling communication in the form of regular meetings and feedback loops is essential to a project of this scope.
HELIX NEBULA PARTNERSHIP
Helix Nebula is a new, pioneering partnership between leading IT providers and four of Europe’s biggest research centres, CERN, EMBL, ESA and PIC, charting a course towards sustainable cloud services for the research communities - the Science Cloud.