During the visit to South Africa by Netherlands Prime Minister Mark Rutte, key institutions from both countries will sign a pivotal South African-Dutch data science partnership on Tuesday, 17 November 2015. The partnership brings us closer to understanding the volume of data that the Square Kilometre Array (SKA) will generate.

The agreement is a step towards unlocking the secrets hidden in the immense amount of data that the SKA, set to be the world’s biggest radio telescope, will generate. It forms part of the visit to South Africa by the Prime Minister of the Netherlands, Mr. Mark Rutte, and his trade delegation of 75 companies.

SKA South Africa and the University of Cape Town, through the newly established Inter-University Institute for Data Intensive Astronomy (IDIA), will sign a Memorandum of Understanding (MoU) with fellow research institutions in the Netherlands, IBM and ASTRON, to collaborate in a ground-breaking research project entitled Precursor Regional Science Data Centers for the SKA (SKA-RSDC).

The MoU will be celebrated as part of ‘House of the Future’ — a program of workshops, seminars, presentations and round tables with South African and Dutch stakeholders, taking place from 16 to 20 November 2015 in Turbine Hall, Johannesburg.

The South African-Dutch agreement on data science aims to establish national and regional data centers to tackle one of the most significant challenges presented by the SKA: how to manage, process, and make accessible the immense amount of data the telescope will generate.

The data centers will provide astronomers around the world with access to the large-scale data infrastructures and associated high performance computing (HPC) needed to make sense of the data.

“The initial focus of the centers will be to service the current and future data archiving, distribution and science exploration needs of the MeerKAT and LOFAR radio telescopes in SA and the Netherlands, respectively. The activity, combining both operational and research components, is an important step on the path towards being able to efficiently extract major science value from the massive astronomical datasets which will be collected by the SKA,” says Dr. Jasper Horrell, General Manager: Science Computing and Innovation at SKA South Africa.

The techniques developed can, in turn, be applied in other fields such as big data analytics, high performance computing, green computing, and visualization analytics.

The Data Challenge for SKA

The SKA (Square Kilometre Array) will be the world’s largest radio telescope, a hundred times bigger than any current radio telescope, and it will revolutionize our understanding of the universe. SKA will be built in two phases, SKA1 and SKA2, starting in 2018. SKA1 will include two instruments, SKA1 MID (to be built in South Africa) and SKA1 LOW (to be built in Australia), which will observe the universe at different frequencies.

As astronomy has developed, it has become increasingly clear that the old ways of working with data no longer apply. Precursors to SKA (telescopes on one of the two SKA sites) and Pathfinders (SKA-related technology, science and operations activities) have ushered in an era of data-intensive astronomy. One such Pathfinder, LOFAR (the Low Frequency Array telescope built by ASTRON, the Netherlands Institute for Radio Astronomy), has a data collection exceeding 20 petabytes. As a frame of reference, it would take about 2,000 years to play just one petabyte of MP3-encoded songs.
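The playback comparison can be checked with a rough calculation. The sketch below assumes a decimal petabyte (10^15 bytes) and a constant MP3 bitrate of 128 kbit/s; neither figure is stated in this release, so the result is indicative only.

```python
# Rough check of the playback-time comparison.
# Assumptions (not from the release): 1 PB = 10**15 bytes, MP3 at 128 kbit/s.
PETABYTE_BYTES = 10**15
MP3_BITRATE_BITS_PER_S = 128_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def playback_years(n_bytes, bitrate_bits_per_s=MP3_BITRATE_BITS_PER_S):
    """Years needed to play n_bytes of audio encoded at a constant bitrate."""
    seconds = n_bytes * 8 / bitrate_bits_per_s
    return seconds / SECONDS_PER_YEAR


years_per_pb = playback_years(PETABYTE_BYTES)
print(f"1 PB of MP3 audio: about {years_per_pb:.0f} years")
print(f"20 PB of MP3 audio: about {20 * years_per_pb:.0f} years")
```

Under these assumptions one petabyte comes out close to 2,000 years of continuous playback, and a 20-petabyte collection to roughly twenty times that.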

With South Africa’s MeerKAT and the Netherlands’ Apertif (APERture Tile in Focus) telescopes both expected to come online in 2016, the scale of such data collection is poised to increase significantly. The large scale of the datasets and the requirements of the astronomers to perform complex scientific analyses, which are often compute-intensive, demand innovative approaches. Data at these scales present unique challenges not just for managing the collection, but also for how researchers extract their science.

In all of these Precursor and Pathfinder facilities, the data gathering and initial processing are done onsite, close to the instruments themselves, under the control and development of the core telescope project teams. However, to make as much sense of, and derive as much value from, the data as possible, this first level of data must be made available to a broader scientific community; hence the need to develop innovative ways to access, manage and process the data. This is what the SA-Dutch Regional Science and Data Centers (RSDCs) hope to realize.

How RSDCs Could Assist, Including More Technical Detail

With traditional radio facilities, most of the subsequent analysis, including providing the required processing and storage resources, has been the responsibility of the individual researcher or science team. However, at the data scales of the SKA, this is infeasible. One way to address this issue is to establish national or regional centers to provide users with access to the large-scale high performance computing (HPC) infrastructure they will require to extract the full range of SKA science.

The main aims of these Regional Science and Data Centers (RSDCs) will be:
* to ensure that the data collected by the instruments is well-curated and made available in an easily accessible way to downstream science processing centers and institutes;
* to maintain long-term archives of science data products;
* to provide sufficient extra compute and storage for researchers to be able to reprocess data, perform customized analysis, and visualize results without necessarily having to move all the data to their local facilities;
* to provide expert support to users with their specific analyses;
* to develop and maintain new tools and functionality to increase scientific exploitation of the data collections;
* to provide mechanisms for security and federation.

The SA-Dutch collaboration will consider a multi-tier model, similar to that used by CERN, where the core SKA telescope facilities would produce the initial data streams, while the RSDCs would provide sufficient resources to store subsets of the SKA archive, support significant processing and post-processing capability, and further distribute data to users and smaller sites. An alternative could be a more dynamic multi-cache approach where data is distributed in a demand-driven, flexible way. Given their likely scale and range of function, it is natural that these RSDC facilities may hold data from more than a single instrument, or even multiple disciplines.
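To illustrate what demand-driven distribution could mean in practice, the following is a minimal sketch of a regional center that holds only the data products its users have recently requested and fetches the rest from a core archive. The class name, the product identifiers and the least-recently-used eviction rule are all assumptions made for illustration; the release does not describe the design the collaboration will adopt.

```python
from collections import OrderedDict


class RegionalCache:
    """Hypothetical regional center that keeps recently requested data products."""

    def __init__(self, capacity, fetch_from_core):
        self.capacity = capacity                # max number of products held locally
        self.fetch_from_core = fetch_from_core  # callable standing in for the core archive
        self.store = OrderedDict()              # product id -> data, least recent first

    def get(self, product_id):
        if product_id in self.store:
            self.store.move_to_end(product_id)   # mark as most recently used
            return self.store[product_id]
        data = self.fetch_from_core(product_id)  # miss: pull from the core facility
        self.store[product_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)       # evict the least recently used product
        return data


core_requests = []  # records every transfer from the (simulated) core archive


def core_archive(product_id):
    core_requests.append(product_id)
    return f"data for {product_id}"


center = RegionalCache(capacity=2, fetch_from_core=core_archive)
for pid in ["cube-a", "cube-b", "cube-a", "cube-c", "cube-b"]:
    center.get(pid)
print(core_requests)
```

In this toy run the second request for `cube-a` is served locally, so only four of the five requests reach the core archive. A fixed multi-tier model would instead assign subsets of the archive to each center in advance.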

South Africa, Dutch Collaboration to Establish RSDCs

Where possible, the work will build on the knowledge and holistic models already developed as part of the DOME Project, a Dutch government-funded project between ASTRON and IBM. Precursor instruments like MeerKAT and Pathfinder facilities like LOFAR will provide an opportunity to tackle specific, realistic problems.

“We propose to establish a federated system that links the MeerKAT, LOFAR, and Apertif science archives with distributed RSDC facilities. We assume that there will be at least two astronomy-focused sites, one each in South Africa and Netherlands,” says Professor Russ Taylor, IDIA founding director and joint UCT/UWC SKA Research Chair.

Lorenzo Raynard
SKA South Africa
+27 (0)71 454 0658

Professor Russ Taylor
Director: Inter-University Institute for Data Intensive Astronomy
SKA Research Chair: University of Cape Town & University of the Western Cape
+27 (0)60 803 9133

Professor Frikkie van Niekerk
Deputy Vice-Chancellor: Research, Innovation & Technology
North-West University
+27 (0)83 676 7236

The Inter-University Institute for Data-Intensive Astronomy (IDIA) was launched on 3 September 2015. This partnership between the University of Cape Town, the University of the Western Cape and the North-West University will develop crucial capacity for big data management and analysis, a spin-off of the SKA project. The R50-million, five-year IDIA partnership will integrate researchers in astronomy, computer science, statistics and eResearch technologies to create data science capacity for leadership in the MeerKAT SKA precursor projects, other precursor and pathfinder programs and SKA key science.

ASTRON, the Netherlands Institute for Radio Astronomy, is a division of the Netherlands Organization for Scientific Research (NWO). ASTRON operates two well-known observatories in the Netherlands: the Westerbork Synthesis Radio Telescope and LOFAR, the Low Frequency Array.

The SKA project is an international effort to build the world’s largest radio telescope, with a square kilometer (one million square meters) of collecting area. The first phase of SKA will be built in South Africa and in Australia. In South Africa, the SKA site is located in the Karoo near Carnarvon in the Northern Cape Province. SKA South Africa is building MeerKAT, a 64-antenna radio telescope array, which serves as a precursor instrument to SKA and will be integrated into SKA Phase 1.

The DOME project investigates approaches in exascale computing (which refers to computing systems capable of at least a billion billion calculations per second) at the ASTRON & IBM Center for Exascale Technology in Drenthe, the Netherlands. The research is targeted at the specifications of SKA. SKA South Africa joined the DOME project in December 2012.

IBM-NL and ASTRON have been working together since 2012 in a five-year collaboration totaling EUR 32.9 million to research the exascale computer systems that SKA will need.

MoU signatories: Mike Garrett, representing NWO and ASTRON; Alexander Brink, representing IBM; Jasper Horrell, representing SKA-SA/NRF; and Russ Taylor, representing IDIA/UCT.

Issued by the Square Kilometre Array South Africa (SKA SA) and the University of Cape Town, through the Inter-University Institute for Data Intensive Astronomy (IDIA).