U.S. Geological Survey releases huge amounts of Mars data in ready-to-use formats. This will allow scientists and the public to view Mars at high resolution with unprecedented ease. There is a huge difference between looking at a photo of the Grand Canyon and seeing it in person. If you want to look at another planet’s landscape, seeing it in person is not an option.
That’s why a team at the U.S. Geological Survey used supercomputers and cloud computing to process and release a treasure trove of ready-to-use Mars data: more than 4,800 digital terrain models, known as DTMs, and more than 155,000 ultra-high-resolution images of the surface of the planet. Using these data products, you can now easily experience the Mars landscape in high res and 3D. It’s the next best thing to seeing Mars in person.
“Now anyone on the planet with a smart phone can search, use, and marvel at these data,” said Jay Laura, lead of the team at USGS’s Astrogeology Science Center that processed the data.
The topographic data come from the Context Camera on the Mars Reconnaissance Orbiter. The Context Camera provides images with a resolution of ~6 meters per pixel covering swathes 30 km (18.6 mi) wide and up to 160 km (100 mi) long. To produce a DTM, two overlapping images of the same area go through sophisticated computer processing to create a 3D view of the overlapping area, just as our brains process information from both eyes for depth perception. The NASA Ames Stereo Pipeline can be used to process individual pairs, but aligning those pairs to each other and to global topography so that they can be seamlessly combined is no small feat.
To do this, the USGS Astrogeology team first roughly aligned the individual DTMs to global low-resolution topography, then fed all of those approximately aligned DTMs back into the computer, working in batches of several hundred DTMs at a time. This took an enormous amount of computer processing, so Astrogeology used the USGS Denali supercomputer, housed at the EROS Data Center in Sioux Falls, SD, to process the data over the course of a few weeks. On a personal computer, this would have taken between two and 35 years of continuous processing!
“These data are important because they democratize the availability of high-quality Mars topographic data,” Laura said. “Getting consistent, well-aligned results is not easy. We felt it was important to generate and release these products so that others could freely access the data. When these data are highly accessible, anyone can contribute to scientific discovery.”
The 4,800 DTMs released so far are just the tip of the iceberg. They represent pairs that were collected alongside higher resolution image pairs from the High Resolution Imaging Science Experiment (HiRISE) camera. There are thousands of other potential CTX image pairs that Astrogeology is working to process, which will eventually result in even greater topographic coverage at 20 meters per pixel.
Big Data Require Big Processing
USGS Astrogeology is also pioneering a new way to access and work with images from HiRISE. Each individual HiRISE image covers only a small patch of the planet’s surface – strips just 6 km (3.7 mi) wide by up to 60 km (37 mi) long – but with a resolution of 25 cm (9.8 in) per pixel, the images can be upward of 1 GB apiece and are bursting with detail.
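To put those footprints in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration using the strip sizes and resolutions quoted in this article; the function name is made up for the example, and real images vary in length and may be stored at other sampling, so actual pixel counts and file sizes will differ.

```python
def pixel_dimensions(width_km, length_km, metres_per_pixel):
    """Return (columns, rows) for a strip of the given ground size.

    Hypothetical helper for illustration; it simply divides ground
    distance by ground sampling distance and rounds to whole pixels.
    """
    cols = round(width_km * 1000 / metres_per_pixel)
    rows = round(length_km * 1000 / metres_per_pixel)
    return cols, rows


# Maximum strip sizes quoted in the article.
ctx = pixel_dimensions(30, 160, 6.0)      # Context Camera, ~6 m per pixel
hirise = pixel_dimensions(6, 60, 0.25)    # HiRISE, 25 cm per pixel

ctx_pixels = ctx[0] * ctx[1]
hirise_pixels = hirise[0] * hirise[1]

print(f"CTX strip:    {ctx[0]:,} x {ctx[1]:,} pixels ({ctx_pixels:,} total)")
print(f"HiRISE strip: {hirise[0]:,} x {hirise[1]:,} pixels ({hirise_pixels:,} total)")
print(f"A full HiRISE strip holds about {hirise_pixels / ctx_pixels:.0f} times "
      f"as many pixels as a full CTX strip")
```

Under these assumptions, a full-length HiRISE strip covers one twenty-fifth the width of a Context Camera strip yet contains dozens of times more pixels.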
The huge data size of HiRISE images is a blessing and a curse: to work with multiple images at once, scientists previously needed to download full images in their area of study one at a time, process them so that they could be viewed at different zoom levels, and add them to their map. The level of detail in each image also means that the entire data set is rich with information that no scientist has had the time to study.
USGS Astrogeology has led an effort to process over 155,000 HiRISE images and convert them to a streamable form that can be easily accessed free of charge, making it possible to perform analysis in the cloud, access some of a data product without downloading the whole thing, view the images on a cell phone or tablet, or search and download them in batches in ready-to-use form.
Massive, fully processed data sets like this have been identified by the NASA Planetary Data Ecosystem Independent Review Board as necessary to support Artificial Intelligence and Machine Learning efforts in planetary science. “These data are ripe for discovery and use by machines and humans. This data release means that the HiRISE data set can now be seamlessly leveraged by machine learning scientists,” Laura said.
Releasing the entire HiRISE data catalog was the culmination of significant effort from a multi-disciplinary team of planetary scientists and cloud computing experts from across the USGS. The team worked to test and refine the processing pipeline to ensure that it ran successfully and produced reliable, scientifically useful images and associated metadata. This feat of modern computing served as a demonstration of the cutting-edge HTCondor High Throughput Computing environment paired with the USGS Cloud Hosting Solutions Amazon Web Services environment. Data were streamed from the NASA Planetary Data System cloud holdings, with more than 4,000 images processed simultaneously. It took just under four hours of computing time to process the entire data set of more than 155,000 images, roughly 114 TB of data.
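Those figures imply a striking sustained throughput, which the rough Python sketch below works out. It rests on several assumptions that the article does not state: that 1 TB means 1,000 GB, that the run took a full four hours (it took slightly less, so the real rates are a little higher), and that exactly 4,000 images were in flight at any moment (the article says more than 4,000). The variable names are chosen for this example only.

```python
# Figures quoted in the article (rounded; treated here as exact).
IMAGES = 155_000      # images processed
TOTAL_TB = 114        # total data volume, in terabytes
HOURS = 4             # upper bound: the run took "just under" four hours
CONCURRENT = 4_000    # lower bound on images processed simultaneously

seconds = HOURS * 3600

avg_gb_per_image = TOTAL_TB * 1000 / IMAGES    # assumes 1 TB = 1,000 GB
tb_per_hour = TOTAL_TB / HOURS
gb_per_second = TOTAL_TB * 1000 / seconds
images_per_second = IMAGES / seconds

# If exactly CONCURRENT images were always in flight, each slot would
# handle this many images, at this average wall-clock time apiece.
images_per_slot = IMAGES / CONCURRENT
minutes_per_image = HOURS * 60 / images_per_slot

print(f"Average image size:   {avg_gb_per_image:.2f} GB")
print(f"Aggregate throughput: {tb_per_hour:.1f} TB/hour ({gb_per_second:.1f} GB/s)")
print(f"Completion rate:      {images_per_second:.1f} images/s")
print(f"Per-slot workload:    {images_per_slot:.1f} images, "
      f"about {minutes_per_image:.1f} minutes each")
```

In other words, the pipeline finished roughly ten images every second, even though any single image could have taken several minutes from start to finish.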
“With data releases like this, USGS is taking the lead to develop and release analysis-ready planetary science data,” Laura said. “These data are hosted by Amazon in their Open Data Registry for anyone to use for free. They are processed to the USGS’s highest standard and then converted into a cloud-enabled format that will stream and just work in analysis platforms, like a GIS. We are also releasing search tools so that users can easily find and download these data and documentation to make these data more approachable and usable by everyone.”