New Open-Source System Developed to Manage and Share Complex Datasets

Researchers have created a new open-source data-management system for scientists, in the hope that it will make collaboration easier.

Simplifying How Scientists Share Data

Data is often at the heart of science – researchers track velocities, measure light coming from stars, analyze heart rates and cholesterol levels and scan the human brain for electrical impulses.

But often, sharing that data with other scientists – or with peer-reviewed journal editors, or funders – is difficult. The software might be proprietary, and prohibitively expensive to purchase. It might take years of training for a person to be able to manage and understand the software. Or the company that created the software might have gone out of business.

A research team has developed an open-source data-management system that the scientists hope will solve all of those problems. The researchers outlined their system on January 2, 2020, in the journal PLOS ONE.

“We wanted to create a file format and a dataset model that would encapsulate the majority of datasets we work on, on all the instruments in a lab,” said Philip Grandinetti, professor of chemistry at The Ohio State University and senior author of the paper. “There’s this long-standing problem, pervasive among scientists, that you buy a multimillion-dollar instrument and the companies that make that instrument have their own proprietary format, and it’s a nightmare to share with anyone else.”

Large datasets are tricky to share, partly because the software is often proprietary and partly because the files are often too large to send by email or through a cloud-based server. And even when the files can be exported to a shareable file type, important metadata – the information that explains what the dataset actually is – is often lost.

Their system, which Grandinetti and colleagues named the “Core Scientific Dataset Model,” is designed to share complex datasets easily, without massive files that consume bandwidth and hard-drive space, and without losing metadata. Consider a dataset that includes air temperature, air pressure, wind velocity and solar flux – this system can handle it. Or consider the measurements and color of the light coming from a star in a distant galaxy – this system can handle that, too.

“You need a dataset that is incredibly flexible in its ability to hold all those things in one file format without losing information,” Grandinetti said. “So the idea is we created a model that we thought was flexible enough to do that.”
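The article does not describe the file layout itself. As a rough illustration only, the Python sketch below stores a weather dataset like the one described above as a single JSON document loosely modeled on the CSDM idea: one shared time dimension, plus one dependent variable per measured quantity, each carrying its own name, unit and quantity type. The key names, the choice to pack values as base64-encoded float32 to keep the file compact, and the helpers make_dataset, encode_float32 and decode_float32 are our assumptions for this example, not details taken from the paper; the published specification is the authority on the real format.

```python
import base64
import json
import struct


def encode_float32(values):
    """Pack floats as little-endian float32 and return base64 text."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_float32(text):
    """Inverse of encode_float32: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def make_dataset(count, increment, series):
    """Build a CSDM-style dict (hypothetical key names) with one shared
    linear time dimension and one dependent variable per quantity.

    `series` maps a variable name to (quantity_name, unit, components),
    where `components` is a list of equal-length value lists: one list
    for a scalar, two for a 2-D vector such as horizontal wind velocity.
    """
    variables = []
    for name, (quantity, unit, components) in series.items():
        if any(len(c) != count for c in components):
            raise ValueError(f"{name}: every component needs {count} values")
        variables.append({
            "type": "internal",
            "name": name,
            "quantity_name": quantity,
            "unit": unit,
            "numeric_type": "float32",
            # One component means a scalar; n components mean an n-vector.
            "quantity_type": "scalar" if len(components) == 1 else f"vector_{len(components)}",
            "encoding": "base64",
            "components": [encode_float32(c) for c in components],
        })
    return {
        "csdm": {
            "version": "1.0",
            "dimensions": [{
                "type": "linear",
                "count": count,
                "increment": increment,
                "quantity_name": "time",
            }],
            "dependent_variables": variables,
        }
    }


# Hourly readings (made-up numbers): every quantity shares one time axis.
weather = make_dataset(3, "1 h", {
    "air temperature": ("temperature", "K", [[288.1, 288.4, 289.0]]),
    "air pressure": ("pressure", "hPa", [[1013.2, 1012.8, 1012.5]]),
    "wind velocity": ("velocity", "m/s", [[2.0, 2.5, 3.1], [0.4, -0.2, 0.1]]),
    "solar flux": ("irradiance", "W/m^2", [[0.0, 120.5, 340.2]]),
})

text = json.dumps(weather, indent=2)  # one self-describing file
restored = json.loads(text)
wind = restored["csdm"]["dependent_variables"][2]
print(wind["unit"], wind["quantity_type"])  # m/s vector_2
print(decode_float32(wind["components"][0]))
```

Keeping each quantity's unit and name next to its values is what lets such a file explain itself after it has been exported, which is the metadata problem described earlier in the article.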

The Ohio State University team, in collaboration with Professor Thomas Vosegaard at the University of Aarhus in Denmark and Dr. Dominique Massiot at the University of Orléans in France, built software that can run on a Mac or PC. They uploaded it to the web and made the code open-source, meaning anyone can view, use and download it for free. The choice of PLOS ONE was also intentional: the journal is free for anyone to read.

And, the researchers hope, the system could be a simple, free way to combine multiple types of data into one place.

“We study multiple datasets as scientists – and as a scientist myself, I’d like to be able to get the data from all those files and put them together in a way that I can work with,” said Deepansh Srivastava, a postdoctoral researcher in Grandinetti’s group.

“Instead of looking for data and plucking it from datasets, if we could simply export it as this one file type – as a core scientific data file type – we’d be able to work in a common system.”
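Srivastava's point is that once each instrument's output has been exported to the same format, combining datasets becomes routine. The Python sketch below illustrates that idea under stated assumptions: documents follow a simplified, CSDM-like JSON layout whose key names are our own guess, and the helpers merge_datasets and single_variable_doc are hypothetical, not part of any published tool. Two series recorded separately on the same hourly time axis are merged into one document.

```python
import copy
import json


def merge_datasets(*docs):
    """Combine CSDM-style documents (hypothetical key names) that share
    identical dimensions into one document holding every dependent variable.

    Raises ValueError if the dimensions differ or two variables share a name,
    since either case would make the merged file ambiguous.
    """
    if not docs:
        raise ValueError("need at least one dataset")
    dimensions = docs[0]["csdm"]["dimensions"]
    merged, seen = [], set()
    for doc in docs:
        if doc["csdm"]["dimensions"] != dimensions:
            raise ValueError("datasets must share the same dimensions")
        for var in doc["csdm"]["dependent_variables"]:
            if var["name"] in seen:
                raise ValueError(f"duplicate variable name: {var['name']}")
            seen.add(var["name"])
            merged.append(copy.deepcopy(var))  # leave the inputs untouched
    return {
        "csdm": {
            "version": docs[0]["csdm"]["version"],
            "dimensions": copy.deepcopy(dimensions),
            "dependent_variables": merged,
        }
    }


def single_variable_doc(name, unit, values, increment="1 h"):
    """Hypothetical helper: wrap one series, stored as plain JSON numbers."""
    return {
        "csdm": {
            "version": "1.0",
            "dimensions": [{"type": "linear", "count": len(values), "increment": increment}],
            "dependent_variables": [{
                "type": "internal",
                "name": name,
                "unit": unit,
                "quantity_type": "scalar",
                "components": [values],
            }],
        }
    }


# Two recordings exported separately (made-up numbers), then combined.
temperature = single_variable_doc("air temperature", "K", [288.1, 288.4, 289.0])
pressure = single_variable_doc("air pressure", "hPa", [1013.2, 1012.8, 1012.5])
combined = merge_datasets(temperature, pressure)
print(json.dumps(combined, indent=2))
```

The merge refuses inputs whose axes differ rather than silently misaligning them; a real tool might instead resample or regrid, which is beyond this sketch.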

Reference: “Core Scientific Dataset Model: A lightweight and portable model and file format for multi-dimensional scientific data” by Deepansh J. Srivastava, Thomas Vosegaard, Dominique Massiot and Philip J. Grandinetti, 2 January 2020, PLOS ONE.
DOI: 10.1371/journal.pone.0225953
