The Living Data Project

Dr. Diane Srivastava is the Director of the Canadian Institute of Ecology and Evolution (CIEE) and a Professor in the Department of Zoology at UBC. The Living Data Project is an initiative of the CIEE aimed at preserving legacy datasets through collaborations between early-career scientists and holders of legacy data. In one project, UBC PhD candidate Jenny Munoz worked with population data on birds in Burrard Inlet, near Vancouver. This data, now accessible and formatted for long-term use, can help monitor how bird numbers are changing in this habitat.

Diane Srivastava

Data is lost at an extraordinary rate. Some estimates suggest a loss of around 17% per year for data in ecology and evolution. There are all sorts of reasons why data gets lost; the only way to really archive data is to have it in a dedicated data repository.

Hi, my name is Diane Srivastava. I'm a professor in the Department of Zoology and at the Biodiversity Research Centre at UBC. And since 2016, I've also been director of the CIEE, the Canadian Institute of Ecology and Evolution.

So the CIEE is Canada's national synthesis centre, which means that we fund syntheses of existing data, studies, and ideas.

So the Living Data Project is an initiative of the CIEE. It's a national program that seeks to archive legacy data sets in ecology and evolution, preserve them for the future, and breathe new life into them through analysis and synthesis. It's a training program as well, geared toward Canada's new generation of ecology and evolution scientists.

There's been a revolution in open science. Science now has the tools to be much more reproducible, and we have the tools to archive important data well into the future. For ecology and evolution, this means that we have two gaps to fill. The first gap is a training gap. We have these new methods, and we need to make sure that graduate students are trained in these best practices.

The second gap is actually not looking forward, but looking backwards. There are lots of ways that scientists can ensure that their data is properly archived, but this applies only to scientists who have the data skills and data science tools to do it. There is also data that was collected decades ago which is vitally important for Canada: data which will set baselines for global change, data which establishes long-term patterns of population or ecosystem change. And that data is being lost, because the researchers who collected it may no longer be with us. Or they may have grown up in a generation that simply did not have the modern data science skills to enable this.

So what the Living Data Project does is bring together these different generations of scientists: the graduate students we are training right now in these critical skills, and the researchers, often senior researchers, who hold these legacy data sets. And through this partnership, we make sure that we save the data.

Jenny Munoz

I was an intern for the Living Data Project, and I worked alongside researchers from Environment and Climate Change Canada, the Canadian Wildlife Service, and BCIT to consolidate a dataset that they've been collecting for 25 years. The data they are collecting in this project is of conservation importance, because many of the species that occur in the inlet are declining; at least, their local winter populations are decreasing every year. The researchers are trying to understand why, and this data is going to inform that in the future.

When I first got the data, I had at least 40 different Excel spreadsheets, and 25 different years of data meant 25 different ways of collecting the data. It had, I would say, dozens of different variables that I needed to bring into a single format, so it was inconsistent. There were more than 70 different types of variables, and we reduced those to seven.

So I needed to work alongside the researchers at Environment and Climate Change Canada to reduce these to the variables that were actually important for the long-term project.

I think it is time for people to start understanding that the data we collect now is going to be used in the future, not even by us, but by someone in 50 years, or 100 years. And you know, if we collect the data in a way that is consistent and clear, it will be far more useful in the future.
