
Digitization
Find out how a natural history collection is digitized with Mark Pitblado, Curator of Informatics at the Beaty Biodiversity Museum at the University of British Columbia in Vancouver. Lead Curatorial Assistant Chris Stinson also shows us the 3D scanner and tells us about working with Specify, an open-source database system. The digitization of the collection is supported by the Museums Assistance Program - Digital Access to Heritage, from the Government of Canada.
Video Transcript
Hi. My name is Mark Pitblado. I'm the curator of Biodiversity Informatics at the Beaty Biodiversity Museum. So informatics is quite a broad term, and my job does encompass a lot here at Beaty. I handle the data at a museum level, so all of the collections need to share things with the world. I help to facilitate that, and I also manage some of the hardware that we run here in terms of servers and data storage. So a database can come in a couple of different forms. Most people would know a database as an Excel spreadsheet, where you have a series of rows and columns, but it can be more complicated than that. The databases that we use are relational, in which things relate to each other. So, for example, the curators here have a relationship to the objects within their collection, and we want to capture that within our data systems.
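The relational idea described here can be sketched with a pair of tables: each collection object carries a foreign key pointing back to its curator, and a join follows that relationship. Below is a minimal sketch using Python's built-in sqlite3 module; the table and column names are illustrative, not Specify's actual schema.

```python
import sqlite3

# Two related tables: curators, and the objects in their care.
# Names and columns are illustrative, not Specify's real schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE curator (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE collection_object (
    id INTEGER PRIMARY KEY,
    catalog_number TEXT,
    curator_id INTEGER REFERENCES curator(id))""")
cur.execute("INSERT INTO curator VALUES (1, 'Chris Stinson')")
cur.execute("INSERT INTO collection_object VALUES (1, 'M-001234', 1)")

# A join follows the relationship from an object back to its curator.
row = cur.execute("""SELECT o.catalog_number, c.name
                     FROM collection_object o
                     JOIN curator c ON o.curator_id = c.id""").fetchone()
print(row)  # ('M-001234', 'Chris Stinson')
```

In a spreadsheet the curator's name would be retyped on every row; in a relational database it lives in one place and is referenced, which is exactly the kind of relationship the museum wants to capture.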
So the Museums Assistance Program (MAP) is what is facilitating all of the transformative work that we are doing here in the museum. Its purpose is to help us share the treasures that we have in the building across Canada so that everyone can access them, whether they're in Vancouver, British Columbia, or even in Halifax, Nova Scotia. The MAP grant that we got has allowed us to increase the computing power to do 3D scanning and to continue with the 2D imaging program that we've been doing for about the last ten years, but at a vastly increased rate. We have more computer stations thanks to the MAP program, more lights, more technical equipment, both for the 3D and 2D imaging and for entering the data of the specimens that we've had a backlog of. So this program has allowed us to tackle multiple different areas at once.
The first category is data preservation and data storage. We want to make sure that all of the effort that the curators have put in here, and all the expertise that they bring, is safe and stored for decades to come. One of the purposes of a museum is to function as a place of history where people can learn about all of the time that came before them. Second, it allows us to better share what we have within the museum with the wider community. Biodiversity is a global effort, and there are places just like Beaty all across the world; for us to share data with each other means that we can learn from different countries and different communities about the knowledge that they have there and bring it into one cohesive community. And lastly, we can modernize the way that we share data on an individual level. So when you go to our website or interact with our systems, we want that to be as frictionless as possible, so that you can focus on biodiversity, curiosity, and learning about what we have here, and not have to worry about filling out complicated forms or tables.
So when I arrived, the museum had been on the same system for, I think, about a decade, and we were definitely looking to transition to a system that was more aligned with some of the capabilities that we have in 2023. As part of that, we decided to migrate to Specify. Specify is an open-source platform. This allows us to ensure longevity by making sure that we never have lock-in with a particular vendor, and it's something that we can also contribute to. Open source means that we can suggest features, look at the code, and build it with the global community. Specify also allows us to move into the cloud, and this facilitates much better integration with other services around the world and allows access to people from across the country. Data is not just text records. There's a whole bunch of different formats that we have to handle, and each of those has unique things that we need to consider.
So one of the things that we first addressed as part of this MAP project is that we wanted to have the highest-quality images and videos available, including perhaps 3D scans, and have this available to everyone across the country. Now, that presents a lot of difficulties, because images of that quality are also very, very large. And when you move into a cloud-based system where people can access it over the web, we need to set up the groundwork to facilitate that. So as part of that, we purchased a whole bunch of storage capacity for the museum so that we can back up safely all of these high-quality images that are coming in, and then also serve them through the web to people who are interested in looking at the collection. As closely as possible, we want the experience of viewing something on the web to match what it would be like if you actually came to the museum and saw the specimen up close.
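The storage planning described here comes down to simple arithmetic: image count, file size, and how many copies you keep. Here is a back-of-envelope sketch; the counts and file sizes are hypothetical assumptions for illustration, not the museum's actual figures.

```python
# Back-of-envelope storage sizing for high-resolution specimen imaging.
# All numbers below are illustrative assumptions, not real figures.
images = 100_000          # specimens to photograph (assumed)
mb_per_image = 50         # one high-resolution image file (assumed)
replicas = 2              # primary copy plus one backup

total_mb = images * mb_per_image * replicas
total_tb = total_mb / 1_000_000
print(f"{total_tb:.1f} TB")  # 10.0 TB
```

Even with modest per-image sizes, the totals reach terabytes quickly once backups are counted, which is why serving high-quality imagery over the web needs dedicated storage groundwork.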
In previous systems, the data journey was roughly the same, but it had a lot of manual checkpoints that needed to be taken care of by staff. The curators here are incredibly busy, and they have incredibly specific knowledge of what they're dealing with. Having to constantly check a million systems to make sure that a manual update has been pushed is incredibly taxing and not the best use of their time. When we think about the data process that goes into taking a physical specimen and making it available within a digital system, it started off in the early days with just recording everything on paper. That process is entirely inefficient. It leads to a lot of errors, and it's not easily shareable, because you have to hand that one piece of paper off to somebody else who's interested in seeing that record. Photocopying came along, you could scan things, you could also email things, but we want to do that differently now.
The one thing that we wanted to tackle right away was all the different people who are entering data into the system. Whenever you're doing things manually, it can be prone to errors. Even the most well-intentioned people are going to make a mistake; we're all human. It should be the system's job to help you make sure that what you're entering is valid and to catch spelling mistakes from the get-go. This helps ensure that the data we are entering is the best-quality data that we can possibly get, because it's going to exist in the system for quite a while and then also be shared with the world. So the new data-entry process for the curators is that they can log in through a web portal and enter data directly, as if they were interacting with any other website through a form. There will be picklist values to help them choose things that have already been entered into the system. It saves a lot of time when they're doing this work, because one of their main responsibilities is to add things to the collection.
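The system-side validation described here can be as simple as checking each submitted field against an approved picklist before a record is accepted, so typos never reach the database. A minimal sketch in Python; the field names and picklist contents are hypothetical, not the museum's actual configuration.

```python
# Hypothetical picklists: the only values the form will accept.
PICKLISTS = {
    "preparation_type": {"skin", "skeleton", "fluid", "tissue"},
    "collection": {"mammals", "birds", "herpetology"},
}

def validate(record: dict) -> list[str]:
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for field, allowed in PICKLISTS.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not an approved value")
    return errors

print(validate({"preparation_type": "skin", "collection": "mammals"}))  # []
print(validate({"preparation_type": "skn", "collection": "mammals"}))   # catches the typo
```

Because the form offers only approved values and rejects anything else, a slip like "skn" is caught at entry time rather than living in the database for years.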
Even if we're able to save 10 seconds per record, if you think about how many records are stored in the museum and how many entries are made per year, this is a significant time saving for people, and the work is probably not the most exciting; typing things into forms. So I'm glad that we can help in that respect. On top of the curatorial staff that we have here, we also have a number of volunteers, students, and staff who are all interested in contributing to this system. By transitioning to the cloud, we can give user accounts to those individuals to enter things into the system, and that speeds things up: instead of having one person entering data, you can have an entire team helping the curators enter data into their collections.
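The 10-seconds-per-record saving scales directly with record volume. A quick worked example; the annual record count here is a hypothetical assumption for illustration, since the transcript does not give one.

```python
# How a small per-record saving compounds over a year of data entry.
# The record count is a hypothetical assumption, not a museum figure.
seconds_saved_per_record = 10
records_per_year = 50_000  # assumed

hours_saved = seconds_saved_per_record * records_per_year / 3600
print(f"about {hours_saved:.0f} hours per year")  # about 139 hours per year
```

At those assumed volumes, a 10-second saving per record frees weeks of staff time every year for work beyond form-filling.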
Specify just has everything built in already, and the expertise behind it means that we don't have to do that in-house anymore. It's all built into Specify: taxonomy, georeferencing, all the locations, names of people, places. It's all there, and we can spend our time looking after the collections rather than building a loan form or a label tag, or figuring out what the most up-to-date taxonomy is, because it's already built in. I see the future of museum data pretty closely embodying what actual biodiversity ecosystems look like, in which you have multiple different components interacting with each other. Things are happening in real time, lightning fast. Information is constantly shifting and flowing between different organisms. And there are these relationships happening all of the time.
I think that for a long time museums have been viewed as just these static fossils of buildings that never change and are never going to be updated to keep up with the modern era. And I don't think that's true at all. I don't think that has to be true. I think that the museum has a unique capability to be not only a store of historical knowledge, but also this pulse of knowledge that's being put out into the world. And similar to an ecosystem, it connects to other institutions and other people, and it's all one fluid system that for the most part happens behind the scenes. You don't need to think, "Oh, I need to push this button to send this data to somebody else." It all happens in one interconnected way. And as members of the public, you can tap into that. It's available to all, because the knowledge that's contained here is funded by the public, and I think that it should be available to the public, to inspire that curiosity in what we do here.