• Fri. Jun 2nd, 2023

Grad student uses data science to discover biological diversity


May 26, 2023

Using data science applied to plant and animal records at natural history museums, UO graduate student Jordan Rodriguez is finding new ways to study the evolution of crucial proteins.

As an undergraduate, Rodriguez embarked on a research project looking at the biases and limitations of biodiversity records from natural history collections and databases like iNaturalist. That work led to a recent publication in Nature Ecology and Evolution.

Now she’s a graduate student in biology professor Andrew Kern’s lab at the UO, using machine learning methods to trace the evolution of protein diversity.

“I learned the statistical power of working with big data, but my first research experience really set the stage for understanding the hidden pitfalls of data,” Rodriguez said.

Having millions of data points can be very useful, she said, but only if you understand the data’s limitations.

Rodriguez’s path to computational research began in the Ruth O’Brien Herbarium at Texas A&M University-Corpus Christi, where she helped digitize a collection of plant specimens. Alongside biologist Barnabas Daru, now a professor at Stanford University, Rodriguez started exploring the coverage gaps in different types of natural history data.

“We have access to an abundance of data out there on what species are living where,” Rodriguez said, from legacy museum collections to field observations captured in online databases. “But something we’d started to notice was that in regions typically known as biodiversity hotspots, like the Amazon rainforest, there seemed to be a mismatch between what the data was telling us and what biology was telling us.”

Most natural history records fall into one of two categories. Vouchered records are physical specimens, like those found in museum and herbarium collections. Observational records are records of a sighting without a physical specimen to back it up.

Thanks to the rise of smartphone apps like iNaturalist and eBird, there’s been an explosion of observational records in recent years. With these tools, anyone, scientist or not, can snap a picture of a plant, insect or bird and document the sighting in a public database.

Rodriguez and Daru looked at more than a billion records and analyzed how the vouchered and observational data sets varied across different groups like plants, birds and butterflies.

The different collection methods “lead to these interesting differences in how separate data sets represent global biodiversity,” Rodriguez said.

Both vouchered and observational data had gaps in coverage, Rodriguez and Daru report in their paper. Both types of data sets were more likely to report species in easy-to-access areas: near roadsides, near airports, at lower elevations.

And they were both biased toward certain types of species. People are more likely to capture a picture of a plant with a showy flower than the grass right next to it, Rodriguez said.

But the coverage gaps were greater for observational records, likely because vouchered records are often collected more deliberately by researchers on field collection trips. Vouchered records also had richer representation across time, with more balance across years and seasons. Citizen scientists are more likely to be snapping pictures of serendipitous wildlife observations on a warm sunny day than in the winter, Rodriguez noted.

Despite these drawbacks, observational records still have a place, she said. They’re especially useful for animals and endangered plant species, where it’s beneficial to record a sighting without killing anything. And because they’re easier to collect, scientists can access a much greater number of data points. Observational and vouchered records “are working in concert,” Rodriguez said.

Rodriguez hopes that her work will encourage scientists to think about the limitations of the data set they’re using and account for possible bias in their results. Her recently published research points to specific ways these biases show up in natural history data sets of many plant and animal groups. But the lessons carry into other data-focused fields.

Now at the UO, Rodriguez is shifting away from natural history research and instead focusing on population genetics, also using a big data approach.

The undergraduate research project “gave me experience with methods and tools development in bioinformatics, working with billions of data points and trying to understand the statistics,” she said. As a graduate student, “I knew I wanted to stay in a computationally focused lab.”

She’s recently joined Kern’s lab, a computational biology research group that is part of the UO Data Science Initiative and the College of Arts and Sciences. There, she’s begun an exploratory project applying artificial intelligence to biological data, to disentangle the evolution of the complete set of proteins in humans, chimps, mice and rhesus monkeys.

Using machine learning tools similar to the technology behind ChatGPT, she hopes to understand more about the rate at which proteins are evolving in these animals.

“So much potential lies at the intersection of machine learning and evolutionary questions,” Rodriguez said.

Scientists have a wealth of genetic sequence data, and deep learning models may be able to uncover new insights from it. Though such methods take special skill in handling and understanding data, she noted, “this is the future of evolutionary research.”

By Laurel Hamers, University Communications
Top photo: Jordan Rodriguez