May 19: Data Versioning for Reproducibility with R and Dolt

Over two years ago, Noam Ross was our first virtual speaker. He now returns to speak with us about data versioning.

Thank you to EcoHealth Alliance for providing the Zoom link.

After Noam's talk, we will give away a free ticket (virtual or in-person) to the eighth annual New York R Conference, taking place June 8-10, to a randomly chosen attendee.

About the Talk:
The data we analyze in our work is often a moving target, be it "live" or subject to corrections, revisions, or wholesale changes to data structure and format. Keeping track of these changes as we develop code for models, analyses, and applications can be an enormous challenge. I’ll talk about some of the attempts at tackling this problem in R and with R-adjacent tools, and detail my team’s latest approach using Dolt, a relational database with Git-like versioning capabilities. I’ll demonstrate our beta package {doltr}, which combines parts of {DBI}, {git2r}, and {gh} to manage data versioning and collaboration, and show how we’re building it into data collection and model-building workflows.

About Noam:
Noam Ross is Principal Scientist for Computational Science at EcoHealth Alliance, a non-profit in NYC that researches the connections between human and wildlife health. Noam builds models to understand and predict disease circulation in wildlife, spillover into people, and spread through travel and trade. Noam also leads software peer review at rOpenSci, a developer collective that builds R packages and catalyzes communities to enable open and reproducible science. He has a Ph.D. in ecology from the University of California-Davis. Follow him on Twitter at @noamross.

The talk will begin at 7 PM America/New_York.