RSVP and join our 13,063 members on the New York Open Statistical Programming Meetup


May 19: Data Versioning for Reproducibility with R and Dolt

Over two years ago, Noam Ross was our [first virtual speaker](https://www.meetup.com/nyhackr/events/269506434/?utm_source=nyhackr). He now returns to speak with us about data versioning.

Thank you to [EcoHealth Alliance](https://www.ecohealthalliance.org/?utm_source=nyhackr) for providing the Zoom link.

Conversations during the meetup are encouraged in the monthly-meetup-chat channel in the [nyhackr slack](https://join.slack.com/t/nyhackr/shared_invite/zt-z14v6jw3-MIKDmEVqx~SGvn7OTRyPpg?utm_source=nyhackr).

After Noam's talk, we will give away a free ticket (virtual or in-person) to the eighth annual [New York R Conference](https://rstats.ai/nyr?utm_source=nyhackr), taking place June 8-10, to a randomly selected attendee. Members of the meetup can use code [nyhackr](https://rstats.ai/tickets/?utm_source=nyhackr) for 20% off tickets.

About the Talk:
The data we analyze in our work is often a moving target, be it "live" or subject to corrections, revisions, or wholesale changes to data structure and format. Keeping track of these changes as we develop code for models, analyses, and applications can be an enormous challenge. I’ll talk about some of the attempts at tackling this problem in R and with R-adjacent tools, and detail my team’s latest approach using Dolt, a relational database with git-like versioning capabilities. I’ll demonstrate our beta package {doltr}, which combines parts of {DBI}, {git2r}, and {gh} to manage data versioning and collaboration, and show how we’re building it into data collection and model-building workflows.

About Noam:
Noam Ross is Principal Scientist for Computational Science at EcoHealth Alliance, a non-profit in NYC that researches the connections between human and wildlife health. Noam builds models to understand and predict disease circulation in wildlife, spillover into people, and spread through travel and trade. Noam also leads software peer review at rOpenSci, a developer collective that builds R packages and catalyzes communities to enable open and reproducible science. He has a Ph.D. in ecology from the University of California, Davis. Follow him on Twitter at [@noamross](https://twitter.com/noamross?utm_source=nyhackr).

The talk will begin at 7 PM America/New_York, and we will start admitting people to the event shortly before. Since this event is completely remote, there will be no pizza, but everyone is encouraged to have [pizza](https://nyhackr.org/pizza.html) individually.
