Data versioning

“Data versioning is like flossing. Everyone agrees it’s a good thing to do, but few do it.” ~ Chip Huyen, Designing Machine Learning Systems

Data versioning is much harder to implement than code versioning in data science and machine learning projects.

There are a few reasons for this:

➡️ Data is often much larger than code.

➡️ There is no agreed definition of what counts as a difference between two versions of a dataset, or of how to resolve merge conflicts.

➡️ Regulations on data protection and privacy make keeping historical data difficult.
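
To make the second point concrete, here is a minimal Python sketch of two possible definitions of "different". The function names `byte_fingerprint` and `row_fingerprint` and the tiny CSV samples are made up for illustration; they are not taken from any particular versioning tool.

```python
import csv
import hashlib
import io


def byte_fingerprint(csv_text: str) -> str:
    """Hash the raw bytes: any change at all, even row order, gives a new hash."""
    return hashlib.sha256(csv_text.encode("utf-8")).hexdigest()


def row_fingerprint(csv_text: str) -> str:
    """Hash the header plus sorted rows: row order is ignored, content is not."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = sorted(reader)
    canonical = repr((header, rows))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical sample data: v2 holds the same rows as v1 in a different order.
v1 = "id,label\n1,cat\n2,dog\n"
v2 = "id,label\n2,dog\n1,cat\n"

bytes_differ = byte_fingerprint(v1) != byte_fingerprint(v2)
rows_differ = row_fingerprint(v1) != row_fingerprint(v2)
print(bytes_differ, rows_differ)  # True False
```

Under the byte-level definition the reshuffled file is a new version; under the row-level definition it is the same one. Which answer is right depends on the project, and that is part of what makes versioning data harder than versioning code.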

Do you floss… erm version your data often?
