Tidy validation and unit testing package for data transformation

License: MIT (LICENSE.md), plus a LICENSE file of unknown type

NetZissou/val

val

The goal of val is to create a unit testing culture for data cleaning tasks in R. Coming from a statistics background, I find the idea of defensive programming fascinating. A comprehensive set of unit tests demonstrates that the developer understands what the end user needs before any actual implementation begins. The analytics community hasn’t fully adopted this software engineering practice, for various reasons.

When Phelps was a teenager, Bowman would tell him to watch the “videotape” before he went to sleep and when he woke up. The “videotape” wasn’t real but a mental visualization of the perfect race. Phelps would imagine diving off the blocks and, in slow motion, swimming flawlessly. He would visualize the strokes, the pool, and the finish. He would lie in bed with his eyes shut and watch the entire competition down to the smallest details.

In most analytics projects, no matter how many sources the team has to pull from or how many steps the transformation involves, the end goal is to have “cleaned” and “ready-to-go” data that everyone should use. However, most of the time, team members won’t take the time to mentally visualize the detailed schema of their cleaned table. At best, they make sure that the column names are aligned. But what about attributes such as data types, allowable values, parsing formats, uniqueness, and mandatory fields? Currently, no generic built-in object in R allows developers to specify a schema, and no workflow helps data engineers validate their data in a tidy manner.

At first, I thought about writing schemas in JSON or YAML and parsing them into the R environment. However, when Joe Cheng shared stories about the early ideas behind the Shiny package at Rconf 2022, he explained that the Shiny team wanted R users to be able to create interactive web applications in R without needing to know HTML, JavaScript, or CSS. In the same spirit, I hope val can let R users specify a table schema without knowing JSON, YAML, or SQL, and without designing a database. Users create schema objects with val and refine the schema until their data passes validation.
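To make the idea concrete, here is a minimal sketch in base R of what "describe the schema in R, then validate until the data passes" could look like. It is not val's API: the `schema` list layout, the attribute names (`type`, `unique`, `mandatory`, `allowed`), and the `check_schema()` helper are all hypothetical and exist only for illustration.

```r
# Hypothetical schema: a plain named list with one entry per column.
# These names are illustrative assumptions, not part of val.
schema <- list(
  id     = list(type = "integer",   unique = TRUE, mandatory = TRUE),
  status = list(type = "character", allowed = c("open", "closed"),
                mandatory = TRUE),
  amount = list(type = "numeric",   mandatory = FALSE)
)

# Check a data frame against the schema.
# Returns a tidy report: one row per (column, failed rule).
check_schema <- function(data, schema) {
  columns <- character()
  rules <- character()
  for (column in names(schema)) {
    spec <- schema[[column]]
    failed <- character()
    if (!column %in% names(data)) {
      failed <- "column is missing"
    } else {
      values <- data[[column]]
      # Data type: compare against the column's class.
      if (!inherits(values, spec$type)) {
        failed <- c(failed, paste("type is not", spec$type))
      }
      # Mandatory: no missing values allowed.
      if (isTRUE(spec$mandatory) && anyNA(values)) {
        failed <- c(failed, "contains missing values")
      }
      # Uniqueness: no repeated values allowed.
      if (isTRUE(spec$unique) && anyDuplicated(values) > 0) {
        failed <- c(failed, "values are not unique")
      }
      # Allowable values: every non-missing value must be in the set.
      if (!is.null(spec$allowed) &&
          !all(values[!is.na(values)] %in% spec$allowed)) {
        failed <- c(failed, "value outside allowed set")
      }
    }
    columns <- c(columns, rep(column, length(failed)))
    rules <- c(rules, failed)
  }
  data.frame(column = columns, rule = rules, stringsAsFactors = FALSE)
}

# A "raw" table with a duplicated id, an unexpected status, and a missing status.
raw <- data.frame(
  id = c(1L, 2L, 2L),
  status = c("open", "pending", NA),
  amount = c(10.5, NA, 3),
  stringsAsFactors = FALSE
)

report <- check_schema(raw, schema)
print(report)
```

An empty report means the table matches the schema, so the cleaning code can be revised and re-checked until no rows come back. Returning the failures as a data frame, rather than stopping at the first error, keeps the result easy to filter and summarize with the usual tidy tools.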

Installation

You can install the released version of val from CRAN with:

install.packages("val")
