updated doco
alimanfoo committed Jul 21, 2011
1 parent 227ea3e commit ff4e779
Showing 3 changed files with 81 additions and 9 deletions.
84 changes: 78 additions & 6 deletions README.txt
@@ -1,8 +1,80 @@
=============
CSV Validator
=============
============
csvvalidator
============

CSV Validator is a small library to support validation of tabular data contained
in delimited file formats.
This module provides some simple utilities for validating data contained in CSV
files, or other similar data sources.

Note that the `csvvalidator` module is intended to be used in combination with
the standard Python `csv` module. The `csvvalidator` module **will not**
validate the *syntax* of a CSV file. Rather, the `csvvalidator` module can be
used to validate any source of row-oriented data, such as is provided by a
`csv.reader` object.

I.e., if you want to validate data from a CSV file, you have to first construct
a CSV reader using the standard Python `csv` module, specifying the appropriate
dialect, and then pass the CSV reader as the source of data to either the
`CSVValidator.validate` or the `CSVValidator.ivalidate` method.
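
As a minimal sketch of the reader construction described above, using only the standard `csv` module: `csv.reader` expects an iterable of lines (such as an open file object), not a file path. The helper name `read_rows` and the sample data are assumptions made for illustration and are not part of `csvvalidator`.

```python
import csv
import io


def read_rows(lines, delimiter='\t'):
    """Return all rows from an iterable of text lines as lists of strings.

    `read_rows` is a hypothetical helper, not part of csvvalidator.
    """
    return list(csv.reader(lines, delimiter=delimiter))


# In real use `lines` would be an open file object.
# Here an in-memory buffer stands in for a tab-delimited file.
sample = io.StringIO("study_id\tpatient_id\n1\t100\n")
rows = read_rows(sample)
```

When validating, the reader itself (rather than a list built from it) would be passed as the data source while the file is still open.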

The `CSVValidator` class is the foundation for all validator objects that are
capable of validating CSV data.

You can use the CSVValidator class to dynamically construct a validator, e.g.::

    import sys
    import csv
    from csvvalidator import *

    field_names = (
        'study_id',
        'patient_id',
        'gender',
        'age_years',
        'age_months',
        'date_inclusion'
    )

    validator = CSVValidator(field_names)

    # basic header and record length checks
    validator.add_header_check('EX1', 'bad header')
    validator.add_record_length_check('EX2', 'unexpected record length')

    # some simple value checks
    validator.add_value_check('study_id', int,
                              'EX3', 'study id must be an integer')
    validator.add_value_check('patient_id', int,
                              'EX4', 'patient id must be an integer')
    validator.add_value_check('gender', enumeration('M', 'F'),
                              'EX5', 'invalid gender')
    validator.add_value_check('age_years', number_range_inclusive(0, 120, int),
                              'EX6', 'invalid age in years')
    validator.add_value_check('date_inclusion', datetime_string('%Y-%m-%d'),
                              'EX7', 'invalid date')

    # a more complicated record check
    def check_age_variables(r):
        age_years = int(r['age_years'])
        age_months = int(r['age_months'])
        # age_months must fall within the year of age given by age_years
        valid = (age_months >= age_years * 12 and
                 age_months < (age_years + 1) * 12)
        if not valid:
            raise ValueError(age_years, age_months)
    validator.add_record_check(check_age_variables,
                               'EX8', 'invalid age variables')

    # validate the data and write problems to stdout
    with open('/path/to/data.csv') as f:
        # csv.reader needs an iterable of lines (an open file), not a path
        data = csv.reader(f, delimiter='\t')
        problems = validator.validate(data)
        write_problems(problems, sys.stdout)
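
The record check in the example can be exercised on its own, without a validator. The following is a sketch, assuming (as the example suggests) that a check signals failure by raising `ValueError`. It states the rule with integer division, which is an assumption about the intent: the whole years contained in `age_months` must equal `age_years`. The helper name `is_consistent` and the sample records are hypothetical.

```python
def check_age_variables(r):
    """Raise ValueError unless age_months falls within the stated age_years."""
    age_years = int(r['age_years'])
    age_months = int(r['age_months'])
    # Assumed rule: whole years contained in age_months must equal age_years.
    if age_months // 12 != age_years:
        raise ValueError(age_years, age_months)


def is_consistent(record):
    """Hypothetical helper: True if the record passes the check."""
    try:
        check_age_variables(record)
    except ValueError:
        return False
    return True


results = [
    is_consistent({'age_years': '5', 'age_months': '62'}),   # 62 // 12 is 5
    is_consistent({'age_years': '5', 'age_months': '100'}),  # 100 // 12 is 8
    is_consistent({'age_years': '0', 'age_months': '7'}),    # 7 // 12 is 0
]
```

Written this way, a record with `age_years` of 0 is handled like any other, and a non-numeric value also surfaces as a `ValueError` from `int`.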

For more complex use cases you can also sub-class `CSVValidator` to define
re-usable validator classes for specific data sources.

The source code for this module lives at:

https://github.com/alimanfoo/csvvalidator

For a complete account of all of the functionality available from this module,
see the example.py and tests.py modules in the source code repository.

TODO finish this
4 changes: 2 additions & 2 deletions csvvalidator.py
@@ -66,8 +66,8 @@ def check_age_variables(r):
problems = validator.validate(data)
write_problems(problems, sys.stdout)
-You can also sub-class `CSVValidator` to define re-usable validator classes for
-your specific data sources.
+For more complex use cases you can also sub-class `CSVValidator` to define
+re-usable validator classes for specific data sources.
The source code for this module lives at:
2 changes: 1 addition & 1 deletion tests.py
@@ -1,5 +1,5 @@
"""
-TODO
+Tests for the `csvvalidator` module.
"""
