Each person who has contact with the New Zealand health system is issued a unique 7-character National Health Index (NHI) number. The first six characters provide the unique identification. The seventh is a checksum, which allows an internal validity check. This package can check NHIs for a valid checksum, allowing for the detection of most typographical or other data errors.
nhiValidator is not available from CRAN, so you should install it from GitHub (note that you need to explicitly refer to the main branch in the repository, as install_github() defaults to attempting to access a branch called master):
# install.packages('devtools')
devtools::install_github('nzbri/nhiValidator', ref = 'main')
NHIs can be in one of two formats: the original AAANNNC (an
identifier of three letters and three digits, followed by a numerical
check digit) or the revised AAANNAC (three letters, two digits, one
letter, and an alphabetic check character). The final check digit or
character is calculated as a checksum based on the first six characters.
Thus it provides an internal validity check, to guard against data entry
errors. This package contains functions that check for the correct
sequence of letters and digits. It can also conduct the
internal validity check, by calculating what the check digit should be
and returning whether the calculated value matches the entered value.
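To make the checksum idea concrete, here is a minimal sketch of the publicly documented Mod 11 scheme for original-format NHIs. It is not taken from this package's source: the helper name original_check_digit() is hypothetical, and the letter values (A to Z without I and O, numbered 1 to 24) and the weights 7 down to 2 are assumptions based on the published NHI specification rather than anything stated here.

```r
# Sketch only: a hypothetical helper, not part of nhiValidator.
# Computes the expected check digit for an original-format NHI (AAANNNC)
# from its first six characters, assuming the Mod 11 scheme.
original_check_digit <- function(nhi) {
  # Assumed alphabet: I and O are excluded, so A = 1, ..., J = 9, ..., Z = 24
  alphabet <- setdiff(LETTERS, c("I", "O"))
  chars <- strsplit(toupper(nhi), "")[[1]]
  if (length(chars) < 6) return(NA_integer_)

  letter_values <- match(chars[1:3], alphabet)
  digit_values <- suppressWarnings(as.integer(chars[4:6]))
  values <- c(letter_values, digit_values)
  if (anyNA(values)) return(NA_integer_)  # not in the AAANNN pattern

  # Weight the six characters by 7, 6, 5, 4, 3, 2 and take the sum modulo 11
  remainder <- sum(values * 7:2) %% 11
  if (remainder == 0) return(NA_integer_)  # no valid check digit exists

  check <- 11 - remainder
  if (check == 10) 0L else as.integer(check)
}

original_check_digit("JBX365")
#> [1] 6
```

For the test value JBX3656 the weighted sum is 225, the remainder modulo 11 is 5, and 11 - 5 gives the final digit 6. The revised format uses a different modulus and an alphabetic check character, which this sketch does not cover.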
As the pool of original NHIs will soon be exhausted, the new format is about to be introduced. It is the same length as the original, but its final two characters are letters rather than digits. This provides for more possible unique values and also improves the strength of the checksum. This package can deal with either format.
Details on the new format can be found at the Ministry of Health.
JBX3656 (a test value, not issued to a real person) is an example of
the original NHI format (3 letters followed by 4 digits). The
nhi_valid() function shows that the internal checksum is valid:
library(nhiValidator)
nhi_valid('JBX3656')
#> [1] TRUE
That is, the final digit, 6, is calculated from the preceding six
characters. Any other final digit would therefore yield an invalid
result:
nhi_valid(c('JBX3650', 'JBX3651', 'JBX3652', 'JBX3653'))
#> [1] FALSE FALSE FALSE FALSE
Similarly, transpositions or substitutions among any of the other
characters are very unlikely to be consistent with the final check digit
of 6:
nhi_valid(c('BJX3656', 'JBX3696', 'JBX6356', 'JBX3566'))
#> [1] FALSE FALSE FALSE FALSE
The other function the package provides is nhi_format(). This does not
do the internal validity check of the NHI, but merely reports whether
its sequence of letters and digits is consistent with the original or
revised NHI format, or is in another, invalid format:
# the second entry below would fail the internal validity check,
# but is still in the expected format of an original NHI
nhi_format(c('JBX3656', 'JBX3657', 'ZZZ00AX', 'HELLO', NA),
           allow_test_cases = TRUE)
#> [1] "original format" "original format" "revised format" "invalid format"
#> [5] NA
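A pattern-only check of this kind can be sketched with regular expressions. The following is an illustration, not the package's implementation: classify_nhi_format() is a hypothetical helper, and the exclusion of the letters I and O from the allowed alphabet is an assumption based on the published NHI specification.

```r
# Sketch only: a hypothetical helper, not nhiValidator's nhi_format().
# Labels each value by its pattern of letters and digits, without
# checking the checksum.
classify_nhi_format <- function(x) {
  letter <- "[A-HJ-NP-Z]"  # assumption: I and O are never used in NHIs
  original <- paste0("^", letter, "{3}[0-9]{4}$")                  # AAANNNC
  revised <- paste0("^", letter, "{3}[0-9]{2}", letter, "{2}$")    # AAANNAC

  result <- rep("invalid format", length(x))
  result[grepl(original, x)] <- "original format"
  result[grepl(revised, x)] <- "revised format"
  result[is.na(x)] <- NA_character_  # missing input stays missing
  result
}

classify_nhi_format(c("JBX3656", "JBX3657", "ZZZ00AX", "HELLO", NA))
```

For these five values the sketch gives the same labels as the nhi_format() output above. It makes no special case for values starting with Z, so it behaves like a call with allow_test_cases = TRUE.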
NHIs (of either format) that start with Z are reserved for testing
purposes (i.e. they will never be assigned to real people). These are
likely only of interest to software developers and system testers. This
package defaults to regarding such NHIs as invalid, because they are
only likely to be encountered in the wild as the result of a typo or
other data error. If, however, you would like to process such values,
you can override that behaviour by setting allow_test_cases = TRUE
in either the nhi_format() or nhi_valid() function.
No further feature development is envisaged, but issue reports and pull requests are certainly welcome.
The package functions are vectorised only to the extent that they can be passed a vector of values and return a corresponding vector of results. Under the hood, however, the functions iterate over each entry in the vector and process them sequentially. This can lead to slow performance when processing a lot of values. If for some reason you need to validate a million NHIs, for example, expect to wait for half an hour. Performance optimisation is not a priority for the original developer but pull requests in that regard will be gratefully received.
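If your data contain many repeated NHIs (for example, one row per clinic visit), one possible workaround is to validate each distinct value once and map the results back to the full vector. The sketch below is a generic technique rather than a feature of this package: validate_unique() is a hypothetical helper, and it assumes the validator returns one result per input value and that the result depends only on that value.

```r
# Sketch only: a hypothetical wrapper, not part of nhiValidator.
# Runs `validator` once per distinct value, then expands the results
# back to the original length and order.
validate_unique <- function(x, validator) {
  unique_values <- unique(x)
  unique_results <- validator(unique_values)
  unique_results[match(x, unique_values)]
}

# Stand-in validator so the example runs without the package installed
toy_check <- function(v) grepl("^[A-Z]{3}[0-9]{4}$", v)

nhis <- c("JBX3656", "HELLO", "JBX3656", "JBX3656", "HELLO")
validate_unique(nhis, toy_check)
#> [1]  TRUE FALSE  TRUE  TRUE FALSE

# With the package loaded, the call would be:
# validate_unique(nhis, nhiValidator::nhi_valid)
```

This only helps when values repeat. A vector of a million distinct NHIs would take just as long as before.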
This package was developed by Michael MacAskill at the New Zealand Brain Research Institute. It is released as open-source software under an MIT licence. Note the provisions under that licence about warranties and liability.
Please report any issues or suggestions using the GitHub issues page for this project.