Pval is a utility for generating a PDF (probability density function) and CDF (cumulative distribution function) from a data-input file. It is a two-pass utility: the output of the first execution can be bootstrapped back into the utility to generate "p-values" (e.g. P10, P50, P90), where the p-value (PXX) is the value such that there is an XX % likelihood that any other value will fall below it. It is effectively the inverse of the CDF.
Pval is unique in that it can also execute this analysis on dates which follow the ISO 8601 date convention (yyyy-mm-dd).
This utility can be run manually, or integrated into an automated process or workflow. In general these utilities follow the UNIX philosophy as much as possible.
Generating distributions for PDF and CDF data is not easily available outside of statistical packages such as R / Matlab / Mathematica. Moreover, working with dates can be difficult at times.
Pval is particularly useful for importing the output into BI tools and generating a combination graph of a bar chart (PDF) and a line chart (CDF). P-values can also be plotted to determine the inverse of the CDF.
Binaries (.exe) for Windows have been pre-compiled and can be found in the 'bin' folder, or in the "Releases" section on GitHub.
With git, you can download all the latest source and binaries with git clone https://github.com/chipnetics/pval
Alternatively, if you don't have git installed:
- Download the latest release here
- Unzip to a local directory.
- Navigate to 'bin' directory for executables.
Utilities are written in the V programming language and will compile under Windows, Linux, and macOS.
V is syntactically similar to Go, with performance comparable to C. You can read about V here.
After installing the latest V compiler, it's as easy as executing the below. Be sure that the V compiler root directory is part of your PATH environment variable.
git clone https://github.com/chipnetics/pval
cd pval/src
v pval.v
Alternatively, if you don't have git installed:
- Download the bundled source here
- Unzip to a local directory
- Navigate to src directory and run
v pval.v
Please see the V language documentation for further help if required.
For Windows users, if you want to pass optional command line arguments to an executable:
- Navigate to the directory of the utility.
- In the navigation (file-path) bar of Windows Explorer type "cmd" and hit enter to get a command prompt.
- Type the name of the exe along with the optional argument (e.g. pval.exe --help).
Options:
-c, --column <int> Column index to generate distribution from
-d, --is-date Indicate column is date (fmt: yyyy-mm-dd)
-b, --binsize <int> Bin sizing
-n, --no-header Indicate input file has no header
-e, --expand Expand output generated by pval
-f, --file-in <string> Input file
-h, --help display this help and exit
--version output version information and exit
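For automated workflows, the two passes can be chained from a script. The following is a minimal sketch in Python, not part of pval itself: the helper names (first_pass_cmd, second_pass_cmd, run_two_pass) are hypothetical, and it assumes the executable is reachable under the name you pass in. Only the flags listed above are used.

```python
import subprocess

def first_pass_cmd(exe, infile, column, binsize, is_date=False):
    """Build the first-pass command (PDF/CDF table written to stdout)."""
    cmd = [exe, "-f", infile, "-b", str(binsize), "-c", str(column)]
    if is_date:
        cmd.append("-d")  # column holds yyyy-mm-dd dates
    return cmd

def second_pass_cmd(exe, table_file):
    """Build the second-pass command (expand the table into PXX values)."""
    return [exe, "-f", table_file, "-e"]

def run_two_pass(exe, infile, column, binsize, table_file, is_date=False):
    """Run both passes and return the PXX listing as text."""
    # Pass 1: redirect stdout into table_file, like `> result.txt`.
    with open(table_file, "w") as out:
        subprocess.run(first_pass_cmd(exe, infile, column, binsize, is_date),
                       stdout=out, check=True)
    # Pass 2: feed the table back in and capture the expanded output.
    done = subprocess.run(second_pass_cmd(exe, table_file),
                          capture_output=True, text=True, check=True)
    return done.stdout
```

For example, run_two_pass("pval.exe", "big_data_set.txt", 3, 1, "result.txt") would mirror the manual two-step usage, assuming pval.exe is in the current directory or on PATH.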
As an aside, the author recommends the excellent tool EmEditor by Emurasoft for manually editing or viewing large text-based files for data science & analytics. Check out the tool here. EmEditor is paid software but well worth the investment to increase efficiency.
A relatively basic example on somewhat uniform data. The below would generate PDF and CDF from column 3 of the input file big_data_set.txt, with a binning size of 1.
pval.exe -f big_data_set.txt -b 1 -c 3 > result.txt
bin lower_bound upper_bound dsc_count cml_count pdf cdf
1 1 2 8223 8223 0.1318105313777351 0.1318105313777351
2 2 3 8223 16446 0.1318105313777351 0.2636210627554701
3 3 4 5250 21696 0.0841548449146429 0.3477759076701130
4 4 5 5121 26817 0.0820870401538832 0.4298629478239962
5 5 6 5091 31908 0.0816061553257995 0.5114691031497957
6 6 7 5082 36990 0.0814618898773744 0.59293099302717
7 7 8 5082 42072 0.0814618898773744 0.6743928829045444
8 8 9 5082 47154 0.0814618898773744 0.7558547727819187
9 9 10 5082 52236 0.0814618898773744 0.837316662659293
10 10 11 5079 57315 0.0814138013945660 0.918730464053859
11 11 12 5070 62385 0.0812695359461409 1.
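To make the columns concrete, here is a rough Python sketch of how such a table can be derived. It is an illustration, not pval's actual source: it assumes bins are half-open [lower_bound, upper_bound), start at the smallest value, and that empty bins are still listed. The function name pdf_cdf and the toy data are hypothetical.

```python
import math
from collections import Counter

def pdf_cdf(values, binsize):
    """Return rows of (bin, lower, upper, dsc_count, cml_count, pdf, cdf).

    Assumption: bins are half-open [lower, upper), begin at min(values),
    and bins with no observations are still reported.
    """
    start = min(values)
    # Map every value to the zero-based index of the bin it falls in.
    counts = Counter(math.floor((v - start) / binsize) for v in values)
    total = len(values)
    rows, running = [], 0
    for i in range(max(counts) + 1):
        count = counts.get(i, 0)
        running += count
        lower = start + i * binsize
        rows.append((i + 1, lower, lower + binsize, count, running,
                     count / total, running / total))
    return rows

# Hypothetical toy data, not taken from big_data_set.txt.
sample = [1, 1, 2, 3, 3, 3, 5]
for row in pdf_cdf(sample, 1):
    print(*row, sep="\t")
```

The pdf column is the bin count divided by the total number of rows, and the cdf column is the running total divided by the same number, so the last cdf entry is always 1.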
Bootstrapping the result to generate PXX values.
pval.exe -f result.txt -e
pxx as_dec bin_l bin_u p-value
P00 0.00 1 2 1.000
P01 0.01 1 2 1.076
P02 0.02 1 2 1.152
P03 0.03 1 2 1.228
P04 0.04 1 2 1.303
P05 0.05 1 2 1.379
P06 0.06 1 2 1.455
P07 0.07 1 2 1.531
P08 0.08 1 2 1.607
P09 0.09 1 2 1.683
P10 0.10 1 2 1.759
P10 0.11 1 2 1.835
P11 0.12 1 2 1.910
P12 0.13 1 2 1.986
P13 0.14 2 3 2.062
P15 0.15 2 3 2.138
P16 0.16 2 3 2.214
P17 0.17 2 3 2.290
P18 0.18 2 3 2.366
P19 0.19 2 3 2.441
P20 0.20 2 3 2.517
P21 0.21 2 3 2.593
P22 0.22 2 3 2.669
P23 0.23 2 3 2.745
P24 0.24 2 3 2.821
P25 0.25 2 3 2.897
P26 0.26 2 3 2.973
P27 0.27 3 4 3.076
P28 0.28 3 4 3.195
P29 0.29 3 4 3.313
P30 0.30 3 4 3.432
P31 0.31 3 4 3.551
P32 0.32 3 4 3.670
P33 0.33 3 4 3.789
P34 0.34 3 4 3.908
P35 0.35 4 5 4.027
P36 0.36 4 5 4.149
P37 0.37 4 5 4.271
P38 0.38 4 5 4.393
P39 0.39 4 5 4.514
P40 0.40 4 5 4.636
P41 0.41 4 5 4.758
P42 0.42 4 5 4.880
P43 0.43 5 6 5.002
P44 0.44 5 6 5.124
P45 0.45 5 6 5.247
P46 0.46 5 6 5.369
P47 0.47 5 6 5.492
P48 0.48 5 6 5.614
P49 0.49 5 6 5.737
P50 0.50 5 6 5.859
P51 0.51 5 6 5.982
P52 0.52 6 7 6.105
P53 0.53 6 7 6.227
P54 0.54 6 7 6.350
P55 0.55 6 7 6.473
P56 0.56 6 7 6.596
P57 0.57 6 7 6.719
P58 0.58 6 7 6.841
P59 0.59 6 7 6.964
P60 0.60 7 8 7.087
P61 0.61 7 8 7.210
P62 0.62 7 8 7.332
P63 0.63 7 8 7.455
P64 0.64 7 8 7.578
P65 0.65 7 8 7.701
P66 0.66 7 8 7.823
P67 0.67 7 8 7.946
P68 0.68 8 9 8.069
P69 0.69 8 9 8.192
P70 0.70 8 9 8.314
P71 0.71 8 9 8.437
P72 0.72 8 9 8.560
P73 0.73 8 9 8.683
P74 0.74 8 9 8.805
P75 0.75 8 9 8.928
P76 0.76 9 10 9.051
P77 0.77 9 10 9.174
P78 0.78 9 10 9.296
P79 0.79 9 10 9.419
P80 0.80 9 10 9.542
P81 0.81 9 10 9.665
P82 0.82 9 10 9.787
P83 0.83 9 10 9.910
P84 0.84 10 11 10.033
P85 0.85 10 11 10.156
P86 0.86 10 11 10.279
P87 0.87 10 11 10.401
P88 0.88 10 11 10.524
P89 0.89 10 11 10.647
P90 0.90 10 11 10.770
P91 0.91 10 11 10.893
P92 0.92 11 12 11.016
P93 0.93 11 12 11.139
P94 0.94 11 12 11.262
P95 0.95 11 12 11.385
P96 0.96 11 12 11.508
P97 0.97 11 12 11.631
P98 0.98 11 12 11.754
P99 0.99 11 12 11.877
P100 1.00 11 12 12
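The p-values in this listing are consistent with linear interpolation inside the bin that contains the requested probability. The sketch below shows that calculation in Python. It is inferred from the sample output rather than taken from pval's source, and the function name p_value is hypothetical.

```python
def p_value(rows, p):
    """Invert a binned CDF at probability p (0 <= p <= 1).

    rows: list of (lower, upper, pdf, cdf) tuples in bin order.
    Assumption: values are spread evenly inside each bin, so the
    result is linearly interpolated between the bin bounds.
    """
    prev_cdf = 0.0
    for lower, upper, pdf, cdf in rows:
        # Pick the first non-empty bin whose cumulative share reaches p.
        if pdf > 0 and p <= cdf:
            fraction = (p - prev_cdf) / pdf
            return lower + fraction * (upper - lower)
        prev_cdf = cdf
    return rows[-1][1]  # p lies above the last cdf (e.g. rounding)

# First five bins of the result.txt listing above.
rows = [
    (1, 2, 0.1318105313777351, 0.1318105313777351),
    (2, 3, 0.1318105313777351, 0.2636210627554701),
    (3, 4, 0.0841548449146429, 0.3477759076701130),
    (4, 5, 0.0820870401538832, 0.4298629478239962),
    (5, 6, 0.0816061553257995, 0.5114691031497957),
]
for p in (0.01, 0.10, 0.50):
    print(f"P{round(p * 100):02d}\t{p_value(rows, p):.3f}")
```

For P50, the fifth bin is selected because its cdf (0.5115) is the first to reach 0.50. The remaining probability, 0.50 - 0.4299, divided by the bin's pdf of 0.0816 gives about 0.859 of the bin width, hence 5.859.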
An example using dates. The below would generate PDF and CDF from column 5 of the input file dates_data.txt, with a binning size of 14 days (bi-weekly). Notice the -d flag to indicate column 5 is a date column.
pval.exe -f dates_data.txt -b 14 -c 5 -d > pdf_cdf_dates.txt
bin lower_bound upper_bound dsc_count cml_count pdf cdf
1 2021-03-22 2021-04-05 87 87 0.0580386924616411 0.0580386924616411
2 2021-04-05 2021-04-19 64 151 0.0426951300867245 0.1007338225483656
3 2021-04-19 2021-05-03 88 239 0.0587058038692462 0.1594396264176118
4 2021-05-03 2021-05-17 28 267 0.0186791194129420 0.1781187458305537
5 2021-05-17 2021-05-31 90 357 0.0600400266844563 0.2381587725150100
6 2021-05-31 2021-06-14 87 444 0.0580386924616411 0.2961974649766511
7 2021-06-14 2021-06-28 112 556 0.0747164776517679 0.370913942628419
8 2021-06-28 2021-07-12 108 664 0.0720480320213476 0.4429619746497665
9 2021-07-12 2021-07-26 106 770 0.0707138092061374 0.513675783855904
10 2021-07-26 2021-08-09 127 897 0.0847231487658439 0.5983989326217478
11 2021-08-09 2021-08-23 152 1049 0.1014009339559707 0.6997998665777185
12 2021-08-23 2021-09-06 121 1170 0.0807204803202135 0.7805203468979319
13 2021-09-06 2021-09-20 116 1286 0.0773849232821881 0.8579052701801201
14 2021-09-20 2021-10-04 0 1286 0.0000000000000000 0.8579052701801201
15 2021-10-04 2021-10-18 0 1286 0.0000000000000000 0.8579052701801201
16 2021-10-18 2021-11-01 0 1286 0.0000000000000000 0.8579052701801201
17 2021-11-01 2021-11-15 0 1286 0.0000000000000000 0.8579052701801201
18 2021-11-15 2021-11-29 0 1286 0.0000000000000000 0.8579052701801201
19 2021-11-29 2021-12-13 0 1286 0.0000000000000000 0.8579052701801201
20 2021-12-13 2021-12-27 0 1286 0.0000000000000000 0.8579052701801201
21 2021-12-27 2022-01-10 1 1287 0.0006671114076051 0.8585723815877252
22 2022-01-10 2022-01-24 212 1499 0.1414276184122749 1.
Bootstrapping the result to generate PXX values.
pval.exe -f pdf_cdf_dates.txt -e
pxx as_dec bin_l bin_u p-value
P00 0.00 2021-03-22 2021-04-05 2021-03-22
P01 0.01 2021-03-22 2021-04-05 2021-03-24
P02 0.02 2021-03-22 2021-04-05 2021-03-26
P03 0.03 2021-03-22 2021-04-05 2021-03-29
P04 0.04 2021-03-22 2021-04-05 2021-03-31
P05 0.05 2021-03-22 2021-04-05 2021-04-03
P06 0.06 2021-04-05 2021-04-19 2021-04-05
P07 0.07 2021-04-05 2021-04-19 2021-04-08
P08 0.08 2021-04-05 2021-04-19 2021-04-12
P09 0.09 2021-04-05 2021-04-19 2021-04-15
P10 0.10 2021-04-05 2021-04-19 2021-04-18
P10 0.11 2021-04-19 2021-05-03 2021-04-21
P11 0.12 2021-04-19 2021-05-03 2021-04-23
P12 0.13 2021-04-19 2021-05-03 2021-04-25
P13 0.14 2021-04-19 2021-05-03 2021-04-28
P15 0.15 2021-04-19 2021-05-03 2021-04-30
P16 0.16 2021-05-03 2021-05-17 2021-05-03
P17 0.17 2021-05-03 2021-05-17 2021-05-10
P18 0.18 2021-05-17 2021-05-31 2021-05-17
P19 0.19 2021-05-17 2021-05-31 2021-05-19
P20 0.20 2021-05-17 2021-05-31 2021-05-22
P21 0.21 2021-05-17 2021-05-31 2021-05-24
P22 0.22 2021-05-17 2021-05-31 2021-05-26
P23 0.23 2021-05-17 2021-05-31 2021-05-29
P24 0.24 2021-05-31 2021-06-14 2021-05-31
P25 0.25 2021-05-31 2021-06-14 2021-06-02
P26 0.26 2021-05-31 2021-06-14 2021-06-05
P27 0.27 2021-05-31 2021-06-14 2021-06-07
P28 0.28 2021-05-31 2021-06-14 2021-06-10
P29 0.29 2021-05-31 2021-06-14 2021-06-12
P30 0.30 2021-06-14 2021-06-28 2021-06-14
P31 0.31 2021-06-14 2021-06-28 2021-06-16
P32 0.32 2021-06-14 2021-06-28 2021-06-18
P33 0.33 2021-06-14 2021-06-28 2021-06-20
P34 0.34 2021-06-14 2021-06-28 2021-06-22
P35 0.35 2021-06-14 2021-06-28 2021-06-24
P36 0.36 2021-06-14 2021-06-28 2021-06-25
P37 0.37 2021-06-14 2021-06-28 2021-06-27
P38 0.38 2021-06-28 2021-07-12 2021-06-29
P39 0.39 2021-06-28 2021-07-12 2021-07-01
P40 0.40 2021-06-28 2021-07-12 2021-07-03
P41 0.41 2021-06-28 2021-07-12 2021-07-05
P42 0.42 2021-06-28 2021-07-12 2021-07-07
P43 0.43 2021-06-28 2021-07-12 2021-07-09
P44 0.44 2021-06-28 2021-07-12 2021-07-11
P45 0.45 2021-07-12 2021-07-26 2021-07-13
P46 0.46 2021-07-12 2021-07-26 2021-07-15
P47 0.47 2021-07-12 2021-07-26 2021-07-17
P48 0.48 2021-07-12 2021-07-26 2021-07-19
P49 0.49 2021-07-12 2021-07-26 2021-07-21
P50 0.50 2021-07-12 2021-07-26 2021-07-23
P51 0.51 2021-07-12 2021-07-26 2021-07-25
P52 0.52 2021-07-26 2021-08-09 2021-07-27
P53 0.53 2021-07-26 2021-08-09 2021-07-28
P54 0.54 2021-07-26 2021-08-09 2021-07-30
P55 0.55 2021-07-26 2021-08-09 2021-08-01
P56 0.56 2021-07-26 2021-08-09 2021-08-02
P57 0.57 2021-07-26 2021-08-09 2021-08-04
P58 0.58 2021-07-26 2021-08-09 2021-08-05
P59 0.59 2021-07-26 2021-08-09 2021-08-07
P60 0.60 2021-08-09 2021-08-23 2021-08-09
P61 0.61 2021-08-09 2021-08-23 2021-08-10
P62 0.62 2021-08-09 2021-08-23 2021-08-11
P63 0.63 2021-08-09 2021-08-23 2021-08-13
P64 0.64 2021-08-09 2021-08-23 2021-08-14
P65 0.65 2021-08-09 2021-08-23 2021-08-16
P66 0.66 2021-08-09 2021-08-23 2021-08-17
P67 0.67 2021-08-09 2021-08-23 2021-08-18
P68 0.68 2021-08-09 2021-08-23 2021-08-20
P69 0.69 2021-08-09 2021-08-23 2021-08-21
P70 0.70 2021-08-23 2021-09-06 2021-08-23
P71 0.71 2021-08-23 2021-09-06 2021-08-24
P72 0.72 2021-08-23 2021-09-06 2021-08-26
P73 0.73 2021-08-23 2021-09-06 2021-08-28
P74 0.74 2021-08-23 2021-09-06 2021-08-29
P75 0.75 2021-08-23 2021-09-06 2021-08-31
P76 0.76 2021-08-23 2021-09-06 2021-09-02
P77 0.77 2021-08-23 2021-09-06 2021-09-04
P78 0.78 2021-08-23 2021-09-06 2021-09-05
P79 0.79 2021-09-06 2021-09-20 2021-09-07
P80 0.80 2021-09-06 2021-09-20 2021-09-09
P81 0.81 2021-09-06 2021-09-20 2021-09-11
P82 0.82 2021-09-06 2021-09-20 2021-09-13
P83 0.83 2021-09-06 2021-09-20 2021-09-14
P84 0.84 2021-09-06 2021-09-20 2021-09-16
P85 0.85 2021-09-06 2021-09-20 2021-09-18
P86 0.86 2022-01-10 2022-01-24 2022-01-10
P87 0.87 2022-01-10 2022-01-24 2022-01-11
P88 0.88 2022-01-10 2022-01-24 2022-01-12
P89 0.89 2022-01-10 2022-01-24 2022-01-13
P90 0.90 2022-01-10 2022-01-24 2022-01-14
P91 0.91 2022-01-10 2022-01-24 2022-01-15
P92 0.92 2022-01-10 2022-01-24 2022-01-16
P93 0.93 2022-01-10 2022-01-24 2022-01-17
P94 0.94 2022-01-10 2022-01-24 2022-01-18
P95 0.95 2022-01-10 2022-01-24 2022-01-19
P96 0.96 2022-01-10 2022-01-24 2022-01-20
P97 0.97 2022-01-10 2022-01-24 2022-01-21
P98 0.98 2022-01-10 2022-01-24 2022-01-22
P99 0.99 2022-01-10 2022-01-24 2022-01-23
P100 1.00 2022-01-10 2022-01-24 2022-01-24
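The date p-values follow the same interpolation idea, with the offset inside the bin expressed in whole days. Below is a short Python sketch of that variant. It is an illustration inferred from the listing above, not pval's implementation: it assumes the fractional day offset is truncated, and the function name p_value_date is hypothetical.

```python
from datetime import date, timedelta

def p_value_date(rows, p):
    """Invert a binned CDF over dates at probability p (0 <= p <= 1).

    rows: list of (lower, upper, pdf, cdf) with yyyy-mm-dd date strings.
    Assumption: the interpolated offset is truncated to whole days.
    """
    prev_cdf = 0.0
    for lower, upper, pdf, cdf in rows:
        # Empty bins (pdf == 0) are skipped, as in the listing above.
        if pdf > 0 and p <= cdf:
            start = date.fromisoformat(lower)
            width = (date.fromisoformat(upper) - start).days
            offset = int((p - prev_cdf) / pdf * width)
            return (start + timedelta(days=offset)).isoformat()
        prev_cdf = cdf
    return rows[-1][1]  # p lies above the last cdf (e.g. rounding)

# First two bins of the pdf_cdf_dates.txt listing above.
rows = [
    ("2021-03-22", "2021-04-05", 0.0580386924616411, 0.0580386924616411),
    ("2021-04-05", "2021-04-19", 0.0426951300867245, 0.1007338225483656),
]
for p in (0.01, 0.05, 0.10):
    print(f"P{round(p * 100):02d}\t{p_value_date(rows, p)}")
```

Skipping empty bins is what makes the listing jump from 2021-09-18 at P85 straight to 2022-01-10 at P86: no probability mass falls in the bins between those dates.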