diff --git a/metadata.md b/metadata.md index c1dfb92f..aa0c6880 100644 --- a/metadata.md +++ b/metadata.md @@ -1,658 +1,617 @@ -# Metadata +## Sample -There are eight tables that are described below. example data is stored in [data](data). +The sample is a representative volume of wastewater taken from a Site which is then analysed by a lab. -- [Sample](#sample) -- [WWMeasure](#wwmeasure) -- [Site](#Site) -- [SiteMeasure](#sitemeasure) -- [Reporter](#reporter) -- [Lab](#lab) -- [AssayMethod](#assaymethod) -- [Instrument](#instrument) -- [Polygon](#polygon) -- [CovidPublicHealthData](#covidpublicHealthdata) -- [Lookups](#lookups) +- **sampleID**: (Primary Key) [string] Unique identification for sample. Suggestion:siteID-date-index. -## Entity Relationship Diagram -Use Entity Relationship Diagram to identify variable type. +- **siteID**: (Foreign key) [string] Links with the Site table to describe the location of sampling. -- **BLOB**: The ASCII-encoded string in lower case representing the media type of the Blob. More [details](https://w3c.github.io/FileAPI/#dfn-type) -- **bool**: boolean, TRUE, FALSE -- **char**: ASCII-encoded string -- **cat**: categorical defined using ASCII-encoded string as defined for the variable -- **dateTime**: YYYY-MM-DD HH:mm:ss (24 hour format, in UTC) -- **email**: email address -- **float**: float-point numerical value -- **int**: integer -- **phone**: phone number, either ###-###-#### or #-###-###-#### -- **url**: Uniform Resource Identifier +- **dateTime**: [datetime] for grab samples this is the date, time and timezone the sample was taken. -![](img/ERD.svg) -Comment on the ERD in [Lucidcharts](https://lucid.app/lucidchart/invitations/accept/adc1784b-e237-4a2f-947e-4503544d4510) +- **dateTimeStart**: [datetime] For integrated time averaged samples this is the date, time and timezone the sample was started being taken. 
-## Sample -The sample is a representative volume of wastewater taken from a [Site](#Site) which is then analysed by a lab. +- **dateTimeEnd**: [datetime] For integrated time average samples this is the date, time and timezone the sample was finished being taken. + + +- **type**: [category] Type of sample. + - `rawWW`: Raw wastewater. + - `swrSed`: Sediments obtained in sewer. + - `pstGrit`: Raw wastewater after a treatment plant's headworks. + - `pSludge`: Sludge produced by primary clarifiers. + - `pEfflu`: Effluent obtained after primary clarifiers. + - `sSludge`: Sludge produced by secondary clarifiers. + - `sEfflu`: Effluent obtained after secondary clarifiers. + - `water`: Non-wastewater, coming from any kind of water body. + - `faeces`: Fecal matter. + - `other`: Other type of site. Add description to typeOther. + +- **typeOther**: [string] Description for other type of sample not listed in -- **sampleID**: (Primary Key) Unique identification for sample. Suggestion: *siteID-date-index*. -- **siteID**: (Foreign key) Links with the Site table to describe the location of sampling. +- **collection**: [category] Method used to collect the data. + - `cpTP24h`: A time proportional 24-hour composite sample generally collected by an autosampler. + - `cpFP24h`: A flow proportional 24-hour composite sample generally collected by an autosampler. + - `grb`: A single large representative grab sample. + - `grbCp8h`: A 8-hour composite with 8 grab samples each taken once per hour, generally manually performed. + - `grbCp3h`: A 3-hour composite with 3 grab samples each taken once per hour, generally manually performed. + - `grbCp3`: A grab-composite sample composed of 3 separate grab samples. + - `mooreSw`: Moore swab passive sample. + - `other`: Other type of collection method. Add description to collectionOther. -- **dateTime**: For grab samples this is the *date, time and timezone* the sample was taken. 
+- **collectionOther**: [string] Description for other type of method not listed in collection. -- **dateTimeStart**: For integrated time averaged samples this is the *date, time and timezone* the sample was started being taken. -- **dateTimeEnd**: For integrated time average samples this is the *date, time and timezone* the sample was finished being taken. +- **preTreatment**: [boolean] Was the sample chemically treated in anyway with the addition of stabilizers or other -- **type**: Type of sample. - - `rawWW`: Raw wastewater. - - `swrSed`: Sediments obtained in sewer. - - `pstGrit`: Raw wastewater after a treatment plant's headworks. - - `pSludge`: Sludge produced by primary clarifiers. - - `pEfflu`: Effluent obtained after primary clarifiers. - - `sSludge`: Sludge produced by secondary clarifiers. - - `sEfflu`: Effluent obtained after secondary clarifiers. - - `water`: Non-wastewater, coming from any kind of water body. - - `faeces`: Fecal matter. - - `other`: Other type of site. Add description to `typeOther`. +- **preTreatmentDescription**: [string] If preTreatment then describe the treatment that was performed. -- **typeOther**: Description for other type of sample not listed in `type`. -- **collection**: Method used to collect the data. +- **pooled**: [boolean] Is this a pooled sample, and therefore composed of multiple child samples obtained at different sites - - `cpTP24h`: A time proportional 24-hour composite sample generally collected by an autosampler. - - `cpFP24h`: A flow proportional 24-hour composite sample generally collected by an autosampler. - - `grb`: A single large representative grab sample. - - `grbCp8h`: An 8-hour composite with 8 grab samples each taken once per hour, generally manually performed. - - `grbCp3h`: A 3-hour composite with 3 grab samples each taken once per hour, generally manually performed. - - `grbCp3`: A grab-composite sample composed of 3 separate grab samples. - - `mooreSw`: Moore swab passive sample. 
- - `other`: Other type of collection method. Add description to `collectionOther`. -- **collectionOther**: Description for other type of method not listed in `collection`. +- **children**: [string] If this is a sample with many smaller samples either because of pooling or sub-sampling this indicates a comma separated list of child sampleID's. -- **preTreatment**: Was the sample chemically treated in anyway with the addition of stabilizers or other? -- **preTreatmentDescription**: If `preTreatment` then describe the treatment that was performed. +- **parent**: [string] If this sample has been pooled into one big sample for analysis this indicates the sampleID of the larger pooled sample. -- **pooled**: Is this a pooled sample, and therefore composed of multiple child samples obtained at different sites? (Boolean) -- **children**: If this is a sample with many smaller samples either because of pooling or sub-sampling this indicates *a comma separated list of child sampleID's*. +- **sizeL**: [float] Total volume of water or sludge sampled. -- **parent** : If this sample has been pooled into one big sample for analysis this indicates the *sampleID of the larger pooled sample*. -- **sizeL**: Total volume of water or sludge sampled. +- **fieldSampleTempC**: [float] Temperature that the sample is stored at while it is being sampled. This field is mainly relevant for composite samples which are either kept at ambient temperature or refrigerated while being sampled. -- **fieldSampleTempC**: Temperature that the sample is stored at while it is being sampled. This field is mainly relevant for composite samples which are either kept at ambient temperature or refrigerated while being sampled. -- **shippedOnIce**: Was the sample kept cool while being shipped to the lab? +- **shippedOnIce**: [boolean] Was the sample kept cool while being shipped to the lab -- **storageTempC**: Temperature that the sample is stored at in Celsius. 
-- **qualityFlag**: Does the reporter suspect the sample having some quality issues? +- **storageTempC**: [float] Temperature that the sample is stored at in Celsius. -- **notes**: Any additional notes. + +- **qualityFlag**: [boolean] Does the reporter suspect the sample having some quality issues + + +- **notes**: [string] Any additional notes. ## WWMeasure -Measurement result (ie. single variable) from a wastewater sample. `WWMeaasure` includes data that is commonly collected by staff at wastewater laboratories where measurement is performed using an assay method (see [AssayMethod](#assaymethod)), but can also be performed using specific instruments (see [Instruments](#instrument). Measures performed at the site of the wastewater sample are reported in `SiteMeasure`. +Measurement result (ie. single variable) from a wastewater sample. WWMeaasure includes data that is commonly collected by staff at wastewater laboratories where measurement is performed using an assay method (see AssayMethod), but can also be performed using specific instruments (see Instruments. Measures performed at the site of the wastewater sample are reported in SiteMeasure. + +- **uWwMeasureID**: (Primary Key) [string] Unique identifier a measurement within the measurement table. + + +- **wwMeasureID**: [string] Unique identifier for wide table only. Use when all measures are performed on a single sample at the same time and same laboratory. Suggestion: siteID_sampleID_LabID_reportDate_ID. + + +- **sampleID**: (Foreign key) [string] Links with the identified Sample + + +- **labID**: (Foreign key) [string] Links with the identified Lab that performed the analysis. + + +- **assayID**: (Foreign key) [string] Links with the AssayMethod used to perform the analysis. Use instrument.ID for measures that are not viral measures. + + +- **instrumentID**: (Foreign key) [string] Links with the Instrument used to perform the analysis. Use assay.ID for viral measures. 
+ -- **uWwMeasureID**: (Primary key) Unique identifier a measurement within the measurement table. +- **reporterID**: (Foreign key) [string] Links with the reporter that is responsible for the data. -- **wwMeasureID**: Unique identifier for wide table only. Use when all measures are performed on a single sample at the same time and same laboratory. Suggestion: _siteID_sampleID_LabID_reportDate_ID_. -- **sampleID**: (Foreign key) Links with the identified Sample. +- **analysisDate**: [date] date the measurement was performed in the lab. -- **labID**: (Foreign key) Links with the identified Lab that performed the analysis. -- **assayID**: (Foreign key) Links with the `AssayMethod` used to perform the analysis. Use `instrument.ID` for measures that are not viral measures. +- **reportDate**: [date] date the data was reported. One sampleID may have updated reports based on updates to assay method or reporting standard. In this situation, use the original sampleID but updated MeasureID, reportDate and assayID (if needed). -- **instrumentID**: (Foreign key) Links with the `Instrument` used to perform the analysis. Use `assay.ID` for viral measures. -- **reporterID**: (Foreign key) Links with the reporter that is responsible for the data. +- **fractionAnalyzed**: [category] Faction of the sample that is analyzed. + - `liquid`: Liquid fraction + - `solid`: Solid fraction + - `mixed`: Mixed/homogenized sample -- **analysisDate**: Date the measurement was performed in the lab. +- **type**: [category] The variable that is being measured on the sample, e.g. a SARS-CoV-2 gene target region (cov), a biomarker for normalisation (n) or a water quality parameter (wq). 
+ - `covN1`: SARS-CoV-2 nucleocapsid gene N1 + - `covN2`: SARS-CoV-2 nucleocapsid gene N2 + - `covN3`: SARS-like coronaviruses nucleocapsid gene N3 + - `covE`: SARS-CoV-2 gene region E + - `covRdRp`: SARS-CoV-2 gene region RdRp + - `nPMMoV`: Pepper mild mottle virus + - `ncrA`: cross-assembly phage + - `nbrsv`: bovine respiratory syncytial virus + - `wqTS`: Total solids concentration. + - `wqTSS`: Total suspended solids concentration. + - `wqVSS`: Volatile suspended solids concentration. + - `wqCOD`: Chemical oxygen demand. + - `wqOPhos`: Ortho-phosphate concentration. + - `wqNH4N`: Ammonium nitrogen concentration, as N. + - `wqTN`: Total nitrogen concentration, as N. + - `wqPh`: pH + - `wqCond`: Conductivity + - `other`: Other measurement category. Add description to categoryOther. -- **reportDate**: Date the data was reported. One sampleID may have updated reports based on updates to assay method or reporting standard. In this situation, use the original `sampleID` but updated `MeasureID`, `reportDate` and `assayID` (if needed). +- **typeOther**: [string] Description for an other variable not listed in category. -- **fractionAnalyzed**: Faction of the sample that is analyzed. - - `liquid`: Liquid fraction - - `solid`: Solid fraction - - `mixed`: Mixed/homogenized sample +- **unit**: [category] Unit of the measurement. + - `gcPMMoV`: Gene copies per copy of PMMoV. + - `gcMl`: Gene copies per milliliter. + - `gcGs`: Gene copies per gram solids. + - `gcL`: Gene copies per liter. + - `gcCrA`: Gene copies per copy of crAssphage. + - `Ct`: Cycle threshold. + - `mgL`: Milligrams per liter. + - `ph`: pH units + - `uScm`: Micro-siemens per centimeter. + - `pp`: Percent positive, for Moore swab. + - `pps`: Percent primary sludge, for total solids. + - `other`: Other measurement of viral copies or wastewater treatment plant parameter. Add description to UnitOther. -- **type**: The variable that is being measured on the sample, e.g. 
a SARS-CoV-2 gene target region (`cov`), a biomarker for normalisation (`n`) or a water quality parameter (`wq`). +- **unitOther**: [string] Description for other measurement unit not listed in unit. - - `covN1`: SARS-CoV-2 nucleocapsid gene N1 - - `covN2`: SARS-CoV-2 nucleocapsid gene N2 - - `covN3`: SARS-like coronaviruses nucleocapsid gene N3 - - `covE`: SARS-CoV-2 gene region E - - `covRdRp`: SARS-CoV-2 gene region RdRp - - `nPMMoV`: Pepper mild mottle virus - - `ncrA`: cross-assembly phage - - `nbrsv`: bovine respiratory syncytial virus - - `wqTS`: Total solids concentration. - - `wqTSS`: Total suspended solids concentration. - - `wqVSS`: Volatile suspended solids concentration. - - `wqCOD`: Chemical oxygen demand. - - `wqOPhos`: Ortho-phosphate concentration. - - `wqNH4N`: Ammonium nitrogen concentration, as N. - - `wqTN`: Total nitrogen concentration, as N. - - `wqPh`: pH. - - `wqCond`: Conductivity. - - `other`: Other measurement category. Add description to `categoryOther`. -- **typeOther**: Description for an other variable not listed in `category`. +- **aggregation**: [category] Statistical measures used to report the sample units of Ct/Cq, unless otherwise stated. Each aggregation has a corresponding value. + - `single`: This value is not an aggregate measurement in any way (ie. not a mean, median, max or any other) and can be a replicate value. + - `mean`: Arithmetic mean + - `meanNr`: Arithmetic mean, normalized + - `geoMn`: Geometric mean + - `geoMnNr`: Geometric mean, normalized + - `median`: Median + - `min`: Lowest value in a range of values + - `max`: Highest value in a range of values + - `sd`: Standard deviation + - `sdNr`: Standard deviation, normalized + - `other`: Other aggregation method. Add description to aggregationOther -- **unit**: Unit of the measurement. +- **aggregationOther**: [string] Description for other type of aggregation not listed in aggregation. - - `gcPMMoV`: Gene copies per copy of PMMoV. 
- - `gcMl`: Gene copies per milliliter. - - `gcGs`: Gene copies per gram solids. - - `gcL`: Gene copies per liter. - - `gcCrA`: Gene copies per copy of crAssphage. - - `Ct`: Cycle threshold. - - `mgL`: Milligrams per liter. - - `ph`: pH units - - `uScm`: Micro-siemens per centimeter. - - `pp`: Percent positive, for Moore swab. - - `pps`: Percent primary sludge, for total solids. - - `other`: Other measurement of viral copies or wastewater treatment plant parameter. Add description to `UnitOther`. -- **unitOther**: Description for other measurement unit not listed in `unit`. +- **index**: [integer] Index number in case the measurement was taken multiple times. -- **aggregation**: Statistical measures used to report the sample units of Ct/Cq, unless otherwise stated. Each aggregation has a corresponding value. - - `single`: This value is not an aggregate measurement in any way (ie. not a `mean`, `median`, `max` or any other) and can be a replicate value. - - `mean`: Arithmetic mean - - `meanNr`: Arithmetic mean, normalized - - `geoMn`: Geometric mean - - `geoMnNr`: Geometric mean, normalized - - `median`: Median - - `min`: Lowest value in a range of values - - `max`: Highest value in a range of values - - `sd`: Standard deviation - - `sdNr`: Standard deviation, normalized - - `other`: Other aggregation method. Add description to `aggregationOther` +- **value**: [float] The actual measurement value that was obtained through analysis. -- **aggregationOther**: Description for other type of aggregation not listed in `aggregation`. -- **index**: Index number in case the measurement was taken multiple times. +- **qualityFlag**: [boolean] Does the reporter suspect the measurement having some quality issues -- **value**: The actual measurement value that was obtained through analysis. -- **qualityFlag**: Does the reporter suspect the measurement having some quality issues? +- **accessToPublic**: [boolean] If this is 'no', this data will not be available to the public. 
If missing, data will be available to the public. -- **accessToPublic**: If this is 'no', this data will not be available to the public. If missing, data will be available to the public. -- **accessToAllOrg**: If this is 'no', this data will not be available to any partner organization. If missing, data will be available to the all organizations. +- **accessToAllOrg**: [boolean] If this is 'no', this data will not be available to any partner organization. If missing, data will be available to all organizations. -- **accessToSelf**: If this is 'no', this data will not be shown on the portal when this reporter logs in. If missing, data will be available to this reporter. -- **accessToPHAC**: If this is 'no', the data will not be available to employees of the Public Health Agency of Canada - PHAC. If missing, data will be available to employees of the Public Health Agency of Canada - PHAC. +- **accessToSelf**: [boolean] If this is 'no', this data will not be shown on the portal when this reporter logs in. If missing, data will be available to this reporter. -- **accessToLocalHA**: If this is 'no', the, data will not be available to local health authorities. If missing, data will be available to local health authorities. -- **accessToProvHA**: If this is 'no', this data will not be available to provincial health authorities. If missing, data will be available to provincial health authorities. +- **accessToPHAC**: [boolean] If this is 'no', the data will not be available to employees of the Public Health Agency of Canada - PHAC. If missing, data will be available to employees of the Public Health Agency of Canada - PHAC. -- **accessToOtherProv**: If this is 'no', this data will not be available to other data providers not listed before. If missing, data will be available to other data providers not listed before -- **accessToDetails**: More details on the existing confidentiality requirements of this measurement.
+- **accessToLocalHA**: [boolean] If this is 'no', the data will not be available to local health authorities. If missing, data will be available to local health authorities. -- **notes**: Any additional notes. + +- **accessToProvHA**: [boolean] If this is 'no', this data will not be available to provincial health authorities. If missing, data will be available to provincial health authorities. + + +- **accessToOtherProv**: [boolean] If this is 'no', this data will not be available to other data providers not listed before. If missing, data will be available to other data providers not listed before. + + +- **accessToDetails**: [boolean] More details on the existing confidentiality requirements of this measurement. + + +- **notes**: [string] Any additional notes. ## Site -The site of wastewater sampling, including several *defaults* that can be used to populate new samples upon creation. +The site of wastewater sampling, including several defaults that can be used to populate new samples upon creation. + +- **siteID**: (Primary Key) [string] Unique identifier for the location where wastewater sample was taken. + + +- **name**: [string] Given name to the site. Location name could be a treatment plant, campus, institution or sewer location, etc. -- **siteID**: (Primary Key) Unique identifier for the location where wastewater sample was taken. -- **name**: Given name to the site. Location name could be a treatment plant, campus, institution or sewer location, etc. +- **description**: [string] Description of wastewater site (city, building, street, etc.) to better identify the location of the sampling point. -- **description**: Description of wastewater site (city, building, street, etc.) to better identify the location of the sampling point. -- **type**: Type of site or institution where sample was taken. +- **type**: [category] Type of site or institution where sample was taken. + - `airPln`: Airplane. + - `corFcil`: Correctional facility.
+ - `school`: School + - `hosptl`: Hospital + - `ltcf`: Long-term care facility. + - `swgTrck`: Sewage truck. + - `uCampus`: University campus. + - `mSwrPpl`: Major sewer pipeline. + - `pStat`: Pumping station. + - `holdTnk`: Hold tank. + - `retPond`: Retention pond. + - `wwtpMuC`: Municipal wastewater treatment plant for combined sewage. + - `wwtpMuS`: Municipal wastewater treatment plant for sanitary sewage only. + - `wwtpInd`: Industrial wastewater treatment plant. + - `lagoon`: Logoon system for extensive wastewater treatment. + - `septTnk`: Septic tank. + - `river`: River, natural water body. + - `lake`: Lake, natural water body. + - `estuary`: Estuary, natural water body + - `sea`: Sea, natural water body. + - `ocean`: Ocean, natural water body. + - `other`: Other site type. Add description to typeOther. - - `airPln`: Airplane. - - `corFcil`: Correctional facility. - - `school`: School. - - `hosptl`: Hospital. - - `ltcf`: Long-term care facility. - - `swgTrck`: Sewage truck. - - `uCampus`: University campus. - - `mSwrPpl`: Major sewer pipeline. - - `pStat`: Pumping station. - - `holdTnk`: Hold tank. - - `retPond`: Retention pond. - - `wwtpMuC`: Municipal wastewater treatment plant for combined sewage. - - `wwtpMuS`: Municipal wastewater treatment plant for sanitary sewage only. - - `wwtpInd`: Industrial wastewater treatment plant. - - `lagoon`: Logoon system for extensive wastewater treatment. - - `septTnk`: Septic tank. - - `river`: River, natural water body. - - `lake`: Lake, natural water body. - - `estuary`: Estuary, natural water body - - `sea`: Sea, natural water body. - - `ocean`: Ocean, natural water body. - - `other`: Other site type. Add description to `typeOther`. +- **typeOther**: [string] Description of the site when the site is not listed. See siteType. -- **typeOther**: Description of the site when the site is not listed. See `siteType`. -- **SampleTypeDefault**: Used as default when a new sample is created for this site. 
See `type` in `Sample` table. +- **SampleTypeDefault**: [category] Used as default when a new sample is created for this site. See type in Sample table. + - `rawWW`: Raw wastewater. + - `swrSed`: Sediments obtained in sewer. + - `pstGrit`: Raw wastewater after a treatment plant's headworks. + - `pSludge`: Sludge produced by primary clarifiers. + - `pEfflu`: Effluent obtained after primary clarifiers. + - `sSludge`: Sludge produced by secondary clarifiers. + - `sEfflu`: Effluent obtained after secondary clarifiers. + - `water`: Non-wastewater, coming from any kind of water body. + - `faeces`: Fecal matter. + - `other`: Other type of site. Add description to typeOther. -- **SampleTypeOtherDefault**: Used as default when a new sample is created for this site. See `typeOther` in `Sample` table. +- **SampleTypeOtherDefault**: [string] Used as default when a new sample is created for this site. See typeOther in Sample table. -- **SampleCollectionDefault**: Used as default when a new sample is created for this site. See `collection` in `Sample` table. -- **SampleCollectOtherDefault**: Used as default when a new sample is created for this site. See `collectionOther` in `Sample` table. +- **SampleCollectionDefault**: [category] Used as default when a new sample is created for this site. See collection in Sample table. + - `cpTP24h`: A time proportional 24-hour composite sample generally collected by an autosampler. + - `cpFP24h`: A flow proportional 24-hour composite sample generally collected by an autosampler. + - `grb`: A single large representative grab sample. + - `grbCp8h`: An 8-hour composite with 8 grab samples each taken once per hour, generally manually performed. + - `grbCp3h`: A 3-hour composite with 3 grab samples each taken once per hour, generally manually performed. + - `grbCp3`: A grab-composite sample composed of 3 separate grab samples. + - `mooreSw`: Moore swab passive sample. + - `other`: Other type of collection method. 
Add description to collectionOther. -- **SampleStorageTempCDefault**: Used as default when a new sample is created for this site. See `storageTempC` in `Sample` table. +- **SampleCollectOtherDefault**: [string] Used as default when a new sample is created for this site. See collectionOther in Sample table. -- **MeasureFractionAnalyzedDefault**: Used as default when a new measurement is created for this site. See `fractionAnalyzed` in `Measurement` table. -- **geoLat**: Site geographical location, latitude in decimal coordinates, ie.: (45.424721) +- **SampleStorageTempCDefault**: [float] Used as default when a new sample is created for this site. See storageTempC in Sample table. -- **geoLong**: Site geographical location, longitude in decimal coordinates, ie.: (-75.695000) -- **notes**: Any additional notes. +- **MeasureFractionAnalyzedDefault**: [category] Used as default when a new measurement is created for this site. See fractionAnalyzed in Measurement table. + - `liquid`: Liquid fraction + - `solid`: Solid fraction + - `mixed`: Mixed/homogenized sample -- **polygonID**: (Foreign key) Links with the Polygon table, this should encompass the area that typically drains into this site. +- **geoLat**: [float] Site geographical location, latitude in decimal coordinates, ie.: (45.424721) -- **sewerNetworkFileLink**: Link to a file that has any detailed information about the sewer network associated with the site (any format). -- **sewerNetworkFileBLOB**: A file BLOB that has any detailed information about the sewer network associated with the site (any format). +- **geoLong**: [float] Site geographical location, longitude in decimal coordinates, ie.: (-75.695000) + + +- **notes**: [string] Any additional notes. + + +- **polygonID**: (Foreign key) [string] Links with the Polygon table, this should encompass the area that typically drains into this site. 
+ + +- **sewerNetworkFileLink**: [string] Link to a file that has any detailed information about the sewer network associated with the site (any format). + + +- **sewerNetworkFileBLOB**: [blob] A file blob that has any detailed information about the sewer network associated with the site (any format). ## SiteMeasure -Measurement result (ie. single variable) obtained by at the site of wastewater sample.`SiteMeasure` includes data that is commonly collected by staff at wastewater treatment facilities and field sample locations. These measures that are not performed on the wastewater sample but provide additional context necessary for the interpretation of the results. Measures performed on the wastewater sample are reported in `WWMeasure`. +Measurement result (ie. single variable) obtained by at the site of wastewater sample.SiteMeasure includes data that is commonly collected by staff at wastewater treatment facilities and field sample locations. These measures that are not performed on the wastewater sample but provide additional context necessary for the interpretation of the results. Measures performed on the wastewater sample are reported in WWMeasure. + +- **uSiteMeasureID**: (Primary Key) [string] Unique identifier for each measurement for a site. + -- **uSiteMeasureID**: (Primary Key) Unique identifier for each measurement for a site. +- **siteMeasureID**: [string] Unique identifier for wide table only. Use when all measures are performed on a single sample. -- **siteMeasureID**: Unique identifier for wide table only. Use when all measures are performed on a single sample. -- **siteID**: (Foreign Key) Links with the Site table to describe the location of measurement. +- **siteID**: (Foreign key) [string] Links with the Site table to describe the location of measurement. -- **instrumentID**: (Foreign Key) Links with the `Instrument` table to describe instrument used for the measurement. 
-- **reporterID**: (Foreign key) Links with the reporter that is responsible for the data. +- **instrumentID**: (Foreign key) [string] Links with the Instrument table to describe instrument used for the measurement. -- **dateTime**: The date and time the measurement was performed. -- **type**: The type of measurement that was performed. The prefix `env` is used for environmental variables, whereas `ww` indicates a measurement on wastewater. +- **reporterID**: (Foreign key) [string] Links with the reporter that is responsible for the data. - - `envTemp`: Environmental temperature. - - `envRnF`: Rain fall, i.e. amount of precipitation in the form of rain. - - `envSnwF`: Snow fall, i.e. amount of precipitation in the form of snow. - - `envSnwD`: Total depth of snow on the ground. - - `wwFlow`: Flow of wastewater. - - `wwTemp`: Temperature of the wastewater. - - `wwTSS`: Total suspended solids concentration of the wastewater. - - `wwCOD`: Chemical oxygen demand of the wastewater. - - `wwTurb`: Turbidity of the wastewater. - - `wwOPhos`: Ortho-phosphate concentration. - - `wwNH4N`: Ammonium nitrogen concentration, as N. - - `wwTN`: Total nitrogen concentration, as N. - - `wwpH`: pH of the wastewater. - - `wwCond`: Conductivity of the wastewater. - - `other`: An other type of measurement. Add description to `typeOther`. -- **typeOther**: Description of the measurement in case it is not listed in `type`. +- **dateTime**: [date] The date and time the measurement was performed. -- **typeDescription**: Additional information on the performed measurement. -- **aggregation**: When reporting an aggregate measurement, this field describes the method used. +- **type**: [category] The type of measurement that was performed. The prefix env is used for environmental variables, whereas ww indicates a measurement on wastewater. + - `envTemp`: Environmental temperature. + - `envRnF`: Rain fall, i.e. amount of precipitation in the form of rain. + - `envSnwF`: Snow fall, i.e. 
amount of precipitation in the form of snow. + - `envSnwD`: Total depth of snow on the ground. + - `wwFlow`: Flow of wastewater. + - `wwTemp`: Temperature of the wastewater. + - `wwTSS`: Total suspended solids concentration of the wastewater. + - `wwCOD`: Chemical oxygen demand of the wastewater. + - `wwTurb`: Turbidity of the wastewater. + - `wwOPhos`: Ortho-phosphate concentration. + - `wwNH4N`: Ammonium nitrogen concentration, as N. + - `wwTN`: Total nitrogen concentration, as N. + - `wwpH`: pH of the wastewater. + - `wwCond`: Conductivity of the wastewater. - - `single`: This value is not an aggregate measurement in any way (ie. not a `mean`, `median`, `max` or any other) and can be a replicate value. - - `mean`: Arithmetic mean - - `meanNr`: Arithmetic mean, normalized - - `geoMn`: Geometric mean - - `geoMnNr`: Geometric mean, normalized - - `median`: Median - - `min`: Lowest value in a range of values - - `max`: Highest value in a range of values - - `sd`: Standard deviation - - `sdNr`: Standard deviation, normalized - - `other`: Other aggregation method. Add description to `aggregationOther` +- **typeOther**: [string] Description of the measurement in case it is not listed in type. -- **aggregationOther**: Description for other type of aggregation not listed in `aggregation`. -- **aggregationDesc**: Information on OR reference to which measurements that were included to calculate the aggregated measurement that is being reported. +- **typeDescription**: [string] Additional information on the performed measurement. -- **value**: The actual value that is being reported for this measurement. -- **unit**: The engineering unit of the measurement. +- **aggregation**: [category] When reporting an aggregate measurement, this field describes the method used. + - `single`: This value is not an aggregate measurement in any way (ie. not a mean, median, max or any other) and can be a replicate value. 
+ - `mean`: Arithmetic mean + - `meanNr`: Arithmetic mean, normalized + - `geoMn`: Geometric mean + - `geoMnNr`: Geometric mean, normalized + - `median`: Median + - `min`: Lowest value in a range of values + - `max`: Highest value in a range of values + - `sd`: Standard deviation + - `sdNr`: Standard deviation, normalized + - `other`: Other aggregation method. Add description to aggregationOther -- **qualityFlag**: Does the reporter suspect quality issues with the value of this measurement? +- **aggregationOther**: [string] Description for other type of aggregation not listed in aggregation. -- **accessToPublic**: If this is 'no', this data will not be available to the public. If missing, data will be available to the public. -- **accessToAllOrgs**: If this is 'no', this data will not be available to any partner organization. If missing, data will be available to the all organizations. +- **aggregationDesc**: [string] Information on OR reference to which measurements that were included to calculate the aggregated measurement that is being reported. -- **accessToSelf**: If this is 'no', this data will not be shown on the portal when this reporter logs in. If missing, data will be available to this reporter. -- **accessToPHAC**: If this is 'no', the data will not be available to employees of the Public Health Agency of Canada - PHAC. If missing, data will be available to employees of the Public Health Agency of Canada - PHAC. +- **value**: [float] The actual value that is being reported for this measurement. -- **accessToLocalHA**: If this is 'no', data will not be available to local health authorities. If missing, data will be available to local health authorities. -- **accessToProvHA**: If this is 'no', this data will not be available to provincial health authorities. If missing, data will be available to provincial health authorities. +- **unit**: [string] The engineering unit of the measurement. 
-- **accessToOtherProv**: If this is 'no', this data will not be available to other data providers not listed before. If missing, data will be available to other data providers not listed before -- **accessToDetails**: More details on the existing confidentiality requirements of this measurement. +- **qualityFlag**: [boolean] Does the reporter suspect quality issues with the value of this measurement -- **notes**: Any additional notes. + +- **accessToPublic**: [boolean] If this is 'no', this data will not be available to the public. If missing, data will be available to the public. + + +- **accessToAllOrgs**: [boolean] If this is 'no', this data will not be available to any partner organization. If missing, data will be available to the all organizations. + + +- **accessToSelf**: [boolean] If this is 'no', this data will not be shown on the portal when this reporter logs in. If missing, data will be available to this reporter. + + +- **accessToPHAC**: [boolean] If this is 'no', the data will not be available to employees of the Public Health Agency of Canada - PHAC. If missing, data will be available to employees of the Public Health Agency of Canada - PHAC. + + +- **accessToLocalHA**: [boolean] If this is 'no', data will not be available to local health authorities. If missing, data will be available to local health authorities. + + +- **accessToProvHA**: [boolean] If this is 'no', this data will not be available to provincial health authorities. If missing, data will be available to provincial health authorities. + + +- **accessToOtherProv**: [boolean] If this is 'no', this data will not be available to other data providers not listed before. If missing, data will be available to other data providers not listed before. + + +- **accessToDetails**: [boolean] More details on the existing confidentiality requirements of this measurement. + + +- **notes**: [string] Any additional notes. 
## Reporter The individual or organization that is reporting and responsible for the quality of the data. -- **reporterID**: (Primary Key) Unique identifier for the person or organization that is reporting the data. +- **reporterID**: (Primary Key) [string] Unique identifier for the person or organization that is reporting the data. -- **siteIDDefault**: (Foreign Key) Used as default when a new sample is created by this reporter. See `ID` in `Site` table. -- **labIDDefault**: (Foreign Key) Used as default when a new sample is created by this reporter. See `ID` in `Lab` table. +- **siteIDDefault**: (Foreign key) [string] Used as default when a new sample is created by this reporter. See ID in Site table. -- **contactName**: Full Name of the reporter, either an organization or individual. -- **contactEmail**: Contact e-mail address. +- **labIDDefault**: (Foreign key) [string] Used as default when a new sample is created by this reporter. See ID in Lab table. -- **contactPhone**: Contact phone number. -- **notes**: Any additional notes. +- **contactName**: [string] Full Name of the reporter, either an organization or individual. -## Lab -Laboratory that performs SARS-CoV-2 wastewater testing at one or more sites. +- **contactEmail**: [string] Contact e-mail address. + -- **labID**: (Primary key) Unique identifier for the laboratory. +- **contactPhone**: [string] Contact phone number. -- **assayMethodIDDefault**: (Foreign key) Used as default when a new measurement is created for this lab. See `ID` in `AssayMethod` table. -- **name**: Name corresponding to lab. +- **notes**: [string] Any additional notes. -- **contactName**: Contact person or group, for the lab. +## Lab -- **contactEmail**: Contact e-mail address, for the lab. +Laboratory that performs SARS-CoV-2 wastewater testing at one or more sites. -- **contactPhone**: Contact phone number, for the lab. +- **labID**: (Primary Key) [string] Unique identifier for the laboratory. 
-- **updateDate**: Date information was provided or updated. -## AssayMethod +- **assayMethodIDDefault**: (Foreign key) [string] Used as default when a new measurement is created for this lab. See ID in AssayMethod table. -The assay method that was used to perform testing. Create a new record if there are changes (improvements) to an existing assay method. Keep the same `ID` and use an updated `version`. A new record for a new version can include only the fields that changed, however, we recommend duplicating existing fields to allow each record to clearly describe all steps. Add a current `date` when recording a new version to an assay. -- **assayMethodID**: (Primary key) Unique identifier for the assay method. +- **name**: [string] Name corresponding to lab. -- **instrumentID**: (Foreign Key) Links with the `Instrument` table to describe instruments used for the measurement. -- **name**: Name of the assay method. +- **contactName**: [string] Contact person or group, for the lab. -- **version**: Version of the assay. [Semantic versioning](https://semver.org) is recommended. -- **summary**: Short description of the assay and how it is different from the other assay methods. +- **contactEmail**: [string] Contact e-mail address, for the lab. -- **referenceLink**: Link to standard operating procedure. -- **date**: Date on which the assayMethod was created or updated (for version update). +- **contactPhone**: [string] Contact phone number, for the lab. -- **aliasID**: ID of an assay that is the same or similar. *a comma separated list*. -- **sampleSizeL**: Size of the sample that is analyzed in liters. +- **updateDate**: [date] date information was provided or updated. -- **loq**: Limit of quantification (LOQ) for this method if one exists. +## AssayMethod -- **lod**: Limit of detection (LOD) for this method if one exists. +The assay method that was used to perform testing. Create a new record if there are changes (improvements) to an existing assay method. 
Keep the same ID and use an updated version. A new record for a new version can include only the fields that changed, however, we recommend duplicating existing fields to allow each record to clearly describe all steps. Add a current date when recording a new version to an assay. -- **unit**: Unit used by this method, and applicable to the LOD and LOQ. +- **assayMethodID**: (Primary Key) [string] Unique identifier for the assay method. - - `gcPMMoV`: Gene copies per copy of PMMoV. - - `gcMl`: Gene copies per milliliter. - - `gcGms`: Gene copies per gram solids. - - `gcL`: Gene copies per liter. - - `gcCrA`: Gene copies per copy of crAssphage. - - `other`: Other measurement of viral copies. Add description to `unitOther`. -- **unitOther**: Unit used by this method, that are applicable to the LOD and LOQ. +- **instrumentID**: (Foreign key) [string] Links with the Instrument table to describe instruments used for the measurement. -- **methodConc**: Description of the method used to concentrate the sample -- **methodExtract**: Description of the method used to extract the sample +- **name**: [string] Name of the assay method. -- **methodPcr**: Description of the PCR method used -- **qualityAssQC**: Description of the quality control steps taken +- **version**: [string] Version of the assay. Semantic versioning is recommended. -- **inhibition**: Description of the inhibition parameters. -- **surrogateRecovery**: Description of the surrogate recovery for this method. +- **summary**: [string] Short description of the assay and how it is different from the other assay methods. -## Instrument -Instruments that are used for measures in `WWMeasure` and `SiteMeasure`. The assay method for viral measurement are described in `AssayMethod`. +- **referenceLink**: [string] Link to standard operating procedure. -- **instrumentID**: (Primary key) Unique identifier for the assay method. -- **name**: Name of the instrument used to perform the measurement. 
+- **date**: [date] date on which the assayMethod was created or updated (for version update). -- **model** Model number or version of the instrument. -- **description** Description of the instrument. +- **aliasID**: [string] ID of an assay that is the same or similar. a comma separated list. -- **alias**: ID of an assay that is the same or similar. A comma separated list. -- **referenceLink**: Link to reference for the instrument. +- **sampleSizeL**: [float] Size of the sample that is analyzed in liters. -- **type**: Type of instrument used to perform the measurement. - - `online`: An online sensor - - `lab`: Offline laboratory analysis - - `hand`: A handheld measurement analyzer. - - `atline`: An atline analyzer with sampler. - - `other:` An other type of measurement instrument. Add description to instrumentTypeOther. +- **loq**: [float] Limit of quantification (LOQ) for this method if one exists. -- **typeOther**: Description of the instrument in case it is not listed in instrumentType. -## Polygon +- **lod**: [float] Limit of detection (LOD) for this method if one exists. -A simple polygon that encloses an area on the surface of the earth, normally these polygons will either be of a sewer catchment area or of a health region or other reporting area. -- **polygonID**: (Primary key) Unique identifier for the polygon. +- **unit**: [category] Unit used by this method, and applicable to the LOD and LOQ. + - `gcPMMoV`: Gene copies per copy of PMMoV. + - `gcMl`: Gene copies per milliliter. + - `gcGms`: Gene copies per gram solids. + - `gcL`: Gene copies per liter. + - `gcCrA`: Gene copies per copy of crAssphage. + - `other`: Other measurement of viral copies. Add description to unitOther. -- **name**: Descriptive name of the polygon. +- **unitOther**: [string] Unit used by this method, that are applicable to the LOD and LOQ. -- **pop**: Approximate population size of people living inside the polygon. -- **type**: Type of polygon. 
+- **methodConc**: [string] Description of the method used to concentrate the sample - - `swrCat`: Sewer catchment area. - - `hlthReg`: Health region served by the sewer network -- **wkt**: [well known text](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) of the polygon +- **methodExtract**: [string] Description of the method used to extract the sample -- **file**: File containing the geometry of the polygon, BLOB format. -- **link**: Link to an external reference that describes the geometry of the polygon. +- **methodPcr**: [string] Description of the PCR method used -## CovidPublicHealthData -Covid-19 patient data for a specified polygon. +- **qualityAssQC**: [string] Description of the quality control steps taken + -- **cphdID**: (Primary key) Unique identifier for the table. +- **inhibition**: [string] Description of the inhibition parameters. -- **reporterID**: (Foreign key) ID of the reporter who gave this data. -- **polygonID**: (Foreign key) Links with the `Polygon` table. +- **surrogateRecovery**: [string] Description of the surrogate recovery for this method. -- **date**: Date of reporting for covid-19 measure. +## Instrument -- **type**: Type of covid-19 patient data. +Instruments that are used for measures in WWMeasure and SiteMeasure. The assay method for viral measurement are described in AssayMethod. - - `conf`: Number of confirmed cases. This measure should be accompanied by `dateType`. - - `active`: Number of active cases. - - `test`: Number of tests performed. - - `posTest`: Number of positive tests. - - `pPosRt`: Percent positivity rate. - - `hospCen`: Hospital census or the number of people admitted with covid-19. - - `hospAdm`: Hospital admissions or patients newly admitted to hospital. +- **instrumentID**: (Primary Key) [string] Unique identifier for the assay method. -- **dateType**: Type of date used for `conf` cases. Typically `report` or `episode` are reported. 
`onset` and `test` date is not usually reported within aggregate data. - - `episode` : Episode date is the earliest of onset, test or reported date. - - `onset`: Earliest that symptoms were reported for this case. This data is often not known and reported. In lieu, `episode` is used. - - `report`: Date that the numbers were reported publicly. Typically, `reported` data and this measure is most commonly reported and used. - - `test`: Date that the covid-19 test was performed. +- **name**: [string] Name of the instrument used to perform the measurement. -- **value**: The numeric value that is being reported. -- **notes**: Any additional notes. +- **model**: [string] Model number or version of the instrument. -## Lookup -Used for lookup values of all category based columns +- **description**: [string] Description of the instrument. -- **tableName**: Name of the Table -- **columnName**: Name for the column +- **alias**: [string] ID of an assay that is the same or similar. A comma separated list. -- **value**: Name of the value -- **description**: Name of the description +- **referenceLink**: [string] Link to reference for the instrument. -## Naming conventions -- **Table names**: Table names use UpperCamelCase. +- **type**: [category] Type of instrument used to perform the measurement. + - `online`: An online sensor + - `lab`: Offline laboratory analysis + - `hand`: A handheld measurement analyzer. + - `atline`: An atline analyzer with sampler. + - `other`: An other type of measurement instrument. Add description to instrumentTypeOther. -- **Variable and category names**: Both variables and variable categories use lowerCamelCase. Do not use special characters (only uppercase, lowercase letters and numbers). Reason: variable and category names can be combined to generate derived variables. Using special characters will generate non-allowable characters - see below. 
Category names a maximum of 7 characters to allow concatenation of four categories into a single variaable to comply with ArcGIS 31 character maximum for variable names. +- **typeOther**: [string] Description of the instrument in case it is not listed in instrumentType. + +## Polygon -- **Variables in wide tables**: Wide tables use `_` to concatenate variables from long tables. +A simple polygon that encloses an area on the surface of the earth, normally these polygons will either be of a sewer catchment area or of a health region or other reporting area. -- **Variable order** If a multiple measurement take place on different dates this has a natural form in the long table format, however in the pivot wider format this can be ambiguous. In this case, show a `reportDate` followed by a series of measurements taken on that date (e.g. `covN1_PPMV_mean`) followed by more measurements (e.g. `covN2_PPMV_mean`) +- **polygonID**: (Primary Key) [string] Unique identifier for the polygon. -- **Merging tables** : Merging tables into a wide table requires additional steps when a variable does not have an unique name (when the variable name appears in more than one table). For example, variables such as `dateTime`, `notes`, `description`, `type`, `version` and `ID` variables such as `sampleID` are used in several tables. Use the following approach: - - Variable that are not unique (they are in more than one table): add the table name to the variable by concatenate column names with `_`. e.g. `dateTime` from the `Sample` table becomes `Sample_dateTime`. - - Variable that are unique (they are in only one table in the entire OMD). No variable name changes are needed. +- **name**: [string] Descriptive name of the polygon. -- **Derived, summary or transformed measure**: These measures are generated to summarize or transform one or more variables. 
Naming convention follows the same approach as naming variable and category names, except use a `_` when concatenating variable or category names. Examples of a derived measure is the calculation of a mean value of one or more SARS-CoV-2 regions. Normalization and standardization are other examples of a transformed measure. Typically derived, summary or transformed measures are not reported, rather the preferred reporting approach is reporting the underlying individual measures. -- **Date time**: YYYY-MM-DD HH:mm:ss (24 hour format, in UTC) +- **pop**: [integer] Approximate population size of people living inside the polygon. -- **Location**: [well known text](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) for polygon. -- **Version**: [Semantic versioning](https://semver.org) +- **type**: [category] Type of polygon. + - `swrCat`: Sewer catchment area. + - `hlthReg`: Health region served by the sewer network -## Examples of how to generate wide variable and category names +- **wkt**: [string] well known text of the polygon -### 1) Simple viral region report -A long table would represent viral measures of: +- **file**: [blob] File containing the geometry of the polygon, blob format. -``` {.markdown} -date = 2021-01-15 -type = covN1 -unit = nPMMoV -aggregation = mean -value = 40 -``` -``` {.markdown} -date = 2021-01-15 -type = covN2 -unit = nPMMoV -aggregation = mean -value = 42 -``` +- **link**: [string] Link to an external reference that describes the geometry of the polygon. -In a long table as: +## CovidPublicHealthData -| date | type | unit | aggregation | value | -|------------|-------|--------|-------------|-------| -| 2021-01-15 | covN1 | nPPMoV | mean | 40 | -| 2021-01-15 | covN2 | nPPMoV | mean | 42 | +Covid-19 patient data for a specified polygon. -A wide table would represent the same measurement as: +- **cphdID**: (Primary Key) [string] Unique identifier for the table. 
-``` {.markdown} - covidN1_PPMV_mean = 40 - covidN2_PPMV_mean = 42 -``` -In a wide table as: +- **reporterID**: (Foreign key) [string] ID of the reporter who gave this data. -| date | covN1_nPPMoV_mean | covN2_nPPMoV_mean | -|------------|-------------------|-------------------| -| 2021-01-15 | 40 | 42 | -### 2) Derived measure +- **polygonID**: (Foreign key) [string] Links with the Polygon table. -To report a mean value of existing covidN1 and covidN2 measures: -``` {.markdown} - date = 2021-01-15 - type = covN1 - unit = ml - aggregation = mean - value = 42 -``` +- **date**: [string] date of reporting for covid-19 measure. -``` {.markdown} - date = 2021-01-15 - type = covN2 - unit = ml - aggregation = mean - value = 40 -``` -Represent the derived measure as: +- **type**: [category] Type of covid-19 patient data. + - `conf`: Number of confirmed cases. This measure should be accompanied by dateType. + - `active`: Number of active cases. + - `test`: Number of tests performed. + - `posTest`: Number of positive tests. + - `pPosRt`: Percent positivity rate. + - `hospCen`: Hospital census or the number of people admitted with covid-19. + - `hospAdm`: Hospital admissions or patients newly admitted to hospital. -long table format +- **dateType**: [category] Type of date used for conf cases. Typically report or episode are reported. onset and test date is not usually reported within aggregate data. + - `episode`: Episode date is the earliest of onset, test or reported date. + - `onset`: Earliest that symptoms were reported for this case. This data is often not known and reported. In lieu, episode is used. + - `report`: Date that the numbers were reported publicly. Typically, reported data and this measure is most commonly reported and used. + - `test`: Date that the covid-19 test was performed. -``` {.markdown} - date = 2021-01-15 - type = covN1covN2 - unit = ml - aggreation = mean - value = 41 -``` +- **value**: [float] The numeric value that is being reported. 
-| date | type | unit | aggregation | value | -|------------|------------|------|-------------|-------| -| 2021-01-15 | covN1covN2 | ml | mean | 41 | -or, wide table format +- **notes**: [string] Any additional notes. -``` {.markdown} - date = 2021-01-15 - covN1covN2_ml_mean = 41 -``` +## Lookup -- Viral SARS-CoV-2 copies per reference copies. +Used for lookup values of all category based columns -### 3) Transformed measure +- **tableName**: [string] Name of the Table -To report mean viral copies of mean value N1 and N2 per viral copies of PMMoV: -Represent the derived measure as: +- **columnName**: [string] Name for the column -long table description -``` {.markdown} - date = 2021-01-15 - covN1covN2 = 2 - unit = PPMV - type = meanNr -``` +- **value**: [string] Name of the value -or, -wide table format +- **description**: [string] Name of the description -``` {.markdown} - covidN1covidN2_PPMV_meanNr = 2 -``` diff --git a/src/generate_db_generations_sql.R b/src/generate_db_generations_sql.R index 37a90b6d..80e8f7b8 100644 --- a/src/generate_db_generations_sql.R +++ b/src/generate_db_generations_sql.R @@ -8,6 +8,7 @@ library(glue) ######################### # default location for the DB creation file wbe_CREATE_TABLES_SQL_FN <- file.path("src", "wbe_create_tables.sql") +wbe_META_DATA <- file.path("metadata.md") WBE_DEFAULT_FN <- db_fn <- file.path("data", "db" ,"WBE.db") @@ -29,6 +30,85 @@ wbe_create_tables <- function(base_tbl, base_var, variableCat){ } + + +wbe_metadata_generation <- function(){ + tbls <- read_csv(file.path(curr_wd, "Tables.csv")) + variables <- read_csv(file.path(curr_wd, "Variables.csv")) + variableCat <- read_csv(file.path(curr_wd, "VariableCategory.csv")) + + md_str <- + tbls$tableName %>% + unique() %>% + lapply(function(curr_tbl){ + + tbldesc <- + tbls %>% + filter(tableName == curr_tbl) %>% + pull(tableDesc) + + cols <- + variables %>% + filter(tableName == curr_tbl) %>% + pull(variableName) + + md_cols <- + cols %>% + 
lapply(function(curr_col){ + cur_col_details <- + variables %>% + filter(tableName == curr_tbl & + variableName == curr_col) + + key_str <- if(is.na(cur_col_details$key) | nchar(cur_col_details$key) == 0){""}else{glue(" ({cur_col_details$key})")} + + type_str <- if(is.na(cur_col_details$variableType) | nchar(cur_col_details$variableType) == 0){""}else{glue(" [{cur_col_details$variableType}]")} + + vals_str <- + if(isTRUE(cur_col_details$variableType == "category")){ + + cats_det <- + variableCat %>% + filter(tableName == curr_tbl & + variableName == curr_col) + + + cats_det$variableValue %>% unique() %>% + lapply(function(curr_val){ + curdesc <- + cats_det %>% + filter(variableValue == curr_val) %>% + pull(desc) + glue("\t-\t`{curr_val}`: {curdesc}") + }) %>% paste0(collapse = "\n") + + } else{""} + + glue("-\t**{curr_col}**:{key_str}{type_str} {cur_col_details$variableDesc}\n{vals_str}") + }) + + md_cols_all <- md_cols %>% paste0(collapse = "\n\n") + glue("## {curr_tbl}\n\n{tbldesc}\n\n{md_cols_all}") + }) + md_str %>% paste0(collapse = "\n") +} + + +######################################## +#' +#' Writes the generated metadata markdown to a file (default: metadata.md) +#' +#' +wbe_metadata_write <- function(full_fn = wbe_META_DATA, ...){ + md_str <- wbe_metadata_generation(...) + + fileConn<-file(full_fn) + writeLines(c(md_str), fileConn) + close(fileConn) +} + + + ############################################ #' #' generates SQL string that will create a database.