OVERALL VISION: To increase the utility and performance of the CKAN Datastore:
- by enriching resources, so that right after a file is pushed, DP+ performs many of the data-wrangling tasks that are typically done manually:
  - a lot of metadata is inferred, so the Data Publisher does not have to laboriously enter it
  - descriptive statistics are computed, allowing the Data Publisher and the end-user to better understand the resource
  - location information is automatically normalized and geocoded
  - related datasets/resources are automatically inferred
  - auto-tagging
- by taking advantage of PostgreSQL native features
  - also use it as a Document Database leveraging JSONB?
  - partitioning/sharding?
- by tapping into the rich PostgreSQL extensions ecosystem (in particular PostGIS, Timescale, Citus, CartoDB, Apache AGE and ZomboDB)
- give it "Data Lake"-like capabilities
- enable Datastore API users to issue performant, reliable SQL queries
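
To make the enrichment goals in the vision above more concrete, here is a minimal sketch of inferring column metadata and computing descriptive statistics for a small CSV, using only the Python standard library. It is not how DP+ implements this; the function names `infer_type` and `summarize` and the type labels are hypothetical and chosen only for illustration.

```python
import csv
import io
import statistics


def infer_type(values):
    """Guess the narrowest type that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    for name, cast in (("integer", int), ("numeric", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def summarize(csv_text):
    """Return per-column inferred type, cardinality and basic statistics."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in (rows[0].keys() if rows else []):
        # Treat missing cells (None) the same as empty strings.
        values = [row[column] or "" for row in rows]
        col_type = infer_type(values)
        info = {
            "type": col_type,
            "cardinality": len(set(values)),
            "nulls": sum(1 for v in values if v == ""),
        }
        if col_type in ("integer", "numeric"):
            numbers = [float(v) for v in values if v != ""]
            if numbers:
                info["min"] = min(numbers)
                info["max"] = max(numbers)
                info["mean"] = statistics.mean(numbers)
        summary[column] = info
    return summary
```

A publisher-facing tool could prefill a data dictionary from a summary like this and let the Data Publisher correct it, rather than asking for every field to be typed in by hand.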
- Convert DP+ to a CKAN extension #98
- Advanced Data Dictionary #18
- Automatic deduplication #11
- Auto-tagging
- Automatic spatial extent calculation
- Automatic processing/recognition of whitelisted common column names (e.g. latitude, longitude, status, open date, closed date, etc.)
- Semi-automatic creation of indices based on cardinality of column values (exposed through Advanced Data Dictionary) #53
- Smart preview selection (e.g. preview first n rows; latest rows; sample) #47
- Scanning for Personally Identifiable Information #27
- Automatic CKAN Alias creation #9
- Auto partitioning
- Per-resource Datapusher+ job configuration #60
- Deferred datapush on initial package creation to allow per-package Datapusher+ configuration
- Validation using qsv `schema` and `validate` commands #87
- Optimized data type mapping to PostgreSQL data types (for speed/reduced storage/efficiency) #17
- Enabling record-level search
- Container image #8
- Affiliated CKAN Service Provider jobs ("DataGroomers") that periodically groom datastore data #13
- Datapusher+ Management Console for Orgadmins/Sysadmins #54
- Create a proper package install #10
- Smart date inferencing #19
- Smart auto-indexing #30
- Native PostGIS support
- Native time-series support with Timescale
- Fast upsert mode #34
- Recoverable jobs #35
- Better downloading of resources #46
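
As one illustration of the roadmap items above, the "semi-automatic creation of indices based on cardinality" idea could be sketched as follows. This is an assumption-laden sketch, not the DP+ design: the helper names `quote_ident` and `suggest_indexes` and the `low` threshold are hypothetical. The function only proposes `CREATE INDEX` statements, which a publisher or sysadmin would review before anything is run.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def suggest_indexes(table, row_count, cardinalities, low=0.01):
    """Return (column, sql) index suggestions for a human to review.

    cardinalities maps column name -> number of distinct values.
    The `low` threshold is an arbitrary assumption, not a DP+ setting.
    """
    suggestions = []
    if row_count == 0:
        return suggestions
    for column, distinct in cardinalities.items():
        ratio = distinct / row_count
        if ratio <= low:
            # Very few distinct values: an index is unlikely to help.
            continue
        # Every value distinct: the column is a candidate for a unique index.
        kind = "UNIQUE INDEX" if distinct == row_count else "INDEX"
        sql = f"CREATE {kind} ON {quote_ident(table)} ({quote_ident(column)})"
        suggestions.append((column, sql))
    return suggestions
```

For example, `suggest_indexes("res", 1000, {"id": 1000, "status": 3, "city": 120})` proposes a unique index on `id`, a regular index on `city`, and nothing for the low-cardinality `status` column.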