Labels: data-transformation (https://en.wikipedia.org/wiki/Data_transformation)
Description
- TL;DR:
- This issue is just a simple place to collect information related to the use of SQL storage. It may or may not grow beyond proofs of concept.
- To add value (beyond merely wrapping a database dump to CSV and a database import, which external tools and documentation could already cover), any command-line tool should at least read directly from one or more databases and be able to create a valid SQL file that can be used to import the data again.
- If this becomes too hard, we could at least document scripts to convert CSV to SQL.
- Interoperability (think of taxonomies and how to save the equivalent of HXL hashtags as database column names) is the main objective, even if this means just preparing documentation and dropping performance features that external tools could provide, rather than having HXL use the SQL database directly.
- The direct implication of this (my guess, not tested) is that most HXL parser tools, not just libhxl-python (whether used directly or as a library in tools like this one), are still unlikely to optimize commands into equivalent SQL SELECTs and will still have to work with temporary files (which could still be acceptably fast).
- In other words, HXL importers/exporters (in theory, not tested) should not break hard when memory is scarce (unlike some data-mining tools that fill your memory until the computer crashes), but large datasets may be slow.
- Please note that even when comparing HXL tools with programs that can load data from databases, most of those are also optimized for files on disk, or even have to load the entire dataset into memory.
- And some enterprise tools (if not already expensive) seem to charge extra to work directly from a database rather than from their proprietary file format.
- But even if HXL tools or the HXL-proxy cannot be highly optimized for gigabyte-scale database processing (e.g. 50 ms response times with well-tuned indexes and SELECTs), they could still be useful for anyone using HXL to merge several datasets in a single place.
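As a rough illustration of the hashtag-to-column-name idea above, here is a minimal sketch of a CSV-to-SQL conversion script. Everything here is an assumption for illustration, not an HXL standard: the table name `hxl_data`, the sanitization rule for hashtags, and treating every column as `TEXT`.

```python
"""Sketch: turn a CSV with an HXL hashtag row into an importable SQL file.

Assumptions (hypothetical, not from the HXL spec): row 1 is a human header,
row 2 holds HXL hashtags, table name and sanitization rule are arbitrary.
"""
import csv
import io


def hashtag_to_column(tag):
    # '#adm1+name' -> 'adm1_name' (hypothetical sanitization rule)
    return tag.lstrip('#').replace('+', '_').replace('-', '_')


def csv_to_sql(csv_text, table='hxl_data'):
    rows = list(csv.reader(io.StringIO(csv_text)))
    headers, hashtags, data = rows[0], rows[1], rows[2:]
    cols = [hashtag_to_column(t) for t in hashtags]
    # Every column as TEXT: simplistic, but round-trip safe for a sketch.
    out = ['CREATE TABLE {} ({});'.format(
        table, ', '.join('{} TEXT'.format(c) for c in cols))]
    for row in data:
        vals = ', '.join("'{}'".format(v.replace("'", "''")) for v in row)
        out.append('INSERT INTO {} ({}) VALUES ({});'.format(
            table, ', '.join(cols), vals))
    return '\n'.join(out)
```

A real tool would need to handle typing, NULLs, and collisions after sanitization (e.g. `#adm1+name` vs `#adm1-name` both mapping to `adm1_name`).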
This issue is a draft. Some extra information may be edited/added later.
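To illustrate the memory point above: a database cursor can be streamed row by row into HXL-style CSV without ever loading the whole table. A minimal sketch with SQLite (the table name and the column-to-hashtag mapping are hypothetical; a real tool would derive the mapping from user configuration):

```python
"""Sketch: stream a SQL table to HXL-style CSV in constant memory."""
import csv
import io
import sqlite3


def db_to_hxl_csv(conn, table, tag_map, out):
    """Write every row of `table` to `out` as CSV, with an HXL hashtag row."""
    cur = conn.execute('SELECT * FROM {}'.format(table))  # table name assumed trusted
    cols = [d[0] for d in cur.description]
    writer = csv.writer(out)
    writer.writerow(cols)                                # human-readable header
    writer.writerow([tag_map.get(c, '') for c in cols])  # HXL hashtag row
    for row in cur:  # iterating the cursor streams rows; no full load
        writer.writerow(row)
```

Because the cursor is consumed incrementally, memory use stays roughly constant regardless of table size, which is the "slow but does not crash" behavior described above.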