hxlquickimport
#6
Comments
Thanks to @CMedelR! Not only does Ramírez have a research paper called "Data mining for the study of the Epidemic (SARS-CoV-2) COVID-19: Algorithm for the identification of patients (SARS-CoV-2) COVID 19 in Mexico", and not only does his repository at https://github.com/CMedelR/dataCovid19 hold a backup copy of the (at the moment) offline dataset at https://datos.gob.mx/busca/dataset/informacion-referente-a-casos-covid-19-en-mexico, but his paper also explicitly mentions the use of Orange Data Mining! His dataset will be used as an additional test sample (until now the only one was from the Albert Einstein Hospital in São Paulo), and we're also adding his paper, since I'm sure more people would like to find it later.
… yet); hic sunt dracones
The … I think that, at least for very basic CSV files, the …
fititnt@bravo:/workspace/git/EticaAI/HXL-Data-Science-file-formats$ hxlquickmeta tests/files/iris.csv
> Connection overview
>> TODO: implement raw connection, HTTP headers, etc
>> (this should output debug information even
>> for inputs that would break libhxl)
ERROR! libhxl and/or HXLmeta/HXLMetaExtras failed <HXLException: HXL tags not found in first 25 rows>
Ok. Trying harder now with HXLMetaExtras...
>> HXLMetaExtras: Pandas DataFrame
>>> DataFrame
     sepallength  sepalwidth  petallength  petalwidth           class
0            5.1         3.5          1.4         0.2     Iris-setosa
1            4.9         3.0          1.4         0.2     Iris-setosa
2            4.7         3.2          1.3         0.2     Iris-setosa
3            4.6         3.1          1.5         0.2     Iris-setosa
4            5.0         3.6          1.4         0.2     Iris-setosa
..           ...         ...          ...         ...             ...
145          6.7         3.0          5.2         2.3  Iris-virginica
146          6.3         2.5          5.0         1.9  Iris-virginica
147          6.5         3.0          5.2         2.0  Iris-virginica
148          6.2         3.4          5.4         2.3  Iris-virginica
149          5.9         3.0          5.1         1.8  Iris-virginica
[150 rows x 5 columns]
>>> DataFrame.T
                       0            1            2            3            4            5  ...             144             145             146             147             148             149
sepallength          5.1          4.9          4.7          4.6          5.0          5.4  ...             6.7             6.7             6.3             6.5             6.2             5.9
sepalwidth           3.5          3.0          3.2          3.1          3.6          3.9  ...             3.3             3.0             2.5             3.0             3.4             3.0
petallength          1.4          1.4          1.3          1.5          1.4          1.7  ...             5.7             5.2             5.0             5.2             5.4             5.1
petalwidth           0.2          0.2          0.2          0.2          0.2          0.4  ...             2.5             2.3             1.9             2.0             2.3             1.8
class        Iris-setosa  Iris-setosa  Iris-setosa  Iris-setosa  Iris-setosa  Iris-setosa  ...  Iris-virginica  Iris-virginica  Iris-virginica  Iris-virginica  Iris-virginica  Iris-virginica
[5 rows x 150 columns]
>>> DataFrame.describe
       sepallength  sepalwidth  petallength  petalwidth
count   150.000000  150.000000   150.000000  150.000000
mean      5.843333    3.054000     3.758667    1.198667
std       0.828066    0.433594     1.764420    0.763161
min       4.300000    2.000000     1.000000    0.100000
25%       5.100000    2.800000     1.600000    0.300000
50%       5.800000    3.000000     4.350000    1.300000
75%       6.400000    3.300000     5.100000    1.800000
max       7.900000    4.400000     6.900000    2.500000
>> HXLMetaExtras: Orange Data Mining
data.domain [sepallength, sepalwidth, petallength, petalwidth, class]
data.columns <Orange.data.table.Columns object at 0x7f416848cd30>
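The HXLException in the output above is raised because no hashtag row was found near the top of the file. As a rough illustration of that kind of check, here is a stdlib-only sketch. It is not libhxl's actual implementation: the regex, the "every non-empty cell must match" rule, and the `find_hxl_row` helper are assumptions made for this example. Only the 25-row window comes from the error message.

```python
import csv
import io
import re

# Simplified hashtag pattern: "#tag" plus optional "+attribute" parts.
# This is an approximation for illustration, not the parser libhxl uses.
HXL_TAG = re.compile(r"^\s*#[A-Za-z]\w*(\s*\+[A-Za-z]\w*)*\s*$")


def find_hxl_row(rows, max_rows=25):
    """Return the index of the first row that looks like an HXL hashtag row.

    A row qualifies when it has at least one non-empty cell and every
    non-empty cell matches HXL_TAG (an assumed rule). Only the first
    `max_rows` rows are checked; None is returned when nothing qualifies.
    """
    for index, row in enumerate(rows):
        if index >= max_rows:
            break
        cells = [cell for cell in row if cell.strip()]
        if cells and all(HXL_TAG.match(cell) for cell in cells):
            return index
    return None


plain = "sepallength,sepalwidth,class\n5.1,3.5,Iris-setosa\n"
hxlated = (
    "sepallength,sepalwidth,class\n"
    "#item+sepallength,#item+sepalwidth,#item+class\n"
    "5.1,3.5,Iris-setosa\n"
)

print(find_hxl_row(csv.reader(io.StringIO(plain))))    # None
print(find_hxl_row(csv.reader(io.StringIO(hxlated))))  # 1
```

A plain file such as iris.csv has no such row, which is consistent with the tool falling back to HXLMetaExtras.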
My last comment can be ignored; this may not actually be needed. As long as hxlquickmeta accepts stdin (can be piped into) and all the other tools work with pipes (the standard ones from HXLStandard do!), there is no need to implement this at all. So instead of this, which makes hxlquickmeta fail:
# Non HXLated file
hxlquickmeta tests/files/iris.csv
(...)
ERROR! libhxl and/or HXLmeta/HXLMetaExtras failed <HXLException: HXL tags not found in first 25 rows>
Ok. Trying harder now with HXLMetaExtras...
(...)
This one works (but not for complex Excel files):
# Non HXLated file
hxlquickimport tests/files/iris.csv | hxlquickmeta
## (...)
> lihxl-python overview
>> output.output <_io.TextIOWrapper name='/tmp/tmphdplthem' mode='w' encoding='UTF-8'>
>> source <hxl.io.HXLReader object at 0x7fc33c008820>
> HXLMeta debuginfo
>> HXLMeta.text_headers None
>> HXLMeta.hxl_headers ['#item+sepallength', '#item+sepalwidth', '#item+petallength', '#item+petalwidth', '#item+class']
> get_hashtag_info [ #item+sepallength ] [ None ]
(...)
Potential problem with
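Judging by the `HXLMeta.hxl_headers` values in the piped output, the quick import step turns each plain column header into an `#item+<header>` hashtag. A minimal sketch of that idea follows. It is not the actual hxlquickimport code: the `quick_hxlate` and `convert` helpers and the header-sanitizing rule are hypothetical, introduced only for illustration.

```python
import csv
import io
import re


def quick_hxlate(rows):
    """Yield the rows with an '#item+<header>' hashtag row after the header.

    Hypothetical helper: header text is lowercased and anything other
    than a-z, 0-9, or underscore becomes '_' (an assumed rule). Blank
    header cells get a blank hashtag cell.
    """
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input: nothing to emit
    yield header
    tags = []
    for name in header:
        attribute = re.sub(r"[^a-z0-9_]", "_", name.strip().lower())
        tags.append("#item+" + attribute if attribute else "")
    yield tags
    yield from rows


def convert(source, sink):
    """Copy CSV from one text stream to another, adding the hashtag row."""
    writer = csv.writer(sink, lineterminator="\n")
    writer.writerows(quick_hxlate(csv.reader(source)))


sink = io.StringIO()
convert(io.StringIO("sepallength,class\n5.1,Iris-setosa\n"), sink)
print(sink.getvalue())
```

Because `convert` only needs two text streams, calling it with `sys.stdin` and `sys.stdout` would let the same logic sit in the middle of a shell pipeline, which is the strategy the comment above favors.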
The …
If needed, this issue could be re-opened, but the current version of …
Eventual point to be done (but not today)
Without actually doing a full refactoring to use something like the hxlm.core (or something more 'pythonic'), maybe the …
With this, it would at least be more intuitive to explain another strategy for using these tools (and then "Minimal documentation about how to use the command line tools" #1 could be solved).
Meta
Spreadsheet data
See EticaAI-Data_HXL-Data-Science-file-formats_hxlquickimport (https://docs.google.com/spreadsheets/d/1vFkBSharAEg5g5K2u_iDLCBvpWWPqpzC1hcL6QpFNZY/edit#gid=1097528220) for updated content. This is a snapshot.