Labels
bug (Something isn't working)
Description
Hi, I'm Quentin from HF :) I wanted to play with datachain after #375 by @dberenbaum, but I'm getting this error:
from datachain import DataChain
DataChain.from_csv("hf://datasets/infinite-dataset-hub/MobilePlanAssistant/data.csv").show()
FileNotFoundError Traceback (most recent call last)
<ipython-input-2-1e396698d13d> in <cell line: 3>()
1 from datachain import DataChain
2
----> 3 DataChain.from_csv("hf://datasets/infinite-dataset-hub/MobilePlanAssistant/data.csv").show()
5 frames
/usr/local/lib/python3.10/dist-packages/datachain/lib/dc.py in from_csv(cls, path, delimiter, header, output, object_name, model_name, source, nrows, session, settings, column_types, **kwargs)
1860 convert_options=convert_options,
1861 )
-> 1862 return chain.parse_tabular(
1863 output=output,
1864 object_name=object_name,
/usr/local/lib/python3.10/dist-packages/datachain/lib/dc.py in parse_tabular(self, output, object_name, model_name, source, nrows, **kwargs)
1743 if col_names or not output:
1744 try:
-> 1745 schema = infer_schema(self, **kwargs)
1746 output = schema_to_output(schema, col_names)
1747 except ValueError as e:
/usr/local/lib/python3.10/dist-packages/datachain/lib/arrow.py in infer_schema(chain, **kwargs)
112 schemas = []
113 for file in chain.collect("file"):
--> 114 ds = dataset(file.get_path(), filesystem=file.get_fs(), **kwargs) # type: ignore[union-attr]
115 schemas.append(ds.schema)
116 return pa.unify_schemas(schemas)
/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py in dataset(source, schema, format, filesystem, partitioning, partition_base_dir, exclude_invalid_files, ignore_prefixes)
792
793 if _is_path_like(source):
--> 794 return _filesystem_dataset(source, **kwargs)
795 elif isinstance(source, (tuple, list)):
796 if all(_is_path_like(elem) or isinstance(elem, FileInfo) for elem in source):
/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py in _filesystem_dataset(source, schema, filesystem, partitioning, format, partition_base_dir, exclude_invalid_files, selector_ignore_prefixes)
474 fs, paths_or_selector = _ensure_multiple_sources(source, filesystem)
475 else:
--> 476 fs, paths_or_selector = _ensure_single_source(source, filesystem)
477
478 options = FileSystemFactoryOptions(
/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py in _ensure_single_source(path, filesystem)
439 paths_or_selector = [path]
440 else:
--> 441 raise FileNotFoundError(path)
442
443 return filesystem, paths_or_selector
FileNotFoundError: /infinite-dataset-hub/MobilePlanAssistant/data.csv
It looks like _ensure_single_source incorrectly uses a LocalFileSystem instead of the HfFileSystem.
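One detail that may help narrow this down: the path in the FileNotFoundError is missing the leading `datasets` segment of the URL. The sketch below uses only the standard library and shows that splitting the URL bucket-style (the way `s3://bucket/key` is usually split) produces exactly the path from the error. Whether datachain or pyarrow actually splits the URL this way is an assumption, not something confirmed by the traceback, and `split_bucket_style` is a hypothetical helper written only for this illustration.

```python
from urllib.parse import urlsplit

URL = "hf://datasets/infinite-dataset-hub/MobilePlanAssistant/data.csv"


def split_bucket_style(url: str) -> tuple[str, str, str]:
    """Hypothetical helper: split a URL into (scheme, first segment, rest).

    For bucket-style URLs such as s3://bucket/key, the first segment is the
    bucket. For hf:// URLs, the first segment is the repo type ("datasets").
    """
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


scheme, first_segment, rest = split_bucket_style(URL)

# `rest` has lost the "datasets" segment and matches the path in the error.
print(scheme)         # hf
print(first_segment)  # datasets
print(rest)           # /infinite-dataset-hub/MobilePlanAssistant/data.csv

# Keeping the first segment gives the path relative to the hf:// root.
full_path = first_segment + rest
print(full_path)      # datasets/infinite-dataset-hub/MobilePlanAssistant/data.csv
```

If that is what happens, the failure could come from a truncated path being handed to the filesystem rather than (or in addition to) the wrong filesystem class being used.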
The same path works from pandas via fsspec:
>>> import pandas as pd
>>> df = pd.read_csv("hf://datasets/infinite-dataset-hub/MobilePlanAssistant/data.csv")
>>> df.head()
idx user_input \
0 0 Hi, I'm looking for a mobile plan.
1 1 I need unlimited data and international calling.
2 2 I want at least 10GB of data per month.
3 3 That's too expensive, do you have anything che...
4 4 I'm allergic to cats, will this affect my plan?
bot_response labels
0 Hello! I'd be happy to help you find the best ... Greeting
1 Great, do you have a preferred data limit and ... Data Inquiry
2 I found a plan with unlimited data and interna... Plan Suggestion
3 I found another plan with 8GB of data and inte... Price Comparison
4 I'm sorry, but my abilities are focused on mob... Unexpected Topic
Version Info
0.6.3
Python 3.10.12