Description
Severity
P0 - Critical breaking issue or missing functionality
Current Behavior
In a dataset with the following summary
print(ds.summary())
Dataset length: 0
Columns:
text: sequence[text]
None
it is not possible to append a record of this kind (wrapped in a list)
{"text": ["Tokenized", "text"]}
nor
{"text": [["Tokenized", "text"]]}
The result is the following message:
deeplake._deeplake.InvalidColumnValueError: Invalid value for column 'text'. Reason - 'Data of type Text must have '0' dimensions provided '1''
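The message reports that the value has 1 dimension while 0 are required. A flat list[str] has exactly one level of nesting, and 0 is what a single Text value would have, so one possible reading (an assumption, not confirmed against deeplake's source) is that the value is being validated against the element type Text rather than against sequence[text]. The pure-Python sketch below only illustrates that dimension count. `nesting_depth` and `EXPECTED_DIMS` are hypothetical names introduced for this illustration and are not part of deeplake.

```python
def nesting_depth(value):
    """Count how many list/tuple levels wrap the innermost scalar.

    A plain str counts as 0 dimensions; it is treated as a scalar, not
    as a sequence of characters.
    """
    depth = 0
    while isinstance(value, (list, tuple)):
        depth += 1
        if not value:
            # An empty list still contributes one level of nesting.
            break
        value = value[0]
    return depth


# Hypothetical expected ranks, assuming a sequence adds exactly one dimension
# on top of its element type.
EXPECTED_DIMS = {"text": 0, "sequence[text]": 1}

single = "Tokenized"               # what a text column would accept
row = ["Tokenized", "text"]        # the first record from this report
wrapped = [["Tokenized", "text"]]  # the second record from this report

for name, value in [("single", single), ("row", row), ("wrapped", wrapped)]:
    print(name, nesting_depth(value))
```

Under that assumption, `row` has the depth a sequence[text] column should accept, and the reported "provided '1'" matches its depth when compared against the 0 expected for plain text.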
Steps to Reproduce
cat <<EOF >/tmp/gh_issue.py
import deeplake

ds = deeplake.create("mem://temp")
ds.add_column("text", dtype=deeplake.types.Sequence(deeplake.types.Text()))

# This should work, but fails
ds.append([{"text": ["Tokenized", "text"]}])
EOF
python /tmp/gh_issue.py
Traceback (most recent call last):
File "/tmp/gh_issue.py", line 7, in <module>
ds.append([{"text": ["Tokenized", "text"]}])
deeplake._deeplake.InvalidColumnValueError: Invalid value for column 'text'. Reason - 'Data of type Text must have '0' dimensions provided '1''
Expected/Desired Behavior
The desired behaviour is that a list[str] can be passed as input for a sequence[text] column.
Python Version
python 3.12.0 hab00c5b_0_cpython conda-forge
OS
Ubuntu 24.04.2 LTS
IDE
VS-Code
Packages
deeplake==4.2.14 numpy==2.3.1 pip==25.1.1 setuptools==80.9.0 wheel==0.45.1
Additional Context
[BUG] #912 suggests that sequence[text] is something that should work.
Possible Solution
No response
Are you willing to submit a PR?
- I'm willing to submit a PR (Thank you!)