[BUG] sequence[text] column raises InvalidColumnValueError on valid append of list[str] #3061

Open
@saamc

Severity

P0 - Critical breaking issue or missing functionality

Current Behavior

Given a dataset with the following summary:

print(ds.summary())
Dataset length: 0
Columns:
  text: sequence[text]


None

it is not possible to append either of the following records (wrapped in a list):

{"text": ["Tokenized", "text"]}

or

{"text": [["Tokenized", "text"]]}

Either attempt fails with the following error:

deeplake._deeplake.InvalidColumnValueError: Invalid value for column 'text'. Reason - 'Data of type Text must have '0' dimensions provided '1''
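
The message counts "dimensions", which I read as list-nesting depth: a bare string has depth 0 and a list[str] has depth 1. A minimal pure-Python sketch of that counting follows; `nesting_depth` is a hypothetical helper written only for illustration and is not part of deeplake.

```python
def nesting_depth(value):
    """Count how many list/tuple levels wrap the innermost value.

    Hypothetical helper for illustration only; not a deeplake API.
    """
    depth = 0
    while isinstance(value, (list, tuple)):
        depth += 1
        if not value:
            # An empty list still counts as one level, but has nothing to descend into.
            break
        # Follow the first element; assumes the nesting is uniform.
        value = value[0]
    return depth


print(nesting_depth("Tokenized"))              # a single Text value
print(nesting_depth(["Tokenized", "text"]))     # one sequence[text] record
print(nesting_depth([["Tokenized", "text"]]))   # the same record wrapped once more
```

Under this reading, the failing value has depth 1, which is what a sequence[text] column should accept, so the error suggests the value is being checked against the inner Text type (depth 0) rather than the Sequence. This is an inference from the error text, not something confirmed in the deeplake source.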

Steps to Reproduce

cat <<EOF >/tmp/gh_issue.py
import deeplake

ds = deeplake.create("mem://temp")
ds.add_column("text", dtype=deeplake.types.Sequence(deeplake.types.Text()))

# This should work, but fails
ds.append([{"text": ["Tokenized", "text"]}])
EOF

python /tmp/gh_issue.py 
Traceback (most recent call last):
  File "/tmp/gh_issue.py", line 7, in <module>
    ds.append([{"text": ["Tokenized", "text"]}])
deeplake._deeplake.InvalidColumnValueError: Invalid value for column 'text'. Reason - 'Data of type Text must have '0' dimensions provided '1''

Expected/Desired Behavior

A list[str] should be accepted as the value of a sequence[text] column.

Python Version

python 3.12.0 hab00c5b_0_cpython conda-forge

OS

Ubuntu 24.04.2 LTS

IDE

VS-Code

Packages

deeplake==4.2.14 numpy==2.3.1 pip==25.1.1 setuptools==80.9.0 wheel==0.45.1

Additional Context

[BUG] #912 suggests that sequence[text] is something that should work.

Possible Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR (Thank you!)

Labels

bug (Something isn't working)