
[BUG] Deep nested map type configuration issue in text_embedding processor #686

Open
zane-neo opened this issue Apr 11, 2024 · 6 comments
Labels
bug Something isn't working


@zane-neo
Collaborator

What is the bug?

When the text_embedding processor is configured with a deeply nested map in field_map, the embedding result overwrites the original value of the parent document key (here, the whole name object) instead of being added under the configured target field.
Pipeline configuration:

{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "qhO5xY4BYwgbtrHt7KDf",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}

Then simulate the pipeline with the following document:

{
    "docs": [
        {
            "_index": "neural-search-index-v2",
            "_id": "1",
            "_source": {
                "category": [
                    {
                        "name": {
                            "en": "this is a name"
                        }
                    },
                    {
                        "name": {
                            "en": "hello world"
                        }
                    }
                ]
            }
        }
    ]
}

Actual result (each name object is replaced by its embedding vector):

{
    "docs": [
        {
            "doc": {
                "_index": "neural-search-index-v2",
                "_id": "1",
                "_source": {
                    "category": [
                        {
                            "name": [
                                -0.10758455,
                                0.07971476,
                                -0.04948872,
                                ...
                            ]
                        },
                        {
                            "name": [
                                -0.034477253,
                                0.031023245,
                                0.006734962,
                                ...
                            ]
                        }
                    ]
                },
                "_ingest": {
                    "timestamp": "2024-04-10T03:51:53.496385Z"
                }
            }
        }
    ]
}

Expected result:

{
    "docs": [
        {
            "doc": {
                "_index": "neural-search-index-v2",
                "_id": "1",
                "_source": {
                    "category": [
                        {
                            "name": {
                                "category_name_vector": [
                                    -0.10758455,
                                    0.07971476,
                                    -0.04948872,
                                    ...
                                ],
                                "en": "this is a name"
                            }
                        },
                        {
                            "name": {
                                "category_name_vector": [
                                    -0.034477253,
                                    0.031023245,
                                    0.006734962,
                                    ...
                                ],
                                "en": "hello world"
                            }
                        }
                    ]
                },
                "_ingest": {
                    "timestamp": "2024-04-10T03:51:53.496385Z"
                }
            }
        }
    ]
}

How can one reproduce the bug?

Create an ingest pipeline with the configuration above, then run the sample document through the pipeline's _simulate API.
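A minimal reproduction sketch using only the Python standard library: it prepares the two REST calls (create the pipeline, then simulate it) with the bodies from this issue. The base URL http://localhost:9200 and the pipeline name nlp-pipeline are assumptions; a cluster with the security plugin enabled would also need HTTPS and credentials, and the model must already be deployed.

```python
import json
import urllib.request

# Assumptions: a local OpenSearch node without security, and a hypothetical pipeline name.
BASE_URL = "http://localhost:9200"
PIPELINE_ID = "nlp-pipeline"


def build_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Prepare a JSON request without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


pipeline = {
    "description": "An example neural search pipeline",
    "processors": [{
        "text_embedding": {
            "model_id": "qhO5xY4BYwgbtrHt7KDf",  # model ID from the issue
            "field_map": {"category": {"name": {"en": "category_name_vector"}}},
        }
    }],
}
simulate_body = {
    "docs": [{
        "_index": "neural-search-index-v2",
        "_id": "1",
        "_source": {"category": [
            {"name": {"en": "this is a name"}},
            {"name": {"en": "hello world"}},
        ]},
    }]
}

create_req = build_request("PUT", f"/_ingest/pipeline/{PIPELINE_ID}", pipeline)
simulate_req = build_request("POST", f"/_ingest/pipeline/{PIPELINE_ID}/_simulate", simulate_body)

# Against a running cluster, send them in order:
# urllib.request.urlopen(create_req); print(urllib.request.urlopen(simulate_req).read())
```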

What is the expected behavior?

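The generated embedding should be written into the same object as its source text: each name object should keep its original en value and gain a category_name_vector field holding the embedding, as in the expected result above.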
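A minimal Python sketch of the expected behavior (not the plugin's Java implementation): walk field_map and the document together, apply the same sub-map to every element of a list, and at a leaf add the vector to the map that holds the source text instead of replacing that map. apply_field_map and fake_embed are hypothetical names; fake_embed stands in for the model call and returns a toy two-number vector. Leaves that hold lists of strings are not handled here.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature for the model call: text in, vector out.
Embed = Callable[[str], List[float]]


def apply_field_map(source: Any, field_map: Dict[str, Any], embed: Embed) -> None:
    """Add embeddings to `source` in place, following `field_map` to any depth."""
    if isinstance(source, list):
        # A list of maps: apply the same sub-map to every element.
        for item in source:
            apply_field_map(item, field_map, embed)
        return
    if not isinstance(source, dict):
        return
    for key, mapping in field_map.items():
        if key not in source:
            continue
        value = source[key]
        if isinstance(mapping, str):
            # Leaf: write the vector beside the source text; keep the original value.
            if isinstance(value, str):
                source[mapping] = embed(value)
        else:
            # Nested mapping: descend into the matching map or list of maps.
            apply_field_map(value, mapping, embed)


def fake_embed(text: str) -> List[float]:
    # Toy stand-in for a real model: [character count, space count].
    return [float(len(text)), float(text.count(" "))]


doc = {"category": [{"name": {"en": "this is a name"}}, {"name": {"en": "hello world"}}]}
field_map = {"category": {"name": {"en": "category_name_vector"}}}
apply_field_map(doc, field_map, fake_embed)
print(doc["category"][0]["name"])
# {'en': 'this is a name', 'category_name_vector': [14.0, 3.0]}
```

Because the walk recurses on both maps and lists, the same code places the vector correctly regardless of how deep the configured leaf is.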

@krishy91
Contributor

Hi, I'll look into this! It seems to be an issue with a nesting depth of 2.

@zane-neo
Collaborator Author

@krishy91 Since we support lists of maps, we don't want any limit on nesting depth (for example, supporting only a depth of 2 or 3). We should consider supporting deeply nested cases if possible.
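One way to avoid a fixed depth limit, shown as a hypothetical Python helper rather than the plugin's actual code: recurse through field_map and emit one (source path, target path) pair per leaf, placing the target under the same parent as the source. Dotted paths do not record whether a level is a list or a map, so that would still have to be resolved against the document at ingest time.

```python
from typing import Any, Dict, List, Tuple


def flatten_field_map(
    field_map: Dict[str, Any], prefix: Tuple[str, ...] = ()
) -> List[Tuple[str, str]]:
    """Return (source_path, target_path) pairs in dotted notation.

    Recursion handles any nesting depth, so no special case is needed
    for depth 2, 3, or beyond.
    """
    pairs: List[Tuple[str, str]] = []
    for key, mapping in field_map.items():
        path = prefix + (key,)
        if isinstance(mapping, dict):
            # Intermediate level: keep descending with the extended prefix.
            pairs.extend(flatten_field_map(mapping, path))
        else:
            # Leaf: the target field is a sibling of the source field.
            pairs.append((".".join(path), ".".join(prefix + (mapping,))))
    return pairs


deep = {"a": {"b": {"c": {"d": {"text": "text_vector"}}}}}
print(flatten_field_map(deep))
# [('a.b.c.d.text', 'a.b.c.d.text_vector')]
```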

@krishy91
Contributor

krishy91 commented May 6, 2024

I could reproduce the issue. I'll push the fix and an additional integration test for such deep nesting cases.

@naveentatikonda
Member

@krishy91 Is there any update on the fix?

@jmazanec15
Member

@zane-neo Is this still an issue? Can this be fixed?

@zane-neo
Collaborator Author

zane-neo commented Oct 9, 2024

Yes, this is still an issue. It seems @krishy91 doesn't have bandwidth for this, so I'll pick it up.

Projects
None yet
Development

No branches or pull requests

5 participants