
Python: SemanticTextMemory creates MemoryRecord with improper embedding type #1815

Closed

Description

Describe the bug
When using the Python implementation of SemanticTextMemory and calling the save_information_async function, the created MemoryRecord contains an embedding that is an array of arrays (functionally a list of vectors) holding only one vector. Callers therefore have to unwrap the list to get the single array (vector), which is all the record should contain, since only one piece of text is sent for embedding.
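The shape problem can be seen in miniature without calling OpenAI. This is a sketch under stated assumptions: `fake_generate_embeddings` is a hypothetical stand-in for a batching embedder (it is not Semantic Kernel's API), and `buggy_save` only mirrors the reported behaviour of keeping the whole batch result as the record's embedding.

```python
import asyncio
from typing import List


async def fake_generate_embeddings(texts: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for a batching embedder: always returns a list
    # holding one vector per input text, even when only one text is sent.
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


async def buggy_save(text: str) -> List[List[float]]:
    # Mirrors the reported behaviour: the whole batch result is kept
    # as the record's embedding instead of its single element.
    embedding = await fake_generate_embeddings([text])
    return embedding


stored = asyncio.run(buggy_save("hello world"))
print(stored)     # [[11.0, 3.0]] -> a list containing one vector
print(stored[0])  # [11.0, 3.0]   -> the vector that should have been stored
```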

To Reproduce
Steps to reproduce the behavior:

  1. Open the semantic_text_memory.py file and place a breakpoint in the save_information_async function on line 68, shown below:
    await self._storage.upsert_async(collection_name=collection, record=data)
  2. Run the following test file in debug mode:
```python
import asyncio
import semantic_kernel as sk
import semantic_kernel.connectors.ai.open_ai as sk_oai
import semantic_kernel.memory.volatile_memory_store as sk_mv


async def test():
    # Create a new kernel
    kernel = sk.Kernel()

    # Register the OpenAI embedding service
    api_key, org_id = sk.openai_settings_from_dot_env()
    kernel.add_text_embedding_generation_service(
        "ada", sk_oai.OpenAITextEmbedding("text-embedding-ada-002", api_key, org_id)
    )

    # Create a new memory store
    kernel.register_memory_store(sk_mv.VolatileMemoryStore())

    # Save a single piece of text; this hits the breakpoint from step 1
    await kernel.memory.save_information_async("test", id="test1", text="hello world")


if __name__ == "__main__":
    asyncio.run(test())
```

  3. Inspect the embedding property of the created MemoryRecord.
  4. Observe that the embedding is an array of arrays. (The same behaviour occurs in save_reference_async and search_async when the embedding is created.)
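When stopped at the breakpoint, a small helper can make the nesting explicit. `embedding_shape` below is hypothetical (not part of Semantic Kernel); it converts anything exposing `tolist()` (such as a NumPy array) into nested lists and then walks the first element at each level. Assuming the record at the breakpoint is `data` and exposes the `embedding` property from step 3, evaluating `embedding_shape(data.embedding)` in the debug console would be expected to give `(1, 1536)` for text-embedding-ada-002 in the buggy case, rather than `(1536,)`.

```python
from typing import Any, Tuple


def embedding_shape(value: Any) -> Tuple[int, ...]:
    # Hypothetical debugging helper: report the nesting of an embedding.
    # NumPy arrays (and similar) are converted to nested lists first.
    if hasattr(value, "tolist"):
        value = value.tolist()
    shape = []
    # Descend through the first element at each nesting level.
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


print(embedding_shape([[0.1, 0.2, 0.3]]))  # (1, 3): a list of one vector (the reported bug)
print(embedding_shape([0.1, 0.2, 0.3]))    # (3,): a single vector (expected)
```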

Expected behavior
The embedding passed on to the memory store should be a single array (a vector), not an array of arrays (an array of vectors). These functions operate on a single input, so they only ever create one embedding, and the memory store functions they call are likewise designed to take a single input embedding.
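A minimal sketch of the expected behaviour, assuming a batching embedder that returns one vector per input. `Record`, `fake_generate_embeddings`, and the dict-based store are hypothetical stand-ins, not Semantic Kernel's MemoryRecord or memory store API; the point is only that the single-input save path takes element `[0]` of the batch before building the record. The same unwrapping would apply to save_reference_async and search_async.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    # Hypothetical minimal stand-in for MemoryRecord: an id, text, and one vector.
    id: str
    text: str
    embedding: List[float]


async def fake_generate_embeddings(texts: List[str]) -> List[List[float]]:
    # Hypothetical batching embedder: returns one vector per input text.
    return [[float(len(t)), float(t.count(" "))] for t in texts]


async def save_information(
    store: Dict[str, Dict[str, Record]], collection: str, record_id: str, text: str
) -> None:
    embeddings = await fake_generate_embeddings([text])
    # One text was sent, so exactly one vector should come back;
    # unwrap it instead of storing the whole batch.
    if len(embeddings) != 1:
        raise ValueError(f"expected 1 embedding, got {len(embeddings)}")
    record = Record(id=record_id, text=text, embedding=embeddings[0])
    store.setdefault(collection, {})[record_id] = record


store: Dict[str, Dict[str, Record]] = {}
asyncio.run(save_information(store, "test", record_id="test1", text="hello world"))
print(store["test"]["test1"].embedding)  # [11.0, 1.0]
```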


Desktop:

  • OS: Windows 11
  • IDE: VSCode-Insiders
  • NuGet Package Version: N/A
  • Python Package/Branch: SK/Main

Additional context
Note: I believe it is correct for OpenAITextEmbedding to return a list, since the service supports batching. The problem is only that these functions do not decompose that top-level list.
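The split described above (batch-shaped results from the service, single vectors for single-input callers) can be sketched as follows. `generate_embeddings` and `embed_one` are hypothetical names for illustration, not Semantic Kernel functions; the tuple unpacking `(vector,) = ...` both decomposes the top-level list and raises `ValueError` if the service ever returns something other than exactly one vector.

```python
import asyncio
from typing import List, Sequence


async def generate_embeddings(texts: Sequence[str]) -> List[List[float]]:
    # Hypothetical batching embedder (stand-in for a service such as
    # OpenAITextEmbedding): the outer list has one entry per input text.
    return [[float(len(t))] for t in texts]


async def embed_one(text: str) -> List[float]:
    # Single-input callers still use the batch API, then decompose
    # the top-level list to get back exactly one vector.
    (vector,) = await generate_embeddings([text])
    return vector


async def main() -> None:
    batch = await generate_embeddings(["a", "bb", "ccc"])
    single = await embed_one("hello world")
    print(batch)   # [[1.0], [2.0], [3.0]] -> batching keeps the outer list
    print(single)  # [11.0] -> the single-input path returns a bare vector


asyncio.run(main())
```

Keeping the list at the service level preserves batching, while the unwrapping lives only in the functions that are known to send a single input.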


Labels: python
