**Describe the bug**
When using the Python implementation of `SemanticTextMemory` and calling the `save_information_async` function, the created `MemoryRecord` contains an embedding that is an array of arrays (functionally a list of vectors) holding only one vector. This forces the caller to decompose the list to get the single array (vector) that should be stored, since only one piece of text is sent for embedding.
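For illustration, a minimal sketch of the shape mismatch (the 1536 dimension and the shapes in the comments are assumptions for ada-002, not captured output):

```python
import numpy as np

# Illustrative stand-in for the embedding produced for a single piece of text.
embeddings = np.zeros((1, 1536))  # what ends up on the MemoryRecord today
print(embeddings.shape)           # (1, 1536) -> an array of arrays with one vector

vector = embeddings[0]            # the decomposition the caller is forced to do
print(vector.shape)               # (1536,)   -> the single vector that should be stored
```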
**To Reproduce**
Steps to reproduce the behavior:
- Go to the `semantic_text_memory.py` file and place a breakpoint in the `save_information_async` function on line 68, shown below:

  ```python
  await self._storage.upsert_async(collection_name=collection, record=data)
  ```
- Run the following test file in debug mode:

  ```python
  import asyncio

  import semantic_kernel as sk
  import semantic_kernel.connectors.ai.open_ai as sk_oai
  import semantic_kernel.memory.volatile_memory_store as sk_mv


  async def test():
      # Create a new kernel
      kernel = sk.Kernel()

      # Register the OpenAI embedding service
      api_key, org_id = sk.openai_settings_from_dot_env()
      kernel.add_text_embedding_generation_service(
          "ada", sk_oai.OpenAITextEmbedding("text-embedding-ada-002", api_key, org_id)
      )

      # Create a new memory store
      kernel.register_memory_store(sk_mv.VolatileMemoryStore())

      await kernel.memory.save_information_async("test", id="test1", text="hello world")


  if __name__ == "__main__":
      asyncio.run(test())
  ```
- Inspect the `embedding` property of the created `MemoryRecord`.
- See that the embedding is an array of arrays; a small helper for this check is sketched after this list. (Note: the same behaviour occurs in `save_reference_async` and `search_async` when creating the embedding.)
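As a quick way to confirm the shape at the breakpoint, a helper like the one below can be defined and then evaluated as `describe_embedding(data)` in the debug console (the `embedding` attribute name is taken from the description above; adjust it if the record exposes `_embedding` instead):

```python
import numpy as np


def describe_embedding(record) -> str:
    """Summarize the shape of a MemoryRecord's embedding."""
    emb = np.asarray(record.embedding)
    # A correctly stored embedding would be 1-D; the bug shows up as ndim == 2.
    return f"ndim={emb.ndim}, shape={emb.shape}"
```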
**Expected behavior**
The embedding passed on to the memory store should be a single array (a vector) instead of an array of arrays (an array of vectors), since these functions are intended to work against a single input (and therefore create only one embedding) and call down to memory store functions that likewise expect a single input embedding.
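A minimal sketch of the kind of change this implies inside `save_information_async` (the signature and the internal names `_embeddings_generator`, `generate_embeddings_async`, and `MemoryRecord.local_record` are my reading of the current source, so treat this as an assumption rather than a verified patch):

```python
async def save_information_async(self, collection: str, text: str, id: str, description: str = None) -> None:
    # Sketch only: the service returns one embedding per input text.
    embeddings = await self._embeddings_generator.generate_embeddings_async([text])
    data = MemoryRecord.local_record(
        id=id,
        text=text,
        description=description,
        embedding=embeddings[0],  # take the single vector out of the batch result
    )
    await self._storage.upsert_async(collection_name=collection, record=data)
```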
**Desktop (please complete the following information):**
- OS: Windows 11
- IDE: VSCode-Insiders
- NuGet Package Version: N/A
- Python Package/Branch: SK/Main
**Additional context**
Just as a note, I believe that the generation of a list from the `OpenAITextEmbedding` service is valid, since the service supports batching; the problem is only that these functions do not decompose the top-level list.
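To illustrate the batching point, a small sketch (assuming the embedding service is called directly via `generate_embeddings_async`, and that the counts in the comments hold; neither is verified here):

```python
import asyncio

import semantic_kernel as sk
import semantic_kernel.connectors.ai.open_ai as sk_oai


async def show_batching():
    api_key, org_id = sk.openai_settings_from_dot_env()
    service = sk_oai.OpenAITextEmbedding("text-embedding-ada-002", api_key, org_id)

    # Batching: one embedding per input text, so a list (array of vectors)
    # is the right return type at the service level.
    batch = await service.generate_embeddings_async(["hello world", "goodbye world"])
    print(len(batch))   # expected: 2

    # Single input: the memory functions pass a one-element list, so they
    # should take element 0 before building the MemoryRecord.
    single = await service.generate_embeddings_async(["hello world"])
    print(len(single))  # expected: 1 -> single[0] is the vector to store


if __name__ == "__main__":
    asyncio.run(show_batching())
```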