**Describe the bug**
When using the Python implementation of `SemanticTextMemory` and calling the `save_information_async` function, the created `MemoryRecord` contains an embedding that is an array of arrays (functionally a list of vectors) holding only one vector. This forces the caller to decompose the list to get the single array (vector) that should be stored, since only one piece of text is sent for embedding.
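For illustration, a minimal sketch of the shape mismatch (the 1536 dimension and the shapes in the comments are assumptions for ada-002, not captured output):

```python
import numpy as np

# Illustrative stand-in for the embedding produced for a single piece of text.
embeddings = np.zeros((1, 1536))  # what ends up on the MemoryRecord today
print(embeddings.shape)           # (1, 1536) -> an array of arrays with one vector

vector = embeddings[0]            # the decomposition the caller is forced to do
print(vector.shape)               # (1536,)   -> the single vector that should be stored
```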
**To Reproduce**
Steps to reproduce the behavior:
- Go to the `semantic_text_memory.py` file and place a breakpoint in the `save_information_async` function on line 68, shown below:

  ```python
  await self._storage.upsert_async(collection_name=collection, record=data)
  ```
- Run the following test file in debug mode:

  ```python
  import asyncio

  import semantic_kernel as sk
  import semantic_kernel.connectors.ai.open_ai as sk_oai
  import semantic_kernel.memory.volatile_memory_store as sk_mv


  async def test():
      # Create a new kernel
      kernel = sk.Kernel()

      # Register the OpenAI embedding service
      api_key, org_id = sk.openai_settings_from_dot_env()
      kernel.add_text_embedding_generation_service(
          "ada", sk_oai.OpenAITextEmbedding("text-embedding-ada-002", api_key, org_id)
      )

      # Create a new memory store
      kernel.register_memory_store(sk_mv.VolatileMemoryStore())

      await kernel.memory.save_information_async("test", id="test1", text="hello world")


  if __name__ == "__main__":
      asyncio.run(test())
  ```
- Inspect the `embedding` property of the created `MemoryRecord`.
- See that the embedding is an array of arrays; a small helper for this check is sketched after this list. (Note: the same behaviour occurs in `save_reference_async` and `search_async` when creating the embedding.)
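As a quick way to confirm the shape at the breakpoint, a helper like the one below can be defined and then evaluated as `describe_embedding(data)` in the debug console (the `embedding` attribute name is taken from the description above; adjust it if the record exposes `_embedding` instead):

```python
import numpy as np


def describe_embedding(record) -> str:
    """Summarize the shape of a MemoryRecord's embedding."""
    emb = np.asarray(record.embedding)
    # A correctly stored embedding would be 1-D; the bug shows up as ndim == 2.
    return f"ndim={emb.ndim}, shape={emb.shape}"
```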
**Expected behavior**
The embedding passed on to the memory store should be a single array (a vector) instead of an array of arrays (an array of vectors), since these functions are intended to work against a single input (and therefore create only one embedding) and call down to memory store functions that likewise expect a single input embedding.
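A minimal sketch of the kind of change this implies inside `save_information_async` (the signature and the internal names `_embeddings_generator`, `generate_embeddings_async`, and `MemoryRecord.local_record` are my reading of the current source, so treat this as an assumption rather than a verified patch):

```python
async def save_information_async(self, collection: str, text: str, id: str, description: str = None) -> None:
    # Sketch only: the service returns one embedding per input text.
    embeddings = await self._embeddings_generator.generate_embeddings_async([text])
    data = MemoryRecord.local_record(
        id=id,
        text=text,
        description=description,
        embedding=embeddings[0],  # take the single vector out of the batch result
    )
    await self._storage.upsert_async(collection_name=collection, record=data)
```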
**Desktop (please complete the following information):**
- OS: Windows 11
- IDE: VSCode-Insiders
- NuGet Package Version: N/A
- Python Package/Branch: SK/Main
**Additional context**
Just as a note, I believe that the generation of a list from the `OpenAITextEmbedding` service is valid, since the service supports batching; the problem is only that these functions do not decompose the top-level list.
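To illustrate the batching point, a small sketch (assuming the embedding service is called directly via `generate_embeddings_async`, and that the counts in the comments hold; neither is verified here):

```python
import asyncio

import semantic_kernel as sk
import semantic_kernel.connectors.ai.open_ai as sk_oai


async def show_batching():
    api_key, org_id = sk.openai_settings_from_dot_env()
    service = sk_oai.OpenAITextEmbedding("text-embedding-ada-002", api_key, org_id)

    # Batching: one embedding per input text, so a list (array of vectors)
    # is the right return type at the service level.
    batch = await service.generate_embeddings_async(["hello world", "goodbye world"])
    print(len(batch))   # expected: 2

    # Single input: the memory functions pass a one-element list, so they
    # should take element 0 before building the MemoryRecord.
    single = await service.generate_embeddings_async(["hello world"])
    print(len(single))  # expected: 1 -> single[0] is the vector to store


if __name__ == "__main__":
    asyncio.run(show_batching())
```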