sqlite3.OperationalError: unable to open database file #919

Open
arcontechnologies opened this issue Jul 6, 2024 · 10 comments

@arcontechnologies

Hi,
Even after setting the location of llmware_data (in my case: LLMWareConfig.set_home("D:\\myapp\\")), llmware keeps looking for it in C:\users\<myuser>\llmware_data.

As a consequence, I have to keep two llmware_data folders, which from my point of view leads to inconsistencies.

The issue:

Traceback (most recent call last):
  File "d:\IntellicatSearch\semantic_reranker.py", line 199, in <module>
    library = Library().load_library(library_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\IntellicatSearch\intellicatsearch_env\Lib\site-packages\llmware\library.py", line 266, in load_library
    library_exists = self.check_if_library_exists(library_name, account_name=account_name)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\IntellicatSearch\intellicatsearch_env\Lib\site-packages\llmware\library.py", line 349, in check_if_library_exists
    library_card = LibraryCatalog().get_library_card(library_name, account_name=account_name)
                   ^^^^^^^^^^^^^^^^
  File "d:\IntellicatSearch\intellicatsearch_env\Lib\site-packages\llmware\library.py", line 1291, in __init__
    if CollectionWriter("library",account_name=self.account_name).check_if_table_build_required():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\IntellicatSearch\intellicatsearch_env\Lib\site-packages\llmware\resources.py", line 228, in __init__
    self._writer = SQLiteWriter(self.library_name, account_name=self.account_name, custom_table=custom_table,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\IntellicatSearch\intellicatsearch_env\Lib\site-packages\llmware\resources.py", line 2564, in __init__
    self.conn = _SQLiteConnect().connect(library_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\IntellicatSearch\intellicatsearch_env\Lib\site-packages\llmware\resources.py", line 3083, in connect
    self.conn = sqlite3.connect(db_file)
                ^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: unable to open database file
@doberst
Contributor

doberst commented Jul 6, 2024

@arcontechnologies - could you please try the following to set a new home path:

from llmware.configs import LLMWareConfig

# check current home path
home = LLMWareConfig().get_home()
print("home: ", home)

# set new home path
# raw string, so "\a" and "\n" are not interpreted as escape sequences
new_home = r"\absolute\new\path"
LLMWareConfig().set_home(new_home)

print("new home: ", LLMWareConfig().get_home())

# optional - but quick check if new path being picked up
LLMWareConfig().setup_llmware_workspace()

# if successful, you should see a "llmware_data" path with subfolders in your new home

@arcontechnologies
Author

@doberst Actually, if I run your script, it shows what I'm expecting: my home.

home:  D:\intellicatsearch\
new home:  D:\intellicatsearch\

It is actually working fine (except for creating a library), as llmware_data is located here: D:\intellicatsearch\llmware_data

However, sqlite_llmware.db is located in C:\users\<myuser>\llmware_data. If I remove C:\users\<myuser>\llmware_data (because I expect my only valid llmware_data to be the one under home), it throws the error I reported at first.

The weird thing is that when I keep both locations (because it's the only way for me to create a new library), the new library is created in llmware_data\account\llmware under my home. So I'm confused.

My question: why is sqlite_llmware.db still under C:\users\<myuser>\llmware_data when I am already pointing to my home?

@wissamharoun

Having this exact same issue - driving me NUTS!

@CodeWithChetan2

Any solutions so far? It is still not working for me.

@doberst
Contributor

doberst commented Jul 11, 2024

@wissamharoun and @CodeWithChetan2 - could you please explain what issue you are seeing in more detail - the code/example, and details on platform/python/llmware versions ?

@doberst
Contributor

doberst commented Jul 11, 2024

@arcontechnologies - if I am following your question, llmware does not delete the sqlite_llmware.db when you change the home paths. The beauty of SQLite is also its simplicity - it is entirely file-based, so when you change file directories, and llmware does not see a SQLite instance in the new home path, we create a new one. If you want to clean up the old paths, you can just delete the sqlite_llmware.db file (you will lose any data on it, but generally no other issues).
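The file-based behavior described above can be reproduced with the standard library alone. This is a minimal sketch, not llmware code: `open_db`, `example.db`, and the two temporary folders are made-up names standing in for an old and a new home path.

```python
import os
import sqlite3
import tempfile


def open_db(home: str, db_name: str = "example.db") -> sqlite3.Connection:
    """Connect to a SQLite file under `home`; SQLite creates the file if it is missing."""
    return sqlite3.connect(os.path.join(home, db_name))


# Two existing folders standing in for the old and the new home path (assumption).
old_home = tempfile.mkdtemp()
new_home = tempfile.mkdtemp()

# Write one table and one row into the database under the old home.
conn = open_db(old_home)
conn.execute("CREATE TABLE library (name TEXT)")
conn.execute("INSERT INTO library VALUES ('library1')")
conn.commit()
conn.close()

# Connecting under the new home yields a separate, empty database file.
conn = open_db(new_home)
tables_in_new = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
conn.close()

old_file_still_exists = os.path.exists(os.path.join(old_home, "example.db"))
new_file_created = os.path.exists(os.path.join(new_home, "example.db"))
print(tables_in_new, old_file_still_exists, new_file_created)
```

The old file is left untouched and keeps its data; nothing is migrated to the new location automatically.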

In terms of creating multiple libraries in the same path, may I ask why you are constrained in your environment? Each library is fully self-contained with its own namespace and db table/collection - so you can create "library1", "library2", "library3", ... There is also an account overlay, so you could create account1/library1, account1/library2, ... account2/library1, etc.

Also, would recommend looking at using Mongo or Postgres if you are getting into larger and more complex libraries ... :)

@wissamharoun

wissamharoun commented Jul 12, 2024

Darren - Thank you for your attention and dedication to the llmware project. It's truly impressive work!

This issue is among a handful that's crucial for our use cases. We're building for modularity and portability, aiming to evaluate llmware for wider adoption and potential production use. Our goal is to understand llmware's adaptability to diverse environments, as each deployment in small-to-medium enterprises comes with unique data storage requirements and constraints.

The current obstacle is the inconsistent relocation (rather, dislocation) of llmware's home path, particularly in the case of the SQLite database. When setting a new home path, we expect all components, including the SQLite DB, to move accordingly. This would also be true, be it Mongo/Postgres/other db - and of course all the other essential components to make an llmware integration sing: vector stores, etc.

I've prepared a simple script demonstrating this issue, which I hope will help illustrate the problem clearly.

# example of home path and resource/component dislocation

from llmware.configs import LLMWareConfig
from llmware.library import Library
import os

# Set new home path - for example external local storage, but this could be anything
# mounted as a file system, or cloud storage accessed via API
new_home_path = "/Volumes/External_Storage/"
LLMWareConfig.set_home(new_home_path)

# Set SQLite as the active database
LLMWareConfig.set_active_db("sqlite")

# Print updated paths
print(f"New home path: {LLMWareConfig.get_home()}")
print(f"New library path: {LLMWareConfig.get_library_path()}")

# Attempt to create a new library
library_name = "test_library"
account_name = "test_account"
library = Library().create_new_library(library_name, account_name)

# Check where the library was actually created
print(f"Library main path: {library.library_main_path}")

# Check for SQLite database file
expected_db_path = os.path.join(new_home_path, "llmware_data", "accounts", "sqlite_llmware.db")
print(f"Expected SQLite DB path: {expected_db_path}")
print(f"SQLite DB exists at expected location: {os.path.exists(expected_db_path)}")

# Check default location to see if DB was created there instead
default_db_path = os.path.join(os.path.expanduser("~"), "llmware_data", "accounts", "sqlite_llmware.db")
print(f"SQLite DB exists at default location: {os.path.exists(default_db_path)}")

While some paths update, the SQLite DB remains in, or is created at, the default location. In fact, despite the configuration to relocate the home path, on first run the script will fail at the point where the library is created, because it expects that "llmware_data/accounts" already exists in the default path. In other words, in this case the SQLite database will fail to be created at all. Meanwhile, the script will have successfully created the "llmware_data" directory and all the requisite subdirectories in the new home_path (assuming that path exists).

As a test, a "hack" can be observed if the script is run after "llmware_data/accounts" is manually created in the default path -- then the "sqlite_llmware.db" will be created, in the default path under "accounts" - NOT in the new home_path.
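The first-run failure and the manual-folder "hack" match how `sqlite3.connect` behaves in general: it creates a missing database file, but not a missing parent folder. A minimal sketch with the standard library only, using a temporary folder as a stand-in for the default home (the folder and file names are copied from the script above; everything else is an assumption for illustration):

```python
import os
import sqlite3
import tempfile

# Temporary folder standing in for the default home (assumption for illustration).
home = tempfile.mkdtemp()
db_dir = os.path.join(home, "llmware_data", "accounts")
db_file = os.path.join(db_dir, "sqlite_llmware.db")

# The parent folder does not exist yet, so connect() cannot create the file.
try:
    sqlite3.connect(db_file)
    first_error = None
except sqlite3.OperationalError as exc:
    first_error = str(exc)

# After creating the folder, the same call succeeds and creates the file.
os.makedirs(db_dir, exist_ok=True)
conn = sqlite3.connect(db_file)
conn.close()

print(first_error)
print(os.path.exists(db_file))
```

The first attempt raises the same sqlite3.OperationalError reported at the top of this issue, which suggests the connection is being attempted against a folder that was never created.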

This dislocation of the llmware workspace causes data fragmentation and increases deployment complexity. It could be amplified further if other components (such as vector stores) also fail to relocate as expected.

I appreciate your insights on this matter and look forward to discussing other aspects of llmware's adaptability in future interactions.

Here's a brief summary of the issue:
- Issue: inconsistent relocation of llmware's home path, particularly the SQLite database file.
- Expected behavior: per library.py, configs.py, setup.py, etc., all components, including the SQLite DB, should move to the new location.
- Actual behavior: while some paths update, the SQLite DB remains in, or is created at, the default location.
- Impact: this dislocation of the project files prevents full relocation of the llmware workspace, which can cause data fragmentation, limits portability, and increases deployment complexity.

environment:
python 3.12.4
macOS Sonoma 14.5 /darwin unix
hardware: Apple Silicon Mac
llmware version: 0.3.1

@arcontechnologies
Author

@wissamharoun that is the same behavior I'm encountering.

@wissamharoun

yes. and many thanks to you for creating the issue! 👍🏻👍🏻👍🏻

@AndhikaWB

The problem lies in this function

def get_uri_string (cls):

I tried overriding it using this code:

    @classmethod
    def get_uri_string (cls):
        """For SQLite the URI string is the local file with full absolute path"""
        test = LLMWareConfig().get_library_path()
        cls._conf["sqlite_db_folder_path"] = test

        db_file = os.path.join(cls._conf["sqlite_db_folder_path"], cls._conf["db_name"])
        return db_file

Then I ran the debugger while creating a library, but it's as if my new test variable is not defined, wtf?

image

Also, the db_file is already known to the debugger even though I haven't executed that line yet (since I breakpoint on the previous line)

This is a strange problem, maybe related to Python caching or something?
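One general Python pattern that would produce this kind of symptom is a path computed once in a class body, which runs at import time, before any later call to set a new home. This is only a hypothesis and has not been checked against the llmware source; `HomeConfig`, `FrozenDbConfig`, and `LazyDbConfig` below are made-up names, not llmware classes.

```python
class HomeConfig:
    """Hypothetical stand-in for a global config that holds the home path."""

    _home = "/default/home"

    @classmethod
    def set_home(cls, path):
        cls._home = path

    @classmethod
    def get_home(cls):
        return cls._home


class FrozenDbConfig:
    """Hypothetical DB config whose folder is computed once, in the class body."""

    # Evaluated when the class is defined (normally at import time),
    # so it captures whatever the home path was at that moment.
    _conf = {"db_folder": HomeConfig.get_home() + "/accounts"}

    @classmethod
    def get_db_folder(cls):
        return cls._conf["db_folder"]


class LazyDbConfig:
    """Hypothetical DB config that resolves the folder on every call."""

    @classmethod
    def get_db_folder(cls):
        # Looked up at call time, so later set_home() calls are honored.
        return HomeConfig.get_home() + "/accounts"


# The home path is changed only after both classes have been defined.
HomeConfig.set_home("/new/home")

frozen = FrozenDbConfig.get_db_folder()
lazy = LazyDbConfig.get_db_folder()
print(frozen)
print(lazy)
```

In this sketch the frozen variant still points under the default home while the lazy variant follows the new one, which would look like the database "staying behind" after the home path is changed.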


5 participants