Skip to content

feat(parity): migrate json libraries to sqlite #604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
Nov 30, 2024
Merged

feat(parity): migrate json libraries to sqlite #604

merged 38 commits into from
Nov 30, 2024

Conversation

CyanVoxel
Copy link
Member

@CyanVoxel CyanVoxel commented Nov 24, 2024

This PR adds the ability for users to migrate TagStudio v9.1-v9.4 JSON libraries to the v9.5 SQLite format.

Attempting to open a library folder with no ts_library.sqlite file and a ts_library.json file present will initiate the modal prompt for the user to start the migration process. Closing this prompt early stops the migration process and will not open the library. The migration is not completed until the user clicks the "Finish Migration" button, and the ts_library.json is never deleted as a backup precaution.

image

Migration Checklist

  • Library Folder
  • Entries
    • File Path
    • File Suffix
    • Fields
      • Field Order (The SQL DB doesn't seem built for this but it's fine for now)
      • Text Box Fields
        • Name/Subtype
        • Content
      • Text Line Fields
        • Name/Subtype
        • Content
      • Date Fields
        • Name/Subtype
        • Content
      • Tag Box Fields
        • Name/Subtype
        • Content
  • Tags
    • Tag Name
    • Tag Shorthand
    • Tag Aliases
    • Tag Parent Tags
    • Tag Color
  • File Extension List
    • List of extension strings
      • Removes leading “.”

@CyanVoxel CyanVoxel added Type: Enhancement New feature or request Type: QoL A quality of life (QoL) enhancement or suggestion TagStudio: Library Relating to the TagStudio library system Priority: High An important issue requiring attention labels Nov 24, 2024
@CyanVoxel CyanVoxel added this to the SQL Parity milestone Nov 24, 2024
@CyanVoxel CyanVoxel linked an issue Nov 24, 2024 that may be closed by this pull request
@python357-1
Copy link
Collaborator

Have no complaints with my small library, but we should get some bigger samples.

@Cool-Game-Dev
Copy link
Contributor

Seems to work on my smaller library, but I agree with python on this one.

@CyanVoxel
Copy link
Member Author

Updated tests to account for the way the SQL DB loads multiples of the same tag field type on entries. Currently the SQL DB loads the correct number of fields, but merges and mirrors the tag lists across each similar tag field type. This seems to be a limitation of the DB schema.

While I don't have much public documentation on the concept yet, I'm highly considering targeting the "Tag Categories" concept for 9.5 instead of 9.6 to not only address this issue, but to also address the currently broken mixed field editing that needs overhauling anyway.

@Cool-Game-Dev
Copy link
Contributor

@CyanVoxel I'm not complaining, go for it if you need.

@Computerdores
Copy link
Collaborator

Computerdores commented Nov 29, 2024

A couple issues that I noticed immediately on my main library:

  • Pressing "Start and Preview" currently freezes the window until it is done without any feedback (besides the log) that it is still working
  • After pressing "Finish Migration" the window freezes again and a small, completely white (also frozen) window with the title "Refreshing Directories" shows up. At some point they both vanish and "Running Macros on New Entries" shows up
  • After first migrating I had quite a lot of input lag (0.5-2s) when selecting an entry or scrolling (this went away after 30s or so)
    • something I just noticed: the input lag when selecting an entry is also sometimes present just after opening the library but only for the first time selecting an entry

Note:

  • my library has roughly ~750 entries most of which have only 5-10 tags (total file count is ~1400 but alot of them are sidecar files)
  • my library is on an HDD
  • I was running in debug mode, but that should only exacerbate problems not create them

@CyanVoxel
Copy link
Member Author

Pressing "Start and Preview" currently freezes the window until it is done without any feedback (besides the log) that it is still working

This is something I also don't like but unfortunately don't know a way around... I actually ended up creating an entire embedded progress bar that updated the values on screen as it the library processed, but discovered that SQLAlchemy's desire to be on a single thread directly headbutts against Qt's desire for progress bars to be on different threads. I was unable to come up with a solution where I could yield feedback from the library while not ruining the db connection.

After pressing "Finish Migration" the window freezes again and a small, completely white (also frozen) window with the title "Refreshing Directories" shows up. At some point they both vanish and "Running Macros on New Entries" shows up

I've noticed this as well. For me the white window only loads for a split second as it seems to be closed to fast that it never fully loads. There might be somewhere that an .update() method can be placed to help with this, but the issue seems to stem from the refresh bar getting created and destroyed too quickly, and/or being created while the main thread is pretty busy.

After first migrating I had quite a lot of input lag (0.5-2s) when selecting an entry or scrolling (this went away after 30s or so)

something I just noticed: the input lag when selecting an entry is also sometimes present just after opening the library but only for the first time selecting an entry

Hmm, there was a bug from 332 that caused lag like this when selecting entries however it should've been fixed in this PR. Could this be thumbnails rendering in the "background" while you try to click on entries? I usually get some lag until everything on the page is rendered.

@Computerdores
Copy link
Collaborator

This is something I also don't like but unfortunately don't know a way around... I actually ended up creating an entire embedded progress bar that updated the values on screen as it the library processed, but discovered that SQLAlchemy's desire to be on a single thread directly headbutts against Qt's desire for progress bars to be on different threads. I was unable to come up with a solution where I could yield feedback from the library while not ruining the db connection.

Yikes that sounds like a pain to get working. Have you thought about instead of having an indicator for progress, just having an indicator that the program is still running (e.g. a spinner)? This way the window wouldn't be frozen and it would still convey that it isn't done yet

Hmm, there was a bug from 332 that caused lag like this when selecting entries however it should've been fixed in this PR. Could this be thumbnails rendering in the "background" while you try to click on entries? I usually get some lag until everything on the page is rendered.

I was already seeing all the thumbnails at that point (although maybe offscreen ones?)

@CyanVoxel
Copy link
Member Author

This is something I also don't like but unfortunately don't know a way around... I actually ended up creating an entire embedded progress bar that updated the values on screen as it the library processed, but discovered that SQLAlchemy's desire to be on a single thread directly headbutts against Qt's desire for progress bars to be on different threads. I was unable to come up with a solution where I could yield feedback from the library while not ruining the db connection.

Yikes that sounds like a pain to get working. Have you thought about instead of having an indicator for progress, just having an indicator that the program is still running (e.g. a spinner)? This way the window wouldn't be frozen and it would still convey that it isn't done yet

That might work, assuming I can safely move the entire SQL library opening + conversion + closing process to a single separate thread. I think trying yield from the db before was the main issue, but I'm worried that either the GUI will still be blocked or that the DB will refuse to work on any other thread than the main one.

@Computerdores
Copy link
Collaborator

I think that would be worth trying. Windows freezing sketches me the hell out, I have wasted way to much time waiting on programs only to kill them after a while and conclude that they must be broken (especially since that part can take multiple minutes for larger libraries)

@CyanVoxel
Copy link
Member Author

I think that would be worth trying. Windows freezing sketches me the hell out, I have wasted way to much time waiting on programs only to kill them after a while and conclude that they must be broken (especially since that part can take multiple minutes for larger libraries)

Glad I kept around the progress bar code - I was able to get an indefinite progress bar that updates throughout the major migration events. I've also taken the opportunity to catch and display any errors encountered on the UI itself, as well as display a message box with any data discrepancies if necessary.

@Computerdores
Copy link
Collaborator

I think that would be worth trying. Windows freezing sketches me the hell out, I have wasted way to much time waiting on programs only to kill them after a while and conclude that they must be broken (especially since that part can take multiple minutes for larger libraries)

Glad I kept around the progress bar code - I was able to get an indefinite progress bar that updates throughout the major migration events. I've also taken the opportunity to catch and display any errors encountered on the UI itself, as well as display a message box with any data discrepancies if necessary.

I've tested this and it looks really good! I think there are probably still improvements that could be made (there are a bunch of log messages from add_field_to_entry and they show up over a long timespan so that could probably be shown to the user), but this definitely good enough!

I will try to read over the code and give a review later

Computerdores added a commit to Computerdores/TagStudio that referenced this pull request Nov 30, 2024
Copy link
Collaborator

@Computerdores Computerdores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking quite good overall, just a couple comments

shorthand=tag.shorthand,
color=TagColor.get_color_from_str(tag.color),
)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit weird to me, why are the sub tags and aliases added before the tags that their foreign keys point to?

I haven't tested this, but I suspect that, because every operation in those loops is its own DB Transaction, should TS crash after the Aliases, but before the Tags have been added, that there would be database inconsistencies stemming from foreign keys in the Aliases and Subtags pointing to tags that don't exist. This might not matter if the migration just restarts after such a crash (or killing by the user), but seems like it is just asking for problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might've been some negative reinforcement during testing that got me to believe that this was needed, but swapping it to do the tags first seems to work just fine (and logically makes more sense).

While the user shouldn't have the option to complete the migration without the full DB migrated, it's still a great observation.

) -> None:
self.path = path
self.folder = folder
assert not self.id
Copy link
Collaborator

@Computerdores Computerdores Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why is this necessary? Shouldn't the constructor run before anything can set this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went and followed the assertion that was present in the Entry constructor, but yes I don't think this has ever been tripped during debugging and can probably be taken out now

open_status: LibraryStatus = self.lib.open_library(path)

# Migration is required
if open_status.message and open_status.message.startswith("[JSON]"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure using this message to figure out whether migration is required is the best idea, maybe a migration_required field in LibraryStatus would be better? (This way changing what looks like a log message doesn't break functionality)

tag_id = tag.id # Tag IDs start at 0
sql_color = tag.color.name
json_color = (
self.json_lib.get_tag(tag_id).color.upper().replace(" ", "_")
Copy link
Collaborator

@Computerdores Computerdores Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:
.upper().replace(" ", "_")
this conversion is duplicated in TagColor.get_color_from_str maybe that should be a static method of TagColor?

@CyanVoxel CyanVoxel removed the Status: Review Needed A review of this is needed label Nov 30, 2024
@CyanVoxel CyanVoxel merged commit ef68603 into main Nov 30, 2024
10 checks passed
@CyanVoxel CyanVoxel deleted the json-to-sql branch November 30, 2024 21:00
CyanVoxel added a commit that referenced this pull request Dec 6, 2024
* add files

* fix: term was parsing ANDList instead of ORList

* make mypy happy

* ruff format

* add missing todo

* add more constraint types

* add parent property to AST

* add BaseVisitor class

* make mypy happy

* add __init__.py

* Revert "make mypy happy"

This reverts commit 926d0dd.

* refactoring and fixes

* rudimentary search field integration

* fix: check for None properly

* fix: Entries without Tags are now searchable

* make mypy happy

* Revert "fix: Entries without Tags are now searchable"

This reverts commit 19b40af.

* fix: changed joins to outerjoins and added missing outerjoin

* use query lang instead of tag_id FIlterState

* add todos

* fix: remove uncecessary line that broke search when searching for exact name

* fix tag search

* refactoring

* fix: path now uses GLOB operator for proper GLOBs

* refactoring: remove FilterState.id and implement Library.get_entry_full as replacement

* fix: use default value notation instead of if None statement in __post_init__

* remove obsolete Search Mode UI and related code

* ruff fixes

* remove obsolete tests

* fix: item_thumb didn't query entries correctly

* fix: search_library now correctly returns the number of *unique* entries

* make mypy happy

* implement NOT

* remove obsolete filename search

* remove summary as it is not applicable anymore

* finish refactoring of FilterState

* implement special:untagged

* fix: make mypy happy

* Revert changes to search_tags in favor of changes from #604

* fix: also port test changes

* fix: remove unneccessary import

* fix: remove unused dataclass

* fix: AND now works correctly with tags

* simplify structure of parsed AST

* add performance logging

* perf: Improve performance of search by reducing number of required joins from 4 to 1

* perf: double NOT is now optimized out of the AST

* fix: bug where pages would show less than the configured number of entries

* Revert "add performance logging"

This reverts commit c3c7d75.

* fix: tag_id search was broken

* somewhat adapt the existing autocompletion to this PR

* perf: Use Relational Division Queries to improve Query Execution Time

* fix: raise Exception so as to not fail silently

* fix: Parser bug broke parentheses

* little bit of clean up

* remove unnecessary comment

* add library for testing search

* feat: add basic tests

* fix: and queries containing just one tag were broken

* chore: remove debug code

* feat: more tests

* refactor: more consistent name for variable

Co-authored-by: Travis Abendshien <46939827+CyanVoxel@users.noreply.github.com>

* fix: ruff check complaint over double import

---------

Co-authored-by: Travis Abendshien <46939827+CyanVoxel@users.noreply.github.com>
DandyDev01 pushed a commit to DandyDev01/TagStudio that referenced this pull request Dec 13, 2024
* add files

* fix: term was parsing ANDList instead of ORList

* make mypy happy

* ruff format

* add missing todo

* add more constraint types

* add parent property to AST

* add BaseVisitor class

* make mypy happy

* add __init__.py

* Revert "make mypy happy"

This reverts commit 926d0dd.

* refactoring and fixes

* rudimentary search field integration

* fix: check for None properly

* fix: Entries without Tags are now searchable

* make mypy happy

* Revert "fix: Entries without Tags are now searchable"

This reverts commit 19b40af.

* fix: changed joins to outerjoins and added missing outerjoin

* use query lang instead of tag_id FIlterState

* add todos

* fix: remove uncecessary line that broke search when searching for exact name

* fix tag search

* refactoring

* fix: path now uses GLOB operator for proper GLOBs

* refactoring: remove FilterState.id and implement Library.get_entry_full as replacement

* fix: use default value notation instead of if None statement in __post_init__

* remove obsolete Search Mode UI and related code

* ruff fixes

* remove obsolete tests

* fix: item_thumb didn't query entries correctly

* fix: search_library now correctly returns the number of *unique* entries

* make mypy happy

* implement NOT

* remove obsolete filename search

* remove summary as it is not applicable anymore

* finish refactoring of FilterState

* implement special:untagged

* fix: make mypy happy

* Revert changes to search_tags in favor of changes from TagStudioDev#604

* fix: also port test changes

* fix: remove unneccessary import

* fix: remove unused dataclass

* fix: AND now works correctly with tags

* simplify structure of parsed AST

* add performance logging

* perf: Improve performance of search by reducing number of required joins from 4 to 1

* perf: double NOT is now optimized out of the AST

* fix: bug where pages would show less than the configured number of entries

* Revert "add performance logging"

This reverts commit c3c7d75.

* fix: tag_id search was broken

* somewhat adapt the existing autocompletion to this PR

* perf: Use Relational Division Queries to improve Query Execution Time

* fix: raise Exception so as to not fail silently

* fix: Parser bug broke parentheses

* little bit of clean up

* remove unnecessary comment

* add library for testing search

* feat: add basic tests

* fix: and queries containing just one tag were broken

* chore: remove debug code

* feat: more tests

* refactor: more consistent name for variable

Co-authored-by: Travis Abendshien <46939827+CyanVoxel@users.noreply.github.com>

* fix: ruff check complaint over double import

---------

Co-authored-by: Travis Abendshien <46939827+CyanVoxel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: High An important issue requiring attention TagStudio: Library Relating to the TagStudio library system Type: Enhancement New feature or request Type: QoL A quality of life (QoL) enhancement or suggestion
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

[Bug]: File Paths not Intercompatible between Windows and Linux Store file information in an SQLite database
4 participants