feat: Implement Query lang #606

Merged: 67 commits, Dec 6, 2024
Commits
7981d13
add files
Computerdores Nov 27, 2024
eae8d9d
fix: term was parsing ANDList instead of ORList
Computerdores Nov 27, 2024
2bbad4a
make mypy happy
Computerdores Nov 27, 2024
25a64d9
ruff format
Computerdores Nov 27, 2024
e9afa42
add missing todo
Computerdores Nov 27, 2024
581b7ac
add more constraint types
Computerdores Nov 27, 2024
6f35864
add parent property to AST
Computerdores Nov 27, 2024
47a3b96
add BaseVisitor class
Computerdores Nov 27, 2024
1507c5c
make mypy happy
Computerdores Nov 27, 2024
e0c404a
add __init__.py
Computerdores Nov 27, 2024
f592f07
Revert "make mypy happy"
Computerdores Nov 27, 2024
fc3db94
refactoring and fixes
Computerdores Nov 27, 2024
958ba86
rudimentary search field integration
Computerdores Nov 27, 2024
de08849
fix: check for None properly
Computerdores Nov 27, 2024
06761a5
fix: Entries without Tags are now searchable
Computerdores Nov 27, 2024
a636864
make mypy happy
Computerdores Nov 27, 2024
e2e378f
Revert "fix: Entries without Tags are now searchable"
Computerdores Nov 27, 2024
927beed
fix: changed joins to outerjoins and added missing outerjoin
Computerdores Nov 28, 2024
cb8437b
use query lang instead of tag_id FilterState
Computerdores Nov 28, 2024
9681bfc
add todos
Computerdores Nov 28, 2024
6eab426
fix: remove unnecessary line that broke search when searching for exa…
Computerdores Nov 28, 2024
0d4afd4
fix tag search
Computerdores Nov 28, 2024
eda7f52
refactoring
Computerdores Nov 28, 2024
27fac24
fix: path now uses GLOB operator for proper GLOBs
Computerdores Nov 28, 2024
7c43686
refactoring: remove FilterState.id and implement Library.get_entry_fu…
Computerdores Nov 28, 2024
f9bf7a7
fix: use default value notation instead of if None statement in __pos…
Computerdores Nov 28, 2024
db6284f
remove obsolete Search Mode UI and related code
Computerdores Nov 28, 2024
2f91337
ruff fixes
Computerdores Nov 28, 2024
7a3679d
remove obsolete tests
Computerdores Nov 28, 2024
d747435
fix: item_thumb didn't query entries correctly
Computerdores Nov 28, 2024
cdf2f09
fix: search_library now correctly returns the number of *unique* entries
Computerdores Nov 28, 2024
1f5a4dc
make mypy happy
Computerdores Nov 28, 2024
3d4b649
implement NOT
Computerdores Nov 28, 2024
3ab8d6a
remove obsolete filename search
Computerdores Nov 28, 2024
ea17580
remove summary as it is not applicable anymore
Computerdores Nov 28, 2024
83e156f
finish refactoring of FilterState
Computerdores Nov 28, 2024
a784fe6
implement special:untagged
Computerdores Nov 28, 2024
6fb1d4b
fix: make mypy happy
Computerdores Nov 28, 2024
d9da558
Revert changes to search_tags in favor of changes from #604
Computerdores Nov 30, 2024
05c7d94
fix: also port test changes
Computerdores Nov 30, 2024
8602f2f
fix: remove unnecessary import
Computerdores Nov 30, 2024
4dfd7e3
fix: remove unused dataclass
Computerdores Nov 30, 2024
2d778ce
Merge branch 'main' into query-lang
Computerdores Nov 30, 2024
1e8c72c
fix: AND now works correctly with tags
Computerdores Dec 2, 2024
bb7196e
simplify structure of parsed AST
Computerdores Dec 2, 2024
c3c7d75
add performance logging
Computerdores Dec 2, 2024
59b9966
perf: Improve performance of search by reducing number of required jo…
Computerdores Dec 3, 2024
4cb8ad9
perf: double NOT is now optimized out of the AST
Computerdores Dec 3, 2024
a99ced2
fix: bug where pages would show less than the configured number of en…
Computerdores Dec 3, 2024
bb79f76
Merge branch 'main' into query-lang
Computerdores Dec 3, 2024
127eee9
Revert "add performance logging"
Computerdores Dec 3, 2024
b31745a
fix: tag_id search was broken
Computerdores Dec 3, 2024
3235bbe
somewhat adapt the existing autocompletion to this PR
Computerdores Dec 3, 2024
592bd17
perf: Use Relational Division Queries to improve Query Execution Time
Computerdores Dec 3, 2024
958e028
fix: raise Exception so as to not fail silently
Computerdores Dec 3, 2024
49498f6
fix: Parser bug broke parentheses
Computerdores Dec 4, 2024
6c788a5
little bit of clean up
Computerdores Dec 4, 2024
4aac8e7
remove unnecessary comment
Computerdores Dec 4, 2024
d7ac76a
add library for testing search
Computerdores Dec 4, 2024
484205d
feat: add basic tests
Computerdores Dec 4, 2024
358fc70
fix: and queries containing just one tag were broken
Computerdores Dec 4, 2024
5312c14
Merge branch 'query-lang' of github.com:Computerdores/TagStudio into …
Computerdores Dec 4, 2024
3c3b5d2
chore: remove debug code
Computerdores Dec 4, 2024
9d357b0
feat: more tests
Computerdores Dec 4, 2024
4265fde
refactor: more consistent name for variable
Computerdores Dec 6, 2024
1eaea3d
Merge branch 'main' into query-lang
Computerdores Dec 6, 2024
d0740ba
fix: ruff check complaint over double import
Computerdores Dec 6, 2024
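
Commit 592bd17 above mentions relational division queries for tag searches. The PR's actual SQLAlchemy code for this is not shown on this page, so the following is only a rough sketch of the general technique, using the standard-library `sqlite3` module. The `entry_tags` table and the helper names are hypothetical. It finds entries that carry every tag in a requested set by grouping per entry and comparing the distinct-tag count with the size of the set.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema, not TagStudio's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry_tags (entry_id INTEGER, tag_id INTEGER)")
    conn.executemany(
        "INSERT INTO entry_tags VALUES (?, ?)",
        [(1, 10), (1, 20), (2, 10), (3, 10), (3, 20), (3, 30)],
    )
    return conn


def entries_with_all_tags(conn: sqlite3.Connection, tag_ids: list[int]) -> list[int]:
    """Return ids of entries tagged with every id in tag_ids."""
    wanted = sorted(set(tag_ids))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    sql = (
        "SELECT entry_id FROM entry_tags "
        f"WHERE tag_id IN ({placeholders}) "
        "GROUP BY entry_id "
        # An entry qualifies only if it matched as many distinct tags as requested.
        "HAVING COUNT(DISTINCT tag_id) = ? "
        "ORDER BY entry_id"
    )
    rows = conn.execute(sql, (*wanted, len(wanted)))
    return [row[0] for row in rows]


conn = make_demo_db()
print(entries_with_all_tags(conn, [10, 20]))
```

A single grouped query like this avoids adding one join per required tag, which is one common reason to reach for relational division when AND-ing many tags.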
110 changes: 39 additions & 71 deletions tagstudio/src/core/library/alchemy/enums.py
@@ -1,7 +1,10 @@
 import enum
-from dataclasses import dataclass
+from dataclasses import dataclass, replace
 from pathlib import Path

+from src.core.query_lang import AST as Query  # noqa: N811
+from src.core.query_lang import Constraint, ConstraintType, Parser
+

 class TagColor(enum.IntEnum):
     DEFAULT = 1
@@ -50,13 +53,6 @@ def get_color_from_str(color_name: str) -> "TagColor":
         return TagColor.DEFAULT


-class SearchMode(enum.IntEnum):
-    """Operational modes for item searching."""
-
-    AND = 0
-    OR = 1
-
-
 class ItemType(enum.Enum):
     ENTRY = 0
     COLLATION = 1
@@ -68,71 +64,12 @@ class FilterState:
     """Represent a state of the Library grid view."""

     # these should remain
-    page_index: int | None = None
-    page_size: int | None = None
-    search_mode: SearchMode = SearchMode.AND  # TODO - actually implement this
+    page_index: int | None = 0
+    page_size: int | None = 500

     # these should be erased on update
-    # tag name
-    tag: str | None = None
-    # tag ID
-    tag_id: int | None = None
-
-    # entry id
-    id: int | None = None
-    # whole path
-    path: Path | str | None = None
-    # file name
-    name: str | None = None
-    # file type
-    filetype: str | None = None
-    mediatype: str | None = None
-
-    # a generic query to be parsed
-    query: str | None = None
-
-    def __post_init__(self):
-        # strip values automatically
-        if query := (self.query and self.query.strip()):
-            # parse the value
-            if ":" in query:
-                kind, _, value = query.partition(":")
-                value = value.replace('"', "")
-            else:
-                # default to tag search
-                kind, value = "tag", query
-
-            if kind == "tag_id":
-                self.tag_id = int(value)
-            elif kind == "tag":
-                self.tag = value
-            elif kind == "path":
-                self.path = value
-            elif kind == "name":
-                self.name = value
-            elif kind == "id":
-                self.id = int(self.id) if str(self.id).isnumeric() else self.id
-            elif kind == "filetype":
-                self.filetype = value
-            elif kind == "mediatype":
-                self.mediatype = value
-
-        else:
-            self.tag = self.tag and self.tag.strip()
-            self.tag_id = int(self.tag_id) if str(self.tag_id).isnumeric() else self.tag_id
-            self.path = self.path and str(self.path).strip()
-            self.name = self.name and self.name.strip()
-            self.id = int(self.id) if str(self.id).isnumeric() else self.id
-
-        if self.page_index is None:
-            self.page_index = 0
-        if self.page_size is None:
-            self.page_size = 500
-
-    @property
-    def summary(self):
-        """Show query summary."""
-        return self.query or self.tag or self.name or self.tag_id or self.path or self.id
+    # Abstract Syntax Tree Of the current Search Query
+    ast: Query = None

     @property
     def limit(self):
@@ -142,6 +79,37 @@ def limit(self):
     def offset(self):
         return self.page_size * self.page_index

+    @classmethod
+    def show_all(cls) -> "FilterState":
+        return FilterState()
+
+    @classmethod
+    def from_search_query(cls, search_query: str) -> "FilterState":
+        return cls(ast=Parser(search_query).parse())
+
+    @classmethod
+    def from_tag_id(cls, tag_id: int | str) -> "FilterState":
+        return cls(ast=Constraint(ConstraintType.TagID, str(tag_id), []))
+
+    @classmethod
+    def from_path(cls, path: Path | str) -> "FilterState":
+        return cls(ast=Constraint(ConstraintType.Path, str(path).strip(), []))
+
+    @classmethod
+    def from_mediatype(cls, mediatype: str) -> "FilterState":
+        return cls(ast=Constraint(ConstraintType.MediaType, mediatype, []))
+
+    @classmethod
+    def from_filetype(cls, filetype: str) -> "FilterState":
+        return cls(ast=Constraint(ConstraintType.FileType, filetype, []))
+
+    @classmethod
+    def from_tag_name(cls, tag_name: str) -> "FilterState":
+        return cls(ast=Constraint(ConstraintType.Tag, tag_name, []))
+
+    def with_page_size(self, page_size: int) -> "FilterState":
+        return replace(self, page_size=page_size)
+


 class FieldTypeEnum(enum.Enum):
     TEXT_LINE = "Text Line"
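
The enums.py changes replace the per-field attributes of `FilterState` with a single `ast` field, factory classmethods, and a `with_page_size` helper built on `dataclasses.replace`. A minimal standalone sketch of that pattern follows. `PagedFilter` is a hypothetical stand-in, and a plain tuple takes the place of the `query_lang` AST so the block runs without the TagStudio package.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class PagedFilter:
    """Hypothetical stand-in for FilterState; `ast` is just a tuple here."""

    page_index: int = 0
    page_size: int = 500
    ast: Optional[tuple] = None

    @property
    def limit(self) -> int:
        return self.page_size

    @property
    def offset(self) -> int:
        return self.page_size * self.page_index

    @classmethod
    def from_tag_id(cls, tag_id) -> "PagedFilter":
        # The real code builds a Constraint node; a tuple is used for illustration.
        return cls(ast=("tag_id", str(tag_id)))

    def with_page_size(self, page_size: int) -> "PagedFilter":
        # replace() returns a copy, so the original state is left untouched.
        return replace(self, page_size=page_size)


first = PagedFilter.from_tag_id(1000)
small = first.with_page_size(50)
print(first.page_size, small.page_size, small.ast)
```

Returning a copy rather than mutating in place means a caller can derive a differently paged view of the same query without affecting whoever holds the original state.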
89 changes: 34 additions & 55 deletions tagstudio/src/core/library/alchemy/library.py
@@ -28,7 +28,6 @@
 from sqlalchemy.exc import IntegrityError
 from sqlalchemy.orm import (
     Session,
-    aliased,
     contains_eager,
     make_transient,
     selectinload,
@@ -42,7 +41,6 @@
     TS_FOLDER_NAME,
 )
 from ...enums import LibraryPrefs
-from ...media_types import MediaCategories
 from .db import make_tables
 from .enums import FieldTypeEnum, FilterState, TagColor
 from .fields import (
@@ -54,6 +52,7 @@
 )
 from .joins import TagField, TagSubtag
 from .models import Entry, Folder, Preferences, Tag, TagAlias, ValueType
+from .visitors import SQLBoolExpressionBuilder

 logger = structlog.get_logger(__name__)

@@ -402,6 +401,29 @@ def get_entry(self, entry_id: int) -> Entry | None:
             make_transient(entry)
             return entry

+    def get_entry_full(self, entry_id: int) -> Entry | None:
+        """Load entry an join with all joins and all tags."""
+        with Session(self.engine) as session:
+            statement = select(Entry).where(Entry.id == entry_id)
+            statement = (
+                statement.outerjoin(Entry.text_fields)
+                .outerjoin(Entry.datetime_fields)
+                .outerjoin(Entry.tag_box_fields)
+            )
+            statement = statement.options(
+                selectinload(Entry.text_fields),
+                selectinload(Entry.datetime_fields),
+                selectinload(Entry.tag_box_fields)
+                .joinedload(TagBoxField.tags)
+                .options(selectinload(Tag.aliases), selectinload(Tag.subtags)),
+            )
+            entry = session.scalar(statement)
+            if not entry:
+                return None
+            session.expunge(entry)
+            make_transient(entry)
+            return entry
+
     @property
     def entries_count(self) -> int:
         with Session(self.engine) as session:
@@ -518,63 +540,18 @@ def search_library(
         with Session(self.engine, expire_on_commit=False) as session:
             statement = select(Entry)

-            if search.tag:
-                SubtagAlias = aliased(Tag)  # noqa: N806
-                statement = (
-                    statement.join(Entry.tag_box_fields)
-                    .join(TagBoxField.tags)
-                    .outerjoin(Tag.aliases)
-                    .outerjoin(SubtagAlias, Tag.subtags)
-                    .where(
-                        or_(
-                            Tag.name.ilike(search.tag),
-                            Tag.shorthand.ilike(search.tag),
-                            TagAlias.name.ilike(search.tag),
-                            SubtagAlias.name.ilike(search.tag),
-                        )
-                    )
-                )
-            elif search.tag_id:
-                statement = (
-                    statement.join(Entry.tag_box_fields)
-                    .join(TagBoxField.tags)
-                    .where(Tag.id == search.tag_id)
-                )
-
-            elif search.id:
-                statement = statement.where(Entry.id == search.id)
-            elif search.name:
-                statement = select(Entry).where(
-                    and_(
-                        Entry.path.ilike(f"%{search.name}%"),
-                        # dont match directory name (ie. has following slash)
-                        ~Entry.path.ilike(f"%{search.name}%/%"),
-                    )
-                )
-            elif search.path:
-                search_str = str(search.path).replace("*", "%")
-                statement = statement.where(Entry.path.ilike(search_str))
-            elif search.filetype:
-                statement = statement.where(Entry.suffix.ilike(f"{search.filetype}"))
-            elif search.mediatype:
-                extensions: set[str] = set[str]()
-                for media_cat in MediaCategories.ALL_CATEGORIES:
-                    if search.mediatype == media_cat.name:
-                        extensions = extensions | media_cat.extensions
-                        break
-                # just need to map it to search db - suffixes do not have '.'
-                statement = statement.where(
-                    Entry.suffix.in_(map(lambda x: x.replace(".", ""), extensions))
+            if search.ast:
+                statement = statement.outerjoin(Entry.tag_box_fields).where(
+                    SQLBoolExpressionBuilder(self).visit(search.ast)
                 )

             extensions = self.prefs(LibraryPrefs.EXTENSION_LIST)
             is_exclude_list = self.prefs(LibraryPrefs.IS_EXCLUDE_LIST)

-            if not search.id:  # if `id` is set, we don't need to filter by extensions
-                if extensions and is_exclude_list:
-                    statement = statement.where(Entry.suffix.notin_(extensions))
-                elif extensions:
-                    statement = statement.where(Entry.suffix.in_(extensions))
+            if extensions and is_exclude_list:
+                statement = statement.where(Entry.suffix.notin_(extensions))
+            elif extensions:
+                statement = statement.where(Entry.suffix.in_(extensions))

             statement = statement.options(
                 selectinload(Entry.text_fields),
@@ -584,6 +561,8 @@ def search_library(
                 .options(selectinload(Tag.aliases), selectinload(Tag.subtags)),
             )

+            statement = statement.distinct(Entry.id)
+
             query_count = select(func.count()).select_from(statement.alias("entries"))
             count_all: int = session.execute(query_count).scalar()

@@ -597,7 +576,7 @@ def search_library(

             res = SearchResult(
                 total_count=count_all,
-                items=list(session.scalars(statement).unique()),
+                items=list(session.scalars(statement)),
             )

             session.expunge_all()
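
The last two hunks move deduplication out of Python (`.unique()` on the fetched rows) and into the query (`statement.distinct(Entry.id)`). The diff does not spell out the reason, so the following is only an illustration, under assumptions, of why the placement can matter once a LIMIT is applied to a joined query. It uses `sqlite3` with a hypothetical two-table schema rather than TagStudio's models.

```python
import sqlite3

# Hypothetical schema: entry 1 has two tags, so an outer join yields two rows for it.
JOIN = "FROM entries e LEFT OUTER JOIN entry_tags t ON t.entry_id = e.id "


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE entry_tags (entry_id INTEGER, tag_id INTEGER)")
    conn.executemany("INSERT INTO entries VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO entry_tags VALUES (?, ?)",
        [(1, 10), (1, 20), (2, 10), (3, 10)],
    )
    return conn


def page_dedup_after(conn: sqlite3.Connection, page_size: int) -> list[int]:
    """LIMIT the joined rows first, then drop duplicates in Python."""
    rows = conn.execute("SELECT e.id " + JOIN + "ORDER BY e.id LIMIT ?", (page_size,))
    return list(dict.fromkeys(row[0] for row in rows))


def page_dedup_in_sql(conn: sqlite3.Connection, page_size: int) -> list[int]:
    """Let the database drop duplicates before the LIMIT is applied."""
    rows = conn.execute(
        "SELECT DISTINCT e.id " + JOIN + "ORDER BY e.id LIMIT ?", (page_size,)
    )
    return [row[0] for row in rows]


conn = build_db()
print(page_dedup_after(conn, 2), page_dedup_in_sql(conn, 2))
```

With a page size of 2, the first variant spends both row slots on entry 1 and returns a short page, while the second returns two distinct entries.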
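
The call `SQLBoolExpressionBuilder(self).visit(search.ast)` in library.py suggests a visitor that walks the query AST and returns a boolean SQL expression. The `visitors` module is not part of the diff shown here, so the sketch below is not that implementation. It is a toy visitor over hypothetical node classes (`TagIs`, `Not`, `AllOf`, `AnyOf`) that emits a parameterised WHERE fragment and collapses a double NOT along the way.

```python
from dataclasses import dataclass


@dataclass
class TagIs:
    tag_id: int


@dataclass
class Not:
    child: object


@dataclass
class AllOf:
    children: list


@dataclass
class AnyOf:
    children: list


class WhereBuilder:
    """Turn the toy AST into (sql_fragment, params); all names are hypothetical."""

    def visit(self, node):
        # Dispatch on the node's class name, e.g. TagIs -> visit_tagis.
        return getattr(self, "visit_" + type(node).__name__.lower())(node)

    def visit_tagis(self, node):
        sql = (
            "EXISTS (SELECT 1 FROM entry_tags t "
            "WHERE t.entry_id = e.id AND t.tag_id = ?)"
        )
        return sql, [node.tag_id]

    def visit_not(self, node):
        if isinstance(node.child, Not):  # NOT NOT x is equivalent to x
            return self.visit(node.child.child)
        sql, params = self.visit(node.child)
        return f"NOT ({sql})", params

    def _combine(self, node, operator):
        if not node.children:
            raise ValueError("boolean list needs at least one child")
        parts = [self.visit(child) for child in node.children]
        sql = f" {operator} ".join(f"({fragment})" for fragment, _ in parts)
        params = [value for _, values in parts for value in values]
        return sql, params

    def visit_allof(self, node):
        return self._combine(node, "AND")

    def visit_anyof(self, node):
        return self._combine(node, "OR")


sql, params = WhereBuilder().visit(AllOf([TagIs(10), Not(TagIs(20))]))
print(sql, params)
```

Keeping the traversal in a visitor leaves the AST classes free of SQL concerns, so the same tree could in principle be consumed by other visitors.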