Skip to content

Commit

Permalink
Version 579
Browse files Browse the repository at this point in the history
  • Loading branch information
hydrusnetwork committed Jun 19, 2024
1 parent 1c70b00 commit 8f20b37
Show file tree
Hide file tree
Showing 55 changed files with 1,684 additions and 1,058 deletions.
82 changes: 42 additions & 40 deletions docs/changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,48 @@ title: Changelog
!!! note
This is the new changelog, only the most recent builds. For all versions, see the [old changelog](old_changelog.html).

## [Version 579](https://github.com/hydrusnetwork/hydrus/releases/tag/v579)

### some url-checking logic

* the 'during URL check, check for neighbour-spam?' checkbox in _file import options_ has some sophisticated new logic. check the issue for a longer explanation, but long story short is if you have two different booru URLs that share the same source URL (with one or both simply being incorrect e.g. both point to the same 'clean' source, even though one is 'messy'), then that bad source URL will no longer cause the second booru import job to get 'already in db'. it now recognises this is an untrustworthy mapping and goes ahead with the download, just as you actually want. once the file is imported, it is still able, as normal, to quickly recognise the true positive 'already in db' result, so I believe have successfully plugged a logical hole here without affecting normal good operation! (issue #1563)
* the 'associate source urls' option in file import options is more careful about the above logic. source urls are now definitely not included in the pre-import file url checks if this option is off

### some regex quality of life

* regex input text boxes have been given a pass. the regex 'help' links are folded into the button, the links are updated to something newer (one of the older ones seems to have died), the button is now put aside the input and labelled `.*`, the menu is a little neater, and the input has placeholder text and now shows green/red (valid/invalid in the stylesheet) depending on whether the current regex text compiles ok. just a nicer widget overall
* this widget is now in filename tagging, the String Match panel regex match, the String Converter panel regex step, and the 'regex favourites' options panel, which I was surprised to learn the existence of
* the regex menu for the String Converter regex step also now shows how to do grouping in python regex. I hadn't experimented with this properly in python, but I learned this past week that this thing can handle `(...) -> \1` group-replace fine and can do named groups with `(?P<name>...) -> \g<name>` too!
* for convenience, when editing a String Match, if you flick from 'any' to 'fixed' or 'regex', it now puts whatever was in your example text beforehand as the new value for the fixed text or regex

### list selecting and scrolling

* I added some new scroll-to tech to my multi-column lists
* pasting a URL into the 'edit URL Classes' dialog's test input now selects and scrolls to the matching URL Class
* the following lists should all have better list sort/select preservation, and will now scroll to and maintain visibility, on various edit/add events: edit url classes, edit gugs, edit parsers, edit shortcut sets, edit shortcut set, the options dialog frame locations, the options dialog media viewer options, manage services, manage account types, manage logins, manage login scripts, edit login script, and some weird legacy stuff. lots more to do in future
* when you 'add from defaults' for many lists, it will now try and scroll to what was just added. may not be perfect!
* same deal with 'import' buttons. it will now try and scroll to what you import!
* I am also moving to 'when you edit, you only edit one row at a time'. in general, when I have written list edit functions, I write them to edit each row of a multi-selection in turn with a new dialog, but: this is not used very much, can be confusing/annoying to the user, and increases code complexity, so I am undoing it. as I continue to work here, if you have a multi-selection, an edit call will increasingly just edit the top selected row. maybe in this case I'll reduce the selection, maybe I'll add some different way to do multi-edit again, let me know what you think

### misc

* import folders now work in a far more efficient way. previously, the client loaded import folders every three minutes to see which were ready to run; now, it loads them once on startup or change and then consults each folder to determine how long to wait until loading it again. it isn't perfect yet, but this ancient, terrible code from back when 100 files was a lot is now far more efficient. users with large import folders may notice less background lag, let me know how you get on. thanks to the users who spotted this--there's doubtless more out there
* to help muscle memory, the 'undo' menu is now disabled when there is nothing for it to hold, not invisible. same deal for the 'pending' menu, although this will still hide if you have no services to pend to (ipfs, hydrus repositories). see how this feels, maybe I'll add options for it
* the new 'is this webp animated?' check is now a little faster
* if your similar file search tree is missing a branch (this can happen after db damage or crash desync during a file import) and a new file import (wanting to add a new leaf) runs into this gap, the database now imports the file successfully and the user gets a popup message telling them to regen their similar files search tree when convenient (rather than raising an error and failing the import)
* added a FAQ question 'I just imported files from my hard drive collection. How can I get their tags from the boorus?', to talk about my feelings on this technical question and to link to the user guide here: https://wiki.hydrus.network/books/hydrus-manual/page/file-look-up
* the default bandwidth rules for a hydrus repository are boosted from 512MB a day to 2GB. my worries about a database syncing 'too fast' for maintenance timers to kick in are less critical these days

### build and cleanup

* since the recent test 'future build' went without any problems, I am folding its library updates into the normal build. Qt (PySide6) goes from 6.6.0 to 6.6.3.1 for Linux and Windows, there's a newer SQLite dll on Windows, and there's a newer mpv dll on Windows
* updated all the requirements.txts to specify to not use the brand new numpy 2.0.0, which it seems just released this week and breaks anything that was compiled to work with 1.x.x. if you tried to set up a new venv in the past few days and got weird numpy errors, please rebuild your venv in v579, it should work again
* thanks to a user, the Docker build's `requests` 'no_proxy' patch is fixed for python &gt;3.10
* cleaned up a ton of `SyntaxWarnings` boot logspam on python &gt;=3.12 due to un-`r`-texted escape sequences like `\s`. thanks to the user who submitted all this, let me know if I missed any
* cleaned up some regex ui code
* cleaned up some garbage in the string panel ui code
* fixed some weird vertical stretch in some single-control dialogs

## [Version 578](https://github.com/hydrusnetwork/hydrus/releases/tag/v578)

### animated webp
Expand Down Expand Up @@ -332,43 +374,3 @@ title: Changelog
* cleaned up how some text and exceptions are split by newlines to handle different sorts of newline, and cleaned up how I fetch the first 'summary' line of text in all cases across the program
* replaced `os.linesep` with `\n` across the program. Qt only wants `\n` anyway, most logging wants `\n` (and sometimes converts on the fly behind the scenes), and this helps KISS otherwise. I might bring back `os.linesep` for sidecars and stuff if it proves a problem, but most text editors and scripting languages are very happy with `\n`, so we'll see
* multi-column lists now show multiline tooltips if the underlying text in the cell was originally multiline (although tbh this is rare)

## [Version 569](https://github.com/hydrusnetwork/hydrus/releases/tag/v569)

### user contributions

* thanks to a user, fixed a problem with the recent URL changes that caused downloaders examining multi-file posts to only grab the first file
* thanks to a user, all the menubar commands that launch a modal dialog are now suffix'd by an ellipsis
* thanks to a user, fixed an issue regarding KDE 6 quitting the program as soon as the pre-boot 'your database is missing a location, let's find it' repair dialog was ok'd
* thanks to a user, the application icon is fixed in KDE Plasma Wayland (and anything else that pulls icon from .desktop file). if you have been using a hydrus.desktop file and don't see a program icon, you should rename it to `/usr/share/applications/io.github.hydrusnetwork.hydrus.desktop` . more importantly, if you manage a package for hydrus--please output to this file path instead of `hydrus.desktop` if you make one
* thanks to a user, updated the `hydrus_client.sh` file to include `"$@"`, which passes parameters given to the .sh file to the .py call

### more on last week's URL work

* fixed the 'show the Request URL under "additional urls" submenu' thing on the file log list menu. I screwed up the logic and was effectively testing for when `1 != 1`
* the converter that generates a Referral URL now operates on the API/redirect conversion principle too--it normalises the Source URL to its 'Request URL' state--keeping defined ephemeral params and filling in defaults but dropping any extra gubbins not asked for--before applying the conversion
* fixed the 'manage url class' dialog to correctly display an example API/redirect-converted URL based on the new _request url_, not the _normalised url_ (so the api/redirect example will now show the new ephemeral params properly). this was working in requests correctly behind the scenes, it was just the example text box in the dialog that was showing wrong
* improved the 'is this query text pre-encoded?' test to check for `%hh`, where `h` is a hexadecimal character, instead of the hackier 'is % in it while not followed by whitespace or end of string?'
* improved/simplified/optimised the overall procedure that figures out if an entered URL is pre-encoded or not. this routine now only runs at the stage where a URL is ingested and it obeys the `%hh` rule. these ingestion points are currently: the text boxes in a urls downloader/simple downloader page; the 'import new sources' function of file log menus; a URL `ContentParser` in the parsing system; the test box in `manage url classes`; and the main gui's 'import url' landing pad, which is used by the drag and drop system, the clipboard watcher, and the client api's 'import url' command. note that this does not occur on 'manage known urls' editing, where you can do what you want with whatever, and I won't coerce it to anything

### misc

* fixed a variety of logical cases around &gt;0, =0, !=0, &lt;0 for the `NumberTest` objects I recently applied to system:duration and elsewhere. when it comes to file searching, files that have 'None' duration are now considered equivalent to files that have an explicit 0 duration in all cases. previously, I was trying to thread a needle where '=0' would find null results but &lt;x would not, and it was a mess. now it all works the same way. if you want to search for 'duration &lt; x' and want to exclude still images, either add a filetype pred or slap on 'has duration'
* improved the stability of the manual file exporter process. it was consulting an object in a thread that it shouldn't have
* improved the ability of the manual file exporter process to report errors on a very large export that encounters errors after the dialog has closed
* fixed the 'remember last used default tag service in manage tag dialogs' and its accompanying dropdown not saving their current value on options dialog ok. sorry for the trouble!
* fixed the system that truncates very long filenames (for export folders and drag and drop exports) on Linux when the exporter is also outputting a sidecar that has a long extra suffix
* the 'find potential duplicate pairs' routine that runs in idle time now properly obeys the work/rest times in `options->maintenance and processing`. previously, it was just the 'run now' routine that was resting in that way, and the idle thing was just doing a hardcoded 'work for 60 seconds every 10 mins or so'. thanks to the reporting user who cleverly noticed this
* the `options->connection` page now mentions your proxy needs to be `http://`

### boring stuff

* updated the windows setup_venv.bat to allow for custom python or venv locations using parameters. this was so I could set up a multi-python testing situation easier
* added some unit tests for the new URL encoding gubbins
* improved un-encoded URL parsing in the downloader when the URL is relative and needs to be joined to the source url
* improved some URL parsing and ingestion to better handle urls with non-ascii characters in the domain
* replaced several 'does it start with "http"?' areas with a better and unified scheme/netloc test
* wrote a routine to split URL paths into path components, and spammed it everywhere so this code is now unified. I expect we'll get a `PathComponent` class at some point, too. there will be a future question about what to do with double slashes, `//` in paths--it turns out the logic has been mixed here, and I think I will probably collapse them to `/` in all cases
* rewrote an unhealthy call that indirectly caused the above multi-file post parsing problem
* fixed some None/0 `NumberTest` stuff if you manage to enter '&lt;0' or &gt;-5 and similar
* I figured out the problems with PyInstaller 6.x and some other stuff, there should be a 'Future Build' alongside this release in github for advanced users to test with
14 changes: 14 additions & 0 deletions docs/faq.md
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,20 @@ Not really. Unless your situation involves millions of richly locally tagged fil

Yes. I am working on updating the database infrastructure to allow a full purge, but the structure is complicated, so it will take some time. If you are afraid of someone stealing your hard drive and matriculating your sordid MLP collection (or, in this case, the historical log of horrors that you rejected), do some research into drive encryption. Hydrus runs fine off an encrypted disk.

## I just imported files from my hard drive collection. How can I get their tags from the boorus?

The problem of 'what tags should these files have?' is technically difficult to solve, and there isn't a fast and easy way to query a booru and say 'hey, what are your tags for this?', particularly _en masse_. It is even more difficult to keep up with updates (e.g. someone adding a tag to a file some months or years after it was uploaded). This is the main problem I designed the PTR to solve.

If you cannot or do not want to devote the local resources to sync with the PTR, there are a few hacky ways to perform tag lookups, mostly with manual hash-based lookups. The big boorus support file search based on 'md5' hash, so there are ways to build a workflow where you can 'search' a booru or iqdb for one file at a time to see if there is a hit, and then get tags as if you were downloading it. An old system in the client called 'file lookup scripts' works like this, in the _manage tags_ dialog, and some users have figured out ways to make it work with some clever downloaders.

Be careful with these systems. They tend to be slow and use a lot of resources serverside, so you will be rude if you hit them too hard. They work for a handful of files every now and then, but please do not set up jobs of many many thousands of files, and absolutely do not repeat the job for the same files regularly--you will just waste a lot of CPU and network time for everyone, and only gain a couple of tags in the process. Note that the hash-based lookups only work if your files have not changed since being downloaded; if you have scaled them, stripped metadata, or optimised quality, then they will count as new files and the hashes will have changed, and you will need to think about services like iqdb or saucenao, or ultimately the hydrus duplicate resolution system.

That said, here is [a user guide on how to perform various kinds of file lookups](https://wiki.hydrus.network/books/hydrus-manual/page/file-look-up).

If you are feeling adventurous, you can also explore the newer [AI-tagging tools](client_api.html#auto-taggers) that users are working on.

Ultimately, though, a good and simple way to backfill your files' tags is just rely on normal downloading workflows. Try downloading your favourite artists (and later set up subscriptions) and you will naturally get files you like, with tags, and if, by (expected) serendipity, a file on the site is the same as one you already imported, hydrus will add the tags to it retroactively.

## Does Hydrus run ok off an encrypted drive partition? { id="encryption" }

Yes! Both the database and your files should be fine on any of the popular software solutions. These programs give your OS a virtual drive that on my end looks and operates like any other. I have yet to encounter one that SQLite has a problem with. Make sure you don't have auto-dismount set--or at least be hawkish that it will never trigger while hydrus is running--or you could damage your database.
Expand Down
Loading

0 comments on commit 8f20b37

Please sign in to comment.