Releases: johanneszab/TumblThree
Download of password protected blogs
- Can download non-hidden, password protected blogs.
- UI changes:
- Added a password textbox in the details tab for supplying a password if it's necessary for accessing the blog.
- Moved the tags column out of the blog manager into the details tab.
- Removed the now redundant 'Check directory for files'-checkbox. Since the downloader is capable of resuming files, it checks for the files' existence anyway.
- Added a 'blog type' column in the blog manager denoting which downloader is used.
- Updated Chinese translation (thanks to @Emphasia).
- New in v1.0.8.22:
- Fixes newly introduced (v1.0.8.21) crash if a tumblr search with more than one keyword/tag was added (#139).
- Updates English text in the user interface and tool tips. More user interface cleanup will follow.
- New in v1.0.8.23:
- Downloads inlined photos in photo posts and inlined videos in video posts. I think every inlined photo/video should be covered now. Previously I excluded photo posts, and only photo posts, from the scan for inlined photos so that the same photo wasn't scanned twice. I also wasn't aware that you can add a photo to a photo post. The same applies to video posts (thanks to anon for pointing this out).
- Adds a separate maximum video connection value in the settings window. Someone has to test whether this helps when downloading mixed video & photo blogs with mostly video content. If the connection value for videos is set too high, TumblThree might not download them completely, since the tumblr video host (vt.tumblr.com) closes all connections if too many are open for too long. Such videos are not counted as downloaded, but one still has to re-queue/download them to eventually finish all downloads. Please see #141 for more.
- New in v1.0.8.24:
- Somewhat "fixes" the timeout. Thus, if you have a wonky connection that frequently gets interrupted, TumblThree shouldn't stall anymore. The timeout value now applies to the whole connection time regardless of its state. E.g. if a large file (video) doesn't finish downloading within 120 seconds (default), the file is truncated unless you increase the value. Since I had to modify the core webrequest/downloader/crawler code for this, please try v1.0.8.22 if this release has any side effects (#116).
Note: Before upgrading, remove the ColumnSettings from the settings.json in C:\Users\YOURUSERNAME\AppData\Local\TumblThree\Settings\ or delete the file entirely, otherwise you'll get a "Could not restore ui settings" error. If you delete the settings.json file, you'll have to redo all your settings afterwards.
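The manual edit described in the note can also be scripted. The following is a minimal sketch, assuming that ColumnSettings is a top-level key in settings.json (the note does not say where it sits in the file) and that the file is UTF-8 encoded. The function name `remove_column_settings` is hypothetical and not part of TumblThree. Back up the file before trying it.

```python
import json
from pathlib import Path


def remove_column_settings(settings_path):
    """Drop the ColumnSettings entry from a settings.json file.

    Assumes ColumnSettings is a top-level key (an assumption, not
    confirmed by the release notes). Returns True if the key was removed.
    """
    path = Path(settings_path)
    # "utf-8-sig" reads files with or without a byte order mark.
    settings = json.loads(path.read_text(encoding="utf-8-sig"))
    if "ColumnSettings" not in settings:
        return False
    del settings["ColumnSettings"]
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return True


# Example call (replace YOURUSERNAME with your Windows user name):
# remove_column_settings(
#     r"C:\Users\YOURUSERNAME\AppData\Local\TumblThree\Settings\settings.json"
# )
```

All other settings in the file are left as they are, so nothing else has to be reconfigured.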
Tumblr search downloader
- A downloader for downloading photos and videos from the tumblr tag search (e.g. http://www.tumblr.com/tagged/my+keywords) (login required). The keywords should be separated by plus signs (+). See #97 for more.
- A downloader for downloading photos and videos from the tumblr search (e.g. http://www.tumblr.com/search/my+keywords). The keywords should be separated by plus signs (+). It only returns around 50-150 posts. See #97 for more.
- Allows downloading blog posts within a defined time span.
- Customized detail views for each downloader's capabilities, depending on the selection in the manager.
- Code refactoring.
- Bugfixes.
Note: After upgrading from previous releases, delete your settings (or just the Queuelist.json) in C:\Users\YOURUSERNAME\AppData\Local\TumblThree\Settings\ if you get the message "Error 1: The queue list could not be loaded".
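The search and tag search URLs mentioned above follow a simple pattern with plus-separated keywords. As a small illustration, the sketch below builds such a URL from a list of keywords. The helper name `tumblr_search_url` is hypothetical, and percent-encoding of special characters inside a single keyword is an assumption that the release notes do not cover.

```python
from urllib.parse import quote


def tumblr_search_url(keywords, tagged=False):
    """Build a tumblr search (or tag search) URL from a list of keywords.

    Keywords are joined with plus signs, as the release notes describe.
    """
    section = "tagged" if tagged else "search"
    # Skip empty entries; quote() percent-encodes unusual characters (assumption).
    joined = "+".join(quote(k.strip()) for k in keywords if k.strip())
    return f"http://www.tumblr.com/{section}/{joined}"


print(tumblr_search_url(["my", "keywords"]))
print(tumblr_search_url(["my", "keywords"], tagged=True))
```

The resulting URL is what would be added to TumblThree as a blog url.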
Bugfixes & Code Refactoring
- Removes user interface lag during blog addition.
- Stop now also stops (and saves the active databases) if the network connection was/is disrupted.
- Uses .NET Framework 4.6 now, as it should be available for all supported Windows versions (Windows Vista and above). If it doesn't work anymore, let me know. I don't use any new features of this version in the code, so we could still stick to .NET version 4.5, but they improved the memory handling (garbage collection) next to some other things. Maybe it's worth it.
- Code Refactoring.
- Updates Chinese translation (thanks to @Emphasia).
- Updates French translation (thanks to @willemijns).
- New in v1.0.8.9:
- Updates Chinese translation.
- Fixes parsing of meta data in hidden blogs.
- Fixes bug introduced in v1.0.8.8 which prevented downloading "liked/by" posts.
- New in v1.0.8.10:
- Improved the selection handling in the details panel. If multiple blogs are selected, old values are now kept if they are the same for all blogs and changes are immediately reflected.
- New in v1.0.8.11:
- Adds audio file download support for tumblr and hidden tumblr blogs.
Allows to download hidden blogs
- Allows downloading hidden tumblr blogs (that require a login to view/dashboard blogs). For this you have to log in to tumblr.com, either using Internet Explorer or within TumblThree under Settings->Authenticate. The same cookie will be used. For non-hidden blogs, however, you don't have to log in. There are two separate downloaders, one for each blog type.
- Finally a proper _raw (original / high resolution) tumblr image file handling. The file dimension size from the crawler is now always tested as the last fallback if no _raw file was found. The defaults are sane now without introducing too much latency or dropping/stalling downloads if the _raw file was not found.
- Fixed duplicate downloading due to _raw file introduction. If the same url was detected multiple times (e.g. double post in the blog) by the crawler and ended up in close proximity in the downloader queue, it might have happened that the same image was downloaded twice by different processes but in a different file size.
For more advanced users:
- The settings.json in C:\Users\Username\AppData\Local\TumblThree\Settings\ contains a modifiable list of hosts named TumblrHosts. These hosts are tested for _raw image files in order. The list now contains only the media.tumblr.com host.
Bugfixes & Performance Improvements
- Improved CPU usage: The CPU usage should stay below a quarter of a core now. Previously the scanning hogged a lot of CPU cycles to prevent adding duplicates to the download queue, which scaled inversely with the blog size.
- Improved memory usage: The file downloader was not properly decoupled from the cancellation mechanism, resulting in increasing memory usage with each download. The collected blog statistics (number of posts, kind of posts, etc.) are now also released from memory early instead of being held until the complete crawl is finished.
- Correct cancellation handling (stopping of the crawler tasks).
- _Raw file support:
- if no _raw file is available, the downloader tries the _1280 file.
- Fixes download speeds dropping after a while, which was introduced with the probing for _raw files and unhandled failures of those probes (#101).
- More stability improvements.
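The fallback order described above (_raw first, then _1280, then the size found by the crawler) can be sketched roughly as follows. This is only an illustration under stated assumptions, not TumblThree's actual code: it assumes the size is encoded as a suffix such as `_500` right before the file extension, it ignores the host switching via TumblrHosts, and the names `candidate_urls` and `first_available` as well as the `exists` callback are hypothetical.

```python
import re

# Assumed naming scheme: "<name>_<size>.<ext>", where size is "raw" or digits.
SIZE_SUFFIX = re.compile(r"_(raw|\d+)(\.\w+)$")


def candidate_urls(url, preferred=("raw", "1280")):
    """Return the URL rewritten for each preferred size, then the original."""
    candidates = []
    for size in preferred:
        rewritten = SIZE_SUFFIX.sub(lambda m: f"_{size}{m.group(2)}", url)
        if rewritten not in candidates:
            candidates.append(rewritten)
    # The size reported by the crawler is the last fallback.
    if url not in candidates:
        candidates.append(url)
    return candidates


def first_available(url, exists):
    """Return the first candidate for which exists(candidate) is true."""
    for candidate in candidate_urls(url):
        if exists(candidate):
            return candidate
    return None


# Placeholder URL, for illustration only.
print(candidate_urls("https://example.invalid/tumblr_abc_500.jpg"))
```

The `exists` callback stands in for whatever probe is used to check a file on the server, for example a HEAD request.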
This release is the continuation of the v1.0.4 branch. If unsure, download this release. If you use the version v1.0.4.31 or anything later, you should really update to this one.
Download high resolution images.
- Downloads high resolution (_raw .jpg | .png | .gif) images. Since this size isn't offered by the api, all image urls are now forcibly rewritten to the size chosen in your settings (Settings->Imagesize).
Note: This might result in a re-download of the same image again, but with a different filename.
Download specific pages
- Sets the "Date modified" date in the Explorer to the post's time. This allows viewing the blog chronologically by sorting by date. E.g. if a picture was posted on June 04, 2013, the date of that picture will be June 04, 2013.
- Allows downloading single blog pages or ranges of pages. Valid formats are comma separated values or ranges. E.g.:
- 1,2,3 downloads the pages 1, 2 and 3.
- 1-10 downloads the pages 1 through 10.
- If nothing is entered, the whole blog is downloaded.
- You can set the posts per page used for crawling to a value between 1 and 50. E.g. setting it to 50 will scan 50 posts per page. If nothing is set, 50 posts per page will be used.
- Clicking the preview opens the preview in a full screen window.
- An option to export all blog urls as a text file in the settings (settings -> general -> Export Blogs). One url per row. This allows a quick transfer of all blogs to a different TumblThree instance by simply opening the generated .txt file, selecting all blogs and copying them into the clipboard (i.e. ctrl-a, ctrl-c).
- Updates Russian translation (thanks @blackgur).
- Updates German translation.
- Applies all settings changes immediately if possible without application restart. Changing the download location during an active download still requires a manual restart.
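The page selection format described above (comma separated values or ranges, empty meaning the whole blog) could be parsed along these lines. This is a minimal sketch and not the parser TumblThree uses. The function name `parse_pages` is hypothetical, and accepting a mix such as `1,3-5` is an assumption beyond what the release notes state.

```python
def parse_pages(spec):
    """Parse a page selection such as '1,2,3' or '1-10'.

    Returns a sorted list of page numbers, or None for an empty
    selection, which stands for "download the whole blog".
    """
    spec = spec.strip()
    if not spec:
        return None
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "1-10" includes both end points.
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("1,2,3"))
print(parse_pages("1-10"))
print(parse_pages(""))
```

Non-numeric input raises a `ValueError` from `int()`, which a caller could turn into a validation message.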
Skip reblogged posts.
- Adds an option to skip reblogged posts and download only original content from the author.
- Improves the download of inlined photos and videos in text posts (e.g. a picture in an answer post).
- Other minor bugfixes (see the last six commits).
Note: You have to set Download reblogged posts for each old dataset. Simply select all blogs (ctrl-a) and mark the checkbox in the Details view.
Code refactoring.
I've changed quite a lot internally. You can check out the last ~75 commits. Most of them were code refactoring and code enhancements. Should be mostly bugfree now.
New features are:
- Resumes incomplete downloads.
- Fixes incomplete video download.
- Downloader now stops immediately when stopping as downloads are resumable.
- Saves application settings now as json instead of xml. So you have to reset everything in the settings.
- The preview doesn't lag anymore and does not stall the application.
- It's now possible to drag&drop blogs from the manager (left) to the queue.
- An option to check the directory for already downloaded files in addition to the internal database (#44).
- An option to download a url list instead of the actual binary files (#42).
- Fixes application crash if a drag&drop was initiated during a cell edit (e.g. tags cell) (#66).
- An application update checker.
- Downloads liked photos and videos, see #74 for more. For downloading those, you have to do some steps:
- Go to Settings and click the Authenticate button. Log in to tumblr using an account. The window/browser should close automatically after the login, indicating a successful authentication. TumblThree will use the Internet Explorer cookies for authentication.
- Add the blog url including the liked/by string in the url (e.g. https://www.tumblr.com/liked/by/wallpaperfx/).
- Allows changing the visibility of the columns in the manager. There is a bug right now where you have to remove and re-add a column to display previously removed columns again.
- Adds a portable mode which stores the application settings next to the executable instead of in the AppData folder.
- Fixes bandwidth throttling. Throttling can also be bypassed completely by setting the value to 0 in the settings.
- Allows setting proxy credentials (ProxyUsername, ProxyPassword) in plaintext in the settings file. Not tested.
Bugfixes:
- Fixes UI stall if many blogs were added using the ClipboardManager (#18).
- Fixes the autodownload function. Previously the stored value in the Settings.xml was used, not the one currently set (#63).
Bugfixes since the first code refactoring (v1.0.4.31) release include:
- Fixes downloading of tagged files.
- Fixes application crash if a blog is added that is empty (#40).
- Fixes possible downloader stall (#75).
- Improves the photo and video detection in the tumblr likedby downloader (#77).
Note: If you have old binary data files (.tumblr) without the separated file list (_files.tumblr) you need to convert the big files into two smaller ones using the v1.0.4.31 release. After that you can use any of the newer releases.
The Tumblr api is now rate limited.
Backup your Index folder in the download location before running this version. It will permanently modify your blog index files (*.tumblr) upon the first run. They contain the already downloaded file information and might end up broken after the upgrade.
- Saves blog databases as .json files (plain text) instead of a binary format. Allows modification in your text editor of choice.
- The url list is now a separate file (_files.tumblr, also saved as json) that is loaded on demand and not permanently held in memory, which reduces memory usage.
- Stores only the filename of tumblr photo, video and audio posts, instead of the whole url. This lowers memory consumption as a large part of the url is not file but host specific. The whole url address was saved to prevent reloading of the same file, but since the host server changes, the filename should be sufficient for this task.
- The picture/video preview lags a bit in the beginning and might display nothing for several seconds but does not freeze the whole application anymore.
- Downloads inline images of all post types (#24).
- The picture preview now displays animated .gifs (#38).
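The idea of tracking downloads by filename rather than by full url, as described above, can be illustrated with a short sketch. It assumes that the filename is simply the last path segment of the url. The names `file_name` and `DownloadedFiles` are hypothetical and the example urls are placeholders.

```python
import posixpath
from urllib.parse import urlsplit


def file_name(url):
    """Return the last path segment of a url, ignoring host and query."""
    return posixpath.basename(urlsplit(url).path)


class DownloadedFiles:
    """Remember downloaded files by filename instead of by full url."""

    def __init__(self):
        self._names = set()

    def add_if_new(self, url):
        """Record the file; return False if its filename was already seen."""
        name = file_name(url)
        if name in self._names:
            return False
        self._names.add(name)
        return True


seen = DownloadedFiles()
# Same file served by two different (placeholder) hosts.
print(seen.add_if_new("https://host-a.example.invalid/abc/tumblr_x_1280.jpg"))
print(seen.add_if_new("https://host-b.example.invalid/abc/tumblr_x_1280.jpg"))
```

Because only the short filename is stored, a changed host does not cause the same file to be downloaded again, and each entry takes less memory than a full url.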
Rate limited Tumblr api:
The initial download process, in which all the image, video and audio urls are searched for, has had to be slowed down since mid-February 2017. The servers now only accept a defined number of connections per time interval. If too many connections are opened, the servers don't respond anymore and just close the connection with a 429 response (Limit exceeded). See #26 for more.
Therefore, this pre-release addresses this new issue by:
- Adding a rate limiter in the settings. The limit is given as a number of connections per time span in seconds and might be increased. I've not tested these two values thoroughly, but they work without hitting the limit. Different solutions as mentioned in #26 are faster (e.g. crawl in small batches and start the download immediately) but require more work to implement properly. Only the initial evaluation period for grabbing the urls and meta information is slowed down. The picture, video and audio download is not impacted.
- It now shows an error if the api limit was reached. You should lower the limit for the api connections in the settings and re-crawl the specific blog, otherwise not all posts will be downloaded.
- Brings back some speed by simultaneously accessing the api and immediately downloading the first grabbed image, video and audio urls. So it does not wait for the "evaluating xxx of xxx post" to finish before starting to download.
- If a blog was successfully downloaded, the newest post id is saved. Upon the next download, only newer posts will be evaluated using the tumblr api, thus finishing the blog more quickly. A full rescan can be forced in the details view.
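A limit of "number of connections per time in seconds", as described above, is commonly implemented as a sliding window limiter. The sketch below shows that general technique in Python. It is not TumblThree's implementation (which is written in C#), it is not thread-safe, and the class name `RateLimiter` and its parameters are hypothetical.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any window of `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        if max_calls < 1 or period <= 0:
            raise ValueError("max_calls must be >= 1 and period > 0")
        self.max_calls = max_calls
        self.period = period
        self._clock = clock  # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls in the current window

    def acquire(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest call in the window expires.
            self._sleep(self.period - (now - self._calls[0]))


# Example: at most 3 api requests per second.
limiter = RateLimiter(max_calls=3, period=1.0)
for i in range(3):
    limiter.acquire()
    print("request", i)
```

Calling `acquire()` before each api request keeps the crawler under the configured limit, while the file downloads themselves would not go through the limiter.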