Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#260 & #247 stop messing up files #262

Merged
merged 4 commits into from
Aug 18, 2018
Merged

#260 & #247 stop messing up files #262

merged 4 commits into from
Aug 18, 2018

Conversation

sinopsysHK
Copy link
Contributor

@sinopsysHK sinopsysHK commented Jul 19, 2018

Using a hash function with lower collision over the whole file name prevents from confusing between a new file and an older one that would have had the same hash.

Current hash was computed upon the first 16 chars of the file name so if ever there is a common prefix (such as "[myfevoriteindexer.com]") the hash is the same.

Now with the full file name being used it reduce the risk of collision which will avoid to retrieve old data from previous run (#247 )

Furthermose having the same guid convention in postprocessing (lower case) it enables to find back a previous processing and avoid seeking again for movie metadata when files are not deleted from download directory (#260)

Added little optimization avoiding blank tmdb queries.

Added a patch to secure Kodi import (movies with no video details info crashing import).

@nosmokingbandit
Copy link
Owner

There is a lot going on here...

See my comments on the file changes.

Copy link
Owner

@nosmokingbandit nosmokingbandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes and I'll merge. If you just want to revert your ajax.py changed regarding Kodi I'll start working on how to handle that in the ui instead. Revert the change at library.py:160 and I'll merge the PR.

@@ -871,6 +871,11 @@ def import_kodi_movies(self, movies):

for movie in movies:

if not movie['imdbid']:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Kodi doesn't provide the imdbid there should be a means for the user to provide this in the webui in the way that Plex and Directory imports do. Simply ignoring a missing imdbid is not a good idea.

@@ -157,7 +158,7 @@ def get_movies(url):
else:
movie['resolution'] = 'DVD-SD'
else:
movie['videocodec'] = movie['resolution'] = ''
movie['videocodec'] = ''
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove this the 'resolution' key will not exist if the metadata cannot be parsed by hachoir. This key should be assigned here so that it is a blank string rather than null in the database.

Copy link
Contributor Author

@sinopsysHK sinopsysHK Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,

I do not avoid to set it but instead initialised it earlier once with default value “Unknown-SD” (line 148) so that either proper value is then found and overwrite default either keeps default.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that. Should it simply 'Unknown' rather than 'Unknown-SD'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“Unknown” is even more accurate but I wanted to respect the pattern and further more Unknown-SD is a used else where so integration was closer to as-is.

core/library.py Outdated
@@ -908,7 +909,7 @@ def searchresults(guid, status, movie_info=None):

search_result = searchresults.score([search_result], imported=True)[0]

required_keys = ('score', 'size', 'status', 'pubdate', 'title', 'imdbid', 'indexer', 'date_found', 'info_link', 'guid', 'torrentfile', 'resoluion', 'type', 'downloadid', 'freeleech')
required_keys = ('score', 'size', 'status', 'pubdate', 'title', 'imdbid', 'indexer', 'date_found', 'info_link', 'guid', 'torrentfile', 'resolution', 'type', 'downloadid', 'freeleech')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you go looking for typos you'll quickly be overwhelmed. Too bad IDEs don't have good spell checkers.

@@ -116,7 +116,7 @@ def scan_directory():
if r:
logging.info('Found match for {} in releases: {}.'.format(fname, r['title']))
else:
r['guid'] = 'POSTPROCESSING{}'.format(b16encode(fname.encode('ascii', errors='ignore')).decode('utf-8').zfill(16)[:16])
r['guid'] = 'postprocessing{}'.format(hashlib.md5(fname.encode('ascii', errors='ignore')).hexdigest())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed b16 would be faster but after a quick test it looks like md5 is slightly faster. Both call C functions for the actual work but I suppose the md5 algorithm is so optimized at this point in time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the focus on performance but there is also a need to be correct: very frustrating to play a movie that is not actually what its filename tells it is.
Furthermore as mover/renamer overwrite existing files without warning it can mess the current library.

As b16encode do not apply any rotate/swap of bytes then it is not a good hash function.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is simply to have a unique string in the database as a fake guid. Using md5 or b16 doesn't make any difference in how the string is then used since an identical directory or filename will result in an identical guid. There is no real need to obscure the data at all, I simply wanted to have a guid-like representation.

Changing this to md5 will have no affect on anything other than performance. Which is actually slightly better than b16 for whatever reason.

@@ -871,6 +871,11 @@ def import_kodi_movies(self, movies):

for movie in movies:

if not movie['imdbid']:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Kodi doesn't provide an imdbid we should give the user a chance to supply it in the webui similar to how the Plex and Directory import pages handle missing data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand but do changing behaviour in ui prevents the need to control there is actually something to query tmdb for ?
You might have personal videos that do not have IMDb Id: it is not a missing metadata from Kodi search.

But it indeed it would be good to have mean to hand over the process and found missing ones.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I supposed I assumed that users wouldn't import their home movies into Watcher. This is ok.

@@ -157,7 +158,7 @@ def get_movies(url):
else:
movie['resolution'] = 'DVD-SD'
else:
movie['videocodec'] = movie['resolution'] = ''
movie['videocodec'] = ''
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolution key needs to be assigned somewhere or it will be null in the database (None in python) and will raise exceptions any time it is compared to other strings.

core/library.py Outdated
@@ -908,7 +909,7 @@ def searchresults(guid, status, movie_info=None):

search_result = searchresults.score([search_result], imported=True)[0]

required_keys = ('score', 'size', 'status', 'pubdate', 'title', 'imdbid', 'indexer', 'date_found', 'info_link', 'guid', 'torrentfile', 'resoluion', 'type', 'downloadid', 'freeleech')
required_keys = ('score', 'size', 'status', 'pubdate', 'title', 'imdbid', 'indexer', 'date_found', 'info_link', 'guid', 'torrentfile', 'resolution', 'type', 'downloadid', 'freeleech')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't go looking for typos, you'll quickly be overwhelmed. Too bad IDEs don't have good spell checkers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure typo may not be the priority but in this case it has an impact on what is stored in the DB and thus loose info that is never recovered later on

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying it doesn't need to be fixed, only that finding all of them would take a full day.

And its lose.

@sinopsysHK
Copy link
Contributor Author

To wrap up all the reviews, except if I miss understood, I feel like you agreed with all changes including Ajax.py to support specific Kodi use-case where library can contain personal video with no IMDb Id and library.py that keeps resolution attribute set in any case.
Are you still expecting some back tracking ?

@nosmokingbandit
Copy link
Owner

I think the only thing I want changed before I merge is switching Unknown-SD' to Unknown`. I don't want to indicate something is SD if we really don't know. Some users have bandwidth caps and I don't want them to accidentally download a massive 4K release because of a parsing error that caused it to default to SD. It won't change how anything actually works, but makes it a bit more clear to the user that Unknown means that we have no idea at all and they are assuming all risk associated with that.

@sinopsysHK
Copy link
Contributor Author

Renamed Unknown-SD to Unknown

Using a hash function with lower collision over the whole file name prevents from confusing between a new file and an older one that would have had the same hash.

Current hash was computed upon the first 16 chars of the file name so if ever there is a common prefix (such as "[myfevoriteindexer.com]")  the hash  is the same.

Now with the full file name being used it reduce the risk of collision which will avoid to retrieve old data from previous run (#247)

Furthermose having the same guid convention in postprocessing (lower case) it enables to find back a previous processing and avoid seeking again for movie metadata when files are not deleted from download directory (#260)
Some movie in Kodi library do not have the imdb id (empty string in json).
Before querying tmdb, ought to check if imdb id is available otherwise it will anyway fail
Kodi import is crashing if ever there are movies without video details.
If resolution got empty string it crashes Kodi library import because this is not a know value in source_select list.
Creating explicit Unknown resolution/source, enable to manage properly this use case, allowing to flow seamlessly.

Propagate new Unknown in Quality settings

Add management of Unknown quality in quality settings to prevent page crash
On Synology python 3.5.1 json lib do not support multi-type input but only string.
Response.content is the raw bytes server output which is then not compliant.
@nosmokingbandit nosmokingbandit merged commit 827dfc0 into nosmokingbandit:master Aug 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants