Fix Raw Image Handling and Improve Text File Encoding Compatibility #233
- Fix RAW images not being loaded correctly in the preview panel
- Fix attempts to read size data from null images
- Refactor `os.stat` to `<Path object>.stat()`
- Remove unnecessary upper/lower case conversions
- Improve encoding compatibility beyond UTF-8 when reading text files
- Code cleanup
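For the refactor from `os.stat` to `<Path object>.stat()`: `Path.stat()` returns the same `os.stat_result` that `os.stat(path)` does, so only the call site changes. A minimal sketch; `file_size_and_mtime` is a hypothetical helper, not a function from this PR.

```python
from pathlib import Path


def file_size_and_mtime(filepath: Path) -> tuple[int, float]:
    """Hypothetical helper: size in bytes and last-modified time of a file."""
    # Before the refactor: stats = os.stat(filepath)
    stats = filepath.stat()  # same os.stat_result, called on the Path object
    return stats.st_size, stats.st_mtime
```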
tagstudio/src/core/utils/encoding.py
    for encoding in ENCODINGS:
        with open(filepath, "r", encoding=encoding) as text_file:
I don't think opening the file up to 5 times is a good approach, even though UTF-8 is probably the right choice in the majority of cases.
I'd probably use an existing library like chardet.
If you want to do it without introducing another dependency, opening the file once in binary mode and then trying to decode the content might also work (code not tested):
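def detect_encoding(file_path):  # hypothetical helper name
    with open(file_path, 'rb') as text_file:  # `rb` opens the file in binary mode
        raw_data = text_file.read(1024)  # Read the first 1024 bytes
    for encoding in ENCODINGS:
        try:
            decoded_data = raw_data.decode(encoding, errors='replace')
            if '�' not in decoded_data:
                return encoding
        except LookupError:  # ENCODINGS contains a codec name Python doesn't know
            continue
    return None  # no candidate decoded cleanly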
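One caveat with checking for U+FFFD after a lenient decode: a 1024-byte sample can end partway through a multibyte UTF-8 character, and that truncated tail would count as a failure. A variant using a strict incremental decoder from the standard `codecs` module avoids this, since `final=False` buffers an incomplete trailing sequence instead of raising. A minimal sketch; the `ENCODINGS` tuple below is hypothetical (the real list in `encoding.py` isn't shown here) and `detect_encoding` is an illustrative name.

```python
import codecs
from pathlib import Path
from typing import Optional

# Hypothetical candidate list; the real ENCODINGS in encoding.py may differ.
# latin-1 maps every byte to a character, so it must come last as a catch-all.
ENCODINGS = ("utf-8", "windows-1252", "latin-1")


def detect_encoding(file_path: Path, sample_size: int = 1024) -> Optional[str]:
    """Return the first encoding in ENCODINGS that strictly decodes a sample."""
    with open(file_path, "rb") as text_file:  # open once, in binary mode
        raw_data = text_file.read(sample_size)
    for encoding in ENCODINGS:
        decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
        try:
            # final=False keeps a multibyte character cut off at the end of
            # the sample buffered instead of treating it as an error.
            decoder.decode(raw_data, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    return None
```

Ordering matters here: with `latin-1` in the list, `return None` is never reached. The same concern applies to `utf-16` without a BOM, which accepts almost any even-length input and would shadow every candidate after it.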
And then it could be worth saving the file encoding somewhere in the library, so it doesn't need to be detected every time 🤔
I was on the fence about adding another dependency, but I think I needed to hear it from someone else to realize that it's the better approach. I'll go and get a chardet implementation going 👍
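A minimal sketch of the chardet route, assuming the third-party `chardet` package: `chardet.detect()` takes bytes and returns a dict with `encoding`, `confidence` and `language` keys, where `encoding` may be `None`. The helper name `guess_encoding` and the UTF-8 fallback are hypothetical choices, not necessarily what this PR ended up merging.

```python
from pathlib import Path

try:
    import chardet  # third-party dependency: pip install chardet
except ImportError:  # let the sketch run without chardet installed
    chardet = None


def guess_encoding(file_path: Path, sample_size: int = 1024) -> str:
    """Hypothetical helper: best-guess encoding of a text file."""
    with open(file_path, "rb") as text_file:  # read once, in binary mode
        raw_data = text_file.read(sample_size)
    if chardet is not None:
        result = chardet.detect(raw_data)  # {'encoding', 'confidence', 'language'}
        if result["encoding"]:
            return result["encoding"]
    return "utf-8"  # assumed fallback when chardet is missing or undecided
```

On short samples chardet's guess can be low-confidence; comparing `result["confidence"]` against a threshold before trusting it is one option.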
Also, saving the encoding somewhere would definitely be the way to go. Probably out of the scope of this PR, but that would be very useful to have on hand along with cached file stats.
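One way such a cache could look, as a hypothetical in-memory sketch: the detected encoding is stored per resolved path together with the file's size and `st_mtime_ns`, and detection reruns only when either value changes. `EncodingCache` and the injected `detect` callable are illustrative names; persisting the result in the TagStudio library is not shown.

```python
from pathlib import Path
from typing import Callable


class EncodingCache:
    """Hypothetical cache of detected encodings, invalidated on file change."""

    def __init__(self, detect: Callable[[Path], str]):
        self._detect = detect  # e.g. a chardet- or codecs-based detector
        self._entries: dict[Path, tuple[int, int, str]] = {}

    def get(self, filepath: Path) -> str:
        stats = filepath.stat()
        key = filepath.resolve()
        cached = self._entries.get(key)
        # Reuse the cached encoding only if size and mtime are unchanged.
        if cached and cached[0] == stats.st_size and cached[1] == stats.st_mtime_ns:
            return cached[2]
        encoding = self._detect(filepath)
        self._entries[key] = (stats.st_size, stats.st_mtime_ns, encoding)
        return encoding
```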
Some small bugfixes and improvements involving thumbnail and preview rendering:
- Refactor `os.stat` to `<Path object>.stat()`