Fix Raw Image Handling and Improve Text File Encoding Compatibility #233

CyanVoxel · 2024-06-01T02:30:14Z

Some small bugfixes and improvements involving thumbnail and preview rendering:

- Fix RAW images not being loaded correctly in the preview panel
- Fix trying to read size data from null images
- Refactor `os.stat` to `<Path object>.stat()`
- Remove unnecessary upper/lower conversions
- Improve encoding compatibility beyond UTF-8 when reading text files
- Misc. code cleanup
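
For readers unfamiliar with the `os.stat` to `Path.stat()` refactor, here is a minimal sketch of the pattern. The helper name `file_size_bytes` is hypothetical and is not taken from the PR's diff.

```python
import os
from pathlib import Path


def file_size_bytes(path: Path) -> int:
    """Return the size of a file using Path.stat() instead of os.stat(path)."""
    # Path.stat() returns the same os.stat_result object that os.stat(path) would.
    return path.stat().st_size
```

Both calls return the same `os.stat_result`, so the change is about keeping the code on `Path` objects rather than about behavior.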

yedpodtrzitko · 2024-06-01T03:48:49Z

tagstudio/src/core/utils/encoding.py

+    for encoding in ENCODINGS:
+        with open(filepath, "r", encoding=encoding) as text_file:


I don't think opening the file (up to) 5 times is a good approach, even though utf-8 might be the right choice in the majority of cases.

I'd probably use an existing library like chardet.

If you want to do it without introducing another dependency, opening the file once in binary mode and then trying to decode the content might also work (code not tested):

    with open(file_path, 'rb') as text_file:  # `rb` opens the file in binary mode
        raw_data = text_file.read(1024)  # Read the first 1024 bytes
    for encoding in ENCODINGS:
        try:
            decoded_data = raw_data.decode(encoding, errors='replace')
            if '�' not in decoded_data:
                return encoding
        except Exception:
            continue
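
A self-contained variant of the single-read idea might look like the sketch below. It is only an illustration: the function name `guess_encoding` and the candidate list are hypothetical (the PR's actual `ENCODINGS` values are not shown here), and it swaps the `errors='replace'` check for strict incremental decoding so that a multibyte character cut off at the sample boundary does not cause a false rejection.

```python
import codecs
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical candidate list; the real ENCODINGS tuple from the PR is not shown.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def guess_encoding(
    path: Path,
    candidates: Sequence[str] = CANDIDATE_ENCODINGS,
    sample_size: int = 1024,
) -> Optional[str]:
    """Return the first candidate that decodes the file's leading bytes, else None."""
    with open(path, "rb") as f:  # the file is opened and read only once
        sample = f.read(sample_size)
    for encoding in candidates:
        decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
        try:
            # final=False tolerates an incomplete multibyte sequence at the end
            # of the sample instead of treating it as a decode error.
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    return None
```

Order matters in this approach: a single-byte encoding such as `latin-1` accepts any byte sequence, so it only makes sense as the last candidate.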

It could then be worth saving the file encoding somewhere in the library, so the detection doesn't need to be done every time 🤔

CyanVoxel · 2024-06-01

I was on the fence about adding another dependency, but I think I needed to hear it from someone else to realize that it's the better approach. I'll go and get a chardet implementation going 👍
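
For context, a chardet-based read could be sketched roughly as follows. This is not the code from the follow-up commit: the function names, the `sample_size` default, and the UTF-8 fallback are assumptions made for illustration, and the `ImportError` branch exists only so the sketch runs without chardet installed.

```python
from pathlib import Path


def detect_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of raw bytes with chardet, falling back to `default`."""
    try:
        import chardet  # third-party dependency: pip install chardet
    except ImportError:
        return default  # assumption for this sketch only
    guess = chardet.detect(raw)  # dict containing "encoding" and "confidence"
    # "encoding" is None when chardet cannot make a guess (e.g. empty input).
    return guess["encoding"] or default


def read_text(path: Path, sample_size: int = 4096) -> str:
    """Read a file once and decode it using an encoding guessed from a sample."""
    raw = path.read_bytes()
    encoding = detect_encoding(raw[:sample_size])
    # errors="replace" keeps a wrong guess from raising while previewing text.
    return raw.decode(encoding, errors="replace")
```

Detecting on a leading sample keeps the cost bounded for large files, at the price of a less reliable guess when the distinguishing bytes appear later in the file.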

Also, saving the encoding somewhere would definitely be the way to go. Probably out of the scope of this PR, but that would be very useful to have on hand along with cached file stats.
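
As a rough illustration of what caching the encoding alongside file stats could mean, here is a minimal in-memory sketch. Everything in it is hypothetical (the cache layout, the `cached_encoding` name, and the use of modification time plus size as a change fingerprint); nothing here reflects how the library actually stores data.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical cache: path -> ((mtime_ns, size), detected encoding).
_ENCODING_CACHE: Dict[Path, Tuple[Tuple[int, int], str]] = {}


def cached_encoding(path: Path, detect: Callable[[Path], str]) -> str:
    """Return the cached encoding for `path`, re-detecting only if the file changed."""
    stat = path.stat()
    fingerprint = (stat.st_mtime_ns, stat.st_size)
    entry = _ENCODING_CACHE.get(path)
    if entry is not None and entry[0] == fingerprint:
        return entry[1]  # file looks unchanged, reuse the stored encoding
    encoding = detect(path)  # file is new or modified, run detection again
    _ENCODING_CACHE[path] = (fingerprint, encoding)
    return encoding
```

Passing the detector in as a callable keeps the cache independent of whether chardet or some other method does the actual detection.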

CyanVoxel added the labels "Type: Bug" (Something isn't working as intended) and "Type: Refactor" (Code that needs to be restructured or cleaned up) Jun 1, 2024

CyanVoxel added this to the Alpha 9.3 milestone Jun 1, 2024

yedpodtrzitko reviewed Jun 1, 2024

Use chardet for character encoding detection

c83dd78

CyanVoxel merged commit 0646508 into main Jun 3, 2024
6 checks passed
