File name too long UTF-8 long titles #231

@grea09

Ehlo,
I wanted to share an error I frequently encounter with soundscrape. It happens mostly with Japanese songs, because their titles take up many bytes once encoded as UTF-8. I tried a fix in the code, but it isn't very straightforward.

Here is a link that makes soundscrape fail: nice japanese track

And here are the logs:

Downloading: ゴレマハガト楽団 feat.ふえ吹き野うさぎ - あにまる☆マーチ(ああ…翡翠茶漬け… Bootleg Remix)
Problem downloading ゴレマハガト楽団 feat.ふえ吹き野うさぎ - あにまる☆マーチ(ああ…翡翠茶漬け… Bootleg Remix)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/soundscrape/soundscrape.py", line 443, in download_tracks
    filename = download_file(location, track_filename)
  File "/usr/local/lib/python3.6/dist-packages/soundscrape/soundscrape.py", line 1190, in download_file
    with open(tmp_path, 'wb') as f:
OSError: [Errno 36] File name too long: '翡乃イスカ(Hino Isuka)/翡乃イスカ(Hino Isuka) - ゴレマハガト楽団 feat.ふえ吹き野うさぎ - あにまる☆マーチ(ああ…翡翠茶漬け… Bootleg Remix).mp3.tmp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/soundscrape", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/soundscrape/soundscrape.py", line 119, in main
    process_soundcloud(vargs)
  File "/usr/local/lib/python3.6/dist-packages/soundscrape/soundscrape.py", line 292, in process_soundcloud
    id3_extras=id3_extras)
  File "/usr/local/lib/python3.6/dist-packages/soundscrape/soundscrape.py", line 460, in download_tracks
    puts_safe(e)
  File "/usr/local/lib/python3.6/dist-packages/soundscrape/soundscrape.py", line 1315, in puts_safe
    puts(text)
  File "/home/antoine/.local/lib/python3.6/site-packages/clint/textui/core.py", line 57, in puts
    s = tsplit(s, NEWLINES)
  File "/home/antoine/.local/lib/python3.6/site-packages/clint/utils.py", line 69, in tsplit
    string = string.replace(i, final_delimiter)
AttributeError: 'OSError' object has no attribute 'replace'

The fix that worked for me was to measure the length of the file name in raw UTF-8 bytes rather than in characters, cut the byte string at a position that does not split a multi-byte character, and then convert the result back to text. A separate issue arises when working in an already deep path on the system, especially if that path itself contains UTF-8 characters.
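
To make the idea concrete, here is a rough sketch of the truncation step. It is not soundscrape's actual code: `truncate_utf8` and `safe_filename` are hypothetical helper names, and the 255-byte limit is an assumption (a common per-component limit on Linux, but some filesystems allow less).

```python
def truncate_utf8(text, max_bytes):
    """Cut text so its UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may leave a partial multi-byte sequence at the end;
    # errors="ignore" drops that incomplete trailing character.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def safe_filename(stem, suffix=".mp3.tmp", max_bytes=255):
    """Build stem + suffix, shortening only the stem so the suffix survives.

    max_bytes=255 is an assumed limit, not a value read from the filesystem.
    """
    budget = max_bytes - len(suffix.encode("utf-8"))
    if budget <= 0:
        raise ValueError("suffix leaves no room for a file name")
    return truncate_utf8(stem, budget) + suffix


if __name__ == "__main__":
    long_title = "あにまる☆マーチ" * 40  # far more than 255 bytes in UTF-8
    name = safe_filename(long_title)
    print(len(name.encode("utf-8")), name.endswith(".mp3.tmp"))
```

The suffix is reserved up front because the download first writes to a `.mp3.tmp` name, which is the longest variant of the file name. On Unix, `os.pathconf(directory, "PC_NAME_MAX")` can report the real limit for a given directory instead of assuming 255. This only handles a single path component; a deep directory path would need a separate check on the total path length.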

I hope that helps with fixing it. Unfortunately, I don't have the time to rewrite and test that fix for a PR.

Thanks for the awesome tool !!!
