Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images with "international" filenames are not extracted on Markdown import #2334

Closed
henrikje opened this issue Jan 21, 2020 · 6 comments
Closed
Labels
bug It's a bug

Comments

@henrikje
Copy link

I just tried the new import-images-in-markdown feature (#2262) and it mostly worked as I hoped! (Thanks!) However, while some images were imported and their targets were replaced, other image tags were left unprocessed. Those were links with (in my case) Swedish characters such as å, ä and ö.

Example of a link that was successfully imported:

![Some text that may contain åäö](photos/7A673A17-142E-4BEF-A0F6-71DA5C510416.jpeg)

Example of a link that was NOT imported:

![Some text](photos/image-with-åäö.jpg)

Environment

Joplin version: 1.0.178
Platform: macOS
OS specifcs: 10.14.6

@henrikje henrikje added the bug It's a bug label Jan 21, 2020
@laurent22
Copy link
Owner

I'm not sure this is actually a bug because the URL should be url-encoded and I don't think it is in your example.

If you write it as ![Some text](photos/image-with-%C3%A5%C3%A4%C3%B6.jpg) does it work?

@BartBucknill, what do you think about it?

@henrikbarium
Copy link

Reading the CommonMark specification (if that is what Joplin is adhering to), it says images have the same syntax as links, and for links it says:

A link destination consists of either

  • a sequence of zero or more characters between an opening < and a closing > that contains no line breaks or unescaped < or > characters, or

  • a nonempty sequence of characters that does not start with <, does not include ASCII space or control characters, and includes parentheses only if (a) they are backslash-escaped or (b) they are part of a balanced pair of unescaped parentheses. (Implementations may impose limits on parentheses nesting to avoid performance issues, but at least three levels of nesting should be supported.)

Personally I don't find it completely clarifies the situation, but it doesn't explicitly require url encoding at least. 🙃

@BartBucknill
Copy link
Contributor

@laurent22 @henrikje Made a fix #2346
This bug occurs because the results of extractImageUrls come back URL encoded; I just added a decode of the parsed link before using.

@BartBucknill
Copy link
Contributor

Oh dear, looks like the fix works on mac but not linux, so the pipeline isn't passing https://travis-ci.org/laurent22/joplin/jobs/640608501?utm_medium=notification&utm_source=github_status

Won't be able to look into this right now, it'll have to wait a bit longer.
Anyone have any tips on fixing this for linux also?

@BartBucknill
Copy link
Contributor

@laurent22 Based on reading a few Q/A on stackoveflow (such as this one), I think it's possible that the test I added when making a fix for this issue is failing in the linux pipeline because of an encoding applied to the filename when it was saved on my macOS machine. In other words, had the file been saved on linux, it might be fine - although I haven't had time to validate this on a linux machine.
If correct then this fix may be good enough, although still not sure what to do with the failing test given that it has to run on both OSs.

@laurent22
Copy link
Owner

Yes that's reasonable, it's often hard to get filenames to work well cross-platform, so we could disable the test for Linux for instance.

What I'm more concerned about is that extractImageUrls() gives encoded URLs, while I think they should be decoded. Internally, any kind of data should be in decoded form (unless explicitly noted), and it's on app boundaries that we should encode/decode. So not sure, maybe the bug to be fixed is in extractImageUrls() actually.

@lock lock bot locked and limited conversation to collaborators Apr 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug It's a bug
Projects
None yet
Development

No branches or pull requests

4 participants