-
-
Notifications
You must be signed in to change notification settings - Fork 232
add unicode regex for paths and fix path encoding #1420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add unicode regex for paths and fix path encoding #1420
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1420 +/- ##
=======================================
Coverage 74.60% 74.60%
=======================================
Files 43 43
Lines 7828 7829 +1
Branches 1992 1992
=======================================
+ Hits 5840 5841 +1
- Misses 1416 1417 +1
+ Partials 572 571 -1
Continue to review full report at Codecov.
|
cwltool/command_line_tool.py
Outdated
ACCEPTLIST_EN_STRICT_RE = re.compile(r"^[a-zA-Z0-9._+-]+$") | ||
ACCEPTLIST_EN_RELAXED_RE = re.compile(r".*") # Accept anything |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACCEPTLIST_EN_STRICT_RE = re.compile(r"^[a-zA-Z0-9._+-]+$") | |
ACCEPTLIST_EN_RELAXED_RE = re.compile(r".*") # Accept anything | |
ACCEPTLIST_EN_STRICT_RE = re.compile(r"^[\w0-9._+\- \u2600-\u26FF]+$") # accept unicode word characters and emojis | |
ACCEPTLIST_EN_RELAXED_RE = re.compile(r".*") # Accept anything |
Better to fix this here, I think?
Also, use a r""
string to fix the following flake8 complaints:
cwltool/command_line_tool.py:89:31: W605 invalid escape sequence '\w'
cwltool/command_line_tool.py:89:39: W605 invalid escape sequence '\-'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it like that at first, but thought the EN
implied english and this is decidedly not english 😄
I tried tox but that exits on the first failure (Maybe because of --strict for pytest, but I tried removing that with no luck). test_udocker_should_display_memory_usage
fails because psutil removed travis support in version 5.8.0 and I can't install the pinned version on my system. I pinned that version of psutil, as that might get it to work on others' systems.
Removing the TRAVIS
tests and running with tox works normally, as does invoking pytest
manually, but there's many failures like e.g.
_________________________________________________ test_dont_require_inputs __________________________________________________
def test_dont_require_inputs():
if sys.version_info[0] < 3:
stream = BytesIO()
else:
stream = StringIO()
script = None
try:
> script = NamedTemporaryFile(mode="w", delete=False)
../../tests/test_toolargparse.py:134:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/zenon/.pyenv/versions/3.6.12/lib/python3.6/tempfile.py:551: in NamedTemporaryFile
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dir = '/home/zenon/DEV/ETH/cwltool/tmp/test_cid_file_dir___parallel__0/tmpaip03ivw', pre = 'tmp', suf = '', flags = 131266
output_type = <class 'str'>
def _mkstemp_inner(dir, pre, suf, flags, output_type):
"""Code common to mkstemp, TemporaryFile, and NamedTemporaryFile."""
names = _get_candidate_names()
if output_type is bytes:
names = map(_os.fsencode, names)
for seq in range(TMP_MAX):
name = next(names)
file = _os.path.join(dir, pre + name + suf)
try:
> fd = _os.open(file, flags, 0o600)
E FileNotFoundError: [Errno 2] No such file or directory: '/home/zenon/DEV/ETH/cwltool/tmp/test_cid_file_dir___parallel__0/tmpaip03ivw/tmpkssk2xd5'
/home/zenon/.pyenv/versions/3.6.12/lib/python3.6/tempfile.py:262: FileNotFoundError
but then
$ pytest -k test_dont_require_inputs
==================================================== test session starts ====================================================
platform linux -- Python 3.6.12, pytest-4.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /home/zenon/DEV/ETH/cwltool, inifile: setup.cfg
plugins: mock-3.5.1, cov-2.11.1
collected 392 items / 391 deselected / 1 selected
tests/test_toolargparse.py . [100%]
========================================= 1 passed, 391 deselected in 2.62 seconds ==============================
So I'm not sure which failures are real failures.
Thank you @Panaetius ! For local testing, try I'll investigate some of the errors from my side as well |
There is a test that wants filenames with spaces to be rejected, but this PR "fixes" that. This is because CWL implementations are allowed to reject filenames with spaces:
|
Oh I see, I was not aware of that. Should I remove that from the regex? |
Yes please. Users who still want to use filenames with spaces with |
Ok, I'll remove that tomorrow. Also this PR is still missing test(s) but I wasn't sure where to add them. Is there maybe an existing test I could add more cases to? |
Thanks!
I think |
bf09c24
to
1e91134
Compare
…to add_unicode_support
1e91134
to
8485091
Compare
I added some tests. I also changed the regex to remove the space and I removed I also had to change the locale to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run make format
?
done |
Woohoo! Thank you very much @Panaetius ! |
Thank you for the help and merging it so fast 😄 |
This should support filenames with unicode words and emojis in them. I adjusted the regex to use
\w
word characters, which matches most unicode symbols (I tested it locally with chinese, korean, arabic and cuneiform) that can appear in words. I also added the emoji range (one of our users actually had an emoji in a filename which lead to us having an issue in the first place). Also spaces are allowed in filenames, so i added that as well.I had to change how
path
is stored, as the implementation url-encoded the filepath (e.g.file:///tmp/myfile
) and then removed thefile://
prefix, but without unencoding special characters, so you'd get a path like/tmp/%E2%98%95coffee.txt
instead of/tmp/☕coffee.txt
, which doesn't exist on disk.I had to change some dependencies to get tests running, but quite a few of them still fail when running the whole suite, with errors like
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/DEV/cwltool/tmp/test_cid_file_dir___parallel__0/tmpz2stpzft/tmpgpo291r4'
, but when I tried running a sample of the failing ones individually, they passed. So there might be some concurrency/fixture issue in the tests? I'm not sure how I could run the whole suite in a way that works to see if there's any that actually fail.Closes #110