
copy files to avoid concurrency issues in mris_expand #1938

Closed
wants to merge 2 commits

Conversation

chrisgorgo
Member

No description provided.

@chrisgorgo requested a review from effigies April 5, 2017 03:21
@codecov-io

codecov-io commented Apr 5, 2017

Codecov Report

Merging #1938 into master will increase coverage by <.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1938      +/-   ##
==========================================
+ Coverage   72.47%   72.47%   +<.01%     
==========================================
  Files        1063     1063              
  Lines       54148    54148              
  Branches     7811     7811              
==========================================
+ Hits        39244    39246       +2     
+ Misses      13684    13682       -2     
  Partials     1220     1220
Flag Coverage Δ
#smoketests 72.47% <ø> (ø) ⬆️
#unittests 70.02% <ø> (ø) ⬆️
Impacted Files Coverage Δ
nipype/interfaces/freesurfer/utils.py 62.31% <ø> (ø) ⬆️
...nterfaces/freesurfer/tests/test_auto_MRIsExpand.py 85.71% <ø> (ø) ⬆️
nipype/interfaces/base.py 84.38% <0%> (+0.18%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4176856...fe3998a. Read the comment docs.

@effigies
Member

effigies commented Apr 5, 2017

This makes no sense to me. Why should one flow of a mapnode be able to affect the other side, if symlinked? They shouldn't even be reading the same files, so even if they did for some reason modify them, it shouldn't produce a race condition. Further, the specific issue seems to be the link disappearing between os.islink and os.readlink, which suggests to me that either you had two instances of the workflow running or Node's cleanup mechanism is hopping directories.

@satra Would appreciate your eyes on this. The error is reported here.
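To make the suspected window concrete, here's a minimal sketch (not nipype's actual code) of the check-then-use pattern the traceback points at; if another process removes or rewrites the link between the two calls, the second one fails:

import os

def resolve_link(path):
    # check: is this path currently a symlink?
    if os.path.islink(path):
        # <-- if the link vanishes here, the next call raises OSError
        return os.readlink(path)
    return path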

@satra
Member

satra commented Apr 5, 2017

@effigies - will do after lunch today, but your assessment seems correct.

@chrisfilo - this seems quite familiar to the previous error where multiple workflows were running on the same working directory tree in parallel.

@chrisgorgo
Member Author

I was running only one instance of fmriprep with multiproc set to 6 and only one subject. I'm not sure how two workflows could be running at the same time in that scenario. Furthermore, this is how I have been testing fmriprep for months and I have never run into this before.

It could have something to do with the fact that both the work and output directories are mounted from the host (Windows with Hyper-V), but then again this was not a problem before (and I have run a lot of workflows). However, if this is the issue, we should play it safe and copy files.

@effigies
Member

effigies commented Apr 5, 2017

If it's mounted from a Windows host, then I'd think symlinks shouldn't work correctly. Maybe it'd be better to detect this case (if we can) in filemanip.copyfile and use copying there? It already falls back to copy when symlinks are unavailable. Because otherwise, it seems like any copyfile=False interface could be susceptible to this bug, not just MRIsExpand.

Can you try something like this from inside the Docker image, in a directory that's mounted from the host:

echo teststring > file_a
python - <<END
import os
os.symlink("file_a", "file_b")
link_val = os.readlink("file_b")
contents_a = open("file_a", 'r').read()
contents_b = open("file_b", 'r').read()
print("""Link: file_b -> {}
file_a: {}
file_b: {}
""".format(link_val, contents_a, contents_b))
END

And see what file_a and file_b look like from Windows?

Also, what does mount | grep scratch look like in Docker?
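For reference, the fallback idea above could look roughly like this; this is only a sketch, not the real nipype.utils.filemanip.copyfile:

import os
import shutil

def link_or_copy(src, dst):
    # try to symlink first; fall back to a physical copy if linking fails
    try:
        if os.path.lexists(dst):
            os.unlink(dst)
        os.symlink(src, dst)
    except OSError:
        # symlinks unsupported or unreliable (e.g. some network filesystems)
        shutil.copyfile(src, dst)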

@chrisgorgo
Member Author

The issue might be that we are linking across two volumes (I'll test that). In the meantime, here is the mount info:

root@2e242a44f297:/scratch# mount |grep /scratch
//10.0.75.1/D on /scratch type cifs (rw,relatime,vers=3.02,sec=ntlmssp,cache=strict,username=filo,domain=MSI,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.75.1,file_mode=0755,dir_mode=0755,iocharset=utf8,nounix,serverino,mapposix,nobrl,mfsymlinks,noperm,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)
root@2e242a44f297:/scratch# mount |grep /data
//10.0.75.1/D on /data type cifs (ro,relatime,vers=3.02,sec=ntlmssp,cache=strict,username=filo,domain=MSI,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.75.1,file_mode=0755,dir_mode=0755,iocharset=utf8,nounix,serverino,mapposix,nobrl,mfsymlinks,noperm,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

@chrisgorgo
Member Author

Linking within one volume works as expected:

root@2e242a44f297:/scratch# python -i test.py
Link: file_b -> file_a
file_a: teststring

file_b: teststring

@effigies
Member

effigies commented Apr 5, 2017

Hmm. So it's CIFS implementing symlinks as mfsymlinks. I wonder if this is actually a network filesystem sync problem manifesting as a file-not-found error, and it's being triggered by load. Unfortunately, I'm not sure how we could detect this case except by checking whether we're working on a CIFS volume.

I wouldn't think linking across volumes should make a difference.

I wonder if we should add an option to disable symlinks in the config file, and suggest that people running on Windows disable them.

Edit: Annoyingly, "load" here seems to mean "writing/reading 8 mfsymlinks" in rapid succession.
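On Linux, one rough way to check (a hypothetical helper, purely for illustration, not a proposal for nipype's actual implementation) would be to scan /proc/mounts and look at the filesystem type of the longest matching mount point:

import os

def on_cifs(path):
    # /proc/mounts lines are "device mountpoint fstype options dump pass";
    # prefix matching is simplified here for illustration
    path = os.path.abspath(path)
    best, fstype = '', None
    with open('/proc/mounts') as mounts:
        for line in mounts:
            _, mountpoint, fs = line.split()[:3]
            if path.startswith(mountpoint) and len(mountpoint) > len(best):
                best, fstype = mountpoint, fs
    return fstype == 'cifs'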

@effigies
Member

effigies commented Apr 5, 2017

I proposed #1941 because I think it's a better option than this one, and it only took a couple of minutes to write. If we'd prefer some other option, I'm open to it. Detecting CIFS would require depending on psutil, as far as I can tell.
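A sketch of the psutil route, assuming only the documented psutil.disk_partitions() API (which reports mountpoint and fstype per partition):

import psutil

def cifs_mountpoints():
    # return the mount points of all CIFS partitions visible to psutil
    return [part.mountpoint
            for part in psutil.disk_partitions(all=True)
            if part.fstype == 'cifs']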

@chrisgorgo
Member Author

I've tried using a single mount point and got the same issue.

I'll try using internal Docker storage as working directory next.
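Something along these lines, purely as an illustration (the image name, host paths, and the -w/--work-dir flag are assumptions, not taken from this thread): keep the working directory on a Docker-managed volume instead of the CIFS-backed bind mount.

docker volume create fmriprep_work
docker run --rm -it \
    -v /d/data:/data:ro \
    -v /d/scratch/out:/out \
    -v fmriprep_work:/work \
    poldracklab/fmriprep /data /out participant -w /work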

@satra satra closed this in #1941 Apr 9, 2017