@joshmoore
Member

This comes after several reports on the mailing lists of NFS and CIFS installs
failing.

Questions

  • Is the section obvious enough? Do we need a red blinking banner?
  • Are all of the terms well-enough understood? Do we need links?
  • Are there any tests that can be performed to check locking?
  • Should we also add a section on latency?

/cc @chris-allan

Member

Maybe add, "...for more information, especially if the drive is remotely mounted)". I am thinking that people installing OMERO may often not be savvy enough to realise what filesystem arrangements offer proper locking, but they might be more likely to know what's remotely mounted, so then they still get a stronger hint that maybe they'd better read further.

Could move the parenthetical section to a footnote if it's getting too enormous to be a list element. Also, change the "See" to all lower-case unless it's really meant to be a sentence, in which case we could instead append a period.

For the links to more information, should/can we link them straight to the "Locking and remote shares" section down at the bottom of those pages?

@mtbc
Member

mtbc commented Jan 24, 2013

It's good enough for me to be merged in, just made some light suggestions for consideration.

@mtbc
Member

mtbc commented Jan 24, 2013

Another interesting question, perhaps worth a ticket: can we build a locking check into the server startup?

@joshmoore
Member Author

@mtbc: suggested wording implemented.

@mtbc
Member

mtbc commented Jan 24, 2013

@joshmoore: Indeed, definite improvement, thank you.

@joshmoore
Member Author

@chris-allan: any further thoughts?

@chris-allan
Member

Seems an acceptable start. We could probably write an essay about just NFS locking experiences but that can come as we have more specifics from the community.

@joshmoore
Member Author

From @ehrenfeu:

picking up this thread in regard to the message you're quoting here - apparently the launching of "nfslockd" hasn't been integrated into the startup scripts yet. Is that correct?

Josh, as you wrote in the pull request it would be very helpful to have a test to check locking. This is extremely important for us as we're currently forced to have the binary repository on NFS :-(

So the basic questions that currently come to my mind are:

  • what specific variant(s) of locking need to be supported (flock(), fcntl(), lockf())? Do you have any references that I can feed to our storage experts?
  • why is locking done, for multi-threading or multi-nodes (or ...)?
  • do you expect that running "nfslockd" is sufficient?
  • how can we check if it is working correctly?
  • is NFSv4 instead of v3 something that could help?

@chris-allan, any thoughts on adding parts of this here, or shoot for a ticket to give us time to test?

@joshmoore
Member Author

Also, from Mark Woodbridge:

Some anecdotal evidence: we're using NFSv4 for our OMERO repository (incl. FullText) and haven't had any problems. v4 doesn't seem to require a locking daemon.

@chris-allan
Member

Much of the following assumes Linux. Your mileage may vary if you are running OMERO on Windows, *BSD, Solaris or another UNIX or UNIX-like operating system.

  1. The management of lockd on your system is your responsibility. On modern kernels it is an in-kernel service and is not "started" per se. rpc.statd must of course be running; you won't have NFS support at all without it anyway.
  2. flock() has a BSD history and semantics. It does not lock files over NFS so we do not use it. On Linux fcntl() and lockf() are equivalent.
  3. I have a little script I use for checking locks in a directory: https://gist.github.com/4671879
  4. Distributed locking over NFS is subject to many variables. NFSv4 may make your life easier (locking is integrated into the core protocol) but it may also make your life tougher. Support on each Linux distribution varies due to kernel and user space toolchain differences. NFSv4 on Linux is by no means without its issues.
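
As an illustration of point 2 (this is not the script from the gist above), here is a minimal sketch of a lock check for a directory. It assumes Linux and Python 3, and the function name `check_fcntl_locking` is hypothetical. It takes an exclusive `fcntl()`-style lock on a temporary file and then confirms that a second process on the same machine is refused the same lock.

```python
import fcntl
import os
import tempfile


def check_fcntl_locking(directory):
    """Return True if an exclusive fcntl()-style lock on a temporary file in
    `directory` can be taken and is then refused to a second process."""
    fd, path = tempfile.mkstemp(prefix="lockcheck-", dir=directory)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of blocking.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # The lock could not be taken at all (for example ENOLCK).
            return False

        pid = os.fork()
        if pid == 0:
            # Child process: these locks are held per process, so a
            # conflicting request here should be refused.
            status = 1
            try:
                child_fd = os.open(path, os.O_RDWR)
                try:
                    fcntl.lockf(child_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                except OSError:
                    status = 0  # refused, which is the desired outcome
            finally:
                os._exit(status)  # never return into the parent's code

        _, wait_status = os.waitpid(pid, 0)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
    finally:
        os.close(fd)
        os.unlink(path)
```

A call such as `check_fcntl_locking("/path/to/repository")` (placeholder path) only exercises locking from a single client. It says nothing about whether two different NFS clients exclude each other, which would need the same kind of test run from two hosts.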

In short, this is an environment-based decision that is far outside the scope of support that the OME team can provide. If you choose to place your binary repository on NFS, you should be doing due diligence on your environment. This includes but is not limited to:

  • Which NFS version you are going to use
  • If you are using NFSv4, whether you are going to use ACLs
  • The operating system you're running OMERO on, its support for NFS, and whether any outstanding issues related to the NFS version you have chosen are present in the kernel you are running
  • Your NAS or server operating system vendor's support for NFS and NFS locking on filesystems it exports
  • The network topology between the physical or virtual machine and the NAS or server exporting the filesystem
  • What you will do if locking goes sideways on the client (reboot, stop-remount-start, etc.)
  • What you will do if locking goes sideways on the NAS or server (reboot, re-export, restart processes, etc.)
  • What you will do if you lose the NFS mount on the client
  • How you will monitor the health of your NFS exported filesystem
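
As a starting point for that due diligence, here is a minimal sketch (assuming Linux and the `/proc/mounts` format) that reports which mount point and filesystem type a path lives on. The names `find_mount` and `NETWORK_FS_TYPES` are hypothetical, and the set of filesystem types is illustrative rather than exhaustive.

```python
import os

# Illustrative, non-exhaustive set of remote filesystem type names.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "smb3"}


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point that
    contains `path`, or None. `mounts_text` is /proc/mounts-style text."""
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best
```

On a real system the text would come from `open("/proc/mounts").read()`, and the path would be wherever the binary repository is mounted. If the returned `fs_type` is in `NETWORK_FS_TYPES`, that is a hint that the locking questions above apply.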

@ehrenfeu
Contributor

Thanks, Chris, for the detailed information! I will check our system here over the next few days and report anything useful back here.

@joshmoore
Member Author

@ehrenfeu, let us know what your research shows and we'll update the text modification.

/cc @hflynn

@jburel
Member

jburel commented Feb 1, 2013

Do we want the updated doc as it is for the 4.4.6 release?
If we are happy, will merge. Another PR will be opened after the investigation.

@joshmoore
Member Author

Probably worth including Chris' warnings. I'll do that today.

@joshmoore
Member Author

Since there will be an immediate doc-fixup post-4.4.6, I'll hold off for today. See https://trac.openmicroscopy.org.uk/ome/ticket/10277

/cc @hflynn

Ok to merge?

@mtbc
Member

mtbc commented Feb 4, 2013

Seems good to me to merge.

@hflynn
Contributor

hflynn commented Feb 4, 2013

Reads fine to me

joshmoore added a commit that referenced this pull request Feb 4, 2013
Add sections on locking and remote shares to unix/windows
@joshmoore joshmoore merged commit d444e0b into ome:dev_4_4 Feb 4, 2013
@joshmoore joshmoore deleted the remote-mounts branch February 4, 2013 12:12