forked from tensorflow/io
Various additional dfs plugin fixes #6
Merged
Global Changes:
* The plugin was using duplicate definitions of internal DAOS client
structures (dfs_obj_t, dfs_t, dfs_entry_t), and created malloc'd
copies of those structs in order to access their private fields.
Should DAOS modify those structures in a future release, the plugin
would break. The dependencies on internal fields have been removed;
the DAOS client API is now strictly followed.
* The path_map and size_map caches used DFS mount-point-relative
pathnames as keys. If more than one DFS filesystem is mounted during
the same run, the same relative pathname could be in use in both
filesystems, and callers that retrieved values from the caches could
get results for the wrong filesystem. The code was changed to create
path_map and size_map caches per filesystem.
* A number of fields (connected, daos_fs, pool, container) were
stored in the global DFS structure, which meant that any time the
plugin was presented with a path in a different DFS filesystem than
the previous path, the current filesystem had to be unmounted and
the new filesystem mounted, even though the application could have
had files open in the filesystem that was unmounted. The code was
changed to maintain filesystem state per pool/container combination,
so any number of DFS filesystems can now be mounted simultaneously
(see the sketch below).
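A minimal sketch of the per-filesystem layout this implies, with
hypothetical names (the PR's real types are the cont_info_t and cache
members described under dfs_utils.h below):

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins; the PR's actual types are cont_info_t and
// the per-filesystem cache maps described under dfs_utils.h below.
struct FilesystemState {
  void* daos_fs = nullptr;  // dfs_t* mount handle for this container
  std::map<std::string, void*> path_map;   // rel_path -> cached dfs_obj_t*
  std::map<std::string, size_t> size_map;  // rel_path -> cached file size
};

// Keyed by "pool/container", so identical relative paths in different
// filesystems can no longer collide, and many filesystems can stay
// mounted at once.
std::map<std::string, FilesystemState> filesystems;
```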
* None of the code in the DFS Cleanup() function was ever being
called. This is a known tensorflow issue, see
tensorflow/tensorflow#27535 (FileSystem destructor is not being
called when done). The workaround is to register Cleanup() via the
atexit() function, as sketched below.
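A minimal sketch of the workaround, with a stand-in Cleanup() (the
real one tears down the plugin's DAOS state):

```cpp
#include <cstdio>
#include <cstdlib>

// Stand-in for the plugin's Cleanup(); the real function disconnects
// containers and pools and frees cached handles.
static void Cleanup() { std::puts("plugin cleanup"); }

int main() {
  // Because the TF_Filesystem destructor never runs
  // (tensorflow/tensorflow#27535), Cleanup() is registered to run at
  // process exit instead.
  std::atexit(Cleanup);
  return 0;  // Cleanup() fires here
}
```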
* The RenameFile function was enhanced to delete the cached size
of the source file and store that cached size for the destination
file, as in the sketch below.
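A sketch of the size-cache handoff on rename (the map and names are
illustrative, not the PR's actual code):

```cpp
#include <map>
#include <string>

// Move a cached size from the source key to the destination key so
// the destination's size is known without another stat.
void MoveCachedSize(std::map<std::string, size_t>& size_map,
                    const std::string& src, const std::string& dst) {
  auto it = size_map.find(src);
  if (it == size_map.end()) return;  // nothing cached for the source
  size_map[dst] = it->second;        // remember size under the new name
  size_map.erase(it);                // drop the stale source entry
}
```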
* The dfsLookUp() routine required the caller to indicate whether
or not the object to be looked up was a directory. This was
necessary because dfs_open() was being used to open the object,
and that call requires a different open_mode for directories and
files. However, the caller does not always know the type of the
object being looked up, e.g. PathExists() and IsDir(). If the
caller guessed wrong, the dfs_open() call failed with either
EINVAL or ENOTDIR, and the caller would map these errors to
ENOENT, which is incorrect. The code was changed to replace the
dfs_open() call with dfs_lookup_rel(), which removes the
requirement that the caller know the object's type a priori; the
caller can check the type of the object after it has been opened
(see the sketch below).
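A sketch of the new lookup pattern; the DAOS calls follow the
dfs_lookup_rel()/dfs_release() signatures in daos_fs.h as I
understand them, so treat the details as assumptions rather than the
PR's exact code:

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <daos_fs.h>

// Look up `name` under `parent` without knowing its type in advance;
// the type is inspected only after the object is open, so no more
// EINVAL/ENOTDIR failures from guessing wrong.
int LookupAny(dfs_t* fs, dfs_obj_t* parent, const char* name,
              dfs_obj_t** out, bool* is_dir) {
  mode_t mode = 0;
  int rc = dfs_lookup_rel(fs, parent, name, O_RDWR, out, &mode,
                          /*stbuf=*/nullptr);
  if (rc == 0)
    *is_dir = S_ISDIR(mode);
  return rc;  // caller must dfs_release(*out) when done
}
```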
* The dfsLookUp() routine required all callers to implement three
different behaviors depending upon the type of object being opened:
1. For a DFS root directory, a null dfs_obj_t would be returned;
this had to be special-cased by the caller.
2. For a non-root directory, a non-null dfs_obj_t would be returned
which the caller must never release, because the dfs_obj_t is
also an entry in the path_map cache. Releasing the entry would
cause future requests that use that cache entry to fail. There
were a few places in the code where this was occurring.
3. For a non-directory, a non-null dfs_obj_t would be returned
which the caller must always release when done with it.
The code was changed so that a DFS root directory returns a
non-null dfs_obj_t. Also, whenever a directory that is in the
path_map cache is referenced, dfs_dup() is used to make a (cheap)
copy of the dfs_obj_t to return to the caller, so that the cached
copy is never used outside of the cache. As a result, dfsLookUp()
now always returns a non-null dfs_obj_t which must be released when
no longer in use (see the sketch below). Another advantage of using
dfs_dup() is that it is then safe at any moment to clear a
filesystem's path_map cache; there is no possibility that some
caller is using a cached dfs_obj_t at that time.
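A sketch of the dfs_dup() pattern (signature per the DAOS client API;
treat the details as assumptions):

```cpp
#include <fcntl.h>
#include <daos_fs.h>

// Return a cheap duplicate of a cached directory handle so the cached
// dfs_obj_t itself never escapes the cache; the caller releases the
// copy, and the cache can be cleared safely at any moment.
dfs_obj_t* DupFromCache(dfs_t* fs, dfs_obj_t* cached) {
  dfs_obj_t* copy = nullptr;
  if (dfs_dup(fs, cached, O_RDWR, &copy) != 0)
    return nullptr;
  return copy;  // caller must dfs_release(copy)
}
```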
* All relative path references in the code have been replaced with
references to a dfs_path_t class which encapsulates everything
known about a particular DFS path, including the filesystem in
which the path resides. Member functions make it easy to update
the correct caches for the correct filesystem for each path.
Also, there were many places throughout the code where string
manipulation was being done, e.g. to extract a parent pathname or
a basename. That code has been replaced with dfs_path_t member
functions so that the actual string manipulation occurs in only
a single place in the plugin (see the sketch below).
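A rough shape of the dfs_path_t idea; member names here are
illustrative, not the PR's actual declarations:

```cpp
#include <string>

// One object carries the filesystem identity (pool/container) plus
// the relative path, and all string manipulation lives in members.
class PathSketch {
 public:
  const std::string& Pool() const { return pool_; }
  const std::string& Cont() const { return cont_; }
  // Basename and parent path computed in exactly one place:
  std::string BaseName() const {
    auto pos = rel_path_.find_last_of('/');
    return pos == std::string::npos ? rel_path_ : rel_path_.substr(pos + 1);
  }
  std::string ParentPath() const {
    auto pos = rel_path_.find_last_of('/');
    return (pos == std::string::npos || pos == 0) ? "/"
                                                  : rel_path_.substr(0, pos);
  }

 private:
  std::string pool_, cont_, rel_path_;  // identifies filesystem + file
};
```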
* Setup() now initializes a dfs_path_t instead of global pool, cont,
and rel_path variables. It also does some minor lexical
normalization of the rel_path member, rather than doing so in
multiple places downstream.
* Code was modified in various places so that 100 of the tests in
the tensorflow 'modular_filesystem_test' test suite pass. There are
three remaining failing tests: one is an incorrect test, one checks
a function not implemented in the plugin, and the third reports
failures in TranslateName() which will be handled in a separate PR.
* The plugin was coded to use 'dfs://' as the filesystem prefix, but
the DAOS client API is coded internally to use 'daos://' as the
prefix. The plugin was changed to use 'daos://' so that pathnames
used by one application would not have to be munged in order to
also work with tensorflow.
Per-file changes:
dfs_utils.h:
* The per-container class cont_info_t was added that maintains all
per-filesystem state.
* Class dfs_path_t was added that maintains all per-file state.
The class knows which filesystem the file is in, e.g. to update
the correct cache maps.
* The global fields connected, daos_fs, pool, container, path_map,
and size_map are removed, replaced by the per-filesystem versions.
* Mounts are now done at the same time as connection to the container;
filesystems remain mounted until their containers are disconnected
(see the sketch below).
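A sketch of the connect-time mount; the dfs_mount() signature follows
the DAOS client API, so treat the details as assumptions:

```cpp
#include <fcntl.h>
#include <daos.h>
#include <daos_fs.h>

// Mount the DFS filesystem as soon as its container handle exists
// (poh/coh from daos_pool_connect()/daos_cont_open()); the filesystem
// stays mounted until DisconnectContainer() unmounts it.
int MountOnConnect(daos_handle_t poh, daos_handle_t coh, dfs_t** fs) {
  return dfs_mount(poh, coh, O_RDWR, fs);
}
```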
dfs_filesystem.cc:
* Many of the functions were made static so that they don't show up
in the library's symbol table, avoiding potential conflicts with
other plugins.
* Changed path references to dfs_path_t references throughout.
DFSRandomAccessFile()
* Replaced the dpath string with a dfs_path_t as a constructor
parameter so that the per-filesystem size cache can be updated.
DFSWritableFile()
* Replaced the dpath string with a dfs_path_t as a constructor
parameter so that the per-filesystem size cache can be updated
whenever the file is appended to.
NewWritableFile()
* Changed the file creation mode parameter to include S_IRUSR so that
files can be read when the filesystem is mounted via fuse.
NewAppendableFile()
* Changed the file creation mode parameter to include S_IRUSR so that
files can be read when the filesystem is mounted via fuse (see the
sketch below).
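A sketch of the mode change; the flags are illustrative and the
dfs_open() call is abbreviated in a comment, not the PR's exact code:

```cpp
#include <fcntl.h>
#include <sys/stat.h>

// Owner-read added alongside owner-write so files created by the
// plugin are readable when the container is mounted via fuse.
const mode_t kCreateMode = S_IFREG | S_IRUSR | S_IWUSR;
// e.g. dfs_open(fs, parent, name, kCreateMode, O_CREAT | O_WRONLY,
//               /*oclass=*/0, /*chunk_size=*/0, nullptr, &obj);
```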
PathExists()
* Reworked the code to work with the new dfsLookUp() behavior. The
dfsPathExists() call was removed as it no longer provided anything
not already provided by dfsLookUp(). Also, many errors returned
by dfsPathExists() were mapped to TF_NOT_FOUND, which was
incorrect. Finally, PathExists() can be called for either files or
directories, but dfsPathExists() internally called dfsLookUp() with
isDirectory = false, so callers that passed in a directory path
would get failures.
Stat()
* Used to call dfs_get_size() and then dfs_ostat(), but the file
size is already available in stbuf, so the dfs_get_size() call was
extra overhead and was removed (see the sketch below).
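A sketch of the simplified Stat() path; the dfs_ostat() signature
follows the DAOS client API, so treat the details as assumptions:

```cpp
#include <cstdint>
#include <sys/stat.h>
#include <daos_fs.h>

// One dfs_ostat() already fills st_size, so no separate
// dfs_get_size() round trip is needed.
int StatSize(dfs_t* fs, dfs_obj_t* obj, uint64_t* size) {
  struct stat st;
  int rc = dfs_ostat(fs, obj, &st);
  if (rc == 0)
    *size = static_cast<uint64_t>(st.st_size);
  return rc;
}
```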
FlushCaches()
* Used to call ClearConnections(), which unmounted any filesystem and
disconnected from its container and pool even when there could be
files open for read or write. The ClearConnections() call was
removed. Code was added to clear the size caches as well as the
directory caches.
dfs_utils.cc:
* New functions were added for clearing individual filesystem caches
and all filesystem caches for all mounted filesystems.
* There was code in many places for path string manipulation, checking
if an object was a directory, etc. dfs_path_t member functions were
created to replace all of these so that a particular operation is
implemented in only one spot in the code.
DFS::~DFS()
* The code to clear the directory cache only released the first entry;
there was no code to iterate through the container. It was replaced
with function calls that clear all directory and size caches.
Unmount()
* Unmounting is now done automatically as part of disconnecting a
container, so a separate function was no longer needed.
ParseDFSPath()
* The code assumed that any path it was given would have both pool
and container components; it was unable to handle malformed paths.
The code was changed to let duns_resolve_path() validate the path
components. There used to be two calls to duns_resolve_path()
because DUNS_NO_CHECK_PATH was not set, so the first call would
fail if the pool and container components were labels;
duns_resolve_path() only recognizes uuids when DUNS_NO_CHECK_PATH
is not set. When pool and/or container labels were used, the
duns_resolve_path() code would check against locally mounted
filesystems, and would hopefully fail. The code then prepended
dfs:: and tried again, which would be recognized as a
"direct path". Paths which only contained uuids were
successfully parsed with the first duns_resolve_path() call.
By using the DUNS_NO_CHECK_PATH flag and always including the
daos:// prefix, only a single call is needed (see the sketch
below).
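A sketch of the single-call parse; the field and flag names follow
the DAOS UNS API in daos_uns.h as I understand it, so treat the
details as assumptions rather than the PR's exact code:

```cpp
#include <daos_uns.h>

// Parse a "daos://pool/cont/relpath" direct path in one call.
// DUNS_NO_CHECK_PATH skips the probe of locally mounted filesystems,
// and labels are accepted as well as uuids.
int ParseDaosPath(const char* path) {
  struct duns_attr_t attr = {};
  attr.da_flags = DUNS_NO_CHECK_PATH;
  int rc = duns_resolve_path(path, &attr);
  // On success: attr.da_pool, attr.da_cont, attr.da_rel_path.
  duns_destroy_attr(&attr);  // frees any allocated members
  return rc;
}
```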
Setup()
* Reworked to populate a dfs_path_t instead of separate pool,
cont, and relpath variables. A filesystem is now automatically
mounted as part of connecting to the container, so a separate
mount function was no longer needed.
ClearConnections()
* The code for looping through pools and containers didn't work
properly because the subroutines erase their map entries
internally, which invalidates the iterators being used in
ClearConnections(). The code was changed so that the iterators
are reinitialized on each pass through the loop (see the sketch
below).
* Code to disconnect all the containers in a pool was moved into
the DisconnectPool() function, so that it is not possible to
disconnect a pool without first disconnecting all its containers.
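A generic sketch of the iterator fix (the map and helpers are
stand-ins for the plugin's pool/container maps):

```cpp
#include <map>
#include <string>

static std::map<std::string, int> containers;  // stand-in: key -> handle

// Stand-in for DisconnectContainer(): erases its own map entry, which
// invalidates any iterator the caller still holds.
static void Disconnect(const std::string& key) { containers.erase(key); }

// Restart from begin() after every erase instead of incrementing an
// invalidated iterator.
static void ClearConnectionsSketch() {
  while (!containers.empty()) {
    auto it = containers.begin();  // reinitialized on each pass
    Disconnect(it->first);         // entry removed inside the call
  }
}
```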
dfsDeleteObject()
* Enhanced to only clear the directory cache for the filesystem in
which the object existed.
* If the object was a file, the size cache entry for that file is
deleted. If a directory was recursively deleted, the filesystem's
size cache is now also cleared, as in the sketch below.
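A sketch of the invalidation rules, reusing the hypothetical
per-filesystem cache layout from earlier:

```cpp
#include <map>
#include <string>

struct FsCachesSketch {  // hypothetical; mirrors the per-fs caches
  std::map<std::string, void*> path_map;   // rel_path -> cached dfs_obj_t*
  std::map<std::string, size_t> size_map;  // rel_path -> cached size
};

// Only the owning filesystem's caches are touched.
void OnDeleteSketch(FsCachesSketch& fs, const std::string& rel_path,
                    bool is_dir, bool recursive) {
  fs.path_map.clear();            // directory cache for this fs only
  if (!is_dir)
    fs.size_map.erase(rel_path);  // drop the one stale file size
  else if (recursive)
    fs.size_map.clear();          // sizes under the deleted tree are stale
}
```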
dfsLookUp() and dfsFindParent()
* As mentioned at the top, the code was rewritten so that cached
directory entries are never returned to a caller; instead,
a dup reference is returned so that the caller is always
given an object reference it must release.
dfsCreateDir()
* Error exit statuses were enhanced in order to pass the tensorflow
'modular_filesystem_test' test suite.
ConnectPool()
* Simplified somewhat, as the pool id_handle_t is no longer needed.
ConnectContainer()
* Simplified somewhat, as the cont id_handle_t is no longer needed.
* Added code to immediately mount any container that is connected.
* Added code to initialize all the per-filesystem state variables.
DisconnectPool()
* Added code to disconnect any containers before disconnecting the
pool.
DisconnectContainer()
* Added code to unmount any filesystem before disconnecting its
container.
* Added all the dfs_path_t member function implementations.
* Included a few new dsym references for dfs function calls that
have been added.
Signed-off-by: Kevan Rehm <kevan.rehm@hpe.com>