Skip to content

SftpClient Enumerates Rather Than Accumulates Directory Items #395

Open
@znamenap

Description

@znamenap

Hi,
Thank you for really great library! Thank you once again!

I've a proposal to enhance (fix) some behavior which leads to delayed processing and huge memory footprints. They are actually our current production issues caused by 65 thousand directory items in single parent directory. Our consumer service has to list all of them and then process one by one or in batches. The current SftpClient code basically reads all the items and make SftpFile objects of it. That generates a peak in memory usage which is still acceptable. However, the criteria or behavior which is not acceptable is the time required to make the list of the folder items. In our scenario, it takes 1 minute and 45 seconds. The distance is between London and New York for production environment. The service we maintain and which uses SSH.NET has to check the remote directory as frequent as possible at max by 1 minute. In my test environment between West Europe and West US the listing operation lasts 2 minutes and 6 seconds over 75 thousand folder items.

            var files = _sftpSession.RequestReadDir(handle);

            while (files != null)
            {
                result.AddRange(from f in files
                                select new SftpFile(_sftpSession, string.Format(CultureInfo.InvariantCulture, "{0}{1}", basePath, f.Key), f.Value));

                //  Call callback to report number of files read
                if (listCallback != null)
                {
                    //  Execute callback on different thread
                    ThreadAbstraction.ExecuteThread(() => listCallback(result.Count));
                }

                files = _sftpSession.RequestReadDir(handle);
            }

There is nothing actually to do with the principle on listing the directory into items but how the library handles result data. Currently, it accumulates all items and then passes them to consumer as one big list. The enhancement should be about to introduce a new method which does real enumeration. This would merge/join the time spent on listing and consuming the items and pushes it directly to consumer. The consumer's logic may decide - ah, we've got enough items to process so we can stop enumerating (i.e. in the middle or at the first 150 items). This scenario actually shorten the processing time to acceptable minimum.

Obviously, I've looked into the source code, have done the change, tested it and thought about multithreading scenarios. I wish it to be merged in if you agree on.

Many thanks,
Pavel

Mine proposal is about to add dedicated new methods into SftpClient which does enumeration:

        /// <summary>
        /// Enumerates files and directories in remote directory.
        /// </summary>
        /// <remarks>
        /// This method differs to <see cref="ListDirectory(string, Action{int})"/> in the way how the items are returned. 
        /// It yields the items to the last moment for the enumerator to decide if it needs to continue or stop enumerating the items. 
        /// It is handy in case of really huge directory contents at remote server - meaning really huge 65 thousand files and more.
        /// It also decrease the memory footprint and avoids LOH allocation as happen per call to <see cref="ListDirectory(string, Action{int})"/> method.
        /// There aren't asynchronous counterpart methods to this because enumerating should happen in your specific asynchronous block.
        /// </remarks>
        /// <param name="path">The path.</param>
        /// <param name="listCallback">The list callback.</param>
        /// <returns>
        /// An <see cref="System.Collections.Generic.IEnumerable{SftpFile}"/> of files and directories ready to be enumerated.
        /// </returns>
        /// <exception cref="ArgumentNullException"><paramref name="path" /> is <b>null</b>.</exception>
        /// <exception cref="SshConnectionException">Client is not connected.</exception>
        /// <exception cref="SftpPermissionDeniedException">Permission to list the contents of the directory was denied by the remote host. <para>-or-</para> A SSH command was denied by the server.</exception>
        /// <exception cref="SshException">A SSH error where <see cref="Exception.Message" /> is the message from the remote host.</exception>
        /// <exception cref="ObjectDisposedException">The method was called after the client was disposed.</exception>
        public IEnumerable<SftpFile> EnumerateDirectory(string path, Action<int> listCallback = null)`

        /// <summary>
        /// Internals the list directory.
        /// </summary>
        /// <param name="path">The path.</param>
        /// <param name="listCallback">The list callback.</param>
        /// <returns>
        /// A list of files in the specified directory.
        /// </returns>
        /// <exception cref="ArgumentNullException"><paramref name="path" /> is <b>null</b>.</exception>
        /// <exception cref="SshConnectionException">Client not connected.</exception>
        private IEnumerable<SftpFile> InternalEnumerateDirectory(string path, Action<int> listCallback)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions