dvc push may fail for remote hosted on network attached drives #306

Open

Description

OS: Linux
Python 3.8
dvc 3.55.2
dvc-azure 3.1.0
dvc-data 3.16.5
dvc-http 2.32.0
dvc-objects 5.1.0
dvc-render 1.0.2
dvc-studio-client 0.21.0
dvc-task 0.4.0

Hi,

I came across a special case where dvc push fails when the remote is hosted on a network shared drive (Samba).

While pushing files from the local cache to the remote storage, DVC reaches copyfile and first attempts a CoW (copy-on-write) reflink by calling fcntl.ioctl(). Opening the destination for that call creates dst. Because the cache and the remote are on different file systems, the ioctl fails, and the reflink code unlinks the dst it just created. copyfile then falls back to shutil.copyfile(src, dst). At this point the call fails with FileNotFoundError: dst is still reported as existing (os.path.exists(dst) returns True), but it cannot be opened for writing.

It seems that, due to network latency or caching on the SMB drive, the unlink is not reflected immediately. I have to wait a short while (roughly two seconds in the run shown below, where the 0.1-second poll repeated 21 times) until dst is reported as removed (os.path.exists(dst) returns False). After that, the normal copy using shutil.copyfile(src, dst) works fine.

I've written a small script, built from code snippets extracted from the library, to reproduce the error.
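The wait described above could be bounded with a timeout instead of looping indefinitely. The following is a minimal sketch only: wait_until_gone and its timeout and interval parameters are hypothetical names chosen for illustration, not part of DVC or its libraries, and the default values are assumptions.

```python
import os
import time


def wait_until_gone(path, timeout=5.0, interval=0.05):
    """Poll until `path` is no longer visible, or until `timeout` seconds pass.

    Hypothetical helper (not a DVC API). Returns True if the path
    disappeared, False if it was still visible when the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would run this right after os.unlink(dst) and only fall back to shutil.copyfile once it returns True. Whether a timeout of a few seconds is enough on a given SMB mount is something I can't generalize from my setup.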

import os
import shutil
from secrets import token_urlsafe
import fcntl
import time

FICLONE = 0x40049409
def reflink(src, dst):
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    except OSError:
        os.close(src_fd)
        raise

    try:
        
        fcntl.ioctl(dst_fd, FICLONE, src_fd)
    except OSError as e:
        print(">>>", e)
        os.close(src_fd)
        os.close(dst_fd)
        try:
            os.unlink(dst)

            ### Temp fix: comment out the loop below to reproduce the error ###
            while os.path.exists(dst):  # poll until the unlink becomes visible
                time.sleep(0.1)
                print(f"{dst} still exists after 0.1 second")
            ### End of temp fix ###

            print(dst, "exists: ", os.path.exists(dst))  # after unlink, `dst` should no longer be found
        except OSError:
            print(">>> failed to unlink")
        raise
    else:
        os.close(src_fd)
        os.close(dst_fd)

def tmp_fname(prefix: str = "") -> str:
    """Temporary name for a partial download"""
    return f"{prefix}.{token_urlsafe(16)}.tmp"



rpath = "/mnt/interior_sensing/test-dvc/randombinfile.bin" #dest Samba drive
parent = "/mnt/interior_sensing/test-dvc"
lpath = "./randombitsfile.bin" # local filesystem
tmp_file = os.path.join(parent, tmp_fname())
# create random file
fileSizeInBytes = 1024*1024
with open(lpath, 'wb') as fout:
    fout.write(os.urandom(fileSizeInBytes)) 
try:
    reflink(lpath, tmp_file)
    # NOTE: reflink may or may not clone src permissions
    umask = os.umask(0)  # read the current umask...
    os.umask(umask)  # ...and restore it immediately
    os.chmod(tmp_file, 0o666 & ~umask)
    os.replace(tmp_file, rpath)
    print("Reflink succeeded")
    raise SystemExit(0)
except OSError:
    pass
# If the reflink succeeded, execution never reaches this point. Otherwise, fall back to a normal copy.
shutil.copyfile(lpath, tmp_file)
os.replace(tmp_file, rpath)

Error output:

>>> [Errno 18] Invalid cross-device link
/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp exists:  True
Traceback (most recent call last):
  File "/home/user/env/test_dvc_script.py", line 67, in <module>
    shutil.copyfile(lpath, tmp_file)
  File "/home/user/env/lib/python3.9/shutil.py", line 266, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp'

Output after applying the temp fix:

>>> [Errno 18] Invalid cross-device link # because src and dst are on different filesystems
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp exists:  False
(env) user@machine:~/workspace$ ls /mnt/interior_sensing/test-dvc/
randombinfile.bin # file copied successfully

I'm not sure whether this is a case that should be handled or whether it is specific to my setup, but I thought it was worth mentioning.
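If this were handled in the library, one option might be to retry the fallback copy a few times rather than polling os.path.exists. The following is only a sketch under that assumption: copyfile_with_retry, attempts, and delay are hypothetical names and values, not an existing DVC function, and I have not tested this against the SMB mount.

```python
import shutil
import time


def copyfile_with_retry(src, dst, attempts=5, delay=0.1):
    """Call shutil.copyfile, retrying when it raises FileNotFoundError.

    Hypothetical helper (not a DVC API). Note that a missing `src` also
    raises FileNotFoundError, so that case is retried too before the
    final exception propagates.
    """
    for attempt in range(attempts):
        try:
            return shutil.copyfile(src, dst)
        except FileNotFoundError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
```

In the script above, this would replace the final shutil.copyfile(lpath, tmp_file) call. The total wait is bounded by attempts * delay, so the defaults here (0.5 seconds) would likely need to be raised for a drive as slow as mine.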
