Implement iRODS PyFilesystem2 module #490

Closed
hechth opened this issue Nov 15, 2023 · 15 comments

@hechth

hechth commented Nov 15, 2023

It would be really great if there were a PyFilesystem2 implementation for iRODS, like fs-irods (see here).

Should this be part of the python-irodsclient or an external library based on the python-irodsclient and PyFilesystem2?

I think it would make sense to implement it as a standalone library. I would start with an implementation, and then we could potentially transfer the project into the irods organization for better visibility? Please let me know your thoughts on this, or whether you are aware of something similar that has already been implemented.

@trel
Member

trel commented Nov 15, 2023

Oh, very interesting. We'll definitely investigate and have an opinion soon. Thanks!

@d-w-moore
Collaborator

d-w-moore commented Nov 15, 2023

Looks like it may be a good foundation for a bilateral, SSH-powered, "rsync-like" file synchronizer.

@d-w-moore
Collaborator

d-w-moore commented Nov 17, 2023

Interesting... could also be a useful abstraction for rsync, ingest, lots of things.

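import weakref
from fs.base import FS
from fs.info import Info
import irods.test.helpers as h
from irods.path import iRODSPath

# Maps each raw data-object stream handed to PyFilesystem2 back to the buffered
# handle it came from, so the handle (whose finalizer would close the raw
# stream) stays alive while the raw stream is in use.
weakrefs = weakref.WeakKeyDictionary()

# A single module-level session, built by the PRC test helpers, shared by all instances.
session = h.make_session()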

def norm(*paths):
    return str(iRODSPath(*paths))

def is_coll(_abs):
    return session.collections.exists(_abs)

# ///////////////////////////////////////////
#
# -- A skeletal and naive PyFilesystem2 implementation.

class iRODS_FS(FS):

    def __init__(self, root_path):
        super(iRODS_FS,self).__init__()
        self.root_path = root_path

    # Not yet implemented: these required FS methods are no-op stubs.
    def makedir(self,*s,**kw):pass
    def removedir(self,*s,**kw):pass
    def remove(self,*s,**kw):pass
    def setinfo(self,*s,**kw):pass

    def openbin(self,path,mode='r',**_):
        p = norm(self.root_path, path)
        # PyFilesystem2 passes binary modes such as 'rb'; strip every 'b'
        # before handing the mode to the PRC's data_objects.open().
        _mode = mode.replace('b', '')
        d = session.data_objects.open(p, _mode)
        weakrefs[d.raw] = d
        return d.raw

    def listdir(self, path):
        c = session.collections.get(norm(self.root_path, path))
        return [o.name for o in c.subcollections + c.data_objects]

    def getinfo(self, path, namespaces=None):
        p = norm(self.root_path, path)
        info = { 'basic':{
                    'name': p.split('/')[-1],
                    'is_dir':is_coll(p)
                 }
        }
        return Info(info)

# -- ////////////////////////////////////////

if __name__ == '__main__':
    home = h.home_collection(session)
    try:
        session.collections.create(f'{home}/demo/dir1')
        session.data_objects.create(f'{home}/demo/emptydata')
        UC_enc = '\u1000'.encode('utf8')
        with session.data_objects.open(f'{home}/demo/dir1/text.dat','w') as f:
            f.write(b'hello'+UC_enc)

        # Use iRODS as a PyFilesystem2
        ifs = iRODS_FS(home)

        # Open binary file as Unicode (default)
        contents = ifs.open('/demo/dir1/text.dat').read()
        print ('unicode contents length = ',len(contents))
        print ('unicode contents = ',contents)
        with ifs.open('/demo/dir1/text2.dat','w') as f2:
            f2.write(contents + '\u1001')

        print ('walking:')
        for obj in ifs.walk():
            print('\t',obj)
    finally:
        session.collections.remove(f'{home}/demo',force = True)
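The `weakrefs` mapping in `openbin` above appears to act as a keep-alive: PyFilesystem2 only receives `d.raw`, and if nothing else referenced the buffered handle `d`, its finalizer would close the raw stream underneath. The stdlib-only sketch below illustrates that behavior with a hypothetical `DummyRaw` class standing in for the PRC's raw data-object stream, so no iRODS server is needed. It also shows a caveat worth checking: because a buffered handle holds a reference to its own raw stream, an entry in a `WeakKeyDictionary` keyed on that raw stream is never evicted automatically, so a fuller implementation would probably want to remove the entry explicitly, e.g. when the stream is closed.

```python
import io
import weakref


class DummyRaw(io.RawIOBase):
    """Hypothetical stand-in for the PRC's raw data-object stream."""

    def readable(self):
        return True

    def readinto(self, buf):
        return 0  # behaves like an empty object: always at EOF


def raw_closed_after_dropping_wrapper(keep_alive=None):
    """Create a buffered wrapper, drop our reference to it, report raw.closed."""
    raw = DummyRaw()
    buffered = io.BufferedReader(raw)
    if keep_alive is not None:
        keep_alive[raw] = buffered  # same pattern as weakrefs[d.raw] = d
    del buffered  # on CPython an unreferenced wrapper is finalized right here
    return raw.closed


def entries_left_after_dropping_raw():
    """Show that the keep-alive entry survives dropping our raw reference."""
    keep_alive = weakref.WeakKeyDictionary()
    raw = DummyRaw()
    keep_alive[raw] = io.BufferedReader(raw)
    del raw  # the wrapper stored as the value still references it via .raw
    return len(keep_alive)


if __name__ == "__main__":
    print(raw_closed_after_dropping_wrapper())
    print(raw_closed_after_dropping_wrapper(weakref.WeakKeyDictionary()))
    print(entries_left_after_dropping_raw())
```

On interpreters without immediate reference counting, such as PyPy, the first case depends on when the garbage collector runs.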

@hechth
Author

hechth commented Nov 17, 2023

@d-w-moore my plan was to give it a go today, and I might use this as a starting point, though I have to say I'm not particularly familiar with either iRODS or the Python client.

@hechth
Author

hechth commented Nov 17, 2023

See WIP here: https://github.com/hechth/fs-irods

@d-w-moore
Collaborator

d-w-moore commented Nov 17, 2023

> @d-w-moore my plan was to give it a go today to work on this, I might use this as a starting point though I have to say I'm not particularly familiar with iRODS nor the python client.

@hechth - You're welcome to mine and play with this code all you want... Meanwhile I'll probably be fleshing it out today into something a little more complete and real. It needs more methods, thread safety and maybe more.

@hechth
Author

hechth commented Nov 17, 2023

@d-w-moore We could also schedule a call to coordinate these efforts? I think it will be a bit more complex, so it might take time. I started with the skeleton for the extension and checked the options for dealing with the session, but I'd be very curious to hear how you would implement it. My current approach is to open a new session for each command so that every command is self-contained. Also, implementing unit tests is quite tricky, as basically everything needs to be mocked.
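On the testing point, one way to avoid a live iRODS server in unit tests is to pass the session in as a parameter and substitute a `unittest.mock.MagicMock` for it. The sketch below assumes a hypothetical `make_basic_info(session, path)` helper modeled on the `getinfo` method of the skeleton earlier in the thread; only `session.collections.exists` (the PRC call used by `is_coll` there) needs a canned return value.

```python
import unittest
from unittest import mock


def make_basic_info(session, path):
    """Hypothetical helper mirroring the skeleton's getinfo(): build the
    'basic' namespace for an absolute iRODS path."""
    return {
        "basic": {
            "name": path.split("/")[-1],
            "is_dir": session.collections.exists(path),
        }
    }


class MakeBasicInfoTest(unittest.TestCase):
    def test_collection_is_dir(self):
        session = mock.MagicMock()
        session.collections.exists.return_value = True
        info = make_basic_info(session, "/tempZone/home/rods/demo/dir1")
        self.assertEqual(info["basic"], {"name": "dir1", "is_dir": True})
        # The mock also records how the session was used.
        session.collections.exists.assert_called_once_with(
            "/tempZone/home/rods/demo/dir1")

    def test_data_object_is_not_dir(self):
        session = mock.MagicMock()
        session.collections.exists.return_value = False
        info = make_basic_info(session, "/tempZone/home/rods/demo/emptydata")
        self.assertEqual(info["basic"], {"name": "emptydata", "is_dir": False})


if __name__ == "__main__":
    unittest.main()
```

Injecting the session this way also fits either session strategy: a per-command session factory could be mocked in the same manner.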

@d-w-moore
Collaborator

d-w-moore commented Nov 17, 2023

@hechth Yes, a call would be good, maybe sometime early next week, Monday (20.11) to Wednesday (22.11)? At any rate, I don't think I will get as far on it today as I thought.

Regarding the session: it represents state, and my impression is that PyFilesystem2 doesn't deal much with state beyond storing it in the filesystem object and using _lock to guard against thread conflicts.

The global session object in my code snippet above is awkward, as I don't know what is supposed to happen in the case of overlapping access, i.e., simultaneous use of irodsfs('/tempZone/home/user') and irodsfs('/tempZone/home/user/subcollection') from different threads would seem to demand a _lock shared between the two instances. So maybe it is better to accept (as a parameter) or generate a new session object for every irodsfs instance?
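The second option, where every filesystem object gets its own session (either passed in or created on demand) plus its own lock, might look roughly like the stdlib-only sketch below. `FakeSession` and `IrodsFSSketch` are hypothetical names. In a real implementation the session would presumably be an `irods.session.iRODSSession`, whose `cleanup()` method releases its connections, and PyFilesystem2's `FS.__init__` already creates a per-instance `_lock` (an `RLock`), so a subclass would inherit that rather than build its own.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for irods.session.iRODSSession."""

    def __init__(self):
        self.cleaned_up = False

    def cleanup(self):
        self.cleaned_up = True


class IrodsFSSketch:
    """Hypothetical filesystem object: one session and one lock per instance."""

    def __init__(self, root_path, session=None, session_factory=FakeSession):
        self.root_path = root_path
        # Only clean up sessions we created ourselves; a caller-supplied
        # session stays under the caller's control.
        self._owns_session = session is None
        self.session = session if session is not None else session_factory()
        self._lock = threading.RLock()  # PyFilesystem2's FS.__init__ provides this

    def close(self):
        with self._lock:
            if self._owns_session:
                self.session.cleanup()


if __name__ == "__main__":
    home = IrodsFSSketch("/tempZone/home/user")
    sub = IrodsFSSketch("/tempZone/home/user/subcollection")
    print(home.session is sub.session, home._lock is sub._lock)
    shared = FakeSession()
    fs_a = IrodsFSSketch("/tempZone/home/user", session=shared)
    fs_a.close()
    print(shared.cleaned_up)
```

With separate sessions, each lock only has to protect its own session's state, so the two overlapping instances no longer need to share one. Concurrent changes to the same data objects on the server are a separate question that a client-side lock would not address in either design.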

@hechth
Author

hechth commented Nov 17, 2023

@d-w-moore Monday sounds good. Which time zone are you in? I'll continue a bit more today, and then I can give an overview on Monday and we can refine the implementation?

@d-w-moore
Collaborator

> @d-w-moore monday sounds good. Which timezone are you in? I'll continue a bit still today and then I can give an overview on Monday and we can refine the implementation?

Great. I'm in the US Eastern time zone (EST).

@d-w-moore
Collaborator

d-w-moore commented Nov 20, 2023

@hechth changing the openbin implementation in the above snippet to

    def openbin(self,path,mode='r',buffering=-1,**_):
        p = norm(self.root_path, path)
        _mode = list(mode)
        while _mode.count('b'):
            _mode.remove('b')
        # '_buffering' is accepted by the pyfilesystem2-buffering dev branch of the PRC linked below.
        d = session.data_objects.open(p, ''.join(_mode), _buffering = buffering)
        if buffering >= 0:
            return d
        else:
            weakrefs[d.raw] = d
            return d.raw

and building on the python-irodsclient from this dev branch:
https://github.com/d-w-moore/python-irodsclient/tree/pyfilesystem2-buffering

we can now run the following code (after having created /tempZone/home/rods/thing, e.g. via iput):

ifs = iRODS_FS('/tempZone/home/rods')

with ifs.open('thing', 'rb', buffering=2048) as b:
    print('buffered: ',b)
    b.read(1)
with ifs.open('thing', 'rb', buffering=0) as b:
    print('buffered: ',b)
    b.read(1)
with ifs.open('thing', 'rb', buffering=-1) as b:
    print('buffered: ',b)
    b.read(1)

Results are:

buffered:  <_io.BufferedReader name='thing'>
buffered:  <_io.BufferedReader name='thing'>
buffered:  <fs.iotools.RawWrapper object at 0x7fbd86a2c940>
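These results appear consistent with PyFilesystem2's `FS.open()` passing whatever `openbin()` returns through `fs.iotools.make_stream()`, which, going by the library source, wraps it in a `RawWrapper` and then, only when `buffering >= 0`, in an `io.BufferedReader` sized `buffering or io.DEFAULT_BUFFER_SIZE`. That would explain why `buffering=0` still yields a buffered reader. The function below is a simplified, hypothetical re-creation of that read-only decision using a stand-in `RawWrapperSketch` class; it is not the library's code (write and read/write modes are omitted), so check `fs/iotools.py` for the real logic.

```python
import io


class RawWrapperSketch(io.RawIOBase):
    """Hypothetical stand-in for fs.iotools.RawWrapper (read-only here)."""

    def __init__(self, raw, name):
        super().__init__()
        self._raw = raw
        self.name = name  # BufferedReader's repr shows this as name='...'

    def readable(self):
        return True

    def readinto(self, buf):
        return self._raw.readinto(buf)


def make_read_stream_sketch(name, bin_file, buffering=-1):
    """Simplified read-only version of the wrapping step described above."""
    stream = RawWrapperSketch(bin_file, name)
    if buffering >= 0:
        # buffering=0 falls back to the default size rather than disabling buffering
        stream = io.BufferedReader(stream, buffering or io.DEFAULT_BUFFER_SIZE)
    return stream


if __name__ == "__main__":
    for size in (2048, 0, -1):
        print("buffered: ", make_read_stream_sketch("thing", io.BytesIO(b"hello"), size))
```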

@hechth
Author

hechth commented Nov 30, 2023

I think this can be closed, since the repo has become a somewhat standalone resource and we can move discussions to the issues over there. Thanks for the useful starting points; they have been very helpful :)

@korydraughn
Contributor

> ... the repo has become a somewhat standalone resource ...

Just want to confirm: when you say "the repo", you're referring to https://github.com/hechth/fs-irods?

@hechth
Author

hechth commented Nov 30, 2023

Yes, I am.

@korydraughn
Contributor

Cool cool.

Feel free to open new issues here if you encounter problems with the PRC.


5 participants