Querying a collection on a zone across federation results in a SYS_HEADER_READ_LEN_ERR #173

Open
bh9 opened this issue Sep 25, 2019 · 2 comments

@bh9
Contributor

bh9 commented Sep 25, 2019

irods version: 4.1.12
python-irodsclient: clone of current master
irods environment:

{
    "irods_authentication_scheme": "native",
    "irods_cwd": "/Sanger1/home/bh9",
    "irods_home": "/Sanger1/home/bh9",
    "irods_host": "irods-sanger1-ies1.internal.sanger.ac.uk",
    "irods_port": 1247,
    "irods_ssl_ca_certificate_file": "/etc/irods/ca.pem",
    "irods_user_name": "bh9",
    "irods_zone_name": "Sanger1"
}

In most of the iRODS toolchain, there are a few ways to talk to a different zone:

  1. Provide the zone flag, if there is one
  2. Log into the other zone to begin with
  3. Specify a collection path in the other zone

As discussed yesterday, we are attempting to query across federation using python-irodsclient. Since iRODSSession.query() does not provide a zone flag, I tried specifying the path instead. Querying the contents of another zone by specifying the collection path results in a SYS_HEADER_READ_LEN_ERR. I've put together a small reproducer:

#!/usr/bin/python
import os
from irods.session import iRODSSession
from irods.models import DataObject, Collection, DataObjectMeta
from irods.column import Criterion
env_file = os.path.expanduser('~/.irods/irods_environment.json')
session = iRODSSession(irods_env_file=env_file)
query = session.query(Collection, DataObject, DataObjectMeta).filter(Criterion('=', Collection.name, '/seq/home/bh9#Sanger1'))
query.first()

Running this script produces the following traceback:

Traceback (most recent call last):
  File "./sysheaderreadlenerrreproducer.py", line 10, in <module>
    query.first()
  File "/nfs/users/nfs_b/bh9/python-irodsclient/irods/query.py", line 222, in first
    results = query.execute()
  File "/nfs/users/nfs_b/bh9/python-irodsclient/irods/query.py", line 165, in execute
    result_message = conn.recv()
  File "/nfs/users/nfs_b/bh9/python-irodsclient/irods/connection.py", line 95, in recv
    raise get_exception_by_code(msg.int_info)
irods.exception.SYS_HEADER_READ_LEN_ERR: None

Takeaways from this:
Good News: We can now reliably reproduce SYS_HEADER_READ_LEN_ERRs
Bad News: There doesn't seem to be a way to query a different zone with this client

@d-w-moore
Collaborator

I've determined that the SYS_HEADER_READ_LEN_ERR is raised when querying a particular combination of columns via:

session.query(Collection, DataObject, DataObjectMeta)

but doesn't arise when we omit 'Collection' from the result tuple. That is, we can do this:

results1 = session.query(DataObject, DataObjectMeta)
results2 = session.query(DataObject, Collection).filter(criteria_based_on_results1 ...)

That ellipsis isn't "giving up", by the way: the attached Python program demonstrates how this might work. The demo searches a zone, given by name, for data objects that carry a particular AVU name. (Example usage: /tmp/data_objs_meta_search.py -z otherZone -m AVUname lists the full logical paths of the data objects in the given zone whose AVU name matches the argument to the '-m' option.)

data_objs_meta_search.py.txt
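
The attachment itself is not reproduced here. As a rough sketch of the final stitching step only, the two separate result sets can be combined into full logical paths along these lines. The function name and the shapes of its inputs (plain pairs and a plain dict) are assumptions made for illustration; they are not the row format that python-irodsclient returns, and this is not taken from the attached script.

```python
def full_logical_paths(data_rows, collection_names):
    """Combine the results of two separate queries into full logical paths.

    data_rows        -- iterable of (collection_id, data_object_name) pairs,
                        standing in for the DataObject/DataObjectMeta results
                        (hypothetical shape)
    collection_names -- dict mapping collection_id to collection path,
                        standing in for the follow-up Collection results
                        (hypothetical shape)
    """
    paths = []
    for coll_id, data_name in data_rows:
        coll_name = collection_names.get(coll_id)
        if coll_name is None:
            # The follow-up query did not return this collection; skip it.
            continue
        paths.append(coll_name.rstrip('/') + '/' + data_name)
    return paths
```

The point of the split is that no single query ever asks for Collection, DataObject and DataObjectMeta columns together; the join happens on the client side using the collection id.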

@d-w-moore
Collaborator

The following Python program is an improvement on the above, allowing searches on Collections or DataObjects in any zone (or the current one by default):
search_zone.py.txt
It avoids the joint search on DataObject/DataObjectMeta/Collection by getting the collection_id values from DataObject, and it caches the resulting iRODSCollection objects to avoid lots of extra queries.
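
The caching idea can be sketched in isolation as follows. This is a minimal illustration, not the code in search_zone.py.txt: the class name, the `fetch` callable and the `misses` counter are all hypothetical, and `fetch` stands in for whatever lookup turns a collection id into a collection object (in real use, presumably a query against the catalog).

```python
class CollectionCache:
    """Look up collections by id, calling the fetch function once per id."""

    def __init__(self, fetch):
        # fetch: callable taking a collection id and returning a collection
        # object (hypothetical; supplied by the caller).
        self._fetch = fetch
        self._cache = {}
        self.misses = 0  # number of times fetch was actually called

    def get(self, collection_id):
        if collection_id not in self._cache:
            self.misses += 1
            self._cache[collection_id] = self._fetch(collection_id)
        return self._cache[collection_id]
```

With many data objects sharing a few collections, each collection is fetched once and every later lookup for the same id is served from the dictionary.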
