- 
                Notifications
    You must be signed in to change notification settings 
- Fork 928
add an iwarp specific BTL component #5192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
add an iwarp specific BTL component, just say no to xrc, cuda support, etc. And only do rdmacm based connection setup. Signed-off-by: Howard Pritchard <howardp@lanl.gov>
| The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/bafb53823a9988fd54e8b13a1394a0db | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a zillion comments, some of which are quite likely redundant / might have been mentioned multiple times. Sorry about that. 😦
In general:
- There's still a bunch of non-iWARP-specific stuff in here. E.g., I see a bunch of references to XRC.
- I also see references to things that I don't think are relevant to iWARP, but I'm not 100% sure, such as: APM, LMC, BTLs per LID, PKEY, atomics, SRQ, etc. All the ones that are not relevant should be removed from this BTL (code, MCA params, comments, etc.).
- Additionally, if some of the above are not relevant, then a LOT of code can be deleted / simiplified (e.g., if there's no XRC and no SRQ, then there is only one kind of QP: PP -- so all the stuff about parsing different types of QPs in MCA params, setting up different types of QPs, checking for QP type compatibility, ...etc. -- all that stuff can just be deleted). There's a few different cases of that (e.g., APM falls in the same category).
- There should only be one CPC left: RDMA CM. As such, the whole CPC mechanism should likely be deleted (there's quite a bit of infrastructure in place for the CPC) and it should be replaced with simple function calls to the RDMA CM CPC. That should end up being a pretty big chunk of code that can be deleted.
This is my $0.02, but I'm not an iWARP vendor. So you can disregard my notes -- I just thought @hppritcha might be excited about deleting even more code. 🎉 🎂 Should also get @larrystevenwise 's take on this PR, too.
|  | ||
| ############################################################################ | ||
|  | ||
| [Mellanox Arbel InfiniHost III MemFree/Tavor] | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably remove all the non-iWARP devices from this file.
| %s | ||
| # | ||
| [ini file:not in a section] | ||
| In parsing the OpenFabrics (openib) BTL parameter file, values were | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to s/openib/iwarp/i throughout this file (and adjust surrounding text as appropriate).
| will be deactivated for this run. | ||
|  | ||
| Valid queue pair types are "P" for per-peer and "S" for shared receive | ||
| queue. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does SRQ even exist in iWARP?
Should probably audit all the help messages in this file and delete any that are no longer relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SRQ is supported by Chelsio.
| /** | ||
| * This is struct is defined below | ||
| */ | ||
| struct opal_btl_iwarp_connect_base_module_t; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the whole CPC scheme still relevant? I.e., there's only 1 CPC left, right? If so, might be tremendously simpler (i.e., could snip out a whole lot of code) to just hard-code the function calls and delete all the CPC registration/etc. stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Assume and require librdmacm for connection setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to do big changes to cpc until I've got a platform for testing.
|  | ||
| #include "opal_config.h" | ||
|  | ||
| #include "connect/connect.h" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file relevant anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd like to leave this in place till I can test with a rdmacm only cpc - looks like some funny business after oob was removed.
| "ibv_modify_srq" function. If a return value of the function not | ||
| success => our decision that the device doesn't support this | ||
| capability. */ | ||
| static int check_if_device_support_modify_srq(mca_btl_iwarp_module_t *iwarp_btl) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is irrelevant if iWARP does not support SRQ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets leave SRQ support in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you leave SRQ support in, you might want to update the ini file to indicate which model(s) of Chelsio cards support it (I think all of them are marked as point-to-point RQs right now).
        
          
                opal/mca/btl/iwarp/btl_iwarp.c
              
                Outdated
          
        
      | attr.attr.max_sge = 1; | ||
| iwarp_btl->qps[qp].u.srq_qp.rd_posted = 0; | ||
| #if HAVE_XRC | ||
| if(BTL_IWARP_QP_TYPE_XRC(qp)) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will definitely never be XRC in this BTL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
        
          
                opal/mca/btl/iwarp/btl_iwarp.c
              
                Outdated
          
        
      |  | ||
| mca_btl_iwarp_transport_type_t mca_btl_iwarp_get_transport_type(mca_btl_iwarp_module_t* iwarp_btl) | ||
| { | ||
| /* If we have a driver with RDMAoE supporting as the device struct contains the same type (IB) for | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is likely no longer relevant.
        
          
                opal/mca/btl/iwarp/btl_iwarp.c
              
                Outdated
          
        
      | #endif | ||
|  | ||
| #if HAVE_XRC | ||
| /* if user configured to run with XRC qp and the device doesn't | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete all XRC. I'll stop commenting on. it -- I see XRC blocks in several other places in this file.
        
          
                opal/mca/btl/iwarp/btl_iwarp.c
              
                Outdated
          
        
      | mca_btl_iwarp_proc_t* ib_proc; | ||
|  | ||
| #if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE) | ||
| /* Most current iWARP adapters (June 2008) cannot handle | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make having IBV_DEVICE_TRANSPORT_TYPE be required for this BTL (i.e., remove the #if).
| SRQ should stay: T6 supports it, and the sw support is heading upstream now. | 
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
Somehow didn't remove all the XRC stuff before. Also remove option for IB RDMACM ADDR, since this isn't relevant for Iwarp devices. Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
| The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/e32b96c67ddb56d95ab1d7f407ce56b3 | 
git ignore the openib btl to make IBM compiler happy Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
remove dynamic service level code from iWarp btl. Not relevant. Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
| mellanox fails because we aren't building openib btl now. | 
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
iWarp only was supported on ancient OFED 1.2 and newer. Since that's all we're going to support from Open IB land going forward, bail in configuring if transport type field not present in IB device struct. Remove conditionals around iWarp btl that checks for transport type. Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
| The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/ba7b3cc07364959e948e8f48d07ac1ad | 
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
get rid of level_of_service field in the btl iwarp module struct, prune out dead code associated with not OPAL_HAVE_RDMACM. Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
| @artpol84 @jladd-mlnx Can you please adjust the Mellanox CI to handle the case when the openib BTL no longer exists / is not built? | 
| @jsquyres what is the best signature to detect that? OMPI version? or just check for the openib component file? | 
| @artpol84 Perhaps  | 
| @karasevb please address this ASAP. | 
| bot:mellanox:retest | 
| @hppritcha it looks like an issue in thi this PR | 
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
iWarp doesn't know what a pkey is - according to Steve Wise. Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
Get rid of LID's, APM, etc. from the iwarp BTL. Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
| The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/87b7d224a30c98962f3b707d96fee66b | 
| The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/960b6eec9713c56f8818002709ea2458 | 
| The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/c809dbde2fe449387f1a910aa2d43272 | 
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
| decided to not push this iwarp only solution for 4.0. closing. | 
| Hey @hppritcha, why did this get dropped? Thanks, Steve. | 
add an iwarp specific BTL component, just say no
to xrc, cuda support, etc. And only do rdmacm
based connection setup.
Signed-off-by: Howard Pritchard howardp@lanl.gov