Skip to content

Conversation

@hppritcha
Copy link
Member

add an iwarp specific BTL component, just say no
to xrc, cuda support, etc. And only do rdmacm
based connection setup.

Signed-off-by: Howard Pritchard howardp@lanl.gov

add an iwarp specific BTL component, just say no
to xrc, cuda support, etc.  And only do rdmacm
based connection setup.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@ibm-ompi
Copy link

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/bafb53823a9988fd54e8b13a1394a0db

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a zillion comments, some of which are quite likely redundant / might have been mentioned multiple times. Sorry about that. 😦

In general:

  • There's still a bunch of non-iWARP-specific stuff in here. E.g., I see a bunch of references to XRC.
  • I also see references to things that I don't think are relevant to iWARP, but I'm not 100% sure, such as: APM, LMC, BTLs per LID, PKEY, atomics, SRQ, etc. All the ones that are not relevant should be removed from this BTL (code, MCA params, comments, etc.).
  • Additionally, if some of the above are not relevant, then a LOT of code can be deleted / simiplified (e.g., if there's no XRC and no SRQ, then there is only one kind of QP: PP -- so all the stuff about parsing different types of QPs in MCA params, setting up different types of QPs, checking for QP type compatibility, ...etc. -- all that stuff can just be deleted). There's a few different cases of that (e.g., APM falls in the same category).
  • There should only be one CPC left: RDMA CM. As such, the whole CPC mechanism should likely be deleted (there's quite a bit of infrastructure in place for the CPC) and it should be replaced with simple function calls to the RDMA CM CPC. That should end up being a pretty big chunk of code that can be deleted.

This is my $0.02, but I'm not an iWARP vendor. So you can disregard my notes -- I just thought @hppritcha might be excited about deleting even more code. 🎉 🎂 Should also get @larrystevenwise 's take on this PR, too.


############################################################################

[Mellanox Arbel InfiniHost III MemFree/Tavor]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably remove all the non-iWARP devices from this file.

%s
#
[ini file:not in a section]
In parsing the OpenFabrics (openib) BTL parameter file, values were
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to s/openib/iwarp/i throughout this file (and adjust surrounding text as appropriate).

will be deactivated for this run.

Valid queue pair types are "P" for per-peer and "S" for shared receive
queue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does SRQ even exist in iWARP?

Should probably audit all the help messages in this file and delete any that are no longer relevant.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SRQ is supported by Chelsio.

/**
* This is struct is defined below
*/
struct opal_btl_iwarp_connect_base_module_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the whole CPC scheme still relevant? I.e., there's only 1 CPC left, right? If so, might be tremendously simpler (i.e., could snip out a whole lot of code) to just hard-code the function calls and delete all the CPC registration/etc. stuff.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Assume and require librdmacm for connection setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to do big changes to cpc until I've got a platform for testing.


#include "opal_config.h"

#include "connect/connect.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file relevant anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd like to leave this in place till I can test with a rdmacm only cpc - looks like some funny business after oob was removed.

"ibv_modify_srq" function. If a return value of the function not
success => our decision that the device doesn't support this
capability. */
static int check_if_device_support_modify_srq(mca_btl_iwarp_module_t *iwarp_btl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is irrelevant if iWARP does not support SRQ.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets leave SRQ support in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you leave SRQ support in, you might want to update the ini file to indicate which model(s) of Chelsio cards support it (I think all of them are marked as point-to-point RQs right now).

attr.attr.max_sge = 1;
iwarp_btl->qps[qp].u.srq_qp.rd_posted = 0;
#if HAVE_XRC
if(BTL_IWARP_QP_TYPE_XRC(qp)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will definitely never be XRC in this BTL.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.


mca_btl_iwarp_transport_type_t mca_btl_iwarp_get_transport_type(mca_btl_iwarp_module_t* iwarp_btl)
{
/* If we have a driver with RDMAoE supporting as the device struct contains the same type (IB) for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is likely no longer relevant.

#endif

#if HAVE_XRC
/* if user configured to run with XRC qp and the device doesn't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete all XRC. I'll stop commenting on. it -- I see XRC blocks in several other places in this file.

mca_btl_iwarp_proc_t* ib_proc;

#if defined(HAVE_STRUCT_IBV_DEVICE_TRANSPORT_TYPE)
/* Most current iWARP adapters (June 2008) cannot handle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make having IBV_DEVICE_TRANSPORT_TYPE be required for this BTL (i.e., remove the #if).

@larrystevenwise
Copy link

SRQ should stay: T6 supports it, and the sw support is heading upstream now.

hppritcha added 2 commits May 28, 2018 09:59
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
Somehow didn't remove all the XRC stuff before.
Also remove option for IB RDMACM ADDR, since this
isn't relevant for Iwarp devices.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
@ibm-ompi
Copy link

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/e32b96c67ddb56d95ab1d7f407ce56b3

hppritcha added 2 commits May 28, 2018 11:22
git ignore the openib btl to make IBM compiler happy

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
remove dynamic service level code from iWarp btl.
Not relevant.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
@hppritcha
Copy link
Member Author

mellanox fails because we aren't building openib btl now.

hppritcha added 2 commits May 28, 2018 12:45
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
iWarp only was supported on ancient OFED 1.2 and newer.
Since that's all we're going to support from Open IB land
going forward, bail in configuring if transport type
field not present in IB device struct.  Remove
conditionals around iWarp btl that checks for transport
type.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
@ibm-ompi
Copy link

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/ba7b3cc07364959e948e8f48d07ac1ad

hppritcha added 2 commits May 28, 2018 13:40
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
get rid of level_of_service field in the btl iwarp
module struct, prune out dead code associated with
not OPAL_HAVE_RDMACM.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
@jsquyres
Copy link
Member

@artpol84 @jladd-mlnx Can you please adjust the Mellanox CI to handle the case when the openib BTL no longer exists / is not built?

@artpol84
Copy link
Contributor

artpol84 commented Jun 6, 2018

@jsquyres what is the best signature to detect that? OMPI version? or just check for the openib component file?

@jsquyres
Copy link
Member

jsquyres commented Jun 6, 2018

@artpol84 Perhaps ompi_info | grep openib ?

@artpol84
Copy link
Contributor

artpol84 commented Jun 8, 2018

@karasevb please address this ASAP.

@karasevb
Copy link
Member

karasevb commented Jun 8, 2018

bot:mellanox:retest

@artpol84
Copy link
Contributor

artpol84 commented Jun 8, 2018

@hppritcha it looks like an issue in thi this PR

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
iWarp doesn't know what a pkey is - according to
Steve Wise.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
Get rid of LID's, APM, etc. from the iwarp
BTL.

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
@ibm-ompi
Copy link

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/87b7d224a30c98962f3b707d96fee66b

@ibm-ompi
Copy link

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/960b6eec9713c56f8818002709ea2458

@ibm-ompi
Copy link

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/c809dbde2fe449387f1a910aa2d43272

Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
@hppritcha
Copy link
Member Author

decided to not push this iwarp only solution for 4.0. closing.

@hppritcha hppritcha closed this Jul 9, 2018
@larrystevenwise
Copy link

Hey @hppritcha, why did this get dropped?

Thanks,

Steve.

@hppritcha hppritcha deleted the topic/iwarp_btl branch October 29, 2021 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants