Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CART-688: New PSR query implementation. Skip-checkpatch: true Skip-build: true #2890

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

vschhabra
Copy link
Contributor

No description provided.

Skip-checkpatch: true
Skip-build: true

Signed-off-by: Vikram Chhabra <vikram.chhabra@intel.com>
{
D_RWLOCK_WRLOCK(&grp_priv->gp_rwlock);
// TBD D_FREE(grp_priv->gp_psr_phy_addr);
grp_priv->gp_psr_rank_list = psr_rank_list;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list passed needs to be copied, cannot rely on user-provided list to remain valid beyond original API call

Comment on lines 830 to 835
if (rpc_priv->crp_psr_state == PSR_STATE_INITED) {
/* Send lookup to rank:tag=0 */
rpc_priv->crp_psr_state = PSR_STATE_LOOKUP_RANKTAG0;
rank = tgt_ep->ep_rank;
tag = 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can only do this block if rank=tgt_ep->ep_rank, tag=0 is in the local cache. i dont see where you check this

"0. If rank:tag=0 URI is present in cache, but rank:tag=X (where X is requested rank) is not, URI lookup is sent to rank:tag=0. "

While rank=X tag=0 is guaranteed to be known by all servers, there is no such guarantee for clients, as client might only know about psrs and nothing else initially

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it goes into lookup_by_rpc, it will create a new request, which will go into crt_req_send and first check the local cache.

Comment on lines 840 to 844
membs = grp_priv_get_membs(grp_priv);
idx = random() % grp_priv->gp_size;

//TBD check return value.
d_rank_list_idx(membs, &rank, idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here, while you picked the random member, you might not have URI cached yet for that random member when this is executed on the client side.

In general for all the cases you need to have such checks and fallbacks.

The way i see it the logic needs to be something like this:
Based on cart-688 you select which server=X you are going to use to lookup requested rank=rA tag=tA.
After that:

Whenever you are going to call server X, tag=0 to lookup uri for rank=rA tag=tA then:
if server X tag=0 is not in uri cache then you first need to issue rpc to one of PSRs to lookup server X tag=0 uri:
- issue rpc to PSR to lookup server X tag=0. As part of rpc you provide completion callback and arguments.
- in a completion callback of this RPC, you populate cache for server X tag=0, and from there you issue another uri lookup RPC, this time RPC going to server rank=X tag=0 to lookup value of rank=rA, tag=tA.
- in a completion callback of this RPC (to lookup uri for rank=rA, tag=tA that was sent to server rank=X tag=0), you (if successful) populate local cache for rank=rA tag=tA, and finally issue the original RPC that was requested by the user.

So a callgraph in this secnario is something like:
User rpc attempt to (rA, tA) -> pick server X, tag=0 -> Lookup URI for server X, tag=0 from PSR -> from completion handler -> Lookup URI for server rA, tA from serverX tag=0 -> from completion handler issue original user rpc to rA,tA

Such handling/chaining of multiple RPCs is the main thing i see missing right now.

Today the code has only simple chain support where:
user wants to send rpc to (rA, tA) -> we issue PSR request to uri lookup (rA, tA) -> in completion callback we issue original RPC to (rA, tA).

So today in completion callback for uri lookup, it is assumed that once that is successful we can issue rpc to the user selected endpoint/destination. This is not going to be the case though at all the times as shown in example above.

Skip-build: true
Skip-test: true

Signed-off-by: Vikram Chhabra <vikram.chhabra@intel.com>
@mchaarawi
Copy link
Contributor

@frostedcmos do we still need this PR? if not could you please close and delete the branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants