-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CART-688: New PSR query implementation. Skip-checkpatch: true Skip-build: true #2890
base: master
Are you sure you want to change the base?
Conversation
Skip-checkpatch: true Skip-build: true Signed-off-by: Vikram Chhabra <vikram.chhabra@intel.com>
src/cart/src/cart/crt_group.h
Outdated
{ | ||
D_RWLOCK_WRLOCK(&grp_priv->gp_rwlock); | ||
// TBD D_FREE(grp_priv->gp_psr_phy_addr); | ||
grp_priv->gp_psr_rank_list = psr_rank_list; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list passed needs to be copied, cannot rely on user-provided list to remain valid beyond original API call
src/cart/src/cart/crt_rpc.c
Outdated
if (rpc_priv->crp_psr_state == PSR_STATE_INITED) { | ||
/* Send lookup to rank:tag=0 */ | ||
rpc_priv->crp_psr_state = PSR_STATE_LOOKUP_RANKTAG0; | ||
rank = tgt_ep->ep_rank; | ||
tag = 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can only do this block if rank=tgt_ep->ep_rank, tag=0 is in the local cache. i dont see where you check this
"0. If rank:tag=0 URI is present in cache, but rank:tag=X (where X is requested rank) is not, URI lookup is sent to rank:tag=0. "
While rank=X tag=0 is guaranteed to be known by all servers, there is no such guarantee for clients, as client might only know about psrs and nothing else initially
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it goes into lookup_by_rpc, it will create a new request, which will go into crt_req_send and first check the local cache.
src/cart/src/cart/crt_rpc.c
Outdated
membs = grp_priv_get_membs(grp_priv); | ||
idx = random() % grp_priv->gp_size; | ||
|
||
//TBD check return value. | ||
d_rank_list_idx(membs, &rank, idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment here, while you picked the random member, you might not have URI cached yet for that random member when this is executed on the client side.
In general for all the cases you need to have such checks and fallbacks.
The way i see it the logic needs to be something like this:
Based on cart-688 you select which server=X you are going to use to lookup requested rank=rA tag=tA.
After that:
Whenever you are going to call server X, tag=0 to lookup uri for rank=rA tag=tA then:
if server X tag=0 is not in uri cache then you first need to issue rpc to one of PSRs to lookup server X tag=0 uri:
- issue rpc to PSR to lookup server X tag=0. As part of rpc you provide completion callback and arguments.
- in a completion callback of this RPC, you populate cache for server X tag=0, and from there you issue another uri lookup RPC, this time RPC going to server rank=X tag=0 to lookup value of rank=rA, tag=tA.
- in a completion callback of this RPC (to lookup uri for rank=rA, tag=tA that was sent to server rank=X tag=0), you (if successful) populate local cache for rank=rA tag=tA, and finally issue the original RPC that was requested by the user.
So a callgraph in this secnario is something like:
User rpc attempt to (rA, tA) -> pick server X, tag=0 -> Lookup URI for server X, tag=0 from PSR -> from completion handler -> Lookup URI for server rA, tA from serverX tag=0 -> from completion handler issue original user rpc to rA,tA
Such handling/chaining of multiple RPCs is the main thing i see missing right now.
Today the code has only simple chain support where:
user wants to send rpc to (rA, tA) -> we issue PSR request to uri lookup (rA, tA) -> in completion callback we issue original RPC to (rA, tA).
So today in completion callback for uri lookup, it is assumed that once that is successful we can issue rpc to the user selected endpoint/destination. This is not going to be the case though at all the times as shown in example above.
Skip-build: true Skip-test: true Signed-off-by: Vikram Chhabra <vikram.chhabra@intel.com>
@frostedcmos do we still need this PR? if not could you please close and delete the branch? |
No description provided.