-
Notifications
You must be signed in to change notification settings - Fork 225
Personal/pragyagandhi/az auth none change #555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
pragyagandhi
wants to merge
53
commits into
sahlberg:master
Choose a base branch
from
linuxsmiths:personal/pragyagandhi/AzAuthNoneChange
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Personal/pragyagandhi/az auth none change #555
pragyagandhi
wants to merge
53
commits into
sahlberg:master
from
linuxsmiths:personal/pragyagandhi/AzAuthNoneChange
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
These will be updated when we create the RPC request and when we receive the RPC response bytes, respectively. The req and resp size containg the RPC header, the NFS header and optional data bytes. User can query the req and resp bytes using the two newly added APIs rpc_pdu_get_req_size() rpc_pdu_get_resp_size()
…esp_size Personal/linuxsmiths/add req resp size
rpc_find_pdu() removes the pdu from the waitpdu list, this pdu is added to rpc->pdu. If a reconnect happens during this time, we were not correctly re-queuing this pdu to rpc->outqueue. This was causing application to forever wait for the callback.
…_requeue_rpc_pdu_on_reconnect Personal/linuxsmiths/correctly requeue rpc pdu on reconnect
…s_for_catching_bug_where_libnfs_misses_a_pdu Added instrumentatio to catch missing pdu in libnfs.
These are a result of some bugs found during stress runs. 1. LIBNFS_LIST_REMOVE() was not setting q->tail properly, so it was causing some PDUs to get dropped from outqueue. Application would never hear about these PDUs and would be kept indefinitely waiting. Added rpc_remove_pdu_from_queue() which correctly updates q->tail. 2. rpc_return_to_queue() was adding pdu to the head of outqueue. If we do it when we are resetting the connection it's ok, but o/w it may cause incorrect data to be sent out as the pdu at the head may have been half written. Once such case if when we bring back a pdu from waitpdu queue to outqueue after retransmit timeout. Renamed rpc_return_to_queue() to rpc_return_to_outqueue() to be more explicit and also rpc_return_to_outqueue() adds the pdu after outqueue.head and not at the end. This ensures that all callers can safely call this to return a pdu to outqueue for retransmit, w/o worrying about the half sent pdus. Additionally added useful asserts, w/o which it's very risky to run it blind.
…_in_requeuing_pdus Few fixes for correctly updating the pdu queues.
…reconnect) Requests that timeout in outqueue have not been sent to server and hence they do not signifiy any issue with the server. Yes they may indicate issue with the server indirectly since requests ahead of them did not get responses, but any corrective action is taken on behalf of those requests which have been sent to server. Also added a new stats member num_timedout_in_outqueue to convey this to application.
…int_server_not_responding_for_requests_in_outqueue Exclude requests timing out in outqueue from causing major recovery (…
…oid_checks Added paranoid checks to catch irregularities in pdu state/queue main…
…t_major_timeout_less_than_timeout When retrans is not set, do not set major_timeout less than timeout.
…ore_iovec_for_write_length One iovector was accounted less in rpc_nfs3_writev_task()
Currently we go ahead and try to read the data which causes it to fail and initiates a reconnect. Caller never gets a response for such failed READ pdus.
…ure_in_zero_copy_reads Handle failed NFS response for zero copy reads
This is to avoid huge number of WRITE (and READ) PDUs to hijack a connection causing commands like ls/stat/find/chmod to appear to hang for a long time, since WRITEs can take really long to be sent out due to TCP window getting full (if server is unable to process them fast enough). POSIX has compliance requirement only for completed requests. Since we are only reordering ongoing requests (not yet completed) it should not cause any violation. Moreover I cannot think of any application which will depend on any ordering guarantees of parallel requests. Let's monitor it and if it causes any issuesm revisit on a case-by-case basic.
…io_requests_to_head_of_outqueue Add non-IO RPC PDUs to head of outqueue
… outqueue. We add one more variable tailp to rpc_queue. This tracks the last high priority pdu, and allows us to maintain high priority and low priority pdus in the single outqueue, while maintaining order between pdus of the same priority. All requests except WRITEs are treated as high priority as they don't take much space in the socket queue and can be dispatched w/o much delay. Dispatching them fast lets them get processed at the server and overall we save time and also results in better user experience. Imagine queueing READ requests behind WRITEs and READs getting delayed as WRITEs cannot be sent due to server's TCP window filling up. OTOH if we send the READs with priority they can get serviced on the server in the time when WRITEs are still waiting to be sent, thus we can overall improve the number of reqs we can server per unit time.
…_prio_pdus_to_tailp_ahead_of_low_prio_pdus Added explicit APIs for adding pdus to low priority and high priority…
After adding iov_ref I forgot to update rpc_nfs4_readv_task(). Fixing that.
…_rpc_nfs4_readv_task Add missing initialization in rpc_nfs4_readv_task()
… be set for nfs service threads
…ervice_thread_start_allows_stack_size_to_be_set Introduce nfs_mt_service_thread_start_ss() for allowing stack size to…
(cherry picked from commit b2dc9a1)
…is defined ON platforms that support atomic_int type (which is what we care about), define multithreading_enabled as atomic. This prevents a TSAN warning in nfs_mt_service_thread()/nfs_mt_service_thread_start_ss() where a thread sets multithreading_enabled while main thread waits for it to be set.
…tithreading_enabled_atomic Define multithreading_enabled as atomic_int type if HAVE_STDATOMIC_H …
…to_rpc_context Personal/linuxsmiths/add tid to rpc context
…du for writing. We don't need to depend on poll() timeout to notice the to-be-written pdu.
…tfd_for_notifying_pollout Personal/linuxsmiths/use eventfd for notifying pollout
…to an empty outqueue/ This takes care of properly notifying service thread even when pdus are requeued.
* Adding azauth RPC * Updated * Updated * Updated * Updated * Update nfs.x * Update libnfs-raw-nfs.h * Update libnfs-raw-nfs.c * Update nfs.c * Update libnfs-raw-nfs.h * Update libnfs-raw-nfs.c --------- Co-authored-by: pragyagandhi <pragyagandhi@microsoft.com>
* Added azauth changes * Should not push * Updated * Updated * Updated * Audit fixes * More audit changes. * Remove AZAUTH RPC from outqueue on reconnect, to avoid dup AZAUTH RPCs sent to server. * more audit changes * review changes * review * Review changes * Updated * Updated * Temp commit to fix some known issues. * better comments and logs * Added some more useful asserts and logs/comments and final audit. * Added auth_token_cb_res to have auth_data * Removed setters for auth_cb_res * review changes --------- Co-authored-by: Nagendra Tomar <natomar@microsoft.com>
* Changed clientid to 16 bytes and auth_context * Addressed comments * Update * Update
getaddrinfo() can fail with a temp error in case of too many and too fast calls for name resolution. Retry with a wait. Also improved error logging with strerror for the case of socket open failure. Co-authored-by: Nagendra Tomar <natomar@microsoft.com>
…nstead of looping (#27) Looping is left for paranoid build. Co-authored-by: Nagendra Tomar <natomar@microsoft.com>
Default behavior is same as before, i.e., reconnect to the same address as was resolved the very first time we connected to the server, but user can choose the "resolve before reconnect" behaviour by calling nfs_set_resolve_on_reconnect(). With that set, everytime we have to reconnect we first resolve the server address and reconnect to that address instead of the originally resolved address. This enables support for NFS server migration where the NFS server address can change after mount. libnfs can simply reconnect to the new address and start exchanging RPCs with that. There is no need to perform mount with the new address. Co-authored-by: Nagendra Tomar <natomar@microsoft.com>
#29) Was the PDU retransmitted? i.e., did we send it to the server more than once? User can use this info to relax some errors, f.e., a REMOVE request that fails with NFS3ERR_NOENT can be treated as success if it was a retransmited request. Note that most applications will handle an unlink() call succeeding for a non-existent file better than unlink() call failing with NOENT for a file that was actually present. Co-authored-by: Nagendra Tomar <natomar@microsoft.com>
…ver (#30) Co-authored-by: Nagendra Tomar <natomar@microsoft.com>
* Macros change for large RPC on libnfs * Added a comment --------- Co-authored-by: Nagendra Tomar <natomar@microsoft.com>
…ing pollout Looks like the Windows TCP is shrinking the window (reneg'ing)
This conflicts with current master. |
Also, please rebase this ontop of current master. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.