update to CPU fraction auto-tuning algorithm #19

Open
wants to merge 4 commits into
base: main
22 changes: 22 additions & 0 deletions README.md
@@ -61,6 +61,28 @@ Following environment variables control the behavior of DTO library:
```
DTO_LOG_LEVEL=0/1/2 controls the log level. higher value means more verbose logging (default 0).
```

The following are some common usage models of DTO, though not the only ones:
Latency reduction - the goal is to minimize the latency of offloaded operations. Use the following settings:
DTO_AUTO_ADJUST_KNOBS=1 (the CPU fraction setting is critical in this mode; the optimal value changes dynamically, so the auto-tuning algorithm must be enabled)
DTO_WAIT_METHOD=busypoll

Power reduction - the goal is to reduce power by offloading memory operations to DSA, allowing the CPU core to enter a lower power state. This mode may reduce or increase the latency of operations, depending on the load on the DSA devices. Use the following settings:
DTO_AUTO_ADJUST_KNOBS=0
DTO_CPU_SIZE_FRACTION=0.0 (offload each operation entirely to DSA)
DTO_WAIT_METHOD=umwait

Cycle count reduction - the goal is to reduce CPU cycles by offloading memory operations to DSA. This mode may reduce or increase the latency of operations, depending on the load on the DSA devices and on interaction with the OS scheduler and other threads. The idea is to offload operations to DSA and let the OS schedule other work while DSA performs the operation. Use the following settings:
DTO_AUTO_ADJUST_KNOBS=0
DTO_CPU_SIZE_FRACTION=0.0 (offload each operation entirely to DSA)
DTO_WAIT_METHOD=yield

Avoiding cache pollution - the goal is to avoid polluting the cache with data from the given process. Use the following settings:
DTO_DSA_CC=0
DTO_AUTO_ADJUST_KNOBS=0
DTO_CPU_SIZE_FRACTION=0.0 (offload each operation entirely to DSA so that none of the data is pulled into the cache)
DTO_WAIT_METHOD=yield or umwait (saves either cycles or power, respectively)
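
The four usage models above can be captured as named presets. The following is a minimal sketch, not part of DTO: the preset names (`latency`, `power`, `cycles`, `cache`), the struct types, and `dto_apply_preset` are hypothetical helpers introduced for illustration; only the variable names and values come from the text above. It also assumes that DTO reads these variables when the library initializes, so a launcher would have to export them before starting the target program.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() under strict -std modes */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PRESET_MAX_VARS 5   /* at most 4 settings plus a NULL terminator */

/* One NAME=VALUE pair taken from the usage-model text. */
struct dto_env {
    const char *name;
    const char *value;
};

/* Hypothetical preset: a label plus its settings, ended by a NULL name. */
struct dto_preset {
    const char *model;
    struct dto_env vars[PRESET_MAX_VARS];
};

static const struct dto_preset dto_presets[] = {
    { "latency", { { "DTO_AUTO_ADJUST_KNOBS", "1" },
                   { "DTO_WAIT_METHOD", "busypoll" } } },
    { "power",   { { "DTO_AUTO_ADJUST_KNOBS", "0" },
                   { "DTO_CPU_SIZE_FRACTION", "0.0" },
                   { "DTO_WAIT_METHOD", "umwait" } } },
    { "cycles",  { { "DTO_AUTO_ADJUST_KNOBS", "0" },
                   { "DTO_CPU_SIZE_FRACTION", "0.0" },
                   { "DTO_WAIT_METHOD", "yield" } } },
    /* "cache" uses yield here; umwait is the documented alternative. */
    { "cache",   { { "DTO_DSA_CC", "0" },
                   { "DTO_AUTO_ADJUST_KNOBS", "0" },
                   { "DTO_CPU_SIZE_FRACTION", "0.0" },
                   { "DTO_WAIT_METHOD", "yield" } } },
};

/* Export the settings of `model` into this process's environment.
 * Returns 0 on success, -1 if the model is unknown or setenv() fails. */
static int dto_apply_preset(const char *model)
{
    size_t n = sizeof(dto_presets) / sizeof(dto_presets[0]);

    for (size_t i = 0; i < n; i++) {
        const struct dto_preset *p = &dto_presets[i];

        if (strcmp(p->model, model) != 0)
            continue;
        for (size_t j = 0; j < PRESET_MAX_VARS && p->vars[j].name; j++) {
            if (setenv(p->vars[j].name, p->vars[j].value, 1) != 0)
                return -1;
        }
        return 0;
    }
    return -1;
}
```

A small launcher could call `dto_apply_preset(argv[1])` and then `execvp()` the real workload, so the child process inherits the settings. Switching between presets overwrites shared variables but does not unset ones that only the previous preset defined (for example `DTO_DSA_CC`).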


## Build

Pre-requisite packages:
9 changes: 8 additions & 1 deletion dto.c
@@ -44,7 +44,7 @@
*/
#define MAX_WQS 32
#define MAX_NUMA_NODES 32
-#define DTO_DEFAULT_MIN_SIZE 16384
+#define DTO_DEFAULT_MIN_SIZE 65536
Contributor:

I set it to 32K. Should we pick 48K as the middle? If not, I am fine with 64K.

Contributor (author):

Let's set it to 64K. There are cases where even 64K is marginal.
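
To make the effect of the threshold concrete, here is a minimal sketch of how a size threshold and a CPU fraction could interact. It rests on two assumptions that this diff does not confirm: that `DTO_DEFAULT_MIN_SIZE` is the smallest operation size considered for offload, and that the CPU fraction is the share of bytes the CPU handles while DSA handles the rest. `SKETCH_MIN_SIZE`, `struct split`, and `plan_split` are hypothetical names, not dto.c code.

```c
#include <stddef.h>

/* Mirrors the new default in this PR; the name is a stand-in. */
#define SKETCH_MIN_SIZE 65536

/* How many bytes each engine would handle for one operation. */
struct split {
    size_t cpu_bytes;
    size_t dsa_bytes;
};

/* Assumed policy: operations below min_size stay entirely on the CPU;
 * larger ones give cpu_fraction of the bytes to the CPU and the
 * remainder to DSA. */
static struct split plan_split(size_t n, size_t min_size, double cpu_fraction)
{
    struct split s = { n, 0 };

    if (n < min_size)
        return s;                 /* too small to be worth offloading */

    /* Clamp the fraction to [0, 1] so the subtraction cannot underflow. */
    if (cpu_fraction < 0.0)
        cpu_fraction = 0.0;
    if (cpu_fraction > 1.0)
        cpu_fraction = 1.0;

    s.cpu_bytes = (size_t)((double)n * cpu_fraction);
    s.dsa_bytes = n - s.cpu_bytes;
    return s;
}
```

Under these assumptions, a 16 KiB copy that was offloaded with the old 16384 default stays on the CPU with the 64K default, while a 128 KiB copy with a fraction of 0.25 sends 32 KiB to the CPU and 96 KiB to DSA.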

#define DTO_INITIALIZED 0
#define DTO_INITIALIZING 1

@@ -429,6 +429,13 @@ static __always_inline void dsa_wait_and_adjust(const volatile uint8_t *comp)
__dsa_wait(comp);
local_num_waits++;
}

// Operations that fail (mostly due to page faults) return very quickly, which makes the algorithm
// think the DSA operation was faster than it really was. Exclude them from the calculation.
if (*comp != DSA_COMP_SUCCESS) {
	return;
Contributor:

Minor: just fix the spacing.

Contributor (author):

Done.

}

adjust_num_descs++;
adjust_num_waits += local_num_waits;
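
The reason for the early return can be seen in a small standalone sketch. It assumes the tuner looks at the average number of waits per descriptor, which this hunk suggests but does not show; the status constants, `struct tune_stats`, `tune_record`, and `tune_avg_waits` are hypothetical stand-ins rather than dto.c code.

```c
#include <stdint.h>

/* Placeholder status codes; the real values come from the DSA headers. */
enum {
    SKETCH_COMP_SUCCESS = 1,
    SKETCH_COMP_PAGE_FAULT = 3
};

/* Running totals, analogous to adjust_num_descs / adjust_num_waits. */
struct tune_stats {
    uint64_t num_descs;
    uint64_t num_waits;
};

/* Record one completed descriptor. Failed completions are skipped so a
 * fast failure does not look like a fast DSA operation.
 * Returns 1 if the sample was counted, 0 if it was excluded. */
static int tune_record(struct tune_stats *s, uint8_t status, uint64_t waits)
{
    if (status != SKETCH_COMP_SUCCESS)
        return 0;
    s->num_descs++;
    s->num_waits += waits;
    return 1;
}

/* Average waits per counted descriptor; 0.0 when nothing was counted. */
static double tune_avg_waits(const struct tune_stats *s)
{
    if (s->num_descs == 0)
        return 0.0;
    return (double)s->num_waits / (double)s->num_descs;
}
```

With two successful operations that waited 10 and 20 times and one page-faulted operation that waited 0 times, the filtered average is 15. Counting the failure would pull it down to 10 and bias the tuner toward offloading more than the devices can actually sustain.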
