Update estat iscsi, zvol, and zpl scripts. #55


Status: Merged (1 commit), Feb 3, 2021
7 changes: 3 additions & 4 deletions bpf/estat/iscsi.c
@@ -19,7 +19,6 @@

 typedef struct {
 	u64 ts;
-	u64 flags;
 	u64 size;
 } iscsi_data_t;

@@ -33,7 +32,6 @@ iscsi_target_start(struct pt_regs *ctx, struct iscsi_conn *conn,
 {
 	iscsi_data_t data = {};
 	data.ts = bpf_ktime_get_ns();
-	data.flags = hdr->flags;
 	data.size = hdr->data_length;
 	iscsi_base_data.update((u64 *) &cmd, &data);
@@ -56,6 +54,7 @@ aggregate_data(iscsi_data_t *data, u64 ts, char *opstr)
 }

 // @@ kprobe|iscsit_build_rsp_pdu|iscsi_target_end
+// @@ kprobe|iscsit_build_datain_pdu|iscsi_target_end
 int
 iscsi_target_end(struct pt_regs *ctx, struct iscsi_cmd *cmd)
 {
@@ -67,9 +66,9 @@ iscsi_target_end(struct pt_regs *ctx, struct iscsi_cmd *cmd)
 		return (0); // missed issue
 	}

-	if (data->flags & ISCSI_FLAG_CMD_READ) {
+	if (cmd->data_direction == DMA_FROM_DEVICE) {
 		aggregate_data(data, ts, READ_STR);
-	} else if (data->flags & ISCSI_FLAG_CMD_WRITE) {
+	} else if (cmd->data_direction & DMA_TO_DEVICE) {
 		aggregate_data(data, ts, WRITE_STR);
 	}
 	iscsi_base_data.delete((u64 *) &cmd);
38 changes: 19 additions & 19 deletions bpf/estat/zpl.c
@@ -27,6 +27,10 @@ BPF_HASH(io_info_map, u32, io_info_t);
 #else
 #define	POOL (OPTARG)
 #endif
+#define	ZFS_READ_SYNC_LENGTH 14
+#define	ZFS_READ_ASYNC_LENGTH 15
+#define	ZFS_WRITE_SYNC_LENGTH 15
+#define	ZFS_WRITE_ASYNC_LENGTH 16

 // TODO factor this out into a helper so that it isn't duplicated
 static inline bool
@@ -42,7 +46,7 @@ equal_to_pool(char *str)
 }

 static inline int
-zfs_read_write_entry(io_info_t *info, struct inode *ip, uio_t *uio, int flags)
+zfs_read_write_entry(io_info_t *info, struct inode *ip, zfs_uio_t *uio, int flags)
Review thread on the uio_t to zfs_uio_t signature change:

Contributor: Does this change work for all previous kernel versions? E.g., in the case where a customer upgrades but continues running a kernel from a prior release (i.e., no reboot), would this script continue to work?

Contributor (Author): No, that would be a problem. We could deliver separate C scripts for each kernel version and then have the estat python script run the correct one. Is that a problem we want to solve?

Contributor: > Is that a problem we want to solve?

I'd like to defer this question to the team. Today, AFAIK, we do support deferred/non-reboot upgrades from any prior 6.0-based release to the latest one. So it's possible that we could have a system running the kernel bits from the 6.0.0.0 release but the userland bits from the most recent 6.0.6.0 release. In that case, I think these scripts would no longer work, since by design we only (currently) support running the scripts on the matching kernel for that release, right? So we'd need to decide as a team whether it's OK for the scripts not to work in such a scenario. If the current architecture of these scripts is to work only on the kernel version of the matching release (e.g., 6.0.6.0 scripts only work with the 6.0.6.0 kernel and modules), then I'll approve this, since it's consistent with our existing design decisions, even though I feel that design is lacking and prone to failure on deferred upgrades.

Contributor (Author): I agree, the current perf-diag design does not address deferred upgrade. It is something we should give some thought to. Changes to stbtrace scripts could be more problematic since they are used in analytics. It does seem to be outside the scope of this PR, though. At a minimum, the scripts should run on matching kernel versions.

Contributor: Brad, from my understanding those scripts are only run manually by support, is that right? If that's the case, then having them work across a deferred upgrade is probably not a P1, but we should definitely create a bug and allocate time for it.

Contributor: While this specific script may only be run by support, as Brad mentioned, there are other scripts that are used by the product (for analytics) and may suffer from this same problem on a deferred upgrade.

Contributor (Author): Seb mentioned to me that we already have a bug tracking this. I was unsuccessful in finding it, but I'll make sure there is a Jira issue.

Contributor: Yeah, for the other scripts that are used by analytics, I think supporting deferred upgrade is a must.
 {
 	// Essentially ITOZSB, but written explicitly so that BCC can insert
 	// the necessary calls to bpf_probe_read.
@@ -68,7 +72,7 @@ zfs_read_write_entry(io_info_t *info, struct inode *ip, uio_t *uio, int flags)

 // @@ kprobe|zfs_read|zfs_read_entry
 int
-zfs_read_entry(struct pt_regs *ctx, struct inode *ip, uio_t *uio, int flags)
+zfs_read_entry(struct pt_regs *ctx, struct inode *ip, zfs_uio_t *uio, int flags)
 {
 	io_info_t info = {};
 	info.is_write = false;
@@ -77,7 +81,7 @@ zfs_read_entry(struct pt_regs *ctx, struct inode *ip, uio_t *uio, int flags)

 // @@ kprobe|zfs_write|zfs_write_entry
 int
-zfs_write_entry(struct pt_regs *ctx, struct inode *ip, uio_t *uio, int flags)
+zfs_write_entry(struct pt_regs *ctx, struct inode *ip, zfs_uio_t *uio, int flags)
 {
 	io_info_t info = {};
 	info.is_write = true;
@@ -87,7 +91,7 @@ zfs_write_entry(struct pt_regs *ctx, struct inode *ip, uio_t *uio, int flags)
 // @@ kretprobe|zfs_read|zfs_read_write_exit
 // @@ kretprobe|zfs_write|zfs_read_write_exit
 int
-zfs_read_write_exit(struct pt_regs *ctx, struct inode *ip, uio_t *uio)
+zfs_read_write_exit(struct pt_regs *ctx, struct inode *ip, zfs_uio_t *uio)
 {
 	u32 tid = bpf_get_current_pid_tgid();
 	io_info_t *info = io_info_map.lookup(&tid);
@@ -97,24 +101,20 @@ zfs_read_write_exit(struct pt_regs *ctx, struct inode *ip, uio_t *uio)

 	u64 delta = bpf_ktime_get_ns() - info->start_time;

-	char name[32];
-	int offset;
+	char name[16];
 	if (info->is_write) {
-		const char s[] = "zfs_write";
-		__builtin_memcpy(&name, s, sizeof (s));
-		offset = sizeof (s) - 1;
+		if (info->is_sync) {
+			__builtin_memcpy(name, "zfs_write sync", ZFS_WRITE_SYNC_LENGTH);
+		} else {
+			__builtin_memcpy(name, "zfs_write async", ZFS_WRITE_ASYNC_LENGTH);
+		}
 	} else {
-		const char s[] = "zfs_read";
-		__builtin_memcpy(&name, s, sizeof (s));
-		offset = sizeof (s) - 1;
-	}
-
-	if (info->is_sync) {
-		const char s[] = " sync";
-		__builtin_memcpy(name + offset, s, sizeof (s));
-	} else {
-		const char s[] = " async";
-		__builtin_memcpy(name + offset, s, sizeof (s));
+		if (info->is_sync) {
+			__builtin_memcpy(name, "zfs_read sync", ZFS_READ_SYNC_LENGTH);
+		} else {
+			__builtin_memcpy(name, "zfs_read async", ZFS_READ_ASYNC_LENGTH);
+		}
 	}

 	char axis = 0;
9 changes: 6 additions & 3 deletions bpf/estat/zvol.c
@@ -11,16 +11,19 @@
 #include <sys/zil_impl.h>
 #include <sys/zfs_rlock.h>
 #include <sys/spa_impl.h>
+#include <sys/dataset_kstats.h>
 #include <sys/zvol_impl.h>


 #define	ZVOL_WCE 0x8
 #define	ZVOL_READ 1
 #define	ZVOL_WRITE 2
 #define	NAME_LENGTH 6
-#define	AXIS_LENGTH 5
+#define	AXIS_LENGTH 6
 #define	READ_LENGTH 5
 #define	WRITE_LENGTH 6
+#define	SYNC_LENGTH 5
+#define	ASYNC_LENGTH 6

 #ifndef OPTARG
 #define	POOL "domain0"
@@ -116,10 +119,10 @@ zvol_return(struct pt_regs *ctx)
 		__builtin_memcpy(&name, "read", READ_LENGTH);
 	} else if (sync) {
 		__builtin_memcpy(&name, "write", WRITE_LENGTH);
-		__builtin_memcpy(&axis, "sync", WRITE_LENGTH);
+		__builtin_memcpy(&axis, "sync", SYNC_LENGTH);
 	} else {
 		__builtin_memcpy(&name, "write", WRITE_LENGTH);
-		__builtin_memcpy(&axis, "async", WRITE_LENGTH);
+		__builtin_memcpy(&axis, "async", ASYNC_LENGTH);
 	}
 	AGGREGATE_DATA(name, axis, delta, data->bytes);
 	zvol_base_data.delete(&pid);