forked from torvalds/linux
Commit
Tom Herbert says:

====================
strp: Stream parser for messages

This patch set introduces a utility for parsing application layer protocol messages in a TCP stream. This is a generalization of the mechanism implemented for Kernel Connection Multiplexor (KCM). This patch set adapts KCM to use the strparser. We expect that kTLS can use this mechanism also. RDS would probably be another candidate to use a common stream parsing mechanism.

The API includes a context structure, a set of callbacks, utility functions, and a data ready function. The callbacks include a parse_msg function that is called to perform parsing (e.g. BPF parsing in case of KCM), and a rcv_msg function that is called when a full message has been completed.

For strparser we specify the return codes from the parser to allow the backend to indicate that control of the socket should be transferred back to userspace to handle some exceptions in the stream. The return values are:

    >0 : indicates length of successfully parsed message
    0 : indicates more data must be received to parse the message
    -ESTRPIPE : current message should not be processed by the kernel, return control of the socket to userspace which can proceed to read the messages itself
    other < 0 : Error in parsing, give control back to userspace assuming that synchronization is lost and the stream is unrecoverable (application expected to close TCP socket)

There is one issue I haven't been able to fully resolve. If parse_msg returns -ESTRPIPE (wants control back to userspace) the parser may already have consumed some bytes of the message. There is no way to put bytes back into the TCP receive queue and tcp_read_sock does not allow an easy way to peek messages. In lieu of a better solution, we return ENODATA on the socket to indicate that the data stream is unrecoverable (application needs to close socket). This condition should only happen if an application layer message header is split across two skbuffs and parsing just the first skbuff wasn't sufficient to determine that transfer to userspace is needed.

This patch set contains:

  - strparser implementation
  - changes to kcm to use strparser
  - strparser.txt documentation

v2:
  - Add copyright notice to C files
  - Remove GPL module license from strparser.c
  - Add report of rxpause

v3:
  - Restore GPL module license
  - Use EXPORT_SYMBOL_GPL

v4:
  - Removed unused function, changed another to be static as suggested by davem
  - Reworked data_ready to be called from upper layer, no longer requires taking over socket data_ready callback as suggested by Lance Chao

Tested:
  - Ran a KCM thrash test for 24 hours. No behavioral or performance differences observed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Showing 12 changed files with 896 additions and 423 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,136 @@ | ||
Stream Parser | ||
------------- | ||
|
||
The stream parser (strparser) is a utility that parses messages of an | ||
application layer protocol running over a TCP connection. The stream | ||
parser works in conjunction with an upper layer in the kernel to provide | ||
kernel support for application layer messages. For instance, Kernel | ||
Connection Multiplexor (KCM) uses the Stream Parser to parse messages | ||
using a BPF program. | ||
|
||
Interface | ||
--------- | ||
|
||
The API includes a context structure, a set of callbacks, utility | ||
functions, and a data_ready function. The callbacks include | ||
a parse_msg function that is called to perform parsing (e.g. | ||
BPF parsing in case of KCM), and a rcv_msg function that is called | ||
when a full message has been completed. | ||
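The split of work between parse_msg and rcv_msg can be illustrated with a
small userspace model. This is only a sketch and not kernel code: the names
demo_feed, demo_parse_len8 and demo_count are hypothetical, the one-byte
length prefix is an assumed framing, and a flat buffer stands in for the
chain of skbuffs that the real stream parser assembles.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model of parse_msg: >0 message length, 0 need more data, <0 error. */
typedef int (*demo_parse_fn)(const unsigned char *buf, size_t avail);
/* Model of rcv_msg: called once for every complete message. */
typedef void (*demo_rcv_fn)(const unsigned char *msg, size_t len, void *arg);

/* Hypothetical framing: the first byte holds the total message length. */
static int demo_parse_len8(const unsigned char *buf, size_t avail)
{
	if (avail < 1)
		return 0;		/* header not yet available */
	if (buf[0] == 0)
		return -EPROTO;		/* a zero length can never be valid */
	return buf[0];
}

/* Example receiver: counts delivered messages in the int behind arg. */
static void demo_count(const unsigned char *msg, size_t len, void *arg)
{
	(void)msg;
	(void)len;
	++*(int *)arg;
}

/*
 * Deliver every complete message found in buf. Returns the number of
 * bytes consumed, or the negative value returned by parse. Bytes of a
 * trailing partial message are left unconsumed.
 */
static long demo_feed(const unsigned char *buf, size_t len,
		      demo_parse_fn parse, demo_rcv_fn rcv, void *arg)
{
	size_t off = 0;

	assert(buf != NULL && parse != NULL && rcv != NULL);

	while (off < len) {
		int mlen = parse(buf + off, len - off);

		if (mlen < 0)
			return mlen;	/* parse error: stop processing */
		if (mlen == 0 || (size_t)mlen > len - off)
			break;		/* message is not complete yet */

		rcv(buf + off, (size_t)mlen, arg);
		off += (size_t)mlen;
	}

	return (long)off;
}
```

Feeding the seven bytes {3,'a','b', 2,'c', 4,'d'} through demo_feed delivers
two messages and consumes five bytes; the last two bytes belong to a message
that is still incomplete.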
|
||
A stream parser can be instantiated for a TCP connection. This is done | ||
by: | ||
|
||
strp_init(struct strparser *strp, struct sock *csk, | ||
struct strp_callbacks *cb) | ||
|
||
strp is a struct of type strparser that is allocated by the upper layer. | ||
csk is the TCP socket associated with the stream parser. cb is the set | ||
of callbacks that the stream parser invokes. | ||
|
||
Callbacks | ||
--------- | ||
|
||
There are four callbacks: | ||
|
||
int (*parse_msg)(struct strparser *strp, struct sk_buff *skb); | ||
|
||
parse_msg is called to determine the length of the next message | ||
in the stream. The upper layer must implement this function. It | ||
should parse the sk_buff as containing the headers for the | ||
next application layer message in the stream. | ||
|
||
The skb->cb in the input skb is a struct strp_rx_msg. Only | ||
the offset field is relevant in parse_msg and gives the offset | ||
where the message starts in the skb. | ||
|
||
The return values of this function are: | ||
|
||
>0 : indicates length of successfully parsed message | ||
0 : indicates more data must be received to parse the message | ||
-ESTRPIPE : current message should not be processed by the | ||
kernel, return control of the socket to userspace which | ||
can proceed to read the messages itself | ||
other < 0 : Error in parsing, give control back to userspace | ||
assuming that synchronization is lost and the stream | ||
is unrecoverable (application expected to close TCP socket) | ||
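As an illustration of this return convention, the following userspace
sketch models a parse_msg callback for a made-up protocol. Everything
protocol specific here is an assumption: a 4-byte big-endian length header
that counts the whole message, a reserved length value (DEMO_ESCAPE) that
asks for the socket to be handed back to userspace, and -EBADMSG as the
error for a malformed header. A real callback receives an sk_buff rather
than a flat buffer.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86	/* value used by Linux; fallback for other systems */
#endif

#define DEMO_HDR_LEN	4u
#define DEMO_ESCAPE	0xffffffffu	/* hypothetical "give to userspace" mark */

/*
 * Model of parse_msg for a hypothetical length-prefixed protocol.
 * buf points at the start of the next message, avail is the number of
 * bytes received so far for it.
 */
static int demo_parse_be32(const uint8_t *buf, size_t avail)
{
	uint32_t len;

	assert(buf != NULL || avail == 0);

	if (avail < DEMO_HDR_LEN)
		return 0;		/* more data needed to parse the header */

	len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	      ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

	if (len == DEMO_ESCAPE)
		return -ESTRPIPE;	/* let userspace read this message */
	if (len < DEMO_HDR_LEN || len > (uint32_t)INT_MAX)
		return -EBADMSG;	/* synchronization lost */

	return (int)len;		/* length of the parsed message */
}
```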
|
||
In the case that an error is returned (return value is less than | ||
zero), the stream parser will set the error on the TCP socket and wake | ||
it up. If parse_msg returned -ESTRPIPE and the stream parser had | ||
previously read some bytes for the current message, then the error | ||
set on the attached socket is ENODATA since the stream is | ||
unrecoverable in that case. | ||
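The mapping from a negative parse_msg return value to the error set on the
socket can be sketched as follows. This models only the behavior described
in the paragraph above and is not the strparser implementation; the function
name demo_sock_err and the accum_len parameter (bytes already read for the
current message) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86	/* value used by Linux; fallback for other systems */
#endif
#ifndef ENODATA
#define ENODATA 61	/* value used by Linux; fallback for other systems */
#endif

/*
 * Return the (positive) errno value that would be set on the TCP socket
 * for a negative parse_msg result. accum_len is the number of bytes the
 * parser had already read for the current message.
 */
static int demo_sock_err(int parse_ret, size_t accum_len)
{
	assert(parse_ret < 0);	/* only error returns reach this path */

	/* Bytes cannot be put back, so the stream is unrecoverable. */
	if (parse_ret == -ESTRPIPE && accum_len > 0)
		return ENODATA;

	return -parse_ret;
}
```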
|
||
void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb); | ||
|
||
rcv_msg is called when a full message has been received and | ||
is queued. The callee must consume the sk_buff; it can | ||
call strp_pause to prevent any further messages from being | ||
received in rcv_msg (see strp_pause below). This callback | ||
must be set. | ||
|
||
The skb->cb in the input skb is a struct strp_rx_msg. This | ||
struct contains two fields: offset and full_len. Offset is | ||
where the message starts in the skb, and full_len is the | ||
length of the message. skb->len - offset may be greater | ||
than full_len since strparser does not trim the skb. | ||
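Because the skb is not trimmed, a receiver should use offset and full_len
to locate the message rather than assume it fills the buffer. The userspace
sketch below models this with a flat byte array in place of an sk_buff;
struct demo_rx_msg and demo_copy_msg are hypothetical names. In the kernel,
skb_copy_bits() can copy a byte range out of an skb in a similar way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the two fields described above. */
struct demo_rx_msg {
	int full_len;
	int offset;
};

/*
 * Copy exactly one message out of data. Bytes before offset and after
 * offset + full_len are ignored. Returns the number of bytes copied,
 * or -1 if the described range does not fit in data or in out.
 */
static int demo_copy_msg(const unsigned char *data, size_t data_len,
			 const struct demo_rx_msg *rxm,
			 unsigned char *out, size_t out_len)
{
	assert(data != NULL && rxm != NULL && out != NULL);

	if (rxm->offset < 0 || rxm->full_len < 0)
		return -1;
	if ((size_t)rxm->offset > data_len ||
	    (size_t)rxm->full_len > data_len - (size_t)rxm->offset)
		return -1;	/* message would run past the buffer */
	if ((size_t)rxm->full_len > out_len)
		return -1;	/* destination is too small */

	memcpy(out, data + rxm->offset, (size_t)rxm->full_len);
	return rxm->full_len;
}
```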
|
||
int (*read_sock_done)(struct strparser *strp, int err); | ||
|
||
read_sock_done is called when the stream parser is done reading | ||
the TCP socket. The stream parser may read multiple messages | ||
in a loop and this function allows cleanup to occur when exiting | ||
the loop. If the callback is not set (NULL in strp_init) a | ||
default function is used. | ||
|
||
void (*abort_parser)(struct strparser *strp, int err); | ||
|
||
This function is called when the stream parser encounters an error | ||
in parsing. The default function stops the stream parser for the | ||
TCP socket and sets the error in the socket. The default function | ||
can be changed by setting the callback to non-NULL in strp_init. | ||
|
||
Functions | ||
--------- | ||
|
||
The upper layer calls strp_tcp_data_ready when data is ready on the lower | ||
socket for strparser to process. This should be called from a data_ready | ||
callback that is set on the socket. | ||
|
||
strp_stop is called to completely stop stream parser operations. This | ||
is called internally when the stream parser encounters an error, and | ||
it is called from the upper layer when unattaching a TCP socket. | ||
|
||
strp_done is called to unattach the stream parser from the TCP socket. | ||
This must be called after the stream parser has been stopped. | ||
|
||
strp_check_rcv is called to check for new messages on the socket. This | ||
is normally called at initialization of a stream parser instance | ||
or after strp_unpause. | ||
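The interaction between pausing and strp_check_rcv can be pictured with a
toy userspace model. It is a sketch under simplifying assumptions and does
not reflect the real data structures: struct demo_strp, its queued and
delivered counters, and the demo_* functions are all hypothetical, and the
model assumes a rcv_msg that pauses after every message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy parser state: counters stand in for queued and delivered messages. */
struct demo_strp {
	unsigned int rx_paused : 1;
	int queued;		/* complete messages waiting on the socket */
	int delivered;		/* messages handed to rcv_msg so far */
};

static void demo_pause(struct demo_strp *s)
{
	s->rx_paused = 1;
}

static void demo_unpause(struct demo_strp *s)
{
	s->rx_paused = 0;
}

/* Models a rcv_msg that accepts one message and then pauses the parser. */
static void demo_rcv_msg(struct demo_strp *s)
{
	s->delivered++;
	demo_pause(s);
}

/* Models strp_check_rcv: deliver waiting messages unless paused. */
static void demo_check_rcv(struct demo_strp *s)
{
	assert(s != NULL);

	while (s->queued > 0 && !s->rx_paused) {
		s->queued--;
		demo_rcv_msg(s);
	}
}
```

In this model a second call to demo_check_rcv delivers nothing until
demo_unpause has been called, which is why the check is normally repeated
after unpausing.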
|
||
Statistics | ||
---------- | ||
|
||
Various counters are kept for each stream parser for a TCP socket. | ||
These are in the strp_stats structure. strp_aggr_stats is a convenience | ||
structure for accumulating statistics for multiple stream parser | ||
instances. save_strp_stats and aggregate_strp_stats are helper functions | ||
to save and aggregate statistics. | ||
|
||
Message assembly limits | ||
----------------------- | ||
|
||
The stream parser provides mechanisms to limit the resources consumed by | ||
message assembly. | ||
|
||
A timer is set when assembly starts for a new message. The message | ||
timeout is taken from the receive timeout (sk_rcvtimeo) of the | ||
associated TCP socket. If the timer fires before assembly completes, | ||
the stream parser is aborted and the ETIMEDOUT error is set on the | ||
TCP socket. | ||
|
||
Message length is limited to the receive buffer size of the associated | ||
TCP socket. If the length returned by parse_msg is greater than | ||
the socket buffer size then the stream parser is aborted with | ||
EMSGSIZE error set on the TCP socket. Note that, as a result, the | ||
receive skbuffs held for a socket with a stream parser can total up | ||
to 2*sk_rcvbuf of the TCP socket. |
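The length limit amounts to a single comparison against the receive buffer
size. A minimal sketch of that rule, with the hypothetical name
demo_check_len and plain integers standing in for the parse_msg result and
sk_rcvbuf:

```c
#include <assert.h>
#include <errno.h>

/*
 * Return 0 if a message of parsed_len bytes may be assembled for a
 * socket whose receive buffer size is rcvbuf, or -EMSGSIZE if it is
 * too long and the parser would be aborted.
 */
static int demo_check_len(int parsed_len, int rcvbuf)
{
	assert(parsed_len > 0 && rcvbuf > 0);

	return parsed_len > rcvbuf ? -EMSGSIZE : 0;
}
```

A message exactly as long as the receive buffer is still accepted; only a
longer one is rejected.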
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
/* | ||
* Stream Parser | ||
* | ||
* Copyright (c) 2016 Tom Herbert <tom@herbertland.com> | ||
* | ||
* This program is free software; you can redistribute it and/or modify | ||
* it under the terms of the GNU General Public License version 2 | ||
* as published by the Free Software Foundation. | ||
*/ | ||
|
||
#ifndef __NET_STRPARSER_H_ | ||
#define __NET_STRPARSER_H_ | ||
|
||
#include <linux/skbuff.h> | ||
#include <net/sock.h> | ||
|
||
#define STRP_STATS_ADD(stat, count) ((stat) += (count)) | ||
#define STRP_STATS_INCR(stat) ((stat)++) | ||
|
||
struct strp_stats { | ||
unsigned long long rx_msgs; | ||
unsigned long long rx_bytes; | ||
unsigned int rx_mem_fail; | ||
unsigned int rx_need_more_hdr; | ||
unsigned int rx_msg_too_big; | ||
unsigned int rx_msg_timeouts; | ||
unsigned int rx_bad_hdr_len; | ||
}; | ||
|
||
struct strp_aggr_stats { | ||
unsigned long long rx_msgs; | ||
unsigned long long rx_bytes; | ||
unsigned int rx_mem_fail; | ||
unsigned int rx_need_more_hdr; | ||
unsigned int rx_msg_too_big; | ||
unsigned int rx_msg_timeouts; | ||
unsigned int rx_bad_hdr_len; | ||
unsigned int rx_aborts; | ||
unsigned int rx_interrupted; | ||
unsigned int rx_unrecov_intr; | ||
}; | ||
|
||
struct strparser; | ||
|
||
/* Callbacks are called with lock held for the attached socket */ | ||
struct strp_callbacks { | ||
int (*parse_msg)(struct strparser *strp, struct sk_buff *skb); | ||
void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb); | ||
int (*read_sock_done)(struct strparser *strp, int err); | ||
void (*abort_parser)(struct strparser *strp, int err); | ||
}; | ||
|
||
struct strp_rx_msg { | ||
int full_len; | ||
int offset; | ||
}; | ||
|
||
static inline struct strp_rx_msg *strp_rx_msg(struct sk_buff *skb) | ||
{ | ||
return (struct strp_rx_msg *)((void *)skb->cb + | ||
offsetof(struct qdisc_skb_cb, data)); | ||
} | ||
|
||
/* Structure for an attached lower socket */ | ||
struct strparser { | ||
struct sock *sk; | ||
|
||
u32 rx_stopped : 1; | ||
u32 rx_paused : 1; | ||
u32 rx_aborted : 1; | ||
u32 rx_interrupted : 1; | ||
u32 rx_unrecov_intr : 1; | ||
|
||
struct sk_buff **rx_skb_nextp; | ||
struct timer_list rx_msg_timer; | ||
struct sk_buff *rx_skb_head; | ||
unsigned int rx_need_bytes; | ||
struct delayed_work rx_delayed_work; | ||
struct work_struct rx_work; | ||
struct strp_stats stats; | ||
struct strp_callbacks cb; | ||
}; | ||
|
||
/* Must be called with lock held for attached socket */ | ||
static inline void strp_pause(struct strparser *strp) | ||
{ | ||
strp->rx_paused = 1; | ||
} | ||
|
||
/* May be called without holding lock for attached socket */ | ||
static inline void strp_unpause(struct strparser *strp) | ||
{ | ||
strp->rx_paused = 0; | ||
} | ||
|
||
static inline void save_strp_stats(struct strparser *strp, | ||
struct strp_aggr_stats *agg_stats) | ||
{ | ||
/* Accumulate this stream parser's statistics into agg_stats. */ | ||
|
||
#define SAVE_PSOCK_STATS(_stat) (agg_stats->_stat += \ | ||
strp->stats._stat) | ||
SAVE_PSOCK_STATS(rx_msgs); | ||
SAVE_PSOCK_STATS(rx_bytes); | ||
SAVE_PSOCK_STATS(rx_mem_fail); | ||
SAVE_PSOCK_STATS(rx_need_more_hdr); | ||
SAVE_PSOCK_STATS(rx_msg_too_big); | ||
SAVE_PSOCK_STATS(rx_msg_timeouts); | ||
SAVE_PSOCK_STATS(rx_bad_hdr_len); | ||
#undef SAVE_PSOCK_STATS | ||
|
||
if (strp->rx_aborted) | ||
agg_stats->rx_aborts++; | ||
if (strp->rx_interrupted) | ||
agg_stats->rx_interrupted++; | ||
if (strp->rx_unrecov_intr) | ||
agg_stats->rx_unrecov_intr++; | ||
} | ||
|
||
static inline void aggregate_strp_stats(struct strp_aggr_stats *stats, | ||
struct strp_aggr_stats *agg_stats) | ||
{ | ||
#define SAVE_PSOCK_STATS(_stat) (agg_stats->_stat += stats->_stat) | ||
SAVE_PSOCK_STATS(rx_msgs); | ||
SAVE_PSOCK_STATS(rx_bytes); | ||
SAVE_PSOCK_STATS(rx_mem_fail); | ||
SAVE_PSOCK_STATS(rx_need_more_hdr); | ||
SAVE_PSOCK_STATS(rx_msg_too_big); | ||
SAVE_PSOCK_STATS(rx_msg_timeouts); | ||
SAVE_PSOCK_STATS(rx_bad_hdr_len); | ||
SAVE_PSOCK_STATS(rx_aborts); | ||
SAVE_PSOCK_STATS(rx_interrupted); | ||
SAVE_PSOCK_STATS(rx_unrecov_intr); | ||
#undef SAVE_PSOCK_STATS | ||
|
||
} | ||
|
||
void strp_done(struct strparser *strp); | ||
void strp_stop(struct strparser *strp); | ||
void strp_check_rcv(struct strparser *strp); | ||
int strp_init(struct strparser *strp, struct sock *csk, | ||
struct strp_callbacks *cb); | ||
void strp_tcp_data_ready(struct strparser *strp); | ||
|
||
#endif /* __NET_STRPARSER_H_ */ |