Skip to content

WIP: Seq file #232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

WIP: Seq file #232

wants to merge 7 commits into from

Conversation

adamrk
Copy link

@adamrk adamrk commented Apr 28, 2021

Add a Rust trait equivalent to the C seq_operations interface and the ability to create a /proc file from a type implementing that trait. Instead of just translating each function in the interface to Rust I used an iterator instead since it maps pretty cleanly. But happy to change it if we thing it makes more sense to keep things close to the C side.

A couple notes:

  1. I split this into two commits, where the second contains only CI/CD changes, so the first commit can be eventually upstreamed, but not the second. Does this make sense or should we be doing something else?
  2. In the sample driver I called some panicking functions, but I can't remember what notation we decided on for for the comments on why they won't panic (just used NO PANIC).
  3. There's one point in seq_file.rs where I needed to allocate a string, but couldn't get what I needed using try_reserve, so just left a TODO about fixing it when we have more fallible options.
  4. Added some testing in the CI that actually uses the sample driver instead of just loading/unloading it. I needed this for my own testing, but it seems like we're not doing it for any other sample drivers. Is there a reason for that?

/// Create a `/proc` file.
///
/// The returned value must not be dropped until the module is unloaded.
pub fn proc_create<T: SeqOperations + Sync>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need #[cfg] on proc etc.?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not necessarily on the function itself, it could be on the implementation instead so that callers do not need to care, i.e. skipping the bindings call if PROC is not there -- please check what the kernel does currently for PROC-related calls).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... It looks like other places have an #ifdef CONFIG_PROC_FS block around all the code for setting up the seq_file. Not sure which way makes more sense?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up splitting this into seq_file.rs and proc_fs.rs. The proc specific parts are all under #[cfg(CONFIG_PROC_FS)], but the seq_file parts aren't so we can later add e.g. debugfs seq_files.

@ojeda
Copy link
Member

ojeda commented Apr 28, 2021

I split this into two commits, where the second contains only CI/CD changes, so the first commit can be eventually upstreamed, but not the second. Does this make sense or should we be doing something else?

Thanks for being mindful of that! It does make it easier for me when pushing to rust-next, but I take care of that manually otherwise.

In the sample driver I called some panicking functions, but I can't remember what notation we decided on for for the comments on why they won't panic (just used NO PANIC).

NOPANIC -- let me document that today, I will send a PR. Thanks a lot for starting its usage!

There's one point in seq_file.rs where I needed to allocate a string, but couldn't get what I needed using try_reserve, so just left a TODO about fixing it when we have more fallible options.

That's perfectly fine!

Added some testing in the CI that actually uses the sample driver instead of just loading/unloading it. I needed this for my own testing, but it seems like we're not doing it for any other sample drivers. Is there a reason for that?

We have some tests for the printing sample, for instance. But yeah, no particular reason, some tests that could be there were never added -- feel free to send a PR; otherwise I will do it at some point.

Copy link

@wedsonaf wedsonaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from a quick pass. I'll take a closer look tomorrow.

// SAFETY: Calling a C function. `s` is guaranteed to be null terminated
// because we explicitly constructed it just above.
unsafe {
bindings::seq_puts(m, s.as_ptr() as *const u8 as *const c_types::c_char);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use seq_printf here and specify the string length so that we don't need to depend on to_string and concatenation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into it a bit, but I can't really see how to do it (at least not in a way that's generic over Display). Here are some options I found:

  1. If we use seq_printf with the length in the format string, we'll still need to allocate a String to calculate the length and get the pointer to pass. It would get rid of the concat though.
  2. Instead of having Item: Display, we could require a fn(&Item) -> &str. Then the implementer can decide if they want to allocate a String or not.
  3. We could have the trait define a print function that take Item and seq_file and let the implementer use the various seq print functions. But then they'll need to use unsafe and they'd have the same problem as us if they want to print a string or type that doesn't have a printf format.
  4. Instead of having Item: Display we could make our own trait for types that can be formatted with printf and write safe wrappers around seq_printf for each of those.
  5. There is the VaList type which seems to handle varargs, so we could have that type instead of the generic Item, but it seems more for defining Rust code that can be called on the C side with varargs, not for generating a Rust type that can be passed to C functions expecting varargs.

I don't think there's any support for putting a dynamically sized string on the stack.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with 2.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine for now. In the future if there's a need for it, I think it would make sense to do a variant of option 3, where we would provide abstractions that would allow users to use the various seq_file write variants without having to call unsafe functions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to make a follow-up PR for that.

@adamrk
Copy link
Author

adamrk commented Apr 28, 2021

Just fixed the compiler error in the sample by making SeqFile Sync. Why does KernelModule need to be Sync by the way?

@adamrk adamrk force-pushed the seq-file branch 2 times, most recently from 231f3b4 to dbb38aa Compare April 28, 2021 20:40
Comment on lines 64 to 68
let template = "rust_seq_file: device opened this many times: ";
message.try_reserve_exact(template.len() + 4).ok()?;
// NOPANIC: We reserved space for `template` above.
message.push_str(template);
if *count < 1000 {
// NOPANIC: There are 4 characters remaining in the string which
// leaves space for a 3 digit number and the newline.
write!(&mut message, "{}\n", *count).ok()?;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something more explicit for the 3, 4 and 1000 constants:

const MAX_DIGITS: u32 = 3;
const MAX_LENGTH: u32 = MAX_DIGITS + 1;
const MAX_COUNT: u32 = 10u32.pow(MAX_DIGITS);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put them local to the function, given they are not used elsewhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/// `m` must have been created from a proc file generated by calling
/// `proc_create::<T>`.
unsafe fn convert(m: *mut bindings::seq_file) -> *const T {
let reg = crate::container_of!((*m).op, Self, seq_ops);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use (*m).private to save state. This way op can be constant shared by everyone.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to make some changes to get this to work. I couldn't see how to stare the data in (*m).private using proc_create_seq_private because that always uses the default seq_open which doesn't give us the chance to mutate the seq_file created on open.

So I switched it to use proc_create_data which let's us use our own open.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at how others have done this, it looks like it's done this way: call proc_create_seq_private with your state in data, then retrieve it using PDE_DATA(file_inode(seq->file)).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, got that working now.

@adamrk adamrk force-pushed the seq-file branch 3 times, most recently from 15244a0 to d96b676 Compare May 6, 2021 08:07
@adamrk
Copy link
Author

adamrk commented May 6, 2021

Think I've addressed everything, but it required quite a few changes so probably needs a fresh review.

@adamrk adamrk force-pushed the seq-file branch 2 times, most recently from 0aeaa42 to 6f05b79 Compare May 7, 2021 16:43
@@ -14,6 +16,10 @@
#include <linux/mm.h>
#include <uapi/linux/android/binder.h>

#ifdef CONFIG_PROC_FS
#include "../../fs/proc/internal.h"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

I don't think this would be acceptable to upstream maintainers. We shouldn't be peeking into internal stuff.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using it to get access to the data field of proc_dir_entry so that we can drop the data when dropping the entry.

Removed this by having the Rust ProcDirEntry store it's own copy of the data pointer so that we don't need to extract it from the proc_dir_entry.

/// # Invariants
///
/// The pointer [`ProcDirEntry::proc_dir_entry`] is a valid pointer and
/// it's field [`bindings::proc_dir_entry::data`] is a valid pointer to

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: its field

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

unsafe {
bindings::proc_remove(self.proc_dir_entry);
}
drop(data);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this drop is implicit, we don't really need to call it explicitly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

proc_ops: &'static bindings::proc_ops,
data: T,
) -> KernelResult<ProcDirEntry<T>> {
let data_ptr = Box::into_raw(Box::try_new(data)?) as *mut c_types::c_void;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to improve this abstraction.

Forcing data to move into a Box means that it cannot contain any self-referential data structures, including mutexes. In the end, I think we'll need this to use PointerWrapper so that users can choose what wrapper they want (if any).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Now the user is responsible for choosing the PointerWrapper and passing in the data already wrapped.

type Iterator: Iterator<Item = Self::Item>;

/// Called once each time the `seq_file` is opened.
fn start(arg: &Self) -> Option<Box<Peekable<Self::Iterator>>>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too I think we may want to consider PointerWrapper rather than Box. (Although here it seems less problematic than in the previous usage.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking into this one. The one thing I'm thinking about is that we'll also need to mutate the iterator on the inside, so we can't just make it generic over any PointerWrapper, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched this one to using PointerWrapper.

*pos += 1;
}

iterator.next();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to check the return value of next here, mostly because the documentation says:

Advances the iterator and returns the next value.

Returns None when iteration is finished. Individual iterator implementations may choose to resume iteration, and so calling next() again may or may not eventually start returning Some(Item) again at some point.

That is, if we are at the end already (e.g., because the underlying data structure changed in the case of an iterator that allows mutation), then we risk restarting the iteration process if we don't check the return value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't realize that iterators could resume like that.

I changed it, but I think in this case we might actually be ok without checking next. Since we have a Peekable and we've already called peek on it and checked for None (in an earlier call to next_callback or start_callback) so next will always return Some here. See impl of next for Peekable.

@adamrk
Copy link
Author

adamrk commented May 13, 2021

Thanks for the review @wedsonaf 👍

@adamrk adamrk force-pushed the seq-file branch 2 times, most recently from 1d5d31a to d53b853 Compare May 13, 2021 21:11
@ksquirrel

This comment has been minimized.

@alex alex requested a review from wedsonaf May 19, 2021 21:48
@ksquirrel

This comment has been minimized.

@ksquirrel

This comment has been minimized.

@adamrk
Copy link
Author

adamrk commented May 27, 2021

Hi @wedsonaf, just letting you know I think everything's been addressed now.

@adamrk
Copy link
Author

adamrk commented Jun 23, 2021

Ping @wedsonaf - any further comments on this one?

The usage of PointerWrapper with calls to from_pointer followed by forget are strictly speaking a violation of the API because you're creating more than one mutable reference to an object, which leads to UB if they're ever called concurrently (which the C API allows).

This could occur if the file is opened once and then there are concurrent reads on the same file descriptor?

@wedsonaf
Copy link

Ping @wedsonaf - any further comments on this one?

The usage of PointerWrapper with calls to from_pointer followed by forget are strictly speaking a violation of the API because you're creating more than one mutable reference to an object, which leads to UB if they're ever called concurrently (which the C API allows).

This could occur if the file is opened once and then there are concurrent reads on the same file descriptor?

from_pointer is called on a shared resource, so it doesn't need to happen on the same file descriptor. For example, if multiple processes open and read from the same proc file concurrently, there's a window where they can all call from_pointer concurrently and each have a supposedly unique instance of the same object.

Also note start and the matching stop are called on read, not when the file is opened (and released), see https://elixir.bootlin.com/linux/v5.12.13/source/fs/seq_file.c#L222. You may want to update the documentation to reflect this.

@ksquirrel

This comment has been minimized.

@ksquirrel

This comment has been minimized.

@adamrk
Copy link
Author

adamrk commented Jun 29, 2021

LGTM. Suggestion for next time: break the PR up into smaller chunks to make it easier to review.

The usage of PointerWrapper with calls to from_pointer followed by forget are strictly speaking a violation of the API because you're creating more than one mutable reference to an object, which leads to UB if they're ever called concurrently (which the C API allows).

I think #386 improves this, allowing you to borrow the contents without having to call from_pointer, and #387 I think allows for a simpler sample.

Once those are in, I think we should update your implementation. I'll leave iy up to you whether you want to wait for them or do it in a subsequent PR; if the latter, would you mind opening an issue so that we don't lose track of it?

I opened #394 which contains these changes and uses the new PointerWrapper so that we don't do from_pointer -> forget in the start callback. I don't think this will exactly work for the cases where we want to borrow the iterator in show and next since we actually need mutable borrows in those cases. Opened #395 to discuss that (and so we don't forget to fix those).

@ksquirrel
Copy link
Member

Review of 2fc05b510fdd:

  • ⚠️ This PR changes more than 20 files.
  • ⚠️ This PR adds/deletes more than 500 lines.
  • ✔️ Commit 4f0aeff: Looks fine!
  • ✔️ Commit 2fc05b5: Looks fine!

@nbdd0121
Copy link
Member

nbdd0121 commented Sep 15, 2021

You could use the Display implementation and print using seq_printf("%pA", &format_args!("{}", item_to_display)).

This way you would just need to put type Item: Display and would not need the display method anymore!

Adds a trait which allows Rust modules to implement the seq_operations
interface and use it to create a `/proc` file.

Signed-off-by: Adam Bratschi-Kaye <ark.email@gmail.com>
Signed-off-by: Adam Bratschi-Kaye <ark.email@gmail.com>
@adamrk adamrk changed the title Seq file WIP: Seq file Feb 28, 2022
@adamrk adamrk force-pushed the seq-file branch 6 times, most recently from 9e62a22 to 52a09db Compare March 2, 2022 20:10
@adamrk adamrk closed this Mar 11, 2022
senekor pushed a commit to senekor/linux that referenced this pull request Feb 24, 2025
The namespace percpu counter protects pending I/O, and we can
only safely diable the namespace once the counter drop to zero.
Otherwise we end up with a crash when running blktests/nvme/058
(eg for loop transport):

[ 2352.930426] [  T53909] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [Rust-for-Linux#1] PREEMPT SMP KASAN PTI
[ 2352.930431] [  T53909] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[ 2352.930434] [  T53909] CPU: 3 UID: 0 PID: 53909 Comm: kworker/u16:5 Tainted: G        W          6.13.0-rc6 Rust-for-Linux#232
[ 2352.930438] [  T53909] Tainted: [W]=WARN
[ 2352.930440] [  T53909] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[ 2352.930443] [  T53909] Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop]
[ 2352.930449] [  T53909] RIP: 0010:blkcg_set_ioprio+0x44/0x180

as the queue is already torn down when calling submit_bio();

So we need to init the percpu counter in nvmet_ns_enable(), and
wait for it to drop to zero in nvmet_ns_disable() to avoid having
I/O pending after the namespace has been disabled.

Fixes: 74d1696 ("nvmet-loop: avoid using mutex in IO hotpath")

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants