- 
                Notifications
    You must be signed in to change notification settings 
- Fork 301
          [DRAFT] RISC-V: Use .insn on (Zkne or Zknd) intrinsics
          #1917
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f31c152    to
    bc825ff      
    Compare
  
    .insn on Zknd or Zkne intrinsics.insn on Zkne or Zknd intrinsics
      bc825ff    to
    3ad3868      
    Compare
  
    7ee6e1e    to
    643b06a      
    Compare
  
    .insn on Zkne or Zknd intrinsics.insn on (Zkne or Zknd) intrinsics
      | My only problem with this approach is that this will interact badly with the optimizer. LLVM must assume all  Also, you can do both the rustc PR and the stdarch PR together now, due to stdarch being a subtree of the rustc repo, rather than a submodule | 
| 
 I see. Worth experimenting and I'll try that later whether actual AES decryption code is heavily affected. 
 This is intentional because CI of  I actually found that some of the tests ( | 
643b06a    to
    1209865      
    Compare
  
    **DRAFT: THIS COMMIT SHALL BE SQUASHED INTO THE NEXT COMMIT.** Because changes to rustc are not performed yet, we cannot switch `#[target_feature]` attribute to enable desired features. Instead, this commit just comments out `#[target_feature]`.
| It seems... even if we use  The code below implements 1 and 2 of AES-256 decryption routines. 
 #![feature(riscv_ext_intrinsics)]
#![allow(clippy::identity_op)]
use core::arch::riscv64::*;
/*
    Key Sizes:
    AES-256 key:
        256-bit key data
        == 4 u64 words
    AES-256 scheduled key:
        128-bit block * (14 + 1)  ; where 14 is the number of AES-256 rounds
        == 30 u64 words
*/
// Adjust #[target_feature] attributes below depending on the configuration and build.
// #[target_feature(enable = "zknd")]
// #[target_feature(enable = "zknd", enable = "zkne")]
#[unsafe(no_mangle)]
pub unsafe extern "C" fn aes256_key_schedule(key: *const [u64; 4], scheduled_key: *mut [u64; 30]) {
    unsafe {
        let key: &[u64; 4] = &*key;
        let mut rk0 = key[0];
        let mut rk1 = key[1];
        let mut rk2 = key[2];
        let mut rk3 = key[3];
        let scheduled_key: &mut [u64; 30] = &mut *scheduled_key;
        scheduled_key[0] = rk0;
        scheduled_key[1] = rk1;
        scheduled_key[2] = rk2;
        scheduled_key[3] = rk3;
        macro_rules! double_round {
            ($i: expr) => {
                let tmp = aes64ks1i::<$i>(rk3);
                rk0 = aes64ks2(tmp, rk0);
                rk1 = aes64ks2(rk0, rk1);
                let tmp = aes64ks1i::<10>(rk1);
                rk2 = aes64ks2(tmp, rk2);
                rk3 = aes64ks2(rk2, rk3);
                scheduled_key[4 * ($i + 1) + 0] = rk0;
                scheduled_key[4 * ($i + 1) + 1] = rk1;
                scheduled_key[4 * ($i + 1) + 2] = rk2;
                scheduled_key[4 * ($i + 1) + 3] = rk3;
            };
        }
        double_round!(0);
        double_round!(1);
        double_round!(2);
        double_round!(3);
        double_round!(4);
        double_round!(5);
        // Process tail
        let tmp = aes64ks1i::<6>(rk3);
        rk0 = aes64ks2(tmp, rk0);
        rk1 = aes64ks2(rk0, rk1);
        scheduled_key[4 * 7 + 0] = rk0;
        scheduled_key[4 * 7 + 1] = rk1;
    }
}
// #[target_feature(enable = "zknd")]
// #[target_feature(enable = "zknd", enable = "zkne")]
#[unsafe(no_mangle)]
pub unsafe extern "C" fn aes256_key_schedule_for_decryption(
    scheduled_key: *const [u64; 30],
    scheduled_key_dec: *mut [u64; 30],
) {
    unsafe {
        let scheduled_key: &[u64; 30] = &*scheduled_key;
        let scheduled_key_dec: &mut [u64; 30] = &mut *scheduled_key_dec;
        // Copy the first scheduled key as-is.
        scheduled_key_dec[0] = scheduled_key[0];
        scheduled_key_dec[1] = scheduled_key[1];
        // Inverse MixColumns on scheduled keys index 1..=13
        macro_rules! invmc {
            ($i: expr) => {
                scheduled_key_dec[$i * 2 + 0] = aes64im(scheduled_key[$i * 2 + 0]);
                scheduled_key_dec[$i * 2 + 1] = aes64im(scheduled_key[$i * 2 + 1]);
            };
        }
        invmc!(1);
        invmc!(2);
        invmc!(3);
        invmc!(4);
        invmc!(5);
        invmc!(6);
        invmc!(7);
        invmc!(8);
        invmc!(9);
        invmc!(10);
        invmc!(11);
        invmc!(12);
        invmc!(13);
        // Copy the last scheduled key as-is.
        scheduled_key_dec[28] = scheduled_key[28];
        scheduled_key_dec[29] = scheduled_key[29];
    }
} | 
Using the inline assembly and `.insn` could avoid current issues regarding intrinsics available when either Zknd or Zkne is available. Note that, coincidental success of the testing process depends on the possibly flaky disassembler behavior (which attempts to disassemble instructions *not* enabled on the target of the output object file) and should be removed now or if anything breaks.
1209865    to
    1a47fad      
    Compare
  
    | Little Update: Added  While RISC-V is normally described as "RISC-V does not have condition flags", by default the Rust compiler assumes that RISC-V inline assembly may overwrite some of vector-related registers. Since we don't touch them, we'd better add it to ensure mixed code don't need saving/restoring such registers. It affects other intrinsics with inline assembly and I'll submit another (non-draft) PR for better efficiency on mixed RVV + non-vector intrinsics. | 
| Also note that, the option  | 
| Ok, if it does optimize satisfyingly enough, we can then probably use the asm implementation. Did you send the rustc PR yet? | 
| @sayantn | 
| I have no problems, but typically changes due to review comments don't need attribution, so you really don't need to put any attributions. | 
| We didn't share the code directly but I think we helped each other so...  I understand that attribution is completely optional but since you have first examined the possibilities to fix the issue #1765, I want to show your contribution in some way. | 
| Closing this in favor of rust-lang/rust#146798. In that PR, inline assembly got symbolic. | 
Using the inline assembly and
.insncould avoid current issues regarding intrinsics available when either Zknd or Zkne is available.Note: this is intentionally incomplete as rustc changes are not performed yet.