Proposal for Vector Calling Convention #389

lhtin · 2023-06-16T14:53:03Z

Update 2023-6-29: Updated to a new proposal based on comments, please view the file changes directly.

This proposal is based on the prior work (#171) which is currently implemented in LLVM. The differences between them are:

v1-v15 and v24-v31 is callee save in this proposal for vector functions, but LLVM's one is all caller-save
No ABI for tuple type ABI implemented in LLVM yet
For variadic functions, the current LLVM does inconsistent passing between caller and callee. (https://godbolt.org/z/9qv9n3zYP)

PoCs:

Link of PoC on Clang/LLVM (https://reviews.llvm.org/D154576)
Link of PoC on GCC (https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625042.html)

NOTE: Original post, it doesn't matter if you don't read it.

Hi,

Here's a proposal for vector calling convention. This proposal is based on the prior work (#171) which is currently implemented in LLVM. This proposal extends the range of registers that can be used for vector data types and added more descriptions of the allocation.

Update 2023-6-27: Here (riscv-non-isa/rvv-intrinsic-doc#38) is a proposal that is very similar to the current one.

Reasons for the proposal:

Reason 1: The vector register size tends to be very large, so all vector registers used to pass arguments can reduce memory usage.

In particular, for arguments with LMUL equal to 8, it is possible to support passing in 3 such arguments (i.e. v8-v15, v16-v23, v24-v31).
Reason 2: In RVV, the mask operand of an instruction with a mask must be placed in the v0 register. Let the first vector mask type argument pass in v0 can reduce register move.
Reason 3: Finding the right register segment each time starting from v1 allows more arguments to be passed through the vector registers.

For example for the function void foo(vint32m8_t a1, vint32m8_t a2, vint32m8_t a3, vint32m1_t a4 vint32m2_t a5, vint32m4_t a6);. You can pass a1 into v8-v15, a2 into v16-v23, a3 into v24-v31, a4 into v1, a5 into v2-v3, a6 into v4-v7. If not, a4, a5, and a6 need to be passed by reference.
~~Reason 4: Start from v1 to find LMUL-aligned vector registers segment for return value can reduce the overlap of different LMUL return value vector registers.~~

(Update 2023-6-28: Kito pointed out that this reason is not true. Proposal for Vector Calling Convention #389 (comment))

For example, for the following code snippet. You can return x in v1, and y in v2-v3, they do not overlap.
```
vint32m1_t x = foo1 ();
vint32m2_t y = foo2 (c, d);
vint32m2_t z = __riscv_vwadd_wv (x, y, vl);
```

I have a few points that I am not very sure about and would like your help in looking at.

Do we really need all ilp32v, ilp32fv, ilp32dv?

Update 2023-6-22: After further understanding, there should be no need to add a new abi, but the function that uses the vector calling convention must be annotated with STO_RISCV_VARIANT_CC.
Will using all registers as argument registers to cause the resolver routine in the dynamic link lazy bind cost too much?

Update:

Maybe solved by Calling conventions for the lazily bound functions. #190 ? (point by @sorear ). From bellow two patches, it seems that the STO_RISCV_VARIANT_CC symbol is not yet used inside glibc.
- https://sourceware.org/git/?p=glibc.git;a=commit;h=117e8b341c5c0ace8d65feeef136fececb3fdc9c
- https://sourceware.org/git/?p=glibc.git;a=commit;h=bf88b47ecb54888a789c02fa81aa4ab81ec2f3a5

Feel free to ask any questions or suggestions.

Best,
Lehua (RiVAI)

nick-knight

Some prior work: #171

riscv-cc.adoc

lhtin · 2023-06-17T09:41:25Z

Some prior work: #171

Thank you for sending the previous discussion about vector calling convention.

From this issue and related issues(#66), I found that if all vector registers are used as argument registers, the resolver routine in the dynamic link lazy bind needs to save all vector registers. This work will be huge because the vector register itself can be very large. This issue needs to be further resolved.

sorear · 2023-06-20T15:47:58Z

Suppose XLEN = VLEN = 64 and we are passing an argument of type struct { int a; int b; vint8m1_t c; }. Since the struct has three fields, this proposal causes the struct to be "passed according to the integer calling convention". But since the aggregate is no larger than 2*XLEN, the struct is passed in a pair of integer registers. If the same call occurs with VLEN = 128, the struct will be passed by reference in the proposal. I consider it undesirable for the registers used to pass an argument to depend on the runtime VLEN, so I propose that the integer calling convention be modified to add "aggregates containing a value of vector type" to the list of aggregates always passed by reference in the integer calling convention.

From this issue and related issues(#66), I found that if all vector registers are used as argument registers, the resolver routine in the dynamic link lazy bind needs to save all vector registers. This work will be huge because the vector register itself can be very large. This issue needs to be further resolved.

Wasn't this already resolved in #190?

Do we really need all ilp32v, ilp32fv, ilp32dv

I don't think we should be defining any new ABIs for this; the vector calling convention is exactly the same as the integer calling convention for functions that don't take or return vector values, supporting scalable vector arguments or return values without the V extension is meaningless, and any proposal which prevents mixing vector and non-vector code and requires to recompile the world at a single time is not viable for major distros.

lhtin · 2023-06-21T03:05:43Z

@sorear Thank you for your comments.

Suppose XLEN = VLEN = 64 and we are passing an argument of type struct { int a; int b; vint8m1_t c; }. Since the struct has three fields, this proposal causes the struct to be "passed according to the integer calling convention". But since the aggregate is no larger than 2*XLEN, the struct is passed in a pair of integer registers. If the same call occurs with VLEN = 128, the struct will be passed by reference in the proposal. I consider it undesirable for the registers used to pass an argument to depend on the runtime VLEN, so I propose that the integer calling convention be modified to add "aggregates containing a value of vector type" to the list of aggregates always passed by reference in the integer calling convention.

I think the way to pass parameters does not depend on runtime VLEN, this compiler compile is also not aware of this information. Here when VLEN=64 and 128, the inconsistency is because you specify vl64 and vl128 at compile time and use the fixed-vlmax option. If you don't specify it, the compiler can only assume that it is VLEN agnostic and doesn't know if it is less than or equal to 2*XLEN. For safety reasons, it must be passed through memory. I think this is a bit like the difference between struct { long long a; long long b; } on top of ilp32 and lp64, where on ilp32 it pass in memory but use two registers in lp64.

Wasn't this already resolved in #190?

Yes! Thank you for pointing that out.

According to my superficial understanding, this means that if you use vector registers to pass parameters, you need to set the STO_RISCV_VARIANT_CC attribute. So I understand that the resolver function does not use the vector register if it finds this attribute or saves it before using it and restores it afterward. Is it right?

Update: After looking at the relevant patches again, it should be forced to bind directly instead of lazy bind when the function comes with STO_RISCV_VARIANT_CC.

I don't think we should be defining any new ABIs for this; the vector calling convention is exactly the same as the integer calling convention for functions that don't take or return vector values, supporting scalable vector arguments or return values without the V extension is meaningless, and any proposal which prevents mixing vector and non-vector code and requires to recompile the world at a single time is not viable for major distros.

How do you know if a compiled library uses a vector register as an argument without adding a new abi? If one is used and one is not, then linking them together will cause problems, right?

sorear · 2023-06-21T12:37:54Z

I think the way to pass parameters does not depend on runtime VLEN, this compiler compile is also not aware of this information. Here when VLEN=64 and 128, the inconsistency is because you specify vl64 and vl128 at compile time and use the fixed-vlmax option.

To be clear, I'm talking about the text of the proposal, not the current gcc behavior. I believe that the proposal as written could be interpreted as saying that the compiler needs to look at vlenb before deciding which registers to pass the argument in.

For safety reasons, it must be passed through memory.

This is the behavior I want - I propose we make it clear in the ABI specification.

So I understand that the resolver function does not use the vector register if it finds this attribute or saves it before using it and restores it afterward. Is it right?

Update: After looking at the relevant patches again, it should be forced to bind directly instead of lazy bind when the function comes with STO_RISCV_VARIANT_CC.

I would argue that both of those are allowed implementations.

How do you know if a compiled library uses a vector register as an argument without adding a new abi? If one is used and one is not, then linking them together will cause problems, right?

The calling convention is per-function. If the caller of a function passes a vector register and the callee expects a vector argument, it works. If the caller is using the integer convention and the callee expects only integers, it works since both sides are using the integer convention. If the caller and the callee have mismatched prototypes, we are already looking at undefined behavior.

nick-knight · 2023-06-21T14:16:00Z

Regarding the thread:

Do we really need all ilp32v, ilp32fv, ilp32dv

I don't think we should be defining any new ABIs for this;

Our psABI currently says:

Vector registers are not used for passing arguments or return values; we intend to define a new calling convention variant to allow that as a future software optimization.

My understanding is that this PR proposes such a variant. I don't really care what us humans call it. I just need to know what to tell my compiler (e.g., via mabi) to ensure my C program leverages the new calling convention variant. And I (and my linker) need to be able to inspect an ELF to deduce which calling convention variant was used.

lhtin · 2023-06-22T07:23:47Z

@sorear @nick-knight For the question of whether new ABIs need to be added. After re-understood the existing ABI specification, functions need to be marked as STO_RISCV_VARIANT_CC for cases that are not compatible with the current ABI specification, so that the compiler, inker, and loader can distinguish them and prevent caller and callee from using different specifications.

Therefore, it should not be necessary to add a new abi. However, the compiler needs to add an option to enable the vector calling convention and annotate the function that uses vector registers to pass arguments or return value with STO_RISCV_VARIANT_CC.

lhtin · 2023-06-22T07:34:58Z

This is the behavior I want - I propose we make it clear in the ABI specification.

I think this proposal is reasonable. Currently, neither GCC nor LLVM supports putting vector type fields in struct structures. Adding this rule should not introduce a breaking change. I descript it as below:

The size of the vector type is considered to be unknown. So if structs contain a field of
vector type, they always are passed by reference.

NOTE: The vector type mentioned here refers to the type defined https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system[here]

riscv-cc.adoc

kito-cheng · 2023-06-28T02:24:47Z

Reason 4: Start from v1 to find LMUL-aligned vector registers segment for return value can reduce the overlap of different LMUL return value vector registers.

For example, for the following code snippet. You can return x in v1, and y in v2-v3, they do not overlap.

vint32m1_t x = foo1 ();
vint32m2_t y = foo2 (c, d);
vint32m2_t z = __riscv_vwadd_wv (x, y, vl);

This is not true, because all return value register are all call-clobber, so it not really get benefit on this case.

lhtin · 2023-06-28T02:39:28Z

Reason 4: Start from v1 to find LMUL-aligned vector registers segment for return value can reduce the overlap of different LMUL return value vector registers.
For example, for the following code snippet. You can return x in v1, and y in v2-v3, they do not overlap.
vint32m1_t x = foo1 ();
vint32m2_t y = foo2 (c, d);
vint32m2_t z = __riscv_vwadd_wv (x, y, vl);

This is not true, because all return value register are all call-clobber, so it not really get benefit on this case.

Indeed, thank you very much for pointing that out. I've added a delete line to Reason 4.

kito-cheng · 2023-06-28T03:02:42Z

|===
| Name    | ABI Mnemonic | Meaning                      | Preserved across calls?

|v0       |              | Argument register for mask*  | No
|v1-v7    |              | Callee-saved registers        | Yes
|v8-v23   |              | Argument registers           | No
|v24-v31  |              | Callee-saved registers       | Yes
|===

I would like add callee-save registers rather than all caller-save, this could benefit those kernel loop contain with vector function, 16+1 argument register can hold 2 LMUL=8 value and one vector mask, which is satisfy almost all math function, the only function take 3 argument in std c math is fma, which can be expanded to instruction.

e.g.

void func (){
  vector_type a, b, x, y, z;
  for (i = 0 ~ n)
  {
     a = foo (b); // vectorized foo
     // `a` can be moved into a callee-save register instead of spill to stack.
     x = bar (y); // vectorized bar
     z = a + x;
  }
}

Just emphasize one point is the outer function func still define all vector register as caller-save register, so no callee-save overhead for func, the rule only applied on foo and bar, SVE also define similar rule for their vector calling convention.

sorear · 2023-06-28T04:04:25Z

What's the use case for temporary registers? We have integer temporary registers because they're needed for some kinds of stub code, and I think float temporaries exist only to align the float ABI register names with the integer ABI register names, but with LMUL=8 types it's much easier to write a function that needs 24 vector registers than it is for integer registers. Making v1 - v7 non-arguments wouldn't affect most functions but I'm also not sure what this is accomplishing.

Far more hesitant about a block of call-preserved registers. I remember, years and years ago before there even was a vector spec, a decision being made to never add new call-preserved U-mode state, does anyone remember the complete list of reasons why that was? setjmp shouldn't be an issue because setjmp doesn't take any vector arguments, so the compiler won't try to have vectors live across it. .eh_frame could be a problem will also need careful handing; my first impression is that if vectors are not allowed to be live at landing pad entry, then vector-related entries don't need to appear in the unwind info, but this needs a closer analysis.

kito-cheng · 2023-06-28T09:16:57Z

@sorear updated the table, I just changed the column of Preserved across calls? but no update with Meaning column.

What's the use case for temporary registers? We have integer temporary registers because they're needed for some kinds of stub code, and I think float temporaries exist only to align the float ABI register names with the integer ABI register names, but with LMUL=8 types it's much easier to write a function that needs 24 vector registers than it is for integer registers. Making v1 - v7 non-arguments wouldn't affect most functions but I'm also not sure what this is accomplishing.

caller can hold value on v1-v7 and v24-v31, which is one LMUL=8, one LMUL=4, one LMUL=2 and one LMUL=1 values, this might not easy to use when LMUL=8, but would be useful to hold multiple value across the function call with LMUL=4~1.

Making v1-v7 and v24-v31 could reduce the potential spill/reload around the vectorized function call, use same example with more comment to demonstrate the idea.

All callee-save:

void func (){
  vector_type a, b, x, y, z;
  for (i = 0 ~ n)
  {
     b = load
     a = foo (b); // vectorized foo
     // spill a, b
     y = load
     x = bar (y); // vectorized bar
     // reload a, b
     z = a + x + b;
  }
}

So a and b will always spill/reload N times where N is the number of iteration.

All callee-save:

void func (){
  // NO callee-save reg store/restore since func is NOT using vector cc
  vector_type a, b, x, y, z;
  for (i = 0 ~ n)
  {
     b = load
     // Put b into vector arg register
     // Copy b to callee-save register
     a = foo (b); // vectorized foo
     // Move a to callee-save register
     y = load
     x = bar (y); // vectorized bar
     z = a + x + b;
  }
}

So there is no spill/reload around within the loop.

You might ask what about foo and bar? they won't have callee-save reg overhead IF the vector reg pressure is less than 17, but what if the reg pressure larger than 17? save/restore callee-save reg at prologue and epilogue, that would be same situation as all callee-save.

So in this design we can gain some performance IF those vectorized function has lower reg pressure, the worst case is same as the baseline design, overall is net gain.

Far more hesitant about a block of call-preserved registers. I remember, years and years ago before there even was a vector spec, a decision being made to never add new call-preserved U-mode state, does anyone remember the complete list of reasons why that was? setjmp shouldn't be an issue because setjmp doesn't take any vector arguments, so the compiler won't try to have vectors live across it.

Adding call-preserved vector registers into integer or HW floating point cc will cause ABI breakage, it will cause same problem as passing argument in vector reg, and this issue has addressed by STO_RISCV_VARIANT_CC :)

cc @aswaterman might have memory on those early things.

.eh_frame could be a problem will also need careful handing; my first impression is that if vectors are not allowed to be live at landing pad entry, then vector-related entries don't need to appear in the unwind info, but this needs a closer analysis.

Fortunately RVV is not forerunner on scalable vector, unwinding stuffs already resolved by ARM folks and we just follow them (patch for unwinding scalable vector gcc-mirror/gcc@89367e7) :P

sorear · 2023-06-28T19:52:28Z

So there is no spill/reload around within the loop.

With the changed proposal to have 17 call-clobbered argument registers, 15 call-saved, and 0 temporary (call-clobbered non-argument) registers I no longer have a question about the purpose of the last category :)

I also see that saving 16+12 registers in any vectorized function that calls non-vectorized functions was implemented and found adequate for SVE, so it should be good for us.

(Feedback from people with real-world experience with SVE calling conventions would be useful - I'm going off first principles here. Did 16 saved vector registers turn out to be a useful number or should a different one be picked?)

Fortunately RVV is not forerunner on scalable vector, unwinding stuffs already resolved by ARM folks and we just follow them (patch for unwinding scalable vector gcc-mirror/gcc@89367e7) :P

I was asking more about the case where a vectorized function contains a try/catch block. Since the exception unwinding could pass through any number of frames with saved vector registers, either the unwinder needs to be able to restore the saved vector registers, or the compiler needs to be aware that saved vector registers are lost on the edge between a throwing call and the landing pad.

Incidentally this does NOT work on aarch64, with the following test program, qemu-aarch64 8.0.0, and gcc 12.2.0 or 13.1.0:

#include <arm_sve.h>
volatile svint8_t *space;
typedef svint8_t (*Ft)(svint8_t);
volatile Ft g;
inline Ft hide(Ft f) { asm("" : "=r"(f) : "0"(f)); return f; }
svint8_t F3(svint8_t arg) {
    throw 5;
}
svint8_t F2(svint8_t arg) {
    svint8_t temp = *space;
    svint8_t ret = hide(F3)(arg);
    *space = temp;
    return ret;
}
svint8_t F1(svint8_t arg) {
    try {
        return hide(F2)(arg);
    } catch (int x) {
        return arg;
    }   
}
svint8_t F0(svint8_t arg) {
    hide(F1)(arg);
    return arg;
}
int main() {
    volatile svint8_t sp = svdup_s8(3);
    space = &sp;
    svint8_t rv = hide(F0)(svdup_s8(2));
    return svclastb_n_s8(svdup_b8(1),6,rv);
}

riscv-cc.adoc

lhtin · 2023-06-29T07:32:08Z

|===
| Name    | ABI Mnemonic | Meaning                      | Preserved across calls?

|v0       |              | Argument register for mask*  | No
|v1-v7    |              | Callee-saved registers          | Yes
|v8-v23   |              | Argument registers           | No
|v24-v31  |              | Callee-saved registers         | Yes
|===
I would like add callee-save registers rather than all caller-save, this could benefit those kernel loop contain with vector function, 16+1 argument register can hold 2 LMUL=8 value and one vector mask, which is satisfy almost all math function, the only function take 3 argument in std c math is fma, which can be expanded to instruction.

e.g.
void func (){
  vector_type a, b, x, y, z;
  for (i = 0 ~ n)
  {
     a = foo (b); // vectorized foo
     // `a` can be moved into a callee-save register instead of spill to stack.
     x = bar (y); // vectorized bar
     z = a + x;
  }
}
Just emphasize one point is the outer function func still define all vector register as caller-save register, so no callee-save overhead for func, the rule only applied on foo and bar, SVE also define similar rule for their vector calling convention.

This proposal is more sophisticated and reasonable. I think I should update based on this and wait for more people to review it.

Also passing arguments of vector type to the unprototype function should report an error, similar to ARM SVE's ABI. Is this rule to be stated in this document as well?

kito-cheng · 2023-06-29T09:52:46Z

@sorear

With the changed proposal to have 17 call-clobbered argument registers, 15 call-saved, and 0 temporary (call-clobbered non-argument) registers I no longer have a question about the purpose of the last category :)

Cool, I'm glad we have a consensus here!

(Feedback from people with real-world experience with SVE calling conventions would be useful - I'm going off first principles here. Did 16 saved vector registers turn out to be a useful number or should a different one be picked?)

Yeah, let me tag few more people once @lhtin updated :)

I was asking more about the case where a vectorized function contains a try/catch block. Since the exception unwinding could pass through any number of frames with saved vector registers, either the unwinder needs to be able to restore the saved vector registers, or the compiler needs to be aware that saved vector registers are lost on the edge between a throwing call and the landing pad.

Incidentally this does NOT work on aarch64, with the following test program, qemu-aarch64 8.0.0, and gcc 12.2.0 or 13.1.0:

Thanks for providing the case, I can reproduce with linaro aarch64 toolchain, we (SiFive folks) resolved several unwinding issue, so I thought should be not issues around their, but seems like not :(

kito-cheng · 2023-06-29T09:53:43Z

@lhtin

This proposal is more sophisticated and reasonable. I think I should update based on this and wait for more people to review it.

Thanks!

Also passing arguments of vector type to the unprototype function should report an error, similar to ARM SVE's ABI. Is this rule to be stated in this document as well?

Yeah, forbid that would be easier, so let us follow SVE here :)

riscv-cc.adoc

kito-cheng · 2023-07-07T08:21:46Z

@lhtin Could you add few more info on the post?

Link of PoC on clang/LLVM (https://reviews.llvm.org/D154576)
Link of PoC on GCC (Add once you have updated one)
Comparison with existing experimental vector calling convention on LLVM:
- v1-v15 and v24-v31 is callee save in this proposal, but LLVM's one is all caller-save
- No ABI for tuple type ABI implemented in LLVM yet.

…note

kito-cheng

LGTM, thanks @lhtin and everyone!

I believe we gather enough feedback and address all major comments, so I think we are ready to land, however it's vacations season for most American and European, therefore I would like to wait few more time, my plan is merge this and #380 at 2024/1/5 IF no further strong comment/objection.

kito-cheng · 2024-01-08T03:25:08Z

Thanks everyone, finally we have an official vector calling convention after few years :)

… and returns I post the vector register calling convention rules from in the proposal[1] directly here: v0 is used to pass the first vector mask argument to a function, and to return vector mask result from a function. v8-v23 are used to pass vector data arguments, vector tuple arguments and the rest vector mask arguments to a function, and to return vector data and vector tuple results from a function. Each vector data type and vector tuple type has an LMUL attribute that indicates a vector register group. The value of LMUL indicates the number of vector registers in the vector register group and requires the first vector register number in the vector register group must be a multiple of it. For example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be allocated to this type, but v9-v16 can not because the v9 register number is not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a vector mask type, its LMUL is 1. Each vector tuple type also has an NFIELDS attribute that indicates how many vector register groups the type contains. Thus a vector tuple type needs to take up LMUL×NFIELDS registers. The rules for passing vector arguments are as follows: 1. For the first vector mask argument, use v0 to pass it. The argument has now been allocated. 2. For vector data arguments or rest vector mask arguments, starting from the v8 register, if a vector register group between v8-v23 that has not been allocated can be found and the first register number is a multiple of LMUL, then allocate this vector register group to the argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated. 3. For vector tuple arguments, starting from the v8 register, if NFIELDS consecutive vector register groups between v8-v23 that have not been allocated can be found and the first register number is a multiple of LMUL, then allocate these vector register groups to the argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated. NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers are allocated for the previous vector argument. Therefore, it is possible that the vector register number allocated to a vector argument can be less than the vector register number allocated to previous vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`. This approach allows more vector registers to be allocated to arguments in some cases. Vector values are returned in the same manner as the first named argument of the same type would be passed. [1] riscv-non-isa/riscv-elf-psabi-doc#389 gcc/ChangeLog: * config/riscv/riscv-protos.h (builtin_type_p): New function for checking vector type. * config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto. * config/riscv/riscv.cc (struct riscv_arg_info): New fields. (riscv_init_cumulative_args): Setup variant_cc field. (riscv_vector_type_p): New function for checking vector type. (riscv_hard_regno_nregs): Hoist declare. (riscv_get_vector_arg): Subroutine of riscv_get_arg_info. (riscv_get_arg_info): Support vector cc. (riscv_function_arg_advance): Update cum. (riscv_pass_by_reference): Handle vector args. (riscv_v_abi): New function return vector abi. (riscv_return_value_is_vector_type_p): New function for check vector arguments. (riscv_arguments_is_vector_type_p): New function for check vector returns. (riscv_fntype_abi): Implement TARGET_FNTYPE_ABI. (TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI. * config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi. (MAX_ARGS_IN_VECTOR_REGISTERS): Ditto. (MAX_ARGS_IN_MASK_REGISTERS): Ditto. (V_ARG_FIRST): Ditto. (V_ARG_LAST): Ditto. (enum riscv_cc): Define all RISCV_CC variants. * config/riscv/riscv.opt: Add --param=riscv-vector-abi. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-1.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-2.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-3.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-4.c: New test. * gcc.target/riscv/rvv/base/abi-call-error-1.c: New test. * gcc.target/riscv/rvv/base/abi-call-return-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-return.c: New test.

jrtc27

I note that this was merged despite being 21(!!) individual commits with all manner of fixups, making the history of the repository a mess. Can we please be more diligent in future about ensuring we merge a sensible series of commits?

jrtc27 · 2024-05-25T18:38:40Z

riscv-cc.adoc

+
+Vector arguments and return values are disallowed to pass to an unprototyped
+function.
+


This should not have been done like this. STO_RISCV_VARIANT_CC is an ELF-specific thing (a flag in ElfXX_Sym's st_other) detailed in riscv-elf.adoc, but riscv-cc.adoc is a separate document that describes the calling convention independently from the underlying file format (and could be used by Windows or macOS if they choose to adopt RISC-V, which use PE/COFF and Mach-O respectively).

lhtin changed the title ~~a vector abi proposal~~ A Vector ABI Proposal Jun 16, 2023

lhtin changed the title ~~A Vector ABI Proposal~~ A Vector Calling Convention Proposal Jun 16, 2023

lhtin changed the title ~~A Vector Calling Convention Proposal~~ Proposal for Vector Calling Convention Jun 17, 2023

nick-knight reviewed Jun 17, 2023

View reviewed changes

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

lhtin force-pushed the master branch from 9f04ae2 to 18d88a2 Compare June 20, 2023 06:25

sorear mentioned this pull request Jun 20, 2023

representation of GNU C fixed-size vectors #390

Open

kito-cheng reviewed Jun 27, 2023

View reviewed changes

sorear reviewed Jun 28, 2023

View reviewed changes

riscv-cc.adoc Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

kito-cheng reviewed Jun 29, 2023

View reviewed changes

riscv-cc.adoc Outdated Show resolved Hide resolved

kito-cheng reviewed Jun 29, 2023

View reviewed changes

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

kito-cheng reviewed Jun 30, 2023

View reviewed changes

riscv-cc.adoc Outdated Show resolved Hide resolved

kito-cheng reviewed Jul 5, 2023

View reviewed changes

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

riscv-cc.adoc Outdated Show resolved Hide resolved

kito-cheng mentioned this pull request Jul 6, 2023

Draft for vector calling convention. #171

Closed

lhtin added 12 commits December 22, 2023 16:21

Fix cross document link

d7b1e51

address some comments by sorear

78c9230

udpate to a new proposal by kito

c1c6635

rule for unprototype function

21b9592

add some note

0437e69

adjust some description

2c7a11f

adjust some description

b89beab

address some comments

4724a1c

address some comments by Kito

814a846

add the scope of application of vector cc

07b4a5a

Rename vector tuple argument to tuple vector data argument and add a …

0283e38

…note

Address Kito's Comments

0fc69ed

lhtin force-pushed the master branch from 9b673bf to 0fc69ed Compare December 22, 2023 08:27

kito-cheng approved these changes Dec 22, 2023

View reviewed changes

kito-cheng merged commit d4c38ee into riscv-non-isa:master Jan 8, 2024
4 checks passed

jrtc27 reviewed May 25, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for Vector Calling Convention #389

Proposal for Vector Calling Convention #389

lhtin commented Jun 16, 2023 •

edited

Loading

nick-knight left a comment

lhtin commented Jun 17, 2023

sorear commented Jun 20, 2023

lhtin commented Jun 21, 2023 •

edited

Loading

sorear commented Jun 21, 2023

nick-knight commented Jun 21, 2023 •

edited

Loading

lhtin commented Jun 22, 2023

lhtin commented Jun 22, 2023 •

edited

Loading

kito-cheng commented Jun 28, 2023

lhtin commented Jun 28, 2023

kito-cheng commented Jun 28, 2023 •

edited

Loading

sorear commented Jun 28, 2023

kito-cheng commented Jun 28, 2023

sorear commented Jun 28, 2023

lhtin commented Jun 29, 2023

kito-cheng commented Jun 29, 2023

kito-cheng commented Jun 29, 2023

kito-cheng commented Jul 7, 2023

kito-cheng left a comment

kito-cheng commented Jan 8, 2024

jrtc27 left a comment

jrtc27 May 25, 2024


		Vector arguments and return values are disallowed to pass to an unprototyped
		function.

Proposal for Vector Calling Convention #389

Proposal for Vector Calling Convention #389

Conversation

lhtin commented Jun 16, 2023 • edited Loading

nick-knight left a comment

Choose a reason for hiding this comment

lhtin commented Jun 17, 2023

sorear commented Jun 20, 2023

lhtin commented Jun 21, 2023 • edited Loading

sorear commented Jun 21, 2023

nick-knight commented Jun 21, 2023 • edited Loading

lhtin commented Jun 22, 2023

lhtin commented Jun 22, 2023 • edited Loading

kito-cheng commented Jun 28, 2023

lhtin commented Jun 28, 2023

kito-cheng commented Jun 28, 2023 • edited Loading

sorear commented Jun 28, 2023

kito-cheng commented Jun 28, 2023

sorear commented Jun 28, 2023

lhtin commented Jun 29, 2023

kito-cheng commented Jun 29, 2023

kito-cheng commented Jun 29, 2023

kito-cheng commented Jul 7, 2023

kito-cheng left a comment

Choose a reason for hiding this comment

kito-cheng commented Jan 8, 2024

jrtc27 left a comment

Choose a reason for hiding this comment

jrtc27 May 25, 2024

Choose a reason for hiding this comment

lhtin commented Jun 16, 2023 •

edited

Loading

lhtin commented Jun 21, 2023 •

edited

Loading

nick-knight commented Jun 21, 2023 •

edited

Loading

lhtin commented Jun 22, 2023 •

edited

Loading

kito-cheng commented Jun 28, 2023 •

edited

Loading