@rucnyz Thank you for the wonderful benchmark!
I noticed that the secure_coding dataset for C/C++ is actually the patching dataset. The same patching dataset also appears in the Python secure_coding split here.
As mentioned in the Hugging Face community discussions and #2, could you please update these datasets so that we can use them for our experiments?