Currently a todo list:
- Generate op lists, byop.yaml, opshim, and pointwise ops (possibly reduction ops as well)
- Compile kernels (cubin) in Python with Triton compile and cache them in C++
- Add flaggems kernels and flash attention kernels
- Add third-party gtest and glog
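
For the kernel-compilation item, one possible shape of the Python side is sketched below. This is only an illustration under assumptions, not the project's design: `cache_key`, `get_or_compile`, the on-disk layout, and the `compile_fn` callback (a stand-in for the real Triton compile step) are all hypothetical names. The idea is that each cubin is written once under a key derived from everything that affects the binary, so a C++ loader could later find it by the same key.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(kernel_name: str, signature: dict, arch: str) -> str:
    """Build a stable key from the inputs that affect the generated binary."""
    payload = json.dumps(
        {"kernel": kernel_name, "signature": signature, "arch": arch},
        sort_keys=True,  # key must not depend on dict insertion order
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_or_compile(
    cache_dir: Path,
    kernel_name: str,
    signature: dict,
    arch: str,
    compile_fn: Callable[[], bytes],
) -> Path:
    """Return the path of the cached binary, compiling only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(kernel_name, signature, arch)}.cubin"
    if not path.exists():
        # compile_fn is a hypothetical placeholder for the real compile step;
        # it is expected to return the cubin contents as bytes.
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(compile_fn())
        # Rename into place so a reader does not pick up a half-written file.
        tmp.replace(path)
    return path
```

Keying on the architecture as well as the signature means a binary built for one GPU generation is not handed to another.
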
Done:
- Use torch LOG; set the log level through the environment variable: TORCH_CPP_LOG_LEVEL=INFO
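
As a small illustration of the logging setting, the sketch below builds an environment for a child process with `TORCH_CPP_LOG_LEVEL` set. The helper name `env_with_log_level` is hypothetical and not part of this project or of PyTorch; the level names and their numeric equivalents are the values the variable is understood to accept.

```python
import os
from typing import Optional

# Level names accepted by TORCH_CPP_LOG_LEVEL, with their numeric equivalents.
LOG_LEVELS = {"INFO": 0, "WARNING": 1, "ERROR": 2, "FATAL": 3}


def env_with_log_level(level: str = "INFO", base: Optional[dict] = None) -> dict:
    """Return a copy of the environment with TORCH_CPP_LOG_LEVEL set.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    name = level.upper()
    if name not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    env = dict(os.environ if base is None else base)
    env["TORCH_CPP_LOG_LEVEL"] = name
    return env
```

The result can be passed as `env=` to `subprocess.run` when launching a test binary, or the variable can be exported in the shell before the program starts. Either way it has to be set before the logging subsystem initializes in the target process.
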