Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: Prerequisite for AutoScheduler #362

Merged
merged 57 commits into from
Dec 22, 2024
Merged
Changes from 1 commit
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
6782d42
gemm
hikettei Dec 20, 2024
44a490f
Tile
hikettei Dec 20, 2024
610316a
search tile
hikettei Dec 20, 2024
0cd518a
Merge branch 'main' into opt-gemm
hikettei Dec 20, 2024
f6bcc57
OpenBLAS
hikettei Dec 20, 2024
253cf6f
90GFLops
hikettei Dec 20, 2024
d4c08c4
wip
hikettei Dec 20, 2024
db52790
wip
hikettei Dec 20, 2024
082ae9d
wip
hikettei Dec 20, 2024
9446987
tile param search
hikettei Dec 20, 2024
6a92698
finish an experiment
hikettei Dec 20, 2024
0e435c1
apply tiling without causing segv
hikettei Dec 20, 2024
e32cec2
tiling all bands
hikettei Dec 20, 2024
c4215bc
things are moved
hikettei Dec 20, 2024
4c2de70
move everything into auto-scheduler
hikettei Dec 21, 2024
785d2eb
asd
hikettei Dec 21, 2024
34656d2
fix: tiling-size related gc error
hikettei Dec 21, 2024
6a0c211
An initial attempt to unroll: tile based
hikettei Dec 21, 2024
0ab186d
moved
hikettei Dec 21, 2024
2ce3075
split
hikettei Dec 21, 2024
b22d20f
.
hikettei Dec 21, 2024
874cd17
moved isl things to auto-scheduler
hikettei Dec 21, 2024
b47fd09
refactor
hikettei Dec 21, 2024
e92725d
synchronize baseline
hikettei Dec 21, 2024
e1e5dc7
updt
hikettei Dec 21, 2024
dfca38d
updt
hikettei Dec 21, 2024
0ed2740
typo
hikettei Dec 21, 2024
0adb742
typo
hikettei Dec 21, 2024
378c310
typo
hikettei Dec 21, 2024
d2de5a1
Tweak
hikettei Dec 21, 2024
cd01fe9
nested unroll directive
hikettei Dec 21, 2024
64b8c7c
Unroll for static and simple case
hikettei Dec 21, 2024
3ac9213
Fold MOD
hikettei Dec 21, 2024
0d4095f
nope
hikettei Dec 21, 2024
642acc9
Initial attempt of unrolling
hikettei Dec 21, 2024
ea829fd
copy and remove UNROLL_PARENT for the reminder
hikettei Dec 21, 2024
b7ba19f
Loop Unrolling is worked
hikettei Dec 21, 2024
6df4461
Unrolling softmax
hikettei Dec 21, 2024
994a615
wip: generating a sketch
hikettei Dec 21, 2024
ca032f8
Identify the sketch
hikettei Dec 21, 2024
4cdec95
TODO
hikettei Dec 21, 2024
1c5096e
remove extra unroll reminder
hikettei Dec 21, 2024
bfb2d11
coincident for elwise
hikettei Dec 22, 2024
771fa79
fix: unroll
hikettei Dec 22, 2024
aad075a
quickload
hikettei Dec 22, 2024
1b56342
MemoryPlanner should produce the same result
hikettei Dec 22, 2024
e77703a
Fix for unroll reminder
hikettei Dec 22, 2024
0e9fff8
try unrolling tiled bands
hikettei Dec 22, 2024
6674d7b
ignore tiled band dims
hikettei Dec 22, 2024
ce60a52
base-item
hikettei Dec 22, 2024
5ba4fa3
remove: caten/polyhedral
hikettei Dec 22, 2024
26a4a66
clean up
hikettei Dec 22, 2024
e63ac07
rem dep
hikettei Dec 22, 2024
546d0c4
caten instead
hikettei Dec 22, 2024
f73c49f
updt
hikettei Dec 22, 2024
04cb131
updt
hikettei Dec 22, 2024
c832f5e
updt
hikettei Dec 22, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
wip
  • Loading branch information
hikettei committed Dec 20, 2024
commit d4c08c44b6e8053fcaf2366100dbb3dd0f4c02fd
12 changes: 7 additions & 5 deletions scripts/gemm/gemm.c
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,12 @@
#include <arm_neon.h>

// Optimized by AutoScheduler
#define MB 16
#define NB 16
#define KB 16

void gemm_naive_tiled(int M, int N, int K,
// [TODO] Can't we achieve 90 GFlops with only using gcc?
// Supporting SIMD Intrinsic beyonds the compiler.
// 20~30 GFlops
void gemm(int M, int N, int K,
const float * __restrict A,
const float * __restrict B,
float * __restrict C)
Expand All @@ -25,6 +26,7 @@ void gemm_naive_tiled(int M, int N, int K,
int nMax = (n0 + NB < N) ? (n0 + NB) : N;
for (int j = j0; j < jMax; j++) {
float sum = 0.0f;
#pragma omp simd reduction(+:sum)
for (int n = n0; n < nMax; n++) {
sum += A[i*N + n] * B[n*K + j];
}
Expand All @@ -34,8 +36,8 @@ void gemm_naive_tiled(int M, int N, int K,
}
}
}

void gemm(int M, int N, int K,
// 20~80 GFlops
void gemm_hand_optimized(int M, int N, int K,
const float * __restrict A,
const float * __restrict B,
float * __restrict C)
Expand Down