-
Notifications
You must be signed in to change notification settings - Fork 46
Parameter Sweep for Attention #1830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e6e74e6
to
e0b3124
Compare
…fRunner instead of inheriting it.
e5563cc
to
023185f
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅ Additional details and impacted files@@ Coverage Diff @@
## develop #1830 +/- ##
===========================================
- Coverage 78.52% 77.99% -0.52%
===========================================
Files 100 100
Lines 29907 30057 +150
Branches 4452 4656 +204
===========================================
- Hits 23482 23442 -40
- Misses 4590 4601 +11
- Partials 1835 2014 +179
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a parameter sweep for attention-based kernels to test various input shapes and performance configuration combinations and to uncover potential bugs.
- Implements a parameter sweep generator for testing attention configurations.
- Adds asynchronous test runners and logging for failing configurations.
- Provides a command-line interface for configuring the sweep.
Comments suppressed due to low confidence (1)
mlir/utils/performance/attentionSweeps.py:66
- [nitpick] Ensure that attribute names in generateMlirDriverArgs are consistent with those set in AttentionConfiguration (e.g., 'dataType' vs 'dtype' and 'numCU' vs 'numCu'). Consistent naming helps avoid confusion and potential runtime errors.
'-t', self.dataType,
parser.add_argument('--quiet', action='store_true') | ||
parser.add_argument('--jobs', type=int, default=os.cpu_count()) | ||
parser.add_argument('--mlir-build-dir', type=str, required=True) | ||
parser.add_argument('--samples', type=int, default=1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should use a time limit instead of number of samples? Because different machines will take diff time for 1k samples. @umangyadav
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should use a time limit instead of number of samples? Because different machines will take diff time for 1k samples. @umangyadav @dhernandez0
There is option to add timeout in the CI function which would call attentionSweeps.py, which can be used independently of the samples number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe void parameterSweep(...) in CI, could be changed to something like:
void parameterSweep(String CONFIG, String codepath, String sweepType = "parameter") {
timeout(time: 300, activity: true, unit: 'MINUTES') {
dir('build') {
if (sweepType == "attention") {
sh """python3 ./bin/attentionSweeps.py"""
} else {
sh """python3 ./bin/parameterSweeps.py -j 5 ${CONFIG}"""
}
}
}
}
it should be time limited then I think ...
and call in the stage would be something like:
stage("Parameter Sweep") {
steps {
script {
// ...
]) {
parameterSweep("conv_structure", "${CODEPATH}")
parameterSweep("perf_config", "${CODEPATH}")
parameterSweep("", "${CODEPATH}", "attention")
}
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I think if it times out, it will make CI fail?
Signed-off-by: Djordje Antic <djoantic@amd.com>
Signed-off-by: Djordje Antic <djoantic@amd.com>
Signed-off-by: Djordje Antic <djoantic@amd.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for creating this script, great work!
TODO (in a follow-up tickets/PRs):
|
…hanges Add return value to initializeDataTypesAttention() in perfRunner so that it can be used easily in other modules. Invoke initializeDataTypesAttention() in attentionSweeps.py. Signed-off-by: Djordje Antic <djoantic@amd.com>
Implement parameter sweep for attention (
attentionSweeps.py
) that will test combinations of input shapes and perfConfigs for attention and find potential bugs. AdjustparameterSweeps.py
so that attentionSweeps.py can reuse its methods.Resolves https://github.com/ROCm/rocMLIR-internal/issues/1800