Improve error classification for unknown and network errors #6803
spboyer wants to merge 1 commit into Azure:main
Pull request overview
This PR reduces the telemetry “Unknown” error bucket by adding explicit error classifiers in MapError for common context- and network-related failures, improving observability and triage across azd commands.
Changes:
- Classify `context.Canceled` as `user.canceled` and `context.DeadlineExceeded` as `internal.timeout`.
- Classify common network failures (DNS, connection/TLS-ish wrappers, EOF) as `internal.network` via a new `isNetworkError` helper.
- Add new unit test cases for `MapError` and a new `Test_isNetworkError` table test.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `cli/azd/internal/cmd/errors.go` | Extends `MapError` classification and adds `isNetworkError` helper for network-related error bucketing. |
| `cli/azd/internal/cmd/errors_test.go` | Adds new `MapError` cases and introduces a dedicated `isNetworkError` test suite. |
Comments suppressed due to low confidence (2)
cli/azd/internal/cmd/errors.go:304
`isNetworkError` treats any `*os.SyscallError` as a network error. `os.SyscallError` is used for non-network syscalls as well (e.g., process start failures like "fork/exec" and other filesystem/process operations), so this can misclassify unrelated internal/tool errors as `internal.network`. Consider removing this broad check, or narrowing it to known network-related syscalls/errno values (or only counting `*os.SyscallError` when it is wrapped by `*net.OpError`).
This issue also appears on line 288 of the same file.
```go
// Check for connection reset / broken pipe via syscall errors
var sysErr *os.SyscallError
if errors.As(err, &sysErr) {
	return true
}
```
cli/azd/internal/cmd/errors.go:304
`isNetworkError` has branches for `*net.OpError`, `*tls.RecordHeaderError`, and `*os.SyscallError`, but the new `Test_isNetworkError` cases only cover DNS and EOF variants. Adding tests that exercise these additional branches would help prevent regressions and ensure the intended classification behavior stays stable across platforms.
```go
// Check for network operation errors (connection refused, timeout, etc.)
var opErr *net.OpError
if errors.As(err, &opErr) {
	return true
}
// Check for TLS errors
var tlsRecordErr *tls.RecordHeaderError
if errors.As(err, &tlsRecordErr) {
	return true
}
// Check for connection reset / broken pipe via syscall errors
var sysErr *os.SyscallError
if errors.As(err, &sysErr) {
	return true
}
```
Add explicit classifiers in MapError for:
- context.Canceled as user.canceled (was internal.errors_errorString)
- context.DeadlineExceeded as internal.timeout (was internal.errors_errorString)
- Network errors (DNS, TLS, connection, EOF) as internal.network

This reduces the unknown/internal error bucket by classifying common error patterns that previously fell through to the generic fallback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f2485ed to 958a393
Summary
Reduces the Unknown error bucket (28% of all azd errors) by adding explicit classifiers for common error patterns that previously fell through to the generic internal fallback.
Fixes #6796
Changes
errors.go
errors_test.go
Data Context
Over 90 days, 28.22% of all azd errors (~36,132 of 128,054) were classified as unknown.