[Reland] Use head sha timestamp as end date for similar failure search (#5160)

The context is in #5151. This
reland PR adds two more fixes:

* Do a left join from `workflow_job` to `push` so that Dr.CI can always
find all the jobs from the PR even when the commit SHA is not found in
`push`, as happens with forked PRs. In that case, the `head_sha_timestamp`
field will be empty.
* When the `head_sha_timestamp` is empty, call `fetchCommitTimestamp` to
get the timestamp directly from GitHub. This is done once per commit.
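The once-per-commit behavior can be sketched as follows. This is a simplified, hypothetical sketch, not the actual Dr.CI code: the helper name `resolveHeadShaTimestamps`, the `Workflow` shape, and the injected `fetchTimestamp` callback (standing in for the GitHub query) are all illustrative.

```typescript
// Hedged sketch of the once-per-commit timestamp lookup: values already
// present from Rockset are reused, and an unknown SHA triggers at most one
// fallback fetch. Names and shapes here are illustrative, not Dr.CI's API.
type Workflow = { head_sha: string; head_sha_timestamp?: string };

async function resolveHeadShaTimestamps(
  workflows: Workflow[],
  fetchTimestamp: (sha: string) => Promise<string>
): Promise<Map<string, string>> {
  const cache = new Map<string, string>();
  for (const w of workflows) {
    if (w.head_sha_timestamp) {
      // Rockset already knew the timestamp; just remember it
      cache.set(w.head_sha, w.head_sha_timestamp);
    } else if (!cache.has(w.head_sha)) {
      // Fallback query happens only the first time this SHA is seen
      cache.set(w.head_sha, await fetchTimestamp(w.head_sha));
    }
  }
  return cache;
}
```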

Note that if the GitHub query fails and `head_sha_timestamp` is still
empty, Dr.CI won't apply the similar failure search, to avoid false
positives; otherwise the search window would extend to the current date.
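The guard above can be sketched like this. It is a simplified stand-in for the `hasSimilarFailures` logic, not the real implementation: plain `Date` replaces `dayjs`, and the function name is illustrative.

```typescript
// Hedged sketch of the end-date guard: skip the search entirely when the
// commit timestamp is unknown, otherwise use it as the end of the window.
function similarFailureWindow(
  headShaTimestamp: string | null | undefined,
  lookbackPeriodInHours: number
): { startDate: Date; endDate: Date } | undefined {
  // An empty or "0" timestamp means we don't know when the commit was made;
  // searching anyway would let the window silently extend to "now" and
  // invite false positives, so bail out instead.
  if (!headShaTimestamp || headShaTimestamp === "0") {
    return undefined;
  }
  const endDate = new Date(headShaTimestamp);
  const startDate = new Date(
    endDate.getTime() - lookbackPeriodInHours * 60 * 60 * 1000
  );
  return { startDate, endDate };
}
```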

### Testing

```
curl --request POST \
--url "http://localhost:3000/api/drci/drci?prNumber=PR_NUMBER" \
--header "Authorization: TOKEN" \
--data 'repo=pytorch'
```

1. pytorch/pytorch#125271, a new forked PR with no ciflow ref.
`head_sha_timestamp` from Rockset is empty, so `fetchCommitTimestamp` is
invoked. Dr.CI continues to work.

<details open><summary><b>NEW FAILURES</b> - The following jobs have
failed:</summary><p>

* [Lint / lintrunner-clang /
linux-job](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449212917)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585059/job/24449212917))
    `>>> Lint for torch/csrc/utils/tensor_memoryformats.cpp:`
* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 2, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449643728)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449643728))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24450124622)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24450124622))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.11-clang10 / test (crossref, 2, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449335282)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449335282))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.11-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449334520)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449334520))

`test_tensor_creation_ops.py::TestTensorCreationCPU::test_constructor_dtypes_cpu`
* [pull / linux-focal-py3.11-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449334757)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449334757))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.11-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449335837)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449335837))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.12-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449281229)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449281229))

`test_tensor_creation_ops.py::TestTensorCreationCPU::test_constructor_dtypes_cpu`
* [pull / linux-focal-py3.12-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449281368)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449281368))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.12-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449282003)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449282003))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.8-clang10 / test (crossref, 2, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449309061)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449309061))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.8-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449308208)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449308208))

`test_tensor_creation_ops.py::TestTensorCreationCPU::test_constructor_dtypes_cpu`
* [pull / linux-focal-py3.8-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449308391)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449308391))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.8-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449309632)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449309632))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 2, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449403443)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449403443))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449357342)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449357342))

`test_tensor_creation_ops.py::TestTensorCreationCPU::test_constructor_dtypes_cpu`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449357569)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449357569))
    `test_autograd.py::TestAutograd::test_type_conversions`
</p></details>

2. pytorch/pytorch#125225, another forked PR, this time with
`ciflow/trunk`. `head_sha_timestamp` is now available from Rockset, so
`fetchCommitTimestamp` is not needed.

<details open><summary><b>NEW FAILURES</b> - The following jobs have
failed:</summary><p>

* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 1, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445851668)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445851668))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 2, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445852045)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445852045))

`test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda`
* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 3, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445852311)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445852311))

`dynamo/test_autograd_function.py::AutogradFunctionTests::test_amp_custom_fwd_bwd`
* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 4, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445852638)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445852638))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 1, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24446408907)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24446408907))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24446409189)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24446409189))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 3, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24446409446)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24446409446))

`test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 4, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24446409676)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24446409676))

`test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_cuda`
* [pull / linux-focal-py3.11-clang10 / test (crossref, 1, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445471589)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445471589))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.11-clang10 / test (crossref, 2, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445471884)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445471884))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.11-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445470929)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445470929))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.11-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445471168)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445471168))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.11-clang10 / test (default, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445471397)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445471397))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-focal-py3.11-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445472530)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445472530))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.12-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445428834)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445428834))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.12-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445429085)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445429085))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.12-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445429974)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445429974))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.8-clang10 / test (crossref, 1, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445479567)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445479567))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.8-clang10 / test (crossref, 2, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445479782)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445479782))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.8-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445478904)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445478904))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.8-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445479120)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445479120))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.8-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445480497)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445480497))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 1, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445500236)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445500236))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 3, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445500673)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445500673))

`test_transformers.py::TestTransformersCPU::test_transformerencoderlayer_subclass_model_cpu`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 4, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445500892)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445500892))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 5, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445501108)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445501108))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445495672)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445495672))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445495930)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445495930))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445496144)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445496144))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-jammy-py3.8-gcc11 / test (jit_legacy, 1, 1,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445496582)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445496582))

`test_jit_legacy.py::TestScript::test_torchscript_multi_head_attn_fast_path`
</p></details>

3. pytorch/executorch#3353, a non-ghstack, non-forked PR.

`{"3353":{"FAILED":[],"FLAKY":[],"BROKEN_TRUNK":[],"UNSTABLE":[]}}`

4. pytorch/pytorch#125292, a ghstack, non-forked PR.

<details open><summary><b>NEW FAILURE</b> - The following job has
failed:</summary><p>

* [inductor / cuda12.1-py3.10-gcc9-sm86 / test
(dynamic_inductor_torchbench, 2, 2,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125292#24455309482)
([gh](https://github.com/pytorch/pytorch/actions/runs/8904802497/job/24455309482))
    `resnet18`
</p></details>
huydhn authored May 2, 2024
1 parent 9f72bd3 commit 45e6a91
Showing 9 changed files with 108 additions and 38 deletions.
26 changes: 16 additions & 10 deletions torchci/lib/drciUtils.ts
@@ -234,24 +234,25 @@ export async function hasSimilarFailures(
return;
}

// NB: It's important to sort the oldest matching results in the search window
// first here because that can be used to verify if the failure came from one
// of the previous merge commits of a reverted PR. The first record is the most
// relevant one and also the first time the failure is observed in the search
// window
// NB: Using the job completed_at timestamp has many false positives, so it's
// better that we only enable this feature when the head commit timestamp is
// available and use it as the end date
if (
job.completed_at === undefined ||
job.completed_at === null ||
job.completed_at === ""
job.head_sha_timestamp === undefined ||
job.head_sha_timestamp === null ||
job.head_sha_timestamp === "" ||
job.head_sha_timestamp === "0"
) {
return;
}

const endDate = dayjs(job.completed_at);
// NB: Use the commit timestamp here instead of the job timestamp to avoid using
// the wrong end date when a PR is reverted and the job reruns
const endDate = dayjs(job.head_sha_timestamp);
const startDate = dayjs(
baseCommitDate !== "" && baseCommitDate !== "0"
? baseCommitDate
: job.completed_at
: job.head_sha_timestamp
).subtract(lookbackPeriodInHours, "hour");

if (
@@ -263,6 +264,11 @@ export async function hasSimilarFailures(
return;
}

// NB: It's important to sort the oldest matching results in the search window
// first here because that can be used to verify if the failure came from one
// of the previous merge commits of a reverted PR. The first record is the most
// relevant one and also the first time the failure is observed in the search
// window
const records = await querySimilarFailures({
failure_captures: job.failure_captures,
name: job.name,
18 changes: 18 additions & 0 deletions torchci/lib/fetchCommit.ts
@@ -5,6 +5,7 @@ import rocksetVersions from "rockset/prodVersions.json";

import { CommitData, JobData } from "./types";
import { removeCancelledJobAfterRetry } from "./jobUtils";
import { Octokit } from "octokit";

export default async function fetchCommit(
owner: string,
@@ -68,3 +69,20 @@ export default async function fetchCommit(
jobs: _.concat(filteredJobs, badWorkflows),
};
}

export async function fetchCommitTimestamp(
octokit: Octokit,
owner: string,
repo: string,
commit_sha: string
): Promise<string> {
// Query GitHub to get the commit timestamp, this is used to get the timestamp of
// commits from forked PRs
const commit = await octokit.rest.git.getCommit({
owner: owner,
repo: repo,
commit_sha: commit_sha,
});

return commit.data.committer.date;
}
1 change: 1 addition & 0 deletions torchci/lib/types.ts
@@ -45,6 +45,7 @@ export interface RecentWorkflowsData extends BasicJobData {
completed_at: string | null;
html_url: string;
head_sha: string;
head_sha_timestamp?: string;
head_branch?: string | null;
pr_number?: number;
failure_captures: string[];
40 changes: 36 additions & 4 deletions torchci/pages/api/drci/drci.ts
@@ -34,9 +34,11 @@ import {
} from "lib/jobUtils";
import getRocksetClient from "lib/rockset";
import _ from "lodash";
import { fetchCommitTimestamp } from "lib/fetchCommit";

interface PRandJobs {
head_sha: string;
head_sha_timestamp?: string;
pr_number: number;
jobs: RecentWorkflowsData[];
merge_base: string;
@@ -89,7 +91,12 @@ export async function updateDrciComments(
NUM_MINUTES + ""
);

const workflowsByPR = reorganizeWorkflows(OWNER, repo, recentWorkflows);
const workflowsByPR = await reorganizeWorkflows(
OWNER,
repo,
recentWorkflows,
octokit
);
const head = get_head_branch(repo);
await addMergeBaseCommits(octokit, repo, head, workflowsByPR);
const sevs = getActiveSEVs(await fetchIssuesByLabel("ci: sev"));
@@ -832,26 +839,51 @@ export async function getWorkflowJobsStatuses(
};
}

export function reorganizeWorkflows(
export async function reorganizeWorkflows(
owner: string,
repo: string,
recentWorkflows: RecentWorkflowsData[]
): Map<number, PRandJobs> {
recentWorkflows: RecentWorkflowsData[],
octokit?: Octokit
): Promise<Map<number, PRandJobs>> {
const workflowsByPR: Map<number, PRandJobs> = new Map();
const headShaTimestamps: Map<string, string> = new Map();

for (const workflow of recentWorkflows) {
const pr_number = workflow.pr_number!;
if (!workflowsByPR.has(pr_number)) {
let headShaTimestamp = workflow.head_sha_timestamp;
// NB: The head SHA timestamp is currently used as the end date when searching
// for similar failures. However, it's not available on Rockset for commits
// from forked PRs before a ciflow ref is pushed. In such case, the head SHA
// timestamp will be undefined and we will make an additional query to GitHub
// to get the value
if (octokit && !headShaTimestamp) {
headShaTimestamp = await fetchCommitTimestamp(
octokit,
owner,
repo,
workflow.head_sha
);
headShaTimestamps.set(workflow.head_sha, headShaTimestamp);
}

workflowsByPR.set(pr_number, {
pr_number: pr_number,
head_sha: workflow.head_sha,
head_sha_timestamp: headShaTimestamp,
jobs: [],
merge_base: "",
merge_base_date: "",
owner: owner,
repo: repo,
});
}

const headShaTimestamp = headShaTimestamps.get(workflow.head_sha);
if (!workflow.head_sha_timestamp && headShaTimestamp) {
workflow.head_sha_timestamp = headShaTimestamp;
}

workflowsByPR.get(pr_number)!.jobs.push(workflow);
}

6 changes: 5 additions & 1 deletion torchci/rockset/commons/__sql/commit_failed_jobs.sql
@@ -10,14 +10,18 @@ SELECT
j.completed_at,
j.html_url,
j.head_sha,
p.head_commit.timestamp AS head_sha_timestamp,
j.head_branch,
j.torchci_classification.captures AS failure_captures,
IF(j.torchci_classification.line IS NULL, null, ARRAY_CREATE(j.torchci_classification.line)) AS failure_lines,
j.torchci_classification.context AS failure_context,
j._event_time AS time,
FROM
commons.workflow_job j
JOIN commons.workflow_run w ON w.id = j.run_id
JOIN commons.workflow_run w ON w.id = j.run_id HINT(join_broadcast = true)
-- Do a left join here because the push table won't have any information about
-- commits from forked repo
LEFT JOIN commons.push p ON p.after = j.head_sha HINT(join_broadcast = true)
WHERE
ARRAY_CONTAINS(
SPLIT(: shas, ','),
16 changes: 11 additions & 5 deletions torchci/rockset/commons/__sql/recent_pr_workflows_query.sql
@@ -2,11 +2,15 @@
-- classified into new failures and unrelated failures such as broken trunk, flaky, unstable, etc.
WITH recent_shas AS (
SELECT
distinct p.head.sha AS sha,
p.number AS number
distinct pull_request.head.sha AS sha,
pull_request.number AS number,
push.head_commit.timestamp AS timestamp,
FROM
workflow_job j
JOIN commons.pull_request p ON j.head_sha = p.head.sha HINT(join_broadcast = true)
JOIN commons.pull_request pull_request ON j.head_sha = pull_request.head.sha HINT(join_broadcast = true)
-- Do a left join here because the push table won't have any information about
-- commits from forked repo
LEFT JOIN commons.push push ON j.head_sha = push.after HINT(join_broadcast = true)
WHERE
(
(
@@ -15,9 +19,9 @@ WITH recent_shas AS (
)
AND : prNumber = 0
)
OR : prNumber = p.number
OR : prNumber = pull_request.number
)
AND p.base.repo.full_name =: repo
AND pull_request.base.repo.full_name =: repo
)
SELECT
w.id AS workflowId,
@@ -33,6 +37,7 @@ SELECT
j.head_branch,
recent_shas.number AS pr_number,
recent_shas.sha AS head_sha,
recent_shas.timestamp AS head_sha_timestamp,
j.torchci_classification.captures AS failure_captures,
IF(
j.torchci_classification.line IS NULL,
@@ -62,6 +67,7 @@ SELECT
w.head_branch,
recent_shas.number AS pr_number,
w.head_sha,
recent_shas.timestamp AS head_sha_timestamp,
null AS failure_captures,
null AS failure_lines,
null AS failure_context,
6 changes: 3 additions & 3 deletions torchci/rockset/prodVersions.json
@@ -4,7 +4,7 @@
"hud_query": "69f0bc9a618c82b1",
"commit_jobs_query": "10d4a302d49906bb",
"disabled_non_flaky_tests": "f909abf9eec15b56",
"commit_failed_jobs": "2884aac8948770e4",
"commit_failed_jobs": "f64429abcebf36b8",
"filter_forced_merge_pr": "a28350c863e36239",
"flaky_tests": "eb7ed21e7f1a6d09",
"flaky_tests_across_jobs": "474e5454bda0c5bb",
@@ -23,7 +23,7 @@
"issue_query": "e4d338de89980044",
"failure_samples_query": "7940a636284d0752",
"num_commits_master": "e4a864147cf3bf44",
"recent_pr_workflows_query": "25ba3e57d82caeb1",
"recent_pr_workflows_query": "59fb1170f4648591",
"reverted_prs_with_reason": "751f01cba16364f0",
"unclassified": "1b31a2d8f4ab7230",
"test_insights_overview": "42dbd5232f45fd53",
@@ -104,4 +104,4 @@
"validation_jobs_red": "ac8dee6e6b76916d",
"validation_jobs_red_past_day": "aecb798a574ba2ff"
}
}
}
2 changes: 2 additions & 0 deletions torchci/test/drciBot.test.ts
@@ -229,6 +229,8 @@ describe("verify-drci-functionality", () => {
return true;
})
.reply(200, {})
.get(`/repos/${OWNER}/${REPO}/git/commits/abcdefg`)
.reply(200, { committer: { date: "Anything goes" } })
.patch(
`/repos/${OWNER}/${REPO}/issues/comments/${comment_id}`,
(body) => {
31 changes: 16 additions & 15 deletions torchci/test/drciUtils.test.ts
@@ -27,15 +27,16 @@ describe("Test various utils used by Dr.CI", () => {
const headBranch = "mock-branch";
const emptyBaseCommitDate = "";
const lookbackPeriodInHours = 24;
const mockCompletedAtDate = dayjs("2023-08-01T00:00:00Z");
const mockHeadShaDate = dayjs("2023-08-01T00:00:00Z");
const job: RecentWorkflowsData = {
id: "12345",
name: "pull / linux-bionic-cuda12.1-py3.10-gcc9-sm86 / test (default, 1, 5, linux.g5.4xlarge.nvidia.gpu)",
html_url: "A",
head_sha: "A",
head_sha_timestamp: mockHeadShaDate.toISOString(),
failure_captures: ["ERROR"],
conclusion: "failure",
completed_at: mockCompletedAtDate.toISOString(),
completed_at: mockHeadShaDate.toISOString(),
head_branch: "whatever",
};

@@ -64,7 +65,7 @@
id: "54321",
branch: headBranch,
workflowId: "12345",
time: mockCompletedAtDate.toISOString(),
time: mockHeadShaDate.toISOString(),
conclusion: "failure",
htmlUrl: "Anything goes",
failureLines: ["ERROR"],
@@ -104,8 +105,8 @@
"",
"",
searchUtils.WORKFLOW_JOB_INDEX,
mockCompletedAtDate.subtract(lookbackPeriodInHours, "hour"),
mockCompletedAtDate,
mockHeadShaDate.subtract(lookbackPeriodInHours, "hour"),
mockHeadShaDate,
searchUtils.MIN_SCORE,
searchUtils.MAX_SIZE,
searchUtils.OLDEST_FIRST,
@@ -172,7 +173,7 @@
await hasSimilarFailures(
{
...job,
completed_at: mockCompletedAtDate.subtract(1, "hour").toISOString(),
head_sha_timestamp: mockHeadShaDate.subtract(1, "hour").toISOString(),
},
emptyBaseCommitDate,
lookbackPeriodInHours,
@@ -186,8 +187,8 @@
"",
"",
searchUtils.WORKFLOW_JOB_INDEX,
mockCompletedAtDate.subtract(1 + lookbackPeriodInHours, "hour"),
mockCompletedAtDate.subtract(1, "hour"),
mockHeadShaDate.subtract(1 + lookbackPeriodInHours, "hour"),
mockHeadShaDate.subtract(1, "hour"),
searchUtils.MIN_SCORE,
searchUtils.MAX_SIZE,
searchUtils.OLDEST_FIRST,
@@ -200,9 +201,9 @@
await hasSimilarFailures(
{
...job,
completed_at: mockCompletedAtDate.subtract(1, "hour").toISOString(),
head_sha_timestamp: mockHeadShaDate.subtract(1, "hour").toISOString(),
},
mockCompletedAtDate.subtract(20, "hour").toISOString(),
mockHeadShaDate.subtract(20, "hour").toISOString(),
lookbackPeriodInHours,
"TESTING" as unknown as Client
);
@@ -214,8 +215,8 @@
"",
"",
searchUtils.WORKFLOW_JOB_INDEX,
mockCompletedAtDate.subtract(20 + lookbackPeriodInHours, "hour"),
mockCompletedAtDate.subtract(1, "hour"),
mockHeadShaDate.subtract(20 + lookbackPeriodInHours, "hour"),
mockHeadShaDate.subtract(1, "hour"),
searchUtils.MIN_SCORE,
searchUtils.MAX_SIZE,
searchUtils.OLDEST_FIRST,
@@ -228,7 +229,7 @@
expect(
await hasSimilarFailures(
job,
mockCompletedAtDate
mockHeadShaDate
.subtract(
MAX_SEARCH_HOURS_FOR_QUERYING_SIMILAR_FAILURES -
lookbackPeriodInHours +
@@ -243,10 +244,10 @@
expect(mock).not.toHaveBeenCalled();

mock.mockClear();
// Auto return false if no completed at
// Auto return false if no head sha timestamp
expect(
await hasSimilarFailures(
{ ...job, completed_at: "" },
{ ...job, head_sha_timestamp: "" },
emptyBaseCommitDate,
lookbackPeriodInHours,
"TESTING" as unknown as Client
