A job is found with JobFilter if it is running on start_time #319

Closed
@steenlysgaard

Description

Details

  • Slurm Version: 22.05.8 and 23.02.4
  • Python Version: 3.11.1 and 3.10.4
  • Cython Version: 3.0.0 and
  • PySlurm Branch: 22.5.x and main
  • Linux Distribution: CentOS 7

Issue

I am considering using pyslurm to gather statistics for our cluster, but I have found a non-ideal behaviour that makes this cumbersome. I would like to get the cluster usage per month: fetch the jobs started in that month, do some sums, then move on to the next month, and so on. However, I found that a job that is running when one month turns into the next is counted in both months.

This small example shows it. It can run on a test cluster:

import pyslurm
import time
from datetime import datetime, timedelta

# Submit a short job - only for show
sjob = pyslurm.JobSubmitDescription(script='#!/bin/bash\nsleep 4\n')
job_id = sjob.submit()

# Wait for the job to finish
time.sleep(6.1)
job = pyslurm.db.Job.load(job_id=job_id)

# Establish some times around the job's runtime
start_time = datetime.fromtimestamp(job.start_time)
mid_time = start_time + timedelta(seconds=2)
before_start_time = start_time - timedelta(seconds=6)
after_end_time = start_time + timedelta(seconds=6)

# Query the window [before_start_time, mid_time] - the job is found, as expected
job_filter = pyslurm.db.JobFilter(start_time=before_start_time, end_time=mid_time)
jobs = pyslurm.db.Jobs()
db_jobs = jobs.load(db_filter=job_filter)
print(db_jobs)
print(db_jobs[job_id].stats.elapsed_cpu_time)

# Query the window [mid_time, after_end_time] - should be empty, since the
# job started before mid_time, but the same job is found again
job_filter = pyslurm.db.JobFilter(start_time=mid_time, end_time=after_end_time)
db_jobs = jobs.load(db_filter=job_filter)
print(db_jobs)
print(db_jobs[job_id].stats.elapsed_cpu_time)
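Until the filter semantics change, one possible client-side workaround is to post-filter the loaded jobs by their actual start time, so each job is counted in exactly one window. This is only a sketch: `started_in_window` is a hypothetical helper, not part of pyslurm, and it assumes the Unix-timestamp `job.start_time` field used in the script above:

```python
from datetime import datetime

def started_in_window(job_start_ts, window_start, window_end):
    """True only if the job's actual start falls inside [window_start, window_end).

    job_start_ts is a Unix timestamp (as pyslurm's job.start_time is above);
    window_start / window_end are datetime objects bounding the month.
    """
    return window_start <= datetime.fromtimestamp(job_start_ts) < window_end

# A job that started in July but kept running into August is excluded from
# the August window, even though the database query would still return it.
aug_start = datetime(2023, 8, 1)
sep_start = datetime(2023, 9, 1)
july_job = datetime(2023, 7, 31, 23, 50)  # started before the boundary
aug_job = datetime(2023, 8, 15, 12, 0)    # started inside the window
print(started_in_window(july_job.timestamp(), aug_start, sep_start))  # False
print(started_in_window(aug_job.timestamp(), aug_start, sep_start))   # True
```

With this, each job lands in exactly one month, at the cost of the elapsed-time statistics no longer reflecting time the job ran in adjacent months.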

Note that the issue also occurs on our production cluster, where the overlap times are hours and days.

Also note that the old API (slurmdb_jobs) likewise returns the job in both time intervals; however, there the elapsed time is set to the amount of time the job actually ran within each interval. I don't know whether that is the correct way of handling it, but at least it keeps the elapsed-time statistics correct (though not the job counts).
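The old API's behaviour described above amounts to clipping each job's runtime to the query window. That clipping can be computed directly from the interval bounds; here is a minimal sketch using plain Unix timestamps (`overlap_seconds` is a hypothetical helper, not part of either API):

```python
def overlap_seconds(job_start, job_end, window_start, window_end):
    """Seconds of [job_start, job_end) that fall inside [window_start, window_end)."""
    return max(0, min(job_end, window_end) - max(job_start, window_start))

# A 100-second job straddling a window boundary at t=1000:
# 40 seconds fall in the earlier window, 60 in the later one.
print(overlap_seconds(960, 1060, 900, 1000))   # 40
print(overlap_seconds(960, 1060, 1000, 1100))  # 60
print(overlap_seconds(960, 1060, 1100, 1200))  # 0 (no overlap)
```

Summing the clipped values over all windows recovers the job's full elapsed time exactly once, which is why the old API's per-interval elapsed times add up correctly even though the job itself appears in multiple intervals.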
