
Conversation

@AlexBodner (Collaborator) commented May 5, 2025

Description

This pull request adds the ByteTrack tracker to the list of implemented trackers.

No extra dependencies are required for usage. For testing I used pytest, which is already in the dev dependencies. The test file can be removed if you prefer; let me know.

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested, please provide a testcase or example of how you tested the change?

The implementation has been thoroughly tested, passing all the evaluations, and showing the expected results.

The new tracker can be used in the following Google Colab, which shows two examples of the tracker in its two possible modes: using appearance features for high-confidence boxes, and using IoU.

There is also a test file at trackers/core/bytetrack/tests_bytetrack.py; these tests cover the main functionality of the tracker and check that it behaves as expected.

Any specific deployment considerations

Check trackers/__init__.py so that it can be imported correctly.
I've updated my trackers/core/deepsort/feature_extractor.py to be exactly like the one in the main branch, but for some reason the diff still compares it to the old one.

@AlexBodner marked this pull request as ready for review May 5, 2025 21:33
@rolson24 left a comment:

The implementation looks pretty good! I just have a few comments about organization, but they are just my opinions.

@davidpagnon

Very interested in this one :)

@SkalskiP requested review from tstanczyk95 and removed request for onuralpszr November 24, 2025 13:54

def __init__(
    self,
    reid_model: ReIDModel,
Collaborator:

Originally, ByteTrack is designed to be used without re-ID. Re-ID is optional, as an extra feature for ByteTrack. As such, it would be better to make the reid_model parameter in the constructor default to None. It is better and more intuitive for the user to call:
tracker = ByteTrackTracker()
rather than
tracker = ByteTrackTracker(reid_model=None)
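A minimal sketch of what that could look like (assuming ReIDModel and BaseTrackerWithFeatures are the classes already used in this repo; everything else is illustrative):

    from typing import Optional

    class ByteTrackTracker(BaseTrackerWithFeatures):
        def __init__(
            self,
            reid_model: Optional[ReIDModel] = None,  # re-ID is optional in ByteTrack
        ):
            # plain IoU matching unless a re-ID model is explicitly provided
            self.high_prob_association_metric = "IoU" if reid_model is None else "RE-ID"
            self.reid_model = reid_model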

Collaborator (Author):

Solved

# Assign a default tracker_id with the correct shape
detections.tracker_id = -np.ones(len(detections))
# Split into high confidence boxes and lower based on self.high_prob_boxes_threshold # noqa: E501
high_prob_detections, low_prob_detections = (
@tstanczyk95 (Collaborator) Dec 10, 2025:

Original ByteTrack detection split:

high conf dets: conf > args.track_thresh (0.6)
low conf dets: args.track_thresh > conf > 0.1

[Hence if conf == args.track_thresh: dets skipped... ---> I would say if equal then treat it as high conf dets]

About the track_thresh, yes demo_track.py includes 0.5 as the default, but for the actual evaluations they use 0.6: https://github.com/FoundationVision/ByteTrack/blob/main/tools/track.py#L105

Here however, in this pull request, we have this instead:
higher dets are above self.high_prob_boxes_threshold (currently set as 0.5), lower dets are ALL the other dets (even with conf <= 0.1)
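For reference, a minimal sketch of the original split logic (thresholds hard-coded here for illustration; assumes a supervision-style Detections object with a confidence array):

    import numpy as np

    track_thresh = 0.6  # args.track_thresh used in the original evaluations
    low_thresh = 0.1    # hard-coded lower bound in the original byte_tracker.py

    conf = detections.confidence
    high_mask = conf >= track_thresh            # treat conf == track_thresh as high conf
    low_mask = (conf > low_thresh) & ~high_mask
    # detections with conf <= low_thresh are dropped entirely

    high_prob_detections = detections[high_mask]
    low_prob_detections = detections[low_mask]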

Collaborator (Author):

Good catch. In the paper they don't mention the lower bound threshold, but I could find it in the implementation:
https://github.com/FoundationVision/ByteTrack/blob/main/yolox/tracker/byte_tracker.py
lines 177-182

Collaborator (Author):

Should we include this 0.1 as a parameter? I think so, but ByteTrack hardcodes it as 0.1.

Collaborator:

We could include it as a parameter with the default value of 0.1. Usually, confidence of 0.1 or below means quite unreliable and noisy detections. However, we can leave the user an option to adjust it as, depending on their scene environment, they might have very low confidence detections in general (e.g. blurry images, night scenarios, etc.).

Collaborator (Author):

Implemented

    similarity_matrix, maximize=True
)
for row, col in zip(row_indices, col_indices):
    if similarity_matrix[row, col] >= min_similarity_thresh:
Collaborator:

This is a bit concerning here:
Only discarding the matches based on minimum_iou_threshold rather than blocking similarity matrix entries from an association.

Original ByteTrack passes the cost_threshold (equivalent of minimum_iou_threshold) directly to the linear solver:
cost, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh)

The pull request version, on the other hand, first runs the linear assignment on all entries of the similarity matrix, no matter what, and only after obtaining the matches (matched indices) discards them based on the corresponding values in the similarity matrix.

Original ByteTrack: blocking some entries (setting a high value there) before optimization. Then we might (not guaranteed) still get matches for all pairs.
This pull request version: matching done, then discarding matches if the threshold doesn't allow it - some matches are completely discarded...

The results might be different in the two cases above.

My intuition: it's better to block some match entries (ith tracklet with jth detection) before the optimization/matching process, e.g. by setting these entries to some high/low values above/below the threshold - rather than discard them after the matching is done.
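A minimal sketch of the "block before matching" idea with SciPy (illustrative only; similarity_matrix and min_similarity_thresh refer to the variables already used in this PR):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    BLOCK_VALUE = -1e6  # similarity so low that blocked pairs are never preferred

    blocked = similarity_matrix.copy()
    blocked[blocked < min_similarity_thresh] = BLOCK_VALUE

    row_indices, col_indices = linear_sum_assignment(blocked, maximize=True)

    # the solver can still return a blocked pair when a row/column has no valid
    # candidate left, so those matches are dropped explicitly afterwards
    matches = [
        (row, col)
        for row, col in zip(row_indices, col_indices)
        if blocked[row, col] > BLOCK_VALUE
    ]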

Collaborator (Author):

I see the difference. Do you think I should patch the SciPy implementation by adding the big costs, or should we switch to lapjv?

Collaborator (Author):

lapjv would probably result in a faster implementation, but it can create dependency issues, especially when installing on devices that might not be compatible with the prebuilt package, so I'm inclined towards fixing the SciPy one.

Collaborator (Author):

Done in:
468f49a

Collaborator:

lapjv would probably result in a faster implementation, but it can create dependency issues, especially when installing on devices that might not be compatible with the prebuilt package, so I'm inclined towards fixing the SciPy one.

In this case, I would also opt for the SciPy version.

@tstanczyk95 (Collaborator) Jan 9, 2026:

Done in: 468f49a

I am a bit concerned about the lines 52-58. It fills the extended cost matrix with values of the initial one, with cost_limit/2.0 and with 0s. I am not sure how it is supposed to help scipy.optimize.linear_sum_assignment to block the entries which have to be excluded by the threshold.

E.g. if we have 3 tracklets and 5 detections and the cost_limit is different from infinity, an extended cost matrix like the one in the image below will be created:

[screenshot: example of the extended cost matrix]

I don't see how it should involve the actual cost_limit to exclude the relevant entries from c11-c35.

Was it done like that or in a similar way in lap.lapjv? If so, do you have a link or reference to it?

Collaborator (Author):

It is used here: https://github.com/gatagat/lap/blob/master/_lapjv_cpp/_lapjv.pyx

The thing is that if a real cost (c_{ij}) is bigger than the cost limit, the algorithm will instead prefer to match the newly added (dummy) rows and columns between themselves, and the total cost of those matches is cost_limit/2 * 2 = cost_limit. But if it is more optimal to match one of the real costs smaller than cost_limit, it will do so and then match the dummies to one of the zero-cost entries.
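To make the trick concrete, here is a minimal sketch of how the lapjv cost_limit extension can be emulated with SciPy (an illustration of the idea described above, not necessarily the exact code in this PR):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def assignment_with_cost_limit(cost_matrix: np.ndarray, cost_limit: float):
        # emulate lap.lapjv(..., extend_cost=True, cost_limit=...) with SciPy
        n, m = cost_matrix.shape
        # extended (n + m) x (n + m) matrix: real costs in the top-left block,
        # cost_limit / 2 everywhere else, zeros in the bottom-right dummy block
        extended = np.full((n + m, n + m), cost_limit / 2.0)
        extended[n:, m:] = 0.0
        extended[:n, :m] = cost_matrix

        rows, cols = linear_sum_assignment(extended)
        # keep only matches between real tracklets and real detections; a real
        # row matched to a dummy column is effectively an unmatched tracklet
        return [(r, c) for r, c in zip(rows, cols) if r < n and c < m]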

similarity_matrix = None
if association_metric == "IoU":
    # Build IOU cost matrix between detections and predicted bounding boxes
    similarity_matrix = get_iou_matrix(trackers, detections.xyxy)
@tstanczyk95 (Collaborator) Dec 10, 2025:

Preliminary note: original ByteTrack constructs a cost matrix (1 - IoU) rather than a similarity matrix (IoU). While it should not matter or cause any significant differences, it is good to keep in mind when making comparisons to the original ByteTrack implementation.

Mind that original ByteTrack does also this during its first and third association step:
if not self.args.mot20: # thus all the datasets except mot20
dists = matching.fuse_score(dists, detections)
|
In short, it does this:
fuse_sim = iou_sim * det_scores
fuse_cost = 1 - fuse_sim
return fuse_cost

So basically it enforces matches for each detection based on detection confidence.

It might lead to slightly different results, especially in the case of a noisy detector. Stronger detections will basically dominate.
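For illustration, a minimal numpy sketch of the fuse_score step quoted above (the function signature here is illustrative):

    import numpy as np

    def fuse_score(iou_cost: np.ndarray, det_scores: np.ndarray) -> np.ndarray:
        # iou_cost: (num_tracks, num_dets) cost matrix; det_scores: (num_dets,)
        iou_sim = 1.0 - iou_cost                        # cost -> similarity
        fuse_sim = iou_sim * det_scores[np.newaxis, :]  # scale each detection column
        return 1.0 - fuse_sim                           # back to a cost matrix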

Collaborator (Author):

I'm adapting the code to work in this style. Should we fuse scores when working with re-ID features? I could only find this example where they actually used embeddings for matching:
https://github.com/FoundationVision/ByteTrack/blob/main/tutorials/qdtrack/tracker_reid_motion.py
There they use the fuse_motion function, and the script is called tracker_reid_motion, which sounds like they are doing something different from ByteTrack, although they do implement a ByteTrack class there. Is there any other implementation you could point me to?

Collaborator:

From my point of view, we should set the subject of re-ID aside for ByteTrack and come back to it with other (possibly related or derived) trackers which are actually designed to use it. ByteTrack's original intention was to be used without re-ID, with plain IoU. When someone wants a tracker with re-ID, they look for something other than ByteTrack.

only ones that are matched using the appearance features.
""" # noqa: E501

def __init__(
Collaborator:

Params as in the original ByteTrack should be:

  • minimum_consecutive_frames = 2 (1 for the very first frame), NOT 3 ---> the evaluation script will penalize missing records over more frames.
  • minimum_iou_threshold = 0.1, NOT 0.2 ---> ByteTrack allows lower IoU similarity (higher IoU cost) for the first association step: https://github.com/FoundationVision/ByteTrack/blob/main/tools/track.py#L107 (0.9). Setting a more demanding value will limit the number of associations made compared to the original version.
  • EXTRA NOTE: Mind that ByteTrack has 3 (thus not 2) association steps (one extra for newly born tracklets), and each of the 3 steps has a different matching threshold ---> see a separate comment on that.
  • high_prob_boxes_threshold = 0.6, NOT 0.5: https://github.com/FoundationVision/ByteTrack/blob/main/tools/track.py#L105
  • track_activation_threshold = high_prob_boxes_threshold + 0.1 = 0.7, NOT 0.25 <--- the low value might cause chaos with non-oracle detections (coming from a detector, with confidences < 1.0): noisy detections will also be considered for new tracklet creation, producing noise and false positive tracklets. A sketch with these defaults is below.
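A minimal sketch of the constructor defaults following the suggestions above (parameter names follow this PR where they exist; low_prob_boxes_threshold is a hypothetical name for the 0.1 lower bound discussed earlier):

    def __init__(
        self,
        reid_model: Optional[ReIDModel] = None,
        minimum_consecutive_frames: int = 2,      # original: 2 (1 for the very first frame)
        minimum_iou_threshold: float = 0.1,       # first association allows IoU cost up to 0.9
        high_prob_boxes_threshold: float = 0.6,   # args.track_thresh in tools/track.py
        low_prob_boxes_threshold: float = 0.1,    # hypothetical name for the hard-coded 0.1
        track_activation_threshold: float = 0.7,  # high_prob_boxes_threshold + 0.1
    ):
        ...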

Collaborator (Author):

Changed defaults

remaining_trackers = [self.trackers[i] for i in unmatched_trackers]

# Step 2: associate Low Probability detections with remaining trackers
matched_indices, unmatched_trackers, unmatched_detections = (
Collaborator:

Original ByteTrack:

1st assoc: currently tracked + lost tracklets
2nd assoc: only currently tracked tracklets

Pull request version:
Both associations use both currently tracked tracklets AND lost tracklets
Basically, there is no clear division between these two. There is only this state on a ByteTrackKalmanBoxTracker instance: self.time_since_update, which is not yet used in bytetrack/tracker.py.

Including lost tracklets in the second association step, with the low confidence detections, might cause chaos and many redundant associations, which in turn will lower the tracking accuracy, especially in the case of a noisy detector/detections.

Collaborator (Author):

I changed it to remaining_trackers = [self.trackers[i] for i in unmatched_trackers if self.trackers[i].time_since_update == 1]. This is enough to fix the behaviour difference with lost tracks, right? Even though it might be nicer to explicitly call them lost tracks and treat them differently. A sketch of that explicit split is below.
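A minimal sketch of what such an explicit split could look like (variable names are illustrative; it relies on time_since_update as used above):

    # split the unmatched trackers into "still tracked" and "lost" pools
    tracked_pool = [
        self.trackers[i]
        for i in unmatched_trackers
        if self.trackers[i].time_since_update == 1  # updated on the previous frame
    ]
    lost_pool = [
        self.trackers[i]
        for i in unmatched_trackers
        if self.trackers[i].time_since_update > 1
    ]

    # the second association (low confidence detections) only sees the tracked
    # pool, as in the original ByteTrack; lost tracks wait for a high conf match
    remaining_trackers = tracked_pool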

Collaborator:

Given the current implementation, it seems it is enough. However, we could run it on a few sequences, verify particular cases (e.g. with occlusion) and log the tracklet numbers to be sure it behaves as expected.

if association_metric == "IoU":
    # Build IOU cost matrix between detections and predicted bounding boxes
    similarity_matrix = get_iou_matrix(trackers, detections.xyxy)
    thresh = self.minimum_iou_threshold
Collaborator:

Original ByteTrack:
Each assoc step: different threshold (cost_limit, allowing IoU distance):
1st assoc.: args.match_thresh (0.9) (similarity: 0.1)
2nd assoc.: 0.5 (similarity: 0.5)
3rd assoc.: 0.7 (similarity: 0.3)

Pull request version:
always the same value: similarity thresh: 0.2
|
It will cause different results. The scale of difference might vary depending on the dataset and, importantly, detection quality.

Basically, this pull request version allows more associations to happen than the original ByteTrack, especially within the second association step with the low confidence, thus noisier, detections. Depending on the detector/detections, it might degrade the performance.
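A minimal sketch of how per-step thresholds could be exposed (parameter names are hypothetical; the defaults mirror the original cost limits quoted above, expressed as IoU similarity):

    def __init__(
        self,
        first_association_iou_threshold: float = 0.1,   # cost_limit 0.9 in the original
        second_association_iou_threshold: float = 0.5,  # cost_limit 0.5
        third_association_iou_threshold: float = 0.3,   # cost_limit 0.7 (newly born tracklets)
    ):
        ...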

Collaborator (Author):

Got it, we can add it as another parameter to the tracker. I didn't see it in the paper either, but it is clear in the source code.


return matched_indices, unmatched_trackers, unmatched_detections

def _spawn_new_trackers(
@tstanczyk95 (Collaborator) Dec 10, 2025:

[ Big difference here ]

Original ByteTrack:
unmatched high conf dets are initiated as new tracklets only if they are matched with the next frame's unmatched high conf detections - with dist threshold 0.7 (= similarity threshold 0.3). Unless it is the very first frame. Then the new tracklets are instantly activated (see the "MIND THAT:" remark below for more).
|
Thus also: 2 frames are enough, not 3. And based on the high confidence unmatched detections between two frames.

This pull request version:
(_spawn_new_trackers)
Create a new tracklet from an unmatched high conf detection. Then include it in the same pool as currently tracked tracklets (active tracklets) for matches with high and low conf dets. (Original: matching only with the unmatched high conf dets, in a separate 3rd assoc step.)
|
This will cause different results. The scale of difference might vary depending on the dataset and importantly, detection quality. More noisy detections = more false positive tracklets created. Also more chaos when mixing these freshly created, yet not mature tracklets in the same pool with other tracklets as mentioned above.

[ MIND THAT: ]
In the original ByteTrack, the STrack's (tracklet representation) activate function (https://github.com/FoundationVision/ByteTrack/blob/main/yolox/tracker/byte_tracker.py#L45) does not activate the tracklet yet. It sets self.state = TrackState.Tracked.
Only the STrack's update function truly activates it by setting: self.is_activated = True (STrack's reactivate function also does it).
As a result, the newly born tracklets from frame t are activated only after being confirmed (matched to remaining high conf detections) at frame t+1
---Exception: new tracklets at the very first frame (in ByteTrack settings: t=1) - only then does the STrack's activate function truly activate the tracklet by setting self.is_activated = True.
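A minimal sketch of the activation semantics described above (class and attribute names are illustrative, mirroring the STrack behaviour):

    class Tracklet:
        def __init__(self, box, frame_id: int):
            self.box = box
            # only tracklets born on the very first frame start out activated
            self.is_activated = frame_id == 1

        def update(self, box, frame_id: int):
            # a tracklet born at frame t only becomes a confirmed (activated)
            # track once it is matched again, e.g. at frame t + 1
            self.box = box
            self.is_activated = True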

Collaborator (Author):

Fixed in commit
f1c440f

Collaborator:

Fixed in commit f1c440f

Looks good! As with one of the previous comments, to be sure, I recommend running it on a few sequences and logging the tracklet numbers to see if it behaves as expected.

)
if len(final_updated_detections) == 0:
    final_updated_detections.tracker_id = np.array([], dtype=int)
return final_updated_detections
Collaborator:

[ Just a note ]
The original ByteTrack - just before returning the update function's final output - removes some duplicate tracks per frame:

self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks)
(https://github.com/FoundationVision/ByteTrack/blob/main/yolox/tracker/byte_tracker.py#L285)

However, it might cause little bugs with tracklet management. Therefore, I just mention that it exists, but I don't recommend adding it.

Collaborator (Author):

OK, I won't change that for the moment.

self.high_prob_association_metric = "IoU" if reid_model is None else "RE-ID"
self.reid_model = reid_model
self.distance_metric = distance_metric
self.trackers: list[ByteTrackKalmanBoxTracker] = []
Collaborator:

I would rather call it "tracks", not "trackers". A tracker is an instance that tracks entities (referred to as tracks or, optionally, tracklets).

Collaborator (Author):

Done
cd709d4

@tstanczyk95 (Collaborator) left a comment:

Overall: nicely written. It looks clear and intuitive. Very good work here.

I spotted quite some differences compared to the original ByteTrack implementation from here: https://github.com/FoundationVision/ByteTrack/tree/main and pointed them out in my remarks. Let me know in case you need extra information or clarifications.

Note: I focused on the non-re-ID version as this is the primary focus of ByteTrack. There exist other SORT-based algorithms explicitly using re-ID.

)


class ByteTrackTracker(BaseTrackerWithFeatures):
@tstanczyk95 (Collaborator) Dec 11, 2025:

[ Extra step after tracking ]

Mind that in the original ByteTrack implementation, there is also a post-processing interpolation step:
https://github.com/FoundationVision/ByteTrack/blob/main/tools/interpolation.py .
You can see it to be called here: https://github.com/FoundationVision/ByteTrack?tab=readme-ov-file#tracking .
It can be run on a single or multiple tracking text outputs.

It basically smooths the bounding box coordinates over the trajectories of each tracklet. A detector might introduce a bit of noise, i.e. "jumping"/"shaky" bounding boxes of an object over consecutive frames. This interpolation step rectifies it a bit, so that the overall trajectories are smoother.

In case of MOT17 and MOT20 it did improve the final results (HOTA, MOTA, IDF1; sometimes even by a few points).
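For illustration, a minimal sketch of this kind of trajectory post-processing (not the exact interpolation.py logic): it linearly fills the gaps in a single tracklet's trajectory.

    import numpy as np

    def interpolate_tracklet(frames: np.ndarray, boxes: np.ndarray):
        # frames: (N,) sorted frame ids; boxes: (N, 4) xyxy boxes of one tracklet
        full_frames = np.arange(frames[0], frames[-1] + 1)
        # linearly interpolate each box coordinate over the missing frames
        full_boxes = np.stack(
            [np.interp(full_frames, frames, boxes[:, k]) for k in range(4)],
            axis=1,
        )
        return full_frames, full_boxes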

Note two things:

@tstanczyk95 (Collaborator) left a comment:

Additional remark to my last ByteTrack review. It discusses additional functionality and a post-processing step from the original ByteTrack implementation.
