Feat/core/bytetrack #40
base: main
Conversation
rolson24 left a comment
The implementation looks pretty good! I just have a few comments about organization, but they are just my opinions.
fix(trackers): bytetrack feature association
Very interested in this one :)
trackers/core/bytetrack/tracker.py
Outdated
def __init__(
self,
reid_model: ReIDModel,
Originally, ByteTrack is designed to be used without re-ID. Re-ID is optional, as an extra feature for ByteTrack. As such, it would be better to make the reid_model parameter in the constructor default to None. It is better and more intuitive for the user to call:
tracker = ByteTrackTracker()
rather than
tracker = ByteTrackTracker(reid_model=None)
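For illustration, a minimal sketch of what that could look like; the parameter name comes from the diff above, everything else is a placeholder:

```python
from typing import Optional


class ByteTrackTracker:
    def __init__(
        self,
        reid_model: Optional["ReIDModel"] = None,  # re-ID stays opt-in
        # ... remaining parameters keep their existing defaults ...
    ):
        # Fall back to plain IoU association when no re-ID model is supplied
        self.reid_model = reid_model
        self.high_prob_association_metric = "IoU" if reid_model is None else "RE-ID"


tracker = ByteTrackTracker()  # IoU-only, no re-ID model needed
```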
Solved
# Assign a default tracker_id with the correct shape
detections.tracker_id = -np.ones(len(detections))
# Split into high confidence boxes and lower based on self.high_prob_boxes_threshold  # noqa: E501
high_prob_detections, low_prob_detections = (
Original ByteTrack detection split:
high conf dets: conf > args.track_thresh (0.6)
low conf dets: args.track_thresh > conf > 0.1
[Hence if conf == args.track_thresh the detection is skipped... ---> I would say if equal then treat it as a high conf det]
About the track_thresh, yes demo_track.py includes 0.5 as the default, but for the actual evaluations they use 0.6: https://github.com/FoundationVision/ByteTrack/blob/main/tools/track.py#L105
Here however, in this pull request, we have this instead:
high dets are those above self.high_prob_boxes_threshold (currently set as 0.5); low dets are ALL the other dets (even with conf <= 0.1)
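For reference, a small sketch of the original split under those thresholds (the numbers are the evaluation defaults quoted above, treating equality as high conf as suggested):

```python
import numpy as np

track_thresh = 0.6  # args.track_thresh used for the MOT evaluations
low_thresh = 0.1    # lower bound hard-coded in the original implementation

confidence = np.array([0.95, 0.60, 0.45, 0.08])  # example detector scores

high_mask = confidence >= track_thresh                               # -> dets 0, 1
low_mask = (confidence > low_thresh) & (confidence < track_thresh)   # -> det 2

# Detection 3 (conf 0.08) is discarded entirely, unlike in the current PR version
high_idx = np.flatnonzero(high_mask)
low_idx = np.flatnonzero(low_mask)
```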
Good catch. In the paper they don't mention the lower bound threshold, but I could find it in the implementation:
https://github.com/FoundationVision/ByteTrack/blob/main/yolox/tracker/byte_tracker.py
lines 177-182
Should we include this 0.1 as a parameter? I think so, but ByteTrack hardcodes it as 0.1.
We could include it as a parameter with the default value of 0.1. Usually, a confidence of 0.1 or below means quite unreliable and noisy detections. However, we can leave the user the option to adjust it, as, depending on their scene environment, they might have very low confidence detections in general (e.g. blurry images, night scenarios, etc.).
Implemented
similarity_matrix, maximize=True
)
for row, col in zip(row_indices, col_indices):
if similarity_matrix[row, col] >= min_similarity_thresh:
This is a bit concerning here:
Only discarding the matches based on minimum_iou_threshold rather than blocking similarity matrix entries from an association.
Original ByteTrack passes the cost_threshold (equivalent of minimum_iou_threshold) directly to the linear solver:
cost, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh)
The pull request version, on the other hand, first runs the linear assignment on all entries of the similarity matrix, no matter what, and only after obtaining the matches (matched indices) discards them based on the corresponding values in the similarity matrix.
Original ByteTrack: blocking some entries (setting a high value there) before optimization. Then we might (not guaranteed) still get matches for all pairs.
This pull request version: matching done, then discarding matches if the threshold doesn't allow it - some matches are completely discarded...
The results might be different in the two cases above.
My intuition: it's better to block some match entries (ith tracklet with jth detection) before the optimization/matching process, e.g. by setting these entries to some high/low values above/below the threshold - rather than discard them after the matching is done.
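To illustrate, a sketch (not this PR's code) of blocking the entries before running SciPy's solver; the final threshold check stays only as a safeguard against forced assignments:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_with_blocking(similarity_matrix: np.ndarray, min_similarity: float):
    # Make sub-threshold pairs maximally unattractive *before* the optimization
    blocked = similarity_matrix.copy()
    blocked[blocked < min_similarity] = -1e9

    row_idx, col_idx = linear_sum_assignment(blocked, maximize=True)

    # A rectangular problem can still force a pair onto a blocked entry,
    # so keep the threshold check as a final safeguard.
    return [
        (r, c)
        for r, c in zip(row_idx, col_idx)
        if similarity_matrix[r, c] >= min_similarity
    ]
```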
I see the difference. Do you think I should patch the SciPy implementation by adding the big costs, or should we switch to lapjv?
It would probably result in a faster implementation with lapjv, but it can create dependency issues, especially when installing on some devices which might not be compatible with the builds of that package, so I'm inclined towards fixing the SciPy one.
Done in: 468f49a
It would probably result in a faster implementation with lapjv, but it can create dependency issues, especially when installing on some devices which might not be compatible with the builds of that package, so I'm inclined towards fixing the SciPy one.
In this case, I would also opt for the SciPy version.
Done in: 468f49a
I am a bit concerned about lines 52-58. It fills the extended cost matrix with the values of the initial one, with cost_limit/2.0 and with 0s. I am not sure how this is supposed to help scipy.optimize.linear_sum_assignment block the entries which have to be excluded by the threshold.
E.g. if we have 3 tracklets and 5 detections and a cost_limit different from infinity, an extended cost matrix as in the image attached to this comment will be created:
I don't see how it should involve the actual cost_limit to exclude the relevant entries from c11-c35.
Was it done like that or in a similar way in lap.lapjv? If so, do you have a link or reference to it?
It is used here: https://github.com/gatagat/lap/blob/master/_lapjv_cpp/_lapjv.pyx
The thing is that if a real cost c_{ij} is bigger than the cost limit, the algorithm will instead prefer to match the newly added row and column between themselves, and the total cost of those matches would be cost_limit/2 * 2 = cost_limit. But if it is more optimal to match one of the costs smaller than cost_limit, it will do so and then match the dummies to an entry of cost 0.
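For completeness, a sketch of that construction reproduced with SciPy (matching the description above; the function name is illustrative, not the PR's):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def lapjv_style_assignment(cost_matrix: np.ndarray, cost_limit: float):
    n, m = cost_matrix.shape
    extended = np.full((n + m, n + m), cost_limit / 2.0)
    extended[n:, m:] = 0.0          # dummy-to-dummy pairs are free
    extended[:n, :m] = cost_matrix  # real costs in the top-left block

    row_idx, col_idx = linear_sum_assignment(extended)

    matches, unmatched_rows, unmatched_cols = [], [], []
    for r, c in zip(row_idx, col_idx):
        if r < n and c < m:
            matches.append((r, c))      # real tracklet-detection match
        elif r < n:
            unmatched_rows.append(r)    # paid cost_limit/2 to stay unmatched
        elif c < m:
            unmatched_cols.append(c)
    return matches, unmatched_rows, unmatched_cols
```

Any pair whose real cost exceeds cost_limit ends up more expensive than routing its row and its column through the padding (cost_limit/2 each), which is exactly the behaviour described above.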
trackers/core/bytetrack/tracker.py
Outdated
similarity_matrix = None
if association_metric == "IoU":
# Build IOU cost matrix between detections and predicted bounding boxes
similarity_matrix = get_iou_matrix(trackers, detections.xyxy)
Preliminary note: the original ByteTrack constructs a cost matrix (1 - IoU) rather than a similarity matrix (IoU). While it should not matter or cause any significant differences, it is good to keep in mind when making comparisons to the original ByteTrack implementation.
Mind that the original ByteTrack also does this during its first and third association steps:
if not self.args.mot20: # thus all the datasets except mot20
dists = matching.fuse_score(dists, detections)
In short, it does this:
fuse_sim = iou_sim * det_scores
fuse_cost = 1 - fuse_sim
return fuse_cost
So basically it weights the matches for each detection by its detection confidence.
It might lead to slightly different results, especially in the case of a noisy detector. Stronger detections will basically dominate.
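A self-contained sketch of that fusion, assuming a (tracklets x detections) IoU cost matrix and a detection-score vector:

```python
import numpy as np


def fuse_score(iou_cost: np.ndarray, det_scores: np.ndarray) -> np.ndarray:
    iou_sim = 1.0 - iou_cost                        # cost -> similarity
    fuse_sim = iou_sim * det_scores[np.newaxis, :]  # scale columns by detection confidence
    return 1.0 - fuse_sim                           # back to a cost matrix
```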
I'm adapting the code to work in this style. Should we fuse scores when working with re-ID features? I could only find this example where they actually used embeddings for matching:
https://github.com/FoundationVision/ByteTrack/blob/main/tutorials/qdtrack/tracker_reid_motion.py
Here they use the fuse_motion function, and the script is called tracker_reid_motion, which sounds like they are doing something different from ByteTrack, but they implement a ByteTrack class there. Is there any other implementation you could point me to?
From my point of view, we should leave the subject of re-ID aside for ByteTrack and come back to it with other (possibly related or derived) trackers which are actually designed to use it. ByteTrack's original intention was to be used without re-ID, with plain IoU. When someone wants a tracker with re-ID, they look for something other than ByteTrack.
only ones that are matched using the appearance features.
"""  # noqa: E501

def __init__(
Params as in the original ByteTrack should be (see the sketch below):
- minimum_consecutive_frames = 2 (1 for the very first frame), NOT 3 ---> the evaluation script will penalize records missing over more frames.
- minimum_iou_threshold = 0.1, NOT 0.2 ---> ByteTrack allows lower IoU similarity (higher IoU cost) for the first association step: https://github.com/FoundationVision/ByteTrack/blob/main/tools/track.py#L107 (0.9). Setting a more demanding value will limit the number of associations made compared to the original version.
- EXTRA NOTE: Mind that ByteTrack has 3 (thus not 2) association steps (one extra for newly born tracklets), and each of the 3 steps has a different matching threshold ---> see a separate comment on that.
- high_prob_boxes_threshold = 0.6, NOT 0.5: https://github.com/FoundationVision/ByteTrack/blob/main/tools/track.py#L105
- track_activation_threshold = high_prob_boxes_threshold + 0.1 = 0.7, NOT 0.25 <--- The current value might cause chaos with non-oracle detections (coming from a detector, with confidences < 1.0): noisy detections will also be considered for new tracklet creation, creating noise and false positive tracklets.
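If those defaults are adopted, the constructor could look roughly like this (parameter names taken from this pull request, values from the original evaluation settings):

```python
class ByteTrackTracker:
    def __init__(
        self,
        high_prob_boxes_threshold: float = 0.6,   # args.track_thresh in the original
        track_activation_threshold: float = 0.7,  # track_thresh + 0.1
        minimum_iou_threshold: float = 0.1,       # 1 - match_thresh (0.9), 1st step
        minimum_consecutive_frames: int = 2,      # confirm a tracklet on the next frame
    ):
        ...
```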
Changed defaults
trackers/core/bytetrack/tracker.py
Outdated
remaining_trackers = [self.trackers[i] for i in unmatched_trackers]

# Step 2: associate Low Probability detections with remaining trackers
matched_indices, unmatched_trackers, unmatched_detections = (
Original ByteTrack:
1st assoc: currently tracked + lost tracklets
2nd assoc: only currently tracked tracklets
Pull request version:
Both associations use both currently tracked tracklets AND lost tracklets
Basically, there is no clear division between these two; the only related state is self.time_since_update on a ByteTrackKalmanBoxTracker instance, which is not yet used in bytetrack/tracker.py.
Including lost tracklets in the second association step, with the low confidence detections, might cause chaos with associations and many redundant associations, which in turn will lower the accuracy performance, especially in the case of a noisy detector/detections.
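A hedged sketch of keeping the two pools apart with that state, assuming time_since_update is incremented on predict and reset on update, so a value of 1 means the tracklet was still matched on the previous frame:

```python
def split_remaining(trackers, unmatched_trackers):
    unmatched = [trackers[i] for i in unmatched_trackers]
    # Only tracklets updated on the previous frame enter the 2nd association
    active_remaining = [t for t in unmatched if t.time_since_update == 1]
    # Lost tracklets are kept aside (original ByteTrack only uses them in the 1st step)
    lost_remaining = [t for t in unmatched if t.time_since_update > 1]
    return active_remaining, lost_remaining
```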
I changed it to remaining_trackers = [self.trackers[i] for i in unmatched_trackers if self.trackers[i].time_since_update == 1]. This is enough to fix the behaviour difference with lost tracks, right? Even though it might be nicer to explicitly call them lost tracks and treat them differently.
Given the current implementation, it seems it is enough. However, we could run it on a few sequences, verify particular cases (e.g. with occlusion) and log the tracklet numbers to be sure it behaves as expected.
if association_metric == "IoU":
# Build IOU cost matrix between detections and predicted bounding boxes
similarity_matrix = get_iou_matrix(trackers, detections.xyxy)
thresh = self.minimum_iou_threshold
Original ByteTrack:
Each assoc step: different threshold (cost_limit, allowing IoU distance):
1st assoc.: args.match_thresh (0.9) (similarity: 0.1)
2nd assoc.: 0.5 (similarity: 0.5)
3rd assoc.: 0.7 (similarity: 0.3)
Pull request version:
always the same value: similarity thresh 0.2
It will cause different results. The scale of difference might vary depending on the dataset and, importantly, detection quality.
Basically, this pull request version allows more associations to happen than the original ByteTrack, especially within the second association step with low-confidence, thus noisier, detections. Depending on the detector/detections, it might degrade the performance.
Got it, we can add it as another parameter to the tracker. I didn't see it in the paper either, but it is clear in the source code.
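Something like this, for example (a sketch only, the parameter names are placeholders), with the original values as defaults:

```python
class ByteTrackTracker:
    def __init__(
        self,
        first_association_iou_threshold: float = 0.1,      # 1 - match_thresh (0.9)
        second_association_iou_threshold: float = 0.5,      # 1 - 0.5
        new_track_association_iou_threshold: float = 0.3,   # 1 - 0.7, newly born tracklets
    ):
        self.first_association_iou_threshold = first_association_iou_threshold
        self.second_association_iou_threshold = second_association_iou_threshold
        self.new_track_association_iou_threshold = new_track_association_iou_threshold
```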
return matched_indices, unmatched_trackers, unmatched_detections

def _spawn_new_trackers(
[ Big difference here ]
Original ByteTrack:
unmatched high conf dets are initiated as new tracklets only if they are matched with the next frame's unmatched high conf detections - with dist threshold 0.7 (= similarity threshold 0.3). Unless it is the very first frame. Then the new tracklets are instantly activated (see the "MIND THAT:" remark below for more).
Thus also: 2 frames are enough, not 3. And based on the high confidence unmatched detections between two frames.
This pull request version:
(_spawn_new_trackers)
Create a new tracklet from each unmatched high conf detection, then include it in the same pool as the currently tracked tracklets (active tracklets) for matches with high and low conf dets (original: matching only with the unmatched high conf dets, in a separate 3rd assoc step).
This will cause different results. The scale of difference might vary depending on the dataset and importantly, detection quality. More noisy detections = more false positive tracklets created. Also more chaos when mixing these freshly created, yet not mature tracklets in the same pool with other tracklets as mentioned above.
[ MIND THAT: ]
In the original ByteTrack, the STrack's (tracklet representation) activate function (https://github.com/FoundationVision/ByteTrack/blob/main/yolox/tracker/byte_tracker.py#L45) does not activate the tracklet yet. It sets self.state = TrackState.Tracked.
Only the STrack's update function truly activates it by setting: self.is_activated = True (STrack's reactivate function also does it).
As a result, the newly born tracklets from frame t are activated only after being confirmed (matched to remaining high conf detections) at frame t+1
---Exception: new tracklets at the very first frame (in ByteTrack settings: t=1) - only then does STrack's activate function truly activate the tracklet by setting self.is_activated = True
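A minimal sketch of that two-stage activation (names are illustrative, not taken from the PR):

```python
class Tracklet:
    def __init__(self, frame_id: int):
        # Only tracklets born on the very first frame start activated
        self.is_activated = frame_id == 1
        self.tracked = True  # equivalent of TrackState.Tracked

    def update(self, frame_id: int):
        # A match on a later frame confirms (activates) the tracklet
        self.is_activated = True
```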
Fixed in commit f1c440f
Fixed in commit f1c440f
Looks good! As with one of the previous comments, to be sure, I recommend running it on a few sequences and logging the tracklet numbers to see if it behaves as expected.
)
if len(final_updated_detections) == 0:
final_updated_detections.tracker_id = np.array([], dtype=int)
return final_updated_detections
[ Just a note ]
Originally, ByteTrack - just before returning the update function's final output - removes some tracks per frame:
self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks)
(https://github.com/FoundationVision/ByteTrack/blob/main/yolox/tracker/byte_tracker.py#L285)
However, it might cause little bugs with tracklet management. Therefore, I just mention that it exists, but I don't recommend adding it.
OK, won't change that for the moment.
trackers/core/bytetrack/tracker.py
Outdated
self.high_prob_association_metric = "IoU" if reid_model is None else "RE-ID"
self.reid_model = reid_model
self.distance_metric = distance_metric
self.trackers: list[ByteTrackKalmanBoxTracker] = []
I would rather call it "tracks", not "trackers". Tracker is an instance that tracks entities (referred to as tracks or opt. tracklets)
Done: cd709d4
Overall: nicely written. It looks clear and intuitive. Very good work here.
I spotted quite a few differences compared to the original ByteTrack implementation from here: https://github.com/FoundationVision/ByteTrack/tree/main and pointed them out in my remarks. Let me know in case you need extra information or clarifications.
Note: I focused on the non-re-ID version as this is the primary focus of ByteTrack. There exist other SORT-based algorithms explicitly using re-ID.
)


class ByteTrackTracker(BaseTrackerWithFeatures):
[ Extra step after tracking ]
Mind that in the original ByteTrack implementation, there is also a post-processing interpolation step:
https://github.com/FoundationVision/ByteTrack/blob/main/tools/interpolation.py .
You can see it to be called here: https://github.com/FoundationVision/ByteTrack?tab=readme-ov-file#tracking .
It can be run on a single or multiple tracking text outputs.
It basically smooths the bounding box coordinates over the trajectories of each tracklet. A detector might provide a bit of noise, i.e. "jumping" /"shaky" bounding boxes of an object over the consecutive frames. This interpolation step rectifies it a bit, so that the overall trajectories are smoother.
In case of MOT17 and MOT20 it did improve the final results (HOTA, MOTA, IDF1; sometimes even by a few points).
Note two things:
- You do not need to evaluate MOTA during this process; lines 125 and 127 are redundant:
https://github.com/FoundationVision/ByteTrack/blob/main/tools/interpolation.py#L125
Hence there is no need for any dataset at this step.
- For actual evaluations, it is highly NOT recommended to use the motmetrics package as done here: https://github.com/FoundationVision/ByteTrack/blob/main/tools/interpolation.py#L4, but to use the official evaluation tool: TrackEval https://github.com/JonathonLuiten/TrackEval
TrackEval is an official evaluation script in the community, while motmetrics produces different results, to a greater or lesser extent, resulting in confusion and ambiguities.
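For reference, a simplified sketch of that interpolation for a single track id, assuming rows of [frame, x, y, w, h] sorted by frame (max_gap and the function name are placeholders, not the linked script's):

```python
import numpy as np


def interpolate_track(rows: np.ndarray, max_gap: int = 20) -> np.ndarray:
    filled = [rows[0]]
    for prev, curr in zip(rows[:-1], rows[1:]):
        gap = int(curr[0] - prev[0])
        if 1 < gap <= max_gap:
            for step in range(1, gap):
                alpha = step / gap
                # Linearly interpolate the box between the two observed frames
                box = (1 - alpha) * prev[1:] + alpha * curr[1:]
                filled.append(np.concatenate(([prev[0] + step], box)))
        filled.append(curr)
    return np.vstack(filled)


track = np.array([[1, 10, 10, 50, 80],
                  [4, 16, 10, 50, 80]], dtype=float)  # frames 2 and 3 missing
print(interpolate_track(track))  # boxes for frames 2 and 3 are filled in
```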
tstanczyk95 left a comment
Additional remark to my last ByteTrack review. It discusses additional functionality and a post-processing step from the original ByteTrack implementation.
…onality of Bytetrack implementation based on the paper
…ckers into feat/core/bytetrack
Description
This pull request adds the ByteTrack tracker to the list of implemented trackers.
No extra dependencies are required for usage; for testing I used pytest, which is in the dev dependencies. The test file can be removed if you consider it appropriate, let me know.
Type of change
How has this change been tested? Please provide a testcase or example of how you tested the change.
The implementation has been thoroughly tested, passing all the evaluations, and showing the expected results.
The new tracker can be used in the following Google Colab, which shows 2 different examples of the Tracker in its 2 possible modes: using appearance features for high confidence boxes and using IoU.
There is also a test file at the path trackers/core/bytetrack/tests_bytetrack.py; these tests cover the main functionality of the tracker and check that it behaves as expected.
Any specific deployment considerations
Check trackers/__init__.py so that it can be imported correctly.
I've updated my trackers/core/deepsort/feature_extractor.py to be exactly like the one in the main branch, but for some reason it compares it to the old one.