WIP: Add stem activations generate script #49
@faroit Let me know if you need any support on this. I'm happy to help :)
@rabitt cool, thanks
@faroit the samples dataset is now downloaded in the travis environment in the latest version of the 1.2 branch, and I set the |
Maybe @bmcfee has run into this in the past?
👍 that's so great. I will give it a try tomorrow.
It's actually more a problem of me being too lazy to install MATLAB here... so if you or @bmcfee have a snippet flying around that does a 1:1 equivalent of MATLAB's |
Not that I recall, but I think From the matlab docs:
In librosa, this would be:
    y = librosa.util.frame(x, frame_length=n, hop_length=n)
If you want to do this with end-padding (which invokes a copy):
    x_pad = librosa.util.fix_length(x, size=int(n * np.ceil(len(x) / n)))
    y = librosa.util.frame(x_pad, frame_length=n, hop_length=n)
If you want overlap, then change the hop_length accordingly.
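As a rough illustration of the framing described above, here is a NumPy-only sketch that splits a signal into non-overlapping frames of length n with zero end-padding. The helper name `frame_padded` is hypothetical and not part of librosa; the output layout (one frame per column) is chosen to mirror what `librosa.util.frame` returns by default.

```python
import numpy as np


def frame_padded(x, n):
    """Split 1-D signal x into non-overlapping frames of length n.

    The tail is zero-padded to a whole number of frames (this copies x).
    Returns an array of shape (n, n_frames), one frame per column.
    """
    x = np.asarray(x)
    n_frames = int(np.ceil(len(x) / n))
    x_pad = np.zeros(n_frames * n, dtype=x.dtype)
    x_pad[: len(x)] = x
    # Each row of the reshaped array is one frame; transpose to columns.
    return x_pad.reshape(n_frames, n).T


x = np.arange(10)
frames = frame_padded(x, 4)
print(frames.shape)
```

The last column holds the final two samples followed by padding zeros, so no input samples are dropped.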
…to medleydb_v1.2
@bmcfee thanks, this seems to work
@rabitt next up, the values are a bit different. I will install MATLAB tomorrow and see if I can reproduce the annotations in MATLAB and eventually step through the code. See the first 10 lines from the CSV: Reference annotation
created with 2f5ef02
It's not that far off, but the idea is to test against the reference (not to create new annotations with Python), right?
Ideally the output should match the annotations. Let me know what you find after checking against matlab.
H = []

for track_id, track in mtrack.stems.items():
    audio, rate = librosa.load(track.file_path, mono=False)
By default librosa loads the audio at sr=22050, but I think the activations were originally computed using the original sample rate, sr=44100.
Why mono=False?
The time vectors were indicating otherwise, but I have now found that it's actually the window length parameter that needs to be changed. So sr=44100 and win_length=4096 seem to match.
Results are still a bit off. I am working on it...
time,S01,S02,S03,S04,S05
0.0000,0.9927,0.6062,0.1231,0.5787,0.8933
0.0464,0.9966,0.9341,0.3208,0.8458,0.9880
0.0929,0.9983,0.9919,0.6236,0.9549,0.9987
0.1393,0.9991,0.9989,0.8572,0.9870,0.9998
0.1858,0.9994,0.9998,0.9567,0.9959,1.0000
0.2322,0.9996,1.0000,0.9877,0.9985,1.0000
0.2786,0.9997,1.0000,0.9964,0.9994,1.0000
0.3251,0.9997,1.0000,0.9988,0.9997,1.0000
0.3715,0.9998,1.0000,0.9996,0.9998,1.0000
vs
time,S01,S02,S03,S04,S05
0.0000,0.9932,0.5501,0.0728,0.5285,0.8885
0.0464,0.9967,0.9246,0.2522,0.8291,0.9870
0.0929,0.9983,0.9913,0.5889,0.9523,0.9985
0.1393,0.9990,0.9989,0.8569,0.9868,0.9998
0.1858,0.9993,0.9998,0.9607,0.9960,1.0000
0.2322,0.9995,1.0000,0.9896,0.9986,1.0000
0.2786,0.9996,1.0000,0.9971,0.9994,1.0000
0.3251,0.9997,1.0000,0.9991,0.9997,1.0000
0.3715,0.9997,1.0000,0.9997,0.9999,1.0000
seems that the framing might still be an issue...
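One quick sanity check on the framing: the time column in both tables is consistent with a hop of 2048 samples at sr=44100, i.e. half of win_length=4096. The 50% hop is an assumption inferred from the printed timestamps, not something stated in the thread. A minimal sketch of that check:

```python
SR = 44100             # sample rate from the discussion above
WIN_LENGTH = 4096      # window length from the discussion above
HOP = WIN_LENGTH // 2  # assumption: 50% overlap, inferred from the time column

# Time column copied from the CSV excerpts above.
csv_times = [0.0000, 0.0464, 0.0929, 0.1393, 0.1858,
             0.2322, 0.2786, 0.3251, 0.3715]

# Start time of each frame, rounded to 4 decimals like the CSV.
frame_times = [round(i * HOP / SR, 4) for i in range(len(csv_times))]

print(frame_times == csv_times)
```

If the grid lines up, the remaining differences in the first few rows are more likely to come from how each frame is filled (padding, windowing, alignment) than from the frame spacing itself.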
# MATLAB equivalent to @hanning(win_len)
win = scipy.signal.windows.hann(win_len + 2)[1:-1]

# mix down to 1 channel
why not do this on load?
right
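The window line in the snippet above relies on MATLAB's hanning(N) omitting the zero-valued endpoints. Here is a small check of that identity, written with `np.hanning` instead of `scipy.signal.windows.hann` to stay NumPy-only (both produce a symmetric Hann window by default). The helper name `matlab_hanning` is made up for this sketch.

```python
import numpy as np


def matlab_hanning(n):
    """Hann window as MATLAB's hanning(n) defines it: no zero endpoints."""
    k = np.arange(1, n + 1)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * k / (n + 1)))


win_len = 8
# Build a symmetric Hann window two samples longer, then drop the zeros.
trimmed = np.hanning(win_len + 2)[1:-1]

print(np.allclose(trimmed, matlab_hanning(win_len)))
```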
@@ -188,8 +192,8 @@ def test_add_sequence_to_melody4(self):
         print(expected)
         self.assertTrue(array_almost_equal(actual, expected))

-    def test_add_sequence_to_melody4(self):
+    def test_add_sequence_to_melody5(self):
good catch!
re. the first item:
for now, throw a warning for tracks with bleed and just turn up the confidence needed to be considered "active". We're in the process of plugging in an algorithm by @TGabor that performs bleed removal, so we'll be able to use the same activation code for everything.
fyi, I (finally) merged the v1.2 branch to master, so this PR can eventually be pulled into master.
Hey @faroit! Wanted to check in on the status of this.
fyi @pli1988 happens to be working on code to convert from activation confidence values to the sourceid annotations.
@rabitt sorry, quite busy over here. I will continue working on the PR over the course of the weekend.
# binary thresholding for low overall energy events
mask = np.ones(H.shape)
mask[:, E0 < 0.01] = 0
H = H * mask
You could replace 36-38 with this one-liner:
H[:, E0 < 0.01] = 0.0
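A small check that the suggested one-liner zeroes the same entries as the mask version. The shapes are assumptions: H is taken to be (n_stems, n_frames) and E0 a per-frame energy vector of length n_frames. The values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((3, 6))                              # toy activations: 3 stems, 6 frames
E0 = np.array([0.5, 0.001, 0.2, 0.0, 0.3, 0.009])   # toy per-frame energy

# Mask version from the diff.
mask = np.ones(H.shape)
mask[:, E0 < 0.01] = 0
H_masked = H * mask

# One-liner from the review comment, applied to a copy.
H_inplace = H.copy()
H_inplace[:, E0 < 0.01] = 0.0

print(np.array_equal(H_masked, H_inplace))
```

One difference worth keeping in mind: the one-liner modifies H in place, while the mask version allocates a new array.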
@rabitt sorry for not finishing this up. Do you want to take over? Can you edit this PR, or should I close it?
This addresses #25.
This is currently work in progress. Things that are still missing: