How can we best prepare for the fall 2024 hackathon? #142
Replies: 5 comments 3 replies
-
Thanks so much for raising this! As I pull together our ML resources (documentation, process, platform, etc.), I'll get a better idea of goals from the ML perspective.
-
Any thoughts at this juncture, Val @veirs or Dave @dbainj1? For fun, I'll invite the HALLO crowd to chime in as well. Perhaps we can entrain some Canadian participants in the Microsoft hackathon this September?
-
Scott,
I'd welcome participation by the Canadians.
My current issue is the login process. With Microsoft's new security protocol, it takes me about 40 seconds to log in. Being able to stay logged in for longer would be really helpful. Being logged in permanently on a trusted device would be even better.
Retraining the model would be another good activity. We're up to 600 minutes with confirmed detections. We also have over 4200 minutes with false positives. That should be an adequate sample to improve the model, and to start thinking about hydrophone-specific models.
A "heartbeat" monitor would be really helpful. I think a confidence level gets calculated every minute for each hydrophone. Having those values reported out so we can get an idea of whether things are working or not would be valuable. It may also help us figure out how to set up an alarm system for when things aren't working. As the system is now, not receiving notifications could mean the system is working perfectly and not generating any false positives, or that it's not working at all.
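One way the heartbeat idea above could be sketched: keep the timestamp of the latest per-minute confidence score for each hydrophone and flag any site that has been silent too long. The function name, the 5-minute threshold, and the timestamps below are all hypothetical values chosen for illustration; none of them come from OrcaHello itself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: flag a hydrophone if no score has arrived for this long.
MAX_SILENCE = timedelta(minutes=5)


def stale_hydrophones(last_seen, now, max_silence=MAX_SILENCE):
    """Return the sorted names of hydrophones whose latest score is too old.

    last_seen maps a hydrophone name to the timestamp of its most recent
    per-minute confidence score.
    """
    return sorted(
        name for name, seen in last_seen.items()
        if now - seen > max_silence
    )


# Made-up timestamps, for illustration only.
now = datetime(2024, 5, 3, 18, 0, tzinfo=timezone.utc)
last_seen = {
    "Orcasound Lab": now - timedelta(minutes=1),
    "Point Robinson": now - timedelta(minutes=12),
}
print(stale_hydrophones(last_seen, now))  # ['Point Robinson']
```

A check like this separates "no notifications because nothing was detected" from "no notifications because scores stopped arriving", which is the ambiguity described above.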
On the requests for review, it would be helpful to include the hydrophone site. E.g., a high percentage of the notifications from Point Robinson are false positives, while notifications from Orcasound Lab are more likely to be true positives. That could affect how quickly we try to get to reviewing a tentative detection.
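As a rough illustration of how the hydrophone site could influence review order, the sketch below sorts pending detections by each site's historical fraction of confirmed detections. The record layout, function names, and counts are assumptions made up for this example, not real OrcaHello data or APIs.

```python
def confirmed_fraction(confirmed, false_positives):
    """Fraction of a site's past reviewed detections that were confirmed."""
    total = confirmed + false_positives
    return confirmed / total if total else 0.0


def order_for_review(pending, site_history):
    """Sort pending detections so that sites with a higher historical
    confirmed fraction come first.

    site_history maps a site name to (confirmed, false_positives) counts.
    Sites with no history are treated as having a fraction of 0.0.
    """
    def site_score(detection):
        confirmed, false_pos = site_history.get(detection["site"], (0, 0))
        return confirmed_fraction(confirmed, false_pos)

    return sorted(pending, key=site_score, reverse=True)


# Made-up counts, for illustration only.
site_history = {"Orcasound Lab": (30, 10), "Point Robinson": (5, 45)}
pending = [
    {"id": 1, "site": "Point Robinson"},
    {"id": 2, "site": "Orcasound Lab"},
]
print([d["id"] for d in order_for_review(pending, site_history)])  # [2, 1]
```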
There seems to be a 20-25 minute delay between detection and notification. For now, that's not a problem, but once we link to ship notifications, minimizing that delay will be essential. Automating that link into the system may be timely at the next Hackathon. Updating our list of reviewers and end users would be another simple but valuable task.
While there's room for improvement, I think the most valuable thing we can be doing is getting more hydrophones in and maintaining the ones we've got. If we make progress in that arena, incorporating the new hydrophones into OrcaHello would be another useful task.
Those are the things that immediately come to mind. As I think of more things, I'll let you know.
--Dave
________________________________
From: Scott Veirs ***@***.***>
Sent: Friday, May 3, 2024 10:51 AM
To: orcasound/aifororcas-livesystem ***@***.***>
Cc: David Bain ***@***.***>; Mention ***@***.***>
Subject: Re: [orcasound/aifororcas-livesystem] How can we best prepare for the fall 2024 hackathon? (Discussion #142)
-
Hi,
Assessing performance for Future Hackathons
Benchmarking Multiple Models
I have benchmarked the following models for accuracy and computational performance across DeepAL Compare, the 400 ONC labels, and ~2 years of Ocean Observatories Initiative data:
My current suggested workflow is: run an efficient classifier on all data. This would allow us to discard the roughly 80% of samples that have a very low probability. Samples with a higher probability can be sent to a server for classification with a better-performing model for confirming detection, species classification, individual classification, call type classification, etc.
Future considerations
I've also ventured into species classification. The unsupervised Wav2Vec2 model clearly separates humpbacks from orcas. The transient orca calls completely overlap with a subset of southern resident calls, but the rest of the southern resident calls overlap with neither transient nor humpback calls. With a bit of supervised learning I anticipate that we will be able to separate them completely. Right now I need some more annotated data before I train something I would depend upon (especially for offshore and northern residents).
The full 1603 hours of vocalisation data I have collected from Orcasound and ONC is on Hugging Face. These data can be streamed from Hugging Face for inference without the need to download the ~600GB of data locally. I haven't uploaded any negative examples, since there are way too many, but I have noted the files which I have evaluated and deemed not to contain cetaceans.
* Note: the ONC data is not public yet as I am still shifting around the labels. Once I get it to a stable place I will make it public and switch over to version control.
I will upload all my pre-trained and fine-tuned models to Hugging Face once I finish up the tuning and evaluation.
I don't have too many comments on the infrastructure side of things, but I do have a preference for having a near real-time folder of 5-minute long FLAC files I can sync from the cloud.
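The two-stage workflow suggested in this comment could look roughly like the sketch below: a cheap first-stage score filters out low-probability samples, and only the remainder goes to the more expensive model. The `cascade` function, the 0.2 threshold, and the stand-in models are hypothetical placeholders, not the benchmarked models themselves.

```python
def cascade(samples, fast_model, strong_model, threshold=0.2):
    """Run a cheap classifier on every sample and pass only the samples that
    score at or above `threshold` to the more expensive classifier.

    Returns a list of (sample, strong_model_output) pairs for kept samples.
    """
    results = []
    for sample in samples:
        if fast_model(sample) < threshold:
            continue  # very low probability: discard without further work
        results.append((sample, strong_model(sample)))
    return results


# Stand-in models for illustration: each "sample" is just its own score.
def fast_model(sample):
    return sample


def strong_model(sample):
    return "orca" if sample > 0.5 else "other"


samples = [0.05, 0.1, 0.9, 0.15, 0.02]
print(cascade(samples, fast_model, strong_model))  # [(0.9, 'orca')]
```

In this toy run, four of the five samples are dropped by the first stage. The threshold would need tuning against labelled data so that the discarded fraction does not include real calls.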
-
I would really love to see a hackathon tackle the HB/KW (humpback vs. killer whale) problem some day! I'm not sure how much this comes up in Orcasound data, but in ONC data it's a huge issue. Looking for orcas in "orca" detections in our data can feel like looking for a needle in a humpback haystack. Also a quick add-on to Bret's comment about labels: what Bret will be submitting was labelled as presence/absence, but I couldn't help myself from retaining species information for my own records. So all of those labels I produced exist as species labels too, and I'd be happy to share them if someone might find them useful. I'm also sitting on a mountain of humpback labels from other ONC hydrophones that I've produced while searching through "orca detections" to look for Bigg's. I figured one man's signal is another man's noise and someone might want them!
-
Let's brainstorm asynchronously: What are our collective goals for the 2024 Microsoft hackathon?
This year will mark the 5th year that the OrcaHello project has worked with both Beam Reach (Scott, Val) and Orca Conservancy (Dave Bain) to automate the detection of SRKW calls in the Orcasound live audio streams! With about 6 months to prepare, can we set some new goals or directions for the OrcaHello project?
Here is a short list of ideas and questions we could begin to address before the hackathon. A goal could be to distill these into issues within the repo and then tag the ones that should be addressed by particular teams during the 2024 hackathon. Please add your thoughts in this discussion!
Short-term priorities
Longer-term, lower-priority (stretch goals?)