
ZooterOne/Wildlife-Detection


Wildlife Detection

🤔 About

The project is designed to solve the "data fatigue" problem associated with motion-activated trail cameras.

While trail cameras are excellent at capturing high-frequency activity, they are notoriously indiscriminate; any movement, ranging from swaying tree branches and wind-blown foliage to domestic activity like gardening or children playing, triggers a recording. This results in a massive library of "false positive" videos that require hours of manual scrubbing to find actual wildlife encounters.

This project aims to implement an intelligent, privacy-preserving, and fully automated pipeline that acts as a first-pass filter. By leveraging local Large Language Models (LLMs) with vision capabilities, the system will autonomously analyze video content and categorize footage, allowing the user to bypass the noise and focus exclusively on meaningful biological observations.

💡 Details

The core architecture of the project relies on Multimodal Large Language Models (MLLMs) running locally via Ollama. By keeping the processing local, the project ensures total privacy for the household and eliminates the costs and latency associated with cloud-based video analysis.

The proposed workflow involves three primary stages:

  1. Frame Sampling: Instead of analyzing every frame (which is computationally expensive), the system will extract a series of periodic snapshots (e.g., one frame every 5 seconds) from each video file.
  2. Visual Inference: These sampled frames are fed into a vision-capable model (via Ollama). The model is prompted with a specific instruction: "Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."
  3. Automated Curation: Based on the AI’s response, the system executes a file management command. If "Yes" is detected in any sampled frame, the video is moved to a Keep directory; otherwise, it is moved to a Discard directory.
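Stage 3 can be sketched in shell as follows. This is a minimal illustration, not the project's code: it assumes the per-frame model replies are already available one per line on standard input, and the function names `video_verdict` and `curate_video` are hypothetical. Only the Keep/Discard directory names come from the description above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "Automated Curation" stage (not the repository's code).

# Read one model reply per line on stdin. Print "Keep" as soon as any reply
# starts with "yes" (any case, leading whitespace ignored); otherwise "Discard".
video_verdict() {
  local reply
  # "|| [[ -n ... ]]" also handles a final line without a trailing newline.
  while IFS= read -r reply || [[ -n "$reply" ]]; do
    if [[ "$reply" =~ ^[[:space:]]*[Yy][Ee][Ss] ]]; then
      echo "Keep"
      return 0
    fi
  done
  echo "Discard"
}

# Move a video into <output_root>/Keep or <output_root>/Discard.
curate_video() {
  local video="$1" verdict="$2" output_root="$3"
  case "$verdict" in
    Keep|Discard) ;;
    *) echo "unexpected verdict: $verdict" >&2; return 1 ;;
  esac
  mkdir -p -- "$output_root/$verdict"
  mv -- "$video" "$output_root/$verdict/"
}
```

For example, `verdict="$(printf 'No\nYes.\n' | video_verdict)"` followed by `curate_video ./clip.mp4 "$verdict" ./sorted` would move `clip.mp4` into `./sorted/Keep`. Matching only the start of each reply tolerates answers such as "Yes." or " yes" without treating longer sentences that merely mention "yes" as positives.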

This pipeline transforms a manual, labor-intensive task into a hands-off, background process that runs on the input folder.

🔬 Proof of Concept (PoC)

🎯 Objective

The Proof of Concept is designed to validate whether vision-enabled LLMs can reliably produce a binary classification (Wildlife vs. No Wildlife) from static images. This stage avoids the computational overhead of video processing to focus purely on model accuracy.

🧠 Methodology

A representative dataset of images was curated, containing both target wildlife (wallabies, various birds, rodents, possums, and large lizards) and "noise" images (empty garden or forest scenes).

| Sample Image | Description |
|---|---|
| bee-garden.png | Garden view with a small bee barely visible close to the camera. |
| bird-back-close-left-forest.png | Forest view with the back of a Kookaburra in the lower left. |
| bird-close-center-forest.png | Forest view with a Noisy Miner in the lower center. |
| bird-close-center-garden.png | Garden view with a bird in the center. |
| bird-close-left-forest.png | Forest view with a Kookaburra in the lower left. |
| bird-close-left-garden.png | Garden view with a colorful bird in the lower left. |
| bird-flying-close-left-forest.png | Forest view with a flying Kookaburra in the lower left. |
| bird-left-garden.png | Garden view with a bird on the left. |
| birds-left-garden.png | Garden view with two Noisy Miners on the left. |
| day-forest.png | Forest view by day. |
| day-garden.png | Garden view by day. |
| early-morning-garden.png | Garden view in the early morning. |
| lizard-close-left-garden.png | Garden view with a Lace Monitor on the left. |
| night-forest.png | Forest view at night. |
| night-garden.png | Garden view at night. |
| possum-right-night-forest.png | Forest view at night with a possum on a branch. |
| rodent-extreme-right-night-forest.png | Forest view at night with a rodent on a branch at the far right. |
| rodent-right-night-forest.png | Forest view at night with a rodent on a branch on the right. |
| wallaby-back-center-night-garden.png | Garden view at night with the back of a wallaby. |
| wallaby-close-right-night-garden.png | Garden view at night with a wallaby close to the camera. |
| wallaby-left-garden.png | Garden view with a wallaby on the left. |
| wallaby-occluded-left-garden.png | Garden view with a wallaby on the left, its face occluded. |
| wallaby-standing-forest.png | Forest view with a wallaby in the center. |
| wallaby-standing-right-garden.png | Garden view with a wallaby standing on the right. |

The testing was conducted using a shell script located in the Src/PoC directory. The script performs the following steps:

  1. Uses find to iterate through all image files in the sample directory.
  2. Passes each image to Ollama using a vision-capable model.
  3. Supplies a strict prompt that constrains the model to a short, parsable answer:

"Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."

#!/usr/bin/env bash
# Directory containing the images to process
DIRECTORY="./"
# The vision-enabled model to use
MODEL="llava:latest"
# The strict prompt for the AI
PROMPT="Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."

# Find all images, print each path (so every answer can be matched to its file),
# then pass the prompt and the image path to the Ollama model
find "$DIRECTORY" -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) \
  -print -exec ollama run --hidethinking "$MODEL" "$PROMPT" {} \;
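Vision models do not always reply with a bare "Yes" or "No", so it can help to normalize each reply and record it next to its image. The sketch below is one way to do that under stated assumptions: `normalize_answer` and `classify_images` are hypothetical helpers, the default model tag is only an illustration, and each reply is reduced to `Yes`, `No`, or `Unknown` by looking at its first word.

```bash
#!/usr/bin/env bash
# Illustrative defaults (assumptions, not taken from the repository).
MODEL="${MODEL:-gemma4:e4b}"
PROMPT="Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."

# Reduce a free-form reply to Yes, No or Unknown based on its first word.
normalize_answer() {
  local a
  a="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  a="${a#"${a%%[![:space:]]*}"}"   # drop leading whitespace, including newlines
  a="${a%%[![:alpha:]]*}"          # keep only the leading word
  case "$a" in
    yes) echo "Yes" ;;
    no)  echo "No" ;;
    *)   echo "Unknown" ;;
  esac
}

# Print one "file,answer" row per .jpg/.jpeg/.png image under a directory.
classify_images() {
  local dir="${1:-./}" image reply
  while IFS= read -r -d '' image; do
    # stdin comes from /dev/null so ollama cannot consume the file list.
    reply="$(ollama run --hidethinking "$MODEL" "$PROMPT" "$image" </dev/null)"
    printf '%s,%s\n' "$image" "$(normalize_answer "$reply")"
  done < <(find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) -print0)
}
```

Running `classify_images ./ > answers.csv` would yield rows such as `./day-garden.png,No`. An explicit `Unknown` keeps off-format replies visible instead of silently counting them as "No".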

💻 Hardware reference

The following hardware was used to run the shell script and the model inference.

Hardware Summary:

  • Processor (CPU): Intel i7-13700K
  • Memory (RAM): 32GB
  • Graphics Card (GPU): NVIDIA RTX 3060 12GB
  • Inference Engine: Ollama

⚗️ Model Comparison & Results

Initial testing used the Llava model. Because its outputs were unreliable and inconclusive, several other models were then tested to find the most reliable candidate.

Results Summary Table

The following tables compare the performance of several vision-enabled LLMs on the sample images described above (✔️ = the model's answer matched the expected label, ❌ = it did not).

| Sample Image | Expected | llava:7b | llava:13b | ministral-3:14b | ministral-3:8b | mistral-small3.2:24b |
|---|---|---|---|---|---|---|
| bee-garden.png | No | ✔️ | ✔️ | ❌ | ❌ | ❌ |
| bird-back-close-left-forest.png | Yes | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-close-center-forest.png | Yes | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-close-center-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-close-left-forest.png | Yes | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-close-left-garden.png | Yes | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| bird-flying-close-left-forest.png | Yes | ❌ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-left-garden.png | Yes | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| birds-left-garden.png | Yes | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| day-forest.png | No | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| day-garden.png | No | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| early-morning-garden.png | No | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| lizard-close-left-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| night-forest.png | No | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| night-garden.png | No | ✔️ | ❌ | ❌ | ❌ | ❌ |
| possum-right-night-forest.png | Yes | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| rodent-extreme-right-night-forest.png | Yes | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| rodent-right-night-forest.png | Yes | ❌ | ❌ | ❌ | ✔️ | ❌ |
| wallaby-back-center-night-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-close-right-night-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-left-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-occluded-left-garden.png | Yes | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| wallaby-standing-forest.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-standing-right-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |

| Sample Image | Expected | gemma3:27b | gemma3:12b | qwen3.5:9b | gemma4:26b | gemma4:e4b |
|---|---|---|---|---|---|---|
| bee-garden.png | No | ❌ | ❌ | ❌ | ✔️ | ✔️ |
| bird-back-close-left-forest.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-close-center-forest.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-close-center-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-close-left-forest.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-close-left-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-flying-close-left-forest.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| bird-left-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| birds-left-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| day-forest.png | No | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| day-garden.png | No | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| early-morning-garden.png | No | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| lizard-close-left-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| night-forest.png | No | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| night-garden.png | No | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| possum-right-night-forest.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| rodent-extreme-right-night-forest.png | Yes | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
| rodent-right-night-forest.png | Yes | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
| wallaby-back-center-night-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-close-right-night-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-left-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-occluded-left-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-standing-forest.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| wallaby-standing-right-garden.png | Yes | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
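Each ✔️/❌ mark compares a model's answer with the Expected column, so the same tally can be computed mechanically when evaluating another model. The sketch below assumes a hypothetical header-less CSV with one `file,expected,answer` row per image (the project does not define such a file) and compares only the first word of each label, case-insensitively, so `Yes.` counts as `Yes`. `score_results` is an illustrative name.

```bash
#!/usr/bin/env bash
# Tally "file,expected,answer" rows (no header) from files or stdin and
# print "correct/total (percentage%)". Row format is a hypothetical convention.
score_results() {
  awk -F',' '
    # Lower-case a label, trim leading blanks and keep only its first word.
    function first_word(s) {
      s = tolower(s)
      sub(/^[ \t]+/, "", s)
      sub(/[^a-z].*$/, "", s)
      return s
    }
    NF >= 3 {
      total++
      if (first_word($2) == first_word($3)) correct++
    }
    END {
      if (total == 0) { print "0/0"; exit }
      printf "%d/%d (%.1f%%)\n", correct, total, 100 * correct / total
    }' "$@"
}
```

For instance, `score_results answers.csv` prints a single line such as `22/24 (91.7%)` for a file in which 22 of 24 answers match.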

Critical Evaluation:

  • Gemma 4 delivered the best overall results, balancing speed and accuracy effectively.
  • Llava proved unreliable due to inconsistent outputs and frequent errors.
  • Qwen 3.5 was significantly slower than its peers, with the 27B model being particularly unsuitable for the current hardware setup.

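Critical Finding: The proof of concept confirmed that locally run vision-enabled models can classify static wildlife images reliably enough to serve as the basis of the automated video pipeline.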

📝 Implementation

The full implementation moves from static images to automated video processing.

Shell Script

This solution is contained within the Src/ShellScript directory and consists of two modular scripts.

Video Frame Processor (process-video.sh)

This script handles the analysis of a single video file.

Workflow:

  1. Frame Extraction: It uses ffmpeg to extract one frame every 5 seconds from the input video. This interval is chosen to balance computational speed with the likelihood of capturing an animal in motion.
  2. Image Analysis: The script iterates through the extracted frames and pipes them into the selected Ollama vision model.
  3. Decision Logic: The script applies the strict "Yes/No" prompt. If any single frame in the video returns a "Yes", the script flags the video as "Keep". If all frames return "No", the video is flagged as "Discard".
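The three steps above can be sketched for a single video as follows. This is a hedged illustration rather than the contents of process-video.sh: it assumes `ffmpeg` and `ollama` are on `PATH`, and the function names, temporary-directory layout and default model tag are assumptions. FFmpeg's `fps=1/5` filter yields one frame per 5 seconds, and the loop stops at the first "Yes", so the remaining frames are never sent to the model.

```bash
#!/usr/bin/env bash
# Illustrative defaults (assumptions); override via environment variables.
MODEL="${MODEL:-gemma4:e4b}"
INTERVAL="${INTERVAL:-5}"   # seconds between sampled frames
PROMPT="Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."

# Step 1: write one frame every $INTERVAL seconds as <out_dir>/frame_0001.png, ...
extract_frames() {
  local video="$1" out_dir="$2"
  ffmpeg -hide_banner -loglevel error -nostdin -i "$video" \
    -vf "fps=1/$INTERVAL" "$out_dir/frame_%04d.png"
}

# Steps 2-3: query the model frame by frame; print "Keep" on the first "Yes",
# or "Discard" if no frame is classified as wildlife.
classify_video() {
  local video="$1" frames frame reply verdict="Discard"
  frames="$(mktemp -d)" || return 1
  if ! extract_frames "$video" "$frames"; then
    rm -rf -- "$frames"
    return 1
  fi
  for frame in "$frames"/frame_*.png; do
    [[ -e "$frame" ]] || continue   # glob matched nothing: no frames extracted
    reply="$(ollama run --hidethinking "$MODEL" "$PROMPT" "$frame" </dev/null)"
    if [[ "$reply" =~ ^[[:space:]]*[Yy][Ee][Ss] ]]; then
      verdict="Keep"
      break
    fi
  done
  rm -rf -- "$frames"   # extracted frames are temporary
  echo "$verdict"
}
```

Calling `classify_video ./clip.mp4` prints `Keep` or `Discard`, and returns a non-zero status if frame extraction fails, so a caller can leave such files untouched instead of discarding them.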

Batch Processor (wildlife-detect.sh)

This is the master script used to process entire directories of footage.

Workflow:

  1. File Discovery: It uses the find command to scan a designated input directory for all video files (e.g., .mp4, .avi).
  2. Execution Loop: It iterates through the discovered files and invokes process-video.sh for each one.
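A minimal sketch of this discovery loop, assuming process-video.sh accepts the video path as its only argument (its actual interface is not documented here); `list_videos` and `process_all` are hypothetical names, and the extension list mirrors the examples above. File names are passed NUL-separated so paths with spaces survive, and the per-file command's stdin is redirected from `/dev/null` so tools such as ffmpeg cannot swallow the remaining file names.

```bash
#!/usr/bin/env bash
# List .mp4/.avi files (case-insensitive) under a directory, NUL-separated.
list_videos() {
  find "$1" -type f \( -iname '*.mp4' -o -iname '*.avi' \) -print0
}

# Run the given command once per discovered video, appending the video path.
process_all() {
  local dir="$1" video
  shift
  while IFS= read -r -d '' video; do
    "$@" "$video" </dev/null
  done < <(list_videos "$dir")
}
```

Usage, under the same assumption about the script's arguments: `process_all ./footage ./Src/ShellScript/process-video.sh`.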

Usage

To run the full pipeline on a directory:

./Src/ShellScript/wildlife-detect.sh -d <directory> -m <model>
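The `-d` and `-m` options can be parsed with Bash's built-in `getopts`. This is a sketch of one way to do it, not the script's actual parser; `parse_args` is a hypothetical name, and the `./` default for the directory and the requirement that `-m` be given are assumptions.

```bash
#!/usr/bin/env bash
# Parse -d <directory> and -m <model> into the globals DIRECTORY and MODEL.
# Assumptions: DIRECTORY defaults to "./" and -m is mandatory.
parse_args() {
  local opt OPTIND=1   # local OPTIND lets the function be called more than once
  DIRECTORY="./"
  MODEL=""
  while getopts ":d:m:" opt; do
    case "$opt" in
      d) DIRECTORY="$OPTARG" ;;
      m) MODEL="$OPTARG" ;;
      :) echo "option -$OPTARG requires an argument" >&2; return 2 ;;
      \?) echo "unknown option -$OPTARG" >&2; return 2 ;;
    esac
  done
  if [[ -z "$MODEL" ]]; then
    echo "usage: wildlife-detect.sh -d <directory> -m <model>" >&2
    return 2
  fi
  if [[ ! -d "$DIRECTORY" ]]; then
    echo "not a directory: $DIRECTORY" >&2
    return 2
  fi
}
```

In a script this would typically be invoked as `parse_args "$@" || exit $?`. The leading `:` in the option string switches `getopts` to silent mode, so the function reports missing arguments and unknown options itself.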

C#/.NET

Technical Stack

The application is built on the .NET ecosystem:

  • Language & Framework: The implementation is written in C# utilizing .NET 10.
  • User Experience (UX): To provide a polished, interactive terminal interface, the tool uses Spectre.Console (for rich text, progress bars, and tables) and Spectre.Console.Cli (for command-line parsing and structured command management).
  • AI Orchestration: The tool interfaces with the Ollama ecosystem via OllamaSharp, which it uses to send extracted image snapshots to multimodal LLMs that perform the actual detection and identification.
  • Video Processing: Frame extraction and snapshot generation are handled by Xabe.FFmpeg, which provides a high-level wrapper around FFmpeg to programmatically capture specific frames from video files.

Build & Run

Prerequisites

  • .NET 10 SDK.
  • A running Ollama instance with a multimodal model (such as llava) already pulled.
  • FFmpeg does not need to be installed manually; it is downloaded automatically and saved under the binaries directory.

Building

Open a terminal in the Src/dotNet directory and run:

dotnet build

Running

From the same terminal, execute:

dotnet run [<directory>] [--url <url>]

Arguments:

  • <directory>: (Optional) The path to the folder containing the video files you wish to analyze. If this argument is omitted, the tool defaults to ./.
  • --url <url>: (Optional) The network address of your Ollama server. If this argument is omitted, the tool defaults to http://localhost:11434.
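The `--url` value is the base address of Ollama's HTTP API, which OllamaSharp calls on the tool's behalf. To check that a server at that address can answer image prompts before running the tool, a request can be sent by hand to Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` array. This sketch is not the exact request the .NET tool sends; `build_payload` and `ask_ollama` are hypothetical helpers, and no JSON escaping is done, so the prompt is assumed to contain no double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Build a non-streaming /api/generate request body for one image.
# Assumption: the prompt needs no JSON escaping (no double quotes or backslashes).
build_payload() {
  local model="$1" prompt="$2" image="$3" b64
  b64="$(base64 < "$image" | tr -d '\n')"   # portable: no GNU-only -w0 flag
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}\n' \
    "$model" "$prompt" "$b64"
}

# POST the request; OLLAMA_URL defaults to the same address as the tool's --url.
ask_ollama() {
  local url="${OLLAMA_URL:-http://localhost:11434}"
  build_payload "$@" |
    curl -sS -H 'Content-Type: application/json' --data-binary @- "$url/api/generate"
}
```

For example, `OLLAMA_URL=http://localhost:11434 ask_ollama llava "Is there any wildlife in this image? Respond with only 'Yes' or 'No'." ./frame.png` returns a JSON object whose `response` field holds the model's answer.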

Preview

The following screenshot demonstrates the interactive terminal interface, showing the progress of video processing and the formatted detection results:

📊 Demo

Two distinct video files were used to test each automated detection implementation.

| Video Filename | Video Content Description | Expected Label |
|---|---|---|
| No-Wildlife.mp4 | Forest footage with leaves and branches swaying gently in a light breeze. | Discard |
| Wildlife.mp4 | Garden footage featuring a visiting wallaby. | Keep |

About

Automated wildlife detection and video curation via local multimodal AI.
