Wildlife Detection

🤔 About

The project is designed to solve the "data fatigue" problem associated with motion-activated trail cameras.

While trail cameras are excellent at capturing high-frequency activity, they are notoriously indiscriminate; any movement, ranging from swaying tree branches and wind-blown foliage to domestic activity like gardening or children playing, triggers a recording. This results in a massive library of "false positive" videos that require hours of manual scrubbing to find actual wildlife encounters.

This project aims to implement an intelligent, privacy-preserving, and fully automated pipeline that acts as a first-pass filter. By leveraging local Large Language Models (LLMs) with vision capabilities, the system will autonomously analyze video content and categorize footage, allowing the user to bypass the noise and focus exclusively on meaningful biological observations.

💡 Details

The core architecture of the project relies on Multimodal Large Language Models (MLLMs) running locally via Ollama. Because all processing stays on the local machine, no footage leaves the household, and the costs and latency of cloud-based video analysis are avoided.

The proposed workflow involves three primary stages:

Frame Sampling: Instead of analyzing every frame (which is computationally expensive), the system will extract a series of periodic snapshots (e.g., one frame every 5 seconds) from each video file.
Visual Inference: These sampled frames are fed into a vision-capable model (via Ollama). The model is prompted with a specific instruction: "Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."
Automated Curation: Based on the AI’s response, the system executes a file management command. If "Yes" is detected in any sampled frame, the video is moved to a Keep directory; otherwise, it is moved to a Discard directory.

This pipeline transforms a manual, labor-intensive task into a hands-off, background process that runs on the input folder.
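The curation stage can be illustrated with a minimal sketch. It assumes the per-frame answers have already been reduced to a single "Yes" or "No" for the video; the function name `curate_video`, its argument order, and the optional root directory are hypothetical, while the Keep and Discard directory names come from the description above.

```shell
#!/usr/bin/env bash
# Sketch of the curation stage: move a video into Keep/ or Discard/
# based on an already-aggregated answer. The function name and its
# arguments are hypothetical, not taken from the project scripts.

curate_video() {
    local video="$1" answer="$2" root="${3:-.}"
    local target

    case "$answer" in
        [Yy]es) target="$root/Keep" ;;
        *)      target="$root/Discard" ;;   # anything else is treated as "No"
    esac

    mkdir -p "$target"
    mv -- "$video" "$target/"
    # Print the new location so a caller can log it
    printf '%s\n' "$target/$(basename -- "$video")"
}
```

For example, `curate_video clip.mp4 Yes` would move `clip.mp4` into `./Keep/`. Treating every non-"Yes" answer as a discard mirrors the "otherwise" branch described above.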

🔬 Proof of Concept (PoC)

🎯 Objective

The Proof of Concept is designed to validate whether vision-enabled LLMs can reliably produce a binary classification (Wildlife vs. No Wildlife) from static images. This stage avoids the computational overhead of video processing to focus purely on model accuracy.

🧠 Methodology

A representative dataset of images was curated, containing both target wildlife (Wallabies, various birds, rodents, possums, and large lizards) and "noise" images (empty garden or forest).

Sample Image	Description	Preview
bee-garden.png	Garden view with small Bee barely visible close to the camera.
bird-back-close-left-forest.png	Forest view with the back of a Kookaburra on the lower-left.
bird-close-center-forest.png	Forest view with a Noisy Miner on the lower-center.
bird-close-center-garden.png	Garden view with a bird on the center.
bird-close-left-forest.png	Forest view with a Kookaburra on the lower-left.
bird-close-left-garden.png	Garden view with a colorful bird on the lower-left.
bird-flying-close-left-forest.png	Forest view with a flying Kookaburra on the lower-left.
bird-left-garden.png	Garden view with a bird on the left.
birds-left-garden.png	Garden view with two Noisy Miners on the left.
day-forest.png	Forest view by day.
day-garden.png	Garden view by day.
early-morning-garden.png	Garden view during early morning.
lizard-close-left-garden.png	Garden view with a Lace Monitor on the left.
night-forest.png	Forest view by night.
night-garden.png	Garden view by night.
possum-right-night-forest.png	Forest view at night with a Possum on a branch.
rodent-extreme-right-night-forest.png	Forest view at night with a Rodent on a branch on the far-right.
rodent-right-night-forest.png	Forest view at night with a Rodent on a branch on the right.
wallaby-back-center-night-garden.png	Garden view at night with the back of a Wallaby.
wallaby-close-right-night-garden.png	Garden view at night with a Wallaby close to the camera.
wallaby-left-garden.png	Garden view with a Wallaby on the left.
wallaby-occluded-left-garden.png	Garden view with a Wallaby on the left with its face occluded.
wallaby-standing-forest.png	Forest view with a Wallaby in the center.
wallaby-standing-right-garden.png	Garden view with a Wallaby standing on the right.

The testing was conducted using a shell script located in the Src/PoC directory. The script performs the following steps:

Uses find to iterate through all image files in the sample directory.
Passes each image to Ollama using a vision-capable model.
Uses a strict prompt to keep the output short and easy to parse:

"Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."

#!/usr/bin/env bash

# Directory containing the images to process
DIRECTORY="./"
# The vision-enabled model to use
MODEL="llava:latest"
# The strict prompt for the AI
PROMPT="Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."

# Find all images and pass the prompt and the file path to the Ollama model
# (variables are quoted so that values containing spaces are passed intact)
find "$DIRECTORY" -type f \( -iname "*.jpg" -o -iname "*.png" \) -exec ollama run --hidethinking "$MODEL" "$PROMPT" {} \;
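Even with a strict prompt, a reply may differ in casing, whitespace, or trailing punctuation. The sketch below shows one way such a reply could be reduced to a fixed token before any decision is made. The function name `normalize_answer` and the extra "unknown" outcome are assumptions for illustration and are not part of the PoC script.

```shell
#!/usr/bin/env bash
# Reduce a raw model reply to "yes", "no", or "unknown".
# Hypothetical helper; the PoC script prints the raw reply as-is.
normalize_answer() {
    local reply
    # Lowercase the reply, then drop every character that is not a letter
    reply="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z')"
    case "$reply" in
        yes) echo "yes" ;;
        no)  echo "no" ;;
        *)   echo "unknown" ;;   # longer or unexpected replies are not guessed at
    esac
}
```

It could wrap the same invocation the script uses, for example `normalize_answer "$(ollama run --hidethinking "$MODEL" "$PROMPT" image.png)"`. The check is deliberately strict: a full-sentence reply is reported as "unknown" rather than interpreted.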

💻 Hardware reference

The following hardware was utilized for the testing of the shell script and model inference.

Hardware Summary:

Processor (CPU): Intel i7-13700K
Memory (RAM): 32GB
Graphics Card (GPU): NVIDIA RTX 3060 12GB
Inference Engine: Ollama

⚗️ Model Comparison & Results

The initial testing used the Llava model. However, because the outputs of this model were unreliable and inconclusive, multiple models were tested to find the most reliable candidate.

Results Summary Table

The following tables compare the results of the vision-enabled models evaluated against the TestingSamples dataset. A ✔️ means the model's answer matched the expected label; a ❌ means it did not.

Sample Image	Expected	llava:7b	llava:13b	ministral-3:14b	ministral-3:8b	mistral-small3.2:24b
bee-garden.png	No	✔️	✔️	❌	❌	❌
bird-back-close-left-forest.png	Yes	❌	✔️	✔️	✔️	✔️
bird-close-center-forest.png	Yes	❌	✔️	✔️	✔️	✔️
bird-close-center-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
bird-close-left-forest.png	Yes	❌	✔️	✔️	✔️	✔️
bird-close-left-garden.png	Yes	❌	❌	✔️	✔️	✔️
bird-flying-close-left-forest.png	Yes	❌	✔️	✔️	✔️	✔️
bird-left-garden.png	Yes	❌	❌	✔️	✔️	✔️
birds-left-garden.png	Yes	❌	❌	✔️	✔️	✔️
day-forest.png	No	✔️	❌	✔️	✔️	✔️
day-garden.png	No	✔️	❌	✔️	✔️	✔️
early-morning-garden.png	No	✔️	✔️	✔️	✔️	✔️
lizard-close-left-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
night-forest.png	No	✔️	✔️	✔️	✔️	✔️
night-garden.png	No	✔️	❌	❌	❌	❌
possum-right-night-forest.png	Yes	✔️	❌	✔️	✔️	✔️
rodent-extreme-right-night-forest.png	Yes	❌	❌	✔️	✔️	✔️
rodent-right-night-forest.png	Yes	❌	❌	❌	✔️	❌
wallaby-back-center-night-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-close-right-night-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-left-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-occluded-left-garden.png	Yes	✔️	❌	✔️	✔️	✔️
wallaby-standing-forest.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-standing-right-garden.png	Yes	✔️	✔️	✔️	✔️	✔️

Sample Image	Expected	gemma3:27b	gemma3:12b	qwen3.5:9b	gemma4:26b	gemma4:e4b
bee-garden.png	No	❌	❌	❌	✔️	✔️
bird-back-close-left-forest.png	Yes	✔️	✔️	✔️	✔️	✔️
bird-close-center-forest.png	Yes	✔️	✔️	✔️	✔️	✔️
bird-close-center-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
bird-close-left-forest.png	Yes	✔️	✔️	✔️	✔️	✔️
bird-close-left-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
bird-flying-close-left-forest.png	Yes	✔️	✔️	✔️	✔️	✔️
bird-left-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
birds-left-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
day-forest.png	No	✔️	❌	✔️	✔️	✔️
day-garden.png	No	✔️	❌	✔️	✔️	✔️
early-morning-garden.png	No	✔️	❌	✔️	✔️	✔️
lizard-close-left-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
night-forest.png	No	❌	❌	✔️	✔️	✔️
night-garden.png	No	❌	❌	✔️	✔️	✔️
possum-right-night-forest.png	Yes	✔️	✔️	✔️	✔️	✔️
rodent-extreme-right-night-forest.png	Yes	✔️	✔️	✔️	❌	❌
rodent-right-night-forest.png	Yes	✔️	✔️	✔️	❌	❌
wallaby-back-center-night-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-close-right-night-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-left-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-occluded-left-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-standing-forest.png	Yes	✔️	✔️	✔️	✔️	✔️
wallaby-standing-right-garden.png	Yes	✔️	✔️	✔️	✔️	✔️
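
The ✔️/❌ marks above amount to comparing each model answer with the expected label. As a rough sketch, if a run were logged as tab-separated rows of image, expected label, and actual answer, a per-model score could be tallied as follows. The log format and the function name `score_results` are assumptions for illustration; the PoC script does not produce such a log.

```shell
#!/usr/bin/env bash
# Tally accuracy from a tab-separated log read on stdin:
#   image <TAB> expected <TAB> actual
# The log format is hypothetical, not the PoC script's output.
score_results() {
    awk -F '\t' '
        # Count every complete row; compare labels case-insensitively
        NF >= 3 { total++; if (tolower($2) == tolower($3)) correct++ }
        END     { printf "%d/%d\n", correct, total }
    '
}
```

Feeding it three rows that mirror the gemma4:26b column (day-forest.png answered No, rodent-right-night-forest.png answered No, wallaby-left-garden.png answered Yes) prints `2/3`.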

Critical Evaluation:

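Gemma 4 delivered the best overall results, balancing speed and accuracy effectively: both gemma4:26b and gemma4:e4b misclassified only the two night-time rodent images.
Llava proved unreliable, with inconsistent outputs and frequent errors: llava:7b misclassified 9 of the 24 samples and llava:13b misclassified 10.
Qwen 3.5 was accurate (qwen3.5:9b misclassified only bee-garden.png) but significantly slower than its peers, with the 27B model being particularly unsuitable for the current hardware setup.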

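Critical Finding: The proof of concept validated the core premise of the automated pipeline: a local vision-enabled model can classify still images as Wildlife or No Wildlife with usable accuracy.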

📝 Implementation

The full implementation moves from static images to automated video processing.

Shell Script

This solution is contained within the Src/ShellScript directory and consists of two modular scripts.

Video Frame Processor (`process-video.sh`)

This script handles the analysis of a single video file.

Workflow:

Frame Extraction: It uses ffmpeg to extract one frame every 5 seconds from the input video. This interval is chosen to balance computational speed with the likelihood of capturing an animal in motion.
Image Analysis: The script iterates through the extracted frames and pipes them into the selected Ollama vision model.
Decision Logic: The script applies the strict "Yes/No" prompt. If any single frame in the video returns a "Yes", the script flags the video as "Keep". If all frames return "No", the video is flagged as "Discard".
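
The workflow above can be sketched roughly as follows. This is not the contents of process-video.sh: the function names, the frame file naming, the default model, and the injectable classifier argument are all assumptions made for illustration. The ffmpeg `fps=1/5` filter yields one frame every 5 seconds, and the `ollama run` invocation mirrors the one used in the PoC.

```shell
#!/usr/bin/env bash
# Sketch of single-video analysis: sample frames, then stop at the first "Yes".
# Function names and the MODEL default are hypothetical.

MODEL="${MODEL:-gemma4:e4b}"
PROMPT="Identify if there is any wildlife (animals, birds, insects) present in this image. Respond with only 'Yes' or 'No'."

# Extract one frame every 5 seconds into the given directory.
extract_frames() {
    local video="$1" outdir="$2"
    mkdir -p "$outdir"
    ffmpeg -loglevel error -i "$video" -vf "fps=1/5" "$outdir/frame_%04d.png"
}

# Ask the model about one frame; prints the raw reply.
classify_frame() {
    ollama run --hidethinking "$MODEL" "$PROMPT" "$1"
}

# Print "Keep" if any frame is answered "Yes", otherwise "Discard".
# The classifier command is injectable so the loop can be tried without Ollama.
label_frames() {
    local dir="$1" classifier="${2:-classify_frame}" frame reply
    for frame in "$dir"/*.png; do
        [ -e "$frame" ] || continue          # directory holds no frames
        reply="$("$classifier" "$frame")"
        case "$reply" in
            [Yy]es*) echo "Keep"; return 0 ;;   # one positive frame is enough
        esac
    done
    echo "Discard"
}
```

A typical call would be `extract_frames clip.mp4 frames && label_frames frames`. Returning at the first "Yes" avoids running inference on the remaining frames, which matters when each frame takes noticeable time on local hardware.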

Batch Processor (`wildlife-detect.sh`)

This is the master script used to process entire directories of footage.

Workflow:

File Discovery: It uses the find command to scan a designated input directory for all video files (e.g., .mp4, .avi).
Execution Loop: It iterates through the discovered files and invokes process-video.sh for each one.
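
A minimal sketch of such a discovery-and-dispatch loop is shown below. It is not the contents of wildlife-detect.sh: the function name `process_directory` and its optional second argument are hypothetical, and the sketch assumes file names contain no newline characters.

```shell
#!/usr/bin/env bash
# Sketch of the batch loop: discover video files and hand each one to a
# per-video command. The second argument is hypothetical; it defaults to
# the process-video.sh script named in the text.
process_directory() {
    local dir="$1" processor="${2:-./process-video.sh}" video
    # Assumes file names contain no newline characters.
    find "$dir" -type f \( -iname '*.mp4' -o -iname '*.avi' \) | sort |
    while IFS= read -r video; do
        # Redirect stdin so the processor (e.g. ffmpeg) cannot swallow
        # the remaining file list that the loop is reading.
        "$processor" "$video" </dev/null
    done
}
```

Sorting the file list keeps the processing order stable between runs, which makes logs easier to compare.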

Usage

To run the full pipeline on a directory:

./Src/ShellScript/wildlife-detect.sh -d <directory> -m <model>
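
For illustration, flags of this shape are commonly parsed with `getopts`. The sketch below is an assumption about how that could look and is not taken from wildlife-detect.sh; in particular, the function name `parse_args` and both default values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of parsing the -d and -m flags shown above with getopts.
# The default values are assumptions, not taken from the script.
parse_args() {
    DIRECTORY="./"
    MODEL="gemma4:e4b"
    local opt
    OPTIND=1                      # reset so the function can be called repeatedly
    while getopts "d:m:" opt; do
        case "$opt" in
            d) DIRECTORY="$OPTARG" ;;
            m) MODEL="$OPTARG" ;;
            *) echo "Usage: wildlife-detect.sh -d <directory> -m <model>" >&2
               return 1 ;;
        esac
    done
}
```

After `parse_args -d ./footage -m gemma4:26b`, the variables `DIRECTORY` and `MODEL` hold the supplied values; an unknown flag prints the usage line and returns a non-zero status.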

C#/.NET

Technical Stack

The application is built on the .NET ecosystem:

Language & Framework: The implementation is written in C# utilizing .NET 10.
User Experience (UX): To provide a professional and interactive terminal interface, the tool uses Spectre.Console (for rich text, progress bars, and tables) and Spectre.Console.Cli (for advanced command-line parsing and structured command management).
AI Orchestration: The tool interfaces with the Ollama ecosystem via OllamaSharp. This allows the application to send extracted image snapshots to Multimodal Large Language Models (MLLMs), which perform the actual detection and identification logic.
Video Processing: Frame extraction and snapshot generation are handled by Xabe.FFmpeg, which provides a high-level wrapper around FFmpeg to programmatically capture specific frames from video files.

Build & Run

Prerequisites

.NET 10 SDK.
An active Ollama instance running with a multimodal model downloaded (for example gemma4, the most reliable candidate in the PoC).
FFmpeg does not need to be installed manually; it is downloaded automatically and saved under the binaries directory.

Building

Open a terminal in the Src/dotNet directory and execute:

dotnet build

Running

From the same terminal, execute:

dotnet run [<directory>] [--url <url>]

Arguments:

<directory>: (Optional) The path to the folder containing the video files you wish to analyze. If this argument is omitted, the tool defaults to ./.
--url <url>: (Optional) The network address of your Ollama server. If this argument is omitted, the tool defaults to http://localhost:11434.

Preview

The following screenshot demonstrates the interactive terminal interface, showing the progress of video processing and the formatted detection results:

📊 Demo

Two distinct video files were used to test each automated detection implementation.

Video Filename	Video Content Description	Expected Label
No-Wildlife.mp4	Footage of a forest with gentle swaying of the leaves and branches in a light breeze.	Discard
Wildlife.mp4	Garden footage featuring a visiting wallaby.	Keep

