This assignment consists of two parts, each designed to evaluate your ability to work with real-world sensor data and video footage collected from Nexar dashcams.
In this task, you'll analyze the behavior of a deployed machine learning model trained on IMU (Inertial Measurement Unit) accelerometer data. Nexar dashcams collect rich sensor data during vehicle operation, including acceleration along multiple axes, enabling the detection of events like collisions or sudden motion changes.
You're provided with pre-extracted features for the training and test sets (with labels), and raw signal files for the inference set. Your goal is to evaluate how the model performs on the inference data and diagnose any performance gaps relative to the original test set.
- Generate features from the raw IMU inference signals.
- Load and run inference using the provided model.
- Compare predictions against the true labels in the manual annotation set.
- Evaluate and compare performance across datasets.
- Perform EDA to investigate potential causes of model degradation.
- Propose short-term and long-term solutions.
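The EDA step above can start with a simple per-feature drift check between the test and inference feature sets. Below is a minimal sketch, assuming `scipy` is available and that both DataFrames share numeric feature columns; the column names (`acc_mean`, `acc_std`) are illustrative, not the actual feature names in the provided CSVs:

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def drift_report(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Kolmogorov-Smirnov statistic per shared numeric column,
    sorted so the most-shifted features come first."""
    cols = df_a.select_dtypes("number").columns.intersection(df_b.columns)
    rows = [
        {"feature": c, "ks_stat": ks_2samp(df_a[c], df_b[c]).statistic}
        for c in cols
    ]
    return pd.DataFrame(rows).sort_values("ks_stat", ascending=False)

# Demo with synthetic data: the deliberately shifted feature ranks first.
rng = np.random.default_rng(0)
test_df = pd.DataFrame({"acc_mean": rng.normal(0, 1, 500),
                        "acc_std": rng.normal(1, 0.1, 500)})
inf_df = pd.DataFrame({"acc_mean": rng.normal(2, 1, 500),  # shifted mean
                       "acc_std": rng.normal(1, 0.1, 500)})
print(drift_report(test_df, inf_df))
```

In the assignment itself you would pass the DataFrames loaded from `test.csv` and `inference.csv` (dropping the label column first).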
├── data/
│ ├── train.csv # Pre-extracted features with labels
│ ├── test.csv # Pre-extracted features with labels
│ ├── inference.csv # ❗ Generated features for inference (no labels)
│ ├── raw/
│ │ ├── train/
│ │ ├── test/
│ │ └── inference/ # Raw .npz files for inference
│
├── data/manual_annotation/
│ └── inference_labels.csv # ✅ Ground truth for inference set
│
├── extract_features.py # Feature extraction logic
├── imu_pipeline.py # Pipeline with trained model
├── visualization.py # Signal viewer for manual exploration
├── example.ipynb # Starter notebook
├── models/
│ └── imu_pipeline.pkl # Pre-trained RandomForest model
└── requirements.txt
```bash
pip install -r requirements.txt
```

Python 3.8+ is recommended.
```python
from pathlib import Path

from visualization import signal_viewer

signal_viewer(
    data_dir=Path('data/raw/train'),
    labels_csv=Path('data/train.csv')
)
```

Use this to better understand the structure of the signals and the class distribution.
```python
from extract_features import process_dataset

process_dataset('inference')
```

This will create `data/inference.csv` using the same logic as for train/test.

Note: Unlike `train.csv` and `test.csv`, this file does not contain labels.
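Before running the model, it can help to confirm that the generated features line up with the training schema. A small sketch (the label column name `"label"` is an assumption; check the actual CSV header):

```python
import pandas as pd

def check_schema(train_cols, inf_cols, label_col="label"):
    """Compare feature columns, ignoring the (assumed) label column."""
    expected = [c for c in train_cols if c != label_col]
    missing = [c for c in expected if c not in inf_cols]   # absent from inference
    extra = [c for c in inf_cols if c not in expected]     # unexpected columns
    return missing, extra

# Example with made-up column names:
missing, extra = check_schema(["acc_mean", "acc_std", "label"],
                              ["acc_mean", "acc_std"])
print(missing, extra)  # → [] []
```

With the real data, `train_cols` and `inf_cols` would come from `pd.read_csv("data/train.csv").columns` and `pd.read_csv("data/inference.csv").columns`.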
```python
import joblib
import pandas as pd

from imu_pipeline import IMUPipeline

# Ground truth for the inference set, from manual annotation
labels = pd.read_csv("data/manual_annotation/inference_labels.csv")

# Features generated from the raw inference signals
df_inf = pd.read_csv("data/inference.csv")

# Load the pre-trained pipeline and run inference
model = joblib.load("models/imu_pipeline.pkl")
preds = model.predict(df_inf)
probs = model.predict_proba(df_inf)
```

- Evaluate performance on the inference set using standard metrics (accuracy, precision, recall, F1).
- Compare against performance on the test set (`test.csv`).
- Perform EDA (exploratory data analysis) on both the test and inference sets.
- Identify and explain the performance discrepancy, and pinpoint its root cause.
- Evaluate the model's predictions on `inference.csv` using `inference_labels.csv`.
- Compare the results to performance on `test.csv`.
- Perform EDA to understand dataset differences.
- Suggest an immediate workaround.
- Propose a long-term fix.
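The metric comparison above can be sketched with scikit-learn. This is a minimal example; with the real data, `y_true` would come from `inference_labels.csv` and `y_pred` from the loaded pipeline (the toy labels below are illustrative only):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {"accuracy": accuracy_score(y_true, y_pred),
            "precision": p, "recall": r, "f1": f1}

# Toy demo: 4 of 5 predictions are correct.
print(summarize([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))
```

Running `summarize` on both the test and inference sets gives a side-by-side view of the performance gap.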
- At the end of the `example.ipynb` notebook, you'll find a section titled "Questions to Reflect On". You are required to provide clear answers to all of these questions as part of your analysis.
In this task, you'll help define an annotation protocol for driver behavior near stop signs.
You're provided with dashcam video clips that were flagged as potentially involving stop sign interactions. Your job is to create a simple, effective labeling guide for remote annotators that maximizes label quality and training signal while minimizing ambiguity.
Reference file:
Near Stop Sign Behavior - Annotation Instructions.pdf
Constraints:
- Only video is available (no metadata or sensor data).
- You may assume annotators are remote and have limited context.
- A clear set of instructions for annotators.
- Assumptions or simplifications you made.
- A short explanation of how your labels support training an effective model.
Expected effort: ~1 hour.
See:
videos/
Please submit the following in your GitHub fork:
- `example.ipynb` with your full analysis for Part 1.
- A file `stop_sign_annotation_protocol.md` (or `.pdf`) for Part 2.
When ready, send the repository link to the recruiter.
Good luck! 🚦