mtliba/ATSal: 360 video Head and Eye movement prediction framework with two-stream models

ATSal: An Attention Based Architecture for Saliency Prediction in 360° Videos

Abstract:

The spherical domain representation of 360° video/images presents many challenges related to the storage, processing, transmission, and rendering of omnidirectional videos (ODV). Models of human visual attention can be used so that only a single viewport is rendered at a time, which is important when developing systems that allow users to explore ODV with head-mounted displays (HMD). Accordingly, researchers have proposed various saliency models for 360° videos/images. This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos. The attention mechanism explicitly encodes global static visual attention, allowing expert models to focus on learning the saliency on local patches throughout consecutive frames. We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking. Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.

Find the extended pre-print version of our work on arXiv.

Head and Eye movement prediction in omnidirectional video:

This is the task of modeling the distribution of human gaze fixations on static and dynamic omnidirectional scenes. The predicted saliency map is a heatmap of probabilities, where each probability corresponds to how likely the corresponding pixel is to attract human attention. It can therefore be used to prioritize information across space and time in videos, which is beneficial in a variety of computer vision applications including image and video compression, image segmentation, object recognition, etc.
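
As a concrete illustration (not code from this repository), the sketch below shows how discrete gaze fixations can be turned into such a probability heatmap; the frame resolution and the Gaussian width are assumed values chosen for the example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixations, height=480, width=960, sigma=20.0):
    """Turn a list of (row, col) gaze fixations into a probability heatmap."""
    fixation_map = np.zeros((height, width), dtype=np.float64)
    for r, c in fixations:
        fixation_map[int(r) % height, int(c) % width] += 1.0
    # Smooth the discrete fixations into a continuous saliency map.
    saliency = gaussian_filter(fixation_map, sigma=sigma)
    # Normalize so the map sums to 1, i.e. a probability distribution over pixels.
    return saliency / (saliency.sum() + 1e-8)

heatmap = fixations_to_saliency([(240, 480), (250, 500), (100, 700)])
print(heatmap.shape, round(heatmap.sum(), 6))  # (480, 960) ~1.0
```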

Model Architecture:

[Figure: ATSal model architecture]
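
The sketch below is only a schematic of the two-stream idea, not the actual ATSal implementation: a global attention stream and a local expert stream each produce a saliency map, and the two are fused (here with an illustrative pixel-wise product). The placeholder convolutions stand in for the real backbones.

```python
import torch
import torch.nn as nn

class TwoStreamSketch(nn.Module):
    """Schematic two-stream saliency model: global attention x local experts."""
    def __init__(self):
        super().__init__()
        # Placeholder backbones; the real streams are much deeper, and the
        # expert stream works on viewport patches rather than the full frame.
        self.attention_stream = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())
        self.expert_stream = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, frame):
        global_attention = self.attention_stream(frame)  # static global prior
        local_saliency = self.expert_stream(frame)       # dynamic local saliency
        return global_attention * local_saliency         # illustrative fusion

model = TwoStreamSketch()
print(model(torch.rand(1, 3, 240, 480)).shape)  # torch.Size([1, 1, 240, 480])
```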

Model Parameters:

ATSal attention model initialization:

ATSal attention model trained on the Salient360! and Sitzmann image datasets:

ATSal attention model trained on the Salient360! and VR-EyeTracking video datasets:

ATSal expert models trained on the Salient360! and VR-EyeTracking video datasets:
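
A minimal sketch of loading one of these checkpoints with PyTorch; the file name and the placeholder module below are assumptions, so swap in the repository's model class and the path of the downloaded weights.

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the repository's attention model;
# replace it with the actual model class before loading the released weights.
attention_model = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())

device = "cuda" if torch.cuda.is_available() else "cpu"
state = torch.load("attention_model.pt", map_location=device)  # assumed file name
attention_model.load_state_dict(state)
attention_model.to(device).eval()
```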

DATASETS:

Saliency prediction studies on 360° images are still limited. The absence of common head- and eye-gaze datasets for 360° content, and the difficulty of reproducing them compared with publicly available 2D stimuli datasets, is one of the reasons that has hindered progress in the development of computational saliency models on this front. We therefore provide a reproduced version of the VR-EyeTracking dataset with 215 videos, and an augmented version of the Sitzmann_TVCG_VR dataset with 440 images.
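
A minimal sketch of a PyTorch dataset for such frame/saliency-map pairs; the directory layout (frames/ and maps/ subfolders with matching file names) and the resolution are assumptions for illustration, not the repository's actual loader.

```python
import os
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T

class OdvSaliencyDataset(Dataset):
    """Pairs equirectangular frames with their ground-truth saliency maps."""
    def __init__(self, root, size=(240, 480)):
        self.frame_dir = os.path.join(root, "frames")
        self.map_dir = os.path.join(root, "maps")
        self.names = sorted(os.listdir(self.frame_dir))
        self.to_tensor = T.Compose([T.Resize(size), T.ToTensor()])

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        frame = Image.open(os.path.join(self.frame_dir, name)).convert("RGB")
        gt = Image.open(os.path.join(self.map_dir, name)).convert("L")
        return self.to_tensor(frame), self.to_tensor(gt)
```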

COMPARATIVE PERFORMANCE STUDY ON THE SALIENT360! AND VR-EYETRACKING DATASETS:

[Figure: comparative results on Salient360! and VR-EyeTracking]
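
For reference, comparisons like this rely on standard saliency metrics; the sketch below shows two common ones (CC and NSS) as they are typically defined, not the exact evaluation code used to produce these results.

```python
import numpy as np

def cc(pred, gt):
    """Linear correlation coefficient between predicted and ground-truth maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def nss(pred, fixation_map):
    """Mean normalized saliency value at human fixation locations."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixation_map > 0].mean())
```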

Test model:

To test a pre-trained model on video data and produce saliency maps, execute the following commands:

cd test/weight
bash weight.sh

cd ..
python test.py -'path to your video dataset' -'output path'
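
For orientation, here is a hedged sketch of the kind of per-frame inference loop such a test script performs; the paths, the resolution, and the `model` variable are placeholders rather than the script's exact interface.

```python
import os
import cv2
import numpy as np
import torch

@torch.no_grad()
def predict_video(model, video_path, output_dir, size=(480, 240)):
    """Run the model on every frame and write one saliency map per frame."""
    os.makedirs(output_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x = cv2.resize(frame, size).astype(np.float32) / 255.0
        x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)  # (1, 3, H, W)
        sal = model(x).squeeze().cpu().numpy()
        sal = (255 * sal / (sal.max() + 1e-8)).astype(np.uint8)
        cv2.imwrite(os.path.join(output_dir, f"{idx:05d}.png"), sal)
        idx += 1
    cap.release()
```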

Demo:

Here we provide a comparison between the attention stream, the expert stream, and our final ATSal model. The attention stream overestimates the salient area because it predicts static global attention. The expert models predict dynamic saliency on each viewport independently, based on its content and location, but still introduce artifacts at viewport boundaries and ignore the global attention statistics. The fusion of both streams in the ATSal model is better at capturing the distribution of salient information over space and time.
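
As a simple illustration of this fusion step (with assumed file names and an illustrative pixel-wise product, not the repository's exact fusion code), two saved maps can be combined like this:

```python
import cv2
import numpy as np

# Assumed file names: one map per stream, saved as 8-bit grayscale images.
attention = cv2.imread("attention_map.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
expert = cv2.imread("expert_map.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

fused = attention * expert            # global prior modulates the local expert map
fused = fused / (fused.max() + 1e-8)  # rescale for visualization
cv2.imwrite("atsal_map.png", (255 * fused).astype(np.uint8))
```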

Contact:

For questions, bug reports, and suggestions about this work, please create an issue in this repository or send an email to mtliba@inttic.dz.
