Gaze direction plotting #1100

Open
lenamatyjek opened this issue Nov 5, 2024 · 2 comments
lenamatyjek commented Nov 5, 2024

Hi! I am trying to figure out whether my participants are looking at the screen or not. For this, I will need to manually decide where the screen actually is (the camera may be above/below it, etc.), but I first need to understand where they are looking. However, I am having issues plotting gaze direction on video frames. I want to reproduce the green rays seen in the OpenFace output videos, but I think I am missing a piece of information about the scale of the gaze direction (gaze_0/1_x/y/z).

I understand that it is in world coordinates (= camera coordinates) relative to the centre of the screen (= centre of the camera) on a [-1 1] scale, so I need to multiply it by some scaling factor. Yet, however I do it, the gaze plotted on the video frames does not seem to correspond to where the participant is looking.
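
One quick check of the scale assumption (using the standard OpenFace CSV columns): if gaze_0_x/y/z is a normalized 3D direction vector rather than something bounded by the frame, its length should be ~1 in every frame:

# Sanity check: length of the gaze direction vector per frame
gaze_norm <- with(video_df, sqrt(gaze_0_x^2 + gaze_0_y^2 + gaze_0_z^2))
summary(gaze_norm)  # values very close to 1 would indicate a unit direction vector, not pixel-like coordinates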

To Reproduce
Here is what I do:

  • plot the video frame
  • extract frame width and height
  • rescale gaze data to pixels*
  • plot the location of the eyes
  • plot the vector of the eye gaze (origin: location of the eyes, end: gaze coordinates)
  • *The landmarks are in pixels, but the gaze direction is in world coordinates [-1 1] relative to the position of the eye. I tried two scenarios to rescale them to a common scale. In scenario 1, I add 1 ([0 2]), divide by 2 ([0 1]), and multiply by the dimensions of the frame (image width for x, image height for y). This assumes that the [-1 1] scale is equivalent to the size of the frame (i.e., the camera view), but it would also mean that the gaze can fall outside the frame (if the eyes are not perfectly in the middle). In scenario 2, I multiply the gaze by the middle point of the screen (i.e., the camera origin) and a scaling factor.

Code in R:

library(png)      # readPNG()
library(ggplot2)
library(grid)     # unit()

f = 1000
c = video_df[video_df$frame == f,]   # one frame of the OpenFace CSV output
videoframename = "my_path_to_the_video"
img <- readPNG(videoframename)

# Define a scaling factor for the arrow length
  scaling = 1
  scale_factor_0 <- -1*c$gaze_0_z*scaling # depth
  scale_factor_1 <- -1*c$gaze_1_z*scaling

# Get image dimensions
img_width <- dim(img)[2]
img_height <- dim(img)[1]
center_x <- img_width/2
center_y <- img_height/2


# Convert OpenFace data for this frame into a data frame
b <- data.frame(
  eye_lmk_x_21 = c$eye_lmk_x_21,
  eye_lmk_y_21 = c$eye_lmk_y_21,
  eye_lmk_x_49 = c$eye_lmk_x_49,
  eye_lmk_y_49 = c$eye_lmk_y_49,

# scenario 1: treat the [-1 1] range as spanning the frame and convert to pixel coordinates
  gaze_0_x = ((c$gaze_0_x+1)/2)*img_width,
  gaze_0_y = ((c$gaze_0_y+1)/2)*img_height,
  gaze_1_x = ((c$gaze_1_x+1)/2)*img_width,
  gaze_1_y = ((c$gaze_1_y+1)/2)*img_height

# scenario 2 (alternative; replaces the four lines above): scale by the image centre and a depth factor
#   gaze_0_x = c$gaze_0_x * center_x * scale_factor_0,
#   gaze_0_y = c$gaze_0_y * center_y * scale_factor_0,
#   gaze_1_x = c$gaze_1_x * center_x * scale_factor_1,
#   gaze_1_y = c$gaze_1_y * center_y * scale_factor_1
)


ggplot() +
  # Add image as background
  annotation_raster(as.raster(img), xmin = 0, xmax = img_width, ymin = 0, ymax = img_height) +
  # The eyes
  geom_point(aes(x = b$eye_lmk_x_21, y = img_height - b$eye_lmk_y_21), color = "yellow", size = 1) +
  geom_point(aes(x = b$eye_lmk_x_49, y = img_height - b$eye_lmk_y_49), color = "yellow", size = 1) +
  # Left eye direction
  geom_segment(
    data = b,
    aes(
      x = eye_lmk_x_21,
      y = img_height - eye_lmk_y_21,  # Flip y-coordinates
      xend = eye_lmk_x_21 + gaze_0_x,
      yend = img_height - eye_lmk_y_21 - gaze_0_y
    ),
    arrow = arrow(type = "closed", length = unit(0.2, "cm")),  # Closed arrow with line
    color = "red",
    size = 0.5
  ) +
# Right eye direction
  geom_segment(
    data = b,
    aes(
      x = eye_lmk_x_49,
      y = img_height - eye_lmk_y_49,  # Flip y-coordinates
      xend = eye_lmk_x_49 + gaze_1_x,
      yend = img_height - eye_lmk_y_49 - gaze_1_y
    ),
    arrow = arrow(type = "closed", length = unit(0.2, "cm")),  # Closed arrow with line
    color = "red",
    size = 0.5
  ) +
  coord_fixed(ratio = 1, expand = FALSE, xlim = c(0, img_width), ylim = c(0, img_height))

Expected behavior
I hope to reproduce the green gaze lines seen in the OpenFace output videos, but this is not working. For some frames it's ok; for others the gaze is completely wrong, i.e., pointing in a different direction than where the eyes in the video are clearly looking.

Screenshots
I cannot add a screenshot due to data protection and identifiable faces.

Desktop (please complete the following information):
Linux (can't check the version at the moment)

Any help would be appreciated! Thank you.

brmarkus commented Nov 5, 2024

Have you checked

void Visualizer::SetObservationGaze(const cv::Point3f& gaze_direction0, const cv::Point3f& gaze_direction1, const std::vector<cv::Point2f>& eye_landmarks2d, const std::vector<cv::Point3f>& eye_landmarks3d, double confidence)

for how the demo application is doing it?

lenamatyjek (Author) commented:

Thank you for that! I've used this code (translated to R), but it still doesn't give me the same results as the videos generated as OpenFace output. Most of the time the direction is similar but a little bit off, while in some frames it is completely wrong, even though it looks ok-ish in the OpenFace videos.

I created these two functions to plot a frame from a video (all frames are in video_ppt_folder) and the gaze direction from the OpenFace output (video_df). The idea is to plot the gaze and a circle (manually adjusted per participant in later steps) to decide when a participant is looking at the screen (i.e., the gaze end position is within the circle).

I include the code here in case someone can double-check this with their data:

# Requires dplyr (rowwise/mutate/c_across/case_when), png (readPNG) and my own find_frame() helper
calculate_screen_gaze <- function(df, threshold_x, video_ppt_folder, approx_screen_gaze_x, approx_screen_gaze_y, scale_factor = 500) {

    # Read one frame just to get the image dimensions
    g = 100
    videoframename = find_frame(g, video_ppt_folder)
    img <- readPNG(videoframename)
    img_width <- dim(img)[2]
    img_height <- dim(img)[1]
    cx = img_width/2
    cy = img_height/2
    
  df %>%
    rowwise() %>%
    mutate(
      # Calculate the average x and y positions for the left pupil
      pupil_left_x = mean(c_across(starts_with("eye_lmk_x_")[1:8]), na.rm = TRUE),
      pupil_left_y = mean(c_across(starts_with("eye_lmk_y_")[1:8]), na.rm = TRUE),
      pupil_right_x = mean(c_across(starts_with("eye_lmk_x_")[29:36]), na.rm = TRUE),
      pupil_right_y = mean(c_across(starts_with("eye_lmk_y_")[29:36]), na.rm = TRUE),
      
      # Define gaze directions and scale them
      gaze_left_x = gaze_0_x * scale_factor,
      gaze_left_y = gaze_0_y * scale_factor,
      gaze_right_x = gaze_1_x * scale_factor,
      gaze_right_y = gaze_1_y * scale_factor,
      
      gaze_end_left_x = pupil_left_x + gaze_left_x,
      gaze_end_left_y = pupil_left_y + gaze_left_y,
      gaze_end_right_x = pupil_right_x + gaze_right_x,
      gaze_end_right_y = pupil_right_y + gaze_right_y,
      
      mean_gaze_x = (gaze_end_left_x + gaze_end_right_x) / 2,
      mean_gaze_y = img_height - ((gaze_end_left_y + gaze_end_right_y) / 2),
      
      looking_at_screen = case_when(
        # TRUE when the mean gaze end point falls inside the square region around the approximate screen centre
        mean_gaze_x <= cx + approx_screen_gaze_x + threshold_x &
        mean_gaze_x >= cx + approx_screen_gaze_x - threshold_x &
        mean_gaze_y <= cy + approx_screen_gaze_y + threshold_x &
        mean_gaze_y >= cy + approx_screen_gaze_y - threshold_x ~ TRUE,
        TRUE ~ FALSE
      )
      
    ) %>%
    ungroup()
}


 # Requires ggplot2, ggforce (geom_circle), grid (unit), png (readPNG), and the find_frame() helper
 plot_gaze_direction <- function(video_df, f, video_ppt_folder, approx_screen_gaze_x, approx_screen_gaze_y, threshold_x, shift = 0) {
    
    df <- video_df[video_df$frame == f,]
    videoframename = find_frame(f,video_ppt_folder)
    img <- readPNG(videoframename)
    
    looking = ifelse(df$looking_at_screen == 1, "LOOKING AT SCREEN", "")

    p <- NULL  # stays NULL (and is returned as such) for low-confidence frames

    if (df$confidence > 0.7) {
    
    img_width <- dim(img)[2]
    img_height <- dim(img)[1]
    
    # Define parameters
    fx <- 500  # example focal length in x
    fy <- 500  # example focal length in y
    cx <- img_width/2  # camera center in x
    cy <- img_height/2  # camera center in y
    
     gaze_line_left <- data.frame(
      x = c(df$pupil_left_x, df$pupil_left_x + df$gaze_left_x),
      y = c(df$pupil_left_y, df$pupil_left_y + df$gaze_left_y)
    )
    
     gaze_line_right <- data.frame(
       x = c(df$pupil_right_x, df$pupil_right_x + df$gaze_right_x),
       y = c(df$pupil_right_y, df$pupil_right_y + df$gaze_right_y)
     )
    
   # Plot
    p = ggplot() +
      # Add image as background
      annotation_raster(as.raster(img), xmin = 0, xmax = img_width, ymin = 0, ymax = img_height) +
      # The eyes
      geom_point(aes(x = df$pupil_left_x, y = img_height - df$pupil_left_y - shift), color = "orange", size = 0.05) +
      geom_point(aes(x = df$pupil_right_x, y = img_height - df$pupil_right_y - shift), color = "orange", size = 0.05) +
       geom_point(aes(x = mean(c(df$pupil_right_x,df$pupil_left_x),na.rm=T),
                      y = img_height -  mean(c(df$pupil_right_y,df$pupil_left_y),na.rm=T) - shift),
                  color = "yellow", size = 1) +
      # Left eye direction
      geom_segment(
        aes(
          x = df$pupil_left_x,
          y = img_height - df$pupil_left_y - shift,  # Flip y-coordinates
          xend = gaze_line_left$x[2],
          #xend = center_x - gaze_0_x,
          #yend = img_height - eye_lmk_y_21 - gaze_0_y
          yend = img_height - gaze_line_left$y[2] - shift
        ),
        arrow = arrow(type = "closed", length = unit(0.2, "cm")),  # Closed arrow with line
        color = "orange",
        size = 0.5
      ) +
      # Right eye direction
      geom_segment(
        aes(
          x = df$pupil_right_x,
          y = img_height - df$pupil_right_y - shift,  # Flip y-coordinates
          xend = gaze_line_right$x[2],
          #xend = center_x - gaze_1_x,
          #yend = img_height - eye_lmk_y_49 - gaze_1_y
          yend = img_height - gaze_line_right$y[2] - shift
        ),
        arrow = arrow(type = "closed", length = unit(0.2, "cm")),  # Closed arrow with line
        color = "orange",
        size = 0.5
      ) +
      # Mean between eyes direction
      geom_segment(
        aes(
          x = mean(c(df$pupil_right_x,df$pupil_left_x)),
          y = img_height - mean(c(df$pupil_right_y,df$pupil_left_y)) - shift,  # Flip y-coordinates
          xend = df$mean_gaze_x,
          #xend = center_x - gaze_1_x,
          #yend = img_height - eye_lmk_y_49 - gaze_1_y
          yend = df$mean_gaze_y - shift
        ),
        arrow = arrow(type = "closed", length = unit(0.2, "cm")),  # Closed arrow with line
        color = "yellow",
        size = 0.5
      ) +
      annotate("text", x= 200, y = 650, label = looking, colour = "red", size = 5) + # looking at screen info
      annotate("text", x= 200, y = 950, label = smile, colour = "red", size = 5) + # smile info
      annotate("text", x= 200, y = 800, label = blink, colour = "red", size = 5) + # blink info
      geom_circle(aes(x0 = cx + approx_screen_gaze_x,
                      y0 = cy + approx_screen_gaze_y,
                      r = threshold_x),
                      color = "lightblue", fill = NA) + # area considered the "screen" (geom_circle is from ggforce)
      geom_point(aes(x = cx + approx_screen_gaze_x,
                      y = cy + approx_screen_gaze_y), color = "lightblue", size = 3) + # middle screen
      coord_fixed(ratio = 1, expand = FALSE, xlim = c(0, img_width), ylim = c(0, img_height),clip = "on")
      p
    } 
    return(p)
  }
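
And this is roughly how I call the two functions; the offsets and threshold below are just placeholder values that I adjust per participant, and find_frame() is my own helper returning the path to a frame's PNG:

# Placeholder values; approx_screen_gaze_x/y and threshold_x are tuned per participant
video_df <- calculate_screen_gaze(video_df,
                                  threshold_x = 100,
                                  video_ppt_folder = video_ppt_folder,
                                  approx_screen_gaze_x = 0,
                                  approx_screen_gaze_y = 0,
                                  scale_factor = 500)

plot_gaze_direction(video_df, f = 100,
                    video_ppt_folder = video_ppt_folder,
                    approx_screen_gaze_x = 0,
                    approx_screen_gaze_y = 0,
                    threshold_x = 100)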
