Skip to content

map_top_n returns wrong results if NaN appears in the input #22040

@rschlussel

Description

@rschlussel

This issue was discovered as part of an audit of all the comparison and ordering behaviors for NaN across Presto functions (related to #21936 and #21877).

While there are a lot of inconsistencies in how NaN are handled that need to be addressed, map_top_n can produce definite wrong results when NaN values shows up in the map.

According to the documentaion, map_top_n "Truncates map items. Keeps only the top N elements by value. n must be a non-negative integer"

In the presence of NaN values, NaN seems to "reset" the search for topn entries

 
select map_top_n(map(array['a', 'b', 'c'], array[nan(), 3, 2]),1);
  _col0               
---------
 {b=3.0} 
(1 row)

-- BUG! regardless of interpretation of NaN, 2 is always less than 3. So this result is definitely incorrect
 select map_top_n(map(array['a', 'b', 'c'], array[3, nan(), 2]),1);
  _col0               
---------
 {c=2.0} 
(1 row)

select map_top_n(map(array['a', 'b', 'c'], array[3, 2, nan()]),1);
  _col0               
---------
 {c=NaN} 
(1 row)

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

Status

✅ Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions