Skip to content

Commit

Permalink
Actually ensure deterministic order of map keys in multimap_from_entr…
Browse files Browse the repository at this point in the history
…ies Presto function (#8407)

Summary:
Pull Request resolved: #8407

Previous attempt #8338 didn't work as std::unordered_map doesn't preserve the order of keys: https://stackoverflow.com/questions/35053544/keep-the-order-of-unordered-map-as-we-insert-a-new-key

Here, we use F14FastMap + std::vector to ensure the order of keys.

Fixes #8396

Reviewed By: amitkdutta

Differential Revision: D52826018

fbshipit-source-id: 5e71b78de48d1f741a1be3aa0c0a752214b2edeb
  • Loading branch information
mbasmanova authored and facebook-github-bot committed Jan 17, 2024
1 parent 82bcd39 commit eb09be0
Showing 1 changed file with 20 additions and 9 deletions.
29 changes: 20 additions & 9 deletions velox/functions/prestosql/MultimapFromEntries.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,10 +28,13 @@ struct MultimapFromEntriesFunction {
FOLLY_ALWAYS_INLINE void call(
out_type<Map<Generic<T1>, Array<Generic<T2>>>>& out,
const arg_type<Array<Row<Generic<T1>, Generic<T2>>>>& inputArray) {
// Reuse map between rows to avoid re-allocating memory. The benchmark shows
// 20-30% performance improvement.
// Reuse map and vector between rows to avoid re-allocating memory. The
// benchmark shows 20-30% performance improvement.
keyValuesMap_.clear();

uniqueKeys_.clear();
uniqueKeys_.reserve(inputArray.size());

for (const auto& entry : inputArray.skipNulls()) {
const auto& key = entry.template at<0>();
const auto& value = entry.template at<1>();
Expand All @@ -40,28 +43,36 @@ struct MultimapFromEntriesFunction {

auto result = keyValuesMap_.insert({key.value(), {}});
result.first->second.push_back(value);
if (result.second) {
uniqueKeys_.push_back(key.value());
}
}

for (const auto& [key, values] : keyValuesMap_) {
for (const auto& key : uniqueKeys_) {
auto [keyWriter, valueWriter] = out.add_item();
keyWriter.copy_from(key);

const auto& values = keyValuesMap_[key];

for (const auto& value : values) {
valueWriter.push_back(value);
}
}
}

private:
// Use std::unordered_map to ensure deterministic order of keys in the
// result. Without ensuring deterministic order of keys, the results of
// expressions like map_keys(multimap_from_entries(...)) will be
// non-deterministic and trigger Fuzzer failures. F14Map is faster, but in
// debug builds it returns keys in non-deterministic order (on purpose).
std::unordered_map<
folly::F14FastMap<
exec::GenericView,
std::vector<std::optional<exec::GenericView>>>
keyValuesMap_;

// List of unique keys in the same order as they appear in inputArray.
// Used to ensure deterministic order of keys in the result. Without ensuring
// deterministic order of keys, the results of expressions like
// map_keys(multimap_from_entries(...)) will be non-deterministic and trigger
// Fuzzer failures. F14FastMap in debug build returns keys in
// non-deterministic order (on purpose).
std::vector<exec::GenericView> uniqueKeys_;
};

} // namespace facebook::velox::functions

0 comments on commit eb09be0

Please sign in to comment.