feat: Support retrieval from multiple feature views with different join keys #2835

Merged: 5 commits, Jun 30, 2022
group by join keys instead of feature view
Signed-off-by: Yongheng Lin <yongheng.lin@gmail.com>
yongheng committed Jun 24, 2022
commit b8910eaab0d12886b8af9d40188d6cf94cd01b96
@@ -278,20 +278,25 @@ private List<List<feast.storage.api.retriever.Feature>> retrieveFeatures(
       features.add(featuresPerEntity);
     }

-    // Group feature references by feature view.
-    Map<String, List<FeatureReferenceV2>> featureViewNameToFeatureReferencesMap =
+    // Group feature references by join keys.
+    Map<String, List<FeatureReferenceV2>> groupNameToFeatureReferencesMap =
         featureReferences.stream()
Collaborator

@pyalex pyalex Jun 22, 2022

To speed up this part, we might want to extract the distinct feature views from all feature references and then group the feature views instead.
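
One reading of this suggestion, sketched under stated assumptions: resolve the group key once per distinct feature view, then assign each feature reference to its view's key. The class name, the "featureView:feature" string encoding, and the `joinKeyLookup` parameter are hypothetical stand-ins for FeatureReferenceV2 and the registry lookup; none of them come from the PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctViewGroupingSketch {
  // Hypothetical encoding: a feature reference is written as "featureView:feature".
  static String viewOf(String ref) {
    return ref.substring(0, ref.indexOf(':'));
  }

  // joinKeyLookup is an assumed stand-in for the registry call that maps a
  // feature view name to its group key (for example, its sorted join keys).
  static Map<String, List<String>> group(
      List<String> refs, Function<String, String> joinKeyLookup) {
    // Call the lookup once per distinct feature view, not once per reference.
    Map<String, String> keyByView =
        refs.stream()
            .map(DistinctViewGroupingSketch::viewOf)
            .distinct()
            .collect(Collectors.toMap(view -> view, joinKeyLookup));

    // Group the references by the key already resolved for their view.
    return refs.stream()
        .collect(
            Collectors.groupingBy(
                ref -> keyByView.get(viewOf(ref)), LinkedHashMap::new, Collectors.toList()));
  }

  public static void main(String[] args) {
    List<String> refs =
        List.of("driver_hourly:conv_rate", "driver_hourly:acc_rate", "customer_daily:spend");
    System.out.println(group(refs, view -> view.startsWith("driver") ? "driver_id" : "customer_id"));
  }
}
```

With many references per feature view, this keeps the number of lookups proportional to the number of views rather than the number of references.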

Contributor Author

@yongheng yongheng Jun 23, 2022

IIUC, grouping by join keys results in the same number of groups or fewer (and is therefore equally or more efficient) than grouping by feature view. This is because different feature views can have the same join keys. In L286, this.registryRepository.getEntitiesList(featureReference) internally gets the feature view spec first, then gets the entity names of that spec, and then we find the join keys for those entity names.

Actually, I grouped by feature view at the beginning. Then I switched to grouping by join keys in the second commit of this PR, as an optimization.
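
To illustrate the claim about group counts, here is a minimal sketch that groups the same feature references both ways. The registry contents, the feature view names, and the "featureView:feature" string encoding are hypothetical; only the key shape (sorted join keys joined with ",") follows the diff.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class JoinKeyGroupingSketch {
  // Hypothetical registry contents: feature view name -> join keys of its entities.
  static final Map<String, List<String>> JOIN_KEYS_BY_VIEW =
      Map.of(
          "driver_hourly_stats", List.of("driver_id"),
          "driver_daily_stats", List.of("driver_id"),
          "trip_stats", List.of("driver_id", "customer_id"));

  // Hypothetical feature references written as "featureView:feature".
  static final List<String> SAMPLE_REFS =
      List.of(
          "driver_hourly_stats:conv_rate",
          "driver_daily_stats:acc_rate",
          "trip_stats:distance",
          "driver_hourly_stats:avg_daily_trips");

  static String viewOf(String ref) {
    return ref.substring(0, ref.indexOf(':'));
  }

  // Same key shape as the diff: sorted join keys joined with ",".
  static String groupKey(String ref) {
    return JOIN_KEYS_BY_VIEW.get(viewOf(ref)).stream()
        .sorted()
        .collect(Collectors.joining(","));
  }

  static Map<String, List<String>> groupByFeatureView(List<String> refs) {
    return refs.stream()
        .collect(
            Collectors.groupingBy(
                JoinKeyGroupingSketch::viewOf, TreeMap::new, Collectors.toList()));
  }

  static Map<String, List<String>> groupByJoinKeys(List<String> refs) {
    return refs.stream()
        .collect(
            Collectors.groupingBy(
                JoinKeyGroupingSketch::groupKey, TreeMap::new, Collectors.toList()));
  }

  public static void main(String[] args) {
    System.out.println("by feature view: " + groupByFeatureView(SAMPLE_REFS).keySet());
    System.out.println("by join keys:    " + groupByJoinKeys(SAMPLE_REFS).keySet());
  }
}
```

With this sample data, grouping by feature view yields three groups, while grouping by join keys yields two, because both driver feature views share the single join key driver_id. Each group corresponds to one retriever call, so fewer groups means fewer calls.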

-            .collect(Collectors.groupingBy(FeatureReferenceV2::getFeatureViewName));
-
-    // Retrieve features one feature view at a time.
-    for (List<FeatureReferenceV2> featureReferencesPerFeatureView :
-        featureViewNameToFeatureReferencesMap.values()) {
+            .collect(
+                Collectors.groupingBy(
+                    featureReference ->
+                        this.registryRepository.getEntitiesList(featureReference).stream()
+                            .map(this.registryRepository::getEntityJoinKey)
+                            .sorted()
+                            .collect(Collectors.joining(","))));
+
+    // Retrieve features one group at a time.
+    for (List<FeatureReferenceV2> featureReferencesPerGroup :
+        groupNameToFeatureReferencesMap.values()) {
       List<String> entityNames =
-          this.registryRepository.getEntitiesList(featureReferencesPerFeatureView.get(0));
-      List<Map<String, ValueProto.Value>> entityRowsPerFeatureView =
-          new ArrayList<>(entityRows.size());
+          this.registryRepository.getEntitiesList(featureReferencesPerGroup.get(0));
+      List<Map<String, ValueProto.Value>> entityRowsPerGroup = new ArrayList<>(entityRows.size());
       for (Map<String, ValueProto.Value> entityRow : entityRows) {
-        Map<String, ValueProto.Value> entityRowPerFeatureView =
+        Map<String, ValueProto.Value> entityRowPerGroup =
             entityNames.stream()
                 .map(this.registryRepository::getEntityJoinKey)
                 .collect(
@@ -303,15 +308,14 @@ private List<List<feast.storage.api.retriever.Feature>> retrieveFeatures(
                           }
                           return entityRow.get(joinKey);
                         }));
-        entityRowsPerFeatureView.add(entityRowPerFeatureView);
+        entityRowsPerGroup.add(entityRowPerGroup);
       }
-      List<List<feast.storage.api.retriever.Feature>> featuresPerFeatureView =
-          retriever.getOnlineFeatures(
-              entityRowsPerFeatureView, featureReferencesPerFeatureView, entityNames);
-      for (int i = 0; i < featuresPerFeatureView.size(); i++) {
-        for (int j = 0; j < featureReferencesPerFeatureView.size(); j++) {
-          int k = featureReferenceToIndexMap.get(featureReferencesPerFeatureView.get(j));
-          features.get(i).set(k, featuresPerFeatureView.get(i).get(j));
+      List<List<feast.storage.api.retriever.Feature>> featuresPerGroup =
+          retriever.getOnlineFeatures(entityRowsPerGroup, featureReferencesPerGroup, entityNames);
+      for (int i = 0; i < featuresPerGroup.size(); i++) {
+        for (int j = 0; j < featureReferencesPerGroup.size(); j++) {
+          int k = featureReferenceToIndexMap.get(featureReferencesPerGroup.get(j));
+          features.get(i).set(k, featuresPerGroup.get(i).get(j));
         }
       }
     }
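
The final loops in this hunk write each group's results back into the caller's original feature order through featureReferenceToIndexMap. A minimal standalone sketch of that scatter step follows, assuming plain strings in place of FeatureReferenceV2 and Feature; `fakeGetOnlineFeatures` is a hypothetical stand-in for retriever.getOnlineFeatures, not the real retriever.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScatterByIndexSketch {
  // Hypothetical stand-in for the retriever: one value per (entity row, feature reference),
  // with columns in the same order as the group's references.
  static List<List<String>> fakeGetOnlineFeatures(int rowCount, List<String> groupRefs) {
    List<List<String>> rows = new ArrayList<>(rowCount);
    for (int i = 0; i < rowCount; i++) {
      List<String> row = new ArrayList<>(groupRefs.size());
      for (String ref : groupRefs) {
        row.add(ref + "@row" + i);
      }
      rows.add(row);
    }
    return rows;
  }

  static List<List<String>> retrieve(List<String> requested, List<List<String>> groups, int rowCount) {
    // Remember where each reference sits in the original request.
    Map<String, Integer> refToIndex = new HashMap<>();
    for (int k = 0; k < requested.size(); k++) {
      refToIndex.put(requested.get(k), k);
    }

    // Pre-size every output row so set(k, ...) can be called in any order.
    List<List<String>> features = new ArrayList<>(rowCount);
    for (int i = 0; i < rowCount; i++) {
      features.add(new ArrayList<>(Collections.<String>nCopies(requested.size(), null)));
    }

    // Fetch one group at a time and scatter its columns back to the request order.
    for (List<String> groupRefs : groups) {
      List<List<String>> perGroup = fakeGetOnlineFeatures(rowCount, groupRefs);
      for (int i = 0; i < perGroup.size(); i++) {
        for (int j = 0; j < groupRefs.size(); j++) {
          int k = refToIndex.get(groupRefs.get(j));
          features.get(i).set(k, perGroup.get(i).get(j));
        }
      }
    }
    return features;
  }

  public static void main(String[] args) {
    List<String> requested = List.of("a:x", "b:y", "a:z");
    List<List<String>> groups = List.of(List.of("a:x", "a:z"), List.of("b:y"));
    System.out.println(retrieve(requested, groups, 2));
  }
}
```

Because the output rows are pre-sized, the order in which groups are fetched does not affect the order of the returned features.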