Commit
Improve DFP period functionality to allow for better sampling and ignoring period (#912)

Currently, the DFP pipeline simulates time by breaking incoming messages up by a specific period and processing each period independently. This makes it impossible to process all of the incoming data at once for batch mode with a single trained model.

This PR adds a few things:

- If `DFPFileBatcherStage.period == None`, then all messages will be processed in a single batch, instead of per period
- Fixes how periods were handled to work with counts
  - Before, `"D"` would work as expected but `"5D"` would not. This was due to using `to_period`
- The `DFPFileBatcherStage.sampling_rate_s` property was deprecated in favor of a more general `sampling` property
  - This property can support different values
    - If it's a string, the value is interpreted as a frequency. The first row for each frequency will be taken
    - If it's a value in [0,1), it's a fraction. That proportion of the rows will be taken
    - If it's >=1, it's a count. That many rows will be taken at random

Authors:
- Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
- Devin Robison (https://github.com/drobison00)

URL: #912
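
The three interpretations of the `sampling` value described in the commit message can be sketched roughly as follows. This is a minimal illustration using pandas, not the code from the PR: the helper name `sample_rows`, the `timestamp_column` and `seed` parameters, and the example data are all assumptions made for the sketch.

```python
import pandas as pd


def sample_rows(df, sampling, timestamp_column="timestamp", seed=0):
    """Hypothetical helper illustrating the three kinds of `sampling` values."""
    if sampling is None:
        return df  # no sampling requested

    if isinstance(sampling, str):
        # Frequency string (e.g. "10min"): keep the first row in each time bucket.
        ordered = df.sort_values(timestamp_column)
        buckets = ordered[timestamp_column].dt.floor(sampling)
        return ordered[~buckets.duplicated(keep="first")]

    if sampling < 0:
        raise ValueError("sampling must be non-negative")

    if sampling < 1:
        # Fraction in [0, 1): keep that proportion of rows, chosen at random.
        return df.sample(frac=sampling, random_state=seed)

    # Count >= 1: keep that many randomly chosen rows (capped at the frame size).
    return df.sample(n=min(int(sampling), len(df)), random_state=seed)


# Assumed example data: one event per minute for 30 minutes.
events = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=30, freq="min"),
    "value": range(30),
})

by_frequency = sample_rows(events, "10min")  # first row of each 10-minute bucket
by_fraction = sample_rows(events, 0.5)       # half of the rows, at random
by_count = sample_rows(events, 5)            # five rows, at random
```

Checking for a string first keeps the numeric branches simple: anything below 1 is treated as a fraction and anything at or above 1 as a row count, matching the split described above.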
1 parent 31b6748, commit 446f452
Showing 2 changed files with 54 additions and 46 deletions.