Concurrency plans? #46
-
Can you tell me more about your use case? You will be fetching data from a remote API, storing it on your client with TinyFlux, and performing rolling calculations on the client as well? How often are you fetching from the remote API? If you aren't fetching at least once per second, it should not be a problem to perform all write and read operations synchronously with the TinyFlux API as it is.
-
OK, I think I know what you are doing. I'm not familiar with New Relic, but it looks like they have "intelligent" alert monitoring that should essentially give you alert thresholds that are dynamic in nature: https://newrelic.com/platform/alerts
Does this address your use case?
As for TinyFlux, you can use it in the manner you described. If you are fetching from New Relic, awaiting all requests, calculating thresholds, and then caching data every 5 minutes, you should not need to write to TinyFlux in an async manner. You are awaiting all entities first, right? Then you would just be writing one data point to TinyFlux: a Point with a key/value pair for each entity and a timestamp for that 5-minute interval.
How many entities per interval? Thousands?
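For concreteness, a minimal sketch of that single-Point-per-interval write, using TinyFlux's documented Point and insert API; the entity names and values here are illustrative placeholders:

```python
# One Point per 5-minute interval: one field per entity.
# Field names/values are illustrative; TinyFlux persists to a CSV file.
from datetime import datetime, timezone
from tinyflux import TinyFlux, Point

db = TinyFlux("thresholds.csv")
db.insert(
    Point(
        time=datetime(2023, 12, 29, 22, 20, tzinfo=timezone.utc),
        fields={"entity_a": 0.87, "entity_b": 1.23},  # one key/value pair per entity
    )
)
```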
On Fri, Dec 29, 2023 at 10:23 PM, devinnasar wrote:
I'm using the New Relic GraphQL API (Nerdgraph). At the 10,000-foot level, I'm estimating what alert thresholds should be for each of several thousand entities, based on historical time series data for each entity. The idea is to procedurally generate sane thresholds based on the performance of the actual infrastructure: if we were quiet for the last 3 months, thresholds should be low, but if we had an incident, our thresholds should increase. I'm not ready to graduate to provisioning a full TSDB right now. The alert threshold file artifacts I'm generating are used in Terraform that provisions Alert Conditions, and the cost of integrating something like full Influx is not affordable at present.
- For a given New Relic account, locate all entity types among entities possessing specific tags (entities which belong to an identified software component).
- For each entity type, run up to 5 golden metric queries.
- For each golden metric query:
  - store the time series data,
  - create a pandas DataFrame with all new and previously acquired time series data,
  - calculate a critical and a warning threshold for the golden metric for that entity type.
- Write all golden metric thresholds for the entity type to a file.
- The result, for example, tells us what our thresholds should be for all lambdas belonging to component X.
- Doing this in a reasonable time frame involves asynchronously querying Nerdgraph for the entity data in parallel. Nerdgraph also has several situations where it asks you to 'call it back later' for long-running queries that run server side; asynchronously checking for these queries to be finished is a requirement. I'm using asyncio for this.
- When I retrieve time series data for an entity, I need to store it and move on. Each of my coroutines needs to be able to do this in a non-blocking way.
- The goal is to gather the data for all entities in parallel before performing the alert threshold estimation with pandas.
- Once estimation has been run on the time series, a file is written for each entity containing the calculated thresholds. Ideally this would be done asynchronously as well.
- I'd like my process to run at least once per 24 hours, collecting data for all several thousand entities and adding time series data to the previously collected data. The idea is that as entities manifest in New Relic, we start collecting data for them and create a rolling estimate of the alert threshold. We need to run the process frequently because New Relic calculates aggregate functions over a limited number of 'buckets' in the query period. Querying over too long a time range (for example, a week) will cause the alert threshold estimation to have very poor resolution (we want to stay within 5-minute windows).
- We will also be dropping time series data older than 3 months.
Currently, in other async projects, I'm using aiotinydb so that all of my coroutines can access the data store without blocking each other. The estimation process I've described doesn't attempt to cache data; it asks Nerdgraph for data across a window of 3 months into the past, which is skewing my data. I'm looking at TinyFlux as a way to go from querying a large window infrequently with no caching to querying a small window routinely and caching, as a way to grow the time series to the required 3-month period.
Thanks for taking the time to read this. Any advice you might have would be appreciated.
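A minimal sketch of that cache-and-prune cycle, assuming TinyFlux's documented insert_multiple/TimeQuery/remove API; the tag and field names are illustrative:

```python
# Each daily run: insert the newly fetched window, then drop points
# older than 3 months. The fetch itself is a placeholder.
from datetime import datetime, timedelta, timezone
from tinyflux import TinyFlux, Point, TimeQuery

db = TinyFlux("golden_metrics.csv")

def cache_run(new_points: list) -> None:
    db.insert_multiple(new_points)                  # add today's small window
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)
    db.remove(TimeQuery() < cutoff)                 # roll the 3-month window

cache_run([
    Point(
        time=datetime.now(timezone.utc),
        tags={"entity_type": "lambda", "metric": "duration"},
        fields={"value": 123.4},
    )
])
```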
-
You could use a worker thread with a shared queue to do this, I believe: each of your async queries fetches the data and then writes to a shared Python Queue, and your worker thread batches the writes to TinyFlux:
https://github.com/citrusvanilla/tinyflux/blob/master/examples/3_iot_datastore_with_mqtt.py
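A minimal sketch of that pattern (distinct from the linked MQTT example); the queue, batch size, and fetch coroutine are illustrative placeholders:

```python
# Coroutines enqueue Points; a single worker thread owns the TinyFlux
# handle and batches the writes.
import asyncio
import queue
import threading
from datetime import datetime, timezone

from tinyflux import TinyFlux, Point

write_queue: "queue.Queue" = queue.Queue()
STOP = object()  # sentinel telling the worker to shut down

def writer_worker(db_path: str, batch_size: int = 500) -> None:
    """Drain the shared queue and batch writes into TinyFlux."""
    db = TinyFlux(db_path)  # created and used only in this thread
    batch = []
    while True:
        item = write_queue.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            db.insert_multiple(batch)
            batch.clear()
    if batch:  # flush whatever is left on shutdown
        db.insert_multiple(batch)

async def fetch_and_enqueue(entity_id: str) -> None:
    """Placeholder for an async Nerdgraph fetch; put() never blocks the loop."""
    value = 42.0  # stand-in for a fetched metric value
    write_queue.put(Point(
        time=datetime.now(timezone.utc),
        tags={"entity": entity_id},
        fields={"value": value},
    ))

async def main() -> None:
    worker = threading.Thread(target=writer_worker, args=("metrics.csv",))
    worker.start()
    await asyncio.gather(*(fetch_and_enqueue(f"entity-{i}") for i in range(10)))
    write_queue.put(STOP)
    worker.join()

asyncio.run(main())
```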
Though if you really have 1000 * 5 * 300 = 1,500,000 data points every 5 minutes, and each key/value pair is 20 bytes, you're looking at 30 MB every 5 minutes. Multiply that by 12 for each hour, then by 24 hours, then by 90 days, and you have about 750 GB of data. That is entirely too much for TinyFlux. Even if you do this once per day, it's 30 MB * 90 days = 2.7 GB, which is at the very limit of what TinyFlux is capable of.
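Spelled out as a quick back-of-the-envelope check, using the numbers above:

```python
# Sizing estimate from the comment above; all inputs are its assumptions.
points = 1000 * 5 * 300            # entities * queries * points per batch
mb_per_batch = points * 20 / 1e6   # 20 bytes per key/value pair -> ~30 MB
print(mb_per_batch * 12 * 24 * 90 / 1e3)  # every 5 min for 90 days: ~778 GB
print(mb_per_batch * 90 / 1e3)            # once per day for 90 days: ~2.7 GB
```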
On Sat, Dec 30, 2023 at 3:46 PM, devinnasar wrote:
To answer your last question: it's thousands of entities * 4 or 5 queries * 344 data points per query, on each run of the program. The goal is to run the program once per day to get a resolution of 5-minute aggregation windows (25 hrs / 344 windows). Each data point needs to be cached in TinyFlux.
-
Okay, well, whatever you're up to, I'm not here to architect your pipeline for you. I won't get around to making TinyFlux writes async under the hood anytime soon, but you can make writes async yourself with a wrapper. I'll add this as a future feature.
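A minimal sketch of such a wrapper, assuming a single process owns the database file; `insert_async` and the lock are illustrative, not part of TinyFlux:

```python
# Offload TinyFlux's synchronous insert to a thread so coroutines
# don't block the event loop. insert_async is a made-up helper name.
import asyncio
from datetime import datetime, timezone

from tinyflux import TinyFlux, Point

db = TinyFlux("cache.csv")
_write_lock = asyncio.Lock()  # serialize coroutines sharing one handle

async def insert_async(point: Point) -> None:
    async with _write_lock:
        await asyncio.to_thread(db.insert, point)

async def main() -> None:
    await asyncio.gather(*(
        insert_async(Point(
            time=datetime.now(timezone.utc),
            tags={"entity": f"entity-{i}"},
            fields={"value": float(i)},
        ))
        for i in range(5)
    ))

asyncio.run(main())
```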
On Sun, Dec 31, 2023 at 12:24 AM, devinnasar wrote:
You're mistaken about the 'every 5 minutes' part. I'm not saying I'm running this process every 5 minutes; I'm saying I'm running it once per 24 hours. The maximum number of aggregation windows in a New Relic query is 344, meaning that if I query a time range starting at 00:00 and ending at 23:59, that gives me 344 windows of roughly 4.1 minutes each, rounded up to windows of 5 minutes. That means I'll have 344 data points coming in for each entity type and golden metric every 24 hours.
-
Hello,
I'm curious whether the project will add concurrency features similar to this project: https://github.com/aiotinydb/aiotinydb. I'm interested in using TinyFlux for caching large amounts of time series data from a remote API and performing rolling calculations over that data. My original plan was to use aiotinydb, but the article explaining how large databases can increase write times concerned me.
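For context, the read path I have in mind, sketched against TinyFlux's documented search/TimeQuery API; the tag and field names are illustrative:

```python
# Pull cached points into pandas and compute a rolling statistic.
# Assumes the cache already holds points with a "value" field.
from datetime import datetime, timedelta, timezone

import pandas as pd
from tinyflux import TinyFlux, TimeQuery

db = TinyFlux("cache.csv")
cutoff = datetime.now(timezone.utc) - timedelta(days=90)
points = db.search(TimeQuery() >= cutoff)  # last 3 months of cached data

df = pd.DataFrame(
    [{"time": p.time, "value": p.fields["value"]} for p in points]
).set_index("time").sort_index()

# e.g. a rolling 7-day 95th percentile as a threshold candidate
threshold = df["value"].rolling("7D").quantile(0.95)
```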