Ingester flush queue is backing up again #254

Closed
@tomwilkie

Things we should do to help:

Noticed when deploying to prod. Slack logs:

[11:14 AM]  
tom hmm cortex in prod seems unhealthy: https://cloud.weave.works/admin/grafana/dashboard/file/cortex-chunks.json?panelId=6&fullscreen&from=1485746034561&to=1485861234561
huge flush queue

[11:14 AM]  
tom 6 chunks per series
massive backlog
we should have an alert for this...
in the meantime, I’m going to up the dynamodb capacity to see if that helps

[11:15 AM]  
jml do we have any way of measuring for hotspotting?

[11:16 AM]  
tom not really
turned off table manager for now
Okay our graphs aren’t wrong:
From amazon

[Screenshot: 2017-01-31 at 11:19:07]

[11:19 AM]  
tom nowhere near provisioned capacity
upped by 3x to 15k
this is the thing to reread: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions
> A single partition can support a maximum of 3,000 read capacity units or 1,000 write capacity units
since we’re seeing about 1k writes, I guess that’s it
> A single partition can hold approximately 10 GB of data
a week for us is about 50GB
so we’re at about 5 shards (5k write throughput, 50GB data)
and we’re not getting balance....
ridiculous
so this flush is clearly going to fail
that doesn’t mean we’ll lose data
as it’s replicated, and only one ingester is flushing
the deployment will stop after one ingester fails
to give us some time to figure this out
this has been going on since friday 27th
interesting

[Screenshot: 2017-01-31 at 11:33:16]
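
The partition arithmetic in the 11:19 message can be sketched as follows. This is only a rough estimate under stated assumptions: the per-partition limits are the ones quoted from the DynamoDB guidelines above, while the function names and the way the limits are combined (take the larger of the throughput-based and size-based counts) are illustrative, not something DynamoDB exposes.

```python
import math

# Per-partition limits as quoted from the DynamoDB guidelines in the chat.
MAX_WCU_PER_PARTITION = 1000
MAX_RCU_PER_PARTITION = 3000
MAX_GB_PER_PARTITION = 10


def estimate_partitions(write_units, read_units, size_gb):
    """Rough partition count: the larger of the throughput and size estimates.

    Hypothetical helper; DynamoDB does not report this number directly.
    """
    by_throughput = math.ceil(
        write_units / MAX_WCU_PER_PARTITION + read_units / MAX_RCU_PER_PARTITION
    )
    by_size = math.ceil(size_gb / MAX_GB_PER_PARTITION)
    return max(by_throughput, by_size, 1)


def per_partition_write_units(write_units, partitions):
    """Write capacity each partition gets if capacity is split evenly."""
    return write_units / partitions


# Numbers from the chat: 5k provisioned writes, about 50GB per week.
partitions = estimate_partitions(5000, 0, 50)
share = per_partition_write_units(5000, partitions)
```

If writes are not balanced across partitions, the busiest partition is still capped at its own share, which would be consistent with seeing about 1k writes while nowhere near the provisioned total.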

[11:33 AM]  
tom effect of daily buckets
at midnight, a bunch of chunks have to be written to both buckets
so on thursday at midnight we moved to a new table
and ever since then, we’ve been failing to flush
actually our flush rate seems to have been okay, this does seem to just be load from users
right I think I might have a hypothesis
I think they’re stuck flushing to an old table
that we’ve reduced to 1 qps
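
A minimal sketch of the two effects described above, assuming buckets are UTC days numbered since the Unix epoch. The function names and the bucket numbering are hypothetical and are not taken from the Cortex schema. The first function shows why a chunk that straddles midnight has to be written to two buckets; the second gives a back-of-the-envelope drain time for a backlog pointed at a table throttled to 1 write per second.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def day_buckets(start, end):
    """Day numbers (days since the Unix epoch, UTC) touched by [start, end]."""
    first = int(start.timestamp()) // SECONDS_PER_DAY
    last = int(end.timestamp()) // SECONDS_PER_DAY
    return list(range(first, last + 1))


def drain_seconds(backlog_writes, writes_per_second):
    """Time to flush a backlog at a fixed write rate, ignoring retries."""
    return backlog_writes / writes_per_second


# A chunk spanning midnight between Thursday 26th and Friday 27th (UTC)
# lands in two daily buckets.
chunk_start = datetime(2017, 1, 26, 23, 30, tzinfo=timezone.utc)
chunk_end = datetime(2017, 1, 27, 0, 30, tzinfo=timezone.utc)
buckets = day_buckets(chunk_start, chunk_end)

# Hypothetical backlog size: 3600 pending writes at 1 write per second.
hours_to_drain = drain_seconds(3600, 1) / 3600
```

Under these assumptions, any backlog aimed at a table reduced to 1 qps drains at one write per second no matter how much capacity the current table has.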
