Lost data during compaction on Swift #4055

Open
@ubcharron

Describe the bug

The compactor failed to upload the compacted block's index file to Swift, but still deleted the source blocks. The logs contain a warning about the failed upload, but the compactor does not seem to act on it. We lost one day of metrics for our main tenant. (I was hoping to be able to re-generate the index file from the chunks, but that doesn't seem possible because the chunk files contain only samples, not the labels themselves.)

We opened a bug in Thanos (thanos-io/thanos#3958), but we're wondering if Cortex would be the more relevant place for it?

To Reproduce

We're not sure how it happens, so here's our best attempt at recollection:

Running Cortex 1.7.0, the Compactor compacted a series of blocks. It then uploaded the resulting files to Swift, but the index file never made it there. Swift's own logs show no trace of the index file ever being uploaded. We /think/ the error was detected by "CloseWithLogOnErr" but never made its way back to the Compactor (since it runs in a deferred call), and was thus ignored.
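
To illustrate the suspected failure mode, here is a minimal Go sketch. It is not the Thanos or Cortex code: `failingCloser`, `uploadLoggingOnly` and `uploadPropagating` are hypothetical names made up for this example. It only shows how a close error that is logged inside a deferred call never reaches the caller, and how a named return value would let it propagate.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// failingCloser is a hypothetical stand-in for an object-store writer whose
// Close fails, similar to the "upload object close" timeout in our logs.
type failingCloser struct{}

func (failingCloser) Write(p []byte) (int, error) { return len(p), nil }

func (failingCloser) Close() error {
	return errors.New("upload object close: Timeout when reading or writing data")
}

// uploadLoggingOnly mimics the suspected pattern: the close error is only
// logged from the deferred call, so the caller sees a successful upload.
func uploadLoggingOnly(w io.WriteCloser, data []byte) error {
	defer func() {
		if err := w.Close(); err != nil {
			fmt.Println("warn: detected close error:", err)
		}
	}()
	_, err := w.Write(data)
	return err
}

// uploadPropagating uses a named return value so the deferred close error
// reaches the caller when the write itself succeeded.
func uploadPropagating(w io.WriteCloser, data []byte) (err error) {
	defer func() {
		if cerr := w.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("close: %w", cerr)
		}
	}()
	_, err = w.Write(data)
	return err
}

func main() {
	fmt.Println("logging only:", uploadLoggingOnly(failingCloser{}, []byte("index")))
	fmt.Println("propagating:", uploadPropagating(failingCloser{}, []byte("index")))
}
```

With the first variant the caller gets a nil error and would go on to report "uploaded file", which matches what we see in the logs. With the second variant the caller receives the close error and can react to it.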

See logs below.

Expected behavior

The Compactor should retry uploading a file when the upload fails.
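
As a rough sketch of what we mean by retrying, assuming the upload error is actually returned to the caller: `retryUpload` and `errTimeout` are hypothetical names for this example, not existing Cortex or Thanos functions, and the attempt count and wait times are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical error standing in for the Swift timeout.
var errTimeout = errors.New("Timeout when reading or writing data")

// retryUpload calls upload up to attempts times and doubles the wait
// between tries. It returns nil as soon as one attempt succeeds.
func retryUpload(attempts int, wait time.Duration, upload func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = upload(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("upload failed after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulated upload that times out twice and then succeeds.
	err := retryUpload(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTimeout
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

If every attempt fails, the caller gets a non-nil error and should not go on to mark the source blocks for deletion.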

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helmfile

Storage Engine

  • Blocks
  • Chunks

Additional Context

Compactor logs:

{
  "caller": "runutil.go:124",
  "err": "upload object close: Timeout when reading or writing data",
  "level": "warn",
  "msg": "detected close error",
  "ts": "2021-03-20T05:12:44.877771796Z"
}
{
  "bucket": "tracing: cortex-tsdb-prod04",
  "caller": "objstore.go:159",
  "component": "compactor",
  "dst": "01F16ZRT8TYA08VJQR1ZPCC5EP/index",
  "from": "data/compact/0@14583055817248146110/01F16ZRT8TYA08VJQR1ZPCC5EP/index",
  "group": "0@{__org_id__=\"1\"}",
  "groupKey": "0@14583055817248146110",
  "level": "debug",
  "msg": "uploaded file",
  "org_id": "1",
  "ts": "2021-03-20T05:12:44.877834603Z"
}
{
  "caller": "compact.go:810",
  "component": "compactor",
  "duration": "4m41.662527735s",
  "group": "0@{__org_id__=\"1\"}",
  "groupKey": "0@14583055817248146110",
  "level": "info",
  "msg": "uploaded block",
  "org_id": "1",
  "result_block": "01F16ZRT8TYA08VJQR1ZPCC5EP",
  "ts": "2021-03-20T05:12:45.140243007Z"
}
{
  "caller": "compact.go:832",
  "component": "compactor",
  "group": "0@{__org_id__=\"1\"}",
  "groupKey": "0@14583055817248146110",
  "level": "info",
  "msg": "marking compacted block for deletion",
  "old_block": "01F15H6D6CXE1ASE788HQECHM4",
  "org_id": "1",
  "ts": "2021-03-20T05:12:45.627586825Z"
}

Swift object listing for the resulting block (no index file):

$ openstack object list cortex-tsdb-prod04 --prefix 1/01F16ZRT8TYA08VJQR1ZPCC5EP
+--------------------------------------------+
| Name                                       |
+--------------------------------------------+
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000001 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000002 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000003 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000004 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000005 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000006 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000007 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000008 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000009 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000010 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000011 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000012 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000013 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000014 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000015 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000016 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000017 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000018 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000019 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000020 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000021 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000022 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/chunks/000023 |
| 1/01F16ZRT8TYA08VJQR1ZPCC5EP/meta.json     |
+--------------------------------------------+
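
A check along the following lines would have caught the missing file before the source blocks were marked for deletion. This is only a sketch of the idea, not something Cortex does as far as we know: `missingBlockFiles` is a hypothetical helper, and the list of required files (`index`, `meta.json`) is our assumption based on the listing above.

```go
package main

import (
	"fmt"
	"strings"
)

// missingBlockFiles reports which required block files are absent from the
// object names listed under prefix (tenant ID plus block ID).
func missingBlockFiles(prefix string, objects []string) []string {
	present := make(map[string]bool, len(objects))
	for _, name := range objects {
		present[strings.TrimPrefix(name, prefix+"/")] = true
	}
	var missing []string
	// Assumption: a usable block needs at least these two files.
	for _, required := range []string{"index", "meta.json"} {
		if !present[required] {
			missing = append(missing, required)
		}
	}
	return missing
}

func main() {
	prefix := "1/01F16ZRT8TYA08VJQR1ZPCC5EP"
	// Abbreviated version of the listing above.
	objects := []string{
		prefix + "/chunks/000001",
		prefix + "/chunks/000023",
		prefix + "/meta.json",
	}
	fmt.Println(missingBlockFiles(prefix, objects)) // prints [index]
}
```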
