swift: fix file already closed issue #1489

Conversation

FlorinPeter

Changes

Fix the "file already closed" issue by passing a reader that supports Seek to the Swift implementation.
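
For illustration, a minimal sketch of the pattern (the helper name toReadSeeker is made up; the actual change in pkg/objstore/swift/swift.go does the equivalent inline before handing the reader to gophercloud):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// toReadSeeker drains r into memory and returns a seekable reader, mirroring
// the approach in this PR (sketch only; the real change builds a
// strings.Reader and passes it to gophercloud's object upload).
func toReadSeeker(r io.Reader) (io.ReadSeeker, error) {
	var buf bytes.Buffer
	if _, err := io.Copy(&buf, r); err != nil {
		return nil, err
	}
	// strings.Reader implements io.ReadSeeker, so the HTTP layer can
	// re-read or close the body without touching the original file.
	return strings.NewReader(buf.String()), nil
}

func main() {
	rs, err := toReadSeeker(strings.NewReader("block data"))
	if err != nil {
		panic(err)
	}
	b, _ := io.ReadAll(rs)
	fmt.Println(string(b))
}
```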

Verification

Tested locally without the fix:

$ go test -v ./pkg/shipper/
=== RUN   TestShipper_SyncBlocks_e2e
=== RUN   TestShipper_SyncBlocks_e2e/inmem
level=info msg="upload new block" id=000000000006AFVGQT5ZYC0GEK
level=info msg="upload new block" id=0000000001ZW908PVKS1Q4ZYAZ
level=info msg="upload new block" id=0000000002YSHABVQ85AYZ8JHD
level=info msg="upload new block" id=00000000031DTZP60DNX4AF7Q0
level=info msg="upload new block" id=0000000004P46KJHJHGM7X98BR
level=info msg="upload new block" id=000000000611TXCKQ2TF8D1QKB
level=info msg="upload new block" id=0000000007Z3WV8K78BZR49HNH
level=info msg="upload new block" id=0000000008Z0XRX21VQY2QNAWS
level=info msg="upload new block" id=0000000009RPS55HT2KGSF7A5E
=== RUN   TestShipper_SyncBlocks_e2e/swift
level=info msg="upload new block" id=000000000006AFVGQT5ZYC0GEK
level=warn msg="detected close error" err="close file /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/meta.json: close /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/meta.json: file already closed"
level=warn msg="detected close error" err="close file /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/chunks/0001: close /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/chunks/0001: file already closed"
level=warn msg="detected close error" err="close file /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/chunks/0002: close /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/chunks/0002: file already closed"
level=warn msg="detected close error" err="close file /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/index: close /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/index: file already closed"
level=warn msg="detected close error" err="close file /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/meta.json: close /var/folders/rl/1l703cjd6d93_4f6ktp13cpr0000gn/T/shipper-e2e-test093408693/thanos/upload/000000000006AFVGQT5ZYC0GEK/meta.json: file already closed"
...

With the fix:

$ go test -v ./pkg/shipper/
=== RUN   TestShipper_SyncBlocks_e2e
=== RUN   TestShipper_SyncBlocks_e2e/inmem
level=info msg="upload new block" id=000000000006AFVGQT5ZYC0GEK
level=info msg="upload new block" id=0000000001ZW908PVKS1Q4ZYAZ
level=info msg="upload new block" id=0000000002YSHABVQ85AYZ8JHD
level=info msg="upload new block" id=00000000031DTZP60DNX4AF7Q0
level=info msg="upload new block" id=0000000004P46KJHJHGM7X98BR
level=info msg="upload new block" id=000000000611TXCKQ2TF8D1QKB
level=info msg="upload new block" id=0000000007Z3WV8K78BZR49HNH
level=info msg="upload new block" id=0000000008Z0XRX21VQY2QNAWS
level=info msg="upload new block" id=0000000009RPS55HT2KGSF7A5E
=== RUN   TestShipper_SyncBlocks_e2e/swift
level=info msg="upload new block" id=000000000006AFVGQT5ZYC0GEK
level=info msg="upload new block" id=0000000001ZW908PVKS1Q4ZYAZ
level=info msg="upload new block" id=0000000002YSHABVQ85AYZ8JHD
level=info msg="upload new block" id=00000000031DTZP60DNX4AF7Q0
level=info msg="upload new block" id=0000000004P46KJHJHGM7X98BR
level=info msg="upload new block" id=000000000611TXCKQ2TF8D1QKB
level=info msg="upload new block" id=0000000007Z3WV8K78BZR49HNH
level=info msg="upload new block" id=0000000008Z0XRX21VQY2QNAWS
level=info msg="upload new block" id=0000000009RPS55HT2KGSF7A5E
--- PASS: TestShipper_SyncBlocks_e2e (35.63s)
    --- PASS: TestShipper_SyncBlocks_e2e/inmem (0.45s)
    foreach.go:48: THANOS_SKIP_GCS_TESTS envvar present. Skipping test against GCS.
    foreach.go:70: THANOS_SKIP_S3_AWS_TESTS envvar present. Skipping test against S3 AWS.
    foreach.go:86: THANOS_SKIP_AZURE_TESTS envvar present. Skipping test against Azure.
    swift.go:313: created temporary container for swift tests with name test_testshipper_syncblocks_e2e_456bf734519ee3fa
    --- PASS: TestShipper_SyncBlocks_e2e/swift (23.91s)
    foreach.go:118: THANOS_SKIP_TENCENT_COS_TESTS envvar present. Skipping test against Tencent COS.
=== RUN   TestShipper_SyncBlocksWithMigrating_e2e
=== RUN   TestShipper_SyncBlocksWithMigrating_e2e/v2.9.2
level=info msg="upload new block" id=000000000006AFVGQT5ZYC0GEK
level=info msg="upload new block" id=0000000001ZW908PVKS1Q4ZYAZ
level=info msg="upload new block" id=0000000002YSHABVQ85AYZ8JHD
level=info msg="upload new block" id=00000000031DTZP60DNX4AF7Q0
level=info msg="gathering all existing blocks from the remote bucket for check" id=0000000004P46KJHJHGM7X98BR
level=info msg="upload new block" id=0000000004P46KJHJHGM7X98BR
level=info msg="upload new block" id=0000000005H4QE51FCW58H8NBR
level=info msg="upload new block" id=000000000611TXCKQ2TF8D1QKB
level=info msg="upload new block" id=0000000007Z3WV8K78BZR49HNH
level=info msg="upload new block" id=0000000008Z0XRX21VQY2QNAWS
level=info msg="upload new block" id=0000000009RPS55HT2KGSF7A5E
--- PASS: TestShipper_SyncBlocksWithMigrating_e2e (4.82s)
    --- PASS: TestShipper_SyncBlocksWithMigrating_e2e/v2.9.2 (4.82s)
=== RUN   TestShipperTimestamps
--- PASS: TestShipperTimestamps (0.03s)
PASS
ok  	github.com/thanos-io/thanos/pkg/shipper	40.519s

Contributor

@sudhi-vm sudhi-vm left a comment

LGTM

pkg/objstore/swift/swift.go
if err != nil {
	return err
}
readSeeker := strings.NewReader(buf.String())
Contributor

This issue seems to be due to a bug in the vendored gophercloud dependency and looks like it has been fixed upstream: https://github.com/rackspace/gophercloud/blob/master/openstack/objectstorage/v1/objects/requests.go#L219. Do you think you could re-vendor gophercloud and run your tests?

Author

Sure, I can check it.

Author

@sudhi-vm It looks like gophercloud was moved from github.com/rackspace to https://github.com/gophercloud, and version 0.3.0 is the latest one.
I am not an io.Reader expert, but I guess ioutil.ReadAll closes the reader, causing this issue.
https://github.com/gophercloud/gophercloud/blob/master/openstack/objectstorage/v1/objects/requests.go#L198-L199

Signed-off-by: Florin Peter <Florin-Alexandru.Peter@t-systems.com>
@FlorinPeter FlorinPeter force-pushed the swift_avoid_file_already_closed branch from d4e453d to f576a79 on September 3, 2019 15:25
Contributor

@sudhi-vm sudhi-vm left a comment

LGTM.

We were able to reproduce the issue without the patch in our local OpenStack cluster and verified that the patch solves the issue.

@FlorinPeter Thanks for submitting the fix!

@bwplotka Can you please help re-run CI?

@sudhi-vm
Contributor

sudhi-vm commented Oct 9, 2019

retest ci

@povilasv
Member

Done with CI.

This feels a bit like a bug in the library we are using, and this fix feels more like a workaround. As far as I understand, the library accepts an io.Reader but works properly only with a ReadSeeker?

Maybe you would like to raise an issue with them?

IMO it's fine as a short term fix and we can merge it if it solves the problem for swift users.

@bwplotka what do you think?

@sudhi-vm
Contributor

@povilasv Good point. We upgraded the vendored library (gophercloud) to a recent version locally, but it did not fix the issue. I will follow up with gophercloud on this.

As you mention, we could merge this and treat it as a short-term fix. Once the fix is identified and merged in gophercloud, we could consume the change and also upgrade the vendored library to a more recent version.

@povilasv povilasv mentioned this pull request Oct 14, 2019
@GiedriusS
Member

Has this been reported upstream? (:

@stale

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@stale stale bot closed this Jan 18, 2020
@retsuca

retsuca commented Jan 31, 2020

Is there any progress on this issue?

@GiedriusS GiedriusS reopened this Jan 31, 2020
@stale stale bot removed the stale label Jan 31, 2020
@retsuca

retsuca commented Feb 3, 2020

ping @sudhi-vm

@GiedriusS
Member

FWIW I still haven't gotten around to this because I cannot get my hands on a working Swift cluster. It seems like the community projects that let you run Swift as a small deployment unit are buggy and/or something doesn't work :'(

@GiedriusS
Member

Reported upstream at gophercloud/gophercloud#1856.

@jtopjian

Hi folks, I'm currently looking at the issue @GiedriusS reported at gophercloud/gophercloud#1856. There are some details to my current investigation over there. If anyone has any pointers, it would be very much appreciated!

@jtopjian

I was able to get the Thanos S3 e2e tests working and then dug into the minio library to compare behavior. I think I've determined what's going on - there's always room for error, though.

At least for the e2e tests, the calculated partSize is always 0 because the files are small and/or less than the default value of 128 MB:

https://github.com/thanos-io/thanos/blob/master/pkg/objstore/s3/s3.go#L321-L325

Minio uses a 128 MB size threshold to decide when to do multipart/streaming uploads:

https://github.com/minio/minio-go/blob/master/api-put-object.go#L171-L174

When the size of the object is less than the default 128 MB or whatever was explicitly passed, the following happens:

https://github.com/minio/minio-go/blob/master/api-put-object-streaming.go#L335-L344

ta-da! The object data is being wrapped into something that doesn't implement Close(). Thus when the Transport closes the body, the original file pointer is prevented from being closed. If you remove the highlighted code, you'll see that you get the same error being reported in Swift.

Minio is providing a lot of logic for how the data should be handled prior to being uploaded to S3. This amount of care is intentionally not in Gophercloud.

Gophercloud is meant to be a very low-level API. Its core goal is to model the OpenStack REST API with as few convenience and helper functions/methods as possible. This is because OpenStack is a very flexible set of projects and the APIs can be customized from one environment to the next. Implementing any sort of helper/convenience opens the door to adding a lot more of them to account for everyone's needs.

Gophercloud's current Swift support provides all of the capabilities to implement the same kind of logic that Minio is doing internally - it just expects it to be done a layer above Gophercloud.

We have a repository at https://github.com/gophercloud/utils that is meant to provide helpers/utilities for things that would otherwise be repeated a lot, including support for uploading and downloading objects. I've also committed a patch that handles the different kinds of data streams to prevent the original/underlying pointer from being prematurely closed.

While I tested this and it works, I don't think this object storage code sees a lot of use, so if you folks choose to adopt it, you might run into some bugs. Please feel free to submit PRs or at least a detailed report so this code can be made better -- you'd end up helping anyone else creating an OpenStack-compatible object storage app.

If you don't want to adopt the utils code, the solution that this PR is implementing is the best that can be done - possibly also implementing the same NewSectionReader logic that Minio uses.

Please let me know if anyone has any questions or comments.
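
To make the wrapping idea concrete, here is a minimal sketch (not minio's or gophercloud/utils' actual code) of a reader whose Close is a no-op, so the HTTP transport cannot close the caller's underlying file:

```go
package main

import (
	"io"
	"os"
)

// noCloseReader hides the underlying Close method: when net/http's Transport
// closes the request body, it only closes this wrapper, so the original
// *os.File stays open and the caller's own deferred Close still succeeds.
type noCloseReader struct {
	io.Reader
}

func (noCloseReader) Close() error { return nil }

func main() {
	f, err := os.CreateTemp("", "example-*.json") // illustrative temp file
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close() // still succeeds: the transport only ever saw noCloseReader

	var body io.ReadCloser = noCloseReader{Reader: f}
	body.Close() // no-op; the underlying *os.File remains open
}
```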

@GiedriusS
Member

GiedriusS commented Feb 25, 2020

> (quoting @jtopjian's analysis above in full)

Thanks for your impressive analysis! 💪 I have one small question: is there some small tool which would let us start a small Swift node for e2e testing purposes? Something similar to what Minio has.

Now about this PR itself... maybe we shouldn't log at all that a Close() failed if the file has already been closed. I think that would be a better solution than all of this (potential) copying or creating new readers. Any other thoughts?
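
A minimal sketch of that suggestion, assuming the close helper can inspect the returned error (the function name and logger here are illustrative, not the existing Thanos helpers):

```go
package main

import (
	"errors"
	"io"
	"log"
	"os"
)

// closeQuietly swallows "file already closed" errors instead of logging them.
// Sketch only: the helper name and standard-library logger are illustrative.
func closeQuietly(c io.Closer, name string) {
	if err := c.Close(); err != nil && !errors.Is(err, os.ErrClosed) {
		log.Printf("detected close error: close %s: %v", name, err)
	}
}

func main() {
	f, err := os.CreateTemp("", "example")
	if err != nil {
		panic(err)
	}
	f.Close()                 // first close succeeds
	closeQuietly(f, f.Name()) // second close returns os.ErrClosed and is silently ignored
}
```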

As for the other library, I would love to switch myself but we probably don't have enough human resources for that :(

@jtopjian

> Thanks for your impressive analysis!

No problem at all 🙂

> I have one small question: is there some small tool which would let us start a small Swift node for e2e testing purposes? Something similar to what Minio has.

There are a few ways. What you're looking for is something called Swift AIO (all-in-one). There are a bunch of scripts around that will build this. One option is to use the official OpenStack devstack project:

git clone https://opendev.org/openstack/devstack
cd devstack

cat > local.conf <<EOF
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=\$ADMIN_PASSWORD
RABBIT_PASSWORD=\$ADMIN_PASSWORD
SERVICE_PASSWORD=\$ADMIN_PASSWORD
SWIFT_HASH=foobar
disable_all_services
enable_service key mysql s-proxy s-object s-container s-account
EOF

./stack.sh

The above will take around 5-6 minutes. When complete, you'll have a local OpenStack environment that only has Keystone (authentication) and Swift running:

source openrc
openstack catalog list
openstack container list
openstack container create foo
openstack container list
openstack object create foo local.conf
openstack object list foo

> Now about this PR itself... maybe we shouldn't log at all that a Close() failed if the file has already been closed. I think that would be a better solution than all of this (potential) copying or creating new readers. Any other thoughts?

IMO, I'm quite thankful that this was being logged. It allowed me to catch an inefficiency, enhance the gophercloud/utils code, and learn about readers in general. But I don't think having the object closed by the Transport was actually doing any harm -- at least not this time. If it's possible to restrict this logging to only happen in debug mode, that might be a good tradeoff.

> As for the other library, I would love to switch myself but we probably don't have enough human resources for that :(

I totally understand - I'm in the same situation!
