Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[new] chunk backend: Integrate Alibaba Cloud oss #5522

Closed
wants to merge 8 commits into from

Conversation

liguozhong
Copy link
Contributor

@liguozhong liguozhong commented Mar 3, 2022

What this PR does / why we need it:
[new] chunk backend: Integrate Alibaba Cloud oss
link:https://www.alibabacloud.com/product/object-storage-service

oss is an object storage system similar to s3 in China. Due to performance problems caused by network delays in China, many companies in mainland China use oss as object storage instead of s3.

Which issue(s) this PR fixes:
Fixes #5499 and #3504

Special notes for your reviewer:
1: oss dependency from thanos go mod
github.com/aliyun/aliyun-oss-go-sdk v2.2.2+incompatible // indirect
2: yaml config demo

schema_config:
  configs:
  - from: 2020-05-15
    object_store: oss
    schema: v11
    index:
      prefix: loki_index_
      period: 168h

storage_config:
  alibaba:
    bucket: loki-chunk-test
    endpoint: oss-cn-beijing.aliyuncs.com
    access_key_id: <ak>
    secret_access_key: <sk>

3: oss chunk by loki snapshot
image
4: debug oss getObject success.
image
5: query oss chunk data success IDEA debug shapshot.
image

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

@liguozhong liguozhong requested review from KMiller-Grafana and a team as code owners March 3, 2022 08:38
@stale
Copy link

stale bot commented Apr 16, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Apr 16, 2022
@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Apr 20, 2022
@stale
Copy link

stale bot commented Jun 12, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Jun 12, 2022
@stale stale bot closed this Jul 10, 2022
@liguozhong
Copy link
Contributor Author

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.

@Stale ??

[endpoint: <string> | default = ""]

# AWS Access Key ID
# CLI flag: -<prefix>.s3.access-key-id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oss.access-key-id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,done.
image

schema_config:
configs:
- from: 2020-05-15
object_store: oss
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, I wish object-storage-service didn't share the same letters as open-source-software 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great advice, the name oss of this product is really confusing.
I have changed oss to alibabacloud, and alibabacloud has only one object storage, so there is no need to specify oss products.

"github.com/grafana/loki/pkg/storage/chunk/client"
)

var ossRequestDuration = instrument.NewHistogramCollector(prometheus.NewHistogramVec(prometheus.HistogramOpts{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use promauto like we do in the GCS client here (currently it doesn't look like the histograms are ever registered and thus absent from the /metrics endpoint):

"github.com/prometheus/client_golang/prometheus/promauto"

We're trying to move to a better way of defining metrics that doesn't auto-instantiate them, but all the object client metrics still function that way I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instrument.CollectedRequest can easily implement RED monitoring, I will continue to use instrument to be easier. The init() function has been registered

Namespace: "loki",
Name: "oss_request_duration_seconds",
Help: "Time spent doing OSS requests.",
Buckets: []float64{.025, .05, .1, .25, .5, 1, 2},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using the same exponential buckets from gcs:

		// 6 buckets from 5ms to 20s.
		Buckets: prometheus.ExponentialBuckets(0.005, 4, 7),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.done

Buckets: []float64{.025, .05, .1, .25, .5, 1, 2},
}, []string{"operation", "status_code"}))

type ClientFactory func(ctx context.Context, opts ...option.ClientOption) (*storage.Client, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused (looks copy/pasted from somewhere else, but it's pulling in a different option pkg)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks,I will delete ClientFactory

}

// OssConfig is config for the OSS Chunk Client.
type OssConfig struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this use the same fields as the AWS config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the oss product of alibabacloud is a follower of aws s3, many fields are very similar


// IsObjectNotFoundErr returns true if error means that object is not found. Relevant to GetObject and DeleteObject operations.
func (s *OssObjectClient) IsObjectNotFoundErr(err error) bool {
return errors.Is(err, storage.ErrObjectNotExist)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks copied/pasted again and is comparing google errors to alibaba responses

Copy link
Contributor Author

@liguozhong liguozhong Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.done.
The IsObjectNotFoundErr return value of error has been checked in detail.
At the same time, the test function has been added to confirm that it is running correctly.

ossErrorDeatil message snapshot.
image
test case pass. IsObjectNotFoundErr and GetObject
image

oss 404 doc https://www.alibabacloud.com/help/en/object-storage-service/latest/http-404-status-code
image

@owen-d
Copy link
Member

owen-d commented Feb 7, 2023

With some tweaks, it looks like this should work. Have you tried it?

@owen-d
Copy link
Member

owen-d commented Feb 8, 2023

Sounds good, thanks! I think we can reopen, fix the comments in my review, then merge :)

@liguozhong
Copy link
Contributor Author

liguozhong commented Feb 9, 2023

fix the comments

done. ready for review .and testcase passed.
#8410
image

I think we can reopen

This PR was automatically closed by robot, I don't have permission to reopen it

then merge

that's great😃

@chengxiangwang
Copy link

chunks can't upload

@liguozhong
Copy link
Contributor Author

chunks can't upload

Can you file a new issue, follow the issue bug template, and provide more context such as loki version, loki configuration, loki error log, etc.
Alibaba cloud oss storage works success in my environment. I can help you troubleshoot this problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L stale A stale issue or PR that will automatically be closed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate Alibaba Cloud oss
3 participants