-
-
Notifications
You must be signed in to change notification settings - Fork 5.8k
Add Bucket Storage Support for files #7795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9b8c837
to
e62302b
Compare
modules/setting/setting.go
Outdated
@@ -613,7 +621,7 @@ func NewContext() { | |||
OfflineMode = sec.Key("OFFLINE_MODE").MustBool() | |||
DisableRouterLog = sec.Key("DISABLE_ROUTER_LOG").MustBool() | |||
StaticRootPath = sec.Key("STATIC_ROOT_PATH").MustString(AppWorkPath) | |||
AppDataPath = sec.Key("APP_DATA_PATH").MustString(path.Join(AppWorkPath, "data")) | |||
AppDataPath = sec.Key("APP_DATA_PATH").MustString("data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lafriks ,I need the followings:
AvatarUploadPath = "data/avatars/"
LFS.ContentPath = "data/lfs/"
AttachmentPath = "data/attachments/"
RepositoryAvatarUploadPath = "data/repo-avatars/"
Should I use:
LFS.ContentPath = sec.Key("LFS_CONTENT_PATH").MustString(filepath.Join("data", "lfs/"))
instead of:
LFS.ContentPath = sec.Key("LFS_CONTENT_PATH").MustString(filepath.Join(AppDataPath, "lfs"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem is that this is just defaults. User could have specified completely different paths in their app.ini for ex:
AvatarUploadPath = "/srv/avatars/upload/"
LFS.ContentPath = "/srv/lfs/"
AttachmentPath = "/srv/attachments/"
RepositoryAvatarUploadPath = "/srv/avatars/repo-upload/"
And in such configuration your code will not work for local storage anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will work for either local or any other cloud storage.
By default I set it to local storage. https://github.com/go-gitea/gitea/blob/16ac0bb98f889f15628e1779889fd0b689927f2a/modules/setting/file_storage.go#L10-L23
If user wants to use any other cloud storage or other local storage, they just have to specify it in their app.ini
Users are free to provide their own upload path like you suggested and it will work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant it should work for local storage same as it was before without changing app.ini
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will work as it was before. Just the files will be stored in the directory specified in FileStorage.BucketURL
.
So, what do you suggest about #7795 (comment) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've checked out this PR and ran it with my usual test settings:
[server]
SSH_DOMAIN = ....
DOMAIN = ....
HTTP_PORT = ....
PROTOCOL = http
ROOT_URL = ....
OFFLINE_MODE = false
APP_DATA_PATH = /mnt/external/my_test_data
Then I've changed my avatar. I expected the avatar to be created at:
/mnt/external/my_test_data/avatars/a9050bf53d1005dfe5398ce9fa5b07aa
Instead, with this PR, the avatar was incorrectly created at:
/home/giteatest/avatars/a9050bf53d1005dfe5398ce9fa5b07aa
As you can see, with this PR Gitea completely ignored the value of my APP_DATA_PATH
. This is the breaking change we've been talking about. The code must behave like it did before this PR if the user doesn't change their app.ini
. Users that upgrade to this must not notice any difference, unless they actively decide to use the new bucket storage capability.
I hope that this example helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @guillep2k, thanks for taking the time for testing.
If AVATAR_UPLOAD_PATH
is set either relative or absolute, it ignores APP_DATA_PATH
as it was ignored previously.
gitea/modules/setting/setting.go
Lines 850 to 854 in 111d31d
AvatarUploadPath = sec.Key("AVATAR_UPLOAD_PATH").MustString(path.Join(AppDataPath, "avatars")) | |
forcePathSeparator(AvatarUploadPath) | |
if !filepath.IsAbs(AvatarUploadPath) { | |
AvatarUploadPath = path.Join(AppWorkPath, AvatarUploadPath) | |
} |
But if AVATAR_UPLOAD_PATH
is not specified, then it uses APP_DATA_PATH
+"avatars" to read or store the avatars.
Either way if the AvatarUploadPath
is relative then the whole path to store the avatars is AppWorkPath
+AvatarUploadPath
.
If the path is absolute the avatars is stored to AvatarUploadPath
.
It works like it did previously.
I've attached the scenarios here :
gitea/modules/storage/storage.go
Lines 124 to 154 in e29ba59
/* | |
- Path represents AttachmentPath, AvatarUploadPath, RepositoryAvatarUploadPath or LFS.ContentPath | |
- corresponding default PathValues are : "attachments", "avatars", "repo-avatars" & "lfs" | |
- appDataUserPath defaults to "data" | |
There may be two scenarios: | |
s1: | |
Path is set in app.ini (rel or abs) | |
s2: | |
Path is unset in app.ini | |
Path <= appDataUserPath + Path | |
If appDataUserPath is set to abs in app.ini (via APP_DATA_PATH) | |
Path is abs | |
Otherwise | |
Path is rel | |
If Path is abs, the files will be read or stored to that abs Path even if the BUCKET_URL is set. | |
Otherwise the following occurs, | |
if BUCKET_URL NOT SET { | |
Path = file://{AppWorkPath}/{Path} | |
} else { | |
Path is used as Bucket prefix | |
} | |
*/ |
May be there was some errors in my previous commit but I refactored the code and I hope it works like it did before this pr.
Please, pull again and check if it's now okay. Thanks again.
LFS.ContentPath = filepath.Join(AppWorkPath, LFS.ContentPath) | ||
} | ||
forcePathSeparator(LFS.ContentPath) | ||
LFS.ContentPath = suffixPathSeparator(LFS.ContentPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks current behavior and is not really related to this PR
Hey @lafriks , previously I misunderstood you. |
af91ca1
to
8e770d0
Compare
models/migrations/migrations.go
Outdated
isSucceed = false | ||
return err | ||
} | ||
if filepath.IsAbs(setting.AttachmentPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why checking only for absolute attachment path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated it. I've attached a summary how whole BucketURL
and AttachmentPath
or others work.
https://github.com/go-gitea/gitea/blob/4c390c9699ee6400a9467c4c8234af9efba5acaf/modules/setting/setting.go#L1053-L1083
Please take a look if it seems okay. Thanks
models/migrations/v61.go
Outdated
var bucket *blob.Bucket | ||
var err error | ||
ctx := context.Background() | ||
if filepath.IsAbs(setting.AttachmentPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it should be checking for some other setting not if path is absolute
I think we should have an abstract layout under |
Codecov Report
@@ Coverage Diff @@
## master #7795 +/- ##
==========================================
- Coverage 41.74% 41.72% -0.02%
==========================================
Files 526 528 +2
Lines 68398 68634 +236
==========================================
+ Hits 28553 28638 +85
- Misses 36201 36340 +139
- Partials 3644 3656 +12
Continue to review full report at Codecov.
|
custom/conf/app.ini.sample
Outdated
|
||
[storage] | ||
; URL of Bucket where files, such as, attachments, avatars, etc. are stored | ||
; If unset, file://<current_directory> is used to maintain backward compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"backward compatibility
" makes it sound like buckets are the "modern" way to go. Could this be changed to something like "if unset, {APP_DATA_PATH}/attachments is used
" (not checked) or something around those lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually if unset, it now uses AppWorkPath
. When using this path for attachments
or avatars
, AttachmentPath
or AvatarUploadPath
is appended to this BucketURL
.
If AttachmentPath
is unset, we set this as data/attachments
, to maintain similarity between all the available buckets.
gitea/modules/setting/setting.go
Line 803 in 83ccf4c
AttachmentPath = sec.Key("PATH").MustString(path.Join("data", "attachments")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I understand well, the user can't use APP_DATA_PATH
anymore? I think it's too big of a breaking change. If the user upgrades because they were interested in X other functionality the new version provides, he or she might not be aware of this change and get in a lot of trouble. Can't they both co-exist? e.g. if there are both APP_DATA_PATH
and BUCKET_URL
, then the former should be ignored, but if there's only APP_DATA_PATH
, the old behaviour should be honored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APP_DATA_PATH
related issue is now fixed.
https://github.com/go-gitea/gitea/blob/4c390c9699ee6400a9467c4c8234af9efba5acaf/modules/setting/setting.go#L1053-L1083
Now, APP_DATA_PATH
and BUCKET_URL
both co-exist. Please let me know if I did it right.
models/admin.go
Outdated
ctx := context.Background() | ||
bucket, err := setting.OpenBucket(ctx, bucketPath) | ||
if err != nil { | ||
return fmt.Errorf("could not open bucket: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to make an error here gracefully fail, so the user can attempt to retry the operation? Third-party requests are likely to fail at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error related to OpenBucket
is handled by the bucket providers using exponential backoff
algo. I think it's okay not to handle it here.
models/attachment.go
Outdated
ctx := context.Background() | ||
bucket, err := setting.OpenBucket(ctx, setting.AttachmentPath) | ||
if err != nil { | ||
return nil, fmt.Errorf("could not open bucket: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing as above about gracefully return connection errors.
Hey @lunny, I am working on that now. I will commit those changes in a separate commit. |
9ea8ebe
to
e29ba59
Compare
@lunny , I have refactored this pr. I moved the I hope these changes serve the abstract layout you asked for. |
@lunny / @lafriks / @guillep2k and folks, if you can take another look at it that will be great. |
Now the
Contents:
Why is it there? Is that a side-effect of your changes? |
@guillep2k , this is a result of using the go-cloud/blob/fileblob driver. This library stores file attributes in this .attrs file . |
Please take a look. The git conflicts are fixed. |
Weird fail in the build. Somebody will restart it soon. |
avatars
, repo-avatars
, attachments
& lfs
I will review this ASAP. |
@lunny, I keep getting My go version is Can you please give me some suggestions what should I do to resolve this problem ? |
@masudur-rahman CI failed because of other reason and just fixed. Please resolve the conflicts. |
b0eb4d0
to
7d714c4
Compare
@lunny, I resolved the conflicts and kept the changes in a single commit. Still getting |
8f89c76
to
13adc4c
Compare
13adc4c
to
0e9676f
Compare
At last the Thanks... |
models/attachment.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf("Create: %v", err) | ||
} | ||
defer fw.Close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just keep defer fw.Close()
is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer fw.Close()
can't be used here.
fw
must be closed before fi, err := fs.Attributes()
(Line 105). It can't open the file for attributes unless the fw.Close()
is called.
That's why I had to do like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about fw
, but in many libraries Close()
can be called twice without any side effects. So you can defer fw.Close()
and call fw.Close()
later when you need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late.
Your suggestion will work fine here and will do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decidedly not an easy PR to review. 😅
models/attachment.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf("Create: %v", err) | ||
} | ||
defer fw.Close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about fw
, but in many libraries Close()
can be called twice without any side effects. So you can defer fw.Close()
and call fw.Close()
later when you need to.
Ctx: context.Background(), | ||
Path: setting.AttachmentPath, | ||
} | ||
|
||
for _, attach := range attachments { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for the duration of this migration you should override the configuration value and make sure local file://
is used. But make sure to restore the configured value before exiting the function, as Gitea will start afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's very necessary.
fs := storage.FileStorage{ | ||
Ctx: context.Background(), | ||
Path: setting.RepositoryAvatarUploadPath, | ||
FileName: repo.Avatar, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if two users choose the same file name for their repo avatars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, <repo.ID>-<md5.Sum(data)>
is used as repo.Avatar
. So even if two user uses same file, repo.ID will differ them.
https://github.com/go-gitea/gitea/blob/98c0c8ea2e0d4896cc3cc00423f0d2952fcbad21/models/repo.go#L2718
fs := storage.FileStorage{ | ||
Ctx: context.Background(), | ||
Path: setting.AvatarUploadPath, | ||
FileName: u.Avatar, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as for repo avatar file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same answer as for repo.avatar.
But previously, if two users uploaded the same file, it would replace the previous one (also in previous system).
Line 523 in d4cd4ed
u.Avatar = fmt.Sprintf("%x", md5.Sum(data)) |
I used same format as repo.Avatar
to avoid this collision.
https://github.com/go-gitea/gitea/blob/98c0c8ea2e0d4896cc3cc00423f0d2952fcbad21/models/user.go#L533
localPath := attach.LocalPath() | ||
if err = os.MkdirAll(path.Dir(localPath), os.ModePerm); err != nil { | ||
return fmt.Errorf("MkdirAll: %v", err) | ||
fs := storage.FileStorage{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is trickier. Depeding on the current version value you should or should not ignore the app.ini setting and use local file storage in this function.
// AvatarOptions represents the available options to configure the AvatarHandler. | ||
type AvatarOptions struct { | ||
Prefix string | ||
SkipLogging bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? The new logger system allows to set logging levels for different features individually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the existing Options
structure to handle this part.
gitea/modules/public/public.go
Lines 24 to 33 in d4cd4ed
type Options struct { | |
Directory string | |
IndexFile string | |
SkipLogging bool | |
// if set to true, will enable caching. Expires header will also be set to | |
// expire after the defined time. | |
ExpiresAfter time.Duration | |
FileSystem http.FileSystem | |
Prefix string | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize that.
} | ||
} | ||
|
||
fr, err := fs.NewReader() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think users (admins) should decide which resources are handled locally or remotely. Avatars are too time-sensitive to keep in a remote server, especially if they need to pass through Gitea to be retrieved. Attachments are usually the images of issues and PRs, so they too are pretty much time-sensitive. The Etag
of course helps a lot. Nonetheless, I'd have three separate settings: one for avatars, one for attachments and one for LFS. Even let the admin set different services for each one.
@@ -774,11 +782,15 @@ func NewContext() { | |||
|
|||
InternalToken = loadInternalToken(sec) | |||
|
|||
sec = Cfg.Section("storage") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said above, I'd have three sections: [storage.avatars]
, [storage.attachments]
, [storage.lfs]
. Or some flexible system like that.
} | ||
|
||
// NewReader provides file reader from bucket and error if occurs | ||
func (fs *FileStorage) NewReader() (*blob.Reader, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pass an enum here, it's easy to decide between different file storages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to separate storage system for avatars, lfs, attachments ( [storage.avatars]
, [storage.attachments]
, [storage.lfs]
), we can use an extra field BucketURL
here which will be useful to decide between file storages.
https://github.com/go-gitea/gitea/blob/98c0c8ea2e0d4896cc3cc00423f0d2952fcbad21/modules/storage/storage.go#L22-L26
One important aspect that I didn't see covered: how does one user migrate from one storage system to another? For example: I decide I need aws for my files, but my repository is already up and running. If I change |
@guillep2k , we can use a cmd like |
A migration tool will be needed indeed. Between different cloud storages I don't know, but at least between any cloud storage and file://. And it should be reversible (e.g. file names should not be problem, even if e.g. avatars get a different name when migrated). |
Signed-off-by: Masudur Rahman <masud@appscode.com>
98c0c8e
to
caa31bc
Compare
Should we provide a new command to copy local directory data to a storage? |
@lunny You mean something like this?:
|
I would like we have an abstract filesystem layer before this PR. Something like https://godoc.org/github.com/blang/vfs or other better libraries. So we could support both local file system, blob storages and other storage facilities if there are implements for them. |
Any news about this? @masudur-rahman @tamalsaha |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions. |
avatars
repo-avatars
attachments
lfs