Description
Current Situation when uploading PFFileObjects
I'm using Parse Server to upload files from the client apps to S3. This works great. In the default configuration, the client SDK asks to upload file.txt and Parse Server generates a server-side filename by prepending a random string. This is done here:
As a result, Parse Server passes the generated filename, say abcdefgd_file.txt, to a concrete instance of FilesAdapter, which stores the file somewhere. The generated filename abcdefgd_file.txt is also stored in the fields that reference this file.
Later, when the file needs to be downloaded, the filename abcdefgd_file.txt is passed to the files adapter to generate a download URL, which is then passed to the client.
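To make the flow above concrete, here is a rough sketch. It is not Parse Server's actual code: randomHexPrefix, InMemoryFilesAdapter and uploadFile are hypothetical names, and the createFile / getFileLocation methods are only loosely modeled on the files adapter interface, with simplified signatures.

```javascript
// Illustrative sketch only; all names here are hypothetical stand-ins.

// Build a random hex string of the requested length.
function randomHexPrefix(length) {
  let out = '';
  while (out.length < length) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// In-memory stand-in for a files adapter (an S3 adapter would upload instead).
class InMemoryFilesAdapter {
  constructor(baseUrl) {
    this.baseUrl = baseUrl;
    this.files = new Map();
  }
  createFile(filename, data) {
    this.files.set(filename, data);
  }
  getFileLocation(filename) {
    return `${this.baseUrl}/${encodeURIComponent(filename)}`;
  }
}

// Server side: prefix the client-provided name, hand it to the adapter, and
// return the name and URL that end up in the fields referencing the file.
function uploadFile(adapter, clientFilename, data) {
  const storedName = `${randomHexPrefix(32)}_${clientFilename}`;
  adapter.createFile(storedName, data);
  return { name: storedName, url: adapter.getFileLocation(storedName) };
}

const adapter = new InMemoryFilesAdapter('https://files.example.com');
const saved = uploadFile(adapter, 'file.txt', 'hello');
console.log(saved.name, saved.url);
```

The point to notice is that the stored name is decided before the adapter is called, so the adapter has no say in it.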
The Problem
When using S3-backed storage for uploaded files, all uploaded files end up in one bucket, and there is no easy way to organize them or to run regular maintenance, cleanup, etc. The AWS API does not support listing files by modification date (aws/aws-cli#1104). The solution is to prepend a date, a timestamp, or whatever other prefix works for you to the names of files uploaded to S3, so that you can later quickly list them with aws s3 ls yourbucket/fileprefix.
Regular maintenance can then be done, for example, like this:
# Delete files uploaded in 2018, i.e. all files whose keys start with the prefix 2018
$ aws s3 rm s3://mybucket/2018 --recursive
This seems to be the recommended practice for S3 anyway, and prefix-based file listing is supported everywhere [citation needed].
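As a small illustration of a prefix that works with the 2018 example above, a key could start with the UTC upload date. This is only a sketch: datePrefixedKey is a hypothetical helper, and a date prefix alone does not make names unique, so in practice it would be combined with a random part.

```javascript
// Hypothetical helper: prefix a filename with the UTC upload date (YYYY-MM-DD),
// so that all keys from one year, month or day share a common prefix.
function datePrefixedKey(filename, now = new Date()) {
  const day = now.toISOString().slice(0, 10); // first 10 chars of the ISO timestamp
  return `${day}_${filename}`;
}

// Fixed date used here so the result is reproducible.
const key = datePrefixedKey('file.txt', new Date(Date.UTC(2018, 4, 3)));
console.log(key); // 2018-05-03_file.txt
```

Because ISO dates sort lexicographically, listing by the prefix 2018, 2018-05 or 2018-05-03 narrows the result to a year, a month or a day.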
Alternatives
The Parse Server S3 Adapter already supports a custom generateKey option that can alter the name of the file before uploading it to S3. This is documented here: https://github.com/parse-community/parse-server-s3-adapter/blob/bc41281a1c9dc24ae853ff4ff50300252a728e99/README.md#L71
This can be used to create a filename on the fly before uploading the file to S3, for example:
generateKey: (filename) => {
  // Millisecond timestamp prefix; distinct unless two uploads of the same name land in the same millisecond
  return `${Date.now()}_${filename}`;
}
At the moment, a filename generated this way is used only by the S3 adapter; it is not propagated back to Parse Server, so the database fields referencing the file do not store it. Working around this requires turning on another option on Parse Server itself, preserveFileName. When turned on, it lets the client SDK choose the filename, and Parse Server never alters it. This is the solution recommended by the Parse Server S3 Adapter; however, it leaves all responsibility for generating correct filenames to the client, which is bad and will fail quickly when two clients choose the same filename.
Proposed Solution
We need a way to make Parse Server take the client-provided filename, generate a safe variant from it, upload the file under that name, and use that name everywhere afterwards.
For me, the easiest way to do this is to reuse the FilesController.validateFilename function, which is supposed to validate a passed filename and return an error if there is any problem. This function is called immediately after POST /files/filename and asks the configured Files Adapter to do the validation.
Modifying the function so that it does not return an error but instead generates a safe filename to be used later would allow the files adapter to generate whatever filenames work best and propagate them back to the database.
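A minimal sketch of what this could look like, assuming validateFilename is changed to return the name to store. This is not how the function behaves today: DatePrefixingAdapter, resolveStoredName and the injected clock are hypothetical and exist only for illustration.

```javascript
// Sketch of the proposed behavior, not the current Parse Server API.
class DatePrefixingAdapter {
  // The clock is injected only so the example is reproducible.
  constructor(clock = () => new Date()) {
    this.clock = clock;
  }
  // Proposed: return the filename to store instead of returning an error.
  validateFilename(filename) {
    const safe = filename.replace(/[^A-Za-z0-9._-]/g, '_'); // replace unsafe characters
    const day = this.clock().toISOString().slice(0, 10);    // UTC date, YYYY-MM-DD
    return `${day}_${safe}`;
  }
}

// Hypothetical controller step, run right after POST /files/filename.
function resolveStoredName(adapter, clientFilename) {
  if (typeof adapter.validateFilename === 'function') {
    return adapter.validateFilename(clientFilename);
  }
  return clientFilename; // adapters without the method keep the client name
}

const fixedClock = () => new Date(Date.UTC(2018, 0, 15));
const storedName = resolveStoredName(new DatePrefixingAdapter(fixedClock), 'my file.txt');
console.log(storedName); // 2018-01-15_my_file.txt
```

The returned name would then be the one written to the database fields and later passed back to the adapter to build the download URL.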
Another option would be to add a new hook (in the spirit of sanitizeFilename, generateRealFilename) and pass that hook down to the adapter, so that the newly generated filename is used everywhere afterwards.
The fix will involve changes in both Parse Server itself and the S3 Adapter. I'm filing a new issue there as well.