Add S3 support to dtool
=======================
.. image:: https://badge.fury.io/py/dtool-s3.svg
   :target: http://badge.fury.io/py/dtool-s3
   :alt: PyPI package

- GitHub: https://github.com/jic-dtool/dtool-S3
- PyPI: https://pypi.python.org/pypi/dtool-S3
- Free software: MIT License
Features
--------
- Copy datasets to and from S3 object storage
- List all the datasets in an S3 bucket
- Create datasets directly in S3
Installation
------------
To install the dtool-S3 package::

    pip install dtool-s3
Configuration
-------------
Install the ``aws`` client. For details see
`https://docs.aws.amazon.com/cli/latest/userguide/installing.html <https://docs.aws.amazon.com/cli/latest/userguide/installing.html>`_. In short::

    pip install awscli --upgrade --user

Configure the credentials using::

    aws configure

These credentials are needed by the ``boto3`` library. For more details see
`https://boto3.readthedocs.io/en/latest/guide/quickstart.html <https://boto3.readthedocs.io/en/latest/guide/quickstart.html>`_.
Configuring custom endpoints
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Buckets can be configured to use custom endpoints. This is useful for S3 storage that is not hosted in AWS.
Create the file ``~/.config/dtool/dtool.json`` and add the S3 storage account details
using the format below::

    {
       "DTOOL_S3_ENDPOINT_<BUCKET NAME>": "<ENDPOINT URL HERE>",
       "DTOOL_S3_ACCESS_KEY_ID_<BUCKET NAME>": "<USER NAME HERE>",
       "DTOOL_S3_SECRET_ACCESS_KEY_<BUCKET NAME>": "<KEY HERE>"
    }
For example::

    {
       "DTOOL_S3_ENDPOINT_my-bucket": "http://blueberry.famous.uni.ac.uk",
       "DTOOL_S3_ACCESS_KEY_ID_my-bucket": "olssont",
       "DTOOL_S3_SECRET_ACCESS_KEY_my-bucket": "some-secret-token"
    }
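The configuration can also be set through environment variables. Shell variable names cannot contain hyphens, so this only works for bucket names without them; for a bucket such as ``my-bucket`` use the JSON file instead. For example, on Linux/Mac for a bucket named ``mybucket``::

    export DTOOL_S3_ENDPOINT_mybucket=http://blueberry.famous.uni.ac.uk
    export DTOOL_S3_ACCESS_KEY_ID_mybucket=olssont
    export DTOOL_S3_SECRET_ACCESS_KEY_mybucket=some-secret-token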
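The per-bucket settings follow a fixed naming pattern, so they can also be generated from a script. The following is a minimal sketch, not part of dtool-S3: the helper names ``bucket_config`` and ``update_config_file`` are hypothetical, and the key names are taken from the example above.

```python
import json
from pathlib import Path


def bucket_config(bucket, endpoint, access_key_id, secret_access_key):
    """Return the three per-bucket settings shown above as a dict.

    Hypothetical helper; the key names mirror the README example.
    """
    return {
        f"DTOOL_S3_ENDPOINT_{bucket}": endpoint,
        f"DTOOL_S3_ACCESS_KEY_ID_{bucket}": access_key_id,
        f"DTOOL_S3_SECRET_ACCESS_KEY_{bucket}": secret_access_key,
    }


def update_config_file(path, settings):
    """Merge settings into a JSON config file, creating it if needed."""
    path = Path(path)
    existing = json.loads(path.read_text()) if path.exists() else {}
    existing.update(settings)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(existing, indent=2) + "\n")
    return existing


if __name__ == "__main__":
    # Only prints the settings; nothing is written to disk here.
    settings = bucket_config(
        "my-bucket",
        "http://blueberry.famous.uni.ac.uk",
        "olssont",
        "some-secret-token",
    )
    print(json.dumps(settings, indent=2))
```

To write the settings, pass the result to ``update_config_file`` together with the path of the configuration file, for example ``Path.home() / ".config/dtool/dtool.json"``. Existing entries in the file are preserved unless a key is set again.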
Usage
-----
To copy a dataset from local disk (``my-dataset``) to an S3 bucket
(``data_raw``) use the command below::

    dtool copy ./my-dataset s3://data_raw

To list all the datasets in an S3 bucket use the command below::

    dtool ls s3://data_raw

See the `dtool documentation <http://dtool.readthedocs.io>`_ for more detail.
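The two commands above can also be driven from a Python script. This is a minimal sketch that only wraps the ``dtool`` command line via ``subprocess``; the function names are hypothetical, and it assumes the ``dtool`` executable is on the ``PATH`` when ``run`` is called.

```python
import shlex
import subprocess


def dtool_copy_cmd(dataset_path, bucket):
    """Build the argument list for 'dtool copy <dataset> s3://<bucket>'."""
    return ["dtool", "copy", dataset_path, f"s3://{bucket}"]


def dtool_ls_cmd(bucket):
    """Build the argument list for 'dtool ls s3://<bucket>'."""
    return ["dtool", "ls", f"s3://{bucket}"]


def run(cmd):
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Print the commands rather than executing them.
    print(shlex.join(dtool_copy_cmd("./my-dataset", "data_raw")))
    print(shlex.join(dtool_ls_cmd("data_raw")))
```

Calling ``run(dtool_ls_cmd("data_raw"))`` executes the listing and returns whatever ``dtool ls`` prints.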
Path prefix and access control
------------------------------
The S3 plugin supports a configurable prefix to the path. This can be used for
access control to the dataset. For example::

    export DTOOL_S3_DATASET_PREFIX="u/olssont"

Alternatively one can edit the ``~/.config/dtool/dtool.json`` file::

    {
       ...,
       "DTOOL_S3_DATASET_PREFIX": "u/olssont"
    }
Use the following S3 access policy to allow reading all data
in the bucket but writing only to keys that start with ``u/<username>`` or ``dtool-``::

    {
      "Statement": [
        {
          "Sid": "AllowReadonlyAccess",
          "Effect": "Allow",
          "Action": [
            "s3:ListBucket",
            "s3:ListBucketVersions",
            "s3:GetObject",
            "s3:GetObjectTagging",
            "s3:GetObjectVersion",
            "s3:GetObjectVersionTagging"
          ],
          "Resource": [
            "arn:aws:s3:::my-bucket",
            "arn:aws:s3:::my-bucket/*"
          ]
        },
        {
          "Sid": "AllowPartialWriteAccess",
          "Effect": "Allow",
          "Action": [
            "s3:DeleteObject",
            "s3:PutObject",
            "s3:PutObjectAcl"
          ],
          "Resource": [
            "arn:aws:s3:::my-bucket/dtool-*",
            "arn:aws:s3:::my-bucket/u/${aws:username}/*"
          ]
        },
        {
          "Sid": "AllowListAllBuckets",
          "Effect": "Allow",
          "Action": [
            "s3:ListAllMyBuckets",
            "s3:GetBucketLocation"
          ],
          "Resource": "arn:aws:s3:::*"
        }
      ]
    }
The user also needs write access to top-level objects that start with ``dtool-``.
These are the registration keys, which are not stored under the configured
prefix. Each registration key contains the prefix under which the respective dataset
is found, and is empty if no prefix is configured.
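The effect of the ``AllowPartialWriteAccess`` statement can be approximated locally to check which object keys a user would be allowed to write. The following is a minimal sketch, not part of dtool-S3: the function names and the example keys are hypothetical, and Python's ``fnmatch`` stands in for S3's wildcard matching, which it only approximates (``fnmatch`` also gives ``[`` and ``]`` a special meaning).

```python
from fnmatch import fnmatchcase


def writable_patterns(username):
    """Key patterns taken from the write statement of the policy above.

    ${aws:username} is replaced by the given user name.
    """
    return ["dtool-*", f"u/{username}/*"]


def is_writable(key, username):
    """Return True if the key matches one of the writable patterns."""
    return any(fnmatchcase(key, p) for p in writable_patterns(username))


if __name__ == "__main__":
    # Illustrative keys only; real dataset keys contain UUIDs.
    examples = [
        "dtool-example-uuid",
        "u/olssont/example-uuid/README.yml",
        "u/someone-else/example-uuid/README.yml",
        "example-uuid/README.yml",
    ]
    for key in examples:
        print(key, is_writable(key, "olssont"))
```

With the user name ``olssont``, only the first two example keys match: the top-level registration key and the key under the user's own prefix.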
Testing
-------
Linux/Mac
~~~~~~~~~
All tests need the ``S3_TEST_BASE_URI`` environment variable set::

    export S3_TEST_BASE_URI="s3://your-dtool-s3-test-bucket"

For the ``tests/test_custom_endpoint_config.py`` test one also needs to specify the ``S3_TEST_ACCESS_KEY_ID`` and ``S3_TEST_SECRET_ACCESS_KEY`` environment variables::

    export S3_TEST_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY
    export S3_TEST_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY

To run the tests::

    python setup.py develop
    pytest
Windows PowerShell
~~~~~~~~~~~~~~~~~~
All tests need the ``S3_TEST_BASE_URI`` environment variable set::

    $env:S3_TEST_BASE_URI = "s3://your-dtool-s3-test-bucket"

For the ``tests/test_custom_endpoint_config.py`` test one also needs to specify the ``S3_TEST_ACCESS_KEY_ID`` and ``S3_TEST_SECRET_ACCESS_KEY`` environment variables::

    $env:S3_TEST_ACCESS_KEY_ID = "YOUR_AWS_ACCESS_KEY"
    $env:S3_TEST_SECRET_ACCESS_KEY = "YOUR_AWS_SECRET_ACCESS_KEY"

To run the tests::

    python setup.py develop
    pytest
Windows DOS
~~~~~~~~~~~
All tests need the ``S3_TEST_BASE_URI`` environment variable set. Note that ``setx`` only affects command prompts opened after it has run, so open a new window before running the tests::

    setx S3_TEST_BASE_URI "s3://your-dtool-s3-test-bucket"

For the ``tests/test_custom_endpoint_config.py`` test one also needs to specify the ``S3_TEST_ACCESS_KEY_ID`` and ``S3_TEST_SECRET_ACCESS_KEY`` environment variables::

    setx S3_TEST_ACCESS_KEY_ID YOUR_AWS_ACCESS_KEY
    setx S3_TEST_SECRET_ACCESS_KEY YOUR_AWS_SECRET_ACCESS_KEY

To run the tests::

    python setup.py develop
    pytest
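On any platform, it can help to confirm that the variables named above are set before starting ``pytest``. The following is a minimal sketch of such a check; the script and the name ``missing_variables`` are hypothetical and not part of the test suite.

```python
import os

# Needed by all tests.
REQUIRED = ["S3_TEST_BASE_URI"]
# Needed only by tests/test_custom_endpoint_config.py.
CUSTOM_ENDPOINT = ["S3_TEST_ACCESS_KEY_ID", "S3_TEST_SECRET_ACCESS_KEY"]


def missing_variables(environ, include_custom_endpoint=True):
    """Return the names of test variables that are unset or empty."""
    names = REQUIRED + (CUSTOM_ENDPOINT if include_custom_endpoint else [])
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All test environment variables are set.")
```

Pass ``include_custom_endpoint=False`` to check only the variable that every test requires.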
Related packages
----------------
- `dtoolcore <https://github.com/jic-dtool/dtoolcore>`_
- `dtool-cli <https://github.com/jic-dtool/dtool-cli>`_
- `dtool-ecs <https://github.com/jic-dtool/dtool-ecs>`_
- `dtool-http <https://github.com/jic-dtool/dtool-http>`_
- `dtool-azure <https://github.com/jic-dtool/dtool-azure>`_
- `dtool-irods <https://github.com/jic-dtool/dtool-irods>`_
- `dtool-smb <https://github.com/IMTEK-Simulation/dtool-smb>`_