Commit 6c17813

Merge pull request #119 from begoldsm/master
Release version 0.0.1 of ADLS Data Plane SDK
2 parents 1e6f459 + d10dce6

File tree: 111 files changed, +38903 -0 lines changed


.coveragerc

Lines changed: 13 additions & 0 deletions
[run]
include =
    azure/datalake/store/*

omit =
    azure/datalake/store/tests/test*

[report]
show_missing = True

[html]
directory = coverage_html_report

.gitignore

Lines changed: 10 additions & 0 deletions
*.pyc
*.egg-info/
.cache
.coverage
.coverage.*
build/
dist/
tests/__pycache__/
*.suo
publish/

.travis.yml

Lines changed: 24 additions & 0 deletions
sudo: False

language: python

cache: pip

matrix:
  include:
    - python: 2.7
    - python: 3.3
    - python: 3.4
    - python: 3.5

install:
  # Install dependencies
  - pip install -U setuptools pip
  - pip install -r dev_requirements.txt
  - python setup.py develop

script:
  - py.test -x -vvv --doctest-modules --pyargs azure.datalake.store tests

notifications:
  email: false

License.txt

Lines changed: 21 additions & 0 deletions
The MIT License (MIT)

Copyright (c) 2016 Microsoft

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

MANIFEST.in

Lines changed: 10 additions & 0 deletions
recursive-include azure/datalake/store *.py
recursive-include docs *.rst

include setup.py
include README.rst
include LICENSE.txt
include MANIFEST.in
include requirements.txt

prune docs/_build

Milestones.md

Lines changed: 53 additions & 0 deletions
- First milestone: Core data lake store filesystem client, which includes:
  - 1:1 mapping of filesystem REST endpoints to client methods
  - Unit tests and CI tests for the core client covering all the methods
  - Pythonic filesystem layer, including a file class interfacing with the REST calls. This allows native interaction with file objects the same way a user would interact with basic Python files and folders.

- Second milestone: Performant extension to the core client for single file upload and download
  - Essentially, multi-part upload/download support for single files, with flexibility for the user to determine how performant/parallel the upload/download is, with smart defaults that take advantage of the available network capacity.
  - Unit tests and CI tests for this functionality

- Third milestone: Addition of folder and recursive support for performant upload/download, plus performance tests
  - Extending the single file upload/download to allow for folder and recursive folder upload/download
  - Unit tests and CI tests for this functionality
  - Addition of performance tests measuring large files and folders full of small and mixed file sizes. Once stable, we will integrate these tests into our existing performance testing and reporting service.

- Fourth milestone: Stabilization, integration and documentation
  - Stabilize the work of the previous three milestones and ensure all tests and CI jobs have robust coverage
  - Integrate this custom functionality with the existing Azure SDK for Python. This includes proper packaging and naming, and ensuring inclusion of any common dependencies for things like error handling (ideally done on an ongoing basis during milestone one, but anything missed is fixed here).
  - Get the functionality ready for package publishing, which includes ensuring the getting-started documentation, samples and readthedocs code documentation are ready and have been reviewed.

- Fifth milestone (if time remains): Convenience layer for auto-generated clients
  - This is much lower priority than the previous four milestones, but if time permits it would be good to go over the auto-generated client functionality for our other four clients and see whether there are any quality-of-life improvements we can make for users.

README.rst

Lines changed: 142 additions & 0 deletions
azure-datalake-store
====================

.. image:: https://travis-ci.org/Azure/azure-data-lake-store-python.svg?branch=dev
    :target: https://travis-ci.org/Azure/azure-data-lake-store-python

azure-datalake-store is a file-system management library in Python for
Azure Data Lake Store.

To install from source instead of pip (for local testing and development):

.. code-block:: bash

    > pip install -r dev_requirements.txt
    > python setup.py develop

To run the tests, you must set the following environment variables:
azure_tenant_id, azure_username, azure_password, azure_data_lake_store_name

To play with the code, here is a starting point:

.. code-block:: python

    from azure.datalake.store import core, lib, multithread
    token = lib.auth(tenant_id, username, password)
    adl = core.AzureDLFileSystem(store_name, token)

    # typical operations
    adl.ls('')
    adl.ls('tmp/', detail=True)
    adl.cat('littlefile')
    adl.head('gdelt20150827.csv')

    # file-like object
    with adl.open('gdelt20150827.csv', blocksize=2**20) as f:
        print(f.readline())
        print(f.readline())
        print(f.readline())
        # could have passed f to any function requiring a file object:
        # pandas.read_csv(f)

    with adl.open('anewfile', 'wb') as f:
        # data is written on flush/close, or when the buffer is bigger
        # than blocksize
        f.write(b'important data')

    adl.du('anewfile')

    # recursively download the whole directory tree with 5 threads and
    # 16MB chunks
    multithread.ADLDownloader(adl, "", 'my_temp_dir', 5, 2**24)
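The handle returned by ``adl.open`` behaves like an ordinary Python binary file, which is why it can be handed to any consumer of file objects. As a minimal stand-in sketch (using ``io.BytesIO`` to mimic the handle, with a made-up CSV payload, since no real store connection is involved):

```python
import csv
import io

# Hypothetical CSV payload standing in for a file stored in the data lake.
payload = b"date,actor,event\n20150827,foo,bar\n"

# adl.open(...) returns a binary file-like handle; io.BytesIO mimics one here.
f = io.BytesIO(payload)
header = f.readline()   # reads up to and including the first newline
print(header)           # b'date,actor,event\n'

# Any function expecting a file object works the same way, e.g. csv.reader:
text = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")
rows = list(csv.reader(text))
print(rows)             # [['date', 'actor', 'event'], ['20150827', 'foo', 'bar']]
```

The same pattern applies to ``pandas.read_csv(f)`` as noted in the snippet above.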
To interact with the API at a higher level, you can use the provided
command-line interface in "azure/datalake/store/cli.py". You will need to set
the appropriate environment variables as described above to connect to the
Azure Data Lake Store.

To start the CLI in interactive mode, run "python azure/datalake/store/cli.py"
and then type "help" to see all available commands (similar to Unix utilities):

.. code-block:: bash

    > python azure/datalake/store/cli.py
    azure> help

    Documented commands (type help <topic>):
    ========================================
    cat    chmod  close  du      get   help  ls     mv   quit  rmdir  touch
    chgrp  chown  df     exists  head  info  mkdir  put  rm    tail

    azure>

While still in interactive mode, you can run "ls -l" to list the entries in the
home directory ("help ls" will show the command's usage details). If you're not
familiar with the Unix/Linux "ls" command, the columns represent 1) permissions,
2) file owner, 3) file group, 4) file size, 5-7) the file's modification time,
and 8) file name.

.. code-block:: bash

    > python azure/datalake/store/cli.py
    azure> ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    azure> ls -l --human-readable
    drwxrwx--- 0123abcd 0123abcd   0B Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1M Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd  36B Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd   0B Aug 03 13:46 tmp
    azure>
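The eight columns described above can be picked apart with a plain whitespace split. A small sketch parsing one of the sample lines (this helper is illustrative only, not part of the SDK):

```python
def parse_ls_line(line):
    """Split an `ls -l` style line into its eight logical columns."""
    parts = line.split()
    return {
        "permissions": parts[0],
        "owner": parts[1],
        "group": parts[2],
        "size": int(parts[3]),
        "modified": " ".join(parts[4:7]),  # columns 5-7: month, day, time
        "name": parts[7],
    }

entry = parse_ls_line("-r-xr-xr-x 0123abcd 0123abcd 36 Jul 22 18:32 xyz.csv")
print(entry["name"], entry["size"])  # xyz.csv 36
```

Note this simple split assumes file names contain no spaces, which holds for the sample output above.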
To download a remote file, run "get remote-file [local-file]". The second
argument, "local-file", is optional. If not provided, the local file will be
named after the remote file minus the directory path.

.. code-block:: bash

    > python azure/datalake/store/cli.py
    azure> ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    azure> get xyz.csv
    2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
    2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
    2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
    2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
    azure>
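The default local name is simply the remote path with any leading directories stripped, in the same spirit as ``os.path.basename``. A sketch of that naming rule (this helper is not the CLI's actual code; remote ADL paths use forward slashes, so ``posixpath`` is used regardless of platform):

```python
import posixpath

def default_local_name(remote_path):
    # Strip the directory portion, keeping only the final path component.
    return posixpath.basename(remote_path)

print(default_local_name("xyz.csv"))      # xyz.csv
print(default_local_name("tmp/abc.csv"))  # abc.csv
```

So "get tmp/abc.csv" without a second argument would produce a local file named "abc.csv".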
It is also possible to run in command-line mode, allowing any available command
to be executed separately without remaining in the interpreter.

For example, listing the entries in the home directory:

.. code-block:: bash

    > python azure/datalake/store/cli.py ls -l
    drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
    >

Also, downloading a remote file:

.. code-block:: bash

    > python azure/datalake/store/cli.py get xyz.csv
    2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
    2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
    2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
    2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
    >

appveyor.yml

Lines changed: 18 additions & 0 deletions
environment:
  matrix:
    - PYTHON: "C:\\Python27"
    - PYTHON: "C:\\Python33"
    - PYTHON: "C:\\Python34"
    - PYTHON: "C:\\Python35"
    - PYTHON: "C:\\Python27-x64"
    - PYTHON: "C:\\Python35-x64"

install:
  - "%PYTHON%\\python.exe -m pip install -U pip"
  - "%PYTHON%\\python.exe -m pip install -r dev_requirements.txt"
  - "%PYTHON%\\python.exe setup.py develop"

build: off

test_script:
  - "%PYTHON%\\Scripts\\pytest.exe -x -vvv --doctest-modules --pyargs azure.datalake.store tests"
azure-data-lake-store-python.pyproj

Lines changed: 64 additions & 0 deletions

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Build">
  <PropertyGroup>
    <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
    <SchemaVersion>2.0</SchemaVersion>
    <ProjectGuid>{ffed766a-ebcb-4506-b3af-eb152065bb5b}</ProjectGuid>
    <ProjectHome />
    <StartupFile>setup.py</StartupFile>
    <SearchPath />
    <WorkingDirectory>.</WorkingDirectory>
    <OutputPath>.</OutputPath>
    <ProjectTypeGuids>{888888a0-9f3d-457c-b088-3a5042f75d52}</ProjectTypeGuids>
    <LaunchProvider>Standard Python launcher</LaunchProvider>
    <InterpreterId />
    <InterpreterVersion />
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)' == 'Debug'" />
  <PropertyGroup Condition="'$(Configuration)' == 'Release'" />
  <PropertyGroup>
    <VisualStudioVersion Condition=" '$(VisualStudioVersion)' == '' ">10.0</VisualStudioVersion>
    <PtvsTargetsFile>$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)\Python Tools\Microsoft.PythonTools.targets</PtvsTargetsFile>
  </PropertyGroup>
  <ItemGroup>
    <Content Include="docs\requirements.txt" />
    <Content Include="License.txt" />
    <Content Include="requirements.txt" />
  </ItemGroup>
  <ItemGroup>
    <Compile Include="azure\datalake\store\cli.py" />
    <Compile Include="azure\datalake\store\core.py" />
    <Compile Include="azure\datalake\store\exceptions.py" />
    <Compile Include="azure\datalake\store\lib.py" />
    <Compile Include="azure\datalake\store\multithread.py" />
    <Compile Include="azure\datalake\store\transfer.py" />
    <Compile Include="azure\datalake\store\utils.py" />
    <Compile Include="azure\datalake\store\__init__.py" />
    <Compile Include="azure\datalake\__init__.py" />
    <Compile Include="azure\__init__.py" />
    <Compile Include="docs\source\conf.py" />
    <Compile Include="setup.py" />
    <Compile Include="tests\benchmarks.py" />
    <Compile Include="tests\conftest.py" />
    <Compile Include="tests\fake_settings.py" />
    <Compile Include="tests\settings.py" />
    <Compile Include="tests\testing.py" />
    <Compile Include="tests\test_cli.py" />
    <Compile Include="tests\test_core.py" />
    <Compile Include="tests\test_lib.py" />
    <Compile Include="tests\test_multithread.py" />
    <Compile Include="tests\test_transfer.py" />
    <Compile Include="tests\test_utils.py" />
    <Compile Include="tests\__init__.py" />
  </ItemGroup>
  <ItemGroup>
    <Folder Include="azure" />
    <Folder Include="azure\datalake" />
    <Folder Include="azure\datalake\store" />
    <Folder Include="docs" />
    <Folder Include="docs\source" />
    <Folder Include="tests" />
  </ItemGroup>
  <Import Project="$(PtvsTargetsFile)" Condition="Exists($(PtvsTargetsFile))" />
  <Import Project="$(MSBuildToolsPath)\Microsoft.Common.targets" Condition="!Exists($(PtvsTargetsFile))" />
</Project>

azure-data-lake-store-python.sln

Lines changed: 20 additions & 0 deletions
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 14
VisualStudioVersion = 14.0.25420.1
MinimumVisualStudioVersion = 10.0.40219.1
Project("{888888A0-9F3D-457C-B088-3A5042F75D52}") = "azure-data-lake-store-python", "azure-data-lake-store-python.pyproj", "{FFED766A-EBCB-4506-B3AF-EB152065BB5B}"
EndProject
Global
	GlobalSection(SolutionConfigurationPlatforms) = preSolution
		Debug|Any CPU = Debug|Any CPU
		Release|Any CPU = Release|Any CPU
	EndGlobalSection
	GlobalSection(ProjectConfigurationPlatforms) = postSolution
		{FFED766A-EBCB-4506-B3AF-EB152065BB5B}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
		{FFED766A-EBCB-4506-B3AF-EB152065BB5B}.Release|Any CPU.ActiveCfg = Release|Any CPU
	EndGlobalSection
	GlobalSection(SolutionProperties) = preSolution
		HideSolutionNode = FALSE
	EndGlobalSection
EndGlobal
