azure-datalake-store
====================

.. image:: https://travis-ci.org/Azure/azure-data-lake-store-python.svg?branch=dev
   :target: https://travis-ci.org/Azure/azure-data-lake-store-python

azure-datalake-store is a file-system management library in Python for
Azure Data Lake Store.

To install from source instead of pip (for local testing and development):

.. code-block:: bash

    > pip install -r dev_requirements.txt
    > python setup.py develop

To run the tests, set the following environment variables: azure_tenant_id,
azure_username, azure_password, and azure_data_lake_store_name.

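Before running the tests, you may want to confirm that all four variables are
set. The helper below is a hypothetical convenience and is not part of
azure-datalake-store. It only inspects a mapping such as "os.environ":

.. code-block:: python

    import os

    # Variable names as listed in this README; the helper itself is hypothetical.
    REQUIRED_TEST_VARS = (
        'azure_tenant_id',
        'azure_username',
        'azure_password',
        'azure_data_lake_store_name',
    )


    def missing_test_vars(environ=None):
        """Return the required variables that are unset or empty, in order."""
        if environ is None:
            environ = os.environ
        return [name for name in REQUIRED_TEST_VARS if not environ.get(name)]

For example, "missing_test_vars({})" returns all four names. An empty list
means the environment is ready.
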
To get started with the Python API, here is a minimal example. The credentials
and store name are read from the environment variables listed above:

.. code-block:: python

    import os

    from azure.datalake.store import core, lib, multithread

    tenant_id = os.environ['azure_tenant_id']
    username = os.environ['azure_username']
    password = os.environ['azure_password']
    store_name = os.environ['azure_data_lake_store_name']

    token = lib.auth(tenant_id, username, password)
    adl = core.AzureDLFileSystem(store_name, token)

    # typical operations
    adl.ls('')
    adl.ls('tmp/', detail=True)
    adl.cat('littlefile')
    adl.head('gdelt20150827.csv')

    # file-like object
    with adl.open('gdelt20150827.csv', blocksize=2**20) as f:
        print(f.readline())
        print(f.readline())
        print(f.readline())
        # could have passed f to any function requiring a file object:
        # pandas.read_csv(f)

    with adl.open('anewfile', 'wb') as f:
        # data is written on flush/close, or when the buffer grows larger
        # than blocksize
        f.write(b'important data')

    adl.du('anewfile')

    # recursively download the whole directory tree with 5 threads and
    # 16MB chunks
    multithread.ADLDownloader(adl, "", 'my_temp_dir', 5, 2**24)

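Because "adl.open" returns a file-like object, code written against the
ordinary binary file interface should also accept it. The hypothetical helper
below relies only on "readline", which the example above also uses. Here
"io.BytesIO" stands in for an open Data Lake file so the sketch runs without
a store:

.. code-block:: python

    import csv
    import io


    def read_csv_header(f, encoding='utf-8'):
        """Return the first CSV row of a readable binary file object.

        Only f.readline() is used, so any object that behaves like a binary
        file (e.g. a local file opened in 'rb' mode or io.BytesIO) works.
        """
        first_line = f.readline().decode(encoding)
        # An empty first line produces an empty row, [].
        return next(csv.reader([first_line]), [])


    # Example with an in-memory stand-in for a file opened from the store:
    sample = io.BytesIO(b'date,actor,event\n2015-08-27,A,B\n')
    print(read_csv_header(sample))  # ['date', 'actor', 'event']

The helper reads only the first line, so the file position is left at the
start of the second line for any further processing.
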
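The comment on the write example describes a buffering policy. Bytes
accumulate in memory and are sent when the file is flushed or closed, or once
the buffer grows larger than "blocksize". The toy class below is a
hypothetical, in-memory model of that policy only. It does not show how
azure-datalake-store actually uploads data, and "send" is an assumed callable
standing in for the upload:

.. code-block:: python

    class BufferedBlockWriter:
        """Toy writer (hypothetical): buffers bytes and sends them on
        flush/close, or as soon as the buffer exceeds blocksize."""

        def __init__(self, send, blocksize=2**22):
            self.send = send              # callable receiving each chunk of bytes
            self.blocksize = blocksize
            self.buffer = bytearray()
            self.closed = False

        def write(self, data):
            if self.closed:
                raise ValueError('I/O operation on closed file')
            self.buffer.extend(data)
            # Send early only when the buffer is bigger than blocksize.
            if len(self.buffer) > self.blocksize:
                self.flush()
            return len(data)

        def flush(self):
            if self.buffer:
                self.send(bytes(self.buffer))
                self.buffer.clear()

        def close(self):
            if not self.closed:
                self.flush()
                self.closed = True

        def __enter__(self):
            return self

        def __exit__(self, *exc_info):
            self.close()

With "blocksize=8", writing "b'important'" (9 bytes) triggers a send
immediately. A following "b' data'" stays buffered until the "with" block
closes the writer.
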
To interact with the API at a higher level, you can use the provided
command-line interface in "azure/datalake/store/cli.py". You will need to set
the environment variables described above to connect to Azure Data Lake
Store.

To start the CLI in interactive mode, run "python azure/datalake/store/cli.py"
and then type "help" to see all available commands (similar to Unix utilities):

.. code-block:: bash

    > python azure/datalake/store/cli.py
    azure> help

    Documented commands (type help <topic>):
    ========================================
    cat    chmod  close  du      get   help  ls     mv   quit  rmdir  touch
    chgrp  chown  df     exists  head  info  mkdir  put  rm    tail

    azure>

While still in interactive mode, you can run "ls -l" to list the entries in
the home directory ("help ls" shows the command's usage details). If you're
not familiar with the Unix/Linux "ls" command, the columns show 1) the
permissions, 2) the file owner, 3) the file group, 4) the file size in
bytes, 5-7) the modification time (month, day, and time of day), and 8) the
file name.

.. code-block:: bash

    > python azure/datalake/store/cli.py
    azure> ls -l
    drwxrwx--- 0123abcd 0123abcd 0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd 1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd 36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd 0 Aug 03 13:46 tmp
    azure> ls -l --human-readable
    drwxrwx--- 0123abcd 0123abcd 0B Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd 1M Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd 36B Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd 0B Aug 03 13:46 tmp
    azure>

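The "--human-readable" listing above shows 1048576 bytes as "1M", 36 bytes as
"36B", and 0 bytes as "0B". The hypothetical formatter below reproduces those
examples using powers of 1024. How the CLI rounds other sizes, and which unit
letters it uses beyond "M", are assumptions:

.. code-block:: python

    # Hypothetical formatter: 1024-based units, whole numbers without decimals.
    UNITS = ('B', 'K', 'M', 'G', 'T', 'P')


    def human_readable(num_bytes):
        """Format a byte count such as 1048576 as '1M' (assumed convention)."""
        size = float(num_bytes)
        for unit in UNITS:
            # Stop at the first unit where the value drops below 1024,
            # or at the largest unit available.
            if size < 1024 or unit == UNITS[-1]:
                break
            size /= 1024
        if size.is_integer():
            return '{}{}'.format(int(size), unit)
        return '{:.1f}{}'.format(size, unit)

Under these assumptions a 1536-byte file would be shown as "1.5K".
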
To download a remote file, run "get remote-file [local-file]". The second
argument, "local-file", is optional. If it is omitted, the local file is named
after the remote file without its directory path.

.. code-block:: bash

    > python azure/datalake/store/cli.py
    azure> ls -l
    drwxrwx--- 0123abcd 0123abcd 0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd 1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd 36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd 0 Aug 03 13:46 tmp
    azure> get xyz.csv
    2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
    2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
    2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
    2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
    azure>

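The default naming rule for "get" uses the remote file name without its
directory path. Assuming remote paths are "/"-separated, this amounts to
taking the last path component. The hypothetical helper below uses
"posixpath" so the result does not depend on the local operating system's
path separator:

.. code-block:: python

    import posixpath


    def default_local_name(remote_path, local_path=None):
        """Mirror 'get remote-file [local-file]' (hypothetical helper).

        Returns local_path if given, otherwise the remote file name
        without its directory path.
        """
        if local_path:
            return local_path
        return posixpath.basename(remote_path.rstrip('/'))

For example, "default_local_name('tmp/data/abc.csv')" gives "abc.csv".
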
It is also possible to run in command-line mode, where a single command is
executed directly without entering the interactive interpreter.

For example, listing the entries in the home directory:

.. code-block:: bash

    > python azure/datalake/store/cli.py ls -l
    drwxrwx--- 0123abcd 0123abcd 0 Aug 02 12:44 azure1
    -rwxrwx--- 0123abcd 0123abcd 1048576 Jul 25 18:33 abc.csv
    -r-xr-xr-x 0123abcd 0123abcd 36 Jul 22 18:32 xyz.csv
    drwxrwx--- 0123abcd 0123abcd 0 Aug 03 13:46 tmp
    >

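The "help" output shown earlier matches the format of Python's standard "cmd"
module. One plausible way for such a CLI to support both modes is sketched
below. With arguments, it runs a single command through "onecmd"; without
arguments, it starts the interactive loop. This illustrates the general
technique under that assumption and is not a copy of "cli.py". The "ToyCLI"
class and its "echo" command are hypothetical:

.. code-block:: python

    import cmd
    import sys


    class ToyCLI(cmd.Cmd):
        """Hypothetical shell with one command, built on the cmd module."""

        prompt = 'azure> '

        def do_echo(self, line):
            """echo text: print the given text."""
            self.stdout.write(line + '\n')

        def do_quit(self, line):
            """quit: leave interactive mode."""
            return True

        # Treat end of input (Ctrl-D) like "quit" so cmdloop() terminates.
        do_EOF = do_quit


    def main(argv=None, stdout=None):
        """Run one command if arguments are given, else start interactive mode."""
        argv = sys.argv[1:] if argv is None else argv
        shell = ToyCLI(stdout=stdout)
        if argv:
            # Command-line mode: execute a single command, then return.
            shell.onecmd(' '.join(argv))
        else:
            # Interactive mode: read commands until "quit" or end of input.
            shell.cmdloop()

A script would call "main()" under an "if __name__ == '__main__':" guard.
Joining the arguments with spaces is a simplification that drops any shell
quoting.
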
Similarly, downloading a remote file:

.. code-block:: bash

    > python azure/datalake/store/cli.py get xyz.csv
    2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
    2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
    2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
    2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
    >

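The debug log describes the transfer as byte ranges ("Fetch: xyz.csv, 0-36"
for a 36-byte file). The Python example's "ADLDownloader" call divides work
into 16MB ("2**24"-byte) chunks across 5 threads. The sketch below
illustrates that general idea with the standard library only. It computes
half-open byte ranges and fetches them concurrently with
"concurrent.futures". It is not the library's implementation, and
"fetch_range" is a hypothetical callable standing in for a ranged read of
the remote file:

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor


    def chunk_ranges(size, chunksize):
        """Split [0, size) into half-open (start, end) byte ranges."""
        return [(start, min(start + chunksize, size))
                for start in range(0, size, chunksize)]


    def parallel_download(fetch_range, size, nthreads=5, chunksize=2**24):
        """Fetch all chunks with a thread pool and reassemble them in order.

        fetch_range(start, end) is a hypothetical callable returning the
        bytes in [start, end) of the remote file.
        """
        ranges = chunk_ranges(size, chunksize)
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            # map() yields results in submission order, so chunks line up.
            parts = pool.map(lambda r: fetch_range(*r), ranges)
            return b''.join(parts)

A 36-byte file yields the single range "(0, 36)". A real downloader would
more likely write each chunk into the local file at its own offset instead of
joining everything in memory, as the "byte offset" log line suggests.
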