Describe the bug
After a successful installation of DataHub on a secured machine without internet access, ingestion fails because it attempts to download packages from https://pypi.python.org/simple/wheel/
To Reproduce
Steps to reproduce the behavior:
- Install a new instance of DataHub on a machine by following the quickstart guide: https://docs.datahub.com/docs/quickstart
- Turn off internet access on that machine
- Log in to DataHub as admin
- Go to Ingestion -> Create new source -> select Postgres (my specific example) -> enter any values for host / port / user / password / database name / datasource name
- Click save & run ingestion
- See that the ingestion process for this new data source starts, runs for some time, and then fails
- Click on "Details" and see in the "Logs" section that it tried to create a Python venv and reach pypi.python.org, then failed:
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '8bb47f33-cddb-4db7-9369-edb2faddd142',
'infos': ['2025-05-15 14:15:14.802052 INFO: Starting execution for task with name=RUN_INGEST',
"2025-05-15 14:17:16.043186 INFO: Failed to execute 'datahub ingest', exit code 2",
'2025-05-15 14:17:16.043688 INFO: Caught exception EXECUTING task_id=8bb47f33-cddb-4db7-9369-edb2faddd142, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/home/datahub/.venv/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 139, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
' return future.result()\n'
' File "/home/datahub/.venv/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 402, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
'errors': []}
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv doesn't exist.. minting..
Using CPython 3.10.17 interpreter at: /usr/bin/python
Creating virtual environment at: /tmp/datahub/ingest/venv-postgres-f9103e0adae041e3
Using Python 3.10.17 environment at: /tmp/datahub/ingest/venv-postgres-f9103e0adae041e3
error: Failed to fetch: `https://pypi.python.org/simple/wheel/`
Caused by: Request failed after 3 retries
Caused by: error sending request for url (https://pypi.python.org/simple/wheel/)
Caused by: operation timed out
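For anyone trying to confirm the same condition before running an ingestion, here is a minimal sketch of a reachability check against the URL from the log above. It is only an illustration: the helper name `index_reachable` and the 5-second timeout are my own assumptions and are not part of DataHub or its executor, and the check is only meaningful when run from the same environment in which the ingestion executes.

```python
import urllib.request

# URL copied from the ingestion log above.
INDEX_URL = "https://pypi.python.org/simple/wheel/"


def index_reachable(url=INDEX_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers within `timeout` seconds, else False.

    Hypothetical helper for illustration only. `opener` is injectable so the
    function can be exercised without real network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError and timeouts are all subclasses of OSError.
        return False


if __name__ == "__main__":
    print(f"{INDEX_URL} reachable: {index_reachable()}")
```

On a machine without internet access this should report the index as unreachable, which matches the "operation timed out" error in the log.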
Expected behavior
After installation, DataHub features should work out of the box, without depending on downloading additional packages from the internet.