
Quickstart starts a DataHub version without matching PyPI packages for dbt #12538

Open
james-larsen opened this issue Feb 3, 2025 · 5 comments
Labels
bug Bug report

Comments

@james-larsen

Describe the bug
When I run the `datahub docker quickstart` and try to ingest dbt metadata, ingestion cannot create its virtual environment because there are no corresponding acryl-datahub packages on PyPI. The log below shows how the problem presents itself when ingesting metadata from dbt.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the DataHub Quickstart guide
  2. Log in using the demo "datahub" account
  3. Run metadata ingestion for dbt; this generates the logs below:
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '20240006-70db-45c4-958f-daae42ca1dcd',
 'infos': ['2025-01-31 15:40:26.524331 INFO: Starting execution for task with name=RUN_INGEST',
           "2025-01-31 15:40:30.775089 INFO: Failed to execute 'datahub ingest', exit code 1",
           '2025-01-31 15:40:30.776539 INFO: Caught exception EXECUTING task_id=20240006-70db-45c4-958f-daae42ca1dcd, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 139, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 402, in '
           'execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv doesn't exist.. minting..
Using CPython 3.10.12 interpreter at: /usr/bin/python
Creating virtual environment at: /tmp/datahub/ingest/venv-dbt-9c8a88a4b3d58d78
Resolved 3 packages in 237ms
Prepared 3 packages in 2ms
Installed 3 packages in 2.66s
 + pip==25.0
 + setuptools==75.8.0
 + wheel==0.45.1
+ uv pip install 'acryl-datahub[datahub-rest,datahub-kafka,dbt]==1.0.0rc1'
  × No solution found when resolving dependencies:
  ╰─▶ Because there is no version of acryl-datahub[dbt]==1.0.0rc1 and
      you require acryl-datahub[dbt]==1.0.0rc1, we can conclude that your
      requirements are unsatisfiable.
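Judging from the `uv pip install` line in the log, the executor appears to pin acryl-datahub to the quickstart's version (1.0.0rc1) and add the source type as an extra, so the install can only resolve if that exact version exists on PyPI. The sketch below is a hypothetical helper (`build_requirement` is not part of DataHub) that reproduces the requirement string from the log; the base extras and their order are taken from the log and assumed to be fixed.

```python
# Hypothetical helper, not DataHub's executor code: rebuilds the pinned
# requirement string seen in the ingestion log.
from typing import Iterable

# Extras that appear in every install command in the logs of this thread.
BASE_EXTRAS = ("datahub-rest", "datahub-kafka")


def build_requirement(source_type: str, cli_version: str,
                      base_extras: Iterable[str] = BASE_EXTRAS) -> str:
    """Return a requirement pinning acryl-datahub to exactly cli_version."""
    extras = ",".join([*base_extras, source_type])
    return f"acryl-datahub[{extras}]=={cli_version}"


if __name__ == "__main__":
    # The dbt requirement from the log; it cannot resolve until this exact
    # version is published on PyPI.
    print(build_requirement("dbt", "1.0.0rc1"))
    # The same pattern with the SAP HANA source.
    print(build_requirement("hana", "1.0.0rc1"))
```

Because of the `==` pin, no other published version can satisfy the requirement, which is why uv reports it as unsatisfiable rather than falling back to an older release.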

Expected behavior
Ingestion works in the quickstart deployment, and the quickstart uses a fully supported version. Alternatively, update the documentation to cover this issue.

Desktop:

  • OS: Windows 10 using WSL2 and Rancher

Additional context
This seems very similar to a bug report for Postgres that was resolved in September 2024.

james-larsen added the bug (Bug report) label on Feb 3, 2025
@james-larsen
Author

I tried again this morning and no longer get the original error. However, I now get the error below, which still seems to be caused by the quickstart not installing the proper dependencies for dbt:

ModuleNotFoundError: No module named 'more_itertools'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 138, in _add_init_error_context
    yield
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 289, in __init__
    source_class = source_registry.get(self.source_type)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 178, in get
    raise ConfigurationError(
datahub.configuration.common.ConfigurationError: dbt is disabled due to a missing dependency: more_itertools; try running `pip install 'acryl-datahub[dbt]'`
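Since the error says dbt is disabled because `more_itertools` is missing, one quick way to confirm what the ingestion environment lacks is to ask Python's import system directly with `importlib.util.find_spec`. This is a minimal diagnostic sketch: the `missing_modules` helper is hypothetical, it assumes you run it with the Python interpreter of the venv used for ingestion, and it only handles top-level module names.

```python
# Diagnostic sketch: report which top-level modules cannot be imported by the
# interpreter running this script (use the ingestion venv's python).
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names for which no module spec can be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Default to the module named in the error; other names can be passed as args.
    wanted = sys.argv[1:] or ["more_itertools"]
    missing = missing_modules(wanted)
    print("missing:", ", ".join(missing) if missing else "none")
```

If `more_itertools` is reported missing, the error message's own suggestion, `pip install 'acryl-datahub[dbt]'`, is the fix to try in that environment.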

@cloonix

cloonix commented Feb 7, 2025

I have the same issue with a SAP HANA connection.

+ uv pip install 'acryl-datahub[datahub-rest,datahub-kafka,hana]==1.0.0rc1'
  × No solution found when resolving dependencies:
  ╰─▶ Because there is no version of acryl-datahub[hana]==1.0.0rc1 and
      you require acryl-datahub[hana]==1.0.0rc1, we can conclude that your
      requirements are unsatisfiable.

@bossenti
Contributor

This happens because no release candidate for 1.0 of the Python package is available on PyPI yet: https://pypi.org/project/acryl-datahub/#history
As a workaround, you can use the latest published version of acryl-datahub for ingestion while running the 1.0.0rc1 quickstart. At least it worked for me :)
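To check whether a particular CLI version has actually been published before relying on it, one option is PyPI's per-release JSON endpoint, `https://pypi.org/pypi/<project>/<version>/json`, which returns 404 for a release that does not exist. The sketch below uses only the standard library; `release_url` and `is_published` are hypothetical helper names, and it assumes network access to pypi.org.

```python
# Sketch: ask PyPI whether an exact release of a project exists, using the
# per-release JSON API. Requires network access to pypi.org.
import urllib.error
import urllib.request


def release_url(project: str, version: str) -> str:
    """Per-release JSON endpoint on PyPI (404 if the release is absent)."""
    return f"https://pypi.org/pypi/{project}/{version}/json"


def is_published(project: str, version: str, timeout: float = 10.0) -> bool:
    """Return True if PyPI serves metadata for project==version."""
    try:
        with urllib.request.urlopen(release_url(project, version), timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # no such release, e.g. an rc that was never uploaded
        raise  # other HTTP errors say nothing about whether the release exists
```

For example, `is_published("acryl-datahub", "1.0.0rc1")` is the check that would have flagged the missing release candidate discussed in this thread.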

@cloonix

cloonix commented Feb 11, 2025

I don't understand what you mean. The quickstart is using 1.0.0rc1, which is the problem: there are no PyPI packages for it yet.

There should be an option for the quickstart to use another GitHub tag/version.

@hsheth2
Collaborator

hsheth2 commented Feb 11, 2025

This was an oversight on our part; we've now cut the PyPI RC release: https://pypi.org/project/acryl-datahub/1.0.0rc1/

Also note that there's a "default CLI version" config you can use to override this default in the future. It can be set globally with the UI_INGESTION_DEFAULT_CLI_VERSION environment variable on GMS, and also per source in the advanced setup field: https://datahubproject.io/docs/ui-ingestion/#advanced-ingestion-configs.
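As a rough illustration of this override, the sketch below picks which acryl-datahub CLI version UI ingestion would install. The precedence it uses (per-source advanced setting, then the global `UI_INGESTION_DEFAULT_CLI_VERSION` env var on GMS, then the server's own version) is an assumption made for illustration; `resolve_cli_version` is hypothetical and is not DataHub's implementation, and the fallback to the server version is inferred from the quickstart pinning 1.0.0rc1 in the logs earlier in this thread.

```python
# Hypothetical sketch, not DataHub's implementation: choose the CLI version
# for UI ingestion. The precedence order below is an assumption.
import os
from typing import Mapping, Optional

ENV_VAR = "UI_INGESTION_DEFAULT_CLI_VERSION"  # global setting on GMS


def resolve_cli_version(
    server_version: str,
    per_source_version: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the CLI version to pin, preferring the most specific setting."""
    env = os.environ if env is None else env
    # 1. Per-source value from the advanced setup field, if provided.
    if per_source_version:
        return per_source_version
    # 2. Global default from the GMS environment variable, if set and non-empty.
    global_default = env.get(ENV_VAR)
    if global_default:
        return global_default
    # 3. Otherwise fall back to the server's own version (assumed default).
    return server_version
```

Under these assumptions, setting the env var to a version that is already on PyPI would have avoided the unsatisfiable `==1.0.0rc1` pin without changing the quickstart itself.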
