"Datahub integration" with postgres exception #12604

Open
rospe opened this issue Feb 12, 2025 · 0 comments
Labels
bug Bug report datahub-v1.0-rc Issue or PR related to DataHub v1.0 Release Candidates

rospe commented Feb 12, 2025

Describe the bug
I have set up the "Datahub integration" source to pull metadata from one environment into another.
The ingestion run fails right at the beginning, on the first database query:

[2025-02-11 14:27:07,266] INFO     {datahub.cli.ingest_cli:150} - DataHub CLI version: 1!0.15.0+docker
[2025-02-11 14:27:07,738] INFO     {datahub.ingestion.run.pipeline:272} - Sink configured successfully. DataHubRestEmitter: configured to talk to https://XXXX/api/gms with token: XXX
[2025-02-11 14:27:10,042] INFO     {datahub.ingestion.run.pipeline:297} - Source configured successfully.
[2025-02-11 14:27:10,043] INFO     {datahub.cli.ingest_cli:131} - Starting metadata ingestion
[2025-02-11 14:27:10,044] INFO     {datahub.ingestion.source.datahub.datahub_source:64} - Ingesting DataHub metadata up until 2025-02-11 14:27:10.044630+00:00
[2025-02-11 14:27:10,331] INFO     {datahub.ingestion.source.datahub.datahub_source:108} - Fetching database aspects starting from 1970-01-01 00:00:00+00:00
                aspect,
                version
        ) as t
        WHERE 1=1
            AND (removed = false or removed is NULL)
        ORDER BY
            createdon,
            urn,
            aspect,
            version
        ]
[parameters: {'exclude_aspects': ['globalSettingsInfo', 'testResults', 'dataHubIngestionSourceKey', 'dataHubIngestionSourceInfo', 'dataHubSecretKey', 'datahubIngestionCheckpoint', 'datahubIngestionRunSummary', 'globalSettingsKey', 'dataHubSecretValue'], 'since_createdon': '1970-01-01 00:00:00.000000'}]
(Background on this error at: https://sqlalche.me/e/14/f405)
Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.SyntaxError: syntax error at or near "ARRAY"
LINE 23:                 AND mav.aspect NOT IN ARRAY['globalSettingsI...
                                               ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 465, in run
    for wu in itertools.islice(
  File "/metadata-ingestion/src/datahub/ingestion/api/source_helpers.py", line 148, in auto_workunit_reporter
    for wu in stream:
  File "/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_source.py", line 76, in get_workunits_internal
    yield from self._get_database_workunits(
  File "/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_source.py", line 111, in _get_database_workunits
    for i, (mcp, createdon) in enumerate(mcps):
  File "/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_database_reader.py", line 198, in get_aspects
    for row in orderer(rows):
  File "/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_database_reader.py", line 40, in __call__
    for row in rows:
  File "/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_database_reader.py", line 189, in _get_rows
    yield from self.execute_server_cursor(self.query, params)
  File "/metadata-ingestion/src/datahub/ingestion/source/datahub/datahub_database_reader.py", line 160, in execute_server_cursor
    result = conn.execute(query, params)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1365, in execute
    return self._exec_driver_sql(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1669, in _exec_driver_sql
    ret = self._execute_context(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 210, in raise_
    raise exception
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "ARRAY"
LINE 23:                 AND mav.aspect NOT IN ARRAY['globalSettingsI...
                                               ^

[SQL: 
        SELECT *
        FROM (
            SELECT
                mav.urn,
                mav.aspect,
                mav.metadata,
                mav.systemmetadata,
                mav.createdon,
                mav.version,
                removed
            FROM metadata_aspect_v2 as mav
            LEFT JOIN (
                SELECT
                    *,
                    JSON_EXTRACT(metadata, '$.removed') as removed
                FROM metadata_aspect_v2
                WHERE aspect = 'status'
                AND version = 0
            ) as sd ON sd.urn = mav.urn
            WHERE 1 = 1
                AND mav.version = 0
                AND mav.aspect NOT IN %(exclude_aspects)s
                AND mav.createdon >= %(since_createdon)s
            ORDER BY
                createdon,
                urn,
                aspect,
                version
        ) as t
        WHERE 1=1
            AND (removed = false or removed is NULL)
        ORDER BY
            createdon,
            urn,
            aspect,
            version
        ]
[parameters: {'exclude_aspects': ['globalSettingsInfo', 'testResults', 'dataHubIngestionSourceKey', 'dataHubIngestionSourceInfo', 'dataHubSecretKey', 'datahubIngestionCheckpoint', 'datahubIngestionRunSummary', 'globalSettingsKey', 'dataHubSecretValue'], 'since_createdon': '1970-01-01 00:00:00.000000'}]
(Background on this error at: https://sqlalche.me/e/14/f405)
[2025-02-11 14:27:10,579] INFO     {datahub.cli.ingest_cli:144} - Finished metadata ingestion
Pipeline finished with at least 4 failures; produced 0 events in 0.41 seconds.

System details (please complete the following information):

  • DataHub version: tag v1.0-rc1 on both source and target, using the latest CLI.
  • Both databases are Aurora Serverless v2 (PostgreSQL-compatible).
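
A note on the likely cause, based only on the log above: the `exclude_aspects` parameter is passed as a Python list, and psycopg2 renders lists as `ARRAY[...]`, which PostgreSQL does not accept after `NOT IN` (hence the syntax error at line 23 of the query). Passing a tuple instead, or rewriting the condition as `<> ALL(%(exclude_aspects)s)`, should avoid it. Another driver-agnostic option is to expand the list into one placeholder per value. The sketch below shows that idea; `expand_in_params` is a hypothetical helper written for illustration, not part of DataHub or SQLAlchemy, and the aspect list is shortened.

```python
from typing import Any, Dict, Sequence, Tuple


def expand_in_params(name: str, values: Sequence[Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: build "(%(name_0)s, %(name_1)s, ...)" plus its params.

    Each value gets its own named placeholder, so the driver never has to
    render a whole Python list (which psycopg2 would turn into ARRAY[...]).
    """
    if not values:
        # "NOT IN ()" is not valid SQL, so refuse an empty list up front.
        raise ValueError("IN / NOT IN needs at least one value")
    keys = [f"{name}_{i}" for i in range(len(values))]
    placeholders = "(" + ", ".join(f"%({key})s" for key in keys) + ")"
    return placeholders, dict(zip(keys, values))


# Shortened version of the list shown in the [parameters: ...] log line.
exclude_aspects = ["globalSettingsInfo", "testResults", "dataHubSecretValue"]

placeholders, params = expand_in_params("exclude_aspects", exclude_aspects)
where_clause = f"AND mav.aspect NOT IN {placeholders}"

print(where_clause)
print(params)
```

The generated clause and parameter dict can then be merged into the existing query and its other parameters (such as `since_createdon`). Only placeholder names are interpolated into the SQL text; the values themselves still go through the driver's normal parameter binding.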
@rospe rospe added bug Bug report datahub-v1.0-rc Issue or PR related to DataHub v1.0 Release Candidates labels Feb 12, 2025