Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Move from psycopg2 to psycopg makes Redshift unusable with ibis-framework #10804

Open
1 task done
galen-ft opened this issue Feb 6, 2025 · 9 comments
Open
1 task done
Labels
bug Incorrect behavior inside of ibis

Comments

@galen-ft
Copy link

galen-ft commented Feb 6, 2025

What happened?

Hi ibis support team and thank you for this wonderful package ❤

I was looking for an alternative to R's dplyr + dbplyr for python and the ibis-framework is the best one I could find.

Issue description

I've been experimenting with the new features of ibis-framework 10.0.0.dev490 (i.e. the addition of cases() and parameter distinct of the collect() method among others) and noticed that the move from psycopg2 to psycopg (3) introduces a regression which makes ibis-framework unusable with the Redshift backend. Please note that ibis-framework v9.5.0, which uses psycopg2, works well with Redshift.

Potential solutions

From psycopg/psycopg#122 it isn't really clear when and if official Redshift support will be added to psycopg (3). Also, looking at #10659 it appears that the move away from psycopg2 was not motivated by breaking changes. I do understand that there are of course other valid reasons for moving to psycopg (3).

Would it be possible to go back to psycopg2?

Alternatively, is adding official Redshift support (via package redshift-connector or equivalent) on your project roadmap?

Steps to reproduce the issue

Opening a connection to Redshift:

rs_conn = ibis.postgres.connect(
        user=<RS_UID>,
        password=<RS_PWD>,
        host=<RS_HOST>,
        port=<RS_PORT>,
        database=<RS_DATABASE>,
)

results in:

psycopg.NotSupportedError: codec not available in Python: 'UNICODE'

when execution reaches the Backend._post_connect() method.

I was able to get past the connection issue by using client_encoding=UTF8 as suggested in psycopg/psycopg#122 like this:

os.environ["PGCLIENTENCODING"] = "utf-8"

but then creating an ibis table t = rs_conn.table(name="table", database="prod.schema") using method Backend.raw_sql() hits a type incompatibility:

psycopg.errors.UndefinedColumn: column "typarray" does not exist in t

What version of ibis are you using?

10.0.0.dev490

What backend(s) are you using, if any?

Redshift (through the ibis.postgres backend)

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct
@galen-ft galen-ft added the bug Incorrect behavior inside of ibis label Feb 6, 2025
@galen-ft galen-ft changed the title bug: bug: Move from psycopg2 to psycopg makes Redshift unusable with ibis-framework Feb 6, 2025
@cpcloud
Copy link
Member

cpcloud commented Feb 6, 2025

Thanks for the issue!

We've heard reports from ... I can't remember exactly where that redshift works with Ibis, but glad to have a concrete statement about it somewhere.

psycopg.errors.UndefinedColumn: column "typarray" does not exist in t

For this error, can you provide a more detailed traceback? It might be possible to debug that without a redshift setup (probably wishful thinking!), but I would need the traceback.

@galen-ft
Copy link
Author

galen-ft commented Feb 6, 2025

Hi @cpcloud and thank you for the prompt reply.

t = rs_conn.table(name="table", database="prod.schema")

Here is a more complete traceback:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "[REDACTED]/.venv2/lib/python3.12/site-packages/ibis/backends/sql/__init__.py", line 94, in table
    table_schema = self.get_schema(name, catalog=catalog, database=database)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/.venv2/lib/python3.12/site-packages/ibis/backends/postgres/__init__.py", line 514, in get_schema
    with self._safe_raw_sql(type_info) as cur:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/3.12.6/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "[REDACTED]/.venv2/lib/python3.12/site-packages/ibis/backends/postgres/__init__.py", line 700, in _safe_raw_sql
    with contextlib.closing(self.raw_sql(*args, **kwargs)) as result:
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/.venv2/lib/python3.12/site-packages/ibis/backends/postgres/__init__.py", line 717, in raw_sql
    psycopg.types.TypeInfo.fetch(con, "hstore"),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/.venv2/lib/python3.12/site-packages/psycopg/_typeinfo.py", line 85, in fetch
    return cls._fetch(conn, name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/.venv2/lib/python3.12/site-packages/psycopg/_typeinfo.py", line 104, in _fetch
    cur.execute(cls._get_info_query(conn), {"name": name})
  File "[REDACTED]/.venv2/lib/python3.12/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
psycopg.errors.UndefinedColumn: column "typarray" does not exist in t

Let me know if any further info would be useful.

@cpcloud
Copy link
Member

cpcloud commented Feb 6, 2025

It looks like catching that exception is probably the way to go to unblock this very official redshift support.

@galen-ft I will put up a PR that you should be able test against.

@galen-ft
Copy link
Author

galen-ft commented Feb 7, 2025

Just tried PR#10805 and it does work around the immediate issue:

t = rs_conn.table("table", database="prod.schema")
# I get the warning you defined:
[REDACTED].venv2/lib/python3.12/site-packages/ibis/backends/postgres/__init__.py:730: UserWarning: Failed to load hstore extension: column "typarray" does not exist in t
  warnings.warn(f"Failed to load hstore extension: {e}")

and then the following works as expected:

ibis.to_sql(t)
# SQLString('SELECT\n  *\nFROM "prod"."schema"."table"')

Could you please keep this issue open a little longer so that we can do some more testing?

SQLBackend.table() is no longer backward compatible

By the way, the following worked on ibis-framework==10.0.0.dev490:

t = rs_conn.table(name="table", database="prod.schema")

but no longer does on your branch (respectively, branch main):

t = rs_conn.table(name="table", database="prod.schema")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: SQLBackend.table() got some positional-only arguments passed as keyword arguments: 'name'

Seems like the first argument of method SQLBackend.table() is no longer a keyword arg (name), but is now a positional one (since commit 22de2b5).

Was this change intentional? If so, it causes errors when we instantiate tables by passing the name kwarg explicitly.

@cpcloud
Copy link
Member

cpcloud commented Feb 7, 2025

Was this change intentional? If so, it causes errors when we instantiate tables by passing the name kwarg explicitly.

Yes, this was intentional as part of addressing #9125.

@cpcloud
Copy link
Member

cpcloud commented Feb 7, 2025

Could you please keep this issue open a little longer so that we can do some more testing?

Yep, no worries!

@rudolfix
Copy link

the issue happens here:
https://github.com/ibis-project/ibis/blame/bbf98de4a71e4e9318919053cf2cdc8a0304a41e/ibis/backends/postgres/__init__.py#L725

hstore is not available on Redshift. to fix let UndefinedColumn to pass along with TypeError

@cpcloud
Copy link
Member

cpcloud commented Feb 12, 2025

I'm going to merge the PR associated with this issue, but I'll leave this issue open until you say so @galen-ft!

Thanks!

@galen-ft
Copy link
Author

Hi @cpcloud , thank you for being patient w.r.t. this issue. We are still writing a few more tests which we'd like to run against the Redshift backend. I will let you know once we're done.

Also, happy to have the PR merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis
Projects
Status: backlog
Development

No branches or pull requests

3 participants