multiple partitions for hive not delimited correctly #234

@Robert-Christensen-visa

Description

To get the list of Hive partitions for a table, dask-sql runs the command SHOW PARTITIONS:

def _parse_hive_partition_description(
    self,
    cursor: Union["sqlalchemy.engine.base.Connection", "hive.Cursor"],
    schema: str,
    table_name: str,
):
    """
    Extract all partition information for a given table
    """
    cursor.execute(f"USE {schema}")
    result = self._fetch_all_results(cursor, f"SHOW PARTITIONS {table_name}")
    return [row[0] for row in result]

For tables with multiple partition keys, each returned entry is a single string in which the key=value pairs are delimited by /. For example, if the table has two partition keys, date and region, one of the returned entries might look like this:

date=20210101/region=south
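
To make the structure of such an entry concrete, here is a minimal sketch that splits one SHOW PARTITIONS row into its key/value pairs. The helper name split_partition_entry is hypothetical and not part of dask-sql. The sketch assumes that / only appears as the separator between keys and that every piece contains an =.

```python
from typing import Dict


def split_partition_entry(entry: str) -> Dict[str, str]:
    """Split a SHOW PARTITIONS row such as 'date=20210101/region=south'.

    Hypothetical helper for illustration only; not part of dask-sql.
    """
    pairs = {}
    for piece in entry.split("/"):
        # Split on the first '=' only, so a value containing '=' stays intact.
        key, _, value = piece.partition("=")
        pairs[key] = value
    return pairs


print(split_partition_entry("date=20210101/region=south"))
# {'date': '20210101', 'region': 'south'}
```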

This string is used without modification to get additional information about each partition via DESCRIBE FORMATTED {table_name} PARTITION ({partition}). When the table has multiple partition keys, however, this command expects the key=value pairs to be separated by commas (,), not by /.

if partition:
    result = self._fetch_all_results(
        cursor, f"DESCRIBE FORMATTED {table_name} PARTITION ({partition})"
    )
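
One possible fix, sketched here under assumptions, is to convert the /-delimited entry into a comma-separated partition spec before building the statement. The helper names to_partition_spec and describe_partition_query are hypothetical and do not exist in dask-sql. Quoting every value as a string is also an assumption: Hive generally expects string partition values to be quoted in a PARTITION clause, but this sketch does not escape quotes inside values or decode any escaped characters in the entry.

```python
def to_partition_spec(entry: str) -> str:
    """Turn 'date=20210101/region=south' into 'date="20210101", region="south"'.

    Hypothetical helper; assumes '/' only separates key=value pairs.
    """
    parts = []
    for piece in entry.split("/"):
        # Split on the first '=' only and quote the value as a string literal.
        key, _, value = piece.partition("=")
        parts.append(f'{key}="{value}"')
    return ", ".join(parts)


def describe_partition_query(table_name: str, entry: str) -> str:
    """Build the DESCRIBE FORMATTED statement for one partition entry."""
    return f"DESCRIBE FORMATTED {table_name} PARTITION ({to_partition_spec(entry)})"


print(describe_partition_query("my_table", "date=20210101/region=south"))
# DESCRIBE FORMATTED my_table PARTITION (date="20210101", region="south")
```

A table with a single partition key passes through the same code path, since an entry without a / yields exactly one quoted pair.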

Because of this, dask-sql throws an error when I try to read a table with multiple Hive partition keys.

Issue #179 might also be related to the example above.

Labels

bug (Something isn't working), hive (Improvements to or issues with Hive functionality), needs triage (Awaiting triage by a dask-sql maintainer)
