Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to configure DuckDB datasource #1033

Open
MarcSkovMadsen opened this issue Feb 2, 2025 · 0 comments
Open

Document how to configure DuckDB datasource #1033

MarcSkovMadsen opened this issue Feb 2, 2025 · 0 comments

Comments

@MarcSkovMadsen
Copy link

MarcSkovMadsen commented Feb 2, 2025

I would like to be able to use the DuckDBSource data source. But its not enough for me to just read datafiles. I need to

pre-configure

timezone + home directory + azure blobstorage. I cannot use this in production without setting the home_directory to /tmp as its the only writeable /path for security reasons.

An example code section is I normally run is:

connection = duckdb.connect(str(db_path))

    connection.sql(
        """
SET TIMEZONE='Europe/Copenhagen';
"""
    )

    if _use_tmp_dir_instead_of_home_dir():  # nosec
        connection.sql("SET home_directory='/tmp';")
    if isinstance(source_filesystem, AzureBlobFileSystem):
        query = f"""
            INSTALL azure;
            LOAD azure;
            SET azure_transport_option_type='curl';
            CREATE OR REPLACE SECRET secret1 (
                TYPE AZURE,
                CONNECTION_STRING '{config._blob_storage_connection_string}'
            );
        """  # noqa: SLF001
        connection.sql(query=query)
        prefix = "az://"

    elif isinstance(source_filesystem, DirFileSystem):
        prefix = f"{source_filesystem.path}/"

post-configure

After loading the tables/ files I would like to create a custom view combining the tables. That is really the data set I would like to query:

    query = """
CREATE OR REPLACE VIEW
    timeseries_view AS
SELECT
   ...
FROM
    timeseries_results ts
    join model_job ..
    join model ...
    left join job_attribute ja ...
    """  # noqa: E501
    connection.sql(query)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant