server.views.populate_image gives outdated image even with refresh to the datasource #1531

Closed
david5010 opened this issue Nov 11, 2024 · 4 comments
Labels
help wanted A user needs help, may be a mistake, a bug or a feature request

Comments

@david5010

Describe the bug
Goal: I want to download up-to-date images from a Tableau sheet. I'm using server.views.populate_image(sheet_name, img_request_option). The call itself works, and I made sure to refresh the datasource before downloading, but the change isn't reflected in the downloaded image. When I open Tableau in my browser, everything seems fine.

However, when I try to refresh the workbook, I get this error:
409090: Bad Request Extract operation for the workbook '...' is not allowed.. (Extract operation for the workbook is not allowed.)
Versions
Details of your environment, including:

  • Tableau Server version (or note if using Tableau Online)
  • Python version: 3.11.10
  • TSC library version: 0.34

To Reproduce

import time

import tableauserverclient as TSC

def refresh_workbook(server, tableau_auth, workbook) -> None:
    # workbook is a str (name of the workbook)
    with server.auth.sign_in(tableau_auth):
        all_workbooks, _ = server.workbooks.get()
        workbook_obj = next((wb for wb in all_workbooks if wb.name == workbook), None)
        if not workbook_obj:
            print("Workbook not found.")
            return
        job_done = False
        print(f"Refreshing {workbook_obj.name}")
        refresh_workbook = server.workbooks.refresh(workbook_obj.id)
        while not job_done:
            job = server.jobs.get_by_id(refresh_workbook.id)  # Use the job id to get the latest status
            if job.finish_code == TSC.JobItem.FinishCode.Success:
                print("Refresh completed successfully.")
                job_done = True
            elif job.finish_code == TSC.JobItem.FinishCode.Failed:
                print("Error: Refresh job failed.")
                job_done = True
            else:
                print(f"Refresh in progress... Status: {job.progress}%")
                time.sleep(5)  # Wait a bit before checking again
        return

def refresh_db(server, tableau_auth, datasource) -> None:

    with server.auth.sign_in(tableau_auth):
        all_datasources, _ = server.datasources.get()
        datasource_obj = next((ds for ds in all_datasources if ds.name == datasource), None)
        if datasource_obj:
            try:
                refresh_job = server.datasources.refresh(datasource_obj)
                print(f"Refresh started for {datasource_obj.name}")
                job_done = False
            except Exception as e:
                print(f"Looking for queued job... {e}")
                all_jobs, _ = server.jobs.get()
                refresh_job = None
                for j in all_jobs:
                    job = server.jobs.get_by_id(j.id)
                    if job.datasource_name == datasource:
                        refresh_job = job
                        job_done = False
                        print("Found queued job")
                        break
                if refresh_job is None:
                    print("Couldn't find the queued job...")
                    return
                
            
            while not job_done:
                job = server.jobs.get_by_id(refresh_job.id)  # Use the job id to get the latest status
                if job.finish_code == TSC.JobItem.FinishCode.Success:
                    print("Refresh completed successfully.")
                    job_done = True
                elif job.finish_code == TSC.JobItem.FinishCode.Failed:
                    print("Error: Refresh job failed.")
                    job_done = True
                else:
                    print(f"Refresh in progress... Status: {job.progress}%")
                    time.sleep(5)  # Wait a bit before checking again
        else:
            print("Datasource not found.")

def get_img(site_id, server, workbook,
            sheet_name, width, height, outpath) -> None:
    
    tableau_auth = TSC.PersonalAccessTokenAuth(TOKEN_NAME, TOKEN_VALUE, site_id=site_id)
    t_server = TSC.Server(server, use_server_version=True)
    print(f"Refresh DataSource {DATASOURCE}")
    refresh_db(t_server, tableau_auth, DATASOURCE) # ! Testing, temporary might need fixing
    # print(f"Refresh Workbook {workbook}")
    # refresh_workbook(t_server, tableau_auth, workbook)
    with t_server.auth.sign_in(tableau_auth):
        all_workbooks, _ = t_server.workbooks.get()
        workbook_obj = next((wb for wb in all_workbooks if wb.name == workbook), None)

        if workbook_obj:
            print(f"Workbook Found: {workbook_obj.name}")
            t_server.workbooks.populate_views(workbook_obj)
            sheet_view = next((v for v in workbook_obj.views if v.name == sheet_name), None)
            
            if sheet_view:
                print(f"Sheet Found: {sheet_view.name}")
                image_req_option = TSC.ImageRequestOptions(
                                                        imageresolution=TSC.ImageRequestOptions.Resolution.High,
                                                        viz_height=height,
                                                        viz_width=width)  
                t_server.views.populate_image(sheet_view, image_req_option)
                img_file_name = f"{workbook}_{sheet_name}.png".replace(" ", "")
                with open(f"{outpath}/{img_file_name}", "wb") as file:
                    file.write(sheet_view.image)
                print(f"Screenshot saved at {outpath}/{img_file_name}")
        
            else:
                print("Sheet not found.")
        else:
            print("Workbook not found.")

Results
What are the results or error messages received?

NOTE: Be careful not to post user names, passwords, auth tokens or any other private or sensitive information.

@jorwoods
Contributor

First, let's address the core issue. Have you tried setting the maxage parameter in your ImageRequestOptions? Your code snippet doesn't show it. From the REST API Reference:

If you make multiple requests for an image, subsequent calls return a cached version of the image. This means that the returned image might not include the latest changes to the view. To decrease the amount of time that an image is cached, use the maxAge parameter.

max-age-minutes | (Optional) The maximum number of minutes a view image will be cached before being refreshed. To prevent multiple image requests from overloading the server, the shortest interval you can set is one minute. There is no maximum value, but the server job enacting the caching action may expire before a long cache period is reached.

The default data-access cache setting is "low", which means it will reuse the cache for as long as possible. From the tsm data-access docs:

Sets the frequency to refresh cached data with a new query to the underlying data source. You can specify a number to define the maximum number of minutes that data should be cached. You can also specify low to cache and reuse data for as long as possible, or always (equivalent to 0) to refresh data each time that a page is loaded. If this option is not specified, it defaults to low.
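
A minimal sketch of the maxage suggestion, reusing the image options from the original snippet. The helper names fresh_image_options and save_view_image are hypothetical, and server is assumed to be a TSC.Server that is already signed in. The value of 1 follows the one-minute minimum quoted from the REST API Reference above.

```python
def fresh_image_options(width, height, max_age_minutes=1):
    """Build image options that cap how long the server may cache the image."""
    # Imported inside the function so the helper can be defined even where
    # tableauserverclient is not importable at module load time.
    import tableauserverclient as TSC

    return TSC.ImageRequestOptions(
        imageresolution=TSC.ImageRequestOptions.Resolution.High,
        maxage=max_age_minutes,  # minutes; 1 is the shortest interval allowed
        viz_height=height,
        viz_width=width,
    )


def save_view_image(server, view, options, out_path):
    """Fetch the view image with the given options and write it to out_path."""
    # populate_image stores the PNG bytes on view.image.
    server.views.populate_image(view, options)
    with open(out_path, "wb") as fh:
        fh.write(view.image)
```

With this in place, the call in get_img would become something like save_view_image(t_server, sheet_view, fresh_image_options(width, height), path). Whether that is enough may also depend on the server-side data-access cache setting described above.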

Now, onto another issue I see:

all_workbooks, _ = t_server.workbooks.get()
workbook_obj = next((wb for wb in all_workbooks if wb.name == workbook), None)

if workbook_obj:
    print(f"Workbook Found: {workbook_obj.name}")
    t_server.workbooks.populate_views(workbook_obj)
    sheet_view = next((v for v in workbook_obj.views if v.name == sheet_name), None)

TSC endpoints also let you filter and sort what you get from the server, so you don't need to iterate over everything. Views even allow filtering directly by name and workbook name, so you could apply both criteria in one API call, reducing the load on your server and probably dramatically reducing the run time of your code.
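
As a rough sketch of that single filtered call, the lookup could be written as below. find_view is a hypothetical helper, server is assumed to be a signed-in TSC.Server, and the keyword names name and workbook_name are the queryset shorthand for the name and workbookName filter fields, so it is worth confirming them against your TSC version.

```python
def find_view(server, workbook_name, sheet_name):
    """Return the first view matching both names, or None if there is none.

    `server` is assumed to be a signed-in tableauserverclient Server.
    """
    # The filter is applied on the server, so only matching views come back
    # and there is no need to fetch the workbook and populate its views first.
    matches = server.views.filter(name=sheet_name, workbook_name=workbook_name)
    return next(iter(matches), None)
```

The returned view can be passed straight to server.views.populate_image, which replaces the workbooks.get, populate_views, and next(...) steps in the snippet quoted above.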

@jorwoods
Contributor

TSC also offers a wait_for_job method (on server.jobs) that handles polling for job status.
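
A short sketch of how wait_for_job could replace the hand-written polling loops in the original post. refresh_datasource_and_wait is a hypothetical helper, server is assumed to be signed in, and the 600-second timeout is an arbitrary choice. As far as I know, wait_for_job returns the finished job on success and raises an exception if the job fails, is cancelled, or times out.

```python
def refresh_datasource_and_wait(server, datasource, timeout=600):
    """Start an extract refresh and block until the job finishes.

    `server` is assumed to be a signed-in tableauserverclient Server and
    `datasource` a DatasourceItem (or its id).
    """
    # refresh() queues the job and returns a JobItem describing it.
    job = server.datasources.refresh(datasource)
    # wait_for_job polls the job status until it reaches a final state.
    return server.jobs.wait_for_job(job, timeout=timeout)
```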

@jorwoods
Contributor

jorwoods commented Nov 19, 2024

Another thing that occurs to me is that server.workbooks.get is a paginated endpoint, and you do not handle the multiple pages. So if your intended workbook is not within the first 100 results, you end up running the "workbook not found" block. And since no sorting was provided to the .get method, the workbooks are returned in a non-deterministic order.

Even though it's a common pattern, even within the samples of this library, to call the variable all_workbooks or all_datasources, it's actually a misnomer. It should really be one_page_of_workbooks.

I recommend using the queryset methods (.all and, in this case, .filter) over calling .get directly. You can also use TSC.Pager to wrap the endpoint and turn it into an iterable, so you can iterate directly over all results on the server. Note that .all and .filter are themselves directly iterable and handle pagination automatically, so they do not need to be passed into TSC.Pager.
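
To make the difference from .get concrete, here is a small sketch of the queryset approach. find_workbook and all_workbook_names are hypothetical helpers, and server is assumed to be a signed-in TSC.Server.

```python
def find_workbook(server, name):
    """Return the first workbook with this exact name, or None."""
    # .filter() queries by name on the server and pages through the results,
    # so the match is found even if it is not among the first 100 workbooks.
    return next(iter(server.workbooks.filter(name=name)), None)


def all_workbook_names(server):
    """Collect the names of every workbook on the site, across all pages."""
    # .all() is iterable and fetches further pages as needed.
    return [wb.name for wb in server.workbooks.all()]
```

Iterating TSC.Pager(server.workbooks) should give the same result as .all() here; the queryset form just avoids the extra wrapper.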

@jacalata added the "help wanted" label (A user needs help, may be a mistake, a bug or a feature request) on Jan 4, 2025
@jacalata
Contributor

jacalata commented Jan 4, 2025

Seems like you covered all the likely angles. (I'm assuming the Bad Request: Extract Operation not allowed was because the workbook connects directly to the datasource, not to an extract)

We should fix that pattern in the samples. I'll open a separate work item for some cleanup I've been doing anyway.
