
Upgrade Azure Blob Storage SDK to v12 #2573

Open · pamelafox wants to merge 37 commits into master

Conversation

@pamelafox (Contributor)

Fixes #2566

This PR upgrades the azure-storage-blob SDK from v2 to v12, which involved a lot of interface changes. I followed the migration guide @ https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/storage/azure-storage-blob/migration_guide.md and was able to get all the previous functionality working, at least according to the tests.
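To give a sense of the scale of the change, here's a rough sketch of the same read operation in both SDK versions (illustrative only, not code from this PR; each half assumes the corresponding SDK version is installed, and the container/blob names are placeholders):

```python
conn_str = "UseDevelopmentStorage=true"  # emulator/Azurite shortcut connection string

# azure-storage-blob v2 (legacy interface)
from azure.storage.blob import BlockBlobService

service = BlockBlobService(connection_string=conn_str)
data = service.get_blob_to_bytes("files", "dummy.txt").content

# azure-storage-blob v12
from azure.storage.blob import BlobServiceClient

client = BlobServiceClient.from_connection_string(conn_str)
container = client.get_container_client("files")
data = container.get_blob_client("dummy.txt").download_blob().readall()
```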

For ease of testing, I added a simple example app and a devcontainer.azure.json which brings in the Azurite local emulator. That means you can open this repo inside a Codespace or Dev Container with that configuration, and Azure Blob Storage will be running for you.

@pamelafox (Contributor Author)

There's an admin test that's failing in CI but not locally, so I'm going to add some debugging in the next commits to try to figure out what's happening.

@samuelhwilliams (Contributor)

FYI, it's failing for me locally too 👀 Not that that helps much... but I can investigate a bit myself and see if I can find anything.

@samuelhwilliams (Contributor)

If it helps:

/Users/sam/work/personal/flask-admin/flask_admin/contrib/fileadmin/azure.py(220)read_file()
-> blob = self._container_client.get_blob_client(path).download_blob()
(Pdb) ll
215         def read_file(self, path):
216             path = self._ensure_blob_path(path)
217             if path is None:
218                 raise ValueError("No path provided")
219             breakpoint()
220  ->         blob = self._container_client.get_blob_client(path).download_blob()
221             return blob.readall()
(Pdb) pp path
'dummy.txt'
(Pdb) n
azure.core.exceptions.DeserializationError: Unable to deserialize response data. Data: undefined, bytearray
> /Users/sam/work/personal/flask-admin/flask_admin/contrib/fileadmin/azure.py(220)read_file()
-> blob = self._container_client.get_blob_client(path).download_blob()

@pamelafox (Contributor Author)

Thanks! Might be a statefulness thing, I'll start a fresh env.

@LeXofLeviafan (Contributor)

...Wouldn't it be better to include actual information in that error variable? I.e. make it an optional string (containing the error message if an error occurred).

The check would work the same, and the error could be logged properly before redirect (...as it probably should be, anyway.)
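Roughly what I mean (a hypothetical sketch; the names and structure are illustrative, not the actual flask-admin code):

```python
import logging

logger = logging.getLogger(__name__)

def delete_file(container_client, path):
    """Return None on success, or the error message if deletion failed."""
    try:
        container_client.delete_blob(path)
    except Exception as exc:
        return str(exc)
    return None

# The truthiness check stays the same, but now there's something to log:
#     error = delete_file(container_client, "dummy.txt")
#     if error:
#         logger.warning("Delete failed: %s", error)
#         # ... redirect as before
```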

@pamelafox (Contributor Author)

@samuelhwilliams It was an error due to the tests using an older Azure storage emulator, I've updated them to the official Microsoft hosted emulator now, and they're passing fine.
@LeXofLeviafan I agree that it would be nice if the errors were easier to debug. They should probably be logged as either warning or error, depending on whether they're likely user errors or server errors. That would be better in a different PR though, I think. I'm trying to only affect the Azure module for this one.

@@ -0,0 +1,14 @@
// For format details, see https://aka.ms/devcontainer.json.
{
"name": "flask-admin (Python + Azurite)",
@pamelafox (Contributor Author)

In the future, I could add Postgres and Mongo to this dev container too, to have a single container that can run all the tests. It should be fairly easy, given that tests.yaml already has the services set up; it's mostly a matter of copying that into docker-compose.yaml.

@pamelafox (Contributor Author)

Hm, it appears I still don't have an approach for handling unimportable Azure modules that makes mypy happy. Let me know if you have any thoughts about the best practice for importing extra modules in tests (and skipping tests if they don't exist). I'll take another look Monday otherwise.
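For reference, the usual pytest pattern for optional dependencies is importorskip (a sketch; the import path shown is illustrative):

```python
import pytest

# Skip every test in this module when the SDK isn't installed,
# instead of wrapping the import in try/except.
pytest.importorskip("azure.storage.blob")

from flask_admin.contrib.fileadmin.azure import AzureFileAdmin  # noqa: E402
```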

@samuelhwilliams (Contributor)

I don't think we need to support tests running without azure installed - I'd probably be fine with you removing the try/except around that test import.

@pamelafox (Contributor Author)

Okay, I've made it so that the tests assume you've got azure-storage-blob installed. I've also made the tests devcontainer bring in Postgres and Mongo, so I was able to get all the tests passing in my dev container environment without additional setup.

Comment on lines 56 to 57
self._container_name = container_name
self._connection_string = connection_string
@samuelhwilliams (Contributor)

One of the changes I made on the s3 admin side when bringing it up to date was to have __init__ take a client instance rather than parameters that get passed to the client.

Do you think we should do something similar here and accept an instance of BlobServiceClient, or is it still fine to just use the connection string?

@pamelafox (Contributor Author)

Oh, I think that's nice; I personally don't typically use connection strings (this was my first time using from_connection_string), so that gives developers more flexibility in how they connect. I can make that change. That'd be breaking, right?

@samuelhwilliams (Contributor)

It would be, but this is scheduled to go out for the v2 release where we're making a bunch of breaking changes, so I'm ok with it.

If you'd be happy to, feel free :) 🙏
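Something like this minimal sketch is what I have in mind (simplified and hypothetical; the real class would have more going on):

```python
from azure.storage.blob import BlobServiceClient

class AzureStorage:
    """Sketch: accept a ready-made client instead of a connection string."""

    def __init__(self, blob_service_client: BlobServiceClient, container_name: str):
        self._container_client = blob_service_client.get_container_client(container_name)

# Callers then decide how to authenticate, e.g. with a connection string:
#     client = BlobServiceClient.from_connection_string(conn_str)
# or keylessly:
#     from azure.identity import DefaultAzureCredential
#     client = BlobServiceClient(account_url, credential=DefaultAzureCredential())
```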

pamelafox marked this pull request as draft on December 3, 2024.
@samuelhwilliams (Contributor)

I can still reproduce an error on file renaming that looks like this:

[screenshot of the rename error]

From a bit of playing around, it seems limited to large files (4 MB+ or so) rather than JPGs or a specific file type. Can you reproduce with that?

@pamelafox (Contributor Author)

I tried a larger file, no luck, but can you attach your large file and tell me what you tried renaming it to? In theory, I should be able to replicate it, given we're in a containerized environment. If I still can't, I'll DM you next week on Discord and see if we can hop on a screen share or something.

@pamelafox (Contributor Author)

Following up on bug bash issues:

  • Timestamps of directories: that was an existing issue, where directories always passed down a last modified of 0, for both Azure and S3. However, I went ahead and changed the logic for Azure to grab the timestamp, since that looks nicer.
  • UTC timezones: that's expected for flask-admin; Samuel has an example checked in that shows how to add user-specific timezone rendering for those who want it.
  • Possible memory leak with send_file: According to my research, there shouldn't be a leak.

@pamelafox (Contributor Author)

I am still working through an issue with "rename" when connecting to a prod account using the connection string (versus using OAuth token-based connection).

@samuelhwilliams (Contributor)

> I tried a larger file, no luck, but can you attach your large file and tell me what you tried renaming it to? In theory, I should be able to replicate it, given we're in a containerized environment. If I still can't, I'll DM you next week on Discord and see if we can hop on a screen share or something.

Unfortunately, all of the files I've ended up reproducing it with are personal ones that I'd rather not share (e.g. a passport scan) 😂 I'm willing to accept that this might be something about my setup...

pamelafox marked this pull request as ready for review on January 6, 2025.
@pamelafox (Contributor Author)

@samuelhwilliams I worked a bunch on this today, with help from an Azure Blob SDK engineer, and ended up changing the rename method so that it uses an asynchronous copy behind the scenes (versus synchronous).

Now the code works for me for:

  • Local Azurite
  • Prod Azure account with keyless auth
  • Prod Azure account with connection string

I tested with files up to 5 GB on my prod Azure account and was able to rename them all.
I don't know if this would resolve the issue you were seeing with Azurite, however, since I was never able to replicate that. It's possible.

I do not currently have any other planned changes, so I've marked this as ready for review again.

@samuelhwilliams (Contributor) left a review

@pamelafox really nice work - sorry it's taken a bit of time to get around to testing this out locally.

When I upload files in a sub-directory, then go back to the root, the directory shows up multiple times (seemingly once for itself and once for each item in the directory). This is with local Azurite:

[screen capture: duplicate directory entries, 2025-01-19]

Just spent a bit of time trying to test with a real Azure account, but I'm currently getting errors when trying to sign up for a storage account, so that's great 🙃

This feels close though - only spotted that one issue, really. A few other comments but nothing major.

Comment on lines +38 to +39
if __name__ == "__main__":
app.run(debug=True)
@samuelhwilliams (Contributor)

Thanks again for adding the example - useful 👍

src_blob_client = self._container_client.get_blob_client(src)
dst_blob_client = self._container_client.get_blob_client(dst)
copy_result = dst_blob_client.start_copy_from_url(src_blob_client.url)
if copy_result.get("copy_status") == "success":
@samuelhwilliams (Contributor)

Is it well known in what conditions blob storage will decide to do this synchronously vs asynchronously? Is it just based on file size, or some other information?

I'm fairly uncomfortable with the sleep(10) below in the async journey. It was previously sleep(1), which is better, but I guess I'd prefer to avoid sleeping in a request at all.

I suppose if we're keeping the rename operation as a copy+delete, we probably can't just move on immediately if the copy hasn't finished, so maybe this is an OK workaround if async is only going to happen for really large files. It might still be nice to poll for updates more frequently than every 10 seconds, though?

@pamelafox (Contributor Author)

Good question! I was never able to get it to go down this path, even with large files. I tested up to 20 GB, and it still didn't go down the polling path. The engineer said the Blob service decides behind the scenes when to use async vs sync, and that perhaps it's always using sync for a same-account copy. I've asked if we can find that out from the Blob service team for sure.
I'm also asking if we can decrease the sleep timeout.

@pamelafox (Contributor Author)

Update: The SDK engineer said sleep(1) is good, so I've changed to that.
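For the record, the copy-then-poll flow now looks roughly like this (a sketch of the approach rather than the exact PR code; the retry bound is illustrative):

```python
import time

def _copy_blob(container_client, src, dst):
    src_blob_client = container_client.get_blob_client(src)
    dst_blob_client = container_client.get_blob_client(dst)
    copy_result = dst_blob_client.start_copy_from_url(src_blob_client.url)
    if copy_result.get("copy_status") == "success":
        return  # the service performed a synchronous copy
    # The service chose an asynchronous copy: poll until it completes.
    for _ in range(60):  # illustrative bound, roughly one minute
        props = dst_blob_client.get_blob_properties()
        if props.copy.status == "success":
            return
        time.sleep(1)  # per the SDK engineer's advice above
    raise TimeoutError(f"Copy from {src} to {dst} did not complete in time")
```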

@LeXofLeviafan (Contributor)

…There seems to have been no activity here for a few weeks now. Has work on this pull request been paused? 🤔

@pamelafox (Contributor Author)

Sorry, just had some busy weeks! I will hopefully have time this week, if not, certainly next week.

@samuelhwilliams (Contributor)

Really appreciate your hard work and persistence on this @pamelafox - thank you 🙇

@pamelafox (Contributor Author)

@samuelhwilliams I'm pushing a fix for the double-directory issue. That was introduced when I tried to remedy the issue of the Date field displaying a timestamp of 0:

[screenshot: Date column showing the 1970 timestamp, 2025-02-11]

However, from what I can see in the S3 code, it might have the same issue, and I think this has always been an issue for Azure. Removing that 1970 date would be a larger change involving the BaseFileAdmin and possibly the templates, depending on how sorting works.

For the record, Azure shows blank dates for directories, since directories aren't real entities:
[screenshot: blank dates for directories in Azure, 2025-02-11]

I assume that I should just leave that be, for now? Let me know what you think.

@samuelhwilliams (Contributor)

I think it's fine to leave the timestamp issue for directories for now, agreed. That can be addressed separately 👍

@pamelafox (Contributor Author)

Okay, then the only pending question is whether we can get clarity on when/whether the async copy path gets used. I've sent an inquiry to the Blob storage team, but I don't know if I'll get a response.

I've filed the date-for-directories issue here:
#2598
