Automating download of UHC files from website #534
Replies: 12 comments 17 replies
-
I would think that this and similar techniques that make the files hard to download are not in keeping with the language of the rule (bold added by me): "The Departments also proposed to allow plans and issuers flexibility to publish the files in the locations of their choosing based upon their superior knowledge of their website traffic and the places on their website where the machine-readable files would be readily accessible by the intended users."
The intended users are defined in other language in the rule: "The intended audience also includes health care stakeholders such as researchers, legislators, and regulators, as well as application developers who could make the information usable and easily understood by laypersons. Accordingly, application developers will be able to access the data in a format that is easily used and understood using skills common to application developers."
Assuming you are an "application developer", your description of the problem you are facing when downloading the file makes it clear that what is required is beyond the "skills common to application developers". This may have been unintentional, simply web developers not thinking about the intended users. In that case, it should be easy for UHC to fix this by improving their presentation of the list of files.
The language of the guidance on GitHub can also be improved to make it clear that the files should be easy to download. "These machine-readable files must be made available to the public without restrictions that would impede the re-use of that information." was a good start, but it didn't anticipate all of the possible ways one could make it hard to download the MRFs.
-
Thank you for your response. I tried that URL and it didn't work. Everything after the question mark (?) is the query string, which usually carries information that the server-side program or the JavaScript interprets, so I wasn't optimistic this would work. And when you cut and paste everything before the ?, https://mrfstorageprod.blob.core.windows.net/mrf-even/2022-07-03_UnitedHealthcare-Insurance-Company_Navigate_allowed-amounts.json.gz, you get a message from the server saying that we are not authorized. So this clearly wasn't intended for the public to access directly; it is handled by a script that retrieves the file from its actual location and returns it to the browser.
-
As an Azure developer, I understand that URL format. It's a lease to access the actual file (in Azure terms, a shared access signature, or SAS, URL), and that lease expires after a certain time, at which point you must generate a new lease URL. Azure Blob Storage handles the internals of validating the query string.
I use this pattern extensively in scenarios that require high security. What makes their implementation almost pointless is that their API to generate the leases is public, so it's not actually secure.
I guess they could implement some kind of DDoS protection via the API, but really they could let Azure handle that.
In short, I support flagging this as a violation of the spirit of the spec. They should just post a link to publicly accessible Azure blob storage (or a CDN).
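To make the expiry behaviour concrete, here is a minimal Python sketch that reads the expiry time out of such a URL. It assumes the URL carries the standard SAS `se` (signed expiry) query parameter in the `YYYY-MM-DDThh:mm:ssZ` form; the example URL, its blob name, and its parameter values are made up for illustration and are not taken from UHC's site.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Hypothetical SAS URL: the blob name, parameter values, and signature
# are placeholders, not real credentials.
EXAMPLE_SAS_URL = (
    "https://mrfstorageprod.blob.core.windows.net/mrf-even/example_index.json"
    "?sv=2021-06-08&se=2022-07-08T00%3A00%3A00Z&sr=b&sp=r&sig=PLACEHOLDER"
)


def sas_expiry(sas_url):
    """Return the expiry time from the `se` parameter, or None if absent."""
    values = parse_qs(urlsplit(sas_url).query).get("se")
    if not values:
        return None
    # Assumption: `se` uses the full-seconds UTC form, e.g. 2022-07-08T00:00:00Z.
    parsed = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(sas_url, now=None):
    """True if the URL has an `se` value that is already in the past."""
    expiry = sas_expiry(sas_url)
    if expiry is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now >= expiry
```

A downloader could call `is_expired` before each request and ask for a fresh URL when it returns true, rather than waiting for the storage service to reject the request.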
-
I have spent a full week trying to figure out how to open these stupid files. This is NOT TRANSPARENT. They have made this as challenging as possible!
-
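Not the best, but you can grab the file names from https://uhc-sas-function-apim-prod.azure-api.net/api/blobs-function
Then use the filename to request the sasUrl from https://uhc-sas-function-apim-prod.azure-api.net/api/blob-sas?blobName=2022-07-01_02-GLOBAL-CHAUFFEURED-SERVICE_index.json
Then use the sasUrl to download the file.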
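The three steps above could be sketched in Python roughly as follows. This is a hedged sketch, not a tested client: it assumes the two endpoints still respond as described in this thread (a `blobs` list whose items have a `name`, and a response object with a `sasUrl` field), and the helper names are made up for illustration.

```python
import json
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the endpoints quoted in this thread.
API_BASE = "https://uhc-sas-function-apim-prod.azure-api.net/api"


def blob_names(listing):
    """Step 1 helper: pull the file names out of the blobs-function response."""
    return [blob["name"] for blob in listing.get("blobs", [])]


def sas_request_url(blob_name):
    """Step 2 helper: build the blob-sas request URL for one file name."""
    return f"{API_BASE}/blob-sas?{urlencode({'blobName': blob_name})}"


def fetch_json(url, timeout=60):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def iter_download_urls(limit=None):
    """Yield (file name, sasUrl) pairs, optionally only for the first `limit` files."""
    names = blob_names(fetch_json(f"{API_BASE}/blobs-function"))
    for name in names[:limit]:
        yield name, fetch_json(sas_request_url(name))["sasUrl"]


def download(sas_url, destination):
    """Step 3: stream the file behind a sasUrl to disk."""
    with urlopen(sas_url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

For example, looping over `iter_download_urls(limit=5)` and calling `download(url, name)` for each pair would fetch the first five listed files. Because each sasUrl expires, it is safer to request it immediately before the download than to collect all of them up front.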
-
Awesome! I actually found that a simple answer is to use a headless browser with Playwright. It’s not fast, but it gets the job done.
On Jul 17, 2022, at 7:43 PM, TJ wrote:
Not the best but you can grab the file names from
https://uhc-sas-function-apim-prod.azure-api.net/api/blobs-function

    {
      "blobs": [
        {
          "name": "2022-07-01_02-GLOBAL-CHAUFFEURED-SERVICE_index.json",
          "origin": "uhc"
        },
        {
          "name": "2022-07-01_1-BETTER-LLC_index.json",
          "origin": "uhc"
        },
        ...

Then use the filename to request the sasUrl from
https://uhc-sas-function-apim-prod.azure-api.net/api/blob-sas?blobName=2022-07-01_02-GLOBAL-CHAUFFEURED-SERVICE_index.json

    {
      "sasUrl": "https://mrfstorageprod.blob.core.windows.net/mrf-even/2022-07-01_02-GLOBAL-CHAUFFEURED-SERVICE_index.json?blahblahblah"
    }

Then use the sasUrl to download the file.
-
@Tkurzendoerfer did you ever find a good workaround? I'm currently dealing with this issue and am curious whether there have been any further developments.
-
There was a suggestion that I haven’t tried yet but that sounds like it would work. The suggestion was to use an object in .NET that simulates a browser.
-
Yes, I reported it, but I have not heard back from CMS. We have reported HIPAA EDI issues to CMS in the past, and we usually at least receive an acknowledgement that they received the notice and can get some status information. But this process is new, and it may take some time to get procedures in place to inform us about status and any decisions that have been made.
On Thursday, July 28, 2022 8:23 AM, Sam Pullman wrote:
I believe a report was made. I've been on UHC's site and it does not appear that they've fixed the issue yet; still lots of JS.
-
This is a Python snippet that will provide you with all the download URLs.
-
Just use a headless browser. It’ll render the JavaScript. Then, once the HTML is fully loaded, find the elements you want and send a click to each one, one at a time. It’s best to use ‘async’ here. As long as your code is told to expect a download, you can then save the download with the proper name and move on to the next file.
You don’t need all files, which is what makes this approach tenable.
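A rough sketch of that approach, using Playwright's async Python API, might look like the following. The link selector and the keyword filter are assumptions made for illustration (the page's real markup isn't described in this thread), so the selector would need adjusting after inspecting the rendered page.

```python
import asyncio
from pathlib import Path

PAGE_URL = "https://transparency-in-coverage.uhc.com/"
# Hypothetical selector: assumes each file is exposed as a clickable <a> element.
LINK_SELECTOR = "a"


def wanted(link_text, keywords):
    """True if the link text contains any keyword (case-insensitive)."""
    text = link_text.lower()
    return any(keyword.lower() in text for keyword in keywords)


async def download_matching(keywords, out_dir="downloads"):
    """Click every matching link in a headless browser and save each download."""
    # Imported here so the pure helper above works without Playwright installed.
    from playwright.async_api import async_playwright

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        # Wait until network activity settles so the JavaScript-built list exists.
        await page.goto(PAGE_URL, wait_until="networkidle")
        links = page.locator(LINK_SELECTOR)
        for i in range(await links.count()):
            link = links.nth(i)
            if not wanted((await link.inner_text()).strip(), keywords):
                continue
            # Tell Playwright to expect a download before sending the click.
            async with page.expect_download() as download_info:
                await link.click()
            download = await download_info.value
            target = out / download.suggested_filename
            await download.save_as(target)
            saved.append(target)
        await browser.close()
    return saved
```

Running `asyncio.run(download_matching(["better-llc"]))` would then fetch only the files whose link text mentions that name, which matches the point that you rarely need every file.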
-
@gardnmi Is there a way we can connect on Zoom or Google Meet? I would like to ask some questions and exchange some ideas with you. Thank you.
-
Apparently UHC is using JavaScript on their webpage in a way that makes automating the download of files out of the ordinary.
Has anyone found a way to automate the downloading of files? Here is an example from one of their index files: https://transparency-in-coverage.uhc.com/?file=2022-07-03_UnitedHealthcare-Insurance-Company_Navigate_allowed-amounts.json.gz&origin=uhc
But that only works in a browser; if we try to download it with WebClient in C#, it only downloads the web page HTML.
Surely someone has run into this problem already and has found a way to accomplish the intended purpose of the index file, which is to help automate the downloading of the files in the index.
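As a small illustration of why a plain HTTP client only gets HTML back: the link in the index file points at a web page, and the actual file name travels in the `file` query parameter. The Python sketch below pulls that name (and `origin`) out of such a link. Whether that name can be passed as-is to the file-serving API mentioned in the replies is an assumption, not something confirmed in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def file_and_origin(index_link):
    """Return the (file, origin) query parameters from an index-file link."""
    params = parse_qs(urlsplit(index_link).query)
    file_name = params["file"][0]
    origin = params.get("origin", [None])[0]
    return file_name, origin


link = (
    "https://transparency-in-coverage.uhc.com/"
    "?file=2022-07-03_UnitedHealthcare-Insurance-Company_Navigate_allowed-amounts.json.gz"
    "&origin=uhc"
)
print(file_and_origin(link))
```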