feat(RELEASE-1246): implement script to push files to CGW #361

seanconroy2021 · 2025-02-03T03:14:24Z

This script automates pushing files to the CGW:

Reads metadata and processes files from the content directory.
Checks if a file already exists in the product version based on the label, short URL, and download URL, skipping duplicates to avoid failure.
Creates new files and returns their IDs upon success.
Generates a JSON report summarising the number of files created, skipped, and the metadata used.

RELATES TO: konflux-ci/release-service-catalog#781

mmalina · 2025-02-04T12:18:52Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

Is this basically a copy of the python script that was originally in the task? Or is it that plus changes for the idempotency? Or anything else?

The only thing that is the same is generate_download_url and generate_metadata with the new script calling the endpoint itself instead of using pub tools. The way it works now is it gets the product ID and product version, creates the metadata, and then checks each item in metadata against the existing files in the product version based on the label, short URL and download URL if they match, it will skip the creation of the item.

ok, thanks. it would be good if Scott or Parthey could review. I pinged them here in the PR.

mmalina · 2025-02-04T13:11:28Z

@swickersh @pkhander can you review this please?

swickersh · 2025-02-04T14:11:56Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

+6. Creates new files using the metadata.
+7. Rolls back created files if an error occurs during execution.
+8. Writes the final result, including processed, created, and skipped files, to a JSON file.
+9. Outputs the path of the generated result.json file to the an output file.


extra 'the' or 'an'

swickersh · 2025-02-04T14:15:25Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

+from requests.auth import HTTPBasicAuth
+
+# Default values for each component,
+# values from data_file takes presedence over these


typo in precedence

swickersh · 2025-02-04T14:37:44Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

+                "CGW_USERNAME and CGW_PASSWORD environment variables are required"
+            )
+
+        auth = HTTPBasicAuth(USERNAME, PASSWORD)


It would be better to create a requests.session() instead of re-authenticating with every API call.

swickersh · 2025-02-04T14:42:13Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

+
+
+def generate_metadata(
+    content_dir, components, product_Code, version_id, version_name, mirror_openshift_Push


This is a lot of positional args and could be error prone.
I'd recommend keyword args for all these functions that have more than 2 parameters.

swickersh · 2025-02-04T15:44:08Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

+2. Retrieves the product ID using the provided product name and product code.
+3. Retrieves the version ID using the product version name.
+4. Generates metadata for each file in the content directory.
+5. Checks for existing files and skips them if they match the label, short URL, and download


I think this is more complex than what we discussed.
Instead of checking for existing files and skipping, it's more efficient to wrap the original call in a try/except block.
If (and only if) the error returned from Content Gateway is something along the lines of Short URL is already present in the system! or File is already present in the system! then we would retry the API call as an UPDATE.

Any other failures should not be ignored or trigger a rollback as anything else would be unexpected and we would want to fail and investigate.

+1 to use a try.except block and UPDATE on those two cases. That will save extra API calls to get all the info from CGW every time.

I like the rollback option thou! This ensures that we do not release "half" content when a release fails.

We can keep the rollback functionality so if the file is already present or download URL is already present, it will skip the creation else any other errors it will fail and rollback the previously created files

I guess the roll back functionality could be kept. So just to clarify the workflow would be something like this:

try to create files

if we get already present error, attempt the api call again, but this time as UPDATE

if we get any other error, attempt to rollback (delete) whatever files were just successfully pushed during this pipeline and then exit/fail the pipelinerun.

So the idea was that if the file is already present due to a short URL or download URL, there is no need to update it; instead, just skip, as the content should not change on a re-run.

Parthey and I synced up real quick. Since the main point of this effort is to be idempotent, then we can skip the small edge case of updating. (That is just a nice-to-have if something were tweaked)
But I still just wanted to be clear, you plan to remove the stuff where you're fetching if the files exist first or not, correct? Because you can simply determine they exist when you attempt to create them and are met with a 'already present' error from the API.

Also, when you are iterating through the files, if one exists already and warrants skipping, can you print a line to the logs stating it was skipped for that reason.

Yeah, I will remove the fetching of the files and switch to try/except to see if the file can be created, then output the success or skip.

It would be nice to have the update, but without getting each file ID, I can't call the update endpoint.

This script automates pushing files to the CGW: - Reads metadata and processes files from the content directory. - Checks if a file already exists in the product version based on label, short URL, and download URL, skipping duplicates to avoid failure. - Creates new files and returns their IDs upon success. - Generates a JSON report summarizing the number of files created, skipped, and the metadata used. Signed-off-by: Sean Conroy <[email protected]>

seanconroy2021 requested a review from a team as a code owner February 3, 2025 03:14

seanconroy2021 force-pushed the RELEASE-1246 branch 2 times, most recently from 92c25c4 to c0f6b99 Compare February 4, 2025 09:45

mmalina reviewed Feb 4, 2025

View reviewed changes

swickersh reviewed Feb 4, 2025

View reviewed changes

seanconroy2021 force-pushed the RELEASE-1246 branch from c0f6b99 to 1871712 Compare February 6, 2025 12:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(RELEASE-1246): implement script to push files to CGW #361

feat(RELEASE-1246): implement script to push files to CGW #361

seanconroy2021 commented Feb 3, 2025

mmalina Feb 4, 2025

seanconroy2021 Feb 4, 2025

mmalina Feb 4, 2025

mmalina commented Feb 4, 2025

swickersh Feb 4, 2025

swickersh Feb 4, 2025

swickersh Feb 4, 2025

swickersh Feb 4, 2025

swickersh Feb 4, 2025

pkhander Feb 5, 2025

seanconroy2021 Feb 6, 2025 •

edited

Loading

swickersh Feb 6, 2025

seanconroy2021 Feb 6, 2025 •

edited

Loading

swickersh Feb 6, 2025

seanconroy2021 Feb 6, 2025



		def generate_metadata(
		content_dir, components, product_Code, version_id, version_name, mirror_openshift_Push

feat(RELEASE-1246): implement script to push files to CGW #361

Are you sure you want to change the base?

feat(RELEASE-1246): implement script to push files to CGW #361

Conversation

seanconroy2021 commented Feb 3, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmalina commented Feb 4, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanconroy2021 Feb 6, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanconroy2021 Feb 6, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanconroy2021 Feb 6, 2025 •

edited

Loading

seanconroy2021 Feb 6, 2025 •

edited

Loading