chore: deleteme #450
Conversation
- Add utility functions for S3 configuration, URL generation, and file uploads.
- Enhance the ingestion flow to optionally upload digests to S3 if enabled.
- Modify API endpoints to redirect downloads to S3 if files are stored there.
- Extend the `IngestResponse` schema to include the S3 URL when applicable.
- Introduce a `get_current_commit_hash` utility to retrieve the commit SHA during ingestion.
- Add Docker Compose configuration for dev/prod environments with documented usage details.
- Integrate MinIO S3-compatible storage for local development, including bucket auto-setup and app credentials.
- Add an S3 storage toggle, a test service in Docker Compose, and the boto3 dependency.
- Enforce the UUID type for `ingest_id`; resolve comments.
- Implement `JSONFormatter` and methods for structured logging.
- Integrate logging into S3 client creation, uploads, and URL lookups.
- Enhance logging with extra fields for better traceability.
- Add optional S3 directory prefix support.
- Remove the unused test service from the Docker Compose configuration.
- Improve `get_s3_config` to handle optional environment variables more robustly.
- Add centralized JSON logging and integrate it into the S3 utilities.

Co-authored-by: Filip Christiansen <[email protected]>
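For orientation, here is a minimal sketch of what the S3 helpers and URL generation described above could look like. The function names (`get_s3_config`, `upload_digest_to_s3`, `generate_s3_url`), environment variable names, and bucket/key layout are assumptions for illustration, not the PR's actual implementation; only the use of boto3 is taken from the summary.

```python
# Sketch only: illustrative names and env vars, not the PR's real code.
import os
from pathlib import Path
from uuid import UUID

import boto3  # the PR adds boto3 as a dependency


def get_s3_config() -> dict:
    """Read optional S3 settings from the environment; missing values stay None."""
    return {
        "endpoint_url": os.environ.get("S3_ENDPOINT"),  # e.g. MinIO in local dev (assumed var name)
        "region_name": os.environ.get("S3_REGION"),
        "aws_access_key_id": os.environ.get("S3_ACCESS_KEY"),
        "aws_secret_access_key": os.environ.get("S3_SECRET_KEY"),
    }


def upload_digest_to_s3(local_path: Path, bucket: str, ingest_id: UUID, prefix: str = "") -> str:
    """Upload a digest file and return the object key it was stored under."""
    client = boto3.client("s3", **{k: v for k, v in get_s3_config().items() if v is not None})
    prefix_part = f"{prefix.rstrip('/')}/" if prefix else ""  # optional directory prefix
    key = f"{prefix_part}{ingest_id}/{local_path.name}"
    client.upload_file(str(local_path), bucket, key)
    return key


def generate_s3_url(bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a presigned download URL that the API could redirect to."""
    client = boto3.client("s3", **{k: v for k, v in get_s3_config().items() if v is not None})
    return client.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=expires_in
    )
```

Presumably the ingestion flow calls something like `upload_digest_to_s3` after writing the digest locally and exposes the resulting URL on `IngestResponse`; the real wiring lives in the PR's diff.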
⚙️ Preview environment was undeployed.
```diff
     # Normalize and validate the directory path
-    directory = (TMP_BASE_PATH / ingest_id).resolve()
+    directory = (TMP_BASE_PATH / str(ingest_id)).resolve()
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High) — user-provided value.
Copilot Autofix (AI) · 21 days ago
To fix the problem, we should ensure that the `ingest_id` parameter is strictly validated as a UUID and that the constructed path is securely checked to prevent path traversal. Since FastAPI already enforces the type as `UUID`, we can further strengthen the check by:

- Using the UUID object directly (not its string representation) to construct the path, which avoids any unexpected characters.
- Ensuring that the resolved path is a subpath of the resolved base directory using robust path comparison (not just string prefix matching). The best way is to use the `Path.relative_to()` method, which will raise a `ValueError` if the path is not within the base directory.
- Optionally, we can add a check to ensure that the directory name matches the UUID format, though FastAPI should already enforce this.

The changes should be made in the `download_ingest` function in `src/server/routers/ingest.py`, specifically in the region where the path is constructed and validated (lines 154-160). No new imports are needed, as we can use the standard `pathlib` library.
```diff
@@ -157,3 +157,6 @@
 
-    if not str(directory).startswith(str(TMP_BASE_PATH.resolve())):
+    try:
+        # Ensure the resolved directory is a subpath of TMP_BASE_PATH
+        directory.relative_to(TMP_BASE_PATH.resolve())
+    except ValueError:
         logger.error(f"Invalid ingest ID - path traversal attempt - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
         raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f"Invalid ingest ID: {ingest_id!r}")
```
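As a standalone illustration of why the autofix prefers `Path.relative_to()` over a string prefix check, here is a small runnable sketch; the base path `/tmp/gitingest` and the helper name `is_within_base` are placeholders, not the project's actual `TMP_BASE_PATH` or code.

```python
from pathlib import Path

TMP_BASE_PATH = Path("/tmp/gitingest")  # placeholder base directory


def is_within_base(candidate: Path, base: Path = TMP_BASE_PATH) -> bool:
    """True only if candidate resolves to a location inside base."""
    try:
        candidate.resolve().relative_to(base.resolve())
        return True
    except ValueError:
        return False


# A sibling directory that merely shares the prefix slips past startswith():
evil = Path("/tmp/gitingest-evil/digest.txt")
print(str(evil.resolve()).startswith(str(TMP_BASE_PATH.resolve())))  # True (false positive)
print(is_within_base(evil))                                          # False

# A traversal attempt is rejected after normalization as well:
print(is_within_base(TMP_BASE_PATH / ".." / "etc" / "passwd"))       # False
```

The prefix check accepts a sibling directory that merely shares the prefix, which is exactly the class of bypass the `relative_to` approach rules out.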
```python
    if not directory.is_dir():
        logger.error(f"Digest directory not found - ingest_id: {ingest_id}, directory_path: {str(directory)}, directory_exists: {directory.exists()}, is_directory: {directory.is_dir() if directory.exists() else False}")
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High) — user-provided value.
Copilot Autofix (AI) · 21 days ago
To fix the issue, we will enhance the validation logic for the `directory` path to ensure it is safe and strictly contained within the `TMP_BASE_PATH`. This involves:

- Normalizing the path using `.resolve()` to remove any `..` segments.
- Validating that the resolved path starts with `TMP_BASE_PATH` and does not contain any unexpected characters or patterns.
- Adding a stricter check to ensure the `ingest_id` conforms to expected UUID formatting before constructing the path.

Changes required:

- Update the validation logic for `directory` to include stricter checks.
- Ensure the `ingest_id` is validated as a proper UUID before using it to construct the path.
```diff
@@ -153,2 +153,9 @@
     # Normalize and validate the directory path
+    try:
+        # Validate that ingest_id is a proper UUID
+        UUID(str(ingest_id))
+    except ValueError:
+        logger.error(f"Invalid ingest ID format - ingest_id: {ingest_id}")
+        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f"Invalid ingest ID format: {ingest_id!r}")
+
     directory = (TMP_BASE_PATH / str(ingest_id)).resolve()
@@ -157,9 +164,5 @@
 
-    if not str(directory).startswith(str(TMP_BASE_PATH.resolve())):
-        logger.error(f"Invalid ingest ID - path traversal attempt - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
-        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f"Invalid ingest ID: {ingest_id!r}")
-
-    if not directory.is_dir():
-        logger.error(f"Digest directory not found - ingest_id: {ingest_id}, directory_path: {str(directory)}, directory_exists: {directory.exists()}, is_directory: {directory.is_dir() if directory.exists() else False}")
-        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=f"Digest {ingest_id!r} not found")
+    if not directory.is_dir() or not str(directory).startswith(str(TMP_BASE_PATH.resolve())):
+        logger.error(f"Invalid ingest ID or path traversal attempt - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
+        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f"Invalid ingest ID or path traversal attempt: {ingest_id!r}")
 
```
```python
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f"Invalid ingest ID: {ingest_id!r}")

    if not directory.is_dir():
        logger.error(f"Digest directory not found - ingest_id: {ingest_id}, directory_path: {str(directory)}, directory_exists: {directory.exists()}, is_directory: {directory.is_dir() if directory.exists() else False}")
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High) — user-provided value.
Copilot Autofix (AI) · 21 days ago
To fix the issue, we will add validation to ensure that the `ingest_id` parameter is a valid UUID and does not contain any unexpected characters or patterns. This will prevent malicious input from being used to construct unsafe paths. Additionally, we will ensure that the `directory` path is normalized and verified to be within the `TMP_BASE_PATH`.

Steps to fix:

- Validate that `ingest_id` is a valid UUID before using it to construct the `directory` path.
- Retain the existing normalization and prefix-checking logic for `directory`.
- Log an error and raise an HTTP 400 (Bad Request) exception if `ingest_id` is invalid.
```diff
@@ -106,3 +106,3 @@
 async def download_ingest(
-    ingest_id: UUID,
+    ingest_id: str,
 ) -> Union[RedirectResponse, FileResponse]:  # noqa: FA100 (future-rewritable-type-annotation) (pydantic)
@@ -153,3 +153,10 @@
     # Normalize and validate the directory path
-    directory = (TMP_BASE_PATH / str(ingest_id)).resolve()
+    try:
+        # Validate that ingest_id is a valid UUID
+        ingest_id_uuid = UUID(ingest_id)
+    except ValueError:
+        logger.error(f"Invalid ingest ID format - ingest_id: {ingest_id}")
+        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=f"Invalid ingest ID format: {ingest_id!r}")
+
+    directory = (TMP_BASE_PATH / str(ingest_id_uuid)).resolve()
 
```
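For comparison, a brief self-contained sketch of the two validation styles these suggestions debate: keeping the parameter annotated as `UUID` (so FastAPI rejects malformed values before the handler runs) versus widening it to `str` and parsing manually. The toy route and `parse_ingest_id` helper below are illustrative only, not the project's router, and assume `fastapi` plus `httpx` (for the test client) are installed.

```python
from uuid import UUID

from fastapi import FastAPI, HTTPException, status
from fastapi.testclient import TestClient

app = FastAPI()


@app.get("/download/{ingest_id}")
async def download(ingest_id: UUID) -> dict:
    # With the parameter typed as UUID, FastAPI returns 422 for anything that
    # is not a valid UUID, so the handler never sees path separators or "..".
    return {"ingest_id": str(ingest_id)}


def parse_ingest_id(raw: str) -> UUID:
    """Manual equivalent used when the parameter is declared as str."""
    try:
        return UUID(raw)
    except ValueError as exc:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Invalid ingest ID format: {raw!r}",
        ) from exc


client = TestClient(app)
print(client.get("/download/123e4567-e89b-12d3-a456-426614174000").status_code)  # 200
print(client.get("/download/not-a-uuid").status_code)                            # 422
```

The typed parameter already gives the guarantee the manual check re-adds, which is why several of these suggestions note that FastAPI already enforces the type.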
```python
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f"Invalid ingest ID: {ingest_id!r}")

    if not directory.is_dir():
        logger.error(f"Digest directory not found - ingest_id: {ingest_id}, directory_path: {str(directory)}, directory_exists: {directory.exists()}, is_directory: {directory.is_dir() if directory.exists() else False}")
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High) — user-provided value.
Copilot Autofix (AI) · 21 days ago
To fix the problem, we should strengthen the validation that ensures the constructed path is strictly within the intended base directory. Instead of using a string-based `startswith` check, we should use `pathlib.Path.relative_to` to confirm that the resolved path is a subpath of `TMP_BASE_PATH`. If `relative_to` raises a `ValueError`, the path is outside the base directory and should be rejected. This approach is robust against symlinks and other filesystem tricks. The fix should be applied in the region where the path validation occurs (lines 153–160). No new imports are needed, as `pathlib` is already in use via `TMP_BASE_PATH`.
```diff
@@ -157,3 +157,6 @@
 
-    if not str(directory).startswith(str(TMP_BASE_PATH.resolve())):
+    try:
+        # Ensure the resolved directory is strictly within TMP_BASE_PATH
+        directory.relative_to(TMP_BASE_PATH.resolve())
+    except ValueError:
         logger.error(f"Invalid ingest ID - path traversal attempt - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
```
```python
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=f"Digest {ingest_id!r} not found")

    try:
        # List all txt files for debugging
        txt_files = list(directory.glob("*.txt"))
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High) — user-provided value.
Copilot Autofix (AI) · 21 days ago
To fix the issue, we will enhance the validation of the `directory` path to ensure it is securely contained within the `TMP_BASE_PATH`. Specifically:

- Retain the use of `.resolve()` to normalize the path.
- Add a check to ensure that `directory` is a direct subdirectory of `TMP_BASE_PATH` by comparing its parent directory to `TMP_BASE_PATH`.
- Log any invalid paths and raise an appropriate HTTP exception if the validation fails.

This approach ensures that even if symbolic links or other edge cases are present, the `directory` path cannot escape the `TMP_BASE_PATH`.
```diff
@@ -157,3 +157,4 @@
 
-    if not str(directory).startswith(str(TMP_BASE_PATH.resolve())):
+    # Validate that the directory is within TMP_BASE_PATH and is a direct subdirectory
+    if not str(directory).startswith(str(TMP_BASE_PATH.resolve())) or directory.parent != TMP_BASE_PATH.resolve():
         logger.error(f"Invalid ingest ID - path traversal attempt - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
```
```python
    except StopIteration as exc:
        # List all files in directory for debugging
        all_files = list(directory.glob("*"))
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High) — user-provided value.
Copilot Autofix (AI) · 21 days ago
To address the issue, we will enhance the validation logic for the `directory` path. Specifically:

- Ensure that `ingest_id` is strictly validated as a UUID before constructing the path.
- Use `os.path.normpath` to normalize the path and verify that it is contained within `TMP_BASE_PATH`.
- Add explicit error handling for edge cases in path resolution.

These changes will ensure that the constructed path is safe and cannot be exploited for path traversal or other attacks.
```diff
@@ -153,9 +153,21 @@
     # Normalize and validate the directory path
-    directory = (TMP_BASE_PATH / str(ingest_id)).resolve()
-
-    logger.info(f"Local directory path resolved - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
-
-    if not str(directory).startswith(str(TMP_BASE_PATH.resolve())):
-        logger.error(f"Invalid ingest ID - path traversal attempt - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
-        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f"Invalid ingest ID: {ingest_id!r}")
+    try:
+        # Validate ingest_id as a UUID
+        if not isinstance(ingest_id, UUID):
+            logger.error(f"Invalid ingest ID format - ingest_id: {ingest_id}")
+            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=f"Invalid ingest ID format: {ingest_id!r}")
+
+        # Construct and normalize the directory path
+        directory = TMP_BASE_PATH / str(ingest_id)
+        directory = directory.resolve(strict=False)  # Resolve without strict mode to handle non-existent paths
+
+        logger.info(f"Local directory path resolved - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
+
+        # Verify the normalized path is within TMP_BASE_PATH
+        if not str(directory).startswith(str(TMP_BASE_PATH.resolve())):
+            logger.error(f"Invalid ingest ID - path traversal attempt - ingest_id: {ingest_id}, directory_path: {str(directory)}, tmp_base_path: {str(TMP_BASE_PATH.resolve())}")
+            raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=f"Invalid ingest ID: {ingest_id!r}")
+    except Exception as exc:
+        logger.error(f"Error during path validation - ingest_id: {ingest_id}, error_type: {type(exc).__name__}, error_message: {str(exc)}")
+        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail="Internal server error during path validation")
 
```
f"Creating S3 client - endpoint: {log_config.get('endpoint_url', 'NOT_SET')}, " | ||
f"region: {log_config.get('region_name', 'NOT_SET')}, " | ||
f"has_access_key: {has_access_key}, has_secret_key: {has_secret_key}, " | ||
f"credentials_provided: {has_access_key and has_secret_key}" |
Check failure — Code scanning / CodeQL: Clear-text logging of sensitive information (High) — sensitive data (secret).
Copilot Autofix (AI) · 21 days ago
To fix the issue, we will remove the logging of sensitive metadata (`has_access_key` and `has_secret_key`) and replace it with a generic message that does not reveal any sensitive information. This ensures that no sensitive data or metadata is exposed in the logs. Specifically:

- Modify the logging statement on line 156 to exclude `has_access_key`, `has_secret_key`, and `credentials_provided`.
- Replace it with a generic message indicating that the S3 client is being created, without revealing sensitive details.

No new imports, methods, or definitions are required for this fix.
```diff
@@ -156,5 +156,3 @@
         f"Creating S3 client - endpoint: {log_config.get('endpoint_url', 'NOT_SET')}, "
-        f"region: {log_config.get('region_name', 'NOT_SET')}, "
-        f"has_access_key: {has_access_key}, has_secret_key: {has_secret_key}, "
-        f"credentials_provided: {has_access_key and has_secret_key}"
+        f"region: {log_config.get('region_name', 'NOT_SET')}"
     )
```
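In the same spirit as this autofix, here is a hedged sketch of keeping credential details (even their presence) out of structured logs entirely. The `JSONFormatter` below is a minimal stand-in for the one the PR describes, `redact_for_logging` and `SENSITIVE_KEYS` are invented names, and the MinIO-style credentials are dummy values.

```python
import json
import logging

SENSITIVE_KEYS = {"aws_access_key_id", "aws_secret_access_key", "session_token"}  # illustrative list


def redact_for_logging(config: dict) -> dict:
    """Return a copy of the config safe to log: sensitive keys are dropped entirely."""
    return {k: v for k, v in config.items() if k not in SENSITIVE_KEYS}


class JSONFormatter(logging.Formatter):
    """Minimal structured formatter in the spirit of the one the PR describes."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(getattr(record, "extra_fields", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("s3_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

config = {
    "endpoint_url": "http://localhost:9000",   # dummy local MinIO endpoint
    "region_name": "us-east-1",
    "aws_access_key_id": "minioadmin",         # dummy credentials, never log these
    "aws_secret_access_key": "minioadmin",
}
logger.info("Creating S3 client", extra={"extra_fields": redact_for_logging(config)})
```

Dropping sensitive keys before the log call means the formatter never has a chance to leak them, regardless of log level or destination.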
oy copilot how about you stop being an arse
No description provided.