Skip to content

Conversation

@garland3
Copy link
Collaborator

@garland3 garland3 commented Nov 2, 2025

Introduce USE_MOCK_S3 flag in .env to toggle between in-process mock S3 (default) and MinIO via Docker. This simplifies the development workflow by eliminating the need for Docker in local setups, while maintaining production compatibility. Updated documentation and startup script accordingly.

Add .ruff_cache to .gitignore to exclude Ruff linter cache files from version control, preventing unnecessary commits of generated cache data.
Introduce USE_MOCK_S3 flag in .env to toggle between in-process mock S3 (default) and MinIO via Docker. This simplifies the development workflow by eliminating the need for Docker in local setups, while maintaining production compatibility. Updated documentation and startup script accordingly.
@garland3 garland3 requested a review from Copilot November 2, 2025 18:19
Copy link
Contributor

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds an in-process Mock S3 implementation to eliminate Docker/MinIO dependency during development. The mock uses FastAPI TestClient to provide S3-compatible REST API responses, allowing file storage to work without external services.

Key changes:

  • New S3 mock server with file-based storage supporting core S3 operations (PUT, GET, HEAD, DELETE, ListObjectsV2, tagging)
  • MockS3StorageClient that wraps the mock server via TestClient
  • Configuration toggle (USE_MOCK_S3) to switch between mock and MinIO
  • Documentation updates reflecting the mock S3 option

Reviewed Changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
mocks/s3-mock/storage.py Core file storage logic with path sanitization and metadata handling
mocks/s3-mock/s3_xml.py XML generation/parsing for S3 API responses
mocks/s3-mock/main.py FastAPI server with S3 REST endpoints
mocks/s3-mock/smoke_test.py Integration tests for S3 mock functionality
mocks/s3-mock/README.md Documentation for the mock server
backend/modules/file_storage/mock_s3_client.py TestClient-based S3 client for in-process testing
backend/infrastructure/app_factory.py Factory logic to select between mock and real S3
backend/modules/config/manager.py Added use_mock_s3 setting
backend/application/chat/preprocessors/message_builder.py Added debug logging for file manifests
agent_start.sh Updated to conditionally start MinIO based on mock setting
GEMINI.md, CLAUDE.md Added documentation about mock S3
.env.example Updated S3 configuration examples
config/overrides/mcp.json Added progress_demo server entry
.gitignore Added .ruff_cache
Comments suppressed due to low confidence (4)

backend/infrastructure/app_factory.py:78

  • Return type annotation is incorrect. This method returns either S3StorageClient or MockS3StorageClient depending on configuration, but is typed as only S3StorageClient. Use a Union type or create a common protocol/base class: 'Union[S3StorageClient, MockS3StorageClient]' or define a shared interface.
    def get_file_storage(self) -> S3StorageClient:  # noqa: D401

backend/modules/file_storage/mock_s3_client.py:205

  • Except block directly handles BaseException.
                except:

backend/modules/file_storage/mock_s3_client.py:205

  • 'except' clause does nothing but pass and there is no explanatory comment.
                except:

mocks/s3-mock/storage.py:115

  • File is opened but is not closed.
                "ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()),



if __name__ == "__main__":
port = int(os.getenv("PORT", "9002"))
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Port default is inconsistent with README documentation. The code defaults to 9002 but the README states the server runs on port 9001 by default (lines 7, 49, 57 of README.md). This mismatch will cause connection failures when following documentation.

Suggested change
port = int(os.getenv("PORT", "9002"))
port = int(os.getenv("PORT", "9001"))

Copilot uses AI. Check for mistakes.
Comment on lines +58 to +59
@app.put("/{bucket}/{key:path}")
async def put_object(bucket: str, key: str, request: Request):
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate route definition: This PUT endpoint at line 58 conflicts with the put_object_tagging endpoint at line 183, both using the same path pattern '/{bucket}/{key:path}'. FastAPI will only register one of these routes, causing tagging operations to fail. The tagging endpoint should use a query parameter check or different decorator order to disambiguate.

Copilot uses AI. Check for mistakes.
Comment on lines 112 to 115
items.append({
"Key": key,
"Size": size,
"ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()),
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File is opened and read without using a context manager, risking resource leaks. Additionally, this fallback reads the entire file into memory just to calculate ETag when metadata is missing. Consider using 'with open(obj_path, "rb") as f: calc_etag(f.read())' and potentially logging when metadata is unavailable.

Suggested change
items.append({
"Key": key,
"Size": size,
"ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()),
etag = meta.get("etag")
if not etag:
with open(obj_path, "rb") as f:
etag = calc_etag(f.read())
items.append({
"Key": key,
"Size": size,
"ETag": etag,

Copilot uses AI. Check for mistakes.
value_elem = tag_elem.find("Value")
if key_elem is not None and value_elem is not None:
tags[key_elem.text] = value_elem.text
except:
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare except clause silently swallows all exceptions including SystemExit and KeyboardInterrupt. This should catch specific exceptions like 'ET.ParseError' or 'Exception' to avoid hiding critical errors during XML tag parsing.

Suggested change
except:
except ET.ParseError:

Copilot uses AI. Check for mistakes.
if include_files_manifest:
session_context = build_session_context(session)
files_in_context = session_context.get("files", {})
logger.info(f"DEBUG: Session has {len(files_in_context)} files: {list(files_in_context.keys())}")
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log messages prefixed with 'DEBUG:' are being sent at INFO level. Either change the log level to logger.debug() or remove the 'DEBUG:' prefix from the message. This applies to lines 59, 62, and 65.

Copilot uses AI. Check for mistakes.
import os
import urllib.parse
from pathlib import Path
from typing import Dict, Optional
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'Optional' is not used.

Suggested change
from typing import Dict, Optional
from typing import Dict

Copilot uses AI. Check for mistakes.
Tests all supported S3 operations to ensure compatibility.
"""

import hashlib
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'hashlib' is not used.

Suggested change
import hashlib

Copilot uses AI. Check for mistakes.
"""

import hashlib
import time
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'time' is not used.

Suggested change
import time

Copilot uses AI. Check for mistakes.

import hashlib
import time
import os
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.
import hashlib
import time
import os
from pathlib import Path
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'Path' is not used.

Suggested change
from pathlib import Path

Copilot uses AI. Check for mistakes.
- Adjust log levels from info to debug in message_builder.py and remove debug prefixes from warnings for cleaner output
- Replace bare except with specific ET.ParseError in mock_s3_client.py for better error handling
- Update README.md to use uv pip for installation
- Consolidate object tagging logic into main PUT endpoint in main.py, removing redundant function for simpler API structure
- Remove outdated comment about enabling mock S3 in README.md
- Add production usage warning in main README guidelines
- Expand S3 mock README with detailed security limitations and alternatives
- Ensures developers understand risks and avoid production deployment of insecure mock
@garland3 garland3 requested a review from Copilot November 2, 2025 18:48
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 12 comments.



def calc_etag(data: bytes) -> str:
return hashlib.md5(data).hexdigest()
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MD5 is used for ETag calculation without the usedforsecurity=False parameter. While MD5 is acceptable for ETag generation (which is not a security-critical operation), Python 3.9+ recommends explicitly marking non-security uses to avoid warnings in FIPS mode. Consider adding usedforsecurity=False: hashlib.md5(data, usedforsecurity=False).hexdigest()

Suggested change
return hashlib.md5(data).hexdigest()
return hashlib.md5(data, usedforsecurity=False).hexdigest()

Copilot uses AI. Check for mistakes.
Comment on lines 18 to 27
# Sanitize key to prevent traversal
key = key.lstrip("/")
safe_parts = []
for part in key.split("/"):
if part in ("..", ""):
continue
safe_parts.append(part)
safe_key = "/".join(safe_parts)
obj_path = bucket_root / safe_key
meta_path = bucket_root / f"{safe_key}.meta.json"
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path traversal protection only filters .. and empty strings, but doesn't handle other potentially dangerous path components like . (current directory) or absolute paths. While . is generally safe, consider also checking for absolute path indicators on Windows (e.g., drive letters) if cross-platform compatibility is a concern. Additionally, the current approach silently drops dangerous parts - consider whether this could lead to unexpected behavior where a key like ../../../etc/passwd becomes etc/passwd instead of raising an error.

Suggested change
# Sanitize key to prevent traversal
key = key.lstrip("/")
safe_parts = []
for part in key.split("/"):
if part in ("..", ""):
continue
safe_parts.append(part)
safe_key = "/".join(safe_parts)
obj_path = bucket_root / safe_key
meta_path = bucket_root / f"{safe_key}.meta.json"
# Sanitize key to prevent traversal and reject dangerous keys
key = key.lstrip("/")
if os.path.isabs(key):
raise ValueError("Absolute paths are not allowed in S3 keys")
safe_parts = []
for part in key.split("/"):
if part in ("..", ".", ""):
raise ValueError(f"Invalid path component in S3 key: {part!r}")
# On Windows, reject drive letters (e.g., C:)
if os.name == "nt" and len(part) == 2 and part[1] == ":" and part[0].isalpha():
raise ValueError(f"Drive letter not allowed in S3 key: {part!r}")
safe_parts.append(part)
safe_key = "/".join(safe_parts)
obj_path = (bucket_root / safe_key).resolve()
meta_path = (bucket_root / f"{safe_key}.meta.json").resolve()
# Ensure the resolved paths are within the bucket root
bucket_root_resolved = bucket_root.resolve()
if not str(obj_path).startswith(str(bucket_root_resolved)):
raise ValueError("Path traversal detected in S3 key")
if not str(meta_path).startswith(str(bucket_root_resolved)):
raise ValueError("Path traversal detected in S3 key (meta)")

Copilot uses AI. Check for mistakes.
test_delete_object()

print("\n" + "=" * 40)
print("✅ All tests passed!")
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The success message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change
print("All tests passed!")
print("All tests passed!")

Copilot uses AI. Check for mistakes.
print("✅ All tests passed!")

except Exception as e:
print(f"\n❌ Test failed: {e}")
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message uses an emoji '❌' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change
print(f"\n Test failed: {e}")
print(f"\nERROR: Test failed: {e}")

Copilot uses AI. Check for mistakes.

def _calculate_etag(self, content_bytes: bytes) -> str:
"""Calculate ETag for file content."""
return hashlib.md5(content_bytes).hexdigest()
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MD5 is used for ETag calculation without the usedforsecurity=False parameter. While MD5 is acceptable for ETag generation (which is not a security-critical operation), Python 3.9+ recommends explicitly marking non-security uses to avoid warnings in FIPS mode. Consider adding usedforsecurity=False: hashlib.md5(content_bytes, usedforsecurity=False).hexdigest()

Suggested change
return hashlib.md5(content_bytes).hexdigest()
return hashlib.md5(content_bytes, usedforsecurity=False).hexdigest()

Copilot uses AI. Check for mistakes.
agent_start.sh Outdated
Comment on lines 32 to 41
echo "✅ Using Mock S3 (no Docker required)"
else
# Check if MinIO is running
if ! docker ps | grep -q atlas-minio; then
echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..."
docker-compose up -d minio minio-init
echo "✅ MinIO started successfully"
sleep 3
else
echo "✅ MinIO is already running"
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change
echo "Using Mock S3 (no Docker required)"
else
# Check if MinIO is running
if ! docker ps | grep -q atlas-minio; then
echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..."
docker-compose up -d minio minio-init
echo "MinIO started successfully"
sleep 3
else
echo "MinIO is already running"
echo "Using Mock S3 (no Docker required)"
else
# Check if MinIO is running
if ! docker ps | grep -q atlas-minio; then
echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..."
docker-compose up -d minio minio-init
echo "MinIO started successfully"
sleep 3
else
echo "MinIO is already running"

Copilot uses AI. Check for mistakes.
agent_start.sh Outdated
if ! docker ps | grep -q atlas-minio; then
echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..."
docker-compose up -d minio minio-init
echo "✅ MinIO started successfully"
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change
echo "MinIO started successfully"
echo "MinIO started successfully"

Copilot uses AI. Check for mistakes.
agent_start.sh Outdated
Comment on lines 32 to 41
echo "✅ Using Mock S3 (no Docker required)"
else
# Check if MinIO is running
if ! docker ps | grep -q atlas-minio; then
echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..."
docker-compose up -d minio minio-init
echo "✅ MinIO started successfully"
sleep 3
else
echo "✅ MinIO is already running"
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change
echo "Using Mock S3 (no Docker required)"
else
# Check if MinIO is running
if ! docker ps | grep -q atlas-minio; then
echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..."
docker-compose up -d minio minio-init
echo "MinIO started successfully"
sleep 3
else
echo "MinIO is already running"
echo "Using Mock S3 (no Docker required)"
else
# Check if MinIO is running
if ! docker ps | grep -q atlas-minio; then
echo "MinIO is not running. Starting MinIO with docker-compose..."
docker-compose up -d minio minio-init
echo "MinIO started successfully"
sleep 3
else
echo "MinIO is already running"

Copilot uses AI. Check for mistakes.
agent_start.sh Outdated
else
# Check if MinIO is running
if ! docker ps | grep -q atlas-minio; then
echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..."
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning message uses an emoji '⚠️' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change
echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..."
echo "MinIO is not running. Starting MinIO with docker-compose..."

Copilot uses AI. Check for mistakes.
if key_elem is not None and value_elem is not None:
tags[key_elem.text] = value_elem.text
except ET.ParseError:
pass
Copy link

Copilot AI Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
pass
# Failed to parse tags XML; tags will be left empty. This is non-fatal as tags are optional.
logger.warning(f"Failed to parse tags XML for file {sanitize_for_logging(file_key)}", exc_info=True)

Copilot uses AI. Check for mistakes.
@garland3 garland3 merged commit 08e0e4d into main Nov 2, 2025
5 checks passed
@garland3 garland3 deleted the wip-bring-back-s3-mock-issue-36 branch November 2, 2025 19:08
@garland3 garland3 mentioned this pull request Nov 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants