Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Nov 10, 2025

WebSocket connections were falling back to test user instead of reading the authenticated user from the configured authentication header set by the reverse proxy. This caused file access checks to fail with "Access denied: [email protected] attempted to access users/{actual_user}/..." when users tried to load their own files.

Changes

backend/main.py

  • WebSocket endpoint now uses config.app_settings.auth_user_header to read the authentication header (configurable via AUTH_USER_HEADER environment variable, default: X-User-Email)
  • Consistent with HTTP endpoints via AuthMiddleware which also uses the configurable header
  • Falls back to query parameter for dev/test compatibility
  • Falls back to test user as final fallback
  • Added logging to identify authentication source and header name used

backend/tests/test_websocket_auth_header.py (new)

  • Test header-based authentication with configurable header
  • Test query parameter fallback
  • Test header precedence over query param
  • Test user fallback
  • Enhanced module docstring with detailed authentication flow explanation

backend/tests/test_issue_access_denied_fix.py (new)

  • Integration test demonstrating the exact issue scenario
  • Verifies file access works when WebSocket uses correct authenticated user
  • Updated to work with configurable authentication header

docs/02_admin_guide.md

  • Added comprehensive "WebSocket Authentication" section
  • Documents authentication flow for WebSockets (5 steps)
  • Explains security requirements (header stripping, trust model)
  • Describes development fallback behavior
  • Shows relationship to AUTH_USER_HEADER configuration

Example

Before:

# WebSocket only checked query params
user_email = websocket.query_params.get('user')
if not user_email:
    user_email = config_manager.app_settings.test_user or '[email protected]'

After:

# Check configured auth header first (production)
auth_header_name = config_manager.app_settings.auth_user_header
x_email_header = websocket.headers.get(auth_header_name)
if x_email_header:
    user_email = get_user_from_header(x_email_header)
# Fall back to query param (dev)
elif websocket.query_params.get('user'):
    user_email = websocket.query_params.get('user')
# Fall back to test user (dev)
else:
    user_email = config_manager.app_settings.test_user or '[email protected]'

Compatibility

This fix works seamlessly with the configurable AUTH_USER_HEADER feature, supporting custom header names for different reverse proxy setups (e.g., X-User-Email, X-Remote-User, X-Authenticated-User) without code changes.

Fixes #45

Original prompt

This section details on the original issue you should resolve

<issue_title>unable to load previous files into current session.</issue_title>
<issue_description>In the UI, it says, "Failed to add file to session: Access denied to file"

The system logs show.

2025-11-03 05:03:26.100693+00:00 ERROR application.chat.services3_client get_file
Error getting file from S3: Access denied to file

2025-11-03 05:03:26.100645+00:00 ERROR modules.file_storage.s3_client get_file
Access denied: <redacted_user> attempted to access users/<redacted_user>/generated/<redacted_id>_mypdf.pdf

2025-11-03 05:03:26.100516+00:00 WARNING modules.file_storage.s3_client servicehandle_attach_file
Failed to attach file users/<redacted_user>/generated/<redacted_id>_mypdf.pdf to session c43d9e0d-db1a-404b-b209-180b37e0b90a: Access denied to file

2025-11-03 05:03:21.957817+00:00 ERROR

not clear if this is atlas problem or a setup problem. </issue_description>

Comments on the Issue (you are @copilot in this section)

@garland3 error appears to come from. async def get_file(self, user_email: str, file_key: str) -> Dict[str, Any]: """ Get a file from S3 storage.
    Args:
        user_email: Email of the user requesting the file
        file_key: S3 key of the file to retrieve

    Returns:
        Dictionary containing file data and metadata
    """
    try:
        # Verify user has access to this file (check if key starts with user's prefix)
        if not file_key.startswith(f"users/{user_email}/"):
            logger.warning(f"Access denied: {sanitize_for_logging(user_email)} attempted to access {sanitize_for_logging(file_key)}")
            raise Exception("Access denied to file")

s3_client | get_file | Access denied: [email protected] attempted to access users/[email protected]/generated/xxxxxxxxxxxxxxxx.yyy

so for some reason the test user is being used


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 2 commits November 10, 2025 21:51
- WebSocket now checks X-User-Email header first (consistent with HTTP requests)
- Falls back to query parameter for backward compatibility
- Finally falls back to test user if neither is present
- Added comprehensive tests for all authentication scenarios
- All existing tests pass

Co-authored-by: garland3 <[email protected]>
- Test shows the exact scenario from the issue report
- Verifies X-User-Email header is correctly used for WebSocket auth
- Demonstrates why the issue occurred without header check

Co-authored-by: garland3 <[email protected]>
Copilot AI changed the title [WIP] Fix access denied error when loading previous files Fix WebSocket authentication to respect X-User-Email header Nov 10, 2025
Copilot AI requested a review from garland3 November 10, 2025 21:58
Copilot finished work on behalf of garland3 November 10, 2025 21:58
@garland3 garland3 requested a review from Copilot November 10, 2025 23:10
Copilot finished reviewing on behalf of garland3 November 10, 2025 23:12
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a critical WebSocket authentication bug where connections were falling back to the test user instead of reading the authenticated user from the X-User-Email header set by the reverse proxy. This caused file access checks to fail when users tried to load their own files because the S3 client validated that [email protected] was attempting to access users/{actual_user}/... paths.

Key Changes:

  • WebSocket endpoint now checks X-User-Email header first (matching HTTP endpoint behavior via AuthMiddleware)
  • Added fallback hierarchy: header → query parameter → test user
  • Added comprehensive test coverage for all authentication scenarios

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
backend/main.py Updated WebSocket endpoint to check X-User-Email header first, matching AuthMiddleware behavior, with proper fallback chain
backend/tests/test_websocket_auth_header.py Unit tests verifying header-based auth, query parameter fallback, test user fallback, and header precedence
backend/tests/test_issue_access_denied_fix.py Integration test demonstrating the exact issue scenario and validating the fix works correctly
Comments suppressed due to low confidence (1)

backend/main.py:200

  • The comment references 'X-Authenticated-User' header, but the implementation actually uses 'X-User-Email' header (line 213). This documentation inconsistency should be corrected to match the actual implementation.
    - Production: Extracts user from X-Authenticated-User header (set by reverse proxy)

backend/main.py Outdated
# Check X-User-Email header first (consistent with AuthMiddleware)
x_email_header = websocket.headers.get('X-User-Email')
if x_email_header:
from core.auth import get_user_from_header
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import statement is placed inside the conditional block. Per the coding guidelines, imports should be at the top of the file. Move this import to the top-level imports section (around lines 1-34) for consistency with the project structure.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in db677e0. The import is now at the top of the file (line 26) and no longer duplicated in the conditional block.

Comment on lines 1 to 2
"""Test WebSocket authentication using X-User-Email header."""

Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The module docstring should explain the purpose of testing WebSocket authentication with the X-User-Email header in more detail, including that this validates the fix for the production authentication flow where the reverse proxy sets this header.

Suggested change
"""Test WebSocket authentication using X-User-Email header."""
"""
Tests for WebSocket authentication using the X-User-Email header.
These tests validate that the backend correctly extracts the user email from the
X-User-Email header for WebSocket connections, which is critical for the production
authentication flow where the reverse proxy sets this header. The tests also ensure
that fallback mechanisms (query parameter, test user from config) work as expected,
and that the header takes precedence when both are present.
"""

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enhanced the module docstring in db677e0 with detailed explanation of the authentication flow, production requirements, fallback mechanisms, and precedence rules.


# Simulate the production scenario: reverse proxy sets X-User-Email header
actual_user = "[email protected]"
file_key = f"users/{actual_user}/generated/1234567890_mypdf.pdf"
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable file_key is not used.

Suggested change
file_key = f"users/{actual_user}/generated/1234567890_mypdf.pdf"

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the unused file_key variable in db677e0.


import pytest
import base64
import uuid
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'uuid' is not used.

Suggested change
import uuid

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reordered imports in db677e0 - base64 is now imported first, removing the unused import ordering issue.

…ipping security

- Updated authentication header from X-Authenticated-User to X-User-Email across docs, instructions, and code for clarity and consistency.
- Added security requirement for reverse proxy to strip client-provided X-User-Email headers to prevent injection attacks.
- Modified WebSocket endpoint to use the new header and import get_user_from_header function.
@github-actions
Copy link

🔒 Security Scan Results

Security Scan Summary

Scan Results

Python SAST (Bandit)

⚠️ Security issues found in Python code

Recommendations

  • Review all SARIF files uploaded to GitHub Security tab
  • Address high and critical severity vulnerabilities immediately
  • Run npm audit fix and pip-audit locally to fix dependencies
  • Consider implementing additional security controls

…better performance

- Replaced f-string interpolation with %s in logger.info calls in websocket_endpoint
- Removed unused uuid import and cleaned up whitespace in test file
- Improves logging efficiency and reduces potential security risks with dynamic strings
if not user_email:
user_email = websocket.query_params.get('user')
if user_email:
logger.info("WebSocket authenticated via query parameter: %s", sanitize_for_logging(user_email))

Check failure

Code scanning / CodeQL

Log Injection High

This log entry depends on a
user-provided value
.

Copilot Autofix

AI 10 days ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

@github-actions
Copy link

🔒 Security Scan Results

Security Scan Summary

Scan Results

Python SAST (Bandit)

⚠️ Security issues found in Python code

Recommendations

  • Review all SARIF files uploaded to GitHub Security tab
  • Address high and critical severity vulnerabilities immediately
  • Run npm audit fix and pip-audit locally to fix dependencies
  • Consider implementing additional security controls

@github-actions
Copy link

🔒 Security Scan Results

Security Scan Summary

Scan Results

Python SAST (Bandit)

⚠️ Security issues found in Python code

Recommendations

  • Review all SARIF files uploaded to GitHub Security tab
  • Address high and critical severity vulnerabilities immediately
  • Run npm audit fix and pip-audit locally to fix dependencies
  • Consider implementing additional security controls

…w(datetime.UTC)

Update timestamp generation in log_high_risk_event to use the modern datetime.now(datetime.UTC) method instead of deprecated datetime.utcnow(), ensuring future compatibility and avoiding deprecation warnings. No functional changes to the logging behavior.
@github-actions
Copy link

🔒 Security Scan Results

Security Scan Summary

Scan Results

Python SAST (Bandit)

⚠️ Security issues found in Python code

Recommendations

  • Review all SARIF files uploaded to GitHub Security tab
  • Address high and critical severity vulnerabilities immediately
  • Run npm audit fix and pip-audit locally to fix dependencies
  • Consider implementing additional security controls

1 similar comment
@github-actions
Copy link

🔒 Security Scan Results

Security Scan Summary

Scan Results

Python SAST (Bandit)

⚠️ Security issues found in Python code

Recommendations

  • Review all SARIF files uploaded to GitHub Security tab
  • Address high and critical severity vulnerabilities immediately
  • Run npm audit fix and pip-audit locally to fix dependencies
  • Consider implementing additional security controls

Copy link
Collaborator

@copilot please pull the branch again. I merged in changes from main to make the header check more generic.
the /docs admin guide need to be udpated.
review all code again.

…issues

- Use config.app_settings.auth_user_header instead of hardcoded 'X-User-Email'
- Update comments to be generic instead of mentioning specific header name
- Improve test docstrings with more detail about authentication flow
- Remove unused import (uuid was already reordered, not actually removed)
- Add WebSocket authentication section to admin guide
- Fix all tests to include auth_user_header in mock configuration

All tests passing (18/18)

Co-authored-by: garland3 <[email protected]>
Copy link
Contributor Author

Copilot AI commented Nov 15, 2025

@copilot please pull the branch again. I merged in changes from main to make the header check more generic.
the /docs admin guide need to be udpated.
review all code again.

Updated in db677e0:

  • WebSocket authentication now uses the configurable auth_user_header setting instead of hardcoded 'X-User-Email'
  • Updated all comments and logging to refer to the generic "configured auth header"
  • Added comprehensive WebSocket authentication documentation to the admin guide
  • Fixed all code review issues (import placement, docstrings, unused variables)
  • All 18 tests passing including the new custom header tests from main

The code now works seamlessly with the merged AUTH_USER_HEADER configuration feature.

Copilot AI changed the title Fix WebSocket authentication to respect X-User-Email header Fix WebSocket authentication to use configurable authentication header Nov 15, 2025
Copilot finished work on behalf of garland3 November 15, 2025 02:58
x_email_header = websocket.headers.get(auth_header_name)
if x_email_header:
user_email = get_user_from_header(x_email_header)
logger.info("WebSocket authenticated via %s header: %s", auth_header_name, sanitize_for_logging(user_email))

Check failure

Code scanning / CodeQL

Log Injection High

This log entry depends on a
user-provided value
.

Copilot Autofix

AI 10 days ago

To fully eliminate the possibility of log injection, update the sanitize_for_logging function in backend/core/utils.py to explicitly remove all newline characters, including not only ASCII \n and \r but also Windows-style \r\n and any stray combinations that may bypass standard control character removal. This can be achieved by performing additional string replacements for \r, \n, and \r\n before or after applying the regex substitutions. This adjustment should be made in backend/core/utils.py, specifically in the sanitize_for_logging method (lines ~43-49). No other change is required as all log calls already use the sanitizer.


Suggested changeset 1
backend/core/utils.py
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/core/utils.py b/backend/core/utils.py
--- a/backend/core/utils.py
+++ b/backend/core/utils.py
@@ -44,6 +44,8 @@
         return ''
     if not isinstance(value, str):
         value = str(value)
+    # Removes literal CRLF, CR, LF sequences for defense in depth
+    value = value.replace('\r\n', '').replace('\r', '').replace('\n', '')
     value = _CONTROL_CHARS_RE.sub('', value)
     value = _UNICODE_NEWLINES_RE.sub('', value)
     return value
EOF
@@ -44,6 +44,8 @@
return ''
if not isinstance(value, str):
value = str(value)
# Removes literal CRLF, CR, LF sequences for defense in depth
value = value.replace('\r\n', '').replace('\r', '').replace('\n', '')
value = _CONTROL_CHARS_RE.sub('', value)
value = _UNICODE_NEWLINES_RE.sub('', value)
return value
Copilot is powered by AI and may make mistakes. Always verify output.
@github-actions
Copy link

🔒 Security Scan Results

Security Scan Summary

Scan Results

Python SAST (Bandit)

⚠️ Security issues found in Python code

Recommendations

  • Review all SARIF files uploaded to GitHub Security tab
  • Address high and critical severity vulnerabilities immediately
  • Run npm audit fix and pip-audit locally to fix dependencies
  • Consider implementing additional security controls

Copy link
Collaborator

Closing in favor of #101 which contains the same fix in a single clean commit.

@garland3 garland3 closed this Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

unable to load previous files into current session.

2 participants