Fix: Arbitrary File Read Vulnerability + Feat: Security settings #1991
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Arbitrary File Read Vulnerability Fix + Security Enhancements
Summary
This document outlines the security vulnerabilities identified in the previous implementation of image handling within the
ImageTextPromptValue
class and details the remediation steps taken in the refactored code. The core issues revolved around insufficient input validation and unsafe handling of user-provided strings, leading to Arbitrary File Read (LFI), Server-Side Request Forgery (SSRF), and potential Denial of Service (DoS) vulnerabilities.Identified Vulnerabilities (Original Code)
Arbitrary File Read (LFI):
file://
URL andurlopen
: Theis_valid_url
check allowedfile://
schemes if anetloc
(hostname likelocalhost
) was provided.mimetypes.guess_type
could be tricked by appending an image extension in a URL fragment (e.g.,#fake.jpg
), whichurlopen
ignores when accessing the file system. This allowed reading arbitrary files accessible to the application user (e.g.,file://localhost/etc/passwd#fake.jpg
).open()
(Potential): Although the primary exploit usedurlopen
, the code path involvingencode_image_to_base64(item)
directly calledopen(item)
. If an attacker could bypass theis_valid_url
check but still trickmimetypes.guess_type
(e.g., using path manipulation if not properly handled beforeopen
), this path could also lead to LFI. Null bytes (\0
) were blocked by Python'sopen()
, but other techniques might exist depending on the OS and context.Server-Side Request Forgery (SSRF):
http://
/https://
URL andurlopen
: The code allowedhttp
andhttps
URLs viais_valid_url
. Ifmimetypes.guess_type
passed (e.g., URL ending in.jpg
), thedownload_and_encode_image
function would useurllib.request.urlopen
to fetch the URL. Attackers could provide URLs pointing to internal network services, cloud metadata endpoints (like AWS169.254.169.254
), or external servers, making the application act as a proxy. The fetched content was then base64 encoded and potentially returned, facilitating data exfiltration.Denial of Service (DoS):
/dev/zero
) using thefile://
LFI could exhaust memory or CPU.Weak Input Validation:
mimetypes.guess_type
based on file extensions in user-controlled strings is fundamentally insecure. It does not verify the actual content.is_valid_url
only checked for scheme and netloc presence, not which schemes were safe or allowed.is_base64
check was basic and could potentially match non-image base64 data.Remediation Strategy Implemented
The refactored code abandons the flawed
is_image
logic and implements a secure processing pipeline (_securely_process_item
) with the following principles:Explicit Input Type Handling: The code now explicitly checks for distinct input formats in a specific order:
data:image/...;base64,...
)http://
,https://
)Secure Base64 Handling:
DATA_URI_REGEX
) to strictly match the expecteddata:image/...;base64,...
format.Secure URL Fetching:
ALLOWED_URL_SCHEMES
(default:http
,https
) are processed.file://
is disallowed by default.requests
library instead ofurllib.request
for easier configuration of timeouts (REQUESTS_TIMEOUT_SECONDS
) and streaming downloads.Content-Length
header and enforcesMAX_DOWNLOAD_SIZE_BYTES
during streaming download to prevent DoS.Pillow
library (Image.open
,img.verify()
) to ensure it is actually a valid image file before encoding and returning. This prevents processing malicious non-image files delivered via allowed URLs.data:
URI is derived from the actual image format identified by Pillow, not guessed from the URL.Secure Local File Handling (Optional & Default Disabled):
ALLOW_LOCAL_FILE_ACCESS
isFalse
. Must be explicitly enabled and configured with extreme care.ALLOWED_IMAGE_BASE_DIR
. User input is treated as relative to this directory.os.path.abspath
andos.path.commonprefix
to rigorously ensure the resolved file path remains within theALLOWED_IMAGE_BASE_DIR
, preventing directory traversal attacks (../
).MAX_LOCAL_FILE_SIZE_BYTES
usingos.path.getsize
before reading the file.Removal of Insecure Functions: The original, vulnerable methods (
is_image
,get_image
,is_base64
,is_valid_url
,encode_image_to_base64
,download_and_encode_image
) have been removed or replaced by the secure processing logic.Required Configuration & Dependencies
requests
andPillow
:ALLOWED_URL_SCHEMES
MAX_DOWNLOAD_SIZE_BYTES
REQUESTS_TIMEOUT_SECONDS
ALLOW_LOCAL_FILE_ACCESS = True
.ALLOWED_IMAGE_BASE_DIR
to the absolute path of the only directory allowed for image loading. Ensure this directory has appropriate permissions and contains no sensitive files.MAX_LOCAL_FILE_SIZE_BYTES
if needed.Conclusion
The refactored code significantly enhances security by replacing insecure pattern matching and uncontrolled resource fetching with explicit validation, strict policy enforcement (schemes, paths, sizes), and content verification using trusted libraries. This approach mitigates the identified LFI, SSRF, and DoS vulnerabilities. Remember to keep dependencies (
requests
,Pillow
) updated to patch potential vulnerabilities within them.