DjangoUnicodeDecodeError for binary fields #700

Open
HealperRasmus opened this issue Feb 10, 2025 · 1 comment

Issue Description

The smart_str() function fails when handling binary fields, raising:
django.utils.encoding.DjangoUnicodeDecodeError: 'utf-8' codec can't decode byte 0xaf in position 2: invalid start byte

Steps to Reproduce

  1. Create a model with a BinaryField
  2. Enable audit logging for this model
  3. Attempt to save/update the model with binary data

Current Behavior

Saving the model raises DjangoUnicodeDecodeError (a UnicodeDecodeError subclass) because smart_str() tries to decode the binary data as UTF-8.
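
The failure can be reproduced without Django, since for bytes input the decoding step is a strict UTF-8 decode. A minimal sketch: the three-byte payload is an assumption, chosen only so the error matches the one reported above (0xaf at position 2), and try_decode is a hypothetical helper name.

```python
def try_decode(payload: bytes) -> str:
    """Decode bytes as strict UTF-8; on failure, return a short error summary."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason} at position {exc.start}"


# Hypothetical payload: two ASCII bytes followed by 0xaf, which cannot start a UTF-8 sequence.
print(try_decode(b"\x00\x01\xaf"))
# Bytes that happen to be valid UTF-8 decode without complaint.
print(try_decode(b"plain text"))
```

So the error only shows up when the stored bytes are not valid UTF-8, which is why text-like binary payloads may appear to work.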

Expected Behavior

Binary data should be handled gracefully, possibly by converting it to a hex representation.

Proposed Fix

The error occurs in the smart_str function. Adding a try/except with a hex fallback could be a solution, but I cannot predict whether that would have unforeseen consequences.

def smart_str(s, encoding="utf-8", strings_only=False, errors="strict"):
    if isinstance(s, Promise):
        return s
    if isinstance(s, bytes):
        try:
            return force_str(s, encoding, strings_only, errors)
        except UnicodeDecodeError:
            return s.hex()
    return force_str(s, encoding, strings_only, errors)
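
One consequence of a decode-then-fallback approach can be shown with the standard library alone. This is a sketch of the same idea, not Django's code; bytes_to_text is a hypothetical name.

```python
def bytes_to_text(value: bytes) -> str:
    """Decode as UTF-8 when possible, otherwise fall back to a hex string."""
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return value.hex()


# The single byte 0xaf is not valid UTF-8, so it takes the hex fallback.
from_fallback = bytes_to_text(b"\xaf")
# The two ASCII bytes "af" are valid UTF-8, so they are decoded as text.
from_decode = bytes_to_text(b"af")

# Two different binary values end up with the same string.
print(from_fallback == from_decode)
```

Because the output format depends on the content of the bytes, a logged value cannot be mapped back to the original data unambiguously. Always converting binary values to hex avoids this.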

HealperRasmus commented Feb 10, 2025

Sorry, I just realized that smart_str is a built-in Django function, so editing it is not an option.

The error happens in auditlog.diff.get_field_value, where it calls smart_str. So a better solution would be to handle it there:

Replace:

value = smart_str(
    getattr(obj, field.get_attname(), None), strings_only=True
)

with:

value = getattr(obj, field.get_attname(), None)
if isinstance(field, models.BinaryField) and isinstance(value, (bytes, memoryview)):
    # We cannot just use try/except here, because binary fields might be memoryview objects, which
    # CAN be converted to a string, but that string will not represent the value stored in the database.
    # When larger bytes values are stored in the database, they are returned as memoryview objects.
    # In order to convert them to hex, we convert them to bytes first.
    value = bytes(value).hex()
else:
    value = smart_str(value, strings_only=True)
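
The memoryview point in the comment above can be checked in plain Python. A small sketch with an assumed three-byte payload:

```python
payload = b"\x00\x01\xaf"  # assumed example value
view = memoryview(payload)

# str() on a memoryview succeeds, but yields only an object description
# (something like "<memory at 0x...>"), not the underlying data.
as_str = str(view)

# Converting to bytes first gives a hex string of the actual content.
as_hex = bytes(view).hex()

print(as_str)
print(as_hex)
```

This is why a try/except around the string conversion would not catch the memoryview case: no exception is raised, the result is just not useful.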

An even more generic solution would be to define a dictionary of field handlers, mapping field types to functions that convert their field values to strings. That way, users could handle their special cases without having to open a PR.
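
A rough sketch of what such a handler dictionary might look like. Everything here is hypothetical: Field and BinaryField are stand-ins for the Django field classes so the example runs on its own, and FIELD_HANDLERS, register_handler and field_value_to_str are not part of auditlog. Handling of None values is left out.

```python
from typing import Any, Callable, Dict


class Field:
    """Stand-in for django.db.models.Field (assumption, for illustration only)."""


class BinaryField(Field):
    """Stand-in for django.db.models.BinaryField."""


# Hypothetical registry: field type -> function that turns a value into a string.
FIELD_HANDLERS: Dict[type, Callable[[Any], str]] = {}


def register_handler(field_type: type, handler: Callable[[Any], str]) -> None:
    """Let users plug in a converter for a field type."""
    FIELD_HANDLERS[field_type] = handler


def field_value_to_str(field: Field, value: Any) -> str:
    """Use the handler of the most specific registered field class, else str()."""
    for cls in type(field).__mro__:
        handler = FIELD_HANDLERS.get(cls)
        if handler is not None:
            return handler(value)
    return str(value)


# Binary values (bytes or memoryview) are always rendered as hex.
register_handler(BinaryField, lambda value: bytes(value).hex())

print(field_value_to_str(BinaryField(), memoryview(b"\x00\xaf")))
print(field_value_to_str(Field(), 42))
```

Walking the MRO means a handler registered for a base field class also applies to its subclasses, unless a more specific handler is registered.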
