Inconsistent character encodings in source files #10

@alexhenrie

The C and C++ files in this repository use 9 different character encodings, which breaks the flawfinder tool. (flawfinder expects all source files to have the same character encoding, preferably UTF-8.)

$ git ls-files | grep -E '\.(c|cpp|h)$' | xargs -n 1 uchardet | sort | uniq -c | sort -rn
    186 ISO-8859-2
     71 ASCII
      8 ISO-8859-1
      4 WINDOWS-1252
      4 UTF-8
      4 ISO-8859-3
      2 WINDOWS-1250
      2 IBM852
      1 ISO-8859-9

The files can be converted en masse to UTF-8 with the following commands:

sudo pip install cvt2utf
git ls-files | grep -E '\.(c|cpp|h)$' | xargs -n 1 sed -i $'s/\xA3/\\\\xA3/g'
git ls-files | grep -E '\.(c|cpp|h)$' | xargs -n 1 cvt2utf convert
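
After the conversion, one way to check the result is to confirm that every tracked source file decodes cleanly as UTF-8. The following is a minimal sketch, assuming iconv is installed; the function names is_valid_utf8 and list_non_utf8_sources are made up for illustration and are not part of flawfinder or cvt2utf:

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Plain ASCII files pass too, since ASCII is a subset of UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: print every tracked C/C++ source file that is
# still not valid UTF-8. Prints nothing if all files pass.
list_non_utf8_sources() {
    git ls-files | grep -E '\.(c|cpp|h)$' | while IFS= read -r f; do
        is_valid_utf8 "$f" || printf '%s\n' "$f"
    done
}
```

Running list_non_utf8_sources from the repository root should print nothing once the conversion has succeeded. Note that this only checks that the bytes are well-formed UTF-8; it cannot tell whether the original encoding was guessed correctly, so the converted comments and strings are still worth a manual look.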
