
UTF-8 column names and error messages are octet-streams, not strings [rt.cpan.org #120141] #214

Open
mbeijen opened this issue Nov 15, 2017 · 6 comments
Labels
utf8 Unicode and UTF-8 handling


@mbeijen
Contributor

mbeijen commented Nov 15, 2017

Migrated from rt.cpan.org#120141 (status was 'open')


From [email protected] on 2017-02-08 04:07:52:

Hello,

Column names and error messages should be returned as character strings,
but in DBD-mysql-4.041 they are returned as octet streams.

The attached code creates a table with a column whose name
contains a non-ASCII character.  After issuing a SELECT statement
and calling fetchrow_hashref, it tries to get a value using the column
name at (1), but the result is undef.  If you use the octet stream for
the column name as the key, you do get the value, at (2).

Also, when you enable Japanese error messages by adding the line
	lc_messages=ja_JP
to the [mysqld] section of my.ini, the messages are not decoded by
DBD::mysql.  As a result, they are unreadable at (3) and (4).
We could explicitly decode a caught message as in (5), but this
cannot be applied to (3).  Of course, the problem can be avoided by
not using automatic encoding for STDERR at (6), but then we would
need to manually encode all other strings, which is a nightmare.
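A minimal sketch of the explicit decoding at (5), assuming the driver delivers the server message as raw UTF-8 octets and that STDERR has an automatic encoding layer as at (6). The Japanese text is a placeholder (エラー, "error"), not an actual MySQL/MariaDB message, and die/eval stands in for a failing DBI call with RaiseError enabled.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# Automatic encoding for STDERR, as at (6) in the report.
binmode STDERR, ':encoding(UTF-8)';

# Placeholder message, delivered as raw UTF-8 octets (the 4.041 behaviour).
my $octets = encode('UTF-8', "エラー");

# Stand-in for a failing DBI call with RaiseError enabled.
eval { die "$octets\n" };
my $caught = $@;

# Printed as-is, each octet would be treated as a Latin-1 character and
# encoded again by the :encoding(UTF-8) layer, producing mojibake.
# (5) Decoding the caught message first yields a proper character string.
my $message = decode('UTF-8', $caught);
print STDERR $message;

printf "octets: %d, characters: %d\n", length $caught, length $message;
# prints "octets: 10, characters: 4"
```

This only helps where user code sees the message before it is printed; a message written to STDERR by other code never passes through such a decode step.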

Finally, I noticed that make test for DBD-mysql fails when error
messages are in Japanese.  This may be difficult to avoid (I do not
know), but a warning during make test (e.g. "lc_messages should not
be changed") would help.

DBD::mysql version: 4.041
Strawberry Perl 64-bit, v5.22.1
MariaDB
   $dbh->{mysql_clientinfo}, $dbh->{mysql_clientversion}, $dbh->{mysql_serverversion}
return:
   5.1.44, 50144, 50505, respectively.
Windows 7 Pro Service Pack 1

Regards,
Tanabe Yoshinori

From [email protected] on 2017-02-08 10:32:43:

Hello, please try development version 4.041_1 of DBD-mysql. It contains fixed UTF-8 support for passing statements and parameters.

From [email protected] on 2017-02-08 11:20:34:

On 2017/02/08 19:32, Pali via RT wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=120141 >
>
> Hello, please try development version 4.041_1 of DBD-mysql. That one has fixed UTF-8 support for passing statements and parameters.
>

Hello,

I have just installed 4.041_01 ("print $DBD::mysql::VERSION" confirms the
version) and ran the script again.  The results are the same as in my
first report.

Thank you.
Tanabe

From [email protected] on 2017-02-12 12:52:30:

On Wed Feb 08 06:20:34 2017, [email protected] wrote:
> Hello,
> 
> I have just installed 4.041_01 ("print $DBD::mysql::VERSION" shows the
> number) and run the script again.  The results are the same as in my
> first report.
> 
> Thank you.
> Tanabe

Hi! Could you try compiling DBD::mysql (either 4.041_01 or git master) with the two attached patches? They should fix wide Unicode characters in column names and error messages. Note that DBI itself mishandles Unicode error messages in versions prior to 1.635 (see https://rt.cpan.org/Public/Bug/Display.html?id=102404).

From [email protected] on 2017-02-12 12:54:03:

Trying to attach patches again...

From [email protected] on 2017-02-13 02:34:37:

On 2017/02/12 21:52, Pali via RT wrote:
> Hi! Can you try compile DBD::mysql (either 4.041_01 or from git master) with these two attached patches? It should fix wide Unicode characters in column names and error messages. Note that DBI itself has broken Unicode messages prior to version 1.635 (see https://rt.cpan.org/Public/Bug/Display.html?id=102404).
>

Hello, I have confirmed that the problems are gone after applying the
patches (and upgrading DBI to a later version).  Thank you very much for
the quick fix.
One concern is that the fix may break existing code that works with the current behavior.
Best regards,
Tanabe

From [email protected] on 2017-02-13 08:26:46:

On Sun Feb 12 21:34:37 2017, [email protected] wrote:
> Hello,  I have confirmed that the problems have gone by applying the
> patches (and upgrading DBI to a later version).  Thank you very much
> for the quick fix.
> One concern is that the fix can break code currently running.
> Best regards,
> Tanabe

Thank you for testing. I will reuse your script to create tests for this issue.

Unicode support in DBD::mysql has been broken for a long time, and the proper way forward is to fix the current code.

From [email protected] on 2017-07-01 09:12:29:

Reopening: the fix was reverted in 4.043.

dveeden added the utf8 (Unicode and UTF-8 handling) label on Oct 6, 2023
@michal-josef-spacek
Contributor

I created a PR with tests for this issue (#467).

Could you give me feedback? The code is duplicated, which I don't like, but I don't know how to write it better. Any hints?

@michal-josef-spacek
Contributor

I will rewrite it as a single test file.

@michal-josef-spacek
Contributor

I finished the PR (#467).

I split the test file into two:

  • the first one tests UTF-8 identifiers in table and column names.
  • the second one tests UTF-8 error messages.

Any feedback?

@michal-josef-spacek
Contributor

@dveeden @Grinnz I think we could merge this PR. It only describes the current situation. I am in favor of releasing it so the tests run on the testing farm.

The main question is: what is the next step here?
There are two independent patches. It is possible to test them, fix the tests, etc.

Do we want to fix it?
I mean fixing error messages and table/column identifiers when mysql_enable_utf8 or mysql_enable_utf8mb4 is set.
I think that's fine, but it is incompatible with existing users and other modules.
How should we do it?

@dveeden
Collaborator

dveeden commented Apr 7, 2025

@michal-josef-spacek Maybe with a DSN option, e.g. mysql_perlutf8_compat=0?
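One way to read this proposal, as a pure-Perl sketch: decoding is applied only when mysql_enable_utf8 or mysql_enable_utf8mb4 is set, and a compatibility switch keeps the old octet behaviour. mysql_perlutf8_compat is only the name suggested in this comment, not an existing DBD::mysql attribute, and post_process_name() is a hypothetical helper that does not reflect how the driver's XS code is actually structured.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: post-process a column name or error message that
# arrives from the client library as raw octets.
#   $enable_utf8 - true when mysql_enable_utf8 or mysql_enable_utf8mb4 is set
#   $compat      - value of the proposed (not existing) mysql_perlutf8_compat
sub post_process_name {
    my ($octets, $enable_utf8, $compat) = @_;
    # Legacy behaviour: hand the octets back untouched.
    return $octets if !$enable_utf8 || $compat;
    # New behaviour: return a decoded Perl character string.
    return decode('UTF-8', $octets);
}

my $raw = "\xE5\x90\x8D";                        # UTF-8 octets for U+540D
my $legacy  = post_process_name($raw, 1, 1);     # 3-octet string, as in 4.041
my $decoded = post_process_name($raw, 1, 0);     # 1-character string
printf "%d %d\n", length $legacy, length $decoded;   # prints "3 1"
```

Keeping the legacy path as the behaviour when the option is set would let existing code opt out of the incompatible change rather than break silently.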

@michal-josef-spacek
Contributor

@dveeden Yes, that is possible. I don't know whether we need this kind of option. Maybe.

I propose:

  1. Improve the other two UTF-8 test files first, to confirm the current situation. (I will do this.)
  2. Make a release (are you OK with that?). I think we need results from the new tests. (On your side.)
  3. Prepare the fix and tests, possibly add a new option, and update the documentation. (I will do this.)

OK?


3 participants