Some Chinese names have unnecessary spaces at the end when transliterating #64

taraskuzyk · 2021-05-20T22:15:17Z

When trying to transliterate

"马云"
I receive

"Ma Yun " (notice the space at the end) instead of

"Ma Yun"

Here's the code you can use to replicate this issue:

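import unittest
import unidecode

class TestStrings(unittest.TestCase):
    def test_replace_non_ascii_letters_with_chinese_name(self):
        # Currently fails: the result is "Ma Yun " with a trailing space.
        self.assertEqual(unidecode.unidecode("马云"), "Ma Yun")

if __name__ == "__main__":
    unittest.main()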

The test fails with the following error:

AssertionError: 'Ma Yun ' != 'Ma Yun'
- Ma Yun 
?       -
+ Ma Yun

Run on Python 3.8.5

EDIT:

Google Translate seems to handle this without a trailing space, but perhaps Google Translate's transliteration is the faulty one. Chinese speakers are welcome to correct me.


avian2 · 2021-05-21T06:35:14Z

The transliteration of each character includes a space at the end because otherwise you would not get spaces between characters. In your example you would get "MaYun". Unidecode does a simple mapping from each Unicode character to an ASCII sequence and doesn't know which character appears last in your name. Hence the last character leaves a trailing space.

I don't speak Chinese, but the original author of Unidecode thought it was better to have the spaces, so I will leave it like that.
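
As a rough illustration of the per-character mapping described above, here is a minimal sketch. `TOY_TABLE` and `toy_transliterate` are hypothetical names with made-up entries, not Unidecode's actual data or API; they only show why a position-unaware mapping leaves a trailing space and how a caller might strip it.

```python
# Toy per-character table; the entries are illustrative, not Unidecode's real data.
TOY_TABLE = {"马": "Ma ", "云": "Yun "}


def toy_transliterate(text: str) -> str:
    """Map each character independently, with no knowledge of its position."""
    # Characters missing from the toy table are passed through unchanged.
    return "".join(TOY_TABLE.get(ch, ch) for ch in text)


raw = toy_transliterate("马云")
print(repr(raw))           # 'Ma Yun '
print(repr(raw.rstrip()))  # 'Ma Yun'
```

Assuming the same behaviour as reported in this issue, a caller who wants no trailing space could likely apply the same workaround to the real library, e.g. `unidecode.unidecode("马云").rstrip()`.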
