How to convert raw unicode to utf8-unicode in python?


Problem description

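First time here, so I will do my best to explain my problem.

I am using Python 2.7 inside Maya. Through the Maya API I get a string (a so-called attribute), "attr", that looks like this: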

print(attr)
print(type(attr))

>> GÃ©nÃ©rique
>> <type 'unicode'>

I need to convert it to a readable UTF-8 form before I can continue with my work. Basically, I need to be able to do this:

print(attr)
print(type(attr))

>>Générique
>><type 'unicode'>

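I have tried many combinations of attr.encode / attr.decode, but I cannot really grasp what I am supposed to do. What bothers me most is that when I type the variable manually in code, I actually get: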

attr = 'Générique'
print(type(attr))
attr = attr.decode('utf-8')
print(attr)
print(type(attr))

>><type 'str'>
>>Générique
>><type 'unicode'>

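So I know I should first convert 'attr' to the str type, but I cannot do that without losing information.

Any ideas? Please?

Edit: solved thanks to snakecharmerb (and ftfy). Many thanks. Both solutions are below this post.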


Reference solutions

Method 1:

SOLVED:

I found out about the ftfy module. Getting pip to work with Maya was a bit of a hassle, but it is all done now. For anyone with the same issue, this thread explains how to make pip work with Maya: https://forums.autodesk.com/t5/maya-programming/can-i-use-pip-in-maya-script-editor/td-p/7638107 (you will need to run cmd as administrator or it won't install).

Then grab ftfy (versions below 5 are compatible with Python 2.7): pip install ftfy==4.4.3

My unclean code looks like this:

from __future__ import unicode_literals
import maya.cmds as cmds
from ftfy import fix_text

# objectName is the placeholder from the original post: the "node.attribute" to query
attr = cmds.getAttr(objectName)
# fix_text repairs text that was decoded with the wrong encoding
attr = fix_text(attr)
print(attr)

Method 2:

What you have is text that was originally UTF-8 but was decoded with an 8-bit encoding, likely latin-1 or cp1252. To fix it, encode the text back to that 8-bit encoding to recover the original UTF-8 bytes, then decode those bytes as UTF-8.

>>> u = u'GÃ©nÃ©rique'
>>> fixed = u.encode('latin-1').decode('utf-8')
>>> print fixed
Générique

(by gargam, snakecharmerb)
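
The round trip described in Method 2 can be sketched as a small helper. This is not part of the original answers: fix_mojibake and its encodings parameter are hypothetical names, and trying cp1252 before latin-1 is an assumption based on the answer's remark that either encoding may be the culprit. The sketch first reproduces the damage (UTF-8 bytes decoded as latin-1) and then reverses it. It runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def fix_mojibake(text, encodings=("cp1252", "latin-1")):
    """Hypothetical helper: undo UTF-8 text that was decoded with an 8-bit codec.

    Each candidate encoding is tried in turn. If none of them yields valid
    UTF-8 bytes, the input is returned unchanged.
    """
    for enc in encodings:
        try:
            # Recover the raw bytes, then decode them as the UTF-8 they really are.
            return text.encode(enc).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return text


original = "G\u00e9n\u00e9rique"  # the intended text, Générique
# Reproduce the damage: UTF-8 bytes wrongly decoded as latin-1.
garbled = original.encode("utf-8").decode("latin-1")
repaired = fix_mojibake(garbled)

print(repaired == original)
```

Text that is already correct should pass through untouched, because its 8-bit bytes are not valid UTF-8 and both attempts raise UnicodeDecodeError. Pure ASCII text is also returned as-is.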

References

  1. How to convert raw unicode to utf8-unicode in python? (CC BY-SA 2.5/3.0/4.0)

#character-encoding #Python #python-2.7





