Fixing problems and getting explanations
Ode to a Shipping Label
A poem about mojibake, whose original author might be Carlos Bueno on Facebook, shows a shipping label that serves as an excellent example for this section, addressed to the surname L&AMP;AMP;ATILDE;&AMP;AMP;SUP3;PEZ.

We can use ftfy not only to fix the text that was on the label, but also to show us what happened to it (as the poem does):
>>> from ftfy import fix_and_explain, apply_plan
>>> shipping_label = "L&AMP;AMP;ATILDE;&AMP;AMP;SUP3;PEZ"
>>> fixed, explanation = fix_and_explain(shipping_label)
>>> fixed
'LóPEZ'
>>> explanation
[('apply', 'unescape_html'),
 ('apply', 'unescape_html'),
 ('apply', 'unescape_html'),
 ('encode', 'latin-1'),
 ('decode', 'utf-8')]
The capitalization is inconsistent because the encoding of a lowercase “ó” is in there, but everything was printed in capital letters.
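The steps in the explanation can be replayed by hand with the standard library. This is only a sketch of what the steps mean, not how ftfy implements them. It assumes a triple-escaped input whose entity names use conventional casing, because Python's html.unescape does not recognize every all-caps entity name (such as &ATILDE;).

```python
import html

# Assumption: a triple-escaped label with conventionally cased entity names,
# so that the standard library's html.unescape recognizes them.
label = "L&amp;amp;Atilde;&amp;amp;sup3;PEZ"

text = label
for _ in range(3):                   # ('apply', 'unescape_html'), three times
    text = html.unescape(text)
mojibake = text                      # what is left once the HTML escaping is gone

raw = mojibake.encode("latin-1")     # ('encode', 'latin-1')
repaired = raw.decode("utf-8")       # ('decode', 'utf-8')

print(mojibake)
print(repaired)
```

The intermediate value shows the classic two-character mojibake: the UTF-8 bytes of a single accented letter, misread one byte at a time as Latin-1.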
The same explanation can often be applied to different text that has the same problem:
>>> label2 = "CARR&AMP;AMP;ATILDE;&AMP;AMP;COPY;"
>>> apply_plan(label2, explanation)
'CARRé'
Functions that fix text
The function that you’ll probably use most often is ftfy.fix_text(), which applies all the fixes it can to every line of text, and returns the fixed text.
ftfy.fix_and_explain() takes the same arguments as ftfy.fix_text(), but provides an explanation, like we saw in the first section.
Unlike ftfy.fix_text(), ftfy.fix_and_explain() doesn’t separate the text into lines that it fixes separately, because it’s looking for a unified explanation of what happened to the text, not a different one for each line.
A more targeted function is ftfy.fix_encoding_and_explain(), which only fixes problems that can be solved by encoding and decoding the text, not other problems such as HTML entities.
This function has a counterpart, ftfy.fix_encoding(), that returns just the fixed string, without the explanation. It still fixes the string as a whole, not line by line.
The return type of the ..._and_explain functions is a kind of NamedTuple called ExplainedText, with the fields text and explanation.
These explanations can be re-applied to text using apply_plan().
Showing the characters in a string
A different kind of explanation you might need is simply a breakdown of what Unicode characters a string contains. For this, ftfy provides a utility function, ftfy.explain_unicode().
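To give a sense of what such a breakdown contains, here is a rough standard-library approximation. The helper name describe_characters is hypothetical, and this is not ftfy's implementation or its exact output format; it only reports the same kind of information (codepoint, character, general category, and name).

```python
import unicodedata


def describe_characters(text):
    """Return one (codepoint, character, category, name) row per character."""
    rows = []
    for char in text:
        rows.append((
            f"U+{ord(char):04X}",
            char,
            unicodedata.category(char),
            unicodedata.name(char, "<unknown>"),
        ))
    return rows


# Hypothetical input: a mojibake string whose contents are worth inspecting.
for row in describe_characters("LÃ³PEZ"):
    print(*row, sep="\t")
```

Seeing the characters spelled out this way makes it clear that the string holds a capital A with tilde followed by a superscript three, rather than a single accented letter.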
A command-line utility that provides similar information, and even more detail, is lunasorcery’s utf8info.