Convert Cantonese characters into Jyutping romanization.
The conversion model is based on the HKCanCor corpus and rime-cantonese data. Any character not covered by this data, including punctuation marks, has None in place of its Jyutping romanization in the output.
The output is a list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping romanization).
New in version 3.0.0: This function replaces the deprecated equivalent
Changed in version 3.0.0: The returned value is now a list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping). Previously, it was a list of Jyutping strings for the individual Cantonese characters.
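Code that relied on the earlier per-character output can recover a similar shape from the word-level tuples. The following is a minimal sketch, not part of the library: `per_character_jyutping` is a hypothetical helper, and it assumes that every Jyutping syllable is a run of lowercase letters followed by a tone digit from 1 to 6, with one syllable per character. The input list is copied from the documented example output rather than produced by a live call.

```python
import re

# Word-level output copied from the documented example; no library call is made here.
words = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]

# Assumption: a syllable is lowercase letters followed by one tone digit (1-6).
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def per_character_jyutping(words):
    """Hypothetical helper: flatten (characters, jyutping) pairs to one entry per character."""
    flat = []
    for characters, jyutping in words:
        if jyutping is None:
            # No romanization available: emit one None per character.
            flat.extend([None] * len(characters))
            continue
        syllables = SYLLABLE.findall(jyutping)
        if len(syllables) != len(characters):
            # The one-syllable-per-character assumption does not hold for this word.
            raise ValueError(f"cannot align {characters!r} with {jyutping!r}")
        flat.extend(syllables)
    return flat


flat = per_character_jyutping(words)
```

The helper raises rather than guessing when the syllable count and character count disagree, since a silent misalignment would be harder to notice downstream.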
A string of Cantonese characters.
>>> from pycantonese import characters_to_jyutping
>>> characters_to_jyutping("香港人講廣東話。")  # Hongkongers speak Cantonese.
[('香港人', 'hoeng1gong2jan4'), ('講', 'gong2'), ('廣東話', 'gwong2dung1waa2'), ('。', None)]
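Because the Jyutping element can be None, callers usually need to handle that case before joining or further processing the romanization. The sketch below shows one way to do so; `split_converted` is a hypothetical helper name, and the input is the example output above written out as a literal, so nothing here depends on a live call.

```python
# Example output from above, written as a literal list.
words = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]


def split_converted(words):
    """Hypothetical helper: separate words that have Jyutping from those that do not."""
    converted = [(chars, jp) for chars, jp in words if jp is not None]
    unconverted = [chars for chars, jp in words if jp is None]
    return converted, unconverted


converted, unconverted = split_converted(words)

# Join only the words that were actually romanized.
romanized = " ".join(jp for _, jp in converted)
```

Keeping the unconverted items in a separate list, rather than dropping them, makes it easy to log or inspect what was not romanized.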