pycantonese.characters_to_jyutping
pycantonese.characters_to_jyutping(chars)
Convert Cantonese characters into Jyutping romanization.
The conversion model is based on the HKCanCor corpus and rime-cantonese data. Any Cantonese character the model has not seen (or any punctuation mark, for that matter) gets None in place of a Jyutping string in the output.
The output is a list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping romanization).
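Because the Jyutping element of a word can be None, code that consumes the output usually needs to check for it. The following is a minimal sketch of that handling, not part of the library: the helper names `romanized_only` and `unseen` are hypothetical, and the sample list is copied from the documented example output rather than produced by calling the function.

```python
from typing import List, Optional, Tuple

# Shape of one segmented word: (Cantonese characters, Jyutping or None).
Word = Tuple[str, Optional[str]]

# Sample copied from the documented example output (not generated here).
words: List[Word] = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]


def romanized_only(words: List[Word]) -> List[Tuple[str, str]]:
    """Hypothetical helper: keep only words that have a Jyutping string."""
    return [(chars, jp) for chars, jp in words if jp is not None]


def unseen(words: List[Word]) -> List[str]:
    """Hypothetical helper: collect the items whose Jyutping is None."""
    return [chars for chars, jp in words if jp is None]
```

Filtering up front keeps later string operations (joining, splitting, formatting) from failing on None.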
New in version 3.0.0: This function replaces the deprecated equivalent characters2jyutping.
Changed in version 3.0.0: The returned value is now a list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping). Previously, it was a list of Jyutping strings for the individual Cantonese characters.
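Code written against the earlier per-character return value may want a flat list again. The sketch below is one way to approximate that shape from the word-level output; it is an assumption-laden illustration, not the library's own implementation, and it may not match the pre-3.0.0 output exactly. It assumes every Jyutping syllable is a run of lowercase letters followed by a tone digit from 1 to 6, and that one syllable corresponds to one character. The name `per_character_jyutping` is hypothetical, and the sample input is copied from the documented example output.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a Jyutping syllable is lowercase letters plus a tone digit 1-6.
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def per_character_jyutping(
    words: List[Tuple[str, Optional[str]]]
) -> List[Optional[str]]:
    """Hypothetical helper: flatten word-level output to one entry per character."""
    result: List[Optional[str]] = []
    for chars, jyutping in words:
        if jyutping is None:
            # No romanization available: emit one None per character.
            result.extend([None] * len(chars))
        else:
            # Split the concatenated romanization at each tone digit.
            result.extend(SYLLABLE.findall(jyutping))
    return result


# Sample copied from the documented example output (not generated here).
sample = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]
flat = per_character_jyutping(sample)
```

For the sample above, `flat` holds seven syllable strings followed by a single None for the full stop.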
- Parameters
- chars : str
A string of Cantonese characters.
- Returns
- list[tuple[str]]
Examples
>>> from pycantonese import characters_to_jyutping
>>> characters_to_jyutping("香港人講廣東話。")  # Hongkongers speak Cantonese.
[('香港人', 'hoeng1gong2jan4'), ('講', 'gong2'), ('廣東話', 'gwong2dung1waa2'), ('。', None)]