pycantonese.characters_to_jyutping
- pycantonese.characters_to_jyutping(chars: str, segmenter: Optional[pycantonese.word_segmentation.Segmenter] = None) → List[Tuple[str, str]]
Convert Cantonese characters into Jyutping romanization.
The conversion model is based on the HKCanCor corpus and rime-cantonese data. Any Cantonese character not seen in this data, as well as any punctuation mark, has None in place of its Jyutping in the output.
This function also performs word segmentation, in order to resolve potential ambiguity in mapping characters to Jyutping.
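To make the role of segmentation concrete, here is a minimal sketch of why word boundaries matter. It does not use pycantonese and is not the library's algorithm or data: `TOY_LEXICON` and `toy_characters_to_jyutping` are hypothetical names, and greedy longest-match is assumed only as a simple stand-in segmenter. The character 行 is read hong4 in 銀行 ("bank") but haang4 in 行路 ("to walk"), so a per-character lookup alone could not pick the right reading.

```python
# Toy lexicon for illustration only; NOT pycantonese's data or algorithm.
TOY_LEXICON = {
    "銀行": "ngan4hong4",  # "bank": 行 is read hong4 here
    "行路": "haang4lou6",  # "to walk": 行 is read haang4 here
    "去": "heoi3",         # "to go"
}


def toy_characters_to_jyutping(chars, lexicon=TOY_LEXICON):
    """Greedy longest-match segmentation followed by a per-word lookup."""
    max_len = max(len(word) for word in lexicon)
    result = []
    i = 0
    while i < len(chars):
        # Try the longest candidate word first, falling back to one character.
        for size in range(min(max_len, len(chars) - i), 0, -1):
            word = chars[i:i + size]
            if word in lexicon or size == 1:
                # Unknown single characters (e.g. punctuation) map to None.
                result.append((word, lexicon.get(word)))
                i += size
                break
    return result


toy_result = toy_characters_to_jyutping("行路去銀行。")
print(toy_result)
```

Because each reading is attached to a whole word, the two occurrences of 行 in the input receive different Jyutping, and the unknown punctuation mark falls through to None, mirroring the output shape described above.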
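- Parameters
- chars (str)
The Cantonese characters to convert.
- segmenter (pycantonese.word_segmentation.Segmenter, optional)
The segmenter used for the word segmentation step. Defaults to None.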
- Returns
- list[tuple[str, str]]
A list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping romanization).
Examples
>>> characters_to_jyutping("香港人講廣東話。")  # Hongkongers speak Cantonese.
[('香港人', 'hoeng1gong2jan4'), ('講', 'gong2'), ('廣東話', 'gwong2dung1waa2'), ('。', None)]
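Since the Jyutping slot can be None, downstream code usually needs to handle that case explicitly. The sketch below works on the output from the example above, pasted in as a literal so it runs without pycantonese installed. The helper name `romanize` is hypothetical, and falling back to the original characters is just one possible policy.

```python
from typing import List, Optional, Tuple

# Output copied from the documented example above.
words: List[Tuple[str, Optional[str]]] = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]


def romanize(pairs: List[Tuple[str, Optional[str]]]) -> str:
    """Join Jyutping with spaces, keeping the original text where it is None."""
    return " ".join(jp if jp is not None else chars for chars, jp in pairs)


romanized = romanize(words)

# Collect the items that had no Jyutping, e.g. to log or inspect them.
unmapped = [chars for chars, jp in words if jp is None]

print(romanized)
print(unmapped)
```

Checking with `is not None` rather than truthiness keeps the intent explicit: only entries the model could not map are replaced.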