pycantonese.characters_to_jyutping

pycantonese.characters_to_jyutping(chars: str, segmenter: Optional[pycantonese.word_segmentation.Segmenter] = None) → List[Tuple[str, str]]

Convert Cantonese characters into Jyutping romanization.

The conversion model is based on the HKCanCor corpus and rime-cantonese data. Any character the model has not seen, including punctuation marks, is represented by None in the output.

This function also performs word segmentation to resolve potential ambiguity in mapping characters to Jyutping.

Parameters

chars (str): A string of Cantonese characters.
segmenter (Segmenter, optional): A Segmenter instance to customize word segmentation. If specified, this segmenter is passed to the cls keyword argument of segment(). If None or not given, the default segmenter is used.

Returns

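list[tuple[str, str]]: A list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping romanization). The Jyutping element is None for characters or punctuation marks the conversion model has not seen.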

Examples

>>> characters_to_jyutping("香港人講廣東話。")  # Hongkongers speak Cantonese.
[('香港人', 'hoeng1gong2jan4'), ('講', 'gong2'), ('廣東話', 'gwong2dung1waa2'), ('。', None)]
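
Because the Jyutping element may be None, downstream code usually needs to handle that case explicitly. The following is a minimal sketch of two post-processing helpers, not part of pycantonese: the names `per_character`, `romanized_only`, and `SYLLABLE` are hypothetical, and the input is the example output above copied in as literal data so the snippet runs without the library. It assumes each Jyutping syllable is a run of lowercase letters followed by a tone digit from 1 to 6, and that a word has one syllable per character; when the counts differ, the word is left unsplit.

```python
import re
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]

# Literal copy of the documented example output, used as sample input.
words: List[Pair] = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]

# Assumption: a syllable is lowercase letters followed by a tone digit 1-6.
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def per_character(pairs: List[Pair]) -> List[Pair]:
    """Split word-level pairs into character-level pairs where possible."""
    result: List[Pair] = []
    for chars, jyutping in pairs:
        if jyutping is None:
            # No romanization available: keep None for every character.
            result.extend((char, None) for char in chars)
            continue
        syllables = SYLLABLE.findall(jyutping)
        if len(syllables) == len(chars):
            result.extend(zip(chars, syllables))
        else:
            # Counts do not line up, so leave the word unsplit.
            result.append((chars, jyutping))
    return result


def romanized_only(pairs: List[Pair]) -> str:
    """Join the available Jyutping strings, skipping None entries."""
    return " ".join(jyutping for _, jyutping in pairs if jyutping is not None)


print(per_character(words))
print(romanized_only(words))
```

In real use, `words` would be the return value of `characters_to_jyutping()` rather than a hard-coded list.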