pycantonese.characters_to_jyutping

pycantonese.characters_to_jyutping(chars)

Convert Cantonese characters into Jyutping romanization.

The conversion model is based on the HKCanCor corpus and rime-cantonese data. Any Cantonese character not seen in this data, and likewise any punctuation mark, is represented by None in the output.

The output is a list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping romanization).

New in version 3.0.0: This function replaces the deprecated equivalent characters2jyutping.

Changed in version 3.0.0: The returned value is now a list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping). Previously, it was a list of Jyutping strings for the individual Cantonese characters.
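
For code that still expects one Jyutping entry per character, the word-level output can be flattened. The following is a minimal sketch, not part of PyCantonese: the helper name flatten_to_syllables is hypothetical, the sample data is copied from the example on this page, and the split assumes every Jyutping syllable is a run of lowercase letters followed by a tone digit from 1 to 6. Emitting one None per character for an unconverted word is also an assumption of this sketch.

```python
import re

# Sample output copied from the documented example; in real use it would come
# from pycantonese.characters_to_jyutping("香港人講廣東話。").
words = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]

# Assumption: a Jyutping syllable is lowercase letters plus one tone digit (1-6).
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def flatten_to_syllables(words):
    """Hypothetical helper: turn word-level pairs into a per-character list."""
    syllables = []
    for chars, jyutping in words:
        if jyutping is None:
            # Unconverted word or punctuation: one None per character (assumption).
            syllables.extend([None] * len(chars))
        else:
            syllables.extend(SYLLABLE.findall(jyutping))
    return syllables


per_character = flatten_to_syllables(words)
```

This only approximates the shape of the pre-3.0.0 return value; it does not reproduce the old function's conversion behavior.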

Parameters
chars : str

A string of Cantonese characters.

Returns
list[tuple[str]]

Examples

>>> characters_to_jyutping("香港人講廣東話。")  # Hongkongers speak Cantonese.
[('香港人', 'hoeng1gong2jan4'), ('講', 'gong2'), ('廣東話', 'gwong2dung1waa2'), ('。', None)]
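
Because the Jyutping element of a tuple may be None, callers usually need to handle that case explicitly. The sketch below shows one way to do so, using the result above as literal data so that it runs without PyCantonese installed. The helper name join_jyutping and the choice to fall back to the original characters are illustrative assumptions, not library behavior.

```python
# Result copied from the example above; in real use it would be the return
# value of pycantonese.characters_to_jyutping(...).
result = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]


def join_jyutping(words):
    """Hypothetical helper: build a space-separated romanized string.

    Words with no Jyutping (None) fall back to their original characters.
    """
    return " ".join(chars if jyutping is None else jyutping for chars, jyutping in words)


romanized = join_jyutping(result)

# Collect the words that could not be converted, e.g. for logging or review.
unconverted = [chars for chars, jyutping in result if jyutping is None]
```

Here romanized is the string 'hoeng1gong2jan4 gong2 gwong2dung1waa2 。' and unconverted is ['。'].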