API Reference

Corpus Data

read_chat(*filenames, **kwargs)

Read Cantonese CHAT data files into a reader object.

hkcancor()

Create a corpus object for the Hong Kong Cantonese Corpus.

corpus.CantoneseCHATReader(*filenames, **kwargs)

A reader for Cantonese CHAT corpus files.

corpus.CantoneseCHATReader.search(*[, …])

Search the data for the given criteria.

Jyutping Romanization

characters_to_jyutping(chars)

Convert Cantonese characters into Jyutping romanization.

parse_jyutping(jp_str)

Parse Jyutping romanization into onset, nucleus, coda, and tone.
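To make the four components concrete, here is a minimal standalone sketch of splitting Jyutping into onset, nucleus, coda, and tone with regular expressions. It is not the library's implementation: the names `Syllable`, `split_syllable`, and `split_jyutping` are hypothetical, and the sound inventory in the patterns is an assumption based on the standard Jyutping scheme.

```python
import re
from typing import List, NamedTuple


class Syllable(NamedTuple):
    """Hypothetical container for one parsed Jyutping syllable."""

    onset: str
    nucleus: str
    coda: str
    tone: str


# Regular syllable: optional onset, vowel nucleus, optional coda, tone digit 1-6.
# The inventory below is assumed from the standard Jyutping scheme.
_REGULAR = re.compile(
    r"(gw|kw|ng|[bpmfdtnlgkhwzcsj])?"
    r"(aa|oe|eo|yu|[aeiou])"
    r"(ng|[ptkmniu])?"
    r"([1-6])"
)
# Syllabic nasal such as "m4" or "ng5": the nasal itself fills the nucleus slot.
_SYLLABIC_NASAL = re.compile(r"(h)?(ng|m)()([1-6])")


def split_syllable(syllable: str) -> Syllable:
    """Split a single Jyutping syllable into its four components."""
    for pattern in (_REGULAR, _SYLLABIC_NASAL):
        match = pattern.fullmatch(syllable)
        if match:
            onset, nucleus, coda, tone = match.groups()
            return Syllable(onset or "", nucleus, coda or "", tone)
    raise ValueError(f"not a recognizable Jyutping syllable: {syllable!r}")


def split_jyutping(jp_str: str) -> List[Syllable]:
    """Split a run of concatenated syllables, using tone digits as boundaries."""
    chunks = re.findall(r"[a-z]+[1-6]", jp_str)
    if "".join(chunks) != jp_str:
        raise ValueError(f"unexpected characters in: {jp_str!r}")
    return [split_syllable(chunk) for chunk in chunks]


for syllable in split_jyutping("gwong2dung1waa2"):
    print(syllable)
```

Because every syllable ends in a tone digit, the digits serve as syllable boundaries, so the input needs no spaces or other separators.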

jyutping_to_yale(jp_str[, as_list])

Convert Jyutping romanization into Yale romanization.

jyutping_to_tipa(jp_str)

Convert Jyutping romanization into LaTeX TIPA.

Natural Language Processing

stop_words([add, remove])

Return Cantonese stop words.

segment(unsegmented[, cls])

Segment unsegmented text into words.
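For readers unfamiliar with word segmentation, the sketch below illustrates one common dictionary-based approach, greedy longest matching. This listing does not say which algorithm the library uses, so treat this purely as an illustration: `longest_match_segment`, its `vocabulary` argument, and the `max_word_length` default are all hypothetical.

```python
from typing import Iterable, List


def longest_match_segment(
    text: str, vocabulary: Iterable[str], max_word_length: int = 5
) -> List[str]:
    """Greedily take the longest vocabulary word at each position.

    Characters that start no known word are emitted as one-character words.
    """
    if max_word_length < 1:
        raise ValueError("max_word_length must be at least 1")
    vocab = set(vocabulary)
    words: List[str] = []
    i = 0
    while i < len(text):
        longest = min(max_word_length, len(text) - i)
        # Try the longest candidate first and shrink until something matches.
        for length in range(longest, 0, -1):
            candidate = text[i : i + length]
            if length == 1 or candidate in vocab:
                words.append(candidate)
                i += length
                break
    return words


print(longest_match_segment("廣東話容易學嗎", {"廣東話", "廣東", "容易"}))
```

Greedy matching is simple and fast, but it commits to the longest word at each position and never backtracks, so its output depends heavily on the vocabulary it is given.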

word_segmentation.Segmenter(*[, …])

A customizable word segmentation model.

pos_tag(words)

Tag the words for their parts of speech.

pos_tagging.hkcancor_to_ud([tag])

Map a part-of-speech tag from HKCanCor to Universal Dependencies.
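Mapping between tagsets usually comes down to a lookup table with a fallback for unknown tags. The sketch below shows that idea only. The three-entry table is an illustrative excerpt, not the library's actual mapping, and both the name `map_tag` and the choice of the Universal Dependencies tag `X` ("other") as the fallback are assumptions.

```python
from typing import Dict

# Illustrative excerpt only; a real mapping would cover the whole source tagset.
_ILLUSTRATIVE_TAG_MAP: Dict[str, str] = {
    "V": "VERB",
    "N": "NOUN",
    "A": "ADJ",
}


def map_tag(tag: str, fallback: str = "X") -> str:
    """Look up a source tag case-insensitively; unknown tags get the fallback."""
    return _ILLUSTRATIVE_TAG_MAP.get(tag.upper(), fallback)


print(map_tag("V"))
print(map_tag("unknown-tag"))
```

Falling back to a catch-all tag rather than raising an error keeps a tagging pipeline running when the source data contains unexpected or malformed tags.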