pycantonese.CHATReader

class pycantonese.CHATReader

A reader for Cantonese CHAT corpus files.

Note

Some of the methods are inherited from the parent class Reader, which was designed for language acquisition research. They may or may not be applicable to your use case.

Methods

ages([participant, months])

Return the ages of the given participant in the data.

append(reader)

Append data from another reader.

append_left(reader)

Left-append data from another reader.

characters([participants, exclude, …])

Return the data in individual Chinese characters.

clear()

Remove all data from this reader.

dates_of_recording([by_files])

Return the dates of recording.

extend(readers)

Extend data from other readers.

extend_left(readers)

Left-extend data from other readers.

file_paths()

Return the file paths.

filter([match, exclude])

Return a new reader filtered by file paths.

from_dir(path[, match, exclude, extension, …])

Instantiate a reader from a local directory with CHAT data files.

from_files(paths[, match, exclude, …])

Instantiate a reader from local CHAT data files.

from_strs(strs[, ids, parallel])

Instantiate a reader from in-memory CHAT data strings.

from_zip(path[, match, exclude, extension, …])

Instantiate a reader from a local or remote ZIP file.

headers()

Return the headers.

ipsyn()

(Not implemented - the upstream ipsyn method works for English only.)

jyutping([participants, exclude, …])

Return the data in Jyutping romanization.

languages([by_files])

Return the languages in the data.

mlu([participant])

Return the mean lengths of utterance (MLU).

mlum([participant])

Return the mean lengths of utterance by morphemes.

mluw([participant])

Return the mean lengths of utterance by words.

n_files()

Return the number of files.

participants([by_files])

Return the participants (e.g., CHI, MOT).

pop()

Drop the last data file from the reader and return it as a reader.

pop_left()

Drop the first data file from the reader and return it as a reader.

search(*[, onset, nucleus, coda, tone, …])

Search the data for the given criteria.

sents([participants, exclude, by_files])

Return the sentences (sents).

tagged_sents([participants, exclude, by_files])

Return the tagged sentences (tagged sents).

tagged_words([participants, exclude, by_files])

Return the tagged words.

tokens([participants, exclude, …])

Return the tokens.

ttr([keep_case, participant])

Return the type-token ratios (TTR).

utterances([participants, exclude, by_files])

Return the utterances.

word_frequencies([keep_case, participants, …])

Return word frequencies.

word_ngrams(n[, keep_case, participants, …])

Return word ngrams.

words([participants, exclude, …])

Return the words.
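The corpus measures listed above (word_frequencies, ttr, mluw, word_ngrams) can be illustrated on a toy example. The following is a minimal sketch in plain Python that does not call pycantonese and is not the library's implementation. The utterances are invented, and building n-grams within each utterance is an assumption. Options such as keep_case and participants are not modelled.

```python
from collections import Counter

# Hypothetical toy data: each inner list is one utterance, already split into words.
utterances = [
    ["我", "食", "飯"],
    ["你", "食", "咩"],
    ["食", "飯"],
]

# Flatten the utterances into a single list of word tokens.
words = [word for utterance in utterances for word in utterance]

# Word frequencies: how often each word type occurs.
frequencies = Counter(words)

# Type-token ratio: distinct word types divided by total word tokens.
ttr = len(frequencies) / len(words)

# Mean length of utterance in words: word tokens divided by utterance count.
mluw = len(words) / len(utterances)

# Word bigrams (n = 2), built within each utterance (an assumption),
# so that no n-gram spans an utterance boundary.
n = 2
bigrams = Counter(
    tuple(utterance[i : i + n])
    for utterance in utterances
    for i in range(len(utterance) - n + 1)
)
```

Here the eight tokens contain five distinct types, so the type-token ratio is 5/8, and the three utterances average 8/3 words each.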

character_sents([participants, exclude, …])

jyutping_sents([participants, exclude, by_files])

jyutpings([participants, exclude, …])
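The onset, nucleus, coda and tone parameters of search() name the four parts of a Jyutping syllable. The following is a minimal sketch in plain Python of how a syllable splits into those parts. It does not call pycantonese, and this listing does not show how search() itself performs the match. The regular expression is an assumed sound inventory, and the names Syllable, split_syllable and matches are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Syllable(NamedTuple):
    onset: str
    nucleus: str
    coda: str
    tone: str


# Assumed Jyutping inventory: onset and coda are optional,
# nucleus and tone are required.
JYUTPING = re.compile(
    r"(?P<onset>gw|kw|ng|[bpmfdtnlgkhwzcsj])?"
    r"(?P<nucleus>aa|oe|eo|yu|[aeiou]|ng|m)"
    r"(?P<coda>ng|[ptkmniu])?"
    r"(?P<tone>[1-6])"
)


def split_syllable(syllable: str) -> Optional[Syllable]:
    """Split one Jyutping syllable into four parts, or return None if it does not fit."""
    match = JYUTPING.fullmatch(syllable)
    if match is None:
        return None
    return Syllable(
        onset=match.group("onset") or "",
        nucleus=match.group("nucleus"),
        coda=match.group("coda") or "",
        tone=match.group("tone"),
    )


def matches(syllable: str, **criteria: str) -> bool:
    """Return True if every given part (onset, nucleus, coda, tone) equals its criterion."""
    parts = split_syllable(syllable)
    if parts is None:
        return False
    return all(getattr(parts, name) == value for name, value in criteria.items())
```

For example, split_syllable("gwong2") yields onset "gw", nucleus "o", coda "ng" and tone "2", so matches("gwong2", onset="gw", tone="2") holds.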

__init__()

Initialize an empty reader.
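The methods that add and remove data (append, append_left, extend, pop, pop_left, clear) read like operations on a double-ended queue of data files. The following sketch uses collections.deque from the standard library as a rough mental model only. It does not call pycantonese: the real methods take and return readers rather than file names, and the file names here are hypothetical. extend_left is left out because deque.extendleft reverses the order of its input, and this listing does not say how the reader orders left-extended data.

```python
from collections import deque

# Hypothetical file names standing in for the data files held by a reader.
files = deque()                    # like a freshly initialized, empty reader

files.append("b.cha")              # append: add at the right end
files.appendleft("a.cha")          # append_left: add at the left end
files.extend(["c.cha", "d.cha"])   # extend: add several at the right end

last = files.pop()                 # pop: drop and return the last file
first = files.popleft()            # pop_left: drop and return the first file

remaining = list(files)            # what is left, in order

files.clear()                      # clear: remove everything
```

In this model, pop returns "d.cha", pop_left returns "a.cha", and the two middle files remain until clear empties the queue.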
