pycantonese.read_chat

pycantonese.read_chat(path: str, match: str = None, exclude: str = None, encoding: str = 'utf-8') → pycantonese.corpus.CHATReader

Read Cantonese CHAT data files.

Parameters
path : str

A path that points to one of the following:

  • ZIP file. Either a local .zip file path or a URL (one that begins with "https://" or "http://"). URL example: "https://childes.talkbank.org/data/Biling/YipMatthews.zip"

  • A local directory. Files under this directory are read recursively.

  • A single .cha CHAT file.

match : str, optional

If provided, only the file paths that match this string (by regular expression matching) are read and parsed. For example, to work with the American English dataset Brown (containing data for the children Adam, Eve, and Sarah), you can pass in "Eve" here to only handle the data for Eve, since the unzipped Brown data from CHILDES has a directory structure of Brown/Eve/xxx.cha for Eve’s data. If this parameter is not specified or None is passed in (the default), such file path filtering does not apply.

exclude : str, optional

If provided, the file paths that match this string (by regular expression matching) are excluded from reading and parsing.

encoding : str, optional

Text encoding used to parse the CHAT data. The default value is "utf-8" for Unicode UTF-8.
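
The effect of match and exclude on file paths can be illustrated with a small standalone sketch. This is not the library's implementation: filter_paths is a hypothetical helper, the file names are made up, and the use of re.search (matching anywhere in the path) is an assumption inferred from the "Eve" example above.

```python
import re


def filter_paths(paths, match=None, exclude=None):
    """Hypothetical helper: keep paths matching `match`, drop paths matching `exclude`.

    Assumes the pattern may match anywhere in the path (re.search),
    which is consistent with "Eve" selecting Brown/Eve/xxx.cha.
    """
    kept = []
    for path in paths:
        # With match=None, no "keep only" filtering is applied.
        if match is not None and not re.search(match, path):
            continue
        # With exclude=None, nothing is dropped.
        if exclude is not None and re.search(exclude, path):
            continue
        kept.append(path)
    return kept


# Made-up file names following the Brown/<child>/xxx.cha layout.
paths = [
    "Brown/Adam/010100.cha",
    "Brown/Eve/010600.cha",
    "Brown/Sarah/020300.cha",
]

eve_only = filter_paths(paths, match="Eve")
without_adam = filter_paths(paths, exclude="Adam")
```

Because both parameters are regular expressions, a pattern such as "Adam|Sarah" would select or exclude several subdirectories at once under the same assumption.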

Returns
CHATReader
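
A minimal usage sketch, based only on the signature above. The wrapper open_corpus is a hypothetical name, and the local directory and .cha paths are placeholders; only the URL is taken from this page. pycantonese is imported inside the function so that the sketch can be defined without loading the package.

```python
def open_corpus(path, match=None, exclude=None, encoding="utf-8"):
    """Hypothetical thin wrapper that forwards its arguments to pycantonese.read_chat."""
    import pycantonese  # requires the pycantonese package to be installed

    return pycantonese.read_chat(
        path, match=match, exclude=exclude, encoding=encoding
    )


# The three kinds of path described under "path".
EXAMPLE_PATHS = {
    # URL example taken from the parameter description.
    "zip_url": "https://childes.talkbank.org/data/Biling/YipMatthews.zip",
    # Placeholder local paths (assumptions, not part of the library).
    "local_dir": "data/YipMatthews",
    "single_file": "data/YipMatthews/example.cha",
}

# Example call (downloads the ZIP file, so it is left commented out):
# reader = open_corpus(EXAMPLE_PATHS["zip_url"])
```

The returned object is a CHATReader, as stated under Returns.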