- pycantonese.read_chat(path: str, match: str = None, exclude: str = None, encoding: str = 'utf-8') → pycantonese.corpus.CHATReader
Read Cantonese CHAT data files.
- path : str
A path that points to one of the following:
ZIP file. Either a local .zip file path or a URL (one that begins with "http://"). URL example:
A local directory, in which case the files under this directory are read recursively.
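The kinds of path described above can be told apart with a few standard-library checks. The sketch below is only an illustration: `classify_path` is a hypothetical helper, not part of pycantonese, and the URL and file names are placeholders. It assumes a URL is recognized by its "http://" prefix and a ZIP file by its `.zip` extension, as the description suggests.

```python
import os
import tempfile


def classify_path(path: str) -> str:
    """Hypothetical helper: label a path using the categories described above."""
    if path.startswith("http://"):
        return "url"  # remote ZIP file, identified by its prefix
    if path.lower().endswith(".zip"):
        return "zip"  # local ZIP file, identified by its extension
    if os.path.isdir(path):
        return "directory"  # local directory, to be searched recursively
    return "unknown"


with tempfile.TemporaryDirectory() as tmp_dir:
    kinds = [
        classify_path("http://example.com/data.zip"),  # placeholder URL
        classify_path("corpus.zip"),  # placeholder file name
        classify_path(tmp_dir),
    ]

print(kinds)
```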
- match : str, optional
If provided, only the file paths that match this string (by regular expression matching) are read and parsed. For example, to work with the American English dataset Brown (containing data for the children Adam, Eve, and Sarah), you can pass in "Eve" here to only handle the data for Eve, since the unzipped Brown data from CHILDES has a directory structure of Brown/Eve/xxx.cha for Eve’s data. If this parameter is not specified or None is passed in (the default), such file path filtering does not apply.
- exclude : str, optional
If provided, the file paths that match this string (by regular expression matching) are excluded from reading and parsing.
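The path filtering described for these two parameters can be sketched with the standard `re` module. This is a minimal illustration, not the library's implementation: `filter_paths` is a hypothetical helper, the Adam and Sarah paths are made up by analogy with the Brown/Eve/xxx.cha layout mentioned above, and the sketch assumes the pattern may match anywhere in the path (`re.search` semantics), which is what the "Eve" example implies.

```python
import re
from typing import List, Optional


def filter_paths(
    paths: List[str],
    match: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper mimicking the documented match/exclude filtering."""
    kept = paths
    if match is not None:
        # Keep only the paths in which the pattern is found somewhere.
        kept = [p for p in kept if re.search(match, p)]
    if exclude is not None:
        # Drop the paths in which the pattern is found somewhere.
        kept = [p for p in kept if not re.search(exclude, p)]
    return kept


# Illustrative paths modeled on the Brown directory structure.
paths = ["Brown/Adam/xxx.cha", "Brown/Eve/xxx.cha", "Brown/Sarah/xxx.cha"]

only_eve = filter_paths(paths, match="Eve")
without_eve = filter_paths(paths, exclude="Eve")

print(only_eve)
print(without_eve)
```

With neither argument given, the helper returns the list unchanged, mirroring the documented default of no filtering.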
- encoding : str, optional
Text encoding used to parse the CHAT data. The default value is "utf-8" for Unicode UTF-8.