pycantonese.read_chat
- pycantonese.read_chat(path: str, match: str = None, exclude: str = None, encoding: str = 'utf-8') → pycantonese.corpus.CHATReader
Read Cantonese CHAT data files.
- Parameters
- path : str
A path that points to one of the following:
  - A ZIP file. Either a local .zip file path or a URL (one that begins with "https://" or "http://"). URL example: "https://childes.talkbank.org/data/Biling/YipMatthews.zip"
  - A local directory, in which case the files under this directory are read recursively.
  - A single .cha CHAT file.
- match : str, optional
If provided, only the file paths that match this string (by regular expression matching) are read and parsed. For example, to work with the American English dataset Brown (containing data for the children Adam, Eve, and Sarah), you can pass in "Eve" here to only handle the data for Eve, since the unzipped Brown data from CHILDES has a directory structure of Brown/Eve/xxx.cha for Eve’s data. If this parameter is not specified or None is passed in (the default), such file path filtering does not apply.
- exclude : str, optional
If provided, the file paths that match this string (by regular expression matching) are excluded from reading and parsing.
- encoding : str, optional
Text encoding used to parse the CHAT data. The default value is "utf-8" for Unicode UTF-8.
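The effect of match and exclude on file paths can be sketched in plain Python. This is only an illustration of the filtering described above, not the library's own code: filter_paths is a hypothetical helper, the file names are made up, and it assumes the pattern may match anywhere in the path (as the "Eve" example suggests), which corresponds to re.search.

```python
import re
from typing import List, Optional


def filter_paths(
    paths: List[str],
    match: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper mimicking the documented match/exclude filtering.

    Assumption: a pattern matches if it is found anywhere in the path.
    """
    kept = list(paths)
    if match is not None:
        # Keep only the paths in which the pattern is found.
        kept = [p for p in kept if re.search(match, p)]
    if exclude is not None:
        # Drop every path in which the pattern is found.
        kept = [p for p in kept if not re.search(exclude, p)]
    return kept


# Made-up file names following the Brown/<child>/xxx.cha layout.
paths = ["Brown/Adam/a.cha", "Brown/Eve/b.cha", "Brown/Sarah/c.cha"]

only_eve = filter_paths(paths, match="Eve")
without_adam_sarah = filter_paths(paths, exclude="Adam|Sarah")
unfiltered = filter_paths(paths)
```

Under the same assumption, passing match="Eve" or exclude="Adam|Sarah" to read_chat would be expected to select the same subset of files from the unzipped Brown data.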
- Returns