pycantonese.CHATReader.search

CHATReader.search(*, onset=None, nucleus=None, coda=None, tone=None, initial=None, final=None, jyutping=None, character=None, pos=None, word_range=(0, 0), utterance_range=(0, 0), sent_range=(0, 0), by_tokens=True, by_utterances=False, tagged=None, sents=None, participants=None, exclude=None, by_files=False)

Search the data for the given criteria.

For examples, please see https://pycantonese.org/searches.html.
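As a quick orientation, a minimal sketch of a call might look like the following. It assumes the pycantonese package is installed and uses its bundled HKCanCor corpus via pycantonese.hkcancor(); the names search_words and main are hypothetical and exist only for this illustration.

```python
def search_words(reader, **criteria):
    """Hypothetical helper: run a search and return matches as plain strings.

    `reader` is expected to be a CHATReader (or anything with a compatible
    `search` method). Passing by_tokens=False asks for text strings rather
    than tokens carrying Jyutping and part-of-speech information.
    """
    return reader.search(by_tokens=False, **criteria)


def main():
    # Imported here so the helper above can be reused without the package.
    import pycantonese

    corpus = pycantonese.hkcancor()  # built-in corpus, loaded as a CHATReader
    words = search_words(corpus, onset="ng")  # words with a syllable onset "ng"
    print(len(words), words[:5])
```

Calling main() runs the search against HKCanCor. All search criteria are keyword-only, so they are always passed by name.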

Parameters

onset (str, optional): Onset to search for. A regex is supported.
nucleus (str, optional): Nucleus to search for. A regex is supported.
coda (str, optional): Coda to search for. A regex is supported.
tone (str, optional): Tone to search for. A regex is supported.
initial (str, optional): Initial to search for. A regex is supported. An initial, a term more prevalent in traditional Chinese phonology, is the equivalent of an onset.
final (str, optional): Final to search for. A final, a term more prevalent in traditional Chinese phonology, is the equivalent of a nucleus plus a coda.
jyutping (str, optional): Jyutping romanization of one Cantonese character to search for. If the romanization represents more than one Cantonese character, a ValueError is raised.
character (str, optional): One or more Cantonese characters (within a segmented word) to search for.
pos (str, optional): A part-of-speech tag to search for. A regex is supported.
word_range (tuple[int, int], optional): Span of words to the left and right of a matching word to include in the output. The default, (0, 0), disables the range. If utterance_range (or the deprecated sent_range) is used, word_range is ignored.
utterance_range (tuple[int, int], optional): Span of utterances before and after an utterance containing a matching word to include in the output. If set to (0, 0) (the default), no utterance range output is generated. If utterance_range is used, word_range is ignored.
sent_range (tuple[int, int], optional): [Deprecated; please use utterance_range instead]
by_tokens (bool, optional): If True (the default), words in the output are in token form (i.e., with Jyutping and part-of-speech tags). Otherwise, words are returned as plain text strings.
by_utterances (bool, optional): If True (the default is False), utterances containing matching words are returned. Otherwise, only matching words are returned.
tagged (bool, optional): [Deprecated; please use by_tokens instead]
sents (bool, optional): [Deprecated; please use by_utterances instead]
participants (str or iterable[str], optional): One or more participants to include in the search. If unspecified, all participants are included.
exclude (str or iterable[str], optional): One or more participants to exclude from the search. If unspecified, no participants are excluded.
by_files (bool, optional): If True (the default is False), return data organized by individual file path.
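To make the relationship between the phonological parameters concrete, the following standalone sketch splits a single Jyutping syllable into onset, nucleus, coda, and tone, then derives the initial (same as the onset) and the final (nucleus plus coda). The regular expression and the split_syllable function are simplified assumptions written for this illustration; they are not the library's parser and do not cover every edge case.

```python
import re

# Simplified pattern for one Jyutping syllable (illustrative assumption).
# Off-glides such as the "u" in "hou2" are treated as the coda here.
JYUTPING_SYLLABLE = re.compile(
    r"^(?P<onset>ng|gw|kw|[bpmfdtnlgkhzcsjw])?"
    r"(?P<nucleus>aa|oe|eo|yu|[aeiou]|ng|m)"
    r"(?P<coda>ng|[ptkmniu])?"
    r"(?P<tone>[1-6])$"
)


def split_syllable(syllable):
    """Split one Jyutping syllable into its parts (hypothetical helper)."""
    match = JYUTPING_SYLLABLE.match(syllable)
    if match is None:
        # Also reached when the input spans more than one syllable.
        raise ValueError(f"not a single Jyutping syllable: {syllable!r}")
    parts = {name: value or "" for name, value in match.groupdict().items()}
    parts["initial"] = parts["onset"]                 # initial == onset
    parts["final"] = parts["nucleus"] + parts["coda"]  # final == nucleus + coda
    return parts


parts = split_syllable("gwong2")
print(parts["onset"], parts["nucleus"], parts["coda"], parts["tone"])  # gw o ng 2
print(parts["initial"], parts["final"])                                # gw ong
```

Under this view, searching with initial="gw" targets the same syllables as onset="gw", and final="ong" corresponds to nucleus="o" combined with coda="ng".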

Returns

list
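To picture what a word_range of (left, right) selects around a matching word, the sketch below takes a window from a plain list of words. The function word_window is hypothetical, and clipping the window at the ends of the list is an assumption of this sketch; it illustrates the idea of the span only, not how the library builds its output.

```python
def word_window(words, index, word_range=(0, 0)):
    """Return words[index] plus `left` words before and `right` words after it.

    The window is clipped at the boundaries of `words` (an assumption made
    for this illustration).
    """
    left, right = word_range
    start = max(0, index - left)
    stop = min(len(words), index + right + 1)
    return words[start:stop]


words = ["keoi5", "hou2", "zung1ji3", "sik6", "faan6"]
print(word_window(words, 2))          # ['zung1ji3']
print(word_window(words, 2, (1, 2)))  # ['hou2', 'zung1ji3', 'sik6', 'faan6']
print(word_window(words, 0, (2, 1)))  # ['keoi5', 'hou2']
```

With the default (0, 0), only the matching word itself is selected, which matches the documented meaning of a disabled range.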