pycantonese.CHATReader.search
- CHATReader.search(*, onset=None, nucleus=None, coda=None, tone=None, initial=None, final=None, jyutping=None, character=None, pos=None, word_range=(0, 0), utterance_range=(0, 0), sent_range=(0, 0), by_tokens=True, by_utterances=False, tagged=None, sents=None, participants=None, exclude=None, by_files=False)
Search the data for the given criteria.
For examples, please see https://pycantonese.org/searches.html.
- Parameters
- onset : str, optional
Onset to search for. A regex is supported.
- nucleus : str, optional
Nucleus to search for. A regex is supported.
- coda : str, optional
Coda to search for. A regex is supported.
- tone : str, optional
Tone to search for. A regex is supported.
- initial : str, optional
Initial to search for. A regex is supported. An initial, a term more prevalent in traditional Chinese phonology, is the equivalent of an onset.
- final : str, optional
Final to search for. A final, a term more prevalent in traditional Chinese phonology, is the equivalent of a nucleus plus a coda.
- jyutping : str, optional
Jyutping romanization of one Cantonese character to search for. If the romanization contains more than one character, a ValueError is raised.
- character : str, optional
One or more Cantonese characters (within a segmented word) to search for.
- pos : str, optional
A part-of-speech tag to search for. A regex is supported.
- word_range : tuple[int, int], optional
Span of words to the left and right of a matching word to include in the output. The default is (0, 0) to disable a range. If sent_range is used, word_range is ignored.
- utterance_range : Tuple[int, int], optional
Span of utterances before and after an utterance containing a matching word to include in the output. If set to (0, 0) (the default), no utterance range output is generated. If utterance_range is used, word_range is ignored.
- sent_range : Tuple[int, int], optional
[Deprecated; please use utterance_range instead]
- by_tokens : bool, optional
If True (the default), words in the output are in the token form (i.e., with Jyutping and part-of-speech tags). Otherwise, just words as text strings are returned.
- by_utterances : bool, optional
If True (default is False), utterances containing matching words are returned. Otherwise, only matching words are returned.
- tagged : bool, optional
[Deprecated; please use by_tokens instead]
- sents : bool, optional
[Deprecated; please use by_utterances instead]
- participants : str or iterable[str], optional
One or more participants to include in the search. If unspecified, all participants are included.
- exclude : str or iterable[str], optional
One or more participants to exclude from the search. If unspecified, no participants are excluded.
- by_files : bool, optional
If True (default: False), return data organized by the individual file paths.
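To make the phonological criteria above concrete, here is a minimal, self-contained sketch of how one Jyutping syllable breaks down into onset, nucleus, coda, and tone, and how initial and final relate to them. It does not use pycantonese and is not the library's implementation. The names JYUTPING, split_syllable, and syllable_matches are hypothetical, the segment inventory in the pattern is a simplified assumption, and matching each regex against the whole component is also an assumption.

```python
import re

# Simplified Jyutping syllable pattern (an assumption for illustration,
# not the pattern pycantonese itself uses).
JYUTPING = re.compile(
    r"(?P<onset>gw|kw|ng|[bpmfdtnlgkhwzcsj])?"
    r"(?P<nucleus>aa|oe|eo|yu|[aeiou]|ng|m)"
    r"(?P<coda>ng|[ptkmniu])?"
    r"(?P<tone>[1-6])"
)


def split_syllable(jyutping):
    """Split one Jyutping syllable into its components."""
    match = JYUTPING.fullmatch(jyutping)
    if match is None:
        # Anything other than exactly one syllable is rejected here.
        raise ValueError(f"not a single Jyutping syllable: {jyutping!r}")
    parts = {name: value or "" for name, value in match.groupdict().items()}
    # An initial is equivalent to an onset.
    parts["initial"] = parts["onset"]
    # A final is equivalent to a nucleus plus a coda.
    parts["final"] = parts["nucleus"] + parts["coda"]
    return parts


def syllable_matches(jyutping, **criteria):
    """Return True if every given regex matches its whole component."""
    parts = split_syllable(jyutping)
    return all(
        re.fullmatch(pattern, parts[name]) is not None
        for name, pattern in criteria.items()
    )


print(split_syllable("gwong2"))
print(syllable_matches("gwong2", final="ong"))      # final = nucleus + coda
print(syllable_matches("hoeng1", tone="[123]"))     # regex over the tone
print(syllable_matches("gwong2", onset="g"))        # "g" is not the whole onset "gw"
```

In this sketch, searching by initial="gw" and by onset="gw" are the same test, and final="ong" is the same as nucleus="o" combined with coda="ng".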
- Returns
- list
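The word_range and utterance_range parameters both describe a window around a match: a number of items before it and a number after it. The sketch below illustrates that windowing idea on a plain list of strings. It is independent of pycantonese. The function search_with_context and its is_match argument are hypothetical, and clipping the window at the ends of the list is an assumption rather than documented behavior. The shape of the list that search actually returns is not modeled here.

```python
def search_with_context(words, is_match, word_range=(0, 0)):
    """Collect each matching word together with its neighbors.

    word_range is (left, right): how many words before and after a
    match to include. Windows are clipped at the ends of the list.
    """
    left, right = word_range
    results = []
    for index, word in enumerate(words):
        if is_match(word):
            start = max(0, index - left)
            # Slicing past the end of a list is safe in Python.
            results.append(words[start : index + right + 1])
    return results


utterance = ["ngo5", "hou2", "zung1ji3", "sik6", "faan6"]


def is_target(word):
    return word == "zung1ji3"


# With (0, 0), each window holds only the matching word.
print(search_with_context(utterance, is_target))
# With (1, 1), one word on either side is included.
print(search_with_context(utterance, is_target, word_range=(1, 1)))
```

The same windowing applies one level up for utterance_range, with utterances in place of words.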