pycantonese.CHATReader.search
- CHATReader.search(*, onset=None, nucleus=None, coda=None, tone=None, initial=None, final=None, jyutping=None, character=None, pos=None, word_range=(0, 0), utterance_range=(0, 0), sent_range=(0, 0), by_tokens=True, by_utterances=False, tagged=None, sents=None, participants=None, exclude=None, by_files=False)
- Search the data for the given criteria.
- For examples, please see https://pycantonese.org/searches.html.
- Parameters
- onset : str, optional
- Onset to search for. A regex is supported.
- nucleus : str, optional
- Nucleus to search for. A regex is supported.
- coda : str, optional
- Coda to search for. A regex is supported.
- tone : str, optional
- Tone to search for. A regex is supported.
- initial : str, optional
- Initial to search for. A regex is supported. An initial, a term more prevalent in traditional Chinese phonology, is the equivalent of an onset.
- final : str, optional
- Final to search for. A final, a term more prevalent in traditional Chinese phonology, is the equivalent of a nucleus plus a coda.
- jyutping : str, optional
- Jyutping romanization of one Cantonese character to search for. If the romanization contains more than one character, a ValueError is raised.
- character : str, optional
- One or more Cantonese characters (within a segmented word) to search for.
- pos : str, optional
- A part-of-speech tag to search for. A regex is supported.
- word_range : tuple[int, int], optional
- Span of words to the left and right of a matching word to include in the output. The default, (0, 0), disables the range. If utterance_range (or the deprecated sent_range) is used, word_range is ignored.
- utterance_range : tuple[int, int], optional
- Span of utterances before and after an utterance containing a matching word to include in the output. If set to (0, 0) (the default), no utterance range output is generated. If utterance_range is used, word_range is ignored.
- sent_range : tuple[int, int], optional
- [Deprecated; please use utterance_range instead]
- by_tokens : bool, optional
- If True (the default), words in the output are in the token form (i.e., with Jyutping and part-of-speech tags). Otherwise, words are returned as plain text strings.
- by_utterances : bool, optional
- If True (default is False), utterances containing matching words are returned. Otherwise, only matching words are returned.
- tagged : bool, optional
- [Deprecated; please use by_tokens instead]
- sents : bool, optional
- [Deprecated; please use by_utterances instead]
- participants : str or iterable[str], optional
- One or more participants to include in the search. If unspecified, all participants are included.
- exclude : str or iterable[str], optional
- One or more participants to exclude from the search. If unspecified, no participants are excluded.
- by_files : bool, optional
- If True (default: False), return data organized by the individual file paths.
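
The relationship between the onset/nucleus/coda parameters and the initial/final parameters can be illustrated with a small sketch. The `Syllable` class below is hypothetical and is not part of PyCantonese; it only restates the definitions given above (initial = onset, final = nucleus + coda) for a syllable that has already been split into its parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """Hypothetical container for one parsed Jyutping syllable.

    This is an illustration only, not a PyCantonese class.
    """

    onset: str
    nucleus: str
    coda: str
    tone: str

    @property
    def initial(self) -> str:
        # The initial is the equivalent of the onset.
        return self.onset

    @property
    def final(self) -> str:
        # The final is the nucleus followed by the coda.
        return self.nucleus + self.coda


# The syllable "gwong2", split by hand into its four parts.
gwong2 = Syllable(onset="gw", nucleus="o", coda="ng", tone="2")
print(gwong2.initial, gwong2.final)
```

Under these definitions, searching by initial should select the same syllables as searching by the corresponding onset, and searching by final narrows on the nucleus and coda together.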
 
- Returns
- list
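
A minimal usage sketch follows. The helper name `search_plain_words` is hypothetical and not part of PyCantonese; it only forwards keyword arguments to `search()` with `by_tokens=False`, so that matches come back as plain strings rather than tokens. The commented-out lines assume PyCantonese is installed and that its built-in corpus is loaded with `pycantonese.hkcancor()`.

```python
def search_plain_words(corpus, **criteria):
    """Call corpus.search() and ask for plain word strings.

    Hypothetical convenience wrapper, not part of PyCantonese.
    `corpus` is expected to be a CHATReader (or anything with a
    compatible keyword-only `search` method).
    """
    # by_tokens=False requests words as text strings instead of tokens.
    return corpus.search(by_tokens=False, **criteria)


# Usage with a real corpus (requires PyCantonese to be installed):
# import pycantonese
# corpus = pycantonese.hkcancor()
# words = search_plain_words(corpus, onset="ng", tone="2")
```

Because every parameter of `search()` is keyword-only, passing the criteria through `**criteria` keeps the call site explicit about which field each pattern applies to.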