PyCantonese 3.4.0

pycantonese.CHATReader.search

CHATReader.search(*, onset=None, nucleus=None, coda=None, tone=None, initial=None, final=None, jyutping=None, character=None, pos=None, word_range=(0, 0), utterance_range=(0, 0), sent_range=(0, 0), by_tokens=True, by_utterances=False, tagged=None, sents=None, participants=None, exclude=None, by_files=False)

Search the data for the given criteria.

For examples, please see https://pycantonese.org/searches.html.

Parameters
onset : str, optional

Onset to search for. A regex is supported.

nucleus : str, optional

Nucleus to search for. A regex is supported.

coda : str, optional

Coda to search for. A regex is supported.

tone : str, optional

Tone to search for. A regex is supported.
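As a rough illustration of what a regex over a single Jyutping element can look like, the sketch below filters a few hand-written syllables by onset using only the standard library. The `syllables` data and the `filter_by_onset` helper are hypothetical and are not part of PyCantonese. The pattern is anchored explicitly with `^` and `$` so that the result does not depend on whether matching is partial or full, which this page does not specify.

```python
import re

# Hypothetical, pre-parsed syllables as (onset, nucleus, coda, tone) tuples.
# Real data would come from a CHATReader; these are for illustration only.
syllables = [
    ("gw", "o", "ng", "2"),  # gwong2
    ("g", "aa", "", "1"),    # gaa1
    ("kw", "a", "n", "4"),   # kwan4
    ("w", "aa", "", "6"),    # waa6
]


def filter_by_onset(pattern, items):
    """Keep the syllables whose onset matches the regex pattern."""
    regex = re.compile(pattern)
    return [s for s in items if regex.search(s[0])]


# Anchored pattern: only the labialized velar onsets "gw" and "kw".
labialized = filter_by_onset(r"^(gw|kw)$", syllables)
print(labialized)
```

With an actual reader object, the same pattern would presumably be passed as a keyword argument, e.g. `corpus.search(onset="^(gw|kw)$")`, since every parameter of `search` is keyword-only.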

initial : str, optional

Initial to search for. A regex is supported. An initial, a term more prevalent in traditional Chinese phonology, is the equivalent of an onset.

final : str, optional

Final to search for. A final, a term more prevalent in traditional Chinese phonology, is the equivalent of a nucleus plus a coda.
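The relationship between the two terminologies (initial = onset, final = nucleus + coda) can be sketched with a small standard-library parser. This is a simplified stand-in written for illustration, not the library's own Jyutping parser: the `SYLLABLE` pattern and `split_syllable` helper are assumptions, and syllabic nasals such as "m4" and "ng5" are deliberately left out.

```python
import re

# Simplified Jyutping syllable pattern (an assumption for illustration).
# It does not handle syllabic nasals such as "m4" or "ng5".
SYLLABLE = re.compile(
    r"^(?P<onset>ng|gw|kw|[bpmfdtnlgkhwzcsj])?"
    r"(?P<nucleus>aa|oe|eo|yu|[aeiou])"
    r"(?P<coda>ng|[ptkmniu])?"
    r"(?P<tone>[1-6])$"
)


def split_syllable(jyutping):
    """Split one Jyutping syllable and derive its initial and final."""
    match = SYLLABLE.match(jyutping)
    if match is None:
        raise ValueError(f"not a syllable this sketch can parse: {jyutping!r}")
    onset = match.group("onset") or ""
    nucleus = match.group("nucleus")
    coda = match.group("coda") or ""
    return {
        "onset": onset,
        "nucleus": nucleus,
        "coda": coda,
        "tone": match.group("tone"),
        "initial": onset,          # initial is the same unit as the onset
        "final": nucleus + coda,   # final is the nucleus plus the coda
    }


print(split_syllable("gwong2"))
print(split_syllable("jyut6"))
```

Under this reading, searching with `final="ong"` would target the same syllables as combining `nucleus="o"` with `coda="ng"`.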

jyutping : str, optional

Jyutping romanization of one Cantonese character to search for. If the romanization covers more than one character (i.e., more than one syllable), a ValueError is raised.

character : str, optional

One or more Cantonese characters (within a segmented word) to search for.

pos : str, optional

A part-of-speech tag to search for. A regex is supported.

word_range : tuple[int, int], optional

Span of words to the left and right of a matching word to include in the output. The default is (0, 0), which disables the range. If utterance_range (or the deprecated sent_range) is used, word_range is ignored.

utterance_range : tuple[int, int], optional

Span of utterances before and after an utterance containing a matching word to include in the output. If set to (0, 0) (the default), no utterance range output is generated. If utterance_range is used, word_range is ignored.
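The two range parameters both describe a window around a match: `word_range` counts words within an utterance, and `utterance_range` counts whole utterances. A minimal sketch of the word-level window follows. The `word_window` helper is hypothetical, and clipping the window at the edges of the utterance is an assumption rather than documented behavior.

```python
def word_window(words, index, word_range=(0, 0)):
    """Return the match at `index` plus neighbors given by (left, right)."""
    left, right = word_range
    start = max(0, index - left)                 # assumed: clip at the start
    end = min(len(words), index + right + 1)     # assumed: clip at the end
    return words[start:end]


utterance = ["我", "今日", "好", "開心", "呀"]
hit = utterance.index("好")

print(word_window(utterance, hit))             # (0, 0): the match only
print(word_window(utterance, hit, (1, 2)))     # one word left, two right
```

The same slicing idea would apply to `utterance_range`, with a list of utterances in place of a list of words.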

sent_range : tuple[int, int], optional

[Deprecated; please use utterance_range instead]

by_tokens : bool, optional

If True (the default), words in the output are in the token form (i.e., with Jyutping and part-of-speech tags). Otherwise just words as text strings are returned.

by_utterances : bool, optional

If True (default is False), utterances containing matching words are returned. Otherwise, only matching words are returned.
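The way `by_tokens` and `by_utterances` combine can be pictured with a simplified stand-in. The `Token` namedtuple and the `shape_result` helper below are hypothetical: the library's own token class may carry more fields, and the part-of-speech tags are illustrative. The sketch only shows the four possible shapes of a single hit.

```python
from collections import namedtuple

# Simplified stand-in for a token; the real class may have more fields.
Token = namedtuple("Token", ["word", "jyutping", "pos"])

utterance = [
    Token("我", "ngo5", "R"),
    Token("食", "sik6", "V"),
    Token("飯", "faan6", "N"),
]


def shape_result(tokens, index, by_tokens=True, by_utterances=False):
    """Return one hit as a token or plain word, alone or with its utterance."""
    items = tokens if by_utterances else [tokens[index]]
    if not by_tokens:
        items = [t.word for t in items]   # drop Jyutping and POS, keep text
    return items if by_utterances else items[0]


print(shape_result(utterance, 1))                                    # token
print(shape_result(utterance, 1, by_tokens=False))                   # word
print(shape_result(utterance, 1, by_utterances=True))                # tokens
print(shape_result(utterance, 1, by_tokens=False, by_utterances=True))
```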

tagged : bool, optional

[Deprecated; please use by_tokens instead]

sents : bool, optional

[Deprecated; please use by_utterances instead]

participants : str or iterable[str], optional

One or more participants to include in the search. If unspecified, all participants are included.

exclude : str or iterable[str], optional

One or more participants to exclude from the search. If unspecified, no participants are excluded.
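Both `participants` and `exclude` accept either a single string or an iterable of strings. The sketch below shows one way such arguments could be normalized and applied to a list of CHAT participant codes. The `select_participants` helper is hypothetical, and how the two filters interact when both are given is an assumption here (include first, then exclude).

```python
def select_participants(available, participants=None, exclude=None):
    """Filter participant codes by an include list and an exclude list."""

    def as_set(value):
        if value is None:
            return None
        if isinstance(value, str):
            return {value}          # a bare string means one participant
        return set(value)

    include = as_set(participants)
    drop = as_set(exclude) or set()
    return [
        code
        for code in available
        if (include is None or code in include) and code not in drop
    ]


codes = ["CHI", "MOT", "INV"]
print(select_participants(codes))                        # no filtering
print(select_participants(codes, participants="CHI"))    # single string
print(select_participants(codes, exclude=["INV"]))       # iterable
```

Treating a bare string as one code, rather than iterating over its letters, is the main point of the normalization step.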

by_files : bool, optional

If True (default: False), return data organized by the individual file paths.

Returns
list

© Copyright 2014-2022, Jackson L. Lee | Documentation last updated on June 06, 2022.
