PyCantonese 3.1.0

pycantonese.corpus.CantoneseCHATReader.search

CantoneseCHATReader.search(*, onset=None, nucleus=None, coda=None, tone=None, initial=None, final=None, jyutping=None, character=None, pos=None, word_range=(0, 0), sent_range=(0, 0), tagged=True, sents=False, participant=None, exclude=None, by_files=False)

Search the data for the given criteria.

For examples, please see https://pycantonese.org/searches.html.
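As a rough illustration of how the keyword-only criteria can be combined, the sketch below wraps a single search call. The helper name `search_onset_with_tones` is hypothetical, and `reader` is assumed to be a CantoneseCHATReader, such as the one returned by pycantonese.hkcancor(). Only parameters listed on this page are used.

```python
def search_onset_with_tones(reader, onset, tones="[123]"):
    """Search `reader` for words matching both an onset and a tone pattern.

    Hypothetical helper: `reader` is assumed to be a CantoneseCHATReader
    (any object with a compatible keyword-only `search` method works).
    """
    return reader.search(
        onset=onset,   # e.g. "b"; a regex is also accepted
        tone=tones,    # regex character class: tone 1, 2, or 3
        tagged=False,  # return plain word strings instead of tagged words
    )
```

With the built-in corpus, a call might look like search_onset_with_tones(pycantonese.hkcancor(), "b").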

Parameters
onset : str, optional

Onset to search for. A regex is supported.

nucleus : str, optional

Nucleus to search for. A regex is supported.

coda : str, optional

Coda to search for. A regex is supported.

tone : str, optional

Tone to search for. A regex is supported.

initial : str, optional

Initial to search for. A regex is supported. An initial, a term more prevalent in traditional Chinese phonology, is the equivalent of an onset.

final : str, optional

Final to search for. A final, a term more prevalent in traditional Chinese phonology, is the equivalent of a nucleus plus a coda.

jyutping : str, optional

Jyutping romanization of one Cantonese character to search for. If the romanization represents more than one character (that is, more than one syllable), a ValueError is raised.

character : str, optional

One or more Cantonese characters (within a segmented word) to search for.

pos : str, optional

A part-of-speech tag to search for. A regex is supported.

word_range : tuple[int, int], optional

Span of words to the left and right of a matching word to include in the output. The default is (0, 0) to disable a range. If sent_range is used, word_range is ignored.

sent_range : tuple[int, int], optional

Span of sentences before and after a sentence containing a matching word to include in the output. The default is (0, 0) to disable a range. If sent_range is used, word_range is ignored.

tagged : bool, optional

If True (the default), words in the output are in the tagged form. Otherwise, only the word token strings are returned.

sents : bool, optional

If True (default is False), sentences containing matching words are returned. Otherwise, only matching words are returned.

participant : str or iterable[str], optional

One or more participants to include in the search. If unspecified, all participants are included.

exclude : str or iterable[str], optional

One or more participants to exclude from the search. If unspecified, no participants are excluded.

by_files : bool, optional

If True (default is False), the returned data are organized by individual file path.
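To make the word_range and sent_range descriptions above more concrete, here is a small pure-Python sketch of the windowing they describe. It is not the library's implementation: the function name `context` and its index arguments are hypothetical, and clamping the window at the edges of the data is an assumption.

```python
def context(sentences, sent_index, word_index,
            word_range=(0, 0), sent_range=(0, 0)):
    """Return the context around one matching word (illustrative only).

    `sentences` is a list of sentences, each a list of word strings.
    `sent_index` and `word_index` locate the matching word.
    """
    if sent_range != (0, 0):
        # sent_range takes precedence: word_range is ignored entirely.
        before, after = sent_range
        start = max(0, sent_index - before)  # assumed: clamp at the first sentence
        return sentences[start : sent_index + after + 1]

    # Otherwise take `left` words before and `right` words after the match.
    left, right = word_range
    words = sentences[sent_index]
    start = max(0, word_index - left)  # assumed: clamp at the sentence start
    return words[start : word_index + right + 1]
```

With the default ranges, the sketch returns just the matching word, which mirrors the documented meaning of (0, 0) as "no range".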

Returns
list
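The participant and exclude parameters described above each accept either a single string or an iterable of strings. The sketch below shows one way such filters could be resolved into a set of participants to search. It is an illustration only: the function `select_participants` is hypothetical, the codes "CHI", "MOT", and "FAT" are example CHAT-style participant codes, and letting exclusion win when a participant appears in both arguments is an assumption rather than documented behavior.

```python
def select_participants(all_participants, participant=None, exclude=None):
    """Return the participants to search, preserving the input order."""

    def as_set(value):
        # Accept None, a single code, or any iterable of codes.
        if value is None:
            return None
        if isinstance(value, str):
            return {value}
        return set(value)

    included = as_set(participant)      # None means "include everyone"
    excluded = as_set(exclude) or set()  # None means "exclude no one"
    return [
        code
        for code in all_participants
        if (included is None or code in included) and code not in excluded
    ]
```

For example, select_participants(["CHI", "MOT", "FAT"], exclude="MOT") keeps "CHI" and "FAT".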
© Copyright 2014-2021, Jackson L. Lee | Documentation last updated on February 22, 2021