Lecture 15: Introduction to POS Tagging

  • Published 24 Dec 2024

COMMENTS • 11

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +2

    Rule-based POS tagging is one of the oldest tagging techniques. A rule-based tagger looks up each word in a dictionary or lexicon to get its possible tags. If a word has more than one possible tag, the tagger applies hand-written rules to pick the correct one.
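The lookup-then-disambiguate idea above can be sketched roughly as follows. This is a minimal illustration, not a real tagger: the lexicon entries, the tag names, the two rules, and the NOUN default for unknown words are all assumptions made up for the example.

```python
# Toy lexicon: each word maps to the tags it may take (hypothetical entries).
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "can": ["MODAL", "NOUN", "VERB"],
    "rusts": ["VERB", "NOUN"],
    "dog": ["NOUN"],
    "barks": ["VERB", "NOUN"],
}


def rule_based_tag(words):
    """Tag words by lexicon lookup, then resolve ambiguity with hand-written rules."""
    tags = []
    for word in words:
        # Assumption: words missing from the lexicon default to NOUN.
        candidates = LEXICON.get(word.lower(), ["NOUN"])
        if len(candidates) == 1:
            tags.append(candidates[0])
            continue
        prev = tags[-1] if tags else None
        if prev == "DET" and "NOUN" in candidates:
            # Rule 1: after a determiner, prefer the noun reading.
            tags.append("NOUN")
        elif prev == "NOUN" and "VERB" in candidates:
            # Rule 2: after a noun, prefer the verb reading.
            tags.append("VERB")
        else:
            # No rule fired: fall back to the first tag listed in the lexicon.
            tags.append(candidates[0])
    return list(zip(words, tags))


print(rule_based_tag(["The", "can", "rusts"]))
# [('The', 'DET'), ('can', 'NOUN'), ('rusts', 'VERB')]
```

Real rule-based taggers use far larger lexicons and many more rules, but the control flow is the same: look up, then disambiguate.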

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +3

    Stochastic (probabilistic) tagging: a stochastic approach relies on frequencies, probabilities, or other statistics. The simplest one finds the most frequent tag for each word in the annotated training data and assigns that tag to the word wherever it appears in unannotated text.
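The most-frequent-tag baseline described above might look like this. It is a sketch under assumptions: the three training sentences, the tag names, and the NOUN default for unseen words are invented for illustration.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Count (word, tag) pairs and keep the most frequent tag for each word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def tag_with_baseline(words, model, default_tag="NOUN"):
    """Tag each word with its most frequent training tag (default for unseen words)."""
    return [(word, model.get(word.lower(), default_tag)) for word in words]


# Tiny hand-made training corpus (hypothetical data).
TRAIN = [
    [("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("here", "ADV")],
    [("I", "PRON"), ("read", "VERB"), ("the", "DET"), ("book", "NOUN")],
    [("please", "INTJ"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
]

model = train_most_frequent_tag(TRAIN)
print(tag_with_baseline(["Book", "the", "flight"], model))
# [('Book', 'NOUN'), ('the', 'DET'), ('flight', 'NOUN')]
```

The output also shows the baseline's weakness: "Book" is a verb in this sentence, but it is tagged NOUN because context is ignored.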

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +2

    Part-of-speech (POS) tagging is a common natural language processing task: assigning each word in a text (corpus) its part of speech, based on both the word's definition and its context.

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +1

    POS taggers come in the following types:
    Rule-Based: A dictionary is constructed with possible tags for each word. Rules guide the tagger to disambiguate. Rules are either hand-crafted, learned, or both. An example rule might say, "If an ambiguous/unknown word X is preceded by a determiner and followed by a noun, tag it as an adjective."
    Statistical: A text corpus is used to derive useful probabilities. Given a sequence of words, the most probable sequence of tags is selected. These are also called stochastic or probabilistic taggers. Common models include the n-gram model, the Hidden Markov Model (HMM), and the Maximum Entropy Model (MEM).
    Memory-Based: A set of cases is stored in memory, each case containing a word, its context, and a suitable tag. A new sentence is tagged based on the best match among the stored cases. It's a combination of rule-based and stochastic methods.
    Transformation-Based: Rules are automatically induced from data. Thus, it's a combination of rule-based and stochastic methods. Tagging is done using broad rules and then improved or transformed by applying narrower rules.
    Neural Net: RNNs and bidirectional LSTMs are two examples of neural network architectures for POS tagging.
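For the statistical type in the list above, "the most probable sequence of tags" under an HMM is usually found with the Viterbi algorithm. The following is a minimal sketch of that idea. The tag set and every probability in it are made-up toy numbers, not values estimated from any corpus.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for words under a first-order HMM."""
    # best[i][t]: probability of the best tag path for words[:i + 1] that ends in t.
    best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    # back[i][t]: the previous tag on that best path.
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    # Pick the best final tag, then follow the back-pointers to the start.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


# Toy model parameters (all values are illustrative assumptions).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.5, "barks": 0.1, "walk": 0.4},
    "VERB": {"barks": 0.6, "walk": 0.4},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
# ['DET', 'NOUN', 'VERB']
```

A practical implementation would work with log probabilities to avoid underflow on long sentences and would smooth the emission table so that unseen words do not get probability zero.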

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +1

    Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
    Morphological parsing, in natural language processing, is the process of determining the morphemes from which a given word is constructed. It must be able to distinguish between orthographic rules and morphological rules.
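A toy version of vocabulary-backed lemmatization could look like the sketch below. The vocabulary, the irregular-form table, and the suffix rules are all hypothetical and far too small for real use. The point is only that a candidate lemma is accepted when it is found in the vocabulary, which is what separates this from blind suffix stripping.

```python
# Hypothetical mini-vocabulary of known lemmas.
VOCAB = {"run", "study", "cat", "go", "be", "box"}

# Irregular forms cannot be handled by suffix rules, so they are listed explicitly.
IRREGULAR = {"went": "go", "ran": "run", "was": "be"}

# (suffix, replacement) pairs, tried in order.
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ing", ""), ("ed", "")]


def lemmatize(word):
    """Return the dictionary form of word, or the word itself if none is found."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word in VOCAB:
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept the candidate only if the vocabulary confirms it.
            if candidate in VOCAB:
                return candidate
    return word


print([lemmatize(w) for w in ["studies", "boxes", "cats", "went"]])
# ['study', 'box', 'cat', 'go']
```

Forms that need an orthographic rule the sketch lacks, such as the doubled consonant in "running", come back unchanged. That is exactly the kind of spelling rule a morphological parser has to model separately from the morphological rules.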

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +1

    Naïve Bayes makes the naive assumption that the features are conditionally independent given the class, which is not always the case. Logistic regression is a linear classification method that learns the probability that a sample belongs to a certain class.
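Written out, the two models above take roughly the following form. The notation is an assumption chosen for illustration: features $x_1, \dots, x_n$, class label $y$, weight vector $w$, and bias $b$. The logistic regression line shows the binary case.

```latex
% Naive Bayes: the class-conditional likelihood factorizes over the features
% (this product is the conditional independence assumption).
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)

% Logistic regression (binary case): a linear score passed through the sigmoid.
P(y = 1 \mid x) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}}
```

Naive Bayes estimates $P(y)$ and each $P(x_i \mid y)$ separately from counts, while logistic regression fits $w$ and $b$ directly to the conditional probability.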

  • @pawanchoure1289
    @pawanchoure1289 2 years ago +1

    Discriminative models draw boundaries in the data space, while generative models try to model how the data is distributed throughout the space. A generative model focuses on explaining how the data was generated, while a discriminative model focuses on predicting the labels of the data.

  • @ananyapamde4514
    @ananyapamde4514 3 years ago

    I think the POS he means when he says particle is actually participle. Took me some time to figure it out.

    • @jinunayak
      @jinunayak 2 years ago

      Parts of Speech (POS)

  • @pawanchoure1289
    @pawanchoure1289 2 years ago

    Transformation-based learning (TBL) tagging in NLP.

  • @tanmaysinha987
    @tanmaysinha987 7 years ago +4

    excellent class sir