Unicode Normalization for NLP in Python

  • Published 11 Oct 2024

COMMENTS • 12

  • @SuperMaker.M
    @SuperMaker.M 1 year ago

    That's great bro, clean and simple explanation, loved it a lot!

  • @mayankmaurya8631
    @mayankmaurya8631 1 year ago +1

    Thank you very much, you were a great help.

  • @meylyssa3666
    @meylyssa3666 3 years ago

    Nice explanations, thank you!

  • @dshefman
    @dshefman 3 years ago

    What method do you use to normalize punctuation? For example, “ vs ". I attempted to use unicode normalization with NFKC, but it didn't normalize these two quotation marks to be equal (==). In addition to quotation marks, there are many other punctuation marks that are nearly equivalent but are not normalized using NFKC. Any recommendations or thoughts about normalizing them?

    • @jamesbriggs
      @jamesbriggs 3 years ago

      Yes, there are some stubborn characters, unfortunately. I usually convert them manually with regex or .replace().
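
To illustrate the exchange above: a minimal sketch, assuming the goal is to fold curly quotes and dashes into their ASCII look-alikes after NFKC. The `PUNCT_MAP` table and the `normalize_text` helper are hypothetical names invented for this example, not something taken from the video, and the mapping is deliberately incomplete. It uses `str.translate` rather than chained `.replace` calls, which is one possible way to do the manual conversion.

```python
import unicodedata

# Hypothetical mapping for illustration; extend it for your own data.
# NFKC leaves these characters unchanged because Unicode defines no
# compatibility decomposition for them.
PUNCT_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def normalize_text(text: str) -> str:
    """Apply NFKC first, then map the punctuation that NFKC does not touch."""
    return unicodedata.normalize("NFKC", text).translate(PUNCT_MAP)


curly = "\u201cquoted\u201d"

# NFKC alone keeps the curly quotes, so the comparison fails.
print(unicodedata.normalize("NFKC", curly) == '"quoted"')

# With the manual mapping applied afterwards, the strings compare equal.
print(normalize_text(curly) == '"quoted"')
```

Whether to fold punctuation like this at all depends on the downstream model and tokenizer, so treat the table as a starting point rather than a fixed rule.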

    • @RatafakRatafak
      @RatafakRatafak 1 year ago

      @jamesbriggs Do you generally use punctuation in NLP-based models/applications, or is it strictly use-case dependent?

  • @knowit3765
    @knowit3765 3 years ago

    nice one bro