Timestamps:
00:00:00 Basic bag-of-words (BoW)
00:00:22 The need for vectors
00:00:53 Selecting and extracting features from our data
00:04:04 Idea: similar documents share similar vocabulary
00:04:46 Turning a corpus into a BoW matrix
00:07:10 What vectorization helps us accomplish
00:08:20 Measuring document similarity
00:11:09 Shortcomings of basic BoW
00:12:37 Capturing a bit of context with n-grams
00:14:10 DEMO: creating basic BoW with scikit-learn and spaCy
00:17:47 DEMO: measuring document similarity
00:18:40 DEMO: creating n-grams with scikit-learn
00:19:35 Basic BoW recap
I am truly amazed by the excellence of this course. It is undoubtedly the finest NLP course I have come across, and the teaching and explanations provided are unparalleled. I have the utmost respect and admiration for it. Kudos to you, and thank you for such a remarkable learning experience! BOWING DOWN IN RESPECT!
Thank you so much!
Thanks for this awesome course. :)
Great lectures, I learned a lot of NLP concepts.
You are the best!! This course is soo soo helpful man!!
Great lectures.
Hi sir, I think the dot product calculation at 9:05 in the video may be wrong. It should be (6x4)+(6x2)=36. By the way, your videos are very helpful for a beginner. Thank you very much for your effort. Looking forward to seeing more good videos on your channel.
Thank you for the correction!
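For anyone double-checking the correction above, here is a minimal sketch in NumPy. It assumes the two document vectors from the video's example are (6, 6) and (4, 2), as implied by the comment:

```python
import numpy as np

# Assumed document count vectors from the video's example
doc_a = np.array([6, 6])
doc_b = np.array([4, 2])

# Dot product: (6*4) + (6*2) = 24 + 12 = 36
dot = np.dot(doc_a, doc_b)
print(dot)  # 36
```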
If anybody is getting "ValueError: Input vector should be 1-D" in the Cosine Similarity section, the fix is simple: index after calling toarray() instead of before. For example, replace
bow[0].toarray()
with
bow.toarray()[0]
Thank you! Code updated.
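To see why that fix works, here is a small sketch using a hypothetical 2x3 bag-of-words matrix (the variable name bow and the counts are illustrative, not taken from the demo). Indexing a scipy sparse matrix first keeps it 2-D, while converting to a dense array first yields a 1-D row, which is what scipy's cosine distance expects:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cosine

# Hypothetical BoW matrix: rows = documents, columns = word counts
bow = csr_matrix(np.array([[6, 6, 0],
                           [4, 2, 0]]))

# bow[0].toarray() keeps the matrix shape: (1, 3), still 2-D
print(bow[0].toarray().shape)   # (1, 3)

# bow.toarray()[0] converts first, then takes a row: (3,), 1-D
print(bow.toarray()[0].shape)   # (3,)

# scipy.spatial.distance.cosine returns a distance, so the
# similarity is 1 minus it; it requires 1-D input vectors.
similarity = 1 - cosine(bow.toarray()[0], bow.toarray()[1])
print(similarity)
```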
I am a bit confused about the cosine similarity metric. I thought the cosine similarity range is from -1 to 1, not 0 to 1. I've seen the 0-to-1 threshold used elsewhere as well, but I do notice that more popular embedding models generate negative vector elements, and their normalized versions naturally produce values ranging from -1 to 1. Can you please clarify this? I've been struggling to wrap my head around it.
It's mentioned in the video, on the line with the asterisk at the 10:00 timestamp. Cosine similarity in general ranges over [-1, 1], but word frequencies are never negative, so, to put it plainly, frequency vectors can't point in opposite directions. In the context of this task, the effective range is therefore [0, 1]. Frequency-based vectors are not rare, so you can easily see both [0, 1] and [-1, 1] ranges in the wild; think of the first as a subset of the second.
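The point above can be demonstrated directly. This sketch reuses the count vectors from the dot-product discussion earlier in the thread, plus two made-up embedding vectors with negative components:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Count vectors are non-negative, so their dot product can't be
# negative and the similarity is confined to [0, 1].
counts_a = np.array([6, 6, 0])
counts_b = np.array([4, 2, 0])
print(cosine_similarity(counts_a, counts_b))  # ~0.9487

# Embedding vectors can have negative components, so the full
# [-1, 1] range is reachable: these two point in opposite directions.
emb_a = np.array([1.0, -1.0])
emb_b = np.array([-1.0, 1.0])
print(cosine_similarity(emb_a, emb_b))  # -1.0
```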