Richard Gruss
Generative AI - Introduction
666 views

Videos

AI Innovation Through Collaboration With Startups
58 views · 6 months ago
AI in IT Service
87 views · 7 months ago
The Tukey Cramer Procedure
344 views · 4 years ago
Levene Test
4K views · 4 years ago
Neural Networks Meeting
39 views · 4 years ago
Neural Networks 2: MNIST Classification
141 views · 4 years ago
Neural Networks - Introduction
112 views · 4 years ago
Chi square lesson overview
22 views · 4 years ago
Text Classification With Python
33K views · 4 years ago
Our Little Girl Grows Up
122 views · 4 years ago
Machine Learning
154 views · 4 years ago
test
27 views · 4 years ago
Expert Systems
13K views · 4 years ago
MGNT 671 - Python Getting Started
75 views · 4 years ago
MGNT 671 - Meeting 1 Extended
14 views · 4 years ago
Introduction to MGNT 671: Artificial Intelligence and Machine Learning for Managers
95 views · 5 years ago
Multiple Linear Regression Problem
32 views · 5 years ago
Chi Square Test
28 views · 5 years ago
MGNT 333 Summer 1 - Course Introduction
81 views · 5 years ago
Syllabus Text Analytics: Inclusive Excellence
36 views · 5 years ago
Decision Analysis
127 views · 5 years ago
Welcome to Text Analytics!
119 views · 5 years ago
MSL4 Problem 5
48 views · 5 years ago
MSL3 5 And 7
92 views · 5 years ago
Visualization Lab
33 views · 5 years ago
Lab 1: Grade What If
55 views · 5 years ago
MGNT 333 - Course Introduction
78 views · 5 years ago
Jmp Lab
32 views · 5 years ago
MSL13 Problem 5
18 views · 6 years ago

COMMENTS

  • @PANDURANG99 · 14 days ago

    How do you do multi-level hierarchical classification?
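
    A minimal sketch of one way to do it: chain two classifiers, one for the coarse category and one per category for the fine label. The toy texts and labels below are made up for illustration; it reuses the same scikit-learn pieces as the video.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        # toy training data: (text, coarse label, fine label)
        data = [
            ("the match ended in a draw", "sport", "football"),
            ("she won the tennis open", "sport", "tennis"),
            ("new phone released today", "tech", "gadgets"),
            ("chip makers report earnings", "tech", "hardware"),
        ]
        texts = [d[0] for d in data]

        vec = CountVectorizer().fit(texts)
        # level 1: predict the coarse category
        top_clf = MultinomialNB().fit(vec.transform(texts), [d[1] for d in data])

        # level 2: one fine-grained classifier per coarse category
        fine_clf = {}
        for cat in {d[1] for d in data}:
            sub = [d for d in data if d[1] == cat]
            fine_clf[cat] = MultinomialNB().fit(
                vec.transform([d[0] for d in sub]), [d[2] for d in sub])

        def classify_hierarchical(text):
            x = vec.transform([text])
            cat = top_clf.predict(x)[0]              # first level
            return cat, fine_clf[cat].predict(x)[0]  # second level

        print(classify_hierarchical("a new smartphone chip"))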

  • @chrisphayao · 2 months ago

    Thanks for the clear explanation. Only the wild clicking around is very confusing; maybe consider going through the code step by step without jumping around. Thank you.

  • @rajm5349 · 5 months ago

    Can I get this code?

  • @hotdogsinmytummy · 5 months ago

    Interesting🙌🗣️

  • @slushys9919 · 7 months ago

    Hi guys, for anyone that's looking for the code, you may use the following:

        import os
        import random
        import string
        import csv
        import pickle
        from collections import defaultdict
        from nltk import word_tokenize, FreqDist
        from nltk.corpus import stopwords
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn import metrics

        stop_words = set(stopwords.words('english'))
        stop_words.add('said')
        stop_words.add('mr')

        BASE_DIR = ''  # left blank in the original comment -- point it at your BBC dataset folder
        LABELS = ['business', 'entertainment', 'politics', 'sport', 'tech']

        # step 1
        def create_data_set():
            with open('data.csv', 'w', encoding='utf8', newline='') as csvfile:
                csv_writer = csv.writer(csvfile, delimiter=',')
                csv_writer.writerow(['Label', 'Filename', 'Text'])  # write header
                for label in LABELS:
                    dir = '%s/%s' % (BASE_DIR, label)
                    for filename in os.listdir(dir):
                        fullfilename = '%s/%s' % (dir, filename)
                        print(fullfilename)
                        with open(fullfilename, 'rb') as file:
                            text = file.read().decode(errors='replace').replace('\n', ' ')
                            csv_writer.writerow([label, filename, text])

        # step 2: build [ (label, text), (label, text), ... ]
        def setup_docs():
            docs = []  # (label, text)
            with open('data.csv', 'r', encoding='utf8') as datafile:
                reader = csv.reader(datafile)
                next(reader)  # skip the header row
                for parts in reader:
                    # label is in the first column, text in the third
                    doc = (parts[0], parts[2].strip())
                    docs.append(doc)
            return docs

        def get_tokens(text):
            tokens = word_tokenize(text)
            tokens = [t for t in tokens if t not in stop_words]
            return tokens

        def clean_text(text):
            text = text.translate(str.maketrans('', '', string.punctuation))
            text = text.lower()
            return text

        def print_frequency_dist(docs):
            tokens = defaultdict(list)
            for doc in docs:
                doc_label = doc[0]
                doc_text = clean_text(doc[1])
                doc_tokens = get_tokens(doc_text)
                tokens[doc_label].extend(doc_tokens)
            for category_label, category_tokens in tokens.items():
                print(category_label)
                fd = FreqDist(category_tokens)
                print(fd.most_common(20))

        def get_splits(docs):
            random.shuffle(docs)
            X_train = []  # training documents
            y_train = []  # corresponding training labels
            X_test = []   # test documents
            y_test = []   # corresponding test labels
            pivot = int(.80 * len(docs))
            for i in range(0, pivot):
                X_train.append(docs[i][1])
                y_train.append(docs[i][0])
            for i in range(pivot, len(docs)):
                X_test.append(docs[i][1])
                y_test.append(docs[i][0])
            return X_train, X_test, y_train, y_test

        def evaluate_classifier(title, classifier, vectorizer, X_test, y_test):
            X_test_dtm = vectorizer.transform(X_test)
            y_pred = classifier.predict(X_test_dtm)
            # 'weighted' averaging for the multiclass setting
            precision = metrics.precision_score(y_test, y_pred, average='weighted', zero_division=0)
            recall = metrics.recall_score(y_test, y_pred, average='weighted')
            f1 = metrics.f1_score(y_test, y_pred, average='weighted')
            print("%s\t%f\t%f\t%f" % (title, precision, recall, f1))

        def train_classifier(docs):
            X_train, X_test, y_train, y_test = get_splits(docs)
            # the object that turns text into vectors/numbers
            vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 3), min_df=3, analyzer='word')
            # creates the doc-term matrix
            dtm = vectorizer.fit_transform(X_train)
            # train the Naive Bayes classifier
            naive_bayes_classifier = MultinomialNB().fit(dtm, y_train)
            evaluate_classifier("Naive Bayes\tTRAIN\t", naive_bayes_classifier, vectorizer, X_train, y_train)
            evaluate_classifier("Naive Bayes\tTEST\t", naive_bayes_classifier, vectorizer, X_test, y_test)
            # store the classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            pickle.dump(naive_bayes_classifier, open(clf_filename, 'wb'))
            # also store the vectorizer so we can transform new data
            vec_filename = 'count_vectorizer.pkl'
            pickle.dump(vectorizer, open(vec_filename, 'wb'))

        def classify(text):
            # load the classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            nb_clf = pickle.load(open(clf_filename, 'rb'))
            # load the vectorizer so we can transform the new text
            vec_filename = 'count_vectorizer.pkl'
            vectorizer = pickle.load(open(vec_filename, 'rb'))
            # preprocess the text
            processed_text = clean_text(text)
            # make a prediction
            pred = nb_clf.predict(vectorizer.transform([processed_text]))
            print(pred[0])

        if __name__ == '__main__':
            # create_data_set()
            # docs = setup_docs()
            # print_frequency_dist(docs)
            # train_classifier(docs)
            # deployment in production
            new_doc = "Google showed off some new camera features on the Pixel 4 today"
            classify(new_doc)
            print("Done")

    • @slushys9919 · 7 months ago

      Note that the file paths need to be changed, and there are steps in the video that need to be followed for the program to work. Remember to download the sample BBC dataset as well when you're testing the code.

    • @slushys9919 · 7 months ago

      I've also adjusted parts of the code because some errors popped up, but it should still work.

  • @slushys9919 · 7 months ago

    What are the .pkl and .txt files for, and are they needed for the code to function?
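
    For context: the two .pkl files are the trained model and the fitted vectorizer that train_classifier() above saves with pickle, and data.csv/data.txt is the collected training text. Once training has run, classification only needs the two pickles. A quick check, assuming the code above has been run once:

        import pickle

        # classify() only needs these two artifacts -- no retraining required
        clf = pickle.load(open('naive_bayes_classifier.pkl', 'rb'))
        vec = pickle.load(open('count_vectorizer.pkl', 'rb'))
        print(clf.predict(vec.transform(["some new article text"]))[0])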

  • @affectionlifeaffliction · 8 months ago

    Nothing works in Jupyter Notebook.

  • @andonij · 9 months ago

    Thank you so much, Richard. You made my Sunday with this explanation; excellent video.

  • @chrissonntag9 · 10 months ago

    I cannot find the source code or the data source :-( so it's not useful for me.

  • @giantdutchviking · 11 months ago

    Thanks for taking the time to make this vid. I've been learning Python for a short while, and although I didn't understand all of it, it gave a good insight into what machine learning does. It doesn't sound so "scary" anymore.

  • @niteshsneha · 11 months ago

    Can you please share the GitHub link for the source code?

  • @InnocenceVVX · 1 year ago

    This is very cool, thank you

  • @tharindunilakshana1883 · 1 year ago

        import os
        import csv
        import random
        import string
        import pickle
        from collections import defaultdict
        from nltk import word_tokenize, FreqDist
        from nltk.corpus import stopwords
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn import metrics

        stop_words = set(stopwords.words('english'))
        stop_words.add('said')
        stop_words.add('mr')

        BASE_DIR = 'C:/Users/user/Desktop/bbc/News Articles'
        LABELS = ['business', 'entertainment', 'politics', 'sport', 'tech']

        def create_data_set():
            with open('data.csv', 'w', encoding='utf8', newline='') as outfile:
                # creating a csv writer object
                csvwriter = csv.writer(outfile)
                for label in LABELS:
                    dir = '%s/%s' % (BASE_DIR, label)
                    for filename in os.listdir(dir):
                        fullfilename = '%s/%s' % (dir, filename)
                        print(fullfilename)
                        with open(fullfilename, 'rb') as file:
                            text = file.read().decode(errors='replace').replace('\n', ' ')
                        # writing the fields
                        csvwriter.writerow([label, filename, text])
                        print(text)

        def setup_docs():
            docs = []  # (label, text)
            # read back the csv written by create_data_set()
            with open('data.csv', 'r', encoding='utf8') as datafile:
                for parts in csv.reader(datafile):
                    doc = (parts[0], parts[2].strip())
                    docs.append(doc)
            return docs

        def clean_text(text):
            # remove punctuation
            text = text.translate(str.maketrans('', '', string.punctuation))
            # convert to lower case
            text = text.lower()
            return text

        def get_tokens(text):
            # get individual words
            tokens = word_tokenize(text)
            # remove common words that are useless
            tokens = [t for t in tokens if t not in stop_words]
            return tokens

        def print_frequency_dist(docs):
            tokens = defaultdict(list)
            # let's make a giant list of all the words for each category
            for doc in docs:
                doc_label = doc[0]
                doc_text = clean_text(doc[1])
                doc_tokens = get_tokens(doc_text)
                tokens[doc_label].extend(doc_tokens)
            for category_label, category_tokens in tokens.items():
                print(category_label)
                fd = FreqDist(category_tokens)
                print(fd.most_common(20))

        def get_splits(docs):
            # scramble docs
            random.shuffle(docs)
            X_train = []  # training documents
            y_train = []  # corresponding training labels
            X_test = []   # test documents
            y_test = []   # corresponding test labels
            pivot = int(.80 * len(docs))
            for i in range(0, pivot):
                X_train.append(docs[i][1])  # the original comment had docs[1][1] here, a bug
                y_train.append(docs[i][0])
            for i in range(pivot, len(docs)):
                X_test.append(docs[i][1])
                y_test.append(docs[i][0])
            return X_train, X_test, y_train, y_test

        def evaluate_classifier(title, classifier, vectorizer, X_test, y_test):
            X_test_dtm = vectorizer.transform(X_test)
            y_pred = classifier.predict(X_test_dtm)
            precision = metrics.precision_score(y_test, y_pred, average='micro')
            recall = metrics.recall_score(y_test, y_pred, average='micro')
            f1 = metrics.f1_score(y_test, y_pred, average='micro')
            print("%s\t%f\t%f\t%f" % (title, precision, recall, f1))

        def train_classifier(docs):
            X_train, X_test, y_train, y_test = get_splits(docs)
            # the object that turns text into vectors
            vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 3), min_df=3, analyzer='word')
            # create doc-term matrix
            dtm = vectorizer.fit_transform(X_train)
            # train Naive Bayes classifier
            naive_bayes_classifier = MultinomialNB().fit(dtm, y_train)
            evaluate_classifier("Naive Bayes\tTRAIN\t", naive_bayes_classifier, vectorizer, X_train, y_train)
            evaluate_classifier("Naive Bayes\tTEST\t", naive_bayes_classifier, vectorizer, X_test, y_test)
            # store the classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            pickle.dump(naive_bayes_classifier, open(clf_filename, 'wb'))
            # also store the vectorizer so we can transform new data
            vec_filename = 'count_vectorizer.pkl'
            pickle.dump(vectorizer, open(vec_filename, 'wb'))

        # classify new content
        def classify(text):
            # load classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            nb_clf = pickle.load(open(clf_filename, 'rb'))
            # load the vectorizer for the new text
            vec_filename = 'count_vectorizer.pkl'
            vectorizer = pickle.load(open(vec_filename, 'rb'))
            pred = nb_clf.predict(vectorizer.transform([text]))
            print(pred[0])

        # create_data_set()
        # docs = setup_docs()
        # print_frequency_dist(docs)
        # train_classifier(docs)

        # classify new content using the pkl files
        new_doc = """Transparency International Sri Lanka (TISL) filed a petition in the
        Supreme Court yesterday (June 12), seeking to intervene in the ongoing Fundamental
        Rights case (SC/FR/Application No.168/2021) filed by the Center for Environmental
        Justice (CEJ) and three more petitioners, highlighting the serious allegations of
        bribery and corruption surrounding the X-Press Pearl disaster. The intervention
        petition is filed in the public interest. It refers to serious allegations of
        irregularity, mishandling, sabotage, bribery and corruption surrounding the claim
        for compensation arising from the X-Press Pearl disaster. Several key points have
        been raised in the intervention petition: The grave allegations of interference and
        extraneous pressure surrounding the claim for compensation arising from the X-Press
        Pearl disaster. The statement by the Justice Minister in Parliament on April 25,
        2023, that one Chamara Gunasekara alias Manjusiri Nissanka had received a payment
        of USD 250 million into a private bank account in connection with the X-Press Pearl
        disaster. The media statements of Chinthaka Waragoda, who reportedly invented a
        machine to remove debris which washed ashore after the shipwreck, alleging that he
        was offered payment to discontinue the use of his machine, to avoid exposing the
        full extent of the damage caused by the disaster. Questions surrounding the quantum
        of compensation due to Sri Lanka for the damages caused by MV X-Press Pearl. The
        freight ship 'MV X-Press Pearl' caught fire off the coast of Colombo on 20th May,
        2021. It sank a few days later, releasing its cargo of plastic pellets and tons of
        toxic chemicals into the ocean, causing Sri Lanka's worst maritime disaster to
        date. It is alleged that Sri Lankan authorities obtained the assistance of the
        International Tanker Owners Pollution Federation Limited (ITOPF), a representative
        of the insurer of the Shipowner, in the post-disaster activities, despite the grave
        conflict of interest arising from it. TISL has urged that the private parties
        involved in the X-Press Pearl incident be held accountable, and be made to pay
        optimal compensation for the damage and pollution caused to the marine and coastal
        ecology of Sri Lanka, and the payment of compensation for the loss caused to the
        fishing communities and those engaged in tourism, as well as obtaining compensation
        under the Marine Pollution Prevention Act. TISL has also highlighted the need to
        hold anyone guilty of wrongdoing fully accountable. The petition for intervention
        is to be mentioned for Support in the Supreme Court on Thursday (June 15)."""
        classify(new_doc)
        print("Done...!")

  • @anirbanghose8647 · 1 year ago

    Loved it. It made complete sense. Thanks.

  • @viane123456 · 1 year ago

    I have a text file with many lines, and I want to classify each line. How can I do it?
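
    A minimal sketch for that, reusing the pickled model and vectorizer produced by train_classifier() above ('lines.txt' is a placeholder file name):

        import pickle

        nb_clf = pickle.load(open('naive_bayes_classifier.pkl', 'rb'))
        vectorizer = pickle.load(open('count_vectorizer.pkl', 'rb'))

        # read the file, dropping empty lines
        with open('lines.txt', 'r', encoding='utf8') as f:
            lines = [line.strip() for line in f if line.strip()]

        # vectorize all lines at once and predict in one call
        for line, label in zip(lines, nb_clf.predict(vectorizer.transform(lines))):
            print(label, '\t', line)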

  • @mohamedsidhoum6835 · 1 year ago

    Thank you for this video. Where can I get the source code?

  • @yongxing1848 · 1 year ago

    Where is the code for this?

  • @amanichouk8967 · 1 year ago

    Thank you so much; this is amazing and so well structured.

  • @master2wia536 · 1 year ago

    Can you send me the code?

  • @tonyhasago · 2 years ago

    Hi - great video and it worked!! How would I assess the accuracy of the final step, classifying some new text?

    • @agradel100 · 1 year ago

      Have you found out how to do it?
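
    One hedged sketch: a single new document can't give you an accuracy figure, only a confidence, because accuracy needs known labels. MultinomialNB exposes predict_proba for the confidence route; for true accuracy, score a labeled held-out set. Assuming the pickled artifacts from the code above:

        import pickle

        nb_clf = pickle.load(open('naive_bayes_classifier.pkl', 'rb'))
        vectorizer = pickle.load(open('count_vectorizer.pkl', 'rb'))

        # confidence for one new document
        new_doc = "Google showed off some new camera features on the Pixel 4 today"
        probs = nb_clf.predict_proba(vectorizer.transform([new_doc]))[0]
        best = probs.argmax()
        print(nb_clf.classes_[best], probs[best])

        # true accuracy needs labeled examples the model never saw:
        # from sklearn.metrics import accuracy_score
        # accuracy_score(y_new, nb_clf.predict(vectorizer.transform(X_new)))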

  • @bongimaposa · 2 years ago

    Wonderful! Thank you so much - detailed, and I could follow.

  • @ikennanwankwo7448 · 2 years ago

    Hello, thank you for your video, but the create_data_set function does not work if there are multiple file types (.txt, .doc, .bin, etc.) in the subfolders. The data.txt output file is empty (nothing gets written to it).

    • @ikennanwankwo7448 · 2 years ago

      I just want it to write only the .txt files to the data.txt file. (All the .txt files in the subfolders have the same file name, if that helps.)
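
    A sketch of one fix, assuming the same folder layout as the video: filter on the extension before reading, and skip anything that isn't a regular file.

        import os
        import csv

        def create_data_set_txt_only(base_dir, labels, out='data.csv'):
            # same idea as the video's create_data_set, but ingests only .txt files
            with open(out, 'w', encoding='utf8', newline='') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow(['Label', 'Filename', 'Text'])
                for label in labels:
                    folder = os.path.join(base_dir, label)
                    for filename in os.listdir(folder):
                        if not filename.endswith('.txt'):
                            continue  # skip .doc, .bin, etc.
                        path = os.path.join(folder, filename)
                        if not os.path.isfile(path):
                            continue  # skip nested directories
                        with open(path, 'rb') as f:
                            text = f.read().decode(errors='replace').replace('\n', ' ')
                        writer.writerow([label, filename, text])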

  • @PD-qg2fo · 2 years ago

    Thank you so much, sir. I was looking for this kind of tutorial.

  • @bibichbicha687 · 2 years ago

    Can I get the code, please, sir?

  • @fathersonduo · 2 years ago

    Please make more videos!!

  • @ananthakrishnan4754 · 2 years ago

    Some people talk about the p-value and some talk about the critical value - now I am confused.
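
    They're two equivalent ways to make the same decision: compare the test statistic to a cutoff, or compare the tail probability to alpha. A worked example under assumed numbers (one-sample two-sided t-test, t = 2.4, n = 20, alpha = 0.05):

        from scipy import stats

        t_stat, df, alpha = 2.4, 19, 0.05

        # critical-value route: reject H0 if |t| exceeds the cutoff
        t_crit = stats.t.ppf(1 - alpha / 2, df)   # about 2.093
        print('reject?', abs(t_stat) > t_crit)    # True

        # p-value route: reject H0 if p < alpha
        p = 2 * stats.t.sf(abs(t_stat), df)       # about 0.027
        print('reject?', p < alpha)               # True

    The two routes always agree; they just express the same threshold on different scales.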

  • @Muuip · 2 years ago

    Great presentation, much appreciated!

  • @Arrato1977 · 2 years ago

    Explained super easily! Thank you!

  • @walidbenaouda8935 · 2 years ago

    Can I get the code?

  • @hggaming911 · 2 years ago

    Awesome - simple and clean code. Please, can we have a link to download the code?

  • @mikiyasassefakassa9136 · 2 years ago

    Sir, I have a question: how do you prepare the dataset for document-level news text classification?

  • @lunabaalbaki3169 · 2 years ago

    Hey, thank you for the tutorial; it's very helpful. Can you share your code?

  • @adylmanulat2465 · 2 years ago

    Good day, sir. I just wanted to ask: if an independent variable is not significant (it has no explanatory power in the model), but removing it lowers the adjusted R-square, what does this imply? So far, the only reason I know of is that its t-statistic is greater than one. With this information, what can we infer?

  • @naughtychohan7956 · 2 years ago

    Can I get this code?

  • @JiminPark-ld2xx · 2 years ago

    Does anyone use Excel or CSV data for text classification? Or should I create a .txt file for each and every row of my data?
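
    A CSV (or Excel sheet) with one row per document works directly; there's no need for per-row .txt files. A minimal sketch, assuming a hypothetical file with 'text' and 'label' columns:

        import pandas as pd
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        df = pd.read_csv('my_data.csv')   # or pd.read_excel('my_data.xlsx')
        vectorizer = CountVectorizer(stop_words='english')
        X = vectorizer.fit_transform(df['text'])
        clf = MultinomialNB().fit(X, df['label'])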

  • @learner3585 · 2 years ago

    Very good tutorial with a good explanation. I was able to follow along and run the whole program while watching the video. Thanks.

  • @StanleyDenman · 2 years ago

    Your video seems to be right on point for what I want to do, but I am confused about the learning-model aspect. If I just want to create hard rules for text classification, I do not need the training data set, right?
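
    Right - hard rules need no training data or model at all. A toy keyword router (the categories and keywords are made up for illustration; first match wins):

        RULES = {
            'sport':    ['match', 'goal', 'tournament'],
            'tech':     ['software', 'chip', 'smartphone'],
            'business': ['market', 'shares', 'profit'],
        }

        def classify_by_rules(text, default='other'):
            lowered = text.lower()
            for label, keywords in RULES.items():
                if any(k in lowered for k in keywords):
                    return label
            return default

        print(classify_by_rules("The smartphone market grew 4% this quarter"))

    The trade-off: rules are transparent and need no data, but they don't generalize to wordings you didn't anticipate, which is what the trained classifier buys you.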

  • @LeomarOsorio · 2 years ago

    Thank you for this tutorial. This is a good walkthrough.

  • @mandysingh4044 · 2 years ago

    Hello sir, I want to contact you.

  • @pujanrajrai4930 · 2 years ago

    Thank you very much, sir, for this lecture; it really helped me a lot. Hoping to see more content on machine learning.

  • @timhn4010 · 3 years ago

    The dataset: www.kaggle.com/pariza/bbc-news-summary

  • @rabbilbhuiyan5666 · 3 years ago

    Excellent video and demonstration of text analysis. Thank you very much, Sir!

  • @engbahja · 3 years ago

    Many thanks for this. Could you please share the expert systems list you mention at the end of the video?

  • @lukajozic9768 · 3 years ago

    Nice! There is, however, a function in the sklearn library called train_test_split that does exactly what your get_splits function does. Also, it would be helpful if you posted the code in the description. Good video and great explanations!
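
    The commenter is right: sklearn.model_selection.train_test_split can replace get_splits(). A sketch, with a tiny placeholder docs list standing in for the output of setup_docs():

        from sklearn.model_selection import train_test_split

        # placeholder for the (label, text) pairs built by setup_docs()
        docs = [('tech', 'new phone released'), ('sport', 'great match today'),
                ('tech', 'chip prices fall'), ('sport', 'cup final tonight'),
                ('business', 'shares rally')]

        texts  = [d[1] for d in docs]
        labels = [d[0] for d in docs]
        X_train, X_test, y_train, y_test = train_test_split(
            texts, labels, test_size=0.20, shuffle=True, random_state=42)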

  • @angelpascual1516 · 3 years ago

    Can you please help me answer/solve this, with a conclusion?

        Alternative (capacity for new store) | New Bridge Built | No New Bridge
        A (Small)                            | 1                | 14
        B (Medium)                           | 2                | 10
        C (Large)                            | 4                | 6

    1. Assume the payoffs represent profits. Determine the alternative that would be chosen under the minimax approach.
    2. Assume the payoffs represent profits. Determine the alternative that would be chosen under the maximin approach.
    3. Assume the payoffs represent profits. Determine the alternative that would be chosen under the maximax approach.
    4. Assume the payoffs represent profits. Determine the alternative that would be chosen under the Laplace approach.
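
    Not a worked solution to the homework, but the mechanics of each criterion applied to that payoff table (for profits, "minimax" is read here as minimax regret):

        payoffs = {'A (Small)': [1, 14], 'B (Medium)': [2, 10], 'C (Large)': [4, 6]}

        maximin = max(payoffs, key=lambda a: min(payoffs[a]))  # best worst case
        maximax = max(payoffs, key=lambda a: max(payoffs[a]))  # best best case
        laplace = max(payoffs, key=lambda a: sum(payoffs[a]) / len(payoffs[a]))

        # minimax regret: regret = column best minus the cell's payoff
        best_per_state = [max(p[i] for p in payoffs.values()) for i in range(2)]
        regret = {a: max(best_per_state[i] - p[i] for i in range(2))
                  for a, p in payoffs.items()}
        minimax_regret = min(regret, key=regret.get)

        print(maximin, maximax, laplace, minimax_regret)  # C, A, A, A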

  • @faisalagarbaa1 · 3 years ago

    Where is the Python source code?

  • @PULAMOLUDEEPAKBCE · 3 years ago

    Can you share the code?

  • @peterkirka4862 · 4 years ago

    Hello Mr. Gruss. Is it possible to find your code from this video somewhere?

  • @eduardolopez7323 · 4 years ago

    Just what I was looking for, thanks man.