Richard Gruss
Generative AI - Introduction
666 views

Videos

AI Innovation Through Collaboration With Startups
58 views · 6 months ago
AI in IT Service
87 views · 7 months ago
The Tukey Cramer Procedure
344 views · 4 years ago
Levene Test
4K views · 4 years ago
Neural Networks Meeting
39 views · 4 years ago
Neural Networks 2: MNIST Classification
141 views · 4 years ago
Neural Networks - Introduction
112 views · 4 years ago
Chi square lesson overview
22 views · 4 years ago
Text Classification With Python
33K views · 4 years ago
Our Little Girl Grows Up
122 views · 4 years ago
Machine Learning
154 views · 4 years ago
test
27 views · 4 years ago
Expert Systems
13K views · 4 years ago
MGNT 671 - Python Getting Started
75 views · 4 years ago
MGNT 671 - Meeting 1 Extended
14 views · 4 years ago
Introduction to MGNT 671: Artificial Intelligence and Machine Learning for Managers
95 views · 5 years ago
Multiple Linear Regression Problem
32 views · 5 years ago
Chi Square Test
28 views · 5 years ago
MGNT 333 Summer 1 - Course Introduction
81 views · 5 years ago
Syllabus Text Analytics: Inclusive Excellence
36 views · 5 years ago
Decision Analysis
127 views · 5 years ago
Welcome to Text Analytics!
119 views · 5 years ago
MSL4 Problem 5
48 views · 5 years ago
MSL3 5 And 7
92 views · 5 years ago
Visualization Lab
33 views · 5 years ago
Lab 1: Grade What If
55 views · 5 years ago
MGNT 333 - Course Introduction
78 views · 5 years ago
Jmp Lab
32 views · 5 years ago
MSL13 Problem 5
18 views · 6 years ago

COMMENTS

  • @PANDURANG99 · 14 days ago

    How do you do multi-level hierarchical classification?
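
    A minimal sketch of one way to do it: chain two classifiers, one for the coarse category and one per category for the fine label. The toy texts and labels below are made up for illustration; it reuses the same scikit-learn pieces as the video.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        # toy training data: (text, coarse label, fine label)
        data = [
            ("the match ended in a draw", "sport", "football"),
            ("she won the tennis open", "sport", "tennis"),
            ("new phone released today", "tech", "gadgets"),
            ("chip makers report earnings", "tech", "hardware"),
        ]
        texts = [d[0] for d in data]

        vec = CountVectorizer().fit(texts)
        # level 1: predict the coarse category
        top_clf = MultinomialNB().fit(vec.transform(texts), [d[1] for d in data])

        # level 2: one fine-grained classifier per coarse category
        fine_clf = {}
        for cat in {d[1] for d in data}:
            sub = [d for d in data if d[1] == cat]
            fine_clf[cat] = MultinomialNB().fit(
                vec.transform([d[0] for d in sub]), [d[2] for d in sub])

        def classify_hierarchical(text):
            x = vec.transform([text])
            cat = top_clf.predict(x)[0]              # first level
            return cat, fine_clf[cat].predict(x)[0]  # second level

        print(classify_hierarchical("a new smartphone chip"))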

  • @chrisphayao · 2 months ago

    Thanks for the clear explanation. Only the wild clicking around is very confusing; maybe consider going through the code step by step without jumping around. Thank you.

  • @rajm5349 · 5 months ago

    Can I get this code?

  • @hotdogsinmytummy · 5 months ago

    Interesting🙌🗣️

  • @slushys9919 · 7 months ago

    Hi guys, for anyone that's looking for the code, you may use the following:

        import os
        import random
        import string
        import csv
        import pickle
        from collections import defaultdict
        from nltk import word_tokenize, FreqDist
        from nltk.corpus import stopwords
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn import metrics

        stop_words = set(stopwords.words('english'))
        stop_words.add('said')
        stop_words.add('mr')

        BASE_DIR = ''  # left blank in the original comment -- point it at your BBC dataset folder
        LABELS = ['business', 'entertainment', 'politics', 'sport', 'tech']

        # step 1
        def create_data_set():
            with open('data.csv', 'w', encoding='utf8', newline='') as csvfile:
                csv_writer = csv.writer(csvfile, delimiter=',')
                csv_writer.writerow(['Label', 'Filename', 'Text'])  # write header
                for label in LABELS:
                    dir = '%s/%s' % (BASE_DIR, label)
                    for filename in os.listdir(dir):
                        fullfilename = '%s/%s' % (dir, filename)
                        print(fullfilename)
                        with open(fullfilename, 'rb') as file:
                            text = file.read().decode(errors='replace').replace('\n', ' ')
                            csv_writer.writerow([label, filename, text])

        # step 2: build [ (label, text), (label, text), ... ]
        def setup_docs():
            docs = []  # (label, text)
            with open('data.csv', 'r', encoding='utf8') as datafile:
                reader = csv.reader(datafile)
                next(reader)  # skip the header row
                for parts in reader:
                    # label is in the first column, text in the third
                    doc = (parts[0], parts[2].strip())
                    docs.append(doc)
            return docs

        def get_tokens(text):
            tokens = word_tokenize(text)
            tokens = [t for t in tokens if t not in stop_words]
            return tokens

        def clean_text(text):
            text = text.translate(str.maketrans('', '', string.punctuation))
            text = text.lower()
            return text

        def print_frequency_dist(docs):
            tokens = defaultdict(list)
            for doc in docs:
                doc_label = doc[0]
                doc_text = clean_text(doc[1])
                doc_tokens = get_tokens(doc_text)
                tokens[doc_label].extend(doc_tokens)
            for category_label, category_tokens in tokens.items():
                print(category_label)
                fd = FreqDist(category_tokens)
                print(fd.most_common(20))

        def get_splits(docs):
            random.shuffle(docs)
            X_train = []  # training documents
            y_train = []  # corresponding training labels
            X_test = []   # test documents
            y_test = []   # corresponding test labels
            pivot = int(.80 * len(docs))
            for i in range(0, pivot):
                X_train.append(docs[i][1])
                y_train.append(docs[i][0])
            for i in range(pivot, len(docs)):
                X_test.append(docs[i][1])
                y_test.append(docs[i][0])
            return X_train, X_test, y_train, y_test

        def evaluate_classifier(title, classifier, vectorizer, X_test, y_test):
            X_test_dtm = vectorizer.transform(X_test)
            y_pred = classifier.predict(X_test_dtm)
            # 'weighted' averaging for the multiclass setting
            precision = metrics.precision_score(y_test, y_pred, average='weighted', zero_division=0)
            recall = metrics.recall_score(y_test, y_pred, average='weighted')
            f1 = metrics.f1_score(y_test, y_pred, average='weighted')
            print("%s\t%f\t%f\t%f" % (title, precision, recall, f1))

        def train_classifier(docs):
            X_train, X_test, y_train, y_test = get_splits(docs)
            # the object that turns text into vectors/numbers
            vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 3), min_df=3, analyzer='word')
            # creates the doc-term matrix
            dtm = vectorizer.fit_transform(X_train)
            # train the Naive Bayes classifier
            naive_bayes_classifier = MultinomialNB().fit(dtm, y_train)
            evaluate_classifier("Naive Bayes\tTRAIN\t", naive_bayes_classifier, vectorizer, X_train, y_train)
            evaluate_classifier("Naive Bayes\tTEST\t", naive_bayes_classifier, vectorizer, X_test, y_test)
            # store the classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            pickle.dump(naive_bayes_classifier, open(clf_filename, 'wb'))
            # also store the vectorizer so we can transform new data
            vec_filename = 'count_vectorizer.pkl'
            pickle.dump(vectorizer, open(vec_filename, 'wb'))

        def classify(text):
            # load the classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            nb_clf = pickle.load(open(clf_filename, 'rb'))
            # load the vectorizer so we can transform the new text
            vec_filename = 'count_vectorizer.pkl'
            vectorizer = pickle.load(open(vec_filename, 'rb'))
            # preprocess the text
            processed_text = clean_text(text)
            # make a prediction
            pred = nb_clf.predict(vectorizer.transform([processed_text]))
            print(pred[0])

        if __name__ == '__main__':
            # create_data_set()
            # docs = setup_docs()
            # print_frequency_dist(docs)
            # train_classifier(docs)
            # deployment in production
            new_doc = "Google showed off some new camera features on the Pixel 4 today"
            classify(new_doc)
            print("Done")

    • @slushys9919 · 7 months ago

      Note that the file paths need to be changed, and there are steps in the video that need to be followed for the program to work. Remember to download the sample BBC dataset as well when you're testing the code.

    • @slushys9919 · 7 months ago

      I've also adjusted parts of the code because some errors popped up, but it should still work.

  • @slushys9919 · 7 months ago

    What are the .pkl and .txt files for, and are they needed for the code to function?
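
    For context: the two .pkl files are the trained model and the fitted vectorizer that train_classifier() above saves with pickle, and data.csv/data.txt is the collected training text. Once training has run, classification only needs the two pickles. A quick check, assuming the code above has been run once:

        import pickle

        # classify() only needs these two artifacts -- no retraining required
        clf = pickle.load(open('naive_bayes_classifier.pkl', 'rb'))
        vec = pickle.load(open('count_vectorizer.pkl', 'rb'))
        print(clf.predict(vec.transform(["some new article text"]))[0])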

  • @affectionlifeaffliction · 8 months ago

    Nothing works in Jupyter Notebook.

  • @andonij · 9 months ago

    Thank you so much, Richard. You made my Sunday with this explanation; excellent video.

  • @chrissonntag9 · 10 months ago

    I cannot find the source code or the data source :-( so it's not useful for me.

  • @giantdutchviking · 11 months ago

    Thanks for taking the time to make this vid. I've been learning Python for a short while, and although I didn't understand all of it, it gave a good insight into what machine learning does. It doesn't sound so "scary" anymore.

  • @niteshsneha · 11 months ago

    Can you please share the GitHub link for the source code?

  • @InnocenceVVX · 1 year ago

    This is very cool, thank you

  • @tharindunilakshana1883 · 1 year ago

        import os
        import csv
        import random
        import string
        import pickle
        from collections import defaultdict
        from nltk import word_tokenize, FreqDist
        from nltk.corpus import stopwords
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn import metrics

        stop_words = set(stopwords.words('english'))
        stop_words.add('said')
        stop_words.add('mr')

        BASE_DIR = 'C:/Users/user/Desktop/bbc/News Articles'
        LABELS = ['business', 'entertainment', 'politics', 'sport', 'tech']

        def create_data_set():
            with open('data.csv', 'w', encoding='utf8', newline='') as outfile:
                # creating a csv writer object
                csvwriter = csv.writer(outfile)
                for label in LABELS:
                    dir = '%s/%s' % (BASE_DIR, label)
                    for filename in os.listdir(dir):
                        fullfilename = '%s/%s' % (dir, filename)
                        print(fullfilename)
                        with open(fullfilename, 'rb') as file:
                            text = file.read().decode(errors='replace').replace('\n', ' ')
                        # writing the fields
                        csvwriter.writerow([label, filename, text])
                        print(text)

        def setup_docs():
            docs = []  # (label, text)
            # read back the csv written by create_data_set()
            with open('data.csv', 'r', encoding='utf8') as datafile:
                for parts in csv.reader(datafile):
                    doc = (parts[0], parts[2].strip())
                    docs.append(doc)
            return docs

        def clean_text(text):
            # remove punctuation
            text = text.translate(str.maketrans('', '', string.punctuation))
            # convert to lower case
            text = text.lower()
            return text

        def get_tokens(text):
            # get individual words
            tokens = word_tokenize(text)
            # remove common words that are useless
            tokens = [t for t in tokens if t not in stop_words]
            return tokens

        def print_frequency_dist(docs):
            tokens = defaultdict(list)
            # let's make a giant list of all the words for each category
            for doc in docs:
                doc_label = doc[0]
                doc_text = clean_text(doc[1])
                doc_tokens = get_tokens(doc_text)
                tokens[doc_label].extend(doc_tokens)
            for category_label, category_tokens in tokens.items():
                print(category_label)
                fd = FreqDist(category_tokens)
                print(fd.most_common(20))

        def get_splits(docs):
            # scramble docs
            random.shuffle(docs)
            X_train = []  # training documents
            y_train = []  # corresponding training labels
            X_test = []   # test documents
            y_test = []   # corresponding test labels
            pivot = int(.80 * len(docs))
            for i in range(0, pivot):
                X_train.append(docs[i][1])  # the original comment had docs[1][1] here, a bug
                y_train.append(docs[i][0])
            for i in range(pivot, len(docs)):
                X_test.append(docs[i][1])
                y_test.append(docs[i][0])
            return X_train, X_test, y_train, y_test

        def evaluate_classifier(title, classifier, vectorizer, X_test, y_test):
            X_test_dtm = vectorizer.transform(X_test)
            y_pred = classifier.predict(X_test_dtm)
            precision = metrics.precision_score(y_test, y_pred, average='micro')
            recall = metrics.recall_score(y_test, y_pred, average='micro')
            f1 = metrics.f1_score(y_test, y_pred, average='micro')
            print("%s\t%f\t%f\t%f" % (title, precision, recall, f1))

        def train_classifier(docs):
            X_train, X_test, y_train, y_test = get_splits(docs)
            # the object that turns text into vectors
            vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 3), min_df=3, analyzer='word')
            # create doc-term matrix
            dtm = vectorizer.fit_transform(X_train)
            # train Naive Bayes classifier
            naive_bayes_classifier = MultinomialNB().fit(dtm, y_train)
            evaluate_classifier("Naive Bayes\tTRAIN\t", naive_bayes_classifier, vectorizer, X_train, y_train)
            evaluate_classifier("Naive Bayes\tTEST\t", naive_bayes_classifier, vectorizer, X_test, y_test)
            # store the classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            pickle.dump(naive_bayes_classifier, open(clf_filename, 'wb'))
            # also store the vectorizer so we can transform new data
            vec_filename = 'count_vectorizer.pkl'
            pickle.dump(vectorizer, open(vec_filename, 'wb'))

        # classify new content
        def classify(text):
            # load classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            nb_clf = pickle.load(open(clf_filename, 'rb'))
            # load the vectorizer for the new text
            vec_filename = 'count_vectorizer.pkl'
            vectorizer = pickle.load(open(vec_filename, 'rb'))
            pred = nb_clf.predict(vectorizer.transform([text]))
            print(pred[0])

        # create_data_set()
        # docs = setup_docs()
        # print_frequency_dist(docs)
        # train_classifier(docs)

        # classify new content using the pkl files
        new_doc = """Transparency International Sri Lanka (TISL) filed a petition in the
        Supreme Court yesterday (June 12), seeking to intervene in the ongoing Fundamental
        Rights case (SC/FR/Application No.168/2021) filed by the Center for Environmental
        Justice (CEJ) and three more petitioners, highlighting the serious allegations of
        bribery and corruption surrounding the X-Press Pearl disaster. The intervention
        petition is filed in the public interest. It refers to serious allegations of
        irregularity, mishandling, sabotage, bribery and corruption surrounding the claim
        for compensation arising from the X-Press Pearl disaster. Several key points have
        been raised in the intervention petition: The grave allegations of interference and
        extraneous pressure surrounding the claim for compensation arising from the X-Press
        Pearl disaster. The statement by the Justice Minister in Parliament on April 25,
        2023, that one Chamara Gunasekara alias Manjusiri Nissanka had received a payment
        of USD 250 million into a private bank account in connection with the X-Press Pearl
        disaster. The media statements of Chinthaka Waragoda, who reportedly invented a
        machine to remove debris which washed ashore after the shipwreck, alleging that he
        was offered payment to discontinue the use of his machine, to avoid exposing the
        full extent of the damage caused by the disaster. Questions surrounding the quantum
        of compensation due to Sri Lanka for the damages caused by MV X-Press Pearl. The
        freight ship 'MV X-Press Pearl' caught fire off the coast of Colombo on 20th May,
        2021. It sank a few days later, releasing its cargo of plastic pellets and tons of
        toxic chemicals into the ocean, causing Sri Lanka's worst maritime disaster to
        date. It is alleged that Sri Lankan authorities obtained the assistance of the
        International Tanker Owners Pollution Federation Limited (ITOPF), a representative
        of the insurer of the Shipowner, in the post-disaster activities, despite the grave
        conflict of interest arising from it. TISL has urged that the private parties
        involved in the X-Press Pearl incident be held accountable, and be made to pay
        optimal compensation for the damage and pollution caused to the marine and coastal
        ecology of Sri Lanka, and the payment of compensation for the loss caused to the
        fishing communities and those engaged in tourism, as well as obtaining compensation
        under the Marine Pollution Prevention Act. TISL has also highlighted the need to
        hold anyone guilty of wrongdoing fully accountable. The petition for intervention
        is to be mentioned for Support in the Supreme Court on Thursday (June 15)."""
        classify(new_doc)
        print("Done...!")

  • @anirbanghose8647 · 1 year ago

    Loved it. It made complete sense. Thanks.

  • @viane123456 · 1 year ago

    I have a text file with many lines, and I want to classify each line. How can I do it?
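
    A minimal sketch for that, reusing the pickled model and vectorizer produced by train_classifier() above ('lines.txt' is a placeholder file name):

        import pickle

        nb_clf = pickle.load(open('naive_bayes_classifier.pkl', 'rb'))
        vectorizer = pickle.load(open('count_vectorizer.pkl', 'rb'))

        # read the file, dropping empty lines
        with open('lines.txt', 'r', encoding='utf8') as f:
            lines = [line.strip() for line in f if line.strip()]

        # vectorize all lines at once and predict in one call
        for line, label in zip(lines, nb_clf.predict(vectorizer.transform(lines))):
            print(label, '\t', line)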

  • @mohamedsidhoum6835 · 1 year ago

    Thank you for this video. Where can I get the source code?

  • @yongxing1848 · 1 year ago

    Where is the code for this?

  • @amanichouk8967 · 1 year ago

    Thank you so much; this is amazing and so well structured.

  • @master2wia536 · 1 year ago

    Can you send me the code?

  • @tonyhasago · 2 years ago

    Hi - great video and it worked!! How would I assess the accuracy of the final step, classifying some new text?

    • @agradel100 · 1 year ago

      Have you found out how to do it?
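
    One hedged sketch: a single new document can't give you an accuracy figure, only a confidence, because accuracy needs known labels. MultinomialNB exposes predict_proba for the confidence route; for true accuracy, score a labeled held-out set. Assuming the pickled artifacts from the code above:

        import pickle

        nb_clf = pickle.load(open('naive_bayes_classifier.pkl', 'rb'))
        vectorizer = pickle.load(open('count_vectorizer.pkl', 'rb'))

        # confidence for one new document
        new_doc = "Google showed off some new camera features on the Pixel 4 today"
        probs = nb_clf.predict_proba(vectorizer.transform([new_doc]))[0]
        best = probs.argmax()
        print(nb_clf.classes_[best], probs[best])

        # true accuracy needs labeled examples the model never saw:
        # from sklearn.metrics import accuracy_score
        # accuracy_score(y_new, nb_clf.predict(vectorizer.transform(X_new)))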

  • @bongimaposa · 2 years ago

    Wonderful! Thank you so much - detailed, and I could follow.

  • @ikennanwankwo7448 · 2 years ago

    Hello, thank you for your video, but the create_data_set function does not work if there are multiple file types (.txt, .doc, .bin, etc.) in the subfolders. The data.txt output file is empty (nothing gets written to it).

    • @ikennanwankwo7448 · 2 years ago

      I just want it to write only the .txt files to the data.txt file. (All the .txt files in the subfolders have the same file name, if that helps.)
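
    A sketch of one fix, assuming the same folder layout as the video: filter on the extension before reading, and skip anything that isn't a regular file.

        import os
        import csv

        def create_data_set_txt_only(base_dir, labels, out='data.csv'):
            # same idea as the video's create_data_set, but ingests only .txt files
            with open(out, 'w', encoding='utf8', newline='') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow(['Label', 'Filename', 'Text'])
                for label in labels:
                    folder = os.path.join(base_dir, label)
                    for filename in os.listdir(folder):
                        if not filename.endswith('.txt'):
                            continue  # skip .doc, .bin, etc.
                        path = os.path.join(folder, filename)
                        if not os.path.isfile(path):
                            continue  # skip nested directories
                        with open(path, 'rb') as f:
                            text = f.read().decode(errors='replace').replace('\n', ' ')
                        writer.writerow([label, filename, text])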

  • @PD-qg2fo · 2 years ago

    Thank you so much, sir. I was looking for this kind of tutorial.

  • @bibichbicha687 · 2 years ago

    Can I get the code, please, sir?

  • @fathersonduo · 2 years ago

    Please make more videos!!

  • @ananthakrishnan4754 · 2 years ago

    Some people talk about the p-value and some talk about the critical value - now I am confused.
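
    They're two equivalent ways to make the same decision: compare the test statistic to a cutoff, or compare the tail probability to alpha. A worked example under assumed numbers (one-sample two-sided t-test, t = 2.4, n = 20, alpha = 0.05):

        from scipy import stats

        t_stat, df, alpha = 2.4, 19, 0.05

        # critical-value route: reject H0 if |t| exceeds the cutoff
        t_crit = stats.t.ppf(1 - alpha / 2, df)   # about 2.093
        print('reject?', abs(t_stat) > t_crit)    # True

        # p-value route: reject H0 if p < alpha
        p = 2 * stats.t.sf(abs(t_stat), df)       # about 0.027
        print('reject?', p < alpha)               # True

    The two routes always agree; they just express the same threshold on different scales.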

  • @Muuip · 2 years ago

    Great presentation, much appreciated!

  • @Arrato1977 · 2 years ago

    Explained super easily! Thank you!

  • @walidbenaouda8935 · 2 years ago

    Can I get the code?

  • @hggaming911 · 2 years ago

    Awesome - simple and clean code. Please, can we have a link to download the code?

  • @mikiyasassefakassa9136 · 2 years ago

    Sir, I have a question: how do you prepare the dataset for document-level news text classification?

  • @lunabaalbaki3169 · 2 years ago

    Hey, thank you for the tutorial; it's very helpful. Can you share your code?

  • @adylmanulat2465 · 2 years ago

    Good day, sir. I just wanted to ask: if an independent variable is not significant (it has no explanatory power in the model), but removing it lowers the adjusted R-square, what does this imply? So far, the only reason I know of is that its t-statistic is greater than one. With this information, what can we infer?

  • @naughtychohan7956 · 2 years ago

    Can I get this code?

  • @JiminPark-ld2xx · 2 years ago

    Does anyone use Excel or CSV data for text classification? Or should I create a .txt file for each and every row of my data?
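
    A CSV (or Excel sheet) with one row per document works directly; there's no need for per-row .txt files. A minimal sketch, assuming a hypothetical file with 'text' and 'label' columns:

        import pandas as pd
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        df = pd.read_csv('my_data.csv')   # or pd.read_excel('my_data.xlsx')
        vectorizer = CountVectorizer(stop_words='english')
        X = vectorizer.fit_transform(df['text'])
        clf = MultinomialNB().fit(X, df['label'])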

  • @learner3585 · 2 years ago

    Very good tutorial with a good explanation. I was able to follow along and run the whole program while watching the video. Thanks.

  • @StanleyDenman · 2 years ago

    Your video seems to be right on point for what I want to do, but I am confused about the learning-model aspect. If I just want to create hard rules for text classification, I do not need the training data set, right?
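
    Right - hard rules need no training data or model at all. A toy keyword router (the categories and keywords are made up for illustration; first match wins):

        RULES = {
            'sport':    ['match', 'goal', 'tournament'],
            'tech':     ['software', 'chip', 'smartphone'],
            'business': ['market', 'shares', 'profit'],
        }

        def classify_by_rules(text, default='other'):
            lowered = text.lower()
            for label, keywords in RULES.items():
                if any(k in lowered for k in keywords):
                    return label
            return default

        print(classify_by_rules("The smartphone market grew 4% this quarter"))

    The trade-off: rules are transparent and need no data, but they don't generalize to wordings you didn't anticipate, which is what the trained classifier buys you.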

  • @LeomarOsorio · 2 years ago

    Thank you for this tutorial. This is a good walkthrough.

  • @mandysingh4044 · 2 years ago

    Hello sir, I want to contact you.

  • @pujanrajrai4930 · 2 years ago

    Thank you very much, sir, for this lecture; it really helped me a lot. Hoping to see more content on machine learning.

  • @timhn4010 · 3 years ago

    The dataset: www.kaggle.com/pariza/bbc-news-summary

  • @rabbilbhuiyan5666 · 3 years ago

    Excellent video and demonstration of text analysis. Thank you very much, Sir!

  • @engbahja · 3 years ago

    Many thanks for this. Could you please share the expert systems list you mention at the end of the video?

  • @lukajozic9768 · 3 years ago

    Nice! There is, however, a function in the sklearn library called train_test_split that does exactly what your get_splits function does. Also, it would be helpful if you posted the code in the description. Good video and great explanations!
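
    The commenter is right: sklearn.model_selection.train_test_split can replace get_splits(). A sketch, with a tiny placeholder docs list standing in for the output of setup_docs():

        from sklearn.model_selection import train_test_split

        # placeholder for the (label, text) pairs built by setup_docs()
        docs = [('tech', 'new phone released'), ('sport', 'great match today'),
                ('tech', 'chip prices fall'), ('sport', 'cup final tonight'),
                ('business', 'shares rally')]

        texts  = [d[1] for d in docs]
        labels = [d[0] for d in docs]
        X_train, X_test, y_train, y_test = train_test_split(
            texts, labels, test_size=0.20, shuffle=True, random_state=42)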

  • @angelpascual1516 · 3 years ago

    Can you please help me answer/solve this, with a conclusion?

        Alternative (capacity for new store) | New Bridge Built | No New Bridge
        A (Small)                            | 1                | 14
        B (Medium)                           | 2                | 10
        C (Large)                            | 4                | 6

    1. Assume the payoffs represent profits. Determine the alternative that would be chosen under the minimax approach.
    2. Assume the payoffs represent profits. Determine the alternative that would be chosen under the maximin approach.
    3. Assume the payoffs represent profits. Determine the alternative that would be chosen under the maximax approach.
    4. Assume the payoffs represent profits. Determine the alternative that would be chosen under the Laplace approach.
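
    Not a worked solution to the homework, but the mechanics of each criterion applied to that payoff table (for profits, "minimax" is read here as minimax regret):

        payoffs = {'A (Small)': [1, 14], 'B (Medium)': [2, 10], 'C (Large)': [4, 6]}

        maximin = max(payoffs, key=lambda a: min(payoffs[a]))  # best worst case
        maximax = max(payoffs, key=lambda a: max(payoffs[a]))  # best best case
        laplace = max(payoffs, key=lambda a: sum(payoffs[a]) / len(payoffs[a]))

        # minimax regret: regret = column best minus the cell's payoff
        best_per_state = [max(p[i] for p in payoffs.values()) for i in range(2)]
        regret = {a: max(best_per_state[i] - p[i] for i in range(2))
                  for a, p in payoffs.items()}
        minimax_regret = min(regret, key=regret.get)

        print(maximin, maximax, laplace, minimax_regret)  # C, A, A, A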

  • @faisalagarbaa1 · 3 years ago

    Where is the Python source code?

  • @PULAMOLUDEEPAKBCE · 3 years ago

    Can you share the code?

  • @peterkirka4862 · 4 years ago

    Hello Mr. Gruss. Is it possible to find your code from this video somewhere?

  • @eduardolopez7323 · 4 years ago

    Just what I was looking for, thanks man.