Live Day 1-Live Session On EDA And Feature Engineering- Zomato Dataset

  • Published 22 Aug 2024
  • Join the community session at courses.ineuro... . All the materials will be uploaded there.
    Download the dataset: github.com/kri...
    The OneNeuron lifetime subscription has been extended.
    On the OneNeuron platform you will get access to 100+ courses (at least 20 courses will be added every month, based on your demand).
    Features of the course
    1. You can raise a demand for any course (fulfilled within 45-60 days).
    2. You can access the innovation lab from iNeuron.
    3. You can use our incubation based on your ideas.
    4. Live sessions coming soon (mostly till Feb).
    Use coupon code KRISH10 for an additional 10% discount.
    And many more...
    Enroll now
    OneNeuron link: one-neuron.ine...
    Call our team directly in case of any queries:
    8788503778
    6260726925
    9538303385
    866003424

COMMENTS • 192

  • @anandmohite
    @anandmohite 2 years ago +46

    So, just to clarify what UTF-8 and Latin-1 encoding mean:
    An encoding is the reference table a machine uses to convert characters into bytes and back. At the granular level machine language is 0s and 1s, so the alphanumeric data we process has to be converted with some encoding, and UTF-8 is the one most widely used for electronic communication. It covers English letters and digits as well as virtually every other script.
    The problem appears when the incoming file was saved with a different encoding: its bytes then do not match the reference table being used (UTF-8 here), so the machine cannot decode them. We therefore need to provide the appropriate reference: if the incoming data was saved in a Japanese JIS encoding, use that encoding; if it was saved as Latin-1, use latin-1; and so on.
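
A minimal sketch of the encoding mismatch described above, using only the Python standard library. The sample text is made up for illustration and is not taken from the Zomato file; the point is only that bytes written as Latin-1 cannot always be read back as UTF-8.

```python
# Illustrative only: the sample text is made up, not taken from the Zomato file.
text = "Café São Paulo"

# Bytes as a program would have written them when saving the file as Latin-1.
latin1_bytes = text.encode("latin-1")

# Reading those bytes as UTF-8 fails: 0xE9 ("é" in Latin-1) starts a
# multi-byte sequence in UTF-8, but the byte that follows does not continue it.
try:
    latin1_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Reading them with the encoding they were written in works.
decoded = latin1_bytes.decode("latin-1")

print(utf8_failed, decoded == text)
```

With pandas the same idea is expressed through the `encoding` argument of `pd.read_csv`, for example `encoding="latin-1"`, assuming the file really was saved in that encoding.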

  • @prashindu
    @prashindu 6 months ago +4

    Enjoyed going through this EDA session. Sharing what I learnt from my first EDA session as my thanksgiving.
    1) Learnt about encoding errors.
    2) Learnt about matplotlib figure sizing using rcParams.
    3) Learnt why hue did not work in Seaborn as intended.
    4) Learnt the difference between value_counts and groupby().size.
    5) Loved to see how reset_index was being used time and again.
    58:08
    We use hue when we wish to differentiate the categories. But remember that the bars of the graph were already in different colours. Why was that? In Seaborn, the default palette shows a different colour for each bar, which we subsequently changed to the colours we wanted. So the hue parameter did not do anything additional in this context, but it did give us a legend.
    1:18:45
    We used groupby().size plenty of times. I was wondering when to use value_counts and when to use groupby().size.
    df['column'] returns a Series ==> here value_counts() is the shortest way to count. (A Series can also be grouped, but that is rarely needed for a plain count.)
    df[condition] returns a DataFrame ==> we can use groupby().size() here.
    Also, size gives the total count of all elements. Used with a groupby, it gives the number of rows in each group, including rows that hold null values in other columns. value_counts gives the count per value and, by default, leaves out null values.
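
The difference between value_counts and groupby().size can be sketched on a toy frame. The column names mirror the session's dataset, but the values are hypothetical. Note that groupby also drops a missing group key by default, which `dropna=False` switches off.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): one missing cuisine and one missing vote count.
df = pd.DataFrame({
    "Cuisines": ["Chinese", "Chinese", "Italian", np.nan],
    "Votes": [10, np.nan, 5, 7],
})

# Count per value; a missing cuisine is left out by default.
by_value_counts = df["Cuisines"].value_counts()

# Rows per group; the Chinese row whose Votes is missing still counts.
by_size = df.groupby("Cuisines").size()

# count() looks at one column and skips its missing values.
by_count = df.groupby("Cuisines")["Votes"].count()

# Keep the missing cuisine as a group of its own.
by_size_with_nan = df.groupby("Cuisines", dropna=False).size()

print(by_value_counts, by_size, by_count, by_size_with_nan, sep="\n\n")
```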

  • @antonioarana8002
    @antonioarana8002 2 years ago +11

    It is really helpful that someone with real knowledge and experience teaches this kind of hands-on, real-example stuff! So many thanks, Krish.

  • @niharikathakur1672
    @niharikathakur1672 2 years ago +9

    Just completed this session. Now everything seems so relatable and understandable. THANK YOU

    • @PMKB4
      @PMKB4 2 years ago

      hi

    • @Uda_dunga
      @Uda_dunga 2 years ago

      pls help me to understand 😭🙏🏽

    • @jaitiwari241
      @jaitiwari241 1 year ago

      Did you get placed anywhere?

  • @ajaykushwaha-je6mw
    @ajaykushwaha-je6mw 2 years ago +36

    In a real project we do not import a CSV file; we pull data from MongoDB or from a SQL database. Can you please create a video on loading a data frame from a database?
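
As a rough sketch of what loading from a SQL database can look like, the example below uses an in-memory SQLite database from the standard library as a stand-in for a real project database. The table and column names are hypothetical; `pd.read_sql_query` runs the query and returns a DataFrame.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for a real project database.
# Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT, cuisines TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [("A", "Chinese", 4.1), ("B", "Italian", 3.6), ("C", "Chinese", 0.0)],
)
conn.commit()

# Run a query and get the result back as a DataFrame.
df = pd.read_sql_query(
    "SELECT name, cuisines, rating FROM restaurants WHERE rating > ?",
    conn,
    params=(0,),
)
conn.close()

print(df)
```

From this point on the DataFrame can be explored exactly like one read from a CSV file.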

    • @iakhileshgupta3553
      @iakhileshgupta3553 2 years ago +2

      I cannot open this CSV to read in pandas. Can you please help me to read it?
      I'm getting a permission error.

    • @sidindian1982
      @sidindian1982 2 years ago +1

      @iakhileshgupta3553 Save the CSV file in a local folder, then pass its full path, for example:
      data = pd.read_csv('C:\\program file\\user\\folder\\file_name.csv')
      Then print data.head() and the first 5 rows come into the picture.

    • @iakhileshgupta3553
      @iakhileshgupta3553 2 years ago +1

      @sidindian1982 PermissionError: [Errno 13] Permission denied: 'c:\\users\\hp\\desktop' is the error message.

    • @sidindian1982
      @sidindian1982 2 years ago +1

      @iakhileshgupta3553 Recheck in the CMD prompt whether pandas is installed. If not, run both of these in the CMD prompt:
      pip install numpy
      pip install pandas
      Then import them in the Jupyter notebook:
      import pandas as pd
      import numpy as np
      THEN RUN IT: click the Run button (the imports have to be run again every time you open the Jupyter notebook).
      Then type:
      df = pd.read_csv('dats.csv')
      df.head()

    • @iakhileshgupta3553
      @iakhileshgupta3553 2 years ago +2

      Sir @sidindian1982, I have pandas and numpy installed. I checked in CMD with pip list and they are there. I also did what you suggested and it says already installed, but it won't read the file and gives the above permission error in the terminal.
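
The path in the error message above ends at a folder (the desktop), not at a .csv file. One plausible cause, which is an assumption since the failing call is not shown, is that the folder itself was passed to read_csv; on Windows, opening a directory as a file typically raises exactly this PermissionError. A small standard-library check before reading can rule that out. The helper name and file names below are hypothetical.

```python
from pathlib import Path
import tempfile


def describe_path(path):
    """Return 'file', 'directory', or 'missing' for the given path."""
    p = Path(path)
    if p.is_dir():
        return "directory"
    if p.is_file():
        return "file"
    return "missing"


# Demonstration in a temporary folder (the file name is hypothetical).
with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "zomato.csv"
    csv_path.write_text("a,b\n1,2\n")

    folder_kind = describe_path(folder)      # the folder itself
    file_kind = describe_path(csv_path)      # the CSV inside it
    missing_kind = describe_path(Path(folder) / "other.csv")

print(folder_kind, file_kind, missing_kind)
```

If the check reports "directory", appending the file name to the path is the likely fix.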

  • @rutu.dances.to.express
    @rutu.dances.to.express 2 years ago +9

    Thank you sir for this! I think you should conduct more such sessions where you assign us questions related to data analytics and then discuss the answers.

  • @kangkankalita5221
    @kangkankalita5221 2 years ago +4

    Awesome session, much better than a paid session. Please keep posting, sir. Thanks a lot.

  • @bhooshan25
    @bhooshan25 2 years ago +6

    Thanks!

  • @aryamanbansal1
    @aryamanbansal1 2 years ago +10

    final_df['Cuisines'].value_counts(sort=True, ascending=False)[:10]

    • @ankitraj3180
      @ankitraj3180 2 years ago +1

      On which basis do you define the top 10 cuisines? One more thing: I tried this code
      final_df[final_df['Rating text']=='Excellent']['Cuisines'].head(10)
      and found it more informative. Which code is more appropriate?

    • @dr.madhurinaik
      @dr.madhurinaik 7 months ago

      final_df['Cuisines'].value_counts().head(10)

    • @Arjun-hc7ow
      @Arjun-hc7ow 6 months ago

      @ankitraj3180 yepp, correct way
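
The two approaches in this thread answer different questions, which a toy frame makes visible. This is a sketch with made-up values standing in for final_df: value_counts() ranks cuisines by frequency, while filtering and calling head() only returns the first matching rows in their original order.

```python
import pandas as pd

# Toy stand-in for final_df (values are made up for illustration).
final_df = pd.DataFrame({
    "Cuisines": ["North Indian", "Chinese", "North Indian", "Cafe",
                 "Chinese", "North Indian", "Italian"],
    "Rating text": ["Good", "Excellent", "Excellent", "Good",
                    "Average", "Excellent", "Excellent"],
})

# Most frequent cuisines: value_counts() sorts by count, head() keeps the top n.
top_2 = final_df["Cuisines"].value_counts().head(2)

# Filtering and then head() only returns the first matching rows in file order;
# nothing is counted or ranked.
first_excellent = final_df[final_df["Rating text"] == "Excellent"]["Cuisines"].head(2)

# Combining both ideas: most frequent cuisine among 'Excellent' restaurants.
top_excellent = (
    final_df[final_df["Rating text"] == "Excellent"]["Cuisines"].value_counts().head(1)
)

print(top_2, first_excellent, top_excellent, sep="\n\n")
```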

  • @soumya7427
    @soumya7427 2 years ago +4

    Thank you sir. This is a great session. It is very helpful and everything seems very easy to understand.

  • @ashulohar8948
    @ashulohar8948 1 year ago +1

    Best teaching. I am from a non-tech background and even I understand your teaching 😊 God of data science, Krish Naik.

  • @anuragthakur5787
    @anuragthakur5787 2 years ago +2

    Wonderful session sir, thank you very much.
    We are catching up; please don't get disheartened by the viewer counts.

  • @ankitayadav2690
    @ankitayadav2690 2 years ago +1

    Superb sir, we are lucky to have a mentor like you.

  • @gh504
    @gh504 2 years ago +3

    Thank you sir for this amazing session. Sir, please do live sessions on deep learning and NLP.

  • @equiwave80
    @equiwave80 2 years ago +1

    Thanks for this video. I spent my Sunday morning in a very useful way, brushing up my Python skills.

  • @garimaattri4760
    @garimaattri4760 3 months ago

    You make it very easy, sir. The way you teach is fabulous.

  • @muhammadowaiskhan6831
    @muhammadowaiskhan6831 1 year ago +3

    I am from Pakistan and I have seen a lot of videos about EDA, but this one is just amazing. It is really easy to understand for beginners.
    Respect to you, Sir!!!

    • @cyberpro151
      @cyberpro151 1 year ago

      Do you work as a data analyst?

    • @muhammadowaiskhan6831
      @muhammadowaiskhan6831 1 year ago

      @cyberpro151 yes

    • @shafatnawaz6102
      @shafatnawaz6102 3 months ago

      Bruh, watching this legend, and honestly man, today I miss the Super Chat feature. It must be added to YouTube PK.

  • @sumijasukumaran1394
    @sumijasukumaran1394 2 years ago +1

    Good live class, understandable. Thank you for the session.

  • @niteshkuwarbi04
    @niteshkuwarbi04 7 months ago +1

    For the assignment at 1:26:16:
    final_df['Cuisines'].value_counts().reset_index().head(10)

    • @Siddhant_Banerjee
      @Siddhant_Banerjee 4 months ago +1

      I tried this:
      `df.Cuisines.value_counts()[:10].index`
      But both are basically the same thing, I guess.

  • @karthikrajendran3394
    @karthikrajendran3394 1 year ago

    This is a convenient data set: no strings in numerical columns, no extra characters. Handling those is the real challenge.

  • @karankumarchaudhari9477
    @karankumarchaudhari9477 2 years ago +2

    Thank you sir for this amazing session. Sir, please do live sessions on NLP and deep learning.

  • @pawankatwe8985
    @pawankatwe8985 2 years ago +2

    Great session... Thank you

  • @antonioarana8002
    @antonioarana8002 2 years ago

    Perfect explanation! The visualization, the sub-setting, everything, the queries and the observations! GREAT, thanks so much... (I just struggle a little bit with when to use groupby.)

  • @evelyncusilopez6776
    @evelyncusilopez6776 11 months ago

    Awesome, thank you Krish!

  • @AnkJyotishAaman
    @AnkJyotishAaman 2 years ago +2

    For the last assignment, which he gave as homework
    (replace final_df with whatever name you used in your notebook):
    final_df[["Cuisines"]].groupby(["Cuisines"]).size().reset_index().sort_values(by=0,ascending=False).head(10)

    • @nimisha9095
      @nimisha9095 2 years ago

      Can you please explain??

    • @MuhammadAhmed-jm1bs
      @MuhammadAhmed-jm1bs 2 years ago +1

      Damn man, that's a long code.
      Or you could simply write: final_df['Cuisines'].value_counts().head(10)

    • @AbdulHannan-dg6dl
      @AbdulHannan-dg6dl 2 years ago

      cuisine_names=final_df.Cuisines.value_counts()
      cuisine_names[:10]

  • @user-rg6og5en2k
    @user-rg6og5en2k 1 year ago +1

    This was sooo helpful, Krish sir! You made it like butter for us. I liked everything. Thank you, you keep inspiring us.

  • @maneeshmm8105
    @maneeshmm8105 2 years ago

    Thank you for giving such amazing things on your channel.

  • @user-wi7mt5st2s
    @user-wi7mt5st2s 3 months ago

    Thank you so much, Sir, for the great video.

  • @riteshmukhopadhyay6922
    @riteshmukhopadhyay6922 2 years ago +1

    In the missing-values part the heat map is hard to see. If we resize the figure, we will actually be able to see the plotted points.
    plt.figure(figsize = (15,8))
    sns.heatmap(df.isnull(), yticklabels = False, cbar = True, cmap = 'viridis')
    The figure method helps us resize the heat map accordingly.
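
For reference, a short sketch of the two ways of sizing figures that come up in these comments: `plt.figure(figsize=...)` for a single figure and `rcParams` for the default. It uses matplotlib only; the non-interactive "Agg" backend is an assumption made here so the sketch runs without a display, and the sizes are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Option 1: size a single figure explicitly (width, height in inches).
fig = plt.figure(figsize=(15, 8))
one_off_size = tuple(fig.get_size_inches())
plt.close(fig)

# Option 2: change the default so every later figure uses this size.
plt.rcParams["figure.figsize"] = (12, 6)
fig2 = plt.figure()
default_size = tuple(fig2.get_size_inches())
plt.close(fig2)

print(one_off_size, default_size)
```

A seaborn plot drawn after either call picks up the current figure, so the heat map above is resized the same way.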

  • @kibetwalter8528
    @kibetwalter8528 2 years ago

    Hi Krish.
    Please do an example on the difference between using LSTM for classification and LSTM for regression.
    Explain the difference between using LSTM for the two, especially for multivariate data.
    You have always been my teacher. I learned machine learning and deep learning from you. No other bootcamp, and I didn't do any computer science course at university. Just your YouTube videos.
    Thank you so much.

  • @sonaganeshg2301
    @sonaganeshg2301 1 year ago

    Thank you so much, sir. You have built the confidence in me to start data science.

  • @usamashaikh1046
    @usamashaikh1046 3 months ago

    Really appreciate your work

  • @shanmuganathan6230
    @shanmuganathan6230 4 months ago

    final_df[['Cuisines']].groupby('Cuisines').size().reset_index().rename(columns={0:'Count'}).sort_values(by='Count',ascending=False).head(10)

  • @kartik_exe_
    @kartik_exe_ 1 year ago

    Hello, the explanation is great and I have done the assignment. It was super easy:
    # Finding top 10 cuisines
    cuisine_counts = df.groupby('Cuisines').size().reset_index(name='Count')
    top_10_Cuisines = cuisine_counts.sort_values(by='Count', ascending=False).head(10)
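
One detail worth checking in groupby-based answers to the assignment: groupby().size() returns the groups ordered by key, not by count, so the result has to be sorted before head() is applied. A small sketch on made-up data:

```python
import pandas as pd

# Toy stand-in for the restaurant data (values are made up).
df = pd.DataFrame({"Cuisines": ["Cafe", "Chinese", "Chinese", "Chinese",
                                "Bakery", "Italian", "Italian"]})

# groupby().size() returns groups ordered by key (alphabetically here).
counts = df.groupby("Cuisines").size().reset_index(name="Count")

# head() first: just the first two keys alphabetically.
first_two_keys = counts.head(2)["Cuisines"].tolist()

# Sort first, then head(): the two most frequent cuisines.
top_two = (
    counts.sort_values(by="Count", ascending=False).head(2)["Cuisines"].tolist()
)

print(first_two_keys, top_two)
```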

  • @atharv_preeti
    @atharv_preeti 2 years ago

    Wonderful Krish. Just love it.

  • @rithikahuja8203
    @rithikahuja8203 1 year ago

    The most helpful video, sir. Thank you so much for your valuable efforts ❤❤

  • @talibdaryabi9434
    @talibdaryabi9434 1 year ago

    assignment: df_combined['Cuisines'].value_counts().reset_index().rename(columns = {'index' : 'Food'}).iloc[:10,:]

  • @catchursam
    @catchursam 2 years ago

    Great session. Wish I could have joined online

  • @chaiyanutjirayupat4724
    @chaiyanutjirayupat4724 1 year ago

    You are the best!

  • @dikshagupta3276
    @dikshagupta3276 2 years ago +1

    In cell no. 10 you added so many features. Is that important? Please reply. Nice explanation 👍

  • @thilak8595
    @thilak8595 2 years ago

    final_df[final_df['Aggregate rating'] == 0][['Aggregate rating','Country']].value_counts()
    This also works at 1:11:19.

  • @sidnoga
    @sidnoga 2 years ago

    Thank you for the amazing session

  • @SACHINGUPTA-in2gj
    @SACHINGUPTA-in2gj 1 month ago

    Great session sir, love you sir 🥰😍

  • @firasathali8044
    @firasathali8044 2 years ago +1

    Hello sir, your contributions are very helpful to many aspirants. One question: why have you stopped the linear algebra tutorial?

  • @discoverychannel6799
    @discoverychannel6799 2 years ago

    Great session, thank you. I learnt so much.

  • @syeedafatima8634
    @syeedafatima8634 2 years ago

    you are just amazing!

  • @shaelanderchauhan1963
    @shaelanderchauhan1963 2 years ago

    The pronunciation of "Cuisines" was hilarious at 1:26:20 HHAHAHAHAHHHAHAHAHAHAHAHAHAHAHAHAHA. It was an amazing video, kudos.

  • @MuhammedShaheb
    @MuhammedShaheb 5 months ago

    Great session

  • @skuna1217
    @skuna1217 2 years ago

    Wonderful Content

  • @samuelmorales4871
    @samuelmorales4871 1 year ago

    Thank you, amazing video

  • @user-fb9tw6yh1f
    @user-fb9tw6yh1f 10 months ago

    Thank you Krish.

  • @HarishKumar-qt3mr
    @HarishKumar-qt3mr 2 years ago

    Very helpful content brother

  • @palvinderbhatia3941
    @palvinderbhatia3941 2 years ago +1

    Hi Krish, amazing video. Thanks a lot for all the videos, keep it up 👍
    I have a doubt @1.02: why is the maximum number of ratings between 2.5 and 3.4, and not between 2.5 and 3.9?

  • @amoldusane9851
    @amoldusane9851 2 years ago

    Nice section, very informative.

  • @onlymusic2005
    @onlymusic2005 2 years ago +1

    You are a source of motivation... Keep up the good work, Krish! May Allah bless you and all your beloved!

  • @sparshruhela8584
    @sparshruhela8584 7 months ago +1

    # Find the top 10 cuisines
    final_df['Cuisines'].value_counts().head(10).reset_index()

  • @codecheckAbhi
    @codecheckAbhi 2 months ago

    # Find the top 10 cuisines
    final_df['Cuisines'].value_counts().sort_values(ascending = False).head(10)

  • @swapnilloharkar9668
    @swapnilloharkar9668 2 years ago

    Really Helpful.

  • @exclusiveglobaleducation2658
    @exclusiveglobaleducation2658 2 years ago

    Really a great session.

  • @javeedtech
    @javeedtech 1 year ago

    Thanks sir.
    One issue is that Krish sir moves his screen rapidly, so it is difficult to code along with him.
    On YouTube we can pause the video and look for that section, but in a live class it is difficult.

  • @harish00784
    @harish00784 2 years ago

    💖💖💖AMAZING💖💖💖

  • @riteshmukhopadhyay6922
    @riteshmukhopadhyay6922 2 years ago

    Awesome work.

  • @kareoss
    @kareoss 2 years ago +1

    Nice sir

  • @sethusaim1250
    @sethusaim1250 2 years ago +1

    Thank you sir

  • @datasciencegyan5145
    @datasciencegyan5145 2 years ago

    We can use
    import matplotlib.pyplot as plt
    plt.figure(figsize=(10,4)) for the figure size in visualization.

    • @Uda_dunga
      @Uda_dunga 2 years ago

      Yeah, it's just for the background of your chart.

  • @amolghongade19_07
    @amolghongade19_07 1 year ago

    Good learning

  • @akashrathore1388
    @akashrathore1388 1 year ago

    Really enjoyed it!

  • @ashulohar8948
    @ashulohar8948 1 year ago

    Please make more videos on different use cases.

  • @indranisen5877
    @indranisen5877 2 years ago

    Very helpful..

  • @siddnrx3943
    @siddnrx3943 3 months ago

    final_df['Country'][final_df['Rating text'] == 'Not rated'].value_counts()

  • @geekyprogrammer4831
    @geekyprogrammer4831 2 years ago

    Amazing session Krish😁

  • @surajsuryawanshi5182
    @surajsuryawanshi5182 2 years ago

    Awesome session ❤👍

  • @dikshagupta3276
    @dikshagupta3276 2 years ago

    Nice session, thank you sir.
    In Jupyter, how do we execute all cells with one command?

  • @sakshirathoree2908
    @sakshirathoree2908 2 years ago +1

    How do we change the background color of a seaborn plot to black?

    • @Akash_158
      @Akash_158 2 years ago +1

      sns.set(rc={'axes.facecolor':'black', 'figure.facecolor':'black'})

  • @abelsontenny7537
    @abelsontenny7537 2 years ago

    Using concat instead of merge will perhaps result in NaN values.
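
A small sketch of that difference, assuming two miniature tables that share a "Country Code" key (the rows are hypothetical stand-ins for the restaurant table and the country lookup): merge matches rows on the key, while concat stacks the frames and fills the columns one side lacks with NaN.

```python
import pandas as pd

# Hypothetical miniature versions of a restaurant table and a country lookup.
restaurants = pd.DataFrame({"Restaurant": ["A", "B"], "Country Code": [1, 216]})
countries = pd.DataFrame({"Country Code": [1, 216],
                          "Country": ["India", "United States"]})

# merge matches rows on the shared key, so every row gets its country.
merged = pd.merge(restaurants, countries, on="Country Code", how="left")

# concat stacks the frames; columns missing from one frame are filled with NaN.
stacked = pd.concat([restaurants, countries], ignore_index=True)

print(merged, stacked, sep="\n\n")
```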

  • @drprince8766
    @drprince8766 6 months ago

    Please make a video on how to create a website. Thank you.

  • @er_ritesh_meshram
    @er_ritesh_meshram 1 year ago

    Sir, please do an EDA and ML project.

  • @deepakrc8956
    @deepakrc8956 1 year ago

    Thank you sir.

  • @chandrashekhar-ss9hm
    @chandrashekhar-ss9hm 2 years ago

    Please upload the EDA videos in the community group.

  • @shubhamsingh3122
    @shubhamsingh3122 2 years ago +1

    Sir, I am unable to open this dataset in my Jupyter notebook.

  • @pdivyanshupandey104
    @pdivyanshupandey104 6 months ago

    I am getting an error in the barplot where we plot aggregate rating vs rating count
    (a ValueError saying it is not able to interpret the rating count) at 55:20.

  • @chetak-thegermanshepherdsm141
    @chetak-thegermanshepherdsm141 2 years ago

    Sir, the Django playlist has been left incomplete, I believe. Please upload more videos on Django.

  • @sagaragalawe1536
    @sagaragalawe1536 7 months ago

    For errors, we can ask ChatGPT directly.

  • @pythonenthusiast9292
    @pythonenthusiast9292 2 years ago +2

    What are the prerequisites for this series?

  • @kartiksopran1359
    @kartiksopran1359 2 years ago

    Hi sir, can you explain how to aggregate multiple columns in a groupby?
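
One common way to aggregate several columns in a single groupby is pandas named aggregation, sketched below on made-up rows. The output column names (total_votes, mean_rating, restaurants) are hypothetical and chosen only for this example.

```python
import pandas as pd

# Made-up rows; column names follow the session's dataset.
df = pd.DataFrame({
    "Country": ["India", "India", "UK", "UK"],
    "Votes": [100, 50, 30, 10],
    "Aggregate rating": [4.0, 3.0, 4.5, 3.5],
})

# Each keyword is an output column: (input column, aggregation function).
summary = (
    df.groupby("Country")
      .agg(
          total_votes=("Votes", "sum"),
          mean_rating=("Aggregate rating", "mean"),
          restaurants=("Votes", "size"),
      )
      .reset_index()
)

print(summary)
```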

  • @arbaazkhan7136
    @arbaazkhan7136 2 years ago +1

    1.11.00 sir aapne kya paadha hai us time😆😆😆

  • @tanish7124
    @tanish7124 2 years ago

    Thank you sir. Where do we get the notebook you worked in? Is it saved somewhere? Please inform.

  • @sandipansarkar9211
    @sandipansarkar9211 2 years ago +1

    finished coding

  • @graphicswithsk518
    @graphicswithsk518 1 year ago

    Please help me with how to find the top 10 cuisines.

  • @siddnrx3943
    @siddnrx3943 3 months ago

    final_df[final_df['Rating text'] == 'Excellent']['Cuisines'].value_counts()

  • @venkateshpolisetty5624
    @venkateshpolisetty5624 2 years ago

    1:10:58 I think the last record shouldn't come up, sir, because it has a 1.8 rating and we require only the countries with a 0 rating.

    • @abhishekrao1097
      @abhishekrao1097 2 years ago

      Yes, we should use
      country_rating= final_df.groupby(['Aggregate rating','Country']).size().reset_index().rename(columns={0:'Rating count'})
      country_rating[country_rating['Aggregate rating']==0]
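
The idea in this reply can be tried on a toy frame. This is a sketch with made-up rows standing in for final_df: group by rating and country, then keep only the rows whose rating is exactly 0. A filter-first shortcut with value_counts gives the same counts.

```python
import pandas as pd

# Made-up rows standing in for final_df.
final_df = pd.DataFrame({
    "Aggregate rating": [0.0, 0.0, 0.0, 1.8, 4.2],
    "Country": ["India", "India", "Brazil", "India", "United States"],
})

# Count restaurants per (rating, country) pair.
country_rating = (
    final_df.groupby(["Aggregate rating", "Country"])
            .size()
            .reset_index(name="Rating count")
)

# Keep only the zero-rated rows; the 1.8 record is excluded.
zero_rated = country_rating[country_rating["Aggregate rating"] == 0]

# Shortcut: filter first, then count countries.
shortcut = final_df.loc[final_df["Aggregate rating"] == 0, "Country"].value_counts()

print(zero_rated, shortcut, sep="\n\n")
```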

  • @aashishraj685
    @aashishraj685 2 years ago

    In the case of skewed data, do we need to perform a Yeo-Johnson power transformation and then standard scaling for the SVM model?

  • @user-bt3hy4ny2w
    @user-bt3hy4ny2w 1 year ago

    df["Cuisines"][0:10]

  • @arvindersingh2193
    @arvindersingh2193 1 year ago

    ty sir

  • @noadsensehere9195
    @noadsensehere9195 1 month ago

    Please add timestamps.

  • @riffatabdulrauf2132
    @riffatabdulrauf2132 2 years ago

    Sir, I want to extract features from a text .CSV file using TF-IDF and an n-gram model, and I want the output as a sparse matrix. Do you have any tutorial on that? Please guide.

  • @user-rl4bu3mc8d
    @user-rl4bu3mc8d 4 months ago +1

    Showing off your new bottle, Krish Naik.

  • @faheemsification
    @faheemsification 2 years ago

    Wonderful. I want to interact with you regarding data science.

  • @gopikrishna4552
    @gopikrishna4552 2 years ago

    Hi sir, while merging two different data sets, should their shapes be the same, or can they be different?

  • @abhiramadala9864
    @abhiramadala9864 10 months ago

    So it is a regression dataset.