Lecture 2 | Preprocessing Data for Machine Learning With Datavec & Spark

  • Published 27 Dec 2024

COMMENTS • 9

  • @crockpotveggies
    @crockpotveggies 7 years ago

    Thanks Tom for making your videos easy to follow!

  • @scottkralph
    @scottkralph 5 years ago

    I want to use Scala, and not Java, but these videos are nice and clear and will help a lot! Thanks!

  • @sardarjaf4007
    @sardarjaf4007 7 years ago +2

    Thanks for this informative tutorial. I see you are not using the string columns. I am wondering if you could share your knowledge on converting one of the string columns to a vector, or to numbers, so that we can use the data for training.

    • @hunters.dicicco1410
      @hunters.dicicco1410 6 years ago

      There are two basic approaches to this, depending on your application. Two very powerful techniques exist for word vectorization: word2vec and its extension doc2vec. Both came out of work by Tomas Mikolov and colleagues at Google.
      I encourage you to look into tutorials on those and see if they are what you need.
      Otherwise you might consider converting your string data to a numerical code such as ASCII or UTF-8 and normalizing it. The normalization step is incredibly important, because strings of varying length will vary greatly in magnitude when represented in a numerical code. From there you will need a back end: your learning model will predict in ASCII, and your back end must be prepared to translate that back into a human-readable string.
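
A minimal sketch of the second approach described above, in plain Python rather than DataVec, and not the method used in the video. It assumes ASCII-only input; the names `encode`, `decode`, `MAX_LEN`, `MAX_CODE` and `PAD` are hypothetical, as is the choice of zero-padding every string to a fixed length and dividing by 127 to scale values into [0, 1]. The `decode` function plays the role of the "back end" that turns predicted numbers back into a readable string.

```python
# Sketch: encode a string as a fixed-length vector of scaled ASCII codes,
# then decode such a vector back into text. All names are illustrative.

MAX_LEN = 16    # assumed fixed vector length; longer strings are truncated
MAX_CODE = 127  # highest ASCII code, used as the scaling divisor
PAD = 0         # code reserved for padding (NUL), dropped when decoding


def encode(text, max_len=MAX_LEN):
    """Return a list of max_len floats in [0, 1] representing text."""
    codes = [ord(ch) for ch in text[:max_len]]
    if any(code > MAX_CODE for code in codes):
        raise ValueError("non-ASCII character in input")
    # Pad short strings so every row has the same length.
    codes += [PAD] * (max_len - len(codes))
    # Scale so that every feature lies in the same range.
    return [code / MAX_CODE for code in codes]


def decode(vector):
    """Map a vector of scaled codes (e.g. model output) back to a string."""
    chars = []
    for value in vector:
        # Round to the nearest code and clamp into the valid ASCII range.
        code = min(max(round(value * MAX_CODE), 0), MAX_CODE)
        if code != PAD:
            chars.append(chr(code))
    return "".join(chars)


vec = encode("rain")
print(len(vec))
print(decode(vec))
```

Character codes carry no notion of similarity between words, so this encoding mainly suits short identifiers or codes; for free text, the embedding approach mentioned first is usually the better fit.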

  • @guilhermelaviola8079
    @guilhermelaviola8079 1 year ago

    Could you please tell us what you did at the end of line 91? It's not visible in the video...

  • @Qornv
    @Qornv 5 years ago +1

    last part of the code is cut off.... incompetency is astounding..

  • @TodoProcesos
    @TodoProcesos 7 years ago

    Thanks a lot for sharing this.

  • @giribhushanch1726
    @giribhushanch1726 8 years ago

    Hi Alex, thanks for the code walkthrough; it is very helpful for a newbie like me.
    Please share the reports.csv file.

    • @fortunelee5666
      @fortunelee5666 7 years ago +1

      github.com/SkymindIO/screencasts/tree/master/datavec_spark_transform
      Maybe you can get it from this website.