Thanks for this informative tutorial. I see you are not using the string columns. I am wondering if you could share your knowledge on converting one of the string columns to vector, or numbers, so that we can use the data for training.
There are two basic approaches to this, depending on your application. The first is word vectorization, for which two very powerful tools exist: word2vec and its extension doc2vec. Both came out of Tomas Mikolov's work at Google. I encourage you to look into tutorials on those and see if they are what you need. Otherwise, you might consider converting your string data to a numerical code such as ASCII or UTF-8 and normalizing it. The normalization step is incredibly important, because strings of varying length will vary greatly in magnitude when represented in a numerical code. From there you will need a back end: your learning model will predict in that numerical code (ASCII, for example), and your back end must be prepared to translate the prediction back into a human-readable string.
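To make the second approach concrete, here is a minimal sketch in Python (not the Java/DataVec code from the video). It assumes ASCII input, a fixed vector length of 16, and scaling into [0, 1]; the names `encode` and `decode` and both default parameters are hypothetical choices for illustration, not part of any library.

```python
def encode(text, length=16, max_code=127):
    """Map a string to a fixed-length list of floats in [0, 1].

    Assumptions (illustrative only): ASCII codes, truncation to `length`
    characters, zero padding on the right, non-ASCII replaced by '?'.
    """
    codes = [ord(ch) if ord(ch) <= max_code else ord("?") for ch in text[:length]]
    codes += [0] * (length - len(codes))   # pad so every string has the same size
    return [c / max_code for c in codes]   # scale so magnitudes are comparable


def decode(vector, max_code=127):
    """Back-end step: turn a (possibly predicted) vector back into a string."""
    chars = []
    for value in vector:
        code = round(value * max_code)
        code = max(0, min(max_code, code))  # clamp noisy predictions into range
        if code == 0:                       # 0 is the padding value, so stop here
            break
        chars.append(chr(code))
    return "".join(chars)


if __name__ == "__main__":
    vec = encode("rain")
    print(len(vec))
    print(decode(vec))
```

Padding and scaling are what keep a short string and a long string on the same footing. Note that this encoding carries no meaning about the words themselves, which is why word2vec or doc2vec is usually the better fit for free text.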
Thanks Tom for making your videos easy to follow!
I want to use Scala, and not Java, but these videos are nice and clear and will help a lot! Thanks!
Could you please tell us what you did at the end of line 91? It's not visible in the video...
The last part of the code is cut off... the incompetence is astounding.
Thanks a lot for sharing this.
Hi Alex, thanks for the code walkthrough, it is very helpful for a newbie like me.
Please share the reports.csv file
github.com/SkymindIO/screencasts/tree/master/datavec_spark_transform
Maybe you can get it from this website.