15 videos · 92,668 views
Dan Bochman
Joined 22 Jun 2018
Hello! Welcome to my channel!
My name is Dan Bochman,
I enjoy making content that can help fellow data science enthusiasts get into the field. I try to focus on topics that were difficult for me to understand and that I couldn't find many sources to learn from.
LinkedIn Algorithm Test
Same content, different exposure? 🤔
Lately, many influential content creators on LinkedIn have been wondering why their viewership isn't proportionate to the number of followers they have.
External links seem to be the prime suspect for severely reducing exposure.
Being the data geek that I am, I've decided to test this hypothesis with an A/B test.
I've released two identical posts (one of which you're seeing right now): in one, the video is uploaded natively; in the other, it's linked from my YouTube channel.
I will give weekly updates on the view count for both of these posts.
If you think this experiment is useful for the community, please help scale it up and make it more robust by liking and commenting with your thoughts!
My LinkedIn page:
www.linkedin.com/in/danbochman/
Views: 216
Videos
ML Model Deployment with Flask - Part II - Hosting on Heroku | ML & DS Open-source Spotlight #9
411 views · 4 years ago
Host ML apps on the Web with ease using Heroku 📤 See the first part of this video here - ua-cam.com/video/Od0gS3Qeges/v-deo.html Heroku is a platform as a service that enables developers to build, run, and operate applications entirely in the cloud. If you don't have any hardware or driver requirements like GPU or CUDA for your ML app to run, it's so convenient to host your app on Heroku! It bu...
ML Model Deployment with Flask | Machine Learning & Data Science Open-source Spotlight #8
741 views · 4 years ago
Deploy your ML models easily with Flask! ⚗️ Deploying a trained machine learning model successfully for other people to use and enjoy is an increasingly important skill, but one that is often neglected in a data scientist's curriculum. In many companies, model deployment is the sole responsibility of the software engineering team, but I believe that as this field advances, this privilege ...
HoloViews | Machine Learning & Data Science Open-source Spotlight #7
1.3K views · 4 years ago
Question: What's the simplest way to make high quality plots in Python? Answer: 👉 HoloViews! 👈 " HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting. " - ...
Datashader in 15 Minutes | Machine Learning & Data Science Open-source Spotlight #6
6K views · 4 years ago
Real-time interactive(!) big data visualizations with Datashader & Bokeh 👁️ The main challenge of visualizing huge datasets is NOT computing power or memory! It's actually making meaningful plots that highlight the dense sections in your data together with the outliers. Datashader pre-renders even the largest datasets into a fixed-size raster image that faithfully represents the data...
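The idea of pre-rendering into a fixed-size raster can be sketched in plain Python. Every point is binned into a fixed grid of counts, so the output size depends only on the canvas and not on the number of points. This is a simplified illustration, not Datashader's API. The function names `aggregate_counts` and `shade_linear`, and the naive linear shading, are hypothetical choices made for this sketch; Datashader's own shading step is more sophisticated.

```python
def aggregate_counts(points, width, height, x_range, y_range):
    """Bin (x, y) points into a height x width grid of counts.

    The grid size is fixed by the canvas, no matter how many points there are.
    Points outside the given (non-degenerate) ranges are dropped.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Map the coordinate to a pixel index; clamp the upper edge into the last bin
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


def shade_linear(grid):
    """Scale counts to 0-255 intensities (a naive, hypothetical shading step)."""
    peak = max(max(row) for row in grid) or 1
    return [[round(255 * c / peak) for c in row] for row in grid]


if __name__ == "__main__":
    import random

    random.seed(0)
    # 100,000 points, but the rendered output is always a 20 x 40 grid
    pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
    image = shade_linear(aggregate_counts(pts, 40, 20, (-4, 4), (-4, 4)))
    print(len(image), "rows x", len(image[0]), "columns")
```

A plain linear scale lets a few very dense pixels wash out everything else. That is exactly the "dense sections together with outliers" problem the description mentions, and it is why smarter shading matters.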
Dask in 15 Minutes | Machine Learning & Data Science Open-source Spotlight #5
50K views · 4 years ago
Should you use Dask or PySpark for Big Data? 🤔 Dask is a flexible library for parallel computing in Python. In this video I give a tutorial on how to use Dask for parallel computing, handling Big Data, and integrating with Deep Learning frameworks. I compare Dask to PySpark and list the advantages I see in choosing Dask as your primary tool for Big Data handling. Link to Notebook: nbv...
Plotly & Cost Function Visualizations | Machine Learning & Data Science Open-source Spotlight #4
10K views · 4 years ago
Stop doing 3D plots with Matplotlib! ❌ Plotly's Python graphing library makes interactive, publication-quality graphs. It's based on JavaScript and similar to Bokeh, which I covered last week. I especially like using Plotly for creating 3D interactive plots. It has an intuitive API for passing the data and creating the grid required for 3D plotting. Much simpler and more flexible compared to other li...
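The grid that a 3D cost-surface plot needs can be built without any plotting library: evaluate the cost at every (w, b) pair on a mesh. This is a minimal sketch under stated assumptions:

- a one-feature linear model y ≈ w·x + b;
- mean squared error as the cost;
- noise-free toy data;
- hypothetical helper names `mse`, `linspace` and `cost_grid`.

The resulting `ws`, `bs` and `Z` are the kind of x/y/z inputs you would hand to a surface plot such as Plotly's `go.Surface`.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def linspace(start, stop, num):
    """Evenly spaced values from start to stop, inclusive (num >= 2)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def cost_grid(xs, ys, w_range, b_range, num=41):
    """Evaluate the cost on a num x num mesh; rows follow b, columns follow w."""
    ws = linspace(*w_range, num)
    bs = linspace(*b_range, num)
    Z = [[mse(w, b, xs, ys) for w in ws] for b in bs]
    return ws, bs, Z


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 (no noise), so the minimum is near w=2, b=1
    xs = [0, 1, 2, 3, 4]
    ys = [2 * x + 1 for x in xs]
    ws, bs, Z = cost_grid(xs, ys, (-1, 5), (-3, 5))
    cost, i, j = min((Z[i][j], i, j) for i in range(len(bs)) for j in range(len(ws)))
    print(f"lowest cost on the grid at w={ws[j]:.2f}, b={bs[i]:.2f}")
```

For linear regression with MSE this surface is a convex bowl. That is why it makes such a clear first 3D cost-function visualization.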
bokeh | Machine Learning & Data Science Open-source Spotlight #3
1.1K views · 4 years ago
Are you using bokeh? 📊 bokeh is an interactive visualization library for modern web browsers. Although it's an already well-established Python package with sponsors, I think not many people are choosing this tool for data visualizations. With this short tutorial I'm aiming to help you get started with this great package, so you can easily start making professional-looking interactive plots. Wit...
Featuretools | Machine Learning & Data Science Open-source Spotlight #2
8K views · 4 years ago
Are you using Featuretools? 🔎 Featuretools is a Python open-source library which offers automated feature engineering. What I particularly liked in this library is the ability to elegantly extract features from multiple tables and aggregate them to one final dataset. Video notebook: github.com/danbochman/Open-Source-Spotlight/tree/master/Featuretools Featuretools: github.com/FeatureLabs/feature...
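The multi-table aggregation idea can be illustrated by hand: aggregate a child table (transactions) up to its parent table (customers) to get one feature row per customer. This is a plain-Python sketch of the concept with made-up tables, not Featuretools' API. The helper `aggregate_child` and the feature names (styled loosely after Featuretools' output) are hypothetical. Featuretools automates this kind of aggregation across many relationships and depths.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy tables: one parent (customers), one child (transactions)
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
transactions = [
    {"transaction_id": 10, "customer_id": 1, "amount": 20.0},
    {"transaction_id": 11, "customer_id": 1, "amount": 35.0},
    {"transaction_id": 12, "customer_id": 2, "amount": 5.0},
]


def aggregate_child(parents, children, key, value_col, child_name="transactions"):
    """Build one feature row per parent from the child rows that reference it."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value_col])
    features = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        features.append({
            key: parent[key],
            f"COUNT({child_name})": len(values),
            f"SUM({child_name}.{value_col})": sum(values),
            # A parent with no children gets None rather than a misleading 0
            f"MEAN({child_name}.{value_col})": mean(values) if values else None,
        })
    return features


for row in aggregate_child(customers, transactions, "customer_id", "amount"):
    print(row)
```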
Pandas Profiling | Machine Learning & Data Science Open-source Spotlight #1
563 views · 4 years ago
Are you using Pandas Profiling? 🐼 There are amazing open-source Python libraries out there for machine learning and data science that deserve to be mainstream staples in every professional's toolkit. With these new "Machine Learning & Data Science Open Source Spotlight" weekly videos, my objective is to introduce many game-changing libraries, which I believe many people can b...
Five Steps to Becoming a Data Scientist
185 views · 5 years ago
In this video I describe, from personal experience, what I believe are the most relevant and effective steps you can take to get into Data Science and/or Machine Learning, depending on your background and prior experience.
How Do Instagram Filters Work?
1K views · 5 years ago
Have you ever wondered how an Instagram filter really works behind the scenes? This video presents basic principles of image processing and simple tools in the Python programming language for creating an Instagram filter.
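At its core, a filter of this kind is a per-pixel transformation of the color channels. This is a minimal sketch in plain Python, assuming an image stored as rows of (R, G, B) tuples in the 0-255 range. The "warm" effect (boost red, cut blue, then stretch contrast), the function name `warm_filter` and its default parameters are illustrative choices, not the exact filter from the video.

```python
def clamp(value):
    """Round and keep a channel value inside the valid 0-255 range."""
    return max(0, min(255, int(round(value))))


def warm_filter(image, red_gain=1.2, blue_gain=0.8, contrast=1.1):
    """Apply a simple 'warm' look: scale R and B, then stretch contrast around mid-gray."""
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            r, b = r * red_gain, b * blue_gain
            # Contrast: push each channel away from the mid-point 128
            r, g, b = [128 + contrast * (c - 128) for c in (r, g, b)]
            new_row.append((clamp(r), clamp(g), clamp(b)))
        out.append(new_row)
    return out


image = [[(200, 150, 100), (10, 20, 30)],
         [(255, 255, 255), (128, 128, 128)]]
print(warm_filter(image))
```

Clamping after every adjustment matters. Without it, bright pixels overflow past 255 and dark ones go negative, which would be invalid 8-bit values.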
Decision Trees
1.5K views · 5 years ago
A video presenting the topic of decision trees. It is the first part of a more comprehensive lecture on Tree Models and Ensembles, given as part of the Future Learning training program. Program website: futurelearning.ai/
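The split criterion at the heart of a decision tree can be sketched in a few lines. This minimal example assumes binary splits on a single numeric feature, scored by Gini impurity (the criterion used by CART; entropy-based criteria work the same way). The function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Try every threshold between consecutive sorted feature values and return
    (threshold, weighted_impurity) with the lowest weighted Gini impurity."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # baseline: no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data: small values are class "a", large values are class "b"
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (6.5, 0.0): this threshold separates the classes perfectly
```

A full tree applies `best_split` recursively to each side (and across all features) until the nodes are pure or a stopping rule such as maximum depth is reached.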
Real-time Action Recognition with Non-local Network
1.3K views · 5 years ago
Inference demo of a project done by me and Daniel Shafer (the person in the video). The webcam feed is passed for inference to a ResNet101 with Non-local Blocks, trained on the UCF-101 dataset. Precision can be improved by utilizing optical flow; however, computing optical flow data and running it through a parallel network substantially hinders real-time performance. Source code on GitHub: github.com/danbochman/Real-...
One Hot Encoding with Python | Handling Categorical Data
11K views · 6 years ago
In this tutorial you can see, step by step, how one-hot encoding is applied to handle categorical data in a real-world data problem. You can check out the whole project from A to Z on my GitHub page: github.com/danbochman/FARS_LEARNING If you have any questions, feel free to ask in the comments! Please let me know if there are any specific machine learning tutorials you wis...
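The core of one-hot encoding fits in a few lines of plain Python: each distinct category becomes its own 0/1 column, with exactly one "hot" column per row. This is a minimal sketch. The `WEATHER` column and its values are hypothetical, and the sorted column order is an assumption. In practice a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` is typically used.

```python
def one_hot(values, prefix):
    """Return (column_names, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    columns = [f"{prefix}_{c}" for c in categories]
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one "hot" position per row
        rows.append(row)
    return columns, rows


# Hypothetical categorical column
cols, rows = one_hot(["rain", "clear", "snow", "clear"], prefix="WEATHER")
print(cols)  # ['WEATHER_clear', 'WEATHER_rain', 'WEATHER_snow']
print(rows)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
```

Note that categories seen only at prediction time would have no column here. Library encoders provide options for handling such unknown categories.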
Watching this again reveals even more ideas! Many thanks for this, Dan. Also, do you have similar ones using Datashader and hvplot with pandas/polars/Dask for plotting line charts, bar charts, etc.?
Thanks for the video! One question, what to do when I have z as pd.Series and not as a matrix? Not sure what would be the right way to convert it to matrix. I can use reshape, but I'm not sure it will shape the matrix as required.
Hi Dan, got the following error in cell [7] when trying to run your script in Colab:
TypeError                                 Traceback (most recent call last)
<ipython-input-7-f76c603a8d2f> in <cell line: 1>()
----> 1 feature_matrix_customers, features_defs = ft.dfs(entities=entities,
      2                                                   relationships=relationships,
      3                                                   target_entity="customers")
1 frames
/usr/local/lib/python3.10/dist-packages/featuretools/utils/entry_point.py in function_wrapper(*args, **kwargs)
     30             # call function
     31             start = time.time()
---> 32             return_value = func(*args, **kwargs)
     33             runtime = time.time() - start
     34         except Exception as e:
TypeError: dfs() got an unexpected keyword argument 'entities'
awesome intro
Hi Dan, this was a great introductory video. I am learning Dask and this was very helpful.
Thanks for the videos. I have a question about multivariate data. I have three independent variables and would like to see their occurrences by coloring the data based on their probability densities (plot type can be contour, surf etc.) Which function should I use? Could you please help me with this?
Hey Dan, can you explain how to change the color scale to green-yellow-red?
Great video, worth trying the notebook after reading the paper
Great
I've added Dask delayed to some functions. When I visualize it, several parallel tasks are planned, but my CPU does not seem to be affected (it's only using a small percentage of it).
Thanks man. It's really hard to find some information about dask
4:12, visualize(): where can I find the documentation? I tried to use it with sorted() and it did not work.
Hi 😀 Dan. Thank you for this video. Do you have an example that uses the apply() function? I want to create a new column based on a data transformation. Thank you!
How do I schedule a Dask task? For example, how can I make a script run every day at 10:00 with Dask?
Saved my ass, man. Thanks!
Many thanks for this excellent video! It is really clear and helpful! I just have one question: I tried to run the notebook, and it ran pretty well after some minor updates. Only the last line would not run:
# never run out-of-memory while training
model.fit_generator(generator=dask_data_generator(df_train), steps_per_epoch=100)
It gives me this error message:
InvalidArgumentError: Graph execution error:
TypeError: `generator` yielded an element of shape (26119,) where an element of shape (None, None) was expected.
Traceback (most recent call last):
TypeError: `generator` yielded an element of shape (26119,) where an element of shape (None, None) was expected.
[[{{node PyFunc}}]]
[[IteratorGetNext]] [Op:__inference_train_function_506]
Any recommendation on how I should modify it to make it run? Thanks, AG
very good ! thank you ! :)
Fantastic video, I must say. Keep going, sir... you're really, really great... teaching such a complex thing in such a short video, and so clearly too... thanks a lot again
Great job on this video, Dan! Datashader is pretty sweet!
👏👏
Great video! Thank you man 👍
Thank you. You should come back and bring more content.
Nice demo. Thank you for sharing. Keep up the good work.
Many thanks for your very sophisticated step-by-step instructions! I would like to ask how we can get datasets like these, with more than a million rows, so that we can use Datashader?
Great video and explanation! Thank you very much for it! It is really helpful! I tried to run the notebook, and it ran pretty well after some minor updates. I just had problems running the last part, "never run out-of-memory while training": it seems the generator or steps-per-epoch part is causing a problem I can't figure out how to solve. Any possible suggestion on how to fix the code? Thanks!
InvalidArgumentError: Graph execution error:
TypeError: `generator` yielded an element of shape (26119,) where an element of shape (None, None) was expected.
Traceback (most recent call last):
TypeError: `generator` yielded an element of shape (26119,) where an element of shape (None, None) was expected.
Interesting, I have the same problem in the last part of the notebook. It seems to be related to the IDE; it needs an update.
Man, great video, thank you!
Thanks Man!
Great explanations for beginners!! Thanks for this...
Hi, I'm trying to run your exercise code, but an error appears: TypeError: dfs() got an unexpected keyword argument 'entities'.
Hi Maria, it has been 2 years, so they probably changed the arguments of their dfs function. Looking at the documentation for dfs: featuretools.alteryx.com/en/stable/generated/featuretools.dfs.html, it seems this function now expects an argument called entityset, and this entityset is: "entityset (EntitySet) - An already initialized entityset. Required if dataframes and relationships are not defined." EntitySet seems to be a new class: featuretools.alteryx.com/en/stable/generated/featuretools.EntitySet.html Perhaps my guide is outdated in terms of syntax, but the concept should stay the same!
Very nice video!!!! Thank you so much!!! Please make more hands-on videos like this! You explain things very clearly and helpfully!!!!
amazing. Very interesting theme.
I wonder what the difference is between encoding and mapping. For example, if STATE_CD goes from 1 to 50, it's already numeric; can it be used in AI learning without resorting to one-hot encoding?
If you map states to the numbers 1 to 50, they can be used in ML, but you've introduced an internal relationship that doesn't exist (state 1 appears more similar to state 2 and very far from state 50).
@danbochman Thanks for the reply. Understood.
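The point about mapping can be checked numerically. With integer codes, the distance between two states depends on their arbitrary code numbers. With one-hot vectors, every pair of distinct states is equally far apart. This is a small sketch using hypothetical codes 1, 2 and 50 and Euclidean distance (the kind of distance many models implicitly rely on).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_hot_vector(code, n_categories):
    """One-hot vector for an integer code in 1..n_categories."""
    vec = [0] * n_categories
    vec[code - 1] = 1
    return vec


# Integer mapping: state 1 looks 49 times "farther" from state 50 than from state 2
print(euclidean([1], [2]), euclidean([1], [50]))  # 1.0 49.0

# One-hot: every pair of distinct states is the same distance apart, sqrt(2)
a, b, c = (one_hot_vector(k, 50) for k in (1, 2, 50))
print(euclidean(a, b), euclidean(a, c))  # 1.4142135623730951 1.4142135623730951
```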
Chapeau, well explained and healthy usage of memes!
Great intro. Thank you
Great video! Thank you for sharing. But I think the code in the machine-learning-with-Dask part has some errors. There is no X in the code (model.add(..., input_dim=X.shape[1], ...)), and when I train with model.fit_generator, TensorFlow says model.fit_generator is deprecated, and it finally displays this error: AttributeError: 'tuple' object has no attribute 'rank'
Hey! Whoops, I must've renamed the variable X to df_train and wasn't consistent in the process; it probably didn't raise an error for me because X was still in my notebook workspace. You can either change df_train to X or change X to df_train (X <==> df_train). Just be consistent and it should work!
Great video with good examples. Loved the MiniNet part. Thank you.
Hey, thanks for the awesome video and the explanation. I have a use case: I am trying to build a deep learning TensorFlow model for time series forecasting, and for this I need to use a multi-node cluster for parallelization across all nodes. I have a single function that can take the data for any one store and make predictions for that store. I need to do this for 2 lakh (200,000) outlets. How can I use Dask to parallelize this huge task across all the nodes of my cluster? Can you please guide me? Thanks in advance.
Hi Madhu, sorry, I wish I could help, but multi-node cluster parallelization depends more on the infrastructure framework itself (e.g. Kubernetes) than on Dask. You have the dask.distributed module (distributed.dask.org/en/stable/), but handling the multi-worker infrastructure is where the real challenge lies...
I really appreciate the batch-on-the-fly example with keras.
Great content! One question, isn't it strange to use function in this form: function(do_something)(on_variable) instead of function(do_something(on_variable)) ?
Hey Mark! Thanks. I understand what you mean, but this form makes sense when a function returns a function, as opposed to a function whose output is the input to the next function. delayed(fn) returns a new function (e.g. "delayed_fn"), and this new function is then called normally: delayed_fn(x). So it's delayed(fn)(x). All decorators are functions that return callables. In this example they are used a bit unnaturally because I wanted to keep both versions of the functions. Hope the explanation helped!
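The `delayed(fn)(x)` pattern is easier to see with a toy version. The sketch below is not Dask's implementation. It is a hypothetical `lazy` wrapper that, like `dask.delayed`, takes a function and returns a new function. Calls to that new function are recorded rather than executed until `.compute()` is called.

```python
class LazyCall:
    """A recorded function call; arguments may themselves be LazyCall objects."""

    def __init__(self, fn, args):
        self.fn = fn
        self.args = args

    def compute(self):
        # Evaluate lazy arguments first (depth-first), then call the function
        values = [a.compute() if isinstance(a, LazyCall) else a for a in self.args]
        return self.fn(*values)


def lazy(fn):
    """Take a function and return a NEW function that builds a LazyCall."""
    def wrapper(*args):
        return LazyCall(fn, args)
    return wrapper


def inc(x):
    return x + 1


def add(x, y):
    return x + y


# lazy(inc) is itself a function, so it can be called right away: lazy(inc)(1)
a = lazy(inc)(1)          # nothing computed yet
b = lazy(inc)(2)
total = lazy(add)(a, b)   # still just a recorded graph of calls
print(total.compute())    # 5


# Decorator form: "@lazy" above "def double" is shorthand for double = lazy(double)
@lazy
def double(x):
    return 2 * x


print(double(total).compute())  # 10
```

Unlike this toy, Dask also shares intermediate results between branches of the graph and can run independent branches (like `a` and `b` here) in parallel.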
Really great Dask introduction and the explanation is so easy to understand. That was useful. Thank you!
Many thanks. Now I understand why the file was not read
Nice
Good work sir, your video has helped me to get started with Dask. Thank you very much.
This is really great
Great. I would also like to see more on Dask and Deep Learning. How exactly would this generator be used in PyTorch, instead of the DataLoader? Thanks for the video(s)
Is there something similar on machine learning?
Excellent!
Thank you so much for your explanation. I learned more from this one video than from reading multiple articles, where my mind felt bogged down and bored with each line I read; this is so digestible and easy to understand.
Dan, I just found your videos. They’re great! Will you be making any more?
Hey Benjamin, glad you liked it! Unfortunately, I don't think I'll be making new videos soon... Just out of curiosity, what kind of videos/topics would you be interested in?
Thank you so much man... you saved a project... 🙂🙂❤❤🙏🙏