More in this series 👇
- Introduction to TDA: ua-cam.com/video/fpL5fMmJHqk/v-deo.html
- Persistent Homology: ua-cam.com/video/5ezFcy9CIWE/v-deo.html
Thanks for following up in the series! Really enjoying it 👌🏾
Thanks for watching!
Thank you for introducing this idea in such a friendly way!
Thanks for producing this - very well done
Thanks! I'm glad you liked it :)
We need persistent homology! Thank you!
Coming very soon!
Much needed, TDA w/o notation nightmare 😂
Glad it helped!
Great TDA example video. However, returns should be estimated using original adjusted closed prices first and then standardized (if necessary), instead of being estimated from standardized prices.
Good call out!
great video
Thanks for watching!
Great video. Do you have any future videos planned illustrating similar pipelines for multimodal or mixed datasets?
That's a good idea. Are there any specific use cases you're interested in?
7:22 data-np.mean... or (data-np.mean...)/np.std(... ?
(data - np.mean(data))/np.std(data)
Here I am "normalizing" the data.
@@ShawhinTalebi Could've sworn I also wrote an answer to this, about how that's standardization and not normalization, but I might've also forgotten to press 'Send'/'Reply'
normalization would be (data - data.min()) / (data.max() - data.min()) (pseudocode)
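To make the difference concrete, here's a minimal NumPy sketch (the values are just illustrative, not from the video):

import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0])

# standardization: zero mean, unit variance (what the video's snippet does)
standardized = (data - np.mean(data)) / np.std(data)

# normalization (min-max): rescaled to the range [0, 1]
normalized = (data - data.min()) / (data.max() - data.min())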
Hi, thanks for this video, it was very helpful. Where do you set the cover in the Python code?
Glad it was helpful! You can define the cover using the kmapper.Cover() class, then pass it into the .map() method. Since I do not specify it explicitly here, it uses the default parameters of n_cubes=10, perc_overlap=0.5, and limits=None.
Here are some documentation links.
cover - kepler-mapper.scikit-tda.org/en/latest/reference/stubs/kmapper.Cover.html#kmapper.Cover
fit_transform - kepler-mapper.scikit-tda.org/en/latest/reference/stubs/kmapper.KeplerMapper.html#kmapper.KeplerMapper.fit_transform
map - kepler-mapper.scikit-tda.org/en/latest/reference/stubs/kmapper.KeplerMapper.html#kmapper.KeplerMapper.map
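For example, a minimal sketch based on the documentation above (the data and parameter values are placeholders, not from the video's code):

import numpy as np
import kmapper as km

data = np.random.rand(500, 4)                      # placeholder data
mapper = km.KeplerMapper(verbose=0)
lens = mapper.fit_transform(data, projection=[0])  # project onto the first feature

cover = km.Cover(n_cubes=10, perc_overlap=0.5)     # same values as the defaults, but explicit
graph = mapper.map(lens, data, cover=cover)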
When someone says "high-dimensional data", like the "500-dimensional dataset" you mentioned as an example, what do they mean exactly? Does it mean that each data point has 500 components or features to it?
Great question! In data science, dimensions are typically synonymous with features, i.e. the attributes that define individual data points. This is because given N features, we can view data points as living in an N-dimensional space.
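For example, in NumPy a "500-dimensional" dataset is just an array with 500 columns (hypothetical data):

import numpy as np

X = np.random.rand(1000, 500)  # 1000 data points, each with 500 features
print(X.shape)                 # (1000, 500) -> points living in a 500-dimensional space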
I like it
In the Mapper algorithm, I am not sure what the projecting, covering, and clustering of the pre-image have to do with each other. Basically, I'm wondering: couldn't the very first step be some clustering algorithm, and then you could immediately make a graph from that? I'm just not sure what projecting onto lower dimensions, covering, etc. have to do with the eventual clustering step.
Another great question. Here's my understanding.
Mapper generates a graph whose nodes correspond to clusters. While we can get these without running through steps 2-4 on slide 3, we still need to connect these nodes in some way to generate a graph.
This is where the cover comes in. Essentially, Step 3 provides us with a second clustering strategy (i.e. green vs red), which we can use to define the links between the clusters found in Step 4. Namely, two clusters from Step 4 are connected by an edge if they share members according to the Step 2 subsets.
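Here's a rough sketch of that linking rule (not the kepler-mapper internals; the cluster contents are hypothetical):

from itertools import combinations

def mapper_edges(clusters):
    # connect two clusters if they share at least one data point
    edges = []
    for (id_a, pts_a), (id_b, pts_b) in combinations(clusters.items(), 2):
        if pts_a & pts_b:  # shared members come from overlapping cover subsets
            edges.append((id_a, id_b))
    return edges

clusters = {"red_0": {0, 1, 2}, "red_1": {7, 8}, "green_0": {2, 3, 4}}
print(mapper_edges(clusters))  # [('red_0', 'green_0')]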
Fascinating in itself, thanks. But is it also useful? It would help to have some literature that covers the necessary background knowledge. My topology book was printed in the 1960s.
While there have been some interesting use cases, Mapper is still in its infancy. I share more resources in the video description and in the article for this video: medium.datadriveninvestor.com/the-mapper-algorithm-d0842f926658?sk=4b78e5f8f2e8f390b919e8285a97871e
Good video, but the volume is way too low!! When the YouTube ads pop up it's dangerous hehe
Forgive my mediocre editing. Hopefully the quality of my more recent content is better 😅
@@ShawhinTalebi editing is great! just a constructive comment for your next video. Greetings
@@luis2arm Thanks Luis, I appreciate the feedback
It looks like Research Rabbit also uses a similar way to show its search results.
That’s really interesting, I wonder if they use Mapper in the backend
And could you recommend some books or references?
Yes, there are a couple in the video description under "Resources I found helpful".
I would like some help from you.
Happy to help however I can. You can message me here: shawhint.github.io/connect.html
The volume is too low, speak up!
Sorry about that! Hopefully my other videos have better levels 😅