What I infer here is that the output of these modules is simply FeatureId + FeaturePose + SensorPose, and this is considered a pattern. This concept seems to also apply to more abstract modules that take input from other modules, which themselves output "FeatureIds" + "VirtualPose" for those inner features. This might explain the sense of location when dealing with abstract concepts, such as a "higher-pitched song" or words that are "close" in meaning. Just sharing my thoughts here. Great video! 😊
Good inferences. :) Relatedly, you see lots of memory champions moving their bodies to travel through a space of associations.
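To make the structure inferred above concrete, here is a minimal Python sketch. All class and field names are invented to mirror the FeatureId/FeaturePose/SensorPose terms in the comment; they are not taken from any actual codebase.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    """A location plus an orientation, in some reference frame."""
    location: tuple[float, float, float]     # x, y, z
    orientation: tuple[float, float, float]  # e.g. Euler angles

@dataclass
class ModuleOutput:
    """Hypothetical output of one learning module, per the comment above."""
    feature_id: int     # which feature was recognized ("what")
    feature_pose: Pose  # pose of the feature relative to the object ("where")
    sensor_pose: Pose   # pose of the sensor when the feature was observed

# A higher-level module could consume ModuleOutput values from lower
# modules the same way a sensory module consumes raw input, treating
# (feature_id, feature_pose) as its own "features" -- the "VirtualPose"
# idea from the comment above.
```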
Great content
Thank you!
The concept of hierarchical composition reminded me of low-level and high-level features in CNNs.
@next_phase Yes, it's an interesting question to examine how these relate and differ.
At a high level, both rely on hierarchy; however, there are several key differences.
In CNNs, and deep-learning systems more generally, there is often a lack of “object-centric” representations: when processing a scene with many objects, the properties of those objects tend to get mixed up with one another. This is in contrast to humans, who understand the world as being composed of discrete objects with a degree of permanence, objects that can interact with one another - an understanding that emerges at a very young age.
Furthermore, any given object in our brain is represented spatially, where the shape of the object - i.e. the relative arrangement of features - is far more important than low-level details like a texture that might be present. Again, this is different from how CNNs and other deep-learning systems learn to represent objects.
So while there is hierarchy in both CNNs and the human visual system, the former can be thought of as more of a bank of filters that detect things like textures and other correlations between input pixels. In the brain, however, we believe every level of the hierarchy represents discrete objects with their own structure and associated motor policies. These can be rapidly composed and recombined, enabling a wide range of representations and behaviors to emerge.
Hope that's helpful, let me know if I can clarify anything.
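To make the “object-centric” contrast concrete, here is a small, hypothetical Python sketch of an object stored as discrete features at relative locations (its shape), which can then be composed into higher-level objects. The names and data layout are invented for illustration; this is not any system's actual implementation.

```python
# An object as discrete features at relative (x, y, z) locations,
# rather than as a filter-bank response. Names are invented.
Object = dict[str, tuple[float, float, float]]

mug: Object = {
    "rim":    (0.0, 0.10, 0.0),
    "handle": (0.05, 0.05, 0.0),
    "base":   (0.0, 0.00, 0.0),
}

# Composition: a higher-level "object" can reference whole lower-level
# objects at relative poses, so representations recombine rapidly.
place_setting: Object = {
    "mug":   (0.2, 0.0, 0.0),  # the entire mug object, placed here
    "plate": (0.0, 0.0, 0.0),
}

def translate(obj: Object, offset: tuple[float, float, float]) -> Object:
    """Moving an object leaves its shape intact: only the relative
    arrangement of features matters, not absolute position."""
    dx, dy, dz = offset
    return {f: (x + dx, y + dy, z + dz) for f, (x, y, z) in obj.items()}
```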
Thanks for your comprehensive answer 👍
Hi, when you refer to where and what columns, does that mean there are two different types of columns that work differently, or do all columns work the same way, with the brain sending “where” and “what” data to different columns to be processed?
Currently, our implementation encapsulates both the where and what concepts in one learning module. Neurologically, they are separated in the human cortex, but we are not 100% sure whether we'll need to implement them this way. Stay tuned!
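For the curious, here is one hypothetical way a single module could encapsulate both concepts, as described above. The class and method names are invented for illustration and are not the project's actual API.

```python
class LearningModule:
    """Hypothetical sketch: one module handling both "what" (feature
    identity) and "where" (location) together, per the reply above."""

    def __init__(self) -> None:
        # One structure stores both: feature_id -> location in the
        # object's reference frame.
        self.model: dict[int, tuple[float, float, float]] = {}

    def learn(self, feature_id: int, location: tuple[float, float, float]) -> None:
        # "what": record the feature identity; "where": record its pose.
        self.model[feature_id] = location

    def predict_location(self, feature_id: int):
        # Recognizing "what" immediately yields "where".
        return self.model.get(feature_id)
```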