MMDS Foundation
Why Deep Learning Works: Perspectives from Theoretical Chemistry, Charles Martin
We present new ideas which attempt to explain why Deep Learning works, taking lessons from Theoretical Chemistry, and integrating ideas from Protein Folding, Renormalization Group, and Quantum Chemistry.
We address the idea that spin glasses make good models for Deep Learning, and discuss both the p-spherical spin glass models used by LeCun, and the spin-glass-of-minimal frustration, proposed by Wolynes for protein folding some 20 years ago.
We argue that Deep Learning energy models resemble the energy models developed for protein folding and, in contrast to the p-spin spherical models, suggest that the energy landscape of a deep learning model should be ruggedly convex. We compare and contrast this hypothesis with current suggestions as to why Deep Learning works.
We show the relationship between RBMs and the Variational Renormalization Group, and explain its importance in modeling neuro-dynamics. We then discuss how the RG transform can be used as a path to constructing an Effective Hamiltonian for Deep Learning that would help illuminate why these models work so well.
Views: 9,143
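
For readers new to the RBM side of this correspondence, here is a minimal sketch (toy sizes and randomly chosen parameters, not anything from the talk) of a binary restricted Boltzmann machine: its joint energy function and one block-Gibbs sampling sweep. In the variational RG picture the abstract alludes to, summing out the hidden layer is the operation that plays the role of a coarse-graining step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary RBM: 6 visible and 3 hidden units (sizes are arbitrary).
W = rng.normal(scale=0.1, size=(6, 3))   # visible-hidden couplings
b = np.zeros(6)                          # visible biases
c = np.zeros(3)                          # hidden biases

def energy(v, h):
    """Joint energy E(v, h) = -v^T W h - b^T v - c^T h."""
    return -(v @ W @ h) - b @ v - c @ h

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sweep(v):
    """One block-Gibbs sweep: sample h | v, then v | h."""
    h = (sigmoid(v @ W + c) > rng.random(3)).astype(float)
    v = (sigmoid(W @ h + b) > rng.random(6)).astype(float)
    return v, h

# Marginalizing over h gives an effective distribution on v alone; in the
# variational RG analogy, that marginalization is the coarse-graining step.
v = rng.integers(0, 2, size=6).astype(float)
v, h = gibbs_sweep(v)
print(energy(v, h))
```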

Videos

In-core computation with HyperBall, Sebastiano Vigna
742 views · 7 years ago
We approach the problem of computing geometric centralities, such as closeness and harmonic centrality, on very large graphs; traditionally this task requires an all-pairs shortest-path computation in the exact case, or a number of breadth-first traversals for approximate computations, but these techniques yield very weak statistical guarantees on highly disconnected graphs. We rather assume t...
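
As a reference point, here is the exact quantity HyperBall approximates, computed by plain breadth-first search on a toy directed graph. This is an illustrative sketch only: the real algorithm replaces each node's exact ball with a small HyperLogLog counter and iterates over growing ball radii, which is what makes it feasible in core on very large graphs.

```python
from collections import deque

def harmonic_centrality(adj):
    """Exact harmonic centrality: c(x) = sum over reachable y != x of 1/d(x, y).

    Unreachable nodes contribute zero, so the measure stays well defined on
    highly disconnected graphs. HyperBall approximates this same quantity by
    replacing each node's exact ball with a small HyperLogLog counter.
    """
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # plain BFS from each source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[source] = sum(1.0 / d for node, d in dist.items() if node != source)
    return scores

# Toy directed graph as an adjacency list.
g = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}
print(harmonic_centrality(g))
```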
PCA with Model Misspecification, R. Anderson, S. Bianchi
287 views · 7 years ago
The theoretical justifications for Principal Component Analysis (PCA) typically assume that the data is IID over the estimation window. In practice, this assumption is routinely violated in financial data. We examine the extent to which PCA-like procedures can be justified in the presence of two specific kinds of misspecification present in financial data: time-varying volatility, and the prese...
Fast, flexible, and interpretable regression modeling, Daniela Witten
3.2K views · 7 years ago
In modern applications, we are interested in fitting regression models that are fast (i.e. can be fit to high-dimensional data), flexible (e.g. do not assume a linear conditional mean relationship between the response and the predictors), and interpretable (i.e. a person can make sense of the fitted model). I’ll present two recent developments towards this end - the fused lasso additive model (...
Is manifold learning for toy data only?, Marina Meila
8K views · 7 years ago
Manifold learning algorithms aim to recover the underlying low-dimensional parametrization of the data using either local or global features. It is however widely recognized that the low-dimensional parametrizations will typically distort the geometric properties of the original data, like distances and angles. These unpredictable and algorithm-dependent distortions make it unsafe to pipeline t...
Cooperative Computing for Autonomous Data Centers Storing Social Network Data, Jon Berry
117 views · 7 years ago
We consider graph datasets that are distributed among several data centers with constrained sharing arrangements. We propose a formal model in which to design and analyze algorithms for this context along with two representative algorithms: s-t connectivity and planted clique discovery. The latter forced us to rethink recent conventional wisdom from social science regarding the clustering coeff...
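
Since the abstract turns on the clustering coefficient, here is the standard local definition on a toy undirected graph (a sketch for orientation only; the talk's distributed algorithms are not reproduced here): the fraction of a vertex's neighbor pairs that are themselves adjacent.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of v's neighbor pairs that are themselves adjacent."""
    nbrs = adj[v]
    if len(nbrs) < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * closed / (len(nbrs) * (len(nbrs) - 1))

# Toy undirected graph stored as sets of neighbors.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(g, 0))  # 1 of 3 neighbor pairs is linked -> 1/3
```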
New Methods for Designing and Analyzing Large Scale Randomized Experiments, Jasjeet Sekhon
403 views · 7 years ago
The rise of massive datasets that provide fine-grained information about human beings and their behavior provides unprecedented opportunities for evaluating the effectiveness of social, behavioral, and medical treatments. We have also become more interested in fine-grained inferences. Researchers and policy makers are increasingly unsatisfied with estimates of average treatment effects based on ...
Head, Torso and Tail - Performance for modeling real data, Alex Smola
991 views · 7 years ago
Real data is high-dimensional and multivariate. A naive application of optimization techniques leads to rather poor performance, due to large memory footprint, latency, and network cost. In this talk I will address how to overcome these constraints by a design that takes advantage of distributional properties of real data for recommendation, classification, and modeling.
The Union of Intersections Method, Kristofer Bouchard
289 views · 7 years ago
The increasing size and complexity of biomedical data could dramatically enhance basic discovery and prediction for applications. Realizing this potential requires analytics that are simultaneously selective, accurate, predictive, stable, and scalable. However, current methods do not generally achieve this. Here, we introduce the Union of Intersections method, a novel, modular paradigm for reco...
New Results in Non-Convex Optimization for Large Scale Machine Learning, Constantine Caramanis
523 views · 7 years ago
The last few years have seen a flurry of activity in non-convex approaches to enable the solution of large-scale optimization problems that come up in machine learning. The common thread in many of these results is that low-rank matrix optimization or recovery can be accomplished by enforcing the low-rank factorization and then solving the resulting factored (non-convex) optimization problem. We co...
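
The factored approach the abstract describes can be illustrated in a few lines: instead of optimizing over a full matrix under a rank constraint, parametrize it as U Vᵀ and run gradient descent on the resulting non-convex objective. Below is a minimal sketch on a fully observed toy instance (step size, iteration count, and initialization scale are arbitrary choices, not from the talk).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: recover a rank-2 matrix by gradient descent on the factored,
# non-convex objective f(U, V) = 0.5 * ||U V^T - M||_F^2.
n, m, r = 30, 20, 2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))  # ground truth, rank r

U = rng.normal(scale=0.1, size=(n, r))   # small random initialization
V = rng.normal(scale=0.1, size=(m, r))
step = 0.01
for _ in range(1000):
    R = U @ V.T - M                      # residual
    U, V = U - step * (R @ V), V - step * (R.T @ U)

print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))  # relative error
```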
The Stability Principle for Information Extraction from Data, Bin Yu
249 views · 7 years ago
Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in stability of statistical results relative to "reasonable" perturbations to data and to the model used. Jackknife, bootstrap, and cross-validation are based on perturbations to data, whil...
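
To make the data-perturbation idea concrete, here is a minimal bootstrap sketch (toy data; this illustrates the generic perturbation scheme, not Yu's specific stability criteria): re-fit ordinary least squares on resampled rows and inspect how much the coefficient estimates move.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n observations, p features, one truly null coefficient.
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=n)

# Re-fit OLS on bootstrap resamples and look at how much the estimates move.
estimates = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)     # resample rows with replacement
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    estimates.append(beta)

estimates = np.array(estimates)
print("mean:", estimates.mean(axis=0).round(3))
print("sd:  ", estimates.std(axis=0).round(3))  # large sd = unstable estimate
```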
Restricted Strong Convexity Implies Weak Submodularity, Alex Dimakis
586 views · 7 years ago
We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe (2011) from the setting of linear regression to arbitrary objective functions. This connection allows us to obtain strong multiplicative performance bounds on several greedy feature selection methods without statistical modeling assumptions. This is in contrast to prior work th...
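
A minimal sketch of the kind of greedy feature selection these bounds apply to, using the R² objective from the linear-regression setting of Das and Kempe (toy data; the extension to arbitrary objectives is what the paper adds):

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(X, y, S):
    """Goodness of fit of least squares restricted to the feature subset S."""
    beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
    resid = y - X[:, S] @ beta
    return 1.0 - (resid @ resid) / (y @ y)

def greedy_select(X, y, k):
    """Forward greedy selection: add the feature with the best marginal gain.
    Weak submodularity of R^2 is what underwrites the multiplicative bound."""
    S = []
    for _ in range(k):
        best = max((j for j in range(X.shape[1]) if j not in S),
                   key=lambda j: r_squared(X, y, S + [j]))
        S.append(best)
    return S

X = rng.normal(size=(100, 10))
y = 2 * X[:, 3] - X[:, 7] + 0.1 * rng.normal(size=100)
print(greedy_select(X, y, 2))  # expect features 3 and 7
```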
Randomized Low-Rank Approximation and PCA: Beyond Sketching, Cameron Musco
4.4K views · 7 years ago
I will discuss recent work on randomized algorithms for low-rank approximation and principal component analysis (PCA). The talk will focus on efforts that move beyond the extremely fast, but relatively crude approximations offered by random sketching algorithms. In particular, we will see how advances in Johnson-Lindenstrauss projection methods have provided tools for improving the analysis of ...
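
For orientation, here is a standard randomized SVD sketch in the Halko-Martinsson-Tropp style; the power iterations are one common way to move "beyond sketching", i.e., to sharpen the crude accuracy of a single random-projection pass (the parameters are illustrative defaults, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_svd(A, k, oversample=10, power_iters=2):
    """Randomized SVD in the Halko-Martinsson-Tropp style. Setting
    power_iters > 0 sharpens the crude accuracy of a plain one-pass sketch."""
    Omega = rng.normal(size=(A.shape[1], k + oversample))  # random test matrix
    Y = A @ Omega
    for _ in range(power_iters):          # Y = (A A^T)^q A Omega
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for the range
    B = Q.T @ A                           # small projected problem
    Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uh)[:, :k], s[:k], Vt[:k]

A = rng.normal(size=(500, 200))
U, s, Vt = randomized_svd(A, k=5)
print(s.round(2))                         # approximate top singular values
```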
Minimax optimal subsampling for large sample linear regression, Aarti Singh
759 views · 7 years ago
We investigate statistical aspects of subsampling for large-scale linear regression under label budget constraints. In many applications, we have access to large datasets (such as healthcare records, database of building profiles, and visual stimuli), but the corresponding labels (such as customer satisfaction, energy usage, and brain response, respectively) are hard to obtain. We derive comput...
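
One standard subsampling scheme in this setting, shown as a sketch (illustrative only; not necessarily the minimax-optimal procedure from the talk): score rows by statistical leverage, spend the label budget on a weighted subsample, and solve a reweighted least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pool of unlabeled rows; only `budget` labels may be requested.
n, p, budget = 5000, 5, 200
X = rng.standard_t(df=3, size=(n, p))     # heavy tails -> high-leverage rows
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(size=n)    # labels; only `budget` of them used

Q, _ = np.linalg.qr(X)
lev = (Q ** 2).sum(axis=1)                # statistical leverage scores
idx = rng.choice(n, size=budget, replace=False, p=lev / lev.sum())

w = 1.0 / np.sqrt(lev[idx])               # importance-sampling reweighting
beta_hat, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
print(np.linalg.norm(beta_hat - beta_true))
```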
A Framework for Processing Large Graphs in Shared Memory, Julian Shun
1.8K views · 7 years ago
In this talk, I will discuss Ligra, a shared-memory graph processing framework that has two very simple routines, one for mapping over edges and one for mapping over vertices. The routines can be applied to any subset of the vertices and automatically adapt to their density, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Ligra is ab...
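
The two routines are easy to caricature in a few lines. Below is a sequential Python sketch of edgeMap/vertexMap-style primitives and a BFS built on them (for intuition only; Ligra itself is a parallel C++ framework and, as the abstract notes, switches between sparse and dense traversal based on frontier density).

```python
# Sequential caricature of Ligra's two primitives plus a BFS built on them.
def vertex_map(vertices, f):
    """Apply f to each vertex; keep those for which f returns True."""
    return {v for v in vertices if f(v)}

def edge_map(adj, frontier, update, cond):
    """Apply update over edges leaving the frontier; cond filters targets."""
    out = set()
    for u in frontier:
        for v in adj[u]:
            if cond(v) and update(u, v):
                out.add(v)
    return out

def bfs(adj, source):
    parent = {source: source}
    frontier = {source}
    while frontier:
        frontier = edge_map(
            adj, frontier,
            update=lambda u, v: parent.setdefault(v, u) == u,  # claim v
            cond=lambda v: v not in parent)                    # unvisited only
    return parent

g = {0: [1, 2], 1: [3], 2: [3], 3: []}
parent = bfs(g, 0)
print(parent)
print(vertex_map(set(g), lambda v: v in parent))  # vertices reachable from 0
```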
Scalable Collective Inference from Richly Structured Data (Lise Getoor)
284 views · 7 years ago
Local graph clustering algorithms: an optimization perspective, Kimon Fountoulakis
1.5K views · 7 years ago
Principal Component Analysis & High Dimensional Factor Model, Dacheng Xiu
2.3K views · 7 years ago
Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 1, Lisa Goldberg
211 views · 7 years ago
Top 10 Data Analytics Problems in Science, Prabhat
457 views · 7 years ago
Learning about business cycle conditions from four terabytes of data, Serena Ng
518 views · 7 years ago
Sub-sampled Newton Methods: Uniform and Non-Uniform Sampling, Fred Roosta
1.1K views · 7 years ago
A theory of multineuronal dimensionality, dynamics and measurement, Surya Ganguli
867 views · 7 years ago
Structure & Dynamics from Random Observations, Abbas Ourmazd
229 views · 7 years ago
Building Scalable Predictive Modeling Platform for Healthcare Applications, Jimeng Sun
1.1K views · 7 years ago
Mining Tools for Large-Scale Networks, Babis Tsourakakis
612 views · 7 years ago
PCA from noisy linearly reduced measurements, Joakim Anden
436 views · 7 years ago
Fast Graphlet Decomposition, Nesreen Ahmed
641 views · 7 years ago
CUR Factorization via Discrete Empirical Interpolation by Mark Embree
1.8K views · 7 years ago
MMDS Announcement
69 views · 8 years ago

COMMENTS

  • @behnamhashemi5019 · 4 months ago

    This is a great talk by Petros Drineas, who knows how to extract and clarify the essence of each topic; a skill that sets him apart from many others.

  • @vivekumarajak · 7 months ago

    How are vertices and edges represented in the graph? Can you explain?

  • @vivaelbetis2086 · 9 months ago

    didn't know J Cole was so good at math

  • @MachinimaCommentary1 · 1 year ago

    J Cole teaching at MIT now haha meth

  • @wolfram77 · 1 year ago

    Interesting approach. In which scenarios is a point-to-point PPR useful?

  • @yni3240 · 1 year ago

    W Jelani Cole

  • @chadx8269 · 1 year ago

    I'm hooked on non-linear.

  • @neetrab · 1 year ago

    Thank you so much for uploading this video with clear and good volume!!

  • @euclidofalexandria3786 · 2 years ago

    the signatures of the "flavor" of chaos in the vacuum, the flux flav. , the biases expressed as noise, brown, pink blue etc etc...

  • @colleejennifer2609 · 2 years ago

    Thanks for the video

  • @AndruideAtario · 2 years ago

    wow

  • @libertycan6959 · 2 years ago

    Target supplied applications...forward and reverse

  • @karannchew2534 · 2 years ago

    Hyperactive-hands

  • @luckyshadowtux · 3 years ago

    Did anything ever come of the SQL labeling?

  • @pincopallo9551 · 3 years ago

    Legit! Thank you Mr. Marley!

  • @dracleirbag5838 · 3 years ago

    This is great, can I use it for a nested model? The flam package, does it do clustered data?

  • @azurewang · 3 years ago

    it's a pity that such a good presentation got only a few thumbs up

  • @KarlVonBismark · 3 years ago

    This man almost makes pee, he can't speak with other people in this world

  • @niloofarjazayeri972 · 3 years ago

    Thank you for this

  • @fexm1 · 3 years ago

    Really nice video. Good explanation to get a deeper understanding.

  • @rahul5959able · 3 years ago

    not a good explanation!

  • @chiriviscospower · 3 years ago

    Legalize ranch

  • @NaqiAli · 3 years ago

    J Cole

  • @leif1075 · 3 years ago

    Does anyone actually understand this?

    • @osemudiame123 · 3 years ago

      Yes

    • @leif1075 · 3 years ago

      @@osemudiame123 Must have a really heavy stats background, because otherwise it seems impenetrable...

    • @osemudiame123 · 3 years ago

      Leif, I have a master's in data science and a bachelor's in electrical engineering. My stats background is mediocre at best, and I understand most of what is being said. She's a very clear teacher. This isn't some next-level incomprehensible stats. Most stats BSc's would understand this and a lot more than I would.

    • @leif1075 · 3 years ago

      @@osemudiame123 well maybe you need that background then, because this is pretty technical and I think hard for most people even with a general science background...

    • @jerrylin6297 · 3 years ago

      @@leif1075 Wouldn't say BSc's can understand this material fully. But graduates from an MSc Statistics program should be able to grasp the full idea.

  • @allanben1397 · 3 years ago

    I would recommend skimming his original paper on K-Means||, and then you'll have an easier time understanding the video: theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf

  • @MSuriyaPrakaashJL · 3 years ago

    Is this how you speak at Stanford, like a metro train running without a stop??

  • @Genet665 · 3 years ago

    Great lecture! Thank you! The T-shirt <3

  • @campbellhutcheson5162 · 3 years ago

    It's a joy to listen to this lecturer!

  • @ambarishkapil8004 · 3 years ago

    First time in my life, I had to reduce the speed of the video to make out even vaguely what he is saying.

  • @fanalysis6734 · 3 years ago

    This guy seems smart but doesn't seem to put a lot of effort into making people understand

    • @KarlVonBismark · 3 years ago

      This man almost makes pee, he can't speak with other people in this world

  • @khamishoufar1019 · 4 years ago

    Please, why do they often put C^T 1 = 0 as a constraint in binary clustering, knowing that C is a centroid matrix, G is an indicator matrix, and B is our binary data matrix?

  • @anandeswaranful · 4 years ago

    This guy seems to be in love with himself.

  • @shouryacool · 4 years ago

    VIT university gave us your video as a DIGITAL assignment to summarize. You are getting 60 views from India because our faculty didn't take our handwritten assignment as a DA. Sleep on that.

  • @evanm31 · 4 years ago

    great lecture, thank you.

  • @hamidfazli6936 · 4 years ago

    It was a really good talk about communities in networks. Thanks a lot!

  • @biswaranjanmishra722 · 4 years ago

    It is really difficult to understand what he is saying. I feel as if he is murmuring and only he understands what he is saying. Worthless video.

    • @HarshSharma-mp7zf · 4 years ago

      bro, improve your English; learn to appreciate

    • @MSuriyaPrakaashJL · 3 years ago

      @@HarshSharma-mp7zf Well, delivery is important, dude

    • @HarshSharma-mp7zf · 3 years ago

      @@MSuriyaPrakaashJL yes it is, but there is no point in criticizing someone not related :)

  • @HabibKarbasian · 4 years ago

    At 10:20, when he is explaining the set X, it is completely unclear what it is, and I have a hard time understanding it even though he says something about it that I don't understand. He has the same issue with the theorem slide at 12:30, where "e" has not been explained. The slides for the algorithms are extremely insufficient for understanding, and there are many assumptions that you should know; his explanation doesn't clear things up either.

  • @JeremyHelm · 4 years ago

    Please enable those playback features you have disabled here, namely playback speed control.

    • @JeremyHelm · 4 years ago

      For whatever reason additionally, this video always starts from the beginning if I switch away from the app while it's on pause

  • @liuauto · 5 years ago

    funny to see Google auto-subtitles fail...

  • @yaomingzhu5698 · 5 years ago

    I found slides at web.stanford.edu/group/mmds/slides2012/s-park.pdf, hope it can help

  • @theman71ful · 5 years ago

    Damn J.Cole!

  • @ericjacobsen6901 · 5 years ago

    Love the idea of the "Normalized Cut Prime"; we will only have more need for this in the future to distribute learning. @22:00, using a matrix of similarity weights (O(n^2) space and time!). But then she goes on to solve it using LSH (Locality Sensitive Hashing). Another approach is a vantage point tree in metric space, for O(n log n) time and O(n) space. (E.g. Brin, "Near Neighbor Search in Large Metric Spaces".)

  • @anishdey1189 · 5 years ago

    how do you determine the complexities for the DIMSUM algorithm? Please explain

  • @crzyprplmnky · 5 years ago

    It's quite hard to tell where Haesun is pointing on the upload, but the lecture is excellent, thank you for putting it up.

  • @webdeveloper42 · 6 years ago

    This is some heavy math

  • @channel-ug9gt · 6 years ago

    Uhh, a bit ... all over the place. First of all, how do spin glasses relate to DNNs? How does the no-free-lunch idea come into this picture? It must be somewhere. What's the difference between deep and shallow networks here? What exactly is the question that you try to answer here? What does it mean that deep learning is good? Good for what?

    • @frun · 2 years ago

      There is a paper explaining what deep learning is good for versus shallow neural networks.

  • @rakka1dude184 · 6 years ago

    my work involves just keeping the dimensions; it's rather insane running k-th nearest neighbour with every pixel as a separate dimension. BUT IT IS POSSIBLE!! but this definitely looks like a more sane idea.

  • @variousmentalproblems · 6 years ago

    *absolute legend*

  • @FariborzGhavamian · 6 years ago

    Very interesting talk.

  • @randomdude6205 · 6 years ago

    Link to Local Graph Clustering package on pypi: pypi.python.org/pypi/localgraphclustering, and on GitHub: github.com/kfoynt/LocalGraphClustering