High Quality, High Performance Clustering with HDBSCAN | SciPy 2016 | Leland McInnes

  • Published Sep 5, 2024

COMMENTS • 20

  • @DouglasDuhaime 4 years ago +3

    Leland is truly a gentleman and a scholar

    • @kevon217 2 months ago +1

      Been on a Leland yt binge as of late, saw this comment, and truly agree.

  • @lelandmcinnes9501 8 years ago +12

    Thanks to the great people at conda-forge, hdbscan is now available as a conda package (which is by far the easiest way to install it):
    conda install -c conda-forge hdbscan

    • @zwitter689 7 years ago

      Thanks, very nicely done. I installed hdbscan and am trying to mimic the examples you give, but I can't find the data for the example in "Getting More Information About a Clustering". I like to follow the examples exactly, so a copy of the actual data set you used would be great. Can you help me with this?

    • @lelandmcinnes9501 7 years ago

      It's in the GitHub repository with the notebooks:
      github.com/scikit-learn-contrib/hdbscan/blob/master/notebooks/clusterable_data.npy

    • @zwitter689 7 years ago

      Thank you, especially for the quick response.
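For readers who want to reproduce the documentation example with the file linked above, here is a minimal sketch. It assumes the raw `clusterable_data.npy` file has already been downloaded into the working directory. The helper names `load_points` and `cluster_points` and the `min_cluster_size=15` setting are illustrative choices, not something stated in this thread.

```python
import numpy as np


def load_points(path):
    """Load a 2-D array of points (one row per sample) from a .npy file."""
    data = np.load(path)
    if data.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {data.shape}")
    return data


def cluster_points(data, min_cluster_size=15):
    """Fit HDBSCAN and return the fitted clusterer (requires the hdbscan package)."""
    import hdbscan  # imported lazily so load_points works without it

    return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit(data)


# Typical use, assuming the file sits in the working directory:
# data = load_points("clusterable_data.npy")
# clusterer = cluster_points(data)
# print(clusterer.labels_[:10])
```

The fitted clusterer exposes the per-point labels as `labels_`, which is usually the first thing to inspect.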

    • @chengchu88 6 years ago

      Dr. McInnes,
      Thanks for the great video.
      I am using HDBSCAN on a large dataset, and I know how to set the 'memory' parameter to cache the hard computation.
      My question is: after I cache the computation during fitting, how do I change min_cluster_size and min_samples and re-label the same data without going through the time-consuming fitting again? Could you provide a few sample lines of Python?
      Thank you,
      Cheng
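One possible way to approach the question above, sketched under assumptions: the `memory` argument of `hdbscan.HDBSCAN` accepts a `joblib.Memory` object (or a cache directory path), and, as far as I understand the library, the cached step depends on `min_samples` but not on `min_cluster_size`. If that holds, fixing `min_samples` explicitly and varying only `min_cluster_size` should let refits reuse the cached work, while changing `min_samples` would trigger the expensive step again. The helper names below are hypothetical.

```python
import numpy as np


def summarize(labels):
    """Return (number of clusters, number of noise points) for a label array."""
    labels = np.asarray(labels)
    n_clusters = len(set(labels[labels >= 0].tolist()))
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise


def relabel_sweep(data, min_cluster_sizes, min_samples, cache_dir):
    """Refit with several min_cluster_size values, sharing one on-disk cache.

    Requires the hdbscan and joblib packages. min_samples is held fixed on
    purpose: it is assumed to be part of what the cache is keyed on.
    """
    import hdbscan
    from joblib import Memory

    memory = Memory(cache_dir, verbose=0)
    results = {}
    for size in min_cluster_sizes:
        clusterer = hdbscan.HDBSCAN(
            min_cluster_size=size,
            min_samples=min_samples,
            memory=memory,
        )
        results[size] = clusterer.fit_predict(data)
    return results


# Typical use (data is an (n_samples, n_features) array):
# results = relabel_sweep(data, [10, 25, 50], min_samples=10,
#                         cache_dir="./hdbscan_cache")
# for size, labels in results.items():
#     print(size, summarize(labels))
```

Timing the first fit against the later ones is a simple way to check whether the cache is actually being hit on a given installation.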

  • @elivazquez7582 6 years ago +3

    Great video! Great presentation - thanks for doing this!

  • @enthought 8 years ago

    More info on HDBSCAN here: github.com/lmcinnes/hdbscan.
    See the complete SciPy 2016 Conference talk & tutorial playlist here: ua-cam.com/play/PLYx7XA2nY5Gf37zYZMw6OqGFRPjB1jCy6.html

  • @rajeshbalakrishnan2228 4 years ago

    Wow! One of the best clustering discussions.

  • @shyamsbox 6 years ago

    Very nice! We will try HDBSCAN.

  • @grygoriyzolotarov3228 6 years ago +2

    What is the font you use in your presentations (very appealing)?

  • @wexwexexort 3 years ago

    great talk!

  • @rednax3788 7 years ago +2

    HDBSCAN IS KING

  • @Marin-ct5my 3 years ago

    HDBSCAN seems to be capable of producing clusters that share overlapping nodes. Since clustering, for me, is about identifying points shared between clusters, what would I have to do to the algorithm to get those? I was surprised that nobody asked about this and that nothing was said about it, despite it being a possible feature of the algorithm.
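By default the library assigns each point to at most one cluster, so overlap has to come from a softer notion of membership. One option the `hdbscan` package offers is soft clustering: fitting with `prediction_data=True` and calling `hdbscan.all_points_membership_vectors`, which returns one membership weight per point per cluster. The sketch below treats a point as "shared" when its second-strongest membership reaches a threshold. That rule, the threshold value, and the helper names are assumptions made for illustration.

```python
import numpy as np


def shared_points(membership, threshold=0.2):
    """Indices of rows whose second-largest membership is >= threshold."""
    membership = np.asarray(membership, dtype=float)
    if membership.ndim != 2 or membership.shape[1] < 2:
        # With fewer than two clusters no point can be shared.
        return np.array([], dtype=int)
    # Sort each row ascending; the second-to-last column is the runner-up.
    runner_up = np.sort(membership, axis=1)[:, -2]
    return np.flatnonzero(runner_up >= threshold)


def soft_memberships(data, min_cluster_size=15):
    """Per-point, per-cluster membership weights (requires the hdbscan package)."""
    import hdbscan

    clusterer = hdbscan.HDBSCAN(
        min_cluster_size=min_cluster_size,
        prediction_data=True,  # needed for membership vectors
    ).fit(data)
    return hdbscan.all_points_membership_vectors(clusterer)


# Typical use:
# membership = soft_memberships(data)
# print(shared_points(membership, threshold=0.2))
```

A lower threshold flags more points as shared; what counts as meaningful overlap depends on the data.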

  • @karthik-ex4dm 5 years ago

    Great video... Since clustering cannot do better in high-dimensional space, the pairwise distance matrix should be fine if we are working in high-dimensional spaces, right? But even computing the pairwise distances will be computationally expensive in a very high-dimensional space, right? So the best choice must be to find the best features using something like forward feature selection and then run HDBSCAN, right?
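To make the cost in the question above concrete: a full pairwise distance matrix for n points takes on the order of n² · d operations to build and n² stored values, so the dimension d affects only the build time while the memory grows with the square of the number of points. The sketch below computes such a matrix with NumPy and passes it to `hdbscan.HDBSCAN(metric="precomputed")`. The helper names and the 8-bytes-per-entry (float64) figure are assumptions for illustration.

```python
import numpy as np


def pairwise_euclidean(X):
    """Dense (n, n) matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=np.float64)
    sq = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives from rounding
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(d2)


def distance_matrix_bytes(n_points):
    """Memory needed for a dense float64 distance matrix."""
    return n_points * n_points * 8


def cluster_precomputed(distances, min_cluster_size=15):
    """Cluster from a precomputed distance matrix (requires the hdbscan package)."""
    import hdbscan

    model = hdbscan.HDBSCAN(metric="precomputed",
                            min_cluster_size=min_cluster_size)
    return model.fit_predict(distances)


# Typical use:
# D = pairwise_euclidean(data)
# labels = cluster_precomputed(D)
```

Under these assumptions, 100,000 points would need about 80 GB for the matrix alone, which is why reducing the number of features first does not by itself remove the memory cost.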

  • @andrewdennis6976 6 years ago +1

    I am running your example code just to play around and keep getting an error:
    TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'
    Unfortunately there is not much documentation on this, so it's hard to find fixes. Any help?

  • @jennifermew8386 7 years ago +1

    How do you identify noise in HDBSCAN? How does the algorithm tell the difference between outliers and noise?

    • @ashishkannad3021 6 years ago +2

      The points that are not assigned to any cluster are the noise!
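In the `hdbscan` library this shows up in the `labels_` attribute of a fitted clusterer: points left out of every cluster get the label -1. A minimal sketch of separating them, using a made-up label array in place of real output (the helper names are hypothetical):

```python
import numpy as np


def noise_mask(labels):
    """Boolean mask that is True where a point was labelled as noise (-1)."""
    return np.asarray(labels) == -1


def cluster_sizes(labels):
    """Map each cluster label (noise excluded) to its number of points."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}


# Made-up labels standing in for clusterer.labels_ from a fitted model.
labels = np.array([0, 0, -1, 1, -1, 1])
print(int(noise_mask(labels).sum()))  # number of noise points
print(cluster_sizes(labels))          # points per real cluster
```

The same mask can be used to index the original data, for example `data[noise_mask(clusterer.labels_)]`, to inspect the points that were left unclustered.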

  • @KeshavDial 4 years ago +3

    For anyone who was looking for Christian Hennig's PyData talk: ua-cam.com/video/Mf6MqIS2ql4/v-deo.html