#71: Scikit-learn 68: Supervised Learning 46: Intuition - brute force, KD & Ball tree

  • Published Jan 4, 2025

COMMENTS •

  • @TommygunG
    @TommygunG 1 year ago +1

    For Ball tree, you would also need to scan through every point in region 7 as well, since region 7 is a leaf node. For example, this would matter if a point (7, 6) existed.

    • @learndataa
      @learndataa 1 year ago

      Thank you for your comment. I appreciate your engagement and your point about the Ball tree algorithm. Yes, if additional points existed, they would have to be considered during the search process.
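
      A minimal sketch of that leaf-level scan with scikit-learn's BallTree (the points below, including the hypothetical (7, 6), are made up for illustration, not taken from the video):

      import numpy as np
      from sklearn.neighbors import BallTree

      # Toy 2-D points; (7, 6) is the hypothetical extra point from the comment
      X = np.array([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (7, 6)])

      # A small leaf_size forces several leaves; every point inside the
      # matched leaf is scanned during the query
      tree = BallTree(X, leaf_size=2)
      dist, ind = tree.query([[7, 5]], k=1)
      print("nearest point:", X[ind[0][0]], "at distance", dist[0][0])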

  • @BalaMurugan-cm6ev
    @BalaMurugan-cm6ev 3 years ago +3

    Clearly Explained !!!

  • @Astro_Rohan
    @Astro_Rohan 2 years ago +1

    Thank you for the awesome video. I was able to use this to actually speed up computing distances between elements in a very large dataset (6418 points). Would you know why computing all distances between all the data points at once is faster than brute force, since it's still finding distances between each pair just like brute force?
    Your channel is very underrated. It should have so many more subscribers for such quality content!
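
    A plausible explanation is vectorization rather than asymptotics: both approaches do O[D N^2] work, but computing all pairwise distances at once runs the loop in optimized native code instead of the Python interpreter. A minimal sketch with made-up data:

    import numpy as np
    from sklearn.metrics import pairwise_distances

    rng = np.random.default_rng(0)
    X = rng.random((300, 2))  # stand-in for the 6418-point dataset

    # Explicit double loop: same O[D N^2] work, but pays Python overhead per pair
    def loop_distances(X):
        n = len(X)
        D = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                D[i, j] = np.sqrt(((X[i] - X[j]) ** 2).sum())
        return D

    # Vectorized version: identical complexity, executed in optimized C/BLAS
    D_fast = pairwise_distances(X)
    assert np.allclose(loop_distances(X), D_fast)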

  • @BalaMurugan-cm6ev
    @BalaMurugan-cm6ev 3 years ago +1

    Sir, how are KD or Ball trees used in DBSCAN clustering?
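
    For what it's worth, scikit-learn's DBSCAN exposes this choice directly through its algorithm parameter, which controls how the eps-radius neighbor queries are answered; a minimal sketch with made-up data:

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    X = rng.random((200, 2))

    # DBSCAN needs every point's neighbors within eps; the `algorithm`
    # parameter picks the index used to answer those radius queries
    labels_kd = DBSCAN(eps=0.1, min_samples=5, algorithm='kd_tree').fit_predict(X)
    labels_ball = DBSCAN(eps=0.1, min_samples=5, algorithm='ball_tree').fit_predict(X)

    print(np.array_equal(labels_kd, labels_ball))  # True: the backend changes speed, not the clustering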

  • @sand9282
    @sand9282 1 year ago +1

    Please write out the conditions you use when splitting the trees. Additionally, I would suggest using letters like A, B, C, or C1, C2, C3 to designate circle names. Writing numbers when designating circles can confuse first-time learners about their actual meaning. One more thing: I would suggest elaborating a bit more on finding the centroid. While we know that the median is the middle value in a set of data points, it would be helpful to explain how to find the centroid, since we will be applying all these concepts to a dataset. Overall, the video is short and nice.

    • @learndataa
      @learndataa 1 year ago +1

      Thank you for your feedback. I hope the below helps.
      Ball tree circle condition:
      - Draw a circle centred at the centroid, with a radius equal to the distance from the centroid to the farthest point in that half.
      # How to calculate centroid of points: (1, 2), (3, 4), (5, 6), (7, 8)?
      x_cord = (1+3+5+7)/4 = 4.0
      y_cord = (2+4+6+8)/4 = 5.0
      Thus, centroid would be: (4.0, 5.0)
      Try in Python:
      ###############
      # Calculate centroid
      ###############
      def calculate_centroid(points):
          x_sum = 0
          y_sum = 0
          num_points = len(points)
          for point in points:
              x_sum += point[0]
              y_sum += point[1]
          centroid_x = x_sum / num_points
          centroid_y = y_sum / num_points
          return centroid_x, centroid_y
      # Example
      points = [(1, 2), (3, 4), (5, 6), (7, 8)]
      centroid = calculate_centroid(points)
      print("Centroid:", centroid)
      -- -- -- Output of the code would be: -- -- --
      Centroid: (4.0, 5.0)
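
      For a real dataset, the same per-coordinate mean is a one-liner with NumPy:

      import numpy as np

      points = np.array([(1, 2), (3, 4), (5, 6), (7, 8)])
      print("Centroid:", np.mean(points, axis=0))  # Centroid: [4. 5.]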

  • @RanjitSingh-rq1qx
    @RanjitSingh-rq1qx 2 years ago +1

    Sir, which one is best? You should explain how we can choose one of them based on the scenario.

    • @learndataa
      @learndataa 1 year ago +1

      Thank you. According to the scikit-learn docs:
      Note:
      N = number of samples;
      D = number of features (i.e. dimensionality)
      > Brute Force:
      - fast computation
      - scales as O[D N^2]
      - for small datasets (N < 30 or so)
      > K-D Tree
      - addresses the computational inefficiencies of brute force
      - these structures attempt to reduce the required number of distance calculations by efficiently encoding aggregate distance information for the sample
      - scales as O[D N log(N)]
      - very fast for low-dimensional (D < 20) neighbor searches
      - it becomes inefficient as D grows very large
      > Ball Tree
      - addresses the inefficiencies of KD Trees in higher dimensions
      - scales as O[D log(N)]
      Link: scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms
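
      A quick way to try all three backends from that list; a minimal sketch with scikit-learn's NearestNeighbors and made-up data:

      import numpy as np
      from sklearn.neighbors import NearestNeighbors

      rng = np.random.default_rng(0)
      X = rng.random((1000, 3))  # N = 1000 samples, D = 3 features

      # Each backend answers the same query; results agree, speed varies with N and D
      for algo in ('brute', 'kd_tree', 'ball_tree'):
          nn = NearestNeighbors(n_neighbors=3, algorithm=algo).fit(X)
          dist, ind = nn.kneighbors(X[:1])
          print(algo, ind[0])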

  • @RanjitSingh-rq1qx
    @RanjitSingh-rq1qx 2 years ago +1

    Very well explained. ❤️

    • @learndataa
      @learndataa 1 year ago

      Thank you. Glad it was helpful.

  • @EvgenyMozolev
    @EvgenyMozolev 2 years ago +1

    Great explanation!

  • @BalaMurugan-cm6ev
    @BalaMurugan-cm6ev 3 years ago +1

    Sir, how do we find the centroid after we split the data using the median?

    • @BalaMurugan-cm6ev
      @BalaMurugan-cm6ev 3 years ago

      @@learndataa ((x1 + x2 + x3 + ... + xn)/n, (y1 + y2 + y3 + ... + yn)/n)
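
      Putting the median split and the centroid together; a minimal sketch with made-up points (split at the median of x, then take the per-coordinate mean of each half):

      import numpy as np

      points = np.array([(1, 2), (3, 4), (5, 6), (7, 8), (9, 1), (2, 7)])

      # Split on the median of the x coordinate (the tree's split step)
      median_x = np.median(points[:, 0])
      left = points[points[:, 0] <= median_x]
      right = points[points[:, 0] > median_x]

      # Centroid of each half is the per-coordinate mean (the formula above)
      print("left centroid:", left.mean(axis=0))
      print("right centroid:", right.mean(axis=0))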

  • @cristhiancastro7984
    @cristhiancastro7984 2 years ago

    Hi, very good video! I have a doubt: in Ball tree, when calculating the centroid and then the two farthest points, how do we come up with the farthest point? Is it a sort of brute force for this?

    • @lukalicina8230
      @lukalicina8230 2 years ago

      @@learndataa Yeah, I mean you'd have to compute the distances at every iteration for that set, and then the set gets divided. But what about the iterations after that? I think the first run-through, while you assemble the tree, takes about the same time as a for-loop. But once you have the tree, inserting and comparing every new input is as simple as finding the nearest set point and computing the few distances inside that set/node/leaf. You're technically operating on the test data, not on the 'training data', better called the knowledge base or corpus. I may be wrong though; let me know if this makes sense, or if someone really knows the correct answer. Seems really interesting!
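
      As far as I know, the farthest point is indeed found by a linear scan, i.e. brute force within the node; a minimal sketch with made-up points:

      import numpy as np

      points = np.array([(1, 2), (3, 4), (5, 6), (7, 8)])
      centroid = points.mean(axis=0)

      # Linear scan: distance from the centroid to every point, keep the argmax
      dists = np.linalg.norm(points - centroid, axis=1)
      farthest = points[np.argmax(dists)]
      print("farthest from centroid:", farthest)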

  • @hadyshaaban1221
    @hadyshaaban1221 1 year ago +1

    Great, thanks!