Kafka Tutorial - Exactly once processing

  • Published 18 Dec 2024

COMMENTS • 56

  • @ScholarNest
    @ScholarNest  3 years ago

    Want to learn more Big Data technologies? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and coupon codes.
    www.learningjournal.guru/courses/

  • @MaheshSingh-ev8yh
    @MaheshSingh-ev8yh 4 years ago

    Hi Sir,
    I have really become a big fan of yours. The way you explain each concept is up to the mark, 5/5. The short, well-categorized videos are excellent. I was not expecting this when I got your link. I was looking for Kafka with C# for microservices, but your videos have given me a much clearer idea about it.

  • @max9260712
    @max9260712 4 years ago

    Thank you for your detailed videos. I am new to the channel and hope to come here more often. I have a bit of difficulty understanding the problem statement here; could you please help?
    At 5:11 you explain how storing into the DB and adding the offset to the rebalance listener are not atomic, and that this is the problem. If the consumer crashes, let's say just after storing into the database, then even if the RebalanceListener is triggered, it is unable to commit that particular offset (for the record just stored in the DB) to Kafka, because our call to .addOffset never happened. Is my understanding correct?
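
    To make the crash window concrete, here is a minimal Java sketch of the two-step sequence described above. saveToDatabase() and the OffsetTracker class are illustrative stand-ins for the video's database insert and the RebalanceListener.addOffset() bookkeeping; they are not code from the video.

      import java.util.List;
      import org.apache.kafka.clients.consumer.ConsumerRecord;

      public class NonAtomicSketch {

          static class OffsetTracker {
              void addOffset(String topic, int partition, long offset) {
                  // remember the offset so it can later be committed to Kafka
              }
          }

          static void saveToDatabase(String value) {
              // insert the record into the external database
          }

          static void process(List<ConsumerRecord<String, String>> records, OffsetTracker tracker) {
              for (ConsumerRecord<String, String> record : records) {
                  saveToDatabase(record.value());   // step 1: the record is now in the DB
                  // If the consumer crashes right here, the DB already holds the record,
                  // but its offset was never tracked, so it cannot be committed to Kafka
                  // and the same record is redelivered (processed twice) after a restart.
                  tracker.addOffset(record.topic(), record.partition(), record.offset()); // step 2
              }
          }
      }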

  • @praveenkumar-oy5zt
    @praveenkumar-oy5zt 6 years ago

    Your way of teaching is awesome.

  • @gopinathGopiRebel
    @gopinathGopiRebel 7 years ago +1

    How do we know how many partitions to assign to a particular topic?
    What is the default number of partitions for a topic in Kafka?

  • @lytung1532
    @lytung1532 2 years ago

    Thanks for this tutorial. I am a fan of yours in Udemy.

  • @neil3507
    @neil3507 6 years ago +2

    Is this a way to achieve exactly-once semantics in Kafka?

  • @SauravOjha94
    @SauravOjha94 4 years ago

    Hi Sir. Excellent explanation. Just one doubt: since this is not a case of auto-commit, don't you think we have forgotten to commit the offset to Kafka?

  • @VaibhavPatil-rx7pc
    @VaibhavPatil-rx7pc 4 years ago

    Excellently explained!! Thank you!!

  • @yog2915
    @yog2915 4 years ago

    Very nice, cleared up a lot of things.

  • @DineshKumar-by4sk
    @DineshKumar-by4sk 7 years ago

    Excellent and crisp explanation.

  • @vitinho0610
    @vitinho0610 4 years ago

    Hey sir,
    Thank you once again for your excellent tutorials!
    I may have one doubt:
    1 - If this consumer dies, will Kafka redistribute the TSS partitions to other consumers? If so, how will the other consumers know where the committed offset stands?

  • @theashwin007
    @theashwin007 8 years ago +1

    Hi, I have one doubt. Consider two different consumer groups, and say both groups subscribe to the same topic. Now, how does Kafka store these offsets (committed offset & read offset)? Does it store them per consumer group?

    • @ScholarNest
      @ScholarNest  8 years ago +1

      Kafka maintains the current offset & committed offset per consumer. However, rebalance happens at the consumer group level.

  • @akhilanandbenkalvenkanna5057
    @akhilanandbenkalvenkanna5057 7 years ago

    Do we use a MySQL DB in real projects as well? Are there any performance issues with using a relational DB?

  • @reachmurugeshanm7750
    @reachmurugeshanm7750 3 years ago

    Hi Sir, I have one doubt. You have explained one consumer with multiple custom partitions in this video, but if my requirement is multiple consumers with multiple custom partitions, what would the code snippet look like in that case? And if one consumer crashes while processing a message, how are partitions taken away from consumer 1 and assigned to consumer 2?
    Do we need to handle any exception when a consumer crashes?

  • @gauravluthra7959
    @gauravluthra7959 6 years ago

    Great explanation. One doubt: suppose I want exactly-once processing and the consumer is of the same type as in this example, where I write the data and the offset to the database with a single commit, but I want to use a group of consumers instead of only one consumer. How will it do exactly-once processing then? (My doubt: if we have three consumers, with C0 reading from P0, C1 from P1 and C2 from P2, and C0 goes down or is killed and never runs again, then the data from P0 will never be read. Can we solve this problem with exactly-once?)

  • @robind999
    @robind999 7 years ago

    Hi LJ,
    I struggled with kafka-mongodb-sink connector setups,
    github.com/startappdev/kafka-connect-mongodb
    It seems it needs curl to convert the MongoDB configuration file (JSON file) to XML (and a header needs to be added too). ... I needed to modify the httpd.config file to open a port and still could not upload the file through curl on localhost, etc.
    Watching your demo, the process is fully monitored; if I use this Kafka connector, I just don't know how to monitor my process, especially the partition part.
    So my question to you: instead of using the kafka-mongodb-sink connector, can I use code similar to yours to sink Kafka to MongoDB?
    Please advise; yours is the most advanced and detailed Kafka demo so far.
    Thanks,
    Robin

    • @ScholarNest
      @ScholarNest  7 years ago

      You can always write your own code to sink. However, it may be convenient to use a connector. Unfortunately, there is no certified connector for MongoDB yet. Check this link www.confluent.io/product/connectors/
      There are 4 MongoDB sinks listed. I never tried any of them, but you can give them a try. One of them should be mature enough.

    • @robind999
      @robind999 7 years ago

      Thank you so much for your quick feedback,
      I just found some Spark code to sink data to MongoDB. Since you told me there is no certified connector for MongoDB yet, I will give the following a try:
      rklicksolutions.wordpress.com/2017/04/04/read-data-from-kafka-stream-and-store-it-in-to-mongodb/
      What do you think about this link?
      Confluent involves installing another tool, and I still can't find a use case for this; I only found one for pulling data out of MongoDB into Kafka.
      Thank you so much, my mentor.
      Robin

  • @rbsood
    @rbsood 4 years ago

    Hi Learning Journal, I have a question. I have a Kafka log retention policy based on size, so if the size reaches 1 GB Kafka will delete the log. How can I make sure that Kafka does not delete the log if the consumer has not finished reading all the messages? In other words, Kafka should delete the log only when the consumer's current offset is the same as the latest offset in the log. Does Kafka do this automatically, or is some manipulation needed?

  • @cellisisimo
    @cellisisimo 8 years ago

    Excellent video!! What if, after updating the first table with the data, the consumer fails before updating the table with the offsets? In that case, the same data will be processed twice, won't it?

    • @ScholarNest
      @ScholarNest  8 years ago +2

      No. The data in the table is not permanent until we execute a commit, and the commit is the last statement, after both the insert and the update.
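
      For illustration, a minimal Java/JDBC sketch of that single-transaction idea follows. The table and column names (tss_data, tss_offsets) and the helper method are assumptions for the sketch, not taken from the video.

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import org.apache.kafka.clients.consumer.ConsumerRecord;

        public class AtomicStoreSketch {

            // Insert the record and advance the stored offset in one transaction:
            // either both changes become permanent at commit(), or neither does.
            static void storeRecordAndOffset(Connection conn,
                                             ConsumerRecord<String, String> record) throws Exception {
                conn.setAutoCommit(false);
                try (PreparedStatement insertData = conn.prepareStatement(
                         "INSERT INTO tss_data (msg_key, msg_value) VALUES (?, ?)");
                     PreparedStatement updateOffset = conn.prepareStatement(
                         "UPDATE tss_offsets SET next_offset = ? WHERE topic = ? AND part_no = ?")) {

                    insertData.setString(1, record.key());
                    insertData.setString(2, record.value());
                    insertData.executeUpdate();

                    updateOffset.setLong(1, record.offset() + 1);   // next offset to read
                    updateOffset.setString(2, record.topic());
                    updateOffset.setInt(3, record.partition());
                    updateOffset.executeUpdate();

                    conn.commit();      // until this point nothing is permanent
                } catch (Exception e) {
                    conn.rollback();    // a crash or error discards both changes
                    throw e;
                }
            }
        }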

  • @JoaoGomes-ff2pz
    @JoaoGomes-ff2pz 7 years ago

    There is no Rebalance Listener in this example.
    What happens if you have more than one consumer, e.g. via subscribe, one of them receives 100 records, and after processing and saving 50 records a rebalancing is initiated? Will the offsets in Kafka be stored as the actual committed offsets, and will the next consumer assigned to that partition receive the data from the beginning?

    • @ScholarNest
      @ScholarNest  7 years ago +1

      Good question. When we are not using automatic group management (like in this example), there is no rebalance activity. Kafka can't rebalance because there is no group in this case.
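
      As a minimal sketch of that manual-assignment style (no group management, hence no rebalancing): the topic name, partition numbers and the getOffsetFromDb() helper below are illustrative assumptions, not the video's code.

        import java.util.Arrays;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import org.apache.kafka.common.TopicPartition;

        public class ManualAssignSketch {

            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("enable.auto.commit", "false");        // offsets live in the external DB

                KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

                TopicPartition p0 = new TopicPartition("SupplierTopic", 0);
                TopicPartition p1 = new TopicPartition("SupplierTopic", 1);
                consumer.assign(Arrays.asList(p0, p1));          // assign(), not subscribe(): no group, no rebalance

                consumer.seek(p0, getOffsetFromDb(p0));          // resume from the offset stored in the DB
                consumer.seek(p1, getOffsetFromDb(p1));
                // poll() and process inside a DB transaction from here on
            }

            static long getOffsetFromDb(TopicPartition tp) {
                return 0L;   // illustrative: look up the committed offset in the offsets table
            }
        }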

    • @JoaoGomes-ff2pz
      @JoaoGomes-ff2pz 7 years ago

      Oh cool! I didn't notice that you aren't using any group. Thank you!

  • @nawaz4321
    @nawaz4321 6 years ago

    Very nicely explained, a big thank you.

  • @Prabhatkumardiwaker
    @Prabhatkumardiwaker 6 years ago

    Hi, I have one question. Why did the consumer application consume the 10 records in 2 different polls, i.e. 6 records in the 1st poll and 4 records in the 2nd poll? It could have got all 10 records in 1 poll, as the messages were already available in the topic.
    Thanks in advance.

  • @lonelybard19
    @lonelybard19 7 years ago

    Hi. In this example you didn't have parallel processing because one single consumer assigned the 3 partitions to itself. How would I achieve "exactly once" processing in a scenario with multiple consumers? I could give each consumer an ID and have a table in the external database to store which partitions should be assigned to each consumer, but then I would have to perform the rebalance myself, which could be some hard work :(

    • @AmitITpartner
      @AmitITpartner 6 years ago

      The answer to your question "How would I achieve 'exactly once' processing in a scenario with multiple consumers?" is to implement multiple consumers within a consumer group. The advantage of this is that each consumer fetches unique data. Hope this helps.
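
      A minimal sketch of that combination (a consumer group plus externally stored offsets) is shown below; the group ID, topic name and getOffsetFromDb() helper are illustrative assumptions, not code from the video.

        import java.util.Collection;
        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import org.apache.kafka.common.TopicPartition;

        public class GroupWithDbOffsetsSketch {

            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("group.id", "exactly-once-demo");      // several consumers can share this group
                props.put("enable.auto.commit", "false");
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

                KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

                consumer.subscribe(Collections.singletonList("SupplierTopic"), new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        // commit the in-flight DB transaction before giving up the partitions
                    }

                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        // after a rebalance, resume each newly assigned partition
                        // from the offset stored in the external database
                        for (TopicPartition tp : partitions) {
                            consumer.seek(tp, getOffsetFromDb(tp));
                        }
                    }
                });
                // poll() and process inside a DB transaction, as in the video
            }

            static long getOffsetFromDb(TopicPartition tp) {
                return 0L;   // illustrative: read the last committed offset from the offsets table
            }
        }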

  • @glt123
    @glt123 7 years ago

    Can the producer send messages while a rebalance is happening? Or will the Kafka producer get an exception during the rebalancing process?

    • @ScholarNest
      @ScholarNest  7 years ago

      Rebalance is an activity for the consumer group. It has nothing to do with a producer.

    • @glt123
      @glt123 7 years ago

      Okay... When a new partition is added to a topic, how does the producer start sending messages to the new partition?

    •  7 years ago

      I don't think you can add a partition in "real-time". You have to specify them when you create the topic.

  • @singhsankar
    @singhsankar 6 years ago

    Where do we commit the Kafka processed message? We only commit on the MySQL (DB) connection.

    • @ScholarNest
      @ScholarNest  6 years ago

      The idea is to use a single transaction that commits both the processed message and the offset number together.

  • @kumarvairakkannu360
    @kumarvairakkannu360 8 years ago

    On poll(), the first time 6 records, the second time 5 records, etc. I am curious how Kafka decides how many records to pull. The default is max.poll.records=2147483647; is it random below the max poll limit?

    • @ScholarNest
      @ScholarNest  8 years ago +2

      The poll method will try to give you as many records as it can within the various limits you specify. max.poll.records is one of them (default 500). The timeout parameter passed to the poll method is another such limit.
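
      For illustration, a small sketch of those two limits using the newer consumer API (the Duration-based poll()); the values, group ID and topic name are illustrative assumptions.

        import java.time.Duration;
        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        public class PollLimitsSketch {

            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("group.id", "poll-limits-demo");
                props.put("max.poll.records", "100");            // cap on records returned by one poll()
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Collections.singletonList("SupplierTopic"));
                    // returns whatever is available within 500 ms, up to 100 records
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    System.out.println("Fetched " + records.count() + " records");
                }
            }
        }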

  • @HollyJollyTolly
    @HollyJollyTolly 7 years ago

    Hi sir,
    What is the difference between a high-level consumer and a low-level consumer?

    • @ScholarNest
      @ScholarNest  7 years ago

      That's an outdated concept. The old Kafka API used to have a high-level consumer, but the new Kafka API doesn't have such a concept. I cover the new API since the old one is not supported now.

  • @4ukcs2004
    @4ukcs2004 6 years ago

    Great video. Sir, I need a reply. I have a Kafka topic which contains a jobname field. When I read the topic with a consumer, those jobnames should get triggered and start running; it looks like event triggering or event-driven processing. Any link or snippet would help. How do I take care of this part? Please help.

  • @hugodeiro
    @hugodeiro 6 years ago

    Very good. But it would be nice if you provided the code somewhere like GitHub...

    • @ScholarNest
      @ScholarNest  6 years ago +1

      It is already there on GitHub:
      github.com/LearningJournal/ApacheKafkaTutorials

  • @madhuthakur2523
    @madhuthakur2523 5 years ago +2

    This will make consumption super slow

    • @ScholarNest
      @ScholarNest  5 years ago +2

      This method is obsolete. Kafka Streams has better options.

    • @humanGenAI
      @humanGenAI 2 years ago

      @@ScholarNest any link?
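
      For reference, a minimal sketch of the built-in option mentioned above: a Kafka Streams application with the exactly-once processing guarantee turned on. The application ID and topic names are illustrative, and newer releases also accept "exactly_once_v2".

        import java.util.Properties;
        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.StreamsConfig;

        public class StreamsExactlyOnceSketch {

            public static void main(String[] args) {
                Properties props = new Properties();
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "exactly-once-demo");
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                // one setting turns on transactional, exactly-once processing
                props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

                StreamsBuilder builder = new StreamsBuilder();
                builder.stream("input-topic").to("output-topic");   // trivial pass-through topology

                new KafkaStreams(builder.build(), props).start();
            }
        }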

  • @KajalSingh-og7fk
    @KajalSingh-og7fk 3 years ago

    Why is setAutoCommit set to false? It should be true, right? Am I missing something?

  • @somethingbig8072
    @somethingbig8072 7 years ago

    How do I send different data to different consumers from a single topic?

    • @ScholarNest
      @ScholarNest  7 years ago

      The answer to your question is in the videos. Watch the full playlist.

  • @sujeeshsvalath
    @sujeeshsvalath 6 years ago

    "exactly once" processing have been incorporated now built in starting from Kafka 0.11 version. The concept is the same explained in this video. Please refer www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/ to enable "exactly once" processing in Kafka

  • @reachmurugeshanm7750
    @reachmurugeshanm7750 3 years ago

    I am a big fan of yours, Sir; the way you explain is awesome.
    Could you please share your email ID so I can communicate with you and clarify my doubts?

  • @learn9475
    @learn9475 2 years ago

    Please check whether Kafka Streams or Kafka transactions solve your issues,
    since they were released in Nov 2017.