Link to Kafka series: ua-cam.com/play/PLaz3Ms051BAkwR7d9voHsflTRmumfkGVW.html
Link to Data Lake video: ua-cam.com/video/DLRiUs1EvhM/v-deo.html
Link to data lake GitHub repo: github.com/hnawaz007/pythondataanalysis/tree/main/data-lake
Link to Kafka GitHub repo: github.com/hnawaz007/pythondataanalysis/tree/main/kafka
Love how Kafka is turning into a data lake now that you can have unlimited retention at very low cost (KIP-405). This means you can reduce data movement by bringing analysts directly to where the data was ingested. This opens up a plethora of new data sources and a much greater volume of data available for ad hoc analysis. We can finally say goodbye to the cost, complexity, and consistency issues associated with heavy ELT/ETL processes!
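As a rough sketch of the retention setup described above: with KIP-405 tiered storage (Kafka 3.6+), unlimited retention is enabled per topic once the broker has a remote storage plugin configured. The topic name, bootstrap address, and local retention value below are placeholders, not settings from the video:

```shell
# Assumes the broker already has remote log storage enabled
# (broker-side: remote.log.storage.system.enable=true plus plugin settings).
# Topic name and bootstrap server are illustrative placeholders.
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --config remote.storage.enable=true \
  --config retention.ms=-1 \
  --config local.retention.ms=86400000
```

Here `retention.ms=-1` keeps data indefinitely, while `local.retention.ms` keeps only one day on broker disks; older segments live in cheap object storage.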
Thank you for the video, excellent explanation.
Great project, congrats. Keep it up!
Thanks. More to come on this.
Hi! Thank you for the video - it's a great explanation, as always on your channel!
I have a question. I have a similar task starting from Kafka, and I'm now using the Iceberg/Dremio/Nessie stack from your previous video for storing the data. Here you have added Hive - could you explain the benefits of using Hive with, or instead of, the stack from your previous data lakehouse guide? Thanks!
Thanks. The Nessie catalog is feature rich, with Git integration, and I personally prefer it over Hive. However, Dremio is cloud native and offers an open-source option, so I'd read the fine print on what's allowed commercially. That's the only catch. Otherwise, your setup is optimal for streaming and storage. This implementation is fully open source, and you can deploy it commercially. Both options offer similar capabilities, and I will cover more, as the current connector limits Iceberg's ACID capabilities. More to come on that.
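For comparison, the two catalog choices discussed above differ mainly in a handful of Spark configuration properties. A minimal sketch - the catalog names, endpoint URIs, and warehouse paths here are illustrative assumptions, not values from the videos:

```python
# Spark conf entries for registering an Iceberg catalog two ways.
# All names, URIs, and paths below are illustrative placeholders.

# Option 1: Hive Metastore-backed Iceberg catalog (this video's setup)
hive_catalog = {
    "spark.sql.catalog.hms": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.hms.type": "hive",
    "spark.sql.catalog.hms.uri": "thrift://hive-metastore:9083",
    "spark.sql.catalog.hms.warehouse": "s3a://warehouse/",
}

# Option 2: Nessie-backed Iceberg catalog (previous video's setup),
# which adds Git-style branches and tags on top of the same tables
nessie_catalog = {
    "spark.sql.catalog.nessie": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.nessie.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
    "spark.sql.catalog.nessie.uri": "http://nessie:19120/api/v1",
    "spark.sql.catalog.nessie.ref": "main",
    "spark.sql.catalog.nessie.warehouse": "s3a://warehouse/",
}
```

Either dict would be passed to the Spark session builder via `config()`; the tables themselves stay in the same object storage, so swapping catalogs is mostly a wiring change.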
🔥🎆😍
I subscribed; send my phone to Bangladesh, Uttara, Sector#13, Road#18. I prefer a MacBook.