I don't usually subscribe after one video, but this was straight to the point, with a real-life example, which is very useful.
I'm watching from Turkey. Thank you, boss.
Thanks!
Thanks a ton.
Checking YouTube and found this video.
By far one of the best videos I have found on the topic of multi-stage Docker builds.
@DevOpsToolkit thank you for keeping it simple and straight.
What a brilliant tutorial! You are a great man sir. Keep up the good work.
Thank you Viktor, always great to refresh on the fundamentals. I've had to explain multistage builds to developers recently, your vid would have been of great help, bookmarked for later :)
Sweet, simple, and to the point video. Subscribed!
Great video as usual. Thanks Viktor
My pleasure!
Thanks Viktor, for a good reminder on Docker. Always great videos.
amazing minimal illustration
Thanks, Viktor. I have been using multi-stage builds for a long time.
A video in the future about Kubernetes operators (maybe vs Helm) would be great.
Adding operators to my TODO list... :)
Excellent video. Thanks for sharing.
Great and simple explanation, what else could you ask for.
Excellent explanation, thank you
This was great thank you for the thorough explanation!
Such a great explanation and so to the point. Thanks a lot.
Excellent explanation thank you a lot Sir!
Amazing video. Thank you for sharing
Absolute gold this video is.
Great video! Thanks!
Really good video, and channel in general. I wish I had found it earlier.
We use multi-stage builds on Golang projects and for compiling JavaScript static content. But sometimes Node.js developers come along with up to 1 GB of dependencies that are needed at runtime.
Thank you for sharing your knowledge!
Amazing content! I have learned a lot from your videos.
Great example.
I use multi-stage builds in a similar way for my Python apps. I like to use python:3-buster (~850MB) as the build image and then python:3-slim-buster (~115MB) as the final image. This allows things (e.g. cffi) to compile during the build w/o the bloat needed in the run.
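A minimal sketch of the split described above, assuming dependencies are listed in a `requirements.txt` and the app starts from a `main.py` (both names are hypothetical, not from the comment). It installs everything into a virtual environment in the full image and copies only that directory into the slim image.

```dockerfile
# Build stage: the full image ships compilers and headers,
# so packages such as cffi can compile here.
FROM python:3-buster AS build
# Install into a virtual environment so the result is one directory to copy.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# requirements.txt is a hypothetical file name.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Final stage: the slim image, with no compilers.
FROM python:3-slim-buster
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
# main.py is a hypothetical entry point.
CMD ["python", "main.py"]
```

This relies on both tags providing the same Python version at the same path. If a compiled package links against a system library, that library still has to be installed in the slim stage.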
Thanks for sharing
Thanks for the wonderful explanation. I have one question: there are multiple Docker image optimization techniques, and one of them is using multi-stage builds. Can you suggest why anyone should use any other technique (choosing a minimal base image, minimizing layer size and count, caching image layers, etc.) when they are already using multi-stage builds, which eventually reduce image size and layers?
It does not matter what you do in all the stages but the last. Build binaries, run tests, etc. If possible, the last stage should be based on the scratch image. If that's not an option, use Chainguard, Alpine, or BusyBox as the base image of the last stage. Inside that stage, add only the things that are needed (e.g., the binary) and nothing else.
As for your question... Multi-stage builds are not really about optimization. The final stage is the image, and those before it are just a convenient way to do something inside Docker as opposed to doing it outside. If, for example, you compile the binary outside Docker and just add it to the image, the end result will be the same.
@DevOpsToolkit Thanks for taking the time to answer. You are a gem.
Just one more thought on the same point: would you prefer to run the optimizations (as mentioned above) in the previous layers, or should developers choose a bulky image to finish the work and then always use a lighter image for the final stage?
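To make the point about the last stage concrete, here is a small sketch for a Go app. It assumes a Go module at the repository root; the `golang:1.22` tag and the `/out/app` path are illustrative choices, not something taken from the video.

```dockerfile
# Earlier stage: any image of any size. Run tests and build the binary here.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go test ./...
# CGO_ENABLED=0 produces a statically linked binary, which scratch requires
# because it contains no C library.
RUN CGO_ENABLED=0 go build -o /out/app .

# Last stage: based on scratch, containing the binary and nothing else.
FROM scratch
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Only the last stage ends up in the image, so the size of `golang:1.22` does not affect the result.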
@RVRead I think that the two options you described are the same (or I misunderstood). You can use any image of any size in previous layers, assuming that by previous layers you mean previous stages. As long as the last stage is based on no image or a very small image you should be fine. What happens before the last stage is not that important.
@DevOpsToolkit Thanks again for your expert opinion and kind response. Yes, I was pointing to previous layers, and now it's clear to me :)
Great video, Viktor 👍 Could you please make a video on Kubernetes operators?
Are you using multi-stage builds, or do you prefer building, testing, etc. outside the Dockerfile?
Great video as always. Is there a way to catch a unit test error and feed it back to your CI system, so that you can clearly tell the user that the unit tests failed? Not all developers are willing to dive into Jenkins (or other CI tools) and look through the logs, so it would be nice if there were a way to clearly identify it as a "unit test" error.
Thanks
Thank you, very useful video! For languages that require a runtime, like Java, would you have any suggestions on how best to split the image build? Is there something better we can do than building on an image with the JDK and running on an image with the JRE?
I haven't been working with Java for a while now, so I cannot give a concrete answer. If you had asked 5 years ago, I would have said "use the JRE in the final image or create an executable (fat JAR)". However, I'm sure a lot has changed in the Java world since then.
As a general rule, the final image should have only what you need, and nothing else. It should be minimalistic. Whether that is jre or something else I cannot say.
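For reference, the JDK/JRE split from the question could look roughly like the sketch below. It assumes a Maven project that includes the Maven wrapper (`mvnw`) and packages to `target/app.jar`; the JAR name and the `eclipse-temurin` tags are assumptions for illustration, and whether a JRE is the smallest workable runtime for a given app is left open, as in the answer above.

```dockerfile
# Build stage: full JDK, needed to compile and package.
FROM eclipse-temurin:21-jdk AS build
WORKDIR /src
COPY . .
# Assumes the Maven wrapper is committed and the build produces
# target/app.jar (hypothetical artifact name).
RUN ./mvnw -q package

# Final stage: JRE only, plus the packaged application.
FROM eclipse-temurin:21-jre
COPY --from=build /src/target/app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
```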
Thank you for the video.
When using multi-stage builds, we don't rely on application artifacts and an artifact registry (JAR and Nexus). Is it a good solution to have only Docker images as "artifacts"?
That depends. If you are building a CLI, a desktop app, or a library, artifacts/releases need to be binaries. For everything else (apps running in clusters), I do not see a good reason for anything but images. That's all you really need. Those images contain the artifacts inside them.
I've been testing for the past couple hours and kept running into the issue where alpine would not execute the go binary that was created. The solution was to use **FROM debian** instead of *FROM alpine*. Just a heads up for anyone else that runs into this problem. "This is because golang is not alpine base but debian base."
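A likely cause, though the comment does not confirm it: the default `golang` image is Debian-based, so a binary built there with cgo enabled is dynamically linked against glibc, while Alpine ships musl instead. Switching the final image to Debian is one fix. Another is to build on the Alpine variant of the Go image, sketched below; the image tags and paths are illustrative assumptions.

```dockerfile
# Build on the Alpine variant so any dynamic linking targets musl,
# the C library that the final Alpine image provides.
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# The final image uses the same C library as the build image.
FROM alpine:3.20
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Building with `CGO_ENABLED=0` in the Debian-based image is a third option, since the resulting static binary does not depend on either C library.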
Can we use multiple base images in one image?
We cannot. An image can have only one base image.
Spending time with multi-stage Dockerfiles sure can be fun, but I think it can be really tedious and error-prone work, especially if you have to deal with a project that involves multiple programming languages. Between guessing which base image to use and which steps to choose, you are already spending precious hours just to get things nearly up and running. This is why I like the idea behind the Cloud Native Buildpacks project. What do you think about it, Viktor? BR, -Admir
CNB is great. My only complaint is that it is in Ruby.
Ah, yes, this is another request coming up! If you are not too averse to Ruby (why, btw?), could you please review CNBs in comparison with multi-stage Docker builds? And use an example in Python with lots of dependencies! :-)
(perhaps requiring C/C++ compilation, for example using the cppyy framework at cppyy.readthedocs.io, if it's not too much to ask)
I haven't worked with Python for a while. Can you create a sample app I could use?
@DevOpsToolkit That depends on what kind of app you are after. Hint: you can also find some on GitHub under the topics python, flask, or fastapi, for web apps 😃
@TankaNafaka I'll give it a try :)
Hi Viktor, multi-stage builds are an awesome feature! However, I struggle when it comes to importing unit test / code coverage results into the CI/CD tool. Generally, a JUnit XML file is generated when tests are executed, and this file can be uploaded using a built-in task from the CI/CD tool. However, this task cannot be used from the Dockerfile, so currently the only workaround I have found is to run the tests independently in the pipeline and import the results outside the Dockerfile. Another way could be to copy the report file from the container to the agent running the pipeline, but since we use multi-stage builds, the stage where the tests are executed is deleted once the final image is built. Do you have any suggestions for solving this kind of problem?
If you need artifacts stored in pipeline tools, multi-stage is not a good option. That being said, I do not think that any type of artifact should be stored in pipeline tools. They should be in registries or repositories. For convenience, pipelines might need to show those artifacts, but not store them. Even for visibility-only use of those artifacts/results, a pull request tends to be a much better place.
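For anyone who still wants to get the report file out of the build, one possible approach is to give the report its own stage and export it with BuildKit's `--output` option. This is only a sketch under stated assumptions: a Python project tested with pytest, and the hypothetical `/reports` directory and `exit-code` file. It is not something shown in the video.

```dockerfile
# Test stage: run the tests and write a JUnit XML report.
FROM python:3-slim AS test
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir pytest
# Record the exit code instead of failing the build,
# so the report can still be exported when tests fail.
RUN mkdir /reports && (pytest --junitxml=/reports/junit.xml; echo $? > /reports/exit-code)

# Report stage: contains nothing but the report files.
FROM scratch AS report
COPY --from=test /reports /

# Export the files to the CI agent (requires BuildKit):
#   docker build --target report --output type=local,dest=./reports .
# The pipeline can then upload ./reports/junit.xml and fail the job
# if ./reports/exit-code is not 0.
```

Since the test step no longer stops the build, the pipeline has to check the recorded exit code itself.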
Ah so the Dockerfile is almost like a Makefile in a sense
It indeed is.
Too bad you didn't mention the --target flag.
You can define dozens of images in the Dockerfile with arbitrary requirement chains, yet build a single image and only its requisite stages. Lazy evaluation for the win.
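A small sketch of what that looks like, assuming a Go module at the repository root; the stage names and image tag are made up for illustration. With BuildKit, stages that the requested target does not depend on are skipped.

```dockerfile
# Shared base: source code and toolchain.
FROM golang:1.22 AS base
WORKDIR /src
COPY . .

# Test stage: depends only on base.
FROM base AS test
RUN go test ./...

# Build stage: depends only on base.
FROM base AS build
RUN CGO_ENABLED=0 go build -o /out/app .

# Release stage: depends on build, not on test.
FROM scratch AS release
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]

# Build one target and only the stages it needs:
#   docker build --target test .      # builds base and test
#   docker build --target release .   # builds base, build, and release; test is skipped
```

Running `--target test` as its own pipeline step is also one way to make a unit test failure show up as a distinct failed step in CI.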
Sir, please make videos on these.
==== K8s real-time interview questions (collected from various sources, friends, and companies, to help all) ====
The 7-8 most important Kubernetes topics from recent live interviews. These questions are asked regularly at product- and service-based companies.
Istio concepts and what a service mesh is (very important). The NGINX ingress controller and why it is very important. Fluentd for logs. How to install Prometheus and Grafana to monitor K8s. Helm: why and how? GitOps flow explanation using the Argo operator (very important; Helm and Prometheus come up in 100 percent of projects). OpenShift advantages?
The concept of role-based access security. Service accounts (very important). The meaning of a manifest. The meaning of a policy in K8s. The pod lifecycle: what are the stages, such as Pending, and how do you check them? How do you create an AWS cluster? Describe AKS (Azure) and EKS (AWS) infra setup. What changes once Docker Engine support is gone? Explain the DB flow in detail: how to set up the master, how many masters and how many slaves, and the process of setting up master and slaves for a DB. How do you take incremental database backups (say, with the Percona tool), and how do you take a normal DB backup, for example with data in transit or at rest in S3? Backup/restore frequency? Deployment strategies: canary, blue-green, warm standby? Security services/frameworks in the project (security on any public cloud was asked about for at least 30 minutes). What are canary, blue-green, standby, rollback, and restore? Explain Helm in detail. The full DNS process, load balancer, and controller: what happens when I enter cnn.com, all the steps? A dev says the API does not work; what do you do? Describe the full process of integrating Prometheus and Grafana; if Prometheus fails, do you use ELK, or at least a log infra? Describe in detail the top 15 metrics you observe in Prometheus, as actions will be based on them. How do you decide on autoscaling, and which common metrics do you need for these servers in a big infra? How many volumes and how many ports are mapped? Explain port forwarding in general and in detail. What is a SQL proxy; explain the schema and why we use it. Create a bastion host in real time and secure the DB; show me. Reverse proxy implementation details? When to use a proxy and when a reverse proxy, and what is the difference? How do you implement a bastion host and port-forward through it?
==== Terraform real-time concepts asked ====
State file; global S3 state versus local state; taint and untaint; import (how to sync a machine created manually by mistake with the current state); using the cache to make Terraform faster; updating Terraform state for manually created resources; which Terraform command reads another developer's state file together with your own; JSON basics; Azure ARM, what is the difference? Why use provisioners in Terraform? What is the difference between refresh and state pull? What is import? What does target refer to in Terraform? What is the self parameter? Can you automatically get all provisioned private and public IPs, and which Terraform parameter is used for that? What is the syntax for environment variables in Terraform? Why use workspaces if you have modules?
==== Linux screen-share interview questions over Zoom, product-based companies (helps all) ====
What are /dev/console, /dev/null, the logger command, logrotate.d, &1, !!, and &&? yum options; sometimes we need only rpm, how then? The sudoers file? How do you use the alias command for a single user with permissions on many paths? Why setfacl? Why sticky bits when you have chmod? What is the SELinux concept? awk, sed, find: give 5-10 ways you use these three in your DevOps work. How do you use the curl command in different ways, with Docker and with AWS as well? tee for logs? exec >>? netstat -tulp? traceroute, telnet? nmcli for IP assignment? Suppose you have a VM at home without company access, so there is no network, but you need Wi-Fi? > /dev/null 2>&1? exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1? Which curl options do you know? What does EPEL mean? What is a port, and why is an IP combined with a port? If there is a port, why a service? What is an ephemeral port? Why does a NACL need an entry for 32000-62500 even if you have a 443 entry at the target? What are application ports and server port ranges, and what is the difference? Why do you need a router and a route table? Explain destination and target in a route table.
Asked while sharing the screen: sar, viewing logs with tail, boot problems with GRUB, LVM in detail, and how you use an NFS server. find: give 10 different ways. awk: give 8-10 ways in DevOps.
What is CIDR and what is block size (basic networking); which factors decide the subnet range, not the host range? Ports? IPv4? Port forwarding and IP forwarding commands; proxies, why a proxy, and the 2 types of proxy?
==== Cloud ====
Common questions for any public cloud (he said any cloud you are comfortable with). Flow 1, dedicated DB flow only: explain how you create the bastion, the master/slave count, what a SQL proxy is for reads and writes, and the basic schema managed on the DevOps side. Backups are sure to be asked: 2 backups, incremental (entire backup/restore, DR on the DB side) in depth; normally they ask about the script and how frequently you take them. The security structure of your project in detail (at least 30 minutes on service/existing project security architecture). How do you port-forward through the bastion? IP, port, and proxy were all asked. SQL proxy: at least the basics of the schema managed on the DevOps side? Stateless lab, stateful DB lab? The entire site-to-site process (he said to spend some 200-300 Rs and practise it). An organization managing around 20 accounts; the Active Directory backend and SSO frontend flow lab. When do you use NAT and when an IGW in a route? If not NAT, what other flow meets the same requirement? Detailed questions on endpoints and private endpoints; serverless, which he said saves cost and so becomes important. On-premise Jenkins to AWS flow knowledge. DR, security, and project architecture are mainly asked. boto and os: do you know at least these 2 modules? Python expertise, at least the basics? Why did you come here having only done institute labs? How many types of VPC connection are there? You never did an on-premise to AWS connection? What is a VPN; you never configured one for your peers? Please configure a bastion host here. Write a serverless function that undoes a security group change made by an invader/hacker ("in the time I send mail, my security is gone"). What is an ingress controller; explain it properly, with all the DNS flow steps. What are the use cases of an internal LB and an application LB; did you use the internal LB for the SQL proxy or the DB flow? What is a message queue (it is such a critical workflow; at least learn the basics of APIs and message queues). What happens when a dev comes and says the API does not work; what steps do you take? What is L in the S3 docs, and why do you need versioning? What is cross functional replication with versioning, say you are 100% hacked now? How do you back up the master (database) across 2 availability zones; describe the DR implementation.
API, flow logs, Lambda, message queues, SNS: he said please learn them. The complete workflow for targeting a mail-list group on a monitoring error such as 100% CPU consumed. ELK, Prometheus, EKS: explore them; workspaces are OK if you have not done them.
Why on earth would you need to do this with Docker?
Are you referring to multi-stage builds?