to be honest this is a subject I never thought I should have done, thank you, this was great and very helpful
Glad you found it insightful!
Liked the clarity of explanation.
Thanks raghv dua for your informative tutorial on reducing docker image size
At 9:46 I'm pretty sure that the duration was actually 75.4 seconds, as can be seen at the top, or 76.18 seconds (aka 1 minute 16.18 seconds) as measured by the time command. The 0.47s user means that only about half a second of CPU time was spent in user mode (userland). And we can see that the image is 266 MB in size.
And at 11:20 we can see that the one-layer build took 71.8 seconds as measured by docker, or 72.38 seconds as measured by time. So 3-4 seconds faster, which is not much, but it's still a gain. The multi-layer docker build also didn't have THAT many instructions to begin with. If you have a 30-instruction build (I've seen them), I think it should help more.
The single-layer image can also be seen to be 262 MB, which is only 4 MB smaller than the multi-layer one.
To be frank, I kind of don't understand why multiple layers would need more space (except some minimal space needed for the layers and their metadata, which should be negligible).
However, if I'm not mistaken, having fewer layers has other benefits, like faster I/O in general, since something like a find command run inside the container doesn't have to check multiple layers, aka multiple file systems. So while the build time and space might not be much better with the single-layer build, it should run faster/better, and that should be enough of a reason to do it.
The scratch idea is pretty neat, totally didn't think about that one. Good mention that you don't have bash and probably not even things like cd, ls and so on. For something like PHP, which is the main thing I use, I'm not sure how easy it would be to do and especially how useful. I'm also not so sure what the difference is compared to simply running that executable directly on the host.
Also, it might've been nice to show the contents of the images in the different scenarios. But it's a very nice video, decently detailed and very on point. Keep it up!
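The numbers quoted in this comment can be checked with a few lines of arithmetic. This is only a minimal sketch in Python: the `saving` helper is a hypothetical name introduced here, and the inputs are just the figures the commenter read off the video.

```python
def saving(before: float, after: float) -> tuple[float, float]:
    """Return (absolute saving, saving as a percentage of `before`)."""
    diff = before - after
    return diff, diff / before * 100


# Figures quoted in the comment: (multi-layer, single-layer).
comparisons = {
    "build time reported by docker (s)": (75.4, 71.8),
    "build time reported by time (s)": (76.18, 72.38),
    "image size (MB)": (266, 262),
}

for label, (multi, single) in comparisons.items():
    diff, pct = saving(multi, single)
    print(f"{label}: saved {diff:.2f} ({pct:.1f}%)")
```

On these figures the single-layer build comes out roughly 5% faster and about 1.5% smaller, which fits the "not much, but still a gain" reading.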
Thanks for such detailed feedback!
- My bad, I read the time information incorrectly. Should've paid more attention to that, thanks for pointing it out!
- Agree, layers don't make that big of a difference. I only recommend them because they are low-hanging fruit. If they were even slightly hard to implement, they would not be worth the effort.
- Scratch & Multilayer gets more challenging with interpreted languages like PHP, and then we need to explore converting our scripts into executables (eg- Py2exe for Python)
- It is still beneficial to run your app inside a scratch container rather than on the host, because the container still provides all the benefits of process isolation (no other processes on the system can interfere with your app, the user doesn't have to bother about downloading the right executable like Intel or AMD, Win/Mac, etc)
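On the question raised in this exchange about why extra layers would cost space: one common reason is that image layers are additive, so a file written in one layer and deleted in a later one still occupies space in the earlier layer. The toy model below is only a sketch of that idea in Python. The file names and sizes are made up, both helper functions are hypothetical, and it does not claim to explain the 4 MB difference seen in the video.

```python
def layered_size(layers):
    """Total stored size when every layer is kept: deletions reclaim nothing."""
    return sum(
        size
        for layer in layers
        for size in layer.values()
        if size is not None
    )


def squashed_size(layers):
    """Size of the final filesystem view, as if all steps ran in one layer."""
    view = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                # None marks a deletion made by this step.
                view.pop(path, None)
            else:
                view[path] = size
    return sum(view.values())


# Made-up example (sizes in MB): step 1 installs a tool and leaves a
# package cache behind, step 2 deletes the cache.
layers = [
    {"/var/cache/pkg": 50, "/usr/bin/tool": 10},
    {"/var/cache/pkg": None},
]

print(layered_size(layers), squashed_size(layers))  # prints: 60 10
```

In this model the cleanup only pays off when it happens in the same step as the install, which is what combining commands into one instruction achieves.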
Nice video, I followed the same steps a year ago while building a container image for deployment in AWS Lambda. But scratch is a new thing I learned today.
And storing already built tools in external storage like S3 is a great way to improve the build time of the docker image. I have uploaded a static binary of ffmpeg and dynamically linked binaries of the poppler-utils package to improve build times.
Awesome video!
apt-get is the legacy way. Just using apt update and upgrade will fetch the latest repo metadata and upgrades. You can use docker init to create multistage and multilayer builds with caching. Another alternative is to use distroless images from Google container or passenger if you want a very minimal Ubuntu base image of 8MB.
Agreed. I almost always prefer using Distroless over scratch - it's very little overhead
It works well for compiled languages like Go and Java, but what about Node.js? Alpine also has its own limitations for the HTTP protocol, which depends on a C++ package, so performance is lower
Exactly my thoughts - I'm working with python, but it's the same (interpreted language - you need all that code and the complete python environment including dependencies)
You're right, size becomes a much bigger challenge in case of interpreted languages.
Then we need to explore other techniques such as compression or, if possible, compile the application into an executable (eg- py2exe does this for python programs).
I'm actually trying to research more ways to reduce the size for interpreted languages.
Hi
I am unable to run a Python (pytest) based script using the python-alpine image, any idea?
Most likely you're missing some OS-level dependencies which are not present in Alpine. Check the dependencies of pytest itself, and then you can install them yourself on top of Alpine
Hope you guys learnt something interesting from this video 🙂
Creating good quality content while also having a Day Job is hard - I mostly spend nights and weekends doing this stuff.
So I'm also looking for another person who could take out some time and help me with the content (obviously, this comes with perks 😉)
If you'd be interested, do reach out to me - sre101collabs@gmail.com
this was amazing!
please do another video for reducing build time
Thanks! Yes, I've been thinking about this too. Hopefully I'll publish one very soon :)
This is an interesting approach. Can we call multi-stage builds pre-processed builds?
I think we can call all the stages (except for the last one) as pre-processed, yes.
The last stage, however, is the final image so I wouldn't call it pre-processed.
@@sre101 got it! Absolutely, makes sense, thank you!
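The distinction drawn in this exchange, where every stage except the last is an intermediate ("pre-processed") stage and the last stage is the final image, can be sketched in Python. Everything below is illustrative rather than taken from the video: the Dockerfile text, the `golang:1.22` tag, the stage name `build`, and the `list_stages` helper are all assumptions made for this example.

```python
# A hypothetical two-stage Dockerfile, held as text so the example is
# self-contained. The first stage compiles; the second is the final image.
DOCKERFILE = """\
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
"""


def list_stages(dockerfile: str):
    """Return (base_image, stage_name) for each FROM line, in order.

    Simplified on purpose: it ignores FROM flags such as --platform.
    """
    stages = []
    for line in dockerfile.splitlines():
        parts = line.split()
        if parts and parts[0].upper() == "FROM":
            has_name = len(parts) >= 4 and parts[2].upper() == "AS"
            stages.append((parts[1], parts[3] if has_name else None))
    return stages


# Every stage but the last is intermediate; the last one is what ships.
*intermediate, final = list_stages(DOCKERFILE)
print("intermediate stages:", intermediate)
print("final stage:", final)
```

Only the final stage ends up in the image that gets shipped, so the compiler and source code used in the `build` stage are left behind.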
Which tool do you use for creating these flow diagrams ??
this one is with excalidraw.
I also use draw.io a lot and sometimes Canva
Amazing content, please can you make videos on how we can deploy docker on server?
Thank you Alex!
If you're looking to deploy docker containers on servers using Kubernetes, I have a video for that -> ua-cam.com/video/euqlrMXDKDY/v-deo.html
But if you're looking for something else, feel free to share more details with me and I'll look into it
Any solution for reducing compile time during development in Next.js?
I've never worked with Next.js so I can't recommend something specific to it. But in general, dependency caching is super helpful in such cases.
Thank you 🙂
How should it work with runtime dependencies?
If your runtime dependency is a binary file (eg- DLLs in Windows), you can simply include that in the image as well.
If you're talking about interpreted languages, then sadly we do need to include their dependencies in the image, which bloats the size.
Do people include source code and build tools in their final images??? I don't think it's common.
I don't expect anyone to include these things intentionally.
But I've seen plenty of production images out there which use, for example, Golang image as their base image.
This image includes things like Go compiler, standard library, etc which is just adding bloat and shouldn't be part of production images IMO.
If it's an interpreted language.
@@putnam120 Correct. In case of interpreted languages, there's no choice but to include the dependencies (unless you can compile your application into an executable. eg- Py2exe)
Is it good practice to use alpine in production?
I would highly recommend Alpine for production. Not only does it reduce the image size, it also decreases the chances of security issues by reducing the attack surface (the less data and the fewer apps you have on your system, the fewer things can be exploited)
@@sre101 have you heard about Bitnami images? Have you ever used them?
@@igorcastilhos I'm aware of them, but never used them in production. I would almost always prefer to use an official image instead, though they're a very trusted brand
thanx
Great video. Wasn't aware about Scratch. TIL
Glad you found it useful brother!
How about Node.js? As we all know, Node.js is an interpreted language, which means we still need all the node modules. That's different from Golang, where we can remove all the pre-build files because we just need the executable file
You're right, size becomes a much bigger challenge in case of interpreted languages.
Then we need to explore other techniques such as compression or, if possible, compile the application into an executable (eg- py2exe does this for python programs).
I'm actually trying to research more ways to reduce the size for interpreted languages.
Great
thanks
Video on eks
I'm thinking of a project on EKS, which is beginner-friendly.
If you would like me to cover any specific issues, let me know
How it would be for a Django project or Fastapi project?
The problem with interpreted languages is that you need to include all their dependencies. Multistage & scratch methods are more effective with compiled languages.
So probably compression strategies work better when working with Python
Noice!
Toight!
Mirror your face cam so you are looking inside the video, not out of the video.
That's a very good point, noted!