By - Dry_Back_1116
`COPY`, as the name suggests, copies files into your image. So if you remove the first one, `pip install` will fail (it won't be able to find the `requirements.txt` file), and if you remove the second one as well, `CMD` will fail because `main.py` will be missing.
It's not mandatory to use `COPY` in a `Dockerfile`. You only need it if you want to copy files into your image.
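For context, a minimal Dockerfile along the lines being discussed might look like this (the filenames `requirements.txt` and `main.py` come from the thread; the base image tag and `WORKDIR` are assumptions):

```dockerfile
# Sketch of the kind of Dockerfile under discussion (details assumed).
FROM python:3.12-slim

WORKDIR /app

# First COPY: bring in only the dependency list.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Second COPY: bring in the application code.
COPY main.py .

CMD ["python", "main.py"]
```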
Ahh I see. So yeah, I am building an image and I plan to deploy that image,
so COPY is a must, ah!
Thank you so much
You could put all the `pip install` commands as `RUN` commands instead. I don't recommend it, but it would remove the `COPY` command if you want a standalone Dockerfile. It would also make caching more granular, which could be beneficial.
Huh? Removing the copy of the requirements and its install would create a less efficient image, I believe, assuming that the requirements are stable.
One more thing: I'd pin the pip version. And a question for the Pythonistas: is upgrading pip in a Docker image really necessary?
Many packages won't install using the pip versions that many distros ship. It's pretty common to just upgrade pip after creating a virtualenv.
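If you do want to pin rather than blindly upgrade, one way is an exact version specifier (the version number below is purely illustrative, not a recommendation):

```dockerfile
# Pin pip to an exact, known version instead of upgrading to latest.
# 24.0 is an illustrative placeholder; choose and maintain your own pin.
RUN pip install --no-cache-dir pip==24.0
```

The trade-off is that a pinned pip makes builds reproducible, but you now own the job of bumping the pin when you need newer wheel/metadata support.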
I'm not saying it's a good idea. Just saying it's completely possible.
Mate, just remove it and see what happens. 90% of the work in this is messing around with things.
Are you confused about what COPY does, or why there are two of them?
If it's the first, then I suggest you read the top of the `COPY` command docs, although it's pretty self-explanatory.
If it's the second, then it's because you want to fail early and reduce the number of steps you have to rebuild each time you run your image. Basically, it will only re-copy and re-run steps 4+5 if it detects that your requirements file has changed. This means that after the first time, your build time will decrease significantly.
There are 2 `COPY` commands to make use of Docker's caching for layers.
When `requirements.txt` doesn't change, and the available package versions don't change, it should be faster, because programmers change the code much more often than the dependencies in `requirements.txt`.
You can remove the first `COPY` line and move the second one before the `RUN pip install` line and it should still work, but it will be slower.
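The slower single-`COPY` variant being described would look roughly like this (a sketch; base image and filenames assumed as elsewhere in the thread):

```dockerfile
# Single-COPY variant: because the whole build context is copied before
# the install step, any change to main.py invalidates the COPY layer and
# everything after it, so dependencies are reinstalled on every code change.
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "main.py"]
```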
Side note: consider using `ENTRYPOINT` instead of `CMD`.
Why? I hate those images!
Why would you want to allow users, by 'default', to change which script is running? The goal is to reduce complexity: by allowing users to act on arguments only, they are abstracted away from knowing which executable/script they have to run.
The goal of a container is to do one thing; this one in particular has the goal of running main.py. Why would you want to change its purpose at runtime? Why bother remembering which script it is, especially when the next version could switch to main2.py, thus breaking your deployment?
If absolutely necessary, override the entry point. Otherwise, act on arguments only.
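A sketch of the arguments-only style being advocated here (the filename comes from the thread; the flags and default argument are hypothetical examples):

```dockerfile
# Users never have to name the script; they only pass arguments.
ENTRYPOINT ["python", "main.py"]

# Optional default arguments, replaced by anything given at run time.
# "--help" is a hypothetical default for illustration.
CMD ["--help"]

# docker run myimage                  ->  python main.py --help
# docker run myimage --port 8080      ->  python main.py --port 8080
#   (--port is a made-up flag of the hypothetical main.py)
```

With the exec-form `ENTRYPOINT`, trailing `docker run` arguments are appended to the entrypoint rather than replacing it, which is exactly the "act on arguments only" behavior described above.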
E: of course, tooling images are a different beast. In the Python base image you'd want to use `CMD` only, because you want to run either pip or python. Though I'd argue that the final image shouldn't even have pip; but I doubt there is such an image available :)
I consider entry-point images inferior, because the audience is technical and entry points make it much harder to interact with the image.
I’ve seen too many cases where the “single purpose” that the creator of the image thought of just was not the most important one.
Being able to easily interrogate the current runtime is often more important than running an arguments-only image.
I'd argue that fewer people know how to get into an image with an entry point than know how to run the command for an image without one.
On this account, any abstraction is unnecessary, because 'the audience is technical'. :)
Do as you wish, but I consider final images that force you to know the executable a problem.
A `CMD` does not force me to know it, but it allows me to use a less surprising method to run something else.
The moment you wish to pass additional arguments, you have to specify the executable; and as such, the abstraction is lost. I do understand your point, but let's keep to the facts. :)
Where did I deviate from facts?
This is, after all, a matter of opinion what’s easier to use.
I've almost never had an image where the entry point was enough, and overriding it to get into the image requires more knowledge than not having an entry point does.
A `CMD` provides a good default that runs the directly intended thing, while keeping it easy to get into a shell or run a modified command.
An entry point only makes it easier to add new arguments but nothing else and `docker run … /bin/sh` almost always fails or doesn’t do what’s expected.
An entry point is also harder to reset than a command.
That's why, in my opinion, it's easier to just use a command and type out what should be run whenever you modify the thing that should be run.
Which of these _behaviors_ are factually false?
Whether it is easier is up for debate, but the behavior described (getting into the image, or resetting the entry point in a subsequent layer) is more complex with entry points than with CMDs.
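The two behaviors being argued over can be sketched as docker CLI invocations (the image name and script names are placeholders):

```shell
# Image built with only:  CMD ["python", "main.py"]
# Trailing arguments REPLACE the CMD entirely:
docker run -it myimage /bin/sh            # drops into a shell
docker run myimage python other.py        # runs a different script

# Image built with:  ENTRYPOINT ["python", "main.py"]
# Trailing arguments are APPENDED to the entrypoint, so this fails:
docker run -it myimage /bin/sh            # -> python main.py /bin/sh
# Getting a shell requires resetting the entrypoint explicitly:
docker run -it --entrypoint /bin/sh myimage
```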
> A CMD does not force me to know
It does, as soon as you try to pass any argument. That (single) thing is factually false. The rest is a matter of opinion, especially considering that the spec does not recommend one over the other.
> I’ve almost never had an image where the entry point was enough and overriding to get into the image requires more knowledge that not having an entry point.
By far most deployments in k8s/helm workloads, and in any containerized application (not a tool!), require no arguments. The one I actually remember was MSSQL 2012, to enable error logs. Not one required a change to the entrypoint.
> An entry point is also harder to reset than a command.
I don't get this point. Varargs vs. a parameter; as if one is 'harder' than the other...
What's the point of
`RUN pip install --no-cache-dir -U pip`?