The problem with Docker and node modules for Node.js development
August 12, 2022
TL;DR - update August 2022
I wanted a more convenient way to work with Node.js and Docker - specifically to use Docker without having to use the node_modules volume mount workaround in order to keep host and container modules usable (so the host could use @types and the container could run the app). I found a way which worked in principle, but after a short time of using it in the wild, I realised it was too slow and ran into issues with permissions on some machines. The rest of this post explains that different approach, and why it should work well in principle.
The issue with cross platform Docker for Node.js development
I now have 3 machines I use to do dev work - my main MacBook, a Windows PC, and a Raspberry Pi. I want a dev system that can run truly cross platform across all three, so I decided to stick with Docker, as this is what Docker is meant to be built for.
When I was developing solely on my MacBook, speed was not too much of a problem. I would set up my Docker process so that I’d be required to re-build the whole image each time I installed a new dependency or my Prisma schema changed. This only took a minute or so each time, but after moving to my less powerful Windows machine and then eventually my Raspberry Pi, the build time was 4 or 5 minutes. I decided to look for a new way to use Docker which was more efficient, and this is what I decided on!
First, I’ll cover exactly what my previous setup was. It’s a massively widely used setup, documented on most of the “Hello world” tutorials, and for good reason - it works for the majority of use cases for a dev with a semi-decent machine:
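# Node on Alpine Linux as the base image
FROM node:12.12.0-alpine
# Directory the API code lives in (matches the volume paths in the compose file)
WORKDIR /home/app/api
# Copy the manifests first so the install layer can be cached
COPY package.json ./
COPY package-lock.json ./
RUN npm install
# Copy the rest of the source code
COPY . ./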
First the Dockerfile pulls down Node Alpine. This is Node installed on Alpine Linux, a very lightweight Linux distribution, which is used as the base image. This flavour of Linux is apparently very secure, and uses musl as its C library, unlike many other Linux distributions, which use glibc. I mention this because it was an important part of the eventual Docker setup I ended up with, so more on that later.
Then I set the WORKDIR of the container we’re building to be /home/app/api. This is the place where my API code will sit, and what the rest of the Dockerfile will use as the current directory.
The package.json and package-lock.json files are then copied over to the container, npm install is run (installing the node modules on the container), and finally the rest of the project source code is copied over. This is an important note - node modules are installed on the container as a best practice for compatibility reasons. As they are being installed on our Alpine Linux distro, moving the container from Mac to Windows to Raspberry Pi should be seamless.
That’s it for creating our image, but in order to use the image in a usable state, it needs to be run as a container with Docker. You can do this with plain Docker CLI commands, but I prefer to use Docker Compose for most projects. Here’s my Docker Compose file:
version: "3"
services:
  api:
    build: ./api
    command: sh -c "npm run dev"
    ports:
      - 3001:3001
    volumes:
      - ./api:/home/app/api
      - /home/app/api/node_modules
    depends_on:
      - db
  db:
    image: postgres
    ...
The compose file sets up the API, client, and database, but I’ve taken most of the other bits out to keep the file simple. The focus is on the command and volumes parts of the file.
The command section is pretty simple: when the container starts, it uses sh (instead of the usual bash, which is not installed on Alpine Linux) to run our start script inside the container. This is the whole point of using Docker - so we can install packages on a standardised platform.
The volumes section does two things - firstly it takes the directory on our host machine at ./api (which is where our current code is), and mounts it inside the container at /home/app/api (which is where the equivalent code was copied to during the build of our Dockerfile earlier). This synchronises the code between the host and container, and gives us the ability to have Node.js restart on save inside the container, amongst other things.
The problem with having just this line ./api:/home/app/api is that the host machine’s node modules are also mounted in the container along with the rest of the ./api directory. This hides the node modules in the container, and replaces them with the node_modules directory on the host machine. There are two issues with this:
- It makes the project less compatible, as there may be a mismatch in underlying libraries used deep within a certain package that was installed by my MacBook, but is not usable on the Linux OS in the container (native bindings compiled for macOS will not load on Linux). One such package that comes to mind is node-sass - many hours have been spent on compatibility issues with that one.
- It means in order to get a working app, we must install the dependencies in the container when building (in the Dockerfile), and again on the host machine in order to get IDE benefits like typings and autocompletion.
The node modules workaround for preserving container dependencies
This is where the node modules workaround on the second volumes line in the docker-compose file comes in. The second part of the volumes section (/home/app/api/node_modules) addresses problem 1 outlined above by creating a volume for the node modules folder in the container. With this anonymous volume in place (which is not used at all in the rest of the project or workflow), the container’s node modules are preserved, and not “overridden” by the host machine’s node modules. This means the container uses its own node modules, which helps avoid cross platform problems.
That’s the basic setup outlined, and it works mostly well - the Docker image is created by running docker compose build, which installs dependencies inside a Node Alpine environment, whether it’s run on a MacBook or a Raspberry Pi. Then later, once the modules are installed and the image is created, we use Docker Compose with docker compose up to start a container from that image, running the code inside the container (for compatibility), and ensuring that the packages in the container are not hidden by the host’s node modules.
So what’s the problem with the above setup?
Even with the node modules workaround helping with cross compatibility, in order to get a working app, we must now install the dependencies in the container when building (in the Dockerfile), and again on the host machine in order to get IDE benefits like typings and autocompletion. Alternatively, the dependencies can be installed on the host machine first, but then the whole image would need re-building to update the container dependencies, so it’s still less than ideal.
This problem may not even be a problem for most people, but as the build takes so long on a less powerful device such as my Raspberry Pi, I wanted to find a more efficient way of working. Especially since I started using Prisma. Installing new dependencies and faffing around with the occasional
npm install wasn’t so bad, requiring rebuilding Docker images now and then, but it became cumbersome when working with Prisma.
Prisma is different to other ORMs such as Sequelize or Objection as it installs its generated files (types for TypeScript, and logs) inside the node_modules directory. This means that when anything to do with the database schema is changed, such as a new field or table added, the command prisma generate needs to be run. This command can be run either on the host machine, OR we can sh in to the container and run it there (similar to installing new node_modules), but there are downsides:
- You can sh in to the container with docker exec -it <container_name> sh and run prisma generate there, but it means the node_modules is only updated in the container and the host machine doesn’t have the updated Prisma types and other files. You could also run it on the host as well, but running the same command in two locations each time is a pain.
- Running prisma generate on the host machine is possible too, but it means having to re-build the Docker image as well - re-installing all the dependencies from the Dockerfile again.
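To make the first downside concrete, here is a minimal sketch of a helper that runs prisma generate in both places. It assumes the Compose service is called api (as in the compose file above), that the containers are already running, and that npx is available on the host. The function names run and regenerate_prisma are hypothetical and not part of the original setup.

```shell
# Hypothetical helper: regenerate the Prisma client on the host AND in the container.
# Assumes a running Compose service named "api" and npx on the host.

# Run a command, or only print it when DRY_RUN=1 (handy for checking the script).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

regenerate_prisma() {
  # 1. Host copy: keeps editor typings and autocompletion up to date.
  run npx prisma generate || return 1
  # 2. Container copy: keeps the running app's node_modules up to date.
  run docker compose exec api npx prisma generate
}
```

Calling regenerate_prisma after each schema change would do the job, but it still runs the generator twice, which is exactly the duplication described above. Setting DRY_RUN=1 first prints the two commands without executing them.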
I’d originally posted this solution below as my chosen solution, but I’ve since abandoned it because it was causing too many issues with compatibility on my Raspberry Pi and Windows machines.
I had a good Google session to find out how others have approached this problem, and came across some great resources that are worth listing.
I came across this very helpful Stack Overflow post which explained the issue I was trying to overcome perfectly. This person’s question outlines 3 common approaches to running Node apps with Docker, and their Approach 2 is the same one I was using, and that’s outlined in this post.
The top answer (at time of writing) links off to this GitHub project which sort of worked, but requires that the node modules are installed locally before or after building the image. Without doing this, the host modules directory would be empty and lose the IDE autocompletion and type support. My ideal approach would mean I could run the project without even having npm installed on the host machine.
I would likely have npm installed on the host machine in all cases, but still I wanted the project to be as cross platform as possible.
I then found this equally helpful Stack Overflow post which has a question specifically about keeping node_modules in sync between the host and container environments. The person asking the question answered the question later, with the approach I ultimately started using.
Here’s my new setup, with a few changes from the one in the SO post mentioned above:
FROM node:16
RUN apt-get update -y
RUN apt-get install -y rsync
WORKDIR /home/api/cache
COPY package.json ./
COPY package-lock.json ./
COPY prisma ./prisma/
RUN npm install
RUN npx prisma generate
WORKDIR /home/api/app
First I pull down the full Node image to use as a base, instead of just the Alpine based version. The standard Node image comes with a load of common Debian Linux packages that help with compatibility across multiple machines.
Next, rsync is installed. This is used later when syncing up node modules as outlined in the SO post earlier. The SO post answer uses cp, but that command takes a long time and copies everything again each time the container is started. rsync is great because it skips files that have not changed, so as long as no files change in the node_modules directory, the process takes seconds instead of minutes.
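The difference is easy to see with rsync’s --itemize-changes flag, which lists each item rsync actually updates. The sketch below wraps the same kind of copy in a hypothetical sync_modules function (the name is illustrative and not from the original setup), and assumes rsync is installed.

```shell
# Hypothetical wrapper: copy the contents of one directory into another with rsync.
# -a preserves timestamps (among other things), so unchanged files can be skipped next time.
# --itemize-changes prints one line per item that rsync actually updates.
sync_modules() {
  rsync -a --itemize-changes "$1"/. "$2"
}
```

Running sync_modules /home/api/cache/node_modules /home/api/app/node_modules twice in a row should list every file on the first run and print nothing on the second, because nothing has changed in between.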
Then a temporary cache directory is created in the container, which the package.json and package-lock.json files are copied over to and used to npm install dependencies on the container. Installing the modules here means they don’t get hidden when the host code mounts on to the container later as defined in the Docker Compose file. The dependencies are kept safe in the container, and copied over to the proper directory later with rsync, where they can be used by the running application in the container, and by the host machine due to the volume mounting all of the code to the container.
The prisma files are also copied over to the temporary/cache location on the container when building the image. prisma generate is run in this temp location too, generating the types etc which are placed in the node modules directory before being transferred over to the “proper” location using the rsync command mentioned above.
Here’s the Docker Compose file:
version: "3.8"
services:
  api:
    # assumed: same build context as the earlier compose file (a service needs a build or an image)
    build: ./api
    user: node
    command: sh -c "rsync -arv /home/api/cache/node_modules/. /home/api/app/node_modules && npm run dev"
    ports:
      - 3001:3001
    volumes:
      - ./api:/home/api/app
    depends_on:
      - db
  db:
    image: postgres
    ...
Permissions issues with rsync
If running on a Mac, permissions won’t pose a problem, but if running on a Windows or Linux host machine, permissions can cause a headache due to Prisma types etc. being inside the node_modules directory.
Files inside a container have different permissions depending on whether the host machine is running Windows, Mac or Linux. On a Windows host machine for example, all files inside the container are owned by root. And on a Linux machine files have the same ownership as outside the container - for example on my Raspberry Pi all files are owned by node, which is the non-root user that the official Node image provides. This can be a problem because when running prisma migrate dev (which is run separately from the Dockerfile - manually when the database schema changes), those files need to be read and updated.
Ideally we want the node modules directory in the container to be owned by the node user (or any user that’s not root, for security reasons), and set that user to be the default user within the container. This is done by adding the user: node option in the Docker Compose file. By setting this as the default user, the files copied from the /home/api/cache directory end up owned by the node user automatically, as the rsync file transfer moves them over as that user. The rsync runs in the container on docker compose up, so it moves them over according to its own privileges.
Note: when messing with permissions and node_modules, remember to delete the node_modules directory before checking the effect of changes made to the Docker setup, as I found the permissions don’t change otherwise. Also, removing user: node from Docker Compose will cause the node_modules to be transferred over to the end location as the root user, and this can’t be changed back by updating the compose file - a re-build will be needed.
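When debugging this, it helps to check who actually owns node_modules from inside the container. The following is a small sketch using only standard tools (ls, awk, id). The function names owner_uid and owned_by_me are hypothetical helpers, not part of the original setup.

```shell
# Print the numeric owner (uid) of a file or directory.
# ls -ldn lists the entry itself with numeric ids; the uid is the third column.
owner_uid() {
  ls -ldn "$1" | awk '{ print $3 }'
}

# Succeed (exit status 0) when the current user owns the given path.
owned_by_me() {
  [ "$(owner_uid "$1")" = "$(id -u)" ]
}
```

For example, after opening a shell in the container with docker compose exec api sh and pasting in the two functions, running owned_by_me /home/api/app/node_modules || echo "owned by someone else" shows whether the current user owns the synced modules.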
Thanks for reading
As mentioned earlier, I originally finished writing this with an end solution in mind, but after a time of working with the
rsync solution, there were just too many issues to keep it a viable option.
Senior Engineer at Haven