I Thought I Understood Containers. Then I Tried Building One.

How building from scratch reveals what the textbooks skip

The Gap Between Knowing and Building

You can ace a Docker exam and rattle off the right words (namespaces, cgroups, images, layers, PID 1, Kubernetes Pods) and still have no idea what you're doing when you actually try to build a container. That's the lesson from a developer who recently shared their story on the DEV Community, and it's a reminder that theory and practice live in different worlds. It matters because containerization is everywhere: Docker, Kubernetes, and a thousand deployment systems rest on concepts that feel abstract until you bang into them in real time.

The First Command: A Humbling Start

It started simple: try to run a process in a new namespace using the unshare command. The first attempt was:

```bash
sudo unshare -p 1 test
```

Error: unshare: failed to execute 1: No such file or directory

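The flags were wrong. The unshare command runs its first non-option argument as the program, and -p (short for --pid) does not take the following 1 as a value, so the command asked the system to execute a program called "1", which doesn't exist. Before building anything, before even reaching the real problems, the keyboard and the documentation had different ideas about what was supposed to happen.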

Part 1: Namespaces and PID 1

The first real attempt was to run a process in a new PID (process ID) namespace and prove it saw itself as PID 1, the root of the process tree. The command was:

```bash
sudo unshare --pid bash
```

Then inside that shell:

```bash
echo $$
```

Expected output: 1. Actual output: 25184.

It didn't work. That wasn't PID 1; it was the shell's ordinary PID as numbered by the host. The rule is simple but counterintuitive: a new PID namespace applies to the children the calling process creates afterwards, not to the process that calls unshare. You need to fork. The first child born into the new namespace becomes PID 1.

So the working version was:

```bash
sudo unshare --pid --fork bash
echo $$    # typed inside the new shell
```

Now the output was 1. The shell thought it was the root of the process tree. Everything felt different from the inside.
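To see the fork rule side by side, here is a minimal sketch that runs both variants non-interactively. It assumes util-linux unshare; the function names are hypothetical, and when not run as root it also requests a user namespace (--user --map-root-user), which only works where unprivileged user namespaces are allowed.

```bash
# Sketch: compare $$ with and without --fork in a new PID namespace.
# Assumption: util-linux unshare. Non-root callers also request a user
# namespace, which some kernels and container runtimes forbid.
userns_flags() {
  [ "$(id -u)" -eq 0 ] || echo "--user --map-root-user"
}

pid_without_fork() {
  # unshare calls unshare(2) and then execs sh in the same process, so sh
  # keeps its host-side PID; only children created later join the namespace.
  unshare $(userns_flags) --pid sh -c 'echo $$'
}

pid_with_fork() {
  # --fork: unshare forks first, and that child (which then execs sh) is the
  # first process in the new namespace, so $$ is 1.
  unshare $(userns_flags) --pid --fork sh -c 'echo $$'
}

if pid_with_fork >/dev/null 2>&1; then
  echo "without --fork: $(pid_without_fork)"
  echo "with --fork:    $(pid_with_fork)"
else
  echo "unshare is not permitted here; try again as root" >&2
fi
```

Without --fork you should see an ordinary host PID, like the 25184 above; with it, 1.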

Then came the next surprise. Running ps from inside showed:

```
  PID  PPID COMMAND
25310 25304 bash
25344 25310 ps
```

But the shell said it was PID 1. That didn't make sense. The revelation: ps doesn't ask the kernel a pure "what processes exist?" question. It reads files under /proc. If /proc is still the host's proc mount, your tools will lie to you: they show the host's numbering scheme.
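A tiny stand-in for ps makes the point concrete. This is a sketch, not how procps is implemented: it assumes Linux procfs, where each numeric directory under /proc is a process and /proc/PID/stat starts with "pid (comm) state ppid". The function name mini_ps and its optional directory argument are hypothetical.

```bash
# Sketch: a tiny "ps" that, like the real one, only reads files under /proc.
# Assumption: Linux procfs; a comm containing ")" is cut short (ppid is still
# parsed correctly because the last ") " is used to find the fields after it).
mini_ps() {
  proc=${1:-/proc}
  printf '%5s %5s %s\n' PID PPID COMMAND
  for dir in "$proc"/[0-9]*; do
    [ -r "$dir/stat" ] || continue
    # A process may exit between the glob and the read; skip it if so.
    stat_line=$(cat "$dir/stat" 2>/dev/null) || continue
    pid=${stat_line%% *}
    comm=${stat_line#*"("}
    comm=${comm%%")"*}
    rest=${stat_line##*") "}      # "state ppid pgrp ..."
    ppid=$(printf '%s\n' "$rest" | cut -d' ' -f2)
    printf '%5s %5s %s\n' "$pid" "$ppid" "$comm"
  done
}

mini_ps
```

Run inside the unshared shell before remounting /proc, this lists the host's processes with host PIDs, exactly like ps did: whatever is mounted at /proc decides what the tools see.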

The fix was to remount /proc from inside the namespace:

```bash
mount -t proc proc /proc
ps -o pid,ppid,comm
```

Now it showed:

```
  PID  PPID COMMAND
    1     0 bash
    7     1 ps
```

That was the moment it clicked. The namespace provided real isolation, but the tools couldn't see it until the filesystem view changed. Isolation and visibility are different things.
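One caveat worth adding: mounting proc on /proc without also creating a mount namespace replaces /proc for every process that shares the current mount table. util-linux unshare has --mount-proc for this; per its man page it implies a new mount namespace. The sketch below assumes util-linux unshare, uses a hypothetical function name, and falls back to a user namespace for non-root callers, which some kernels and container runtimes refuse.

```bash
# Sketch: new PID namespace plus a private, freshly mounted /proc.
# --mount-proc implies --mount, so the /proc seen outside stays untouched.
userns_flags() {
  [ "$(id -u)" -eq 0 ] || echo "--user --map-root-user"
}

pids_seen_inside() {
  # echo and the glob are shell builtins, so no extra process is forked:
  # inside the namespace only the shell itself (PID 1) exists.
  unshare $(userns_flags) --pid --fork --mount-proc sh -c 'echo /proc/[0-9]*'
}

if out=$(pids_seen_inside 2>/dev/null); then
  echo "inside:  $out"    # expected: /proc/1
fi
echo "outside: $(ls -d /proc/[0-9]* | wc -l) processes"
```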

The UTS namespace, which controls the hostname, was cleaner to understand. Running hostname on the host showed one name. Inside a new UTS namespace (created with sudo unshare --uts bash), changing it showed something different. Back on the host, the original name was untouched. One machine, one kernel, three different views.
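The same experiment as a script, under stated assumptions: util-linux unshare, the hostname command available, and "demo-box" as an arbitrary example name. As in the PID sketch, non-root callers fall back to a user namespace, which may be forbidden.

```bash
# Sketch: change the hostname inside a new UTS namespace only.
userns_flags() {
  [ "$(id -u)" -eq 0 ] || echo "--user --map-root-user"
}

hostname_inside() {   # hostname_inside NEWNAME
  unshare $(userns_flags) --uts sh -c 'hostname "$1" && uname -n' sh "$1"
}

before=$(uname -n)
if inside=$(hostname_inside demo-box 2>/dev/null); then
  echo "inside:      $inside"
fi
echo "host before: $before"
echo "host after:  $(uname -n)"    # same as before: the change stayed inside
```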

Part 2: The Filesystem Handoff

After namespaces, the next version was supposed to give the process its own filesystem: a rootfs (root filesystem) with BusyBox and a shell. Very container-ish.

The first error was straightforward:

exec /bin/sh: no such file or directory

The shell wasn't where the script said it would be. That got fixed. But then:

./rootfs/bin/busybox: cannot execute: required file not found

This error is cruel because the file is right there. You can list it. You can see it. The kernel still refuses to run it. Using the file command revealed the truth:

ELF 64-bit LSB pie executable, ARM aarch64, dynamically linked, interpreter /lib/ld-musl-aarch64.so.1, stripped

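The binary was there. The interpreter it needed, the dynamic linker, wasn't available from the old world: the kernel resolves the interpreter path /lib/ld-musl-aarch64.so.1 against the current root, and the Ubuntu root had no such file. Linux wasn't saying the file doesn't exist. It was saying "from here, I cannot load the interpreter this ELF needs."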
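The file command reads that interpreter path from the binary's PT_INTERP program header. A minimal sketch of the same lookup in plain shell follows, assuming a 64-bit little-endian ELF on a little-endian host (x86_64 or aarch64) and an od that supports -t u1/u2/u4/u8, as GNU coreutils does. The helper names are hypothetical.

```bash
# Sketch: read the PT_INTERP path an ELF binary asks for, then check whether
# that path exists under a given root directory.
read_uint() {   # read_uint FILE OFFSET SIZE -> unsigned integer
  od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

is_elf64() {    # magic bytes \177ELF followed by EI_CLASS = 2 (64-bit)
  [ "$(od -An -c -N 4 "$1" 2>/dev/null | tr -d ' \n')" = '177ELF' ] &&
    [ "$(read_uint "$1" 4 1)" -eq 2 ]
}

elf_interp() {  # print the interpreter path; fail if there is none
  f=$1
  is_elf64 "$f" || return 1
  phoff=$(read_uint "$f" 32 8)      # e_phoff: program header table offset
  phentsize=$(read_uint "$f" 54 2)  # e_phentsize: bytes per program header
  phnum=$(read_uint "$f" 56 2)      # e_phnum: number of program headers
  i=0
  while [ "$i" -lt "$phnum" ]; do
    ph=$((phoff + i * phentsize))
    if [ "$(read_uint "$f" "$ph" 4)" -eq 3 ]; then   # p_type 3 = PT_INTERP
      off=$(read_uint "$f" $((ph + 8)) 8)             # p_offset
      size=$(read_uint "$f" $((ph + 32)) 8)           # p_filesz, includes the NUL
      dd if="$f" bs=1 skip="$off" count=$((size - 1)) 2>/dev/null
      echo
      return 0
    fi
    i=$((i + 1))
  done
  return 1      # no PT_INTERP: a static binary needs no interpreter
}

interp_present() {  # interp_present ROOT FILE
  interp=$(elf_interp "$2") || return 0   # nothing to check
  # Caveat: if that path is an absolute symlink, [ -e ] follows it from the
  # current root, not from ROOT.
  [ -e "${1%/}$interp" ]
}

interp_present / /bin/sh && echo "/bin/sh can find its interpreter here"
```

On the BusyBox from the text, elf_interp should print /lib/ld-musl-aarch64.so.1, so interp_present would fail with ROOT set to / on the Ubuntu side but succeed with ROOT set to the Alpine rootfs directory.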

The fix wasn't to make BusyBox static. It was to make Alpine the new root filesystem so the interpreter would be at the right path. But first, the transition needed a bridge: a tool that could run before and during the filesystem switch. So BusyBox was used in two forms: a dynamic one inside Alpine, and a static one for the handoff:

```bash
/bin/busybox pivot_root . put_old
```

The static BusyBox was the key that could execute pivot_root before the system fully switched worlds.

After Alpine became the new root, more surprises appeared. Bash still remembered command paths from the old filesystem. When it ran mount, it went straight to /usr/bin/mount, a path from the world that had just been evicted:

bash: /usr/bin/mount: No such file or directory

The fix was hash -r, which clears the command cache. Bash had made a decision in the old world and couldn't let it go.
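The same trap can be reproduced safely in a scratch directory. This sketch assumes bash; fakemount and the old/new directories are hypothetical stand-ins for /usr/bin/mount in the evicted root and its new location.

```bash
# Sketch: reproduce bash's stale command hash, then clear it with hash -r.
demo=$(mktemp -d)
mkdir "$demo/old" "$demo/new"
printf '#!/bin/sh\necho "ran $0"\n' > "$demo/old/fakemount"
chmod +x "$demo/old/fakemount"

PATH="$demo/old:$demo/new:$PATH"   # assigning PATH also empties the hash table
fakemount                          # bash searches PATH once and remembers the result
type fakemount                     # reports: fakemount is hashed (.../old/fakemount)

# "Evict" the old world: the command now exists only in the new directory.
mv "$demo/old/fakemount" "$demo/new/fakemount"
fakemount || echo "stale hash: bash still tried .../old/fakemount"

hash -r                            # forget every remembered path
after=$(fakemount)                 # a fresh PATH search finds .../new/fakemount
echo "$after"
rm -rf "$demo"
```

Bash also has shopt -s checkhash, which makes it verify that a hashed path still exists before using it; hash -r is the one-off reset.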

Part 3: The Mac Complication

The setup wasn't a normal Linux laptop. It was Apple Silicon Mac → privileged Ubuntu container → repo mounted from macOS. That meant virtiofs (a filesystem passthrough for virtualization) was involved, whether anyone wanted it or not.

Inside Alpine, which was now the root filesystem, executing through symlinks could fail with "Permission denied" on the Mac-shared mount, while calling BusyBox directly worked:

```bash
ls                # sh: ls: Permission denied
/bin/busybox ls   # works
```

The files were there. Executing through those symlinks was the weird part. The fix was boring and correct: move the rootfs to a container-native path and try from there. Don't force a Mac-shared mount to behave like normal Linux.
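A minimal sketch of that move, assuming GNU coreutils (stat -f, cp -a, readlink); the function names and the paths you pass are hypothetical. It prints the filesystem type on each side of the copy, which is a quick way to confirm the rootfs really left the shared mount; the exact type names printed depend on the coreutils version.

```bash
# Sketch: copy a rootfs off a host-shared mount onto container-native storage.
fs_type() {   # print the type of the filesystem that holds PATH
  stat -f -c %T "$1"
}

copy_rootfs() {   # copy_rootfs SRC DST
  src=$1
  dst=$2
  echo "source filesystem: $(fs_type "$src")"
  mkdir -p "$dst"
  # -a keeps symlinks such as bin/ls -> /bin/busybox as symlinks,
  # along with modes, ownership and timestamps.
  cp -a "$src/." "$dst/"
  echo "target filesystem: $(fs_type "$dst")"
}
```

For example, copy_rootfs ./rootfs /var/lib/mini-container/rootfs, then point the rest of the setup at the new path.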

Part 4: pivot_root Has Opinions

Even after all that, pivot_root itself wasn't done teaching. The error:

pivot_root: invalid argument

The new root had to be a mount point. The old root needed somewhere to go. So the ritual was:

  1. Bind-mount the new root onto itself
  2. Create an oldroot directory
  3. Call pivot_root(newroot, oldroot)
  4. Change directory to the new root
  5. Unmount the old root
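The five steps above can be sketched as one function. Assumptions: root privileges, util-linux unshare and pivot_root on the host, and a NEWROOT (hypothetical path) holding an Alpine-style rootfs with /bin/sh, an empty /proc, and BusyBox's mount and umount. Unlike the article's setup, this calls the host's pivot_root binary, which finishes before the switch, instead of a static BusyBox; treat it as a substitution, not the author's exact method.

```bash
# Sketch of the five pivot_root steps inside fresh mount and PID namespaces.
enter_rootfs() {   # enter_rootfs NEWROOT
  # Refuse early if the new root cannot even provide a shell.
  [ -x "$1/bin/sh" ] || { echo "no usable /bin/sh under $1" >&2; return 1; }

  # util-linux unshare makes the new mount namespace private by default, so
  # these mounts do not propagate back to the host.
  unshare --mount --pid --fork sh -c '
    set -e
    mount --bind "$1" "$1"     # 1. bind-mount the new root onto itself
    cd "$1"                    #    enter it through the new mount
    mkdir -p oldroot           # 2. somewhere for the old root to go
    pivot_root . oldroot       # 3. swap roots
    cd /                       # 4. move into the new root
    hash -r                    #    drop command paths remembered from the old root
    mount -t proc proc /proc   #    fresh /proc for this PID namespace
    umount -l /oldroot         # 5. detach the old root...
    rmdir /oldroot             #    ...and remove its mount point
    exec /bin/sh
  ' sh "$1"
}

# Example, as root: enter_rootfs /var/lib/mini-container/rootfs
```

The early check is deliberate: if the new root lacks a shell, failing before any mount happens is easier to debug than failing after pivot_root.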

When it finally worked, the reward was tiny and perfect:

```bash
cat /etc/os-release
```

```
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.24.1
```

Just a text file. But now it was proof that the container worked.

Conclusion

Building a container from scratch teaches something a course never can: the distance between knowing the theory and watching it fail in real time. Namespaces, cgroups, and the rest are real. They work exactly as documented. But the path from "I understand this" to "I can make it work" runs through error messages, Bash caches, filesystem permissions, and the realization that your tools will lie to you if the filesystem isn't right.

Merits

  • Hands-on learning embeds understanding in a way studying alone cannot
  • Real errors teach constraints that documentation often skips
  • Building from scratch reveals how modern container tools hide complexity
  • Understanding namespaces, filesystems, and interpreters makes Docker and Kubernetes less mysterious

Demerits

  • The learning curve is steep and involves many small, confusing errors
  • Environmental factors (like virtiofs on macOS) add unpredictable complications
  • The process is slow compared to using Docker directly
  • Many edge cases (Bash hash caching, symlink permissions) are not obvious from the documentation

Caution

This article is educational and describes real learning from building containers from first principles. The commands and concepts shown are accurate to the source material. This is not a production-grade container runtime; it's a teaching tool to understand how containers work. Before relying on any container system in production, consult official documentation and best practices. Verify all claims against the original source before implementing anything in your environment.

Frequently asked questions

  • What is a PID namespace and why does it matter in containers?
  • Why does a process calling unshare not automatically become PID 1?
  • What does the /proc filesystem have to do with containers?
  • How does pivot_root differ from chroot?
  • Why did remounting /proc change what ps showed?
  • What is the role of BusyBox in container building?
  • Why do dynamic interpreters matter when switching to a new root filesystem?
  • How does virtiofs complicate container setup on macOS?

Tags

#containers #linux #docker #devops #namespaces #cgroups #filesystem #learning
