This may be me shouting into the void, but I wish there were an article directly comparing jails with namespaces, which is the Linux functionality that Docker uses. I can totally believe that FreeBSD jails provide a better / more unified / more secure experience than Docker, but to extend that into saying "FreeBSD jails are better than Linux namespaces" feels like a category error.
Questions I would like to see answered in that article:
* Can jails be used to run subprocesses in the normal filesystem, but with a different network environment (for example making a given command run its net traffic through TAP)?
* Can jails be used to limit memory/cpu/IO/network for subprocesses? For threads within a process?
* Can live processes be moved into or out of a jail?
* Can jails be used to make a process think it's running as a different user?
I feel like the answer to these questions is generally "no, that's not what jails are for", which is (1) a fine answer given the apparent goal of being a better chroot(), and (2) reinforces that jails and namespaces are addressing different problem domains.
Jails are basically VM-like/VM-light: a jail appears to be its own system, but it is not as heavy as an actual VM under (e.g.) Qemu with virtualized hardware. A jail can have its own network stack:
* https://klarasystems.com/articles/virtualize-your-network-on...
You can then create a 'virtual patch cable' between the host and the jail and send one side of the 'cable' to the jail and do routing and stuff on the host:
* https://man.freebsd.org/cgi/man.cgi?epair
But a (sub)process cannot be 'sent' to a jail: jails 'boot up' like a normal system does, and so you'd have your PID 1 run your regular daemon startup.
> Can jails be used to limit memory/cpu/IO/network for subprocesses? For threads within a process?
Yes:
* https://wiki.freebsd.org/JailResourceLimits
> Can live processes be moved into or out of a jail?
No: jails are VM-like in functionality.
> Can jails be used to make a process think it's running as a different user?
The jail, being VM-like, would have its own passwd.
It's an option to run jails with a whole init and everything; then it's like a separate host. But you can also just run stuff in the jail, without a whole everything. I currently run two daemons in a vnet jail to get a separate network for those daemons, but chrooted to /, because they don't need a separate filesystem. At my last job, we ran a TLS termination proxy chrooted to a very limited directory, because we didn't trust OpenSSL after Heartbleed; the chroot had just the executable, ld-elf and the libraries it loaded, unix sockets to communicate with the origin server, and logfiles, most of which was chflags schg.
If you don't trust a component running inside a Jail or a Linux container, you shouldn't be running it there; the kernel attack surface is big. At that point you've adopted the same security model as a phone jailbreaker has (except you don't get to dedicate hardware to the anti-jailbreaking problem the way the phone vendors do).
Can't run TLS termination on an isolated host, because then the traffic to the origin goes over the network, which you also don't trust.
Don't want to run OpenSSL integrated into the daemon, because OpenSSL is garbage.
Couldn't run anything else in the immediate aftermath of Heartbleed, because GnuTLS was worse, and LibreSSL and BoringSSL hadn't been released yet.
The truth is, you really do trust OpenSSL. You just trust it less than other things. That's fine! Layer controls on top of it. In Linux-land, this is the point where you'd start thinking about things like seccomp-bpf.
A truly untrusted workload is, like, a compute job you've accepted from a SAAS customer; it's arms-length multitenancy. You can't share kernels in that situation.
I do think VMs are likely more secure than containers in a cloud environment, because of course at that level you have both problems, but I don't believe that the number of vulnerabilities found at the VM layer is at all reflective of their actual vulnerability.
If you don't want to derive this axiomatically, fair enough: count vulnerabilities. The tally you're looking for is every Linux LPE versus every Linux KVM escape.
When you do this, do the jails/chroots act like separate overlays on top of / to the daemons?
For example, if you use systemd-nspawn or systemd-run, you can run daemons in containers that get their own view of /, but writes take place in separate overlay file systems.
I'm curious how that compares with what you're doing with jails.
The low-level details and fiddly knobs matter, not what it looks like from userspace when all is done.
Based on the jail(8) manpage I get the impression that jails are more like all the namespace and cgroup things rolled into a single entity. Which I guess makes it more difficult to use them incorrectly. But it also prevents other uses such as the one that jmillikin has in mind.
- Often boots user land like a VM (i.e. PID 1 is not the process you want to run, but whatever metal or VM freebsd runs as PID 1)
- Normally has its own network stack (VIMAGE)
- Normally runs a bunch of background services like a regular VM or on-metal would.
- Normally has the entire copy of user-land
- I've never seen short-lived jails either - you make a base dataset, clone it, but after that, you just run `freebsd-update` like you would in a VM.
The majority of jail users treat them like lightweight VMs, only difference between VM and Jail from consumer standpoint: shared kernel, access to a subset of host's FS (sharing host FS to VMs in freebsd is not as easy as it is on linux).
To be clear, it's possible to use Jails like Docker, there is just no good tooling to do it. People would yell, foaming at the mouth, that whatever we have for jails is all we need every time you mention it.
The question that was asked was what the technology can do, not how it's being used. What are the primitives, the atoms it's made of?
People tend to use words that describe how they use things to describe what they are.
The question is not what you use a car for, but rather how the car is built.
The question was "Calling "VM-like" is not helpful because containers have also been called that, it also doesn't explain anything."
I've provided examples why "VM-like" is used to describe jails.
of a VM.
The difference between "VM" and "VM-like" is the trailing modifier "-like". This means something is similar, but not the same thing. If it looks like a duck, but is made of rubber, we call it a "rubber ducky." If it looks like a duck, is wearing a blue shirt, has a speech impediment, and is not wearing any pants, we call it Donald Duck.
Thus the fact that jails and Docker look like VMs, because they have their own PID 1, their own file system, their own slice of memory/cpu/IO/network, is why people are, accurately, using the descriptor "VM-like" to describe them. People understand that Docker is not a true virtual machine because it's not running its own kernel. In cases where it matters (e.g. security isolation between a VM and a Docker container), pedantry between true VM vs fake VM, or VM-like, is crucial, but most discussions where "VM-like" is used to help people understand OpenVZ, LXC, jails, Docker, cgroups, etc. aren't focused on the possibility of an RCE escaping the container, but on helping people understand what a container even is in the first place.
Or to put it another way, strcmp("VM", "VM-like") != 0.
It feels like jails are still stuck with being described using older terms even after Linux containers and the distinction between them and VMs were mainstreamed.
Technically, you don't have to. It's just because jails don't have convenient tooling around them (like docker or podman) it's easier to just boot it up like a normal system.
With all due respect, they are not.
The definition of a «virtual machine» is a settled matter, and the «M» in «VM» is important and is the differentiator: it allows one to run a different operating system kernel on the same host under the auspices of a hardware or a software supervisor (somewhat less of a defining feature).
Neither jails nor cgroups possess such a property, and both restrict users to the same host operating system kernel and its version, so neither of them is «VM-like/light», regardless of the semantic interpretation of the «-like» suffix.
VM-like functionality is provided by a different OS subsystem in both the Linux and FreeBSD kernels.
>No: jails are VM-like in functionality.
Could this be implemented though if moving between sufficiently similar operating systems?
Seems like it might be useful to imprison a suspicious-acting process, or to release it once it is seen to be safe.
Yes, you can run a vnet jail chrooted to /; same filesystem as the host, but a separate network stack (you have to set up the network for the jail at this point; but I imagine the jail tools help with that). This is a fine use for jails. I currently run a vnet jail chrooted to / to do some crazy network stuff, but I just need a separate network, no other separation.
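For the curious, here is a minimal sketch of what such a jail definition could look like, written as a small Python helper that renders a jail.conf(5) stanza. The jail name `netjail` and the interface `epair0b` are hypothetical placeholders, not anything from my actual setup; `path`, `vnet`, `vnet.interface`, `persist` and `exec.start` are regular jail(8) parameters, but check the manpage for your release before relying on this.

```python
def render_jail_conf(name, vnet_interface, path="/", exec_start=None):
    """Render a jail.conf(5) stanza for a vnet jail as a string.

    path="/" keeps the host filesystem; only the network stack is separate.
    """
    lines = [
        f"{name} {{",
        f'    path = "{path}";',
        "    vnet;",                                    # own network stack
        f'    vnet.interface = "{vnet_interface}";',    # interface handed to the jail
        "    persist;",                                 # keep the jail even with no processes
    ]
    if exec_start:
        lines.append(f'    exec.start = "{exec_start}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical names: one end of an epair(4) goes into the jail.
print(render_jail_conf("netjail", "epair0b"))
```

The other end of the epair (`epair0a` here, by assumption) stays on the host, where you do the routing.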
> * Can jails be used to limit memory/cpu/IO/network for subprocesses? For threads within a process?
I haven't used it, but it looks like yes/maybe with the rctl subsystem. This allows limits on lots of things, including memory of several types, cpu of a few types, filesystem io in bytes per second and operations per second, and number of threads. Valid subjects for resource limits are process, user, loginclass and jail. Based on the manual, this doesn't let you limit network as you wanted, or place limits on threads within a process like you wanted. If you run a vnet jail, you can potentially set networking limits in other ways, but only if you pass virtual interfaces (such as epair, or tap/tun) to the jail rather than physical interfaces. This is a reasonable use for jails, but you might not need a jail for this?
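To make the rule format concrete: as I read rctl(8), a rule has the shape `subject:subject-id:resource:action=amount/per`. Below is a rough Python sketch that only assembles such strings; it does not talk to the kernel, and the jail name `netjail` is a made-up example. The resource and action names used (`memoryuse`, `writebps`, `deny`, `throttle`) are taken from the manpage, but I haven't run these rules myself.

```python
# Subjects that rctl(8) accepts for resource limits.
VALID_SUBJECTS = {"process", "user", "loginclass", "jail"}


def rctl_rule(subject, subject_id, resource, action, amount, per=None):
    """Build an rctl rule string: subject:subject-id:resource:action=amount[/per]."""
    if subject not in VALID_SUBJECTS:
        raise ValueError(f"unsupported rctl subject: {subject!r}")
    rule = f"{subject}:{subject_id}:{resource}:{action}={amount}"
    if per is not None:
        rule += f"/{per}"
    return rule


# Hypothetical jail name; the resulting string would be passed to `rctl -a`.
print(rctl_rule("jail", "netjail", "memoryuse", "deny", "1g"))
print(rctl_rule("jail", "netjail", "writebps", "throttle", "10m"))
```

Note there is no "thread" subject, which matches the point above that you can't limit individual threads within a process this way.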
> * Can live processes be moved into or out of a jail?
A process can move itself into a jail, but only if it's superuser. This is generally used administratively. Jailed processes cannot be moved out of the jail, afaik; when the jail is destroyed, all processes within are killed. Moving processes out is not within the design scope of jails.
> * Can jails be used to make a process think it's running as a different user?
I don't think so, but I'd use something with LD_PRELOAD to override getuid/geteuid for this; but I think I must not understand the question. You can certainly have entirely different users inside jail vs on the host?
FreeBSD Jails provide both secure isolation from the host and also provide the separated network namespace and resources with rctl(8) when needed.
To have Linux containers separated and secured you need an additional layer for that - like SELinux or AppArmor. Only then are Docker/Podman/other/... Linux containers isolated and secure.
Regards.
To provide a security boundary between Linux processes, it's currently considered best practice to use something like Firecracker or gVisor.
For several of those questions about jails, the answer would be the same if you were just evaluating the questions against namespaces by itself.
That might be possible on DragonFly BSD. They have support for freezing a running process to disk and restoring the process again (potentially on another machine) [1], and they have support for jails, so I don't see why you wouldn't be able to freeze a running process on the host and restore it in the jail or vice versa.
I use only the tools included in base system for setting up my jails. No “ezjail” or anything.
If you read his whole book you will see how it might be the correct choice to just do it yourself. Depending on what you want to do etc.
For me I am definitely much better off having set it up myself with the help of mwl’s book.
https://www.freebsdmall.com/cgi-bin/fm/bsdmjails
Buy the physical copy of the book.
PS: Use vnet interfaces for most of your jails.
However, I really wish the FreeBSD folks would educate themselves a bit more about what is actually available on Linux and how those options compare to FreeBSD Jails.
But mainly they might make more educated comments. Sometimes, and maybe this is more a historic thing, the Linux advocates can get a bit carried away.
I'm speaking against my own interests here; I don't like K8s.
If they're invested in an existing linux container ecosystem and don't intend to change then yeah probably not much gained.
For example years ago I started runit in a container to run multiple processes (multiple containers was a bit tricky for operational reasons) and some people were surprised: "oh wait, you can do that?" Who knows if my experience with jails contributed to this "idea" (if you can even call it that), but it probably didn't hurt.
I’m planning on starting a FreeBSD based hosting service, but I am not sure if anyone actually wants what I have in mind.
The thing is that I have a couple of very specific ideas in mind.
I will offer a small amount of storage, and not general internet connectivity.
Users will have 25GB space and the idea is that they will be able to connect over Wireguard, but they cannot make outbound connections from the host.
It’ll be like a /home/user in the cloud.
So you can keep some files there and connect from wherever in the world and use the cli tools we all know and love. But idk if there is an actual market for that or not.
The selling point is that I will be focusing on the storage of that data. ZFS with redundancy and offsite backups – the works. That’s why the amount of data should be low. Only keep the most important files there.
Also I’m gonna accept payments in Bitcoin only, and people will have to sign up for many years upfront.
I think maybe the number of people that want this can be counted on one hand heh
Just trying to help with question to validate the idea. Best of luck.
My impression of rsync.net is that it is for backup. Whereas my service would be a live service you ssh into. Basically, connect to Wireguard VPN and then ssh into your cloud home.
The idea being that this is where you keep, and interact with, your files that are important.
> DigitalOcean
My service would have less system administration for the user, and high level of storage redundancy and offsite backups
> Just trying to help with question to validate the idea. Best of luck.
Thank you, I appreciate it :)
Bastille also has a sister project 'rocinante' which allows you to use Bastille templates on the host. I converted my ansible scripts to bastille templates and it works a lot better for *ME*. I found I spent more time updating ansible scripts whenever I needed to use them; it cost more time than just using a setup.sh script, which rocinante basically is. https://github.com/BastilleBSD/rocinante
Another new kid on the block for jails is AppJail, it has some interesting features. I have not played with it enough to say how stable it is. https://github.com/DtxdF/AppJail
Started with ezjail, switched to iocage, now thinking about bastille or roll-my-own.
(I'm also curious if BSD Jails are the same thing as Solaris Zones but with a different name or if there is significant nuance making them different).
Docker and runc are very similar.
Docker runs Docker containers. runc runs OCI containers.
And jails operate at a similar level, though without an image format.
There are systems which build upon jails such as iocage and ezjail which are more similar to docker.
I used jails on FreeBSD and nothing in Linux comes close. Yes, it is not a pointy-clicky setup like Linux likes to do. But IMHO Jails are far more secure, in a way: you get what you 'pay' for.
That is strongly dependent on your threat model. The default docker configuration completely bypasses the firewall, making it trivial for containers to be exposed to the open internet with no way for admins to prevent it[0]. Likewise, I hesitate to call docker's default of running as root safe since it means anyone with access to the docker socket immediately has root on the host.
[0] It is quite easy for someone even slightly inexperienced to accidentally write, say `-p 1234:1234` instead of `-p 127.0.0.1:1234:1234` and thereby cause a security incident or near-miss; ask me how I know.
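For anyone who wants to lint for this, here is a rough Python sketch that checks whether a `-p` value names a loopback address. It covers only the `[ip:]hostPort:containerPort[/proto]` forms and bracketed IPv6; it is my own illustration, not anything Docker ships, and it assumes the default behaviour where a spec without an IP is published on all interfaces.

```python
import ipaddress


def publish_bind_address(spec):
    """Return the host IP named in a `-p` spec, or None if no IP is given."""
    spec = spec.split("/", 1)[0]          # drop a trailing /tcp or /udp
    if spec.startswith("["):              # bracketed IPv6, e.g. [::1]:1234:1234
        host, _, _rest = spec[1:].partition("]")
        return host
    parts = spec.split(":")
    if len(parts) == 3:                   # ip:hostPort:containerPort
        return parts[0] or None
    return None                           # hostPort:containerPort or containerPort


def is_loopback_only(spec):
    """True only when the spec explicitly binds to a loopback address."""
    host = publish_bind_address(spec)
    if host is None:
        return False                      # no IP: assumed to mean all interfaces
    return ipaddress.ip_address(host).is_loopback


for spec in ("1234:1234", "127.0.0.1:1234:1234", "0.0.0.0:1234:1234"):
    print(spec, "->", "loopback only" if is_loopback_only(spec) else "exposed")
```

A check like this in CI or a wrapper script would have caught my mistake, though it obviously does nothing about compose files or other ways of publishing ports.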
I never got why this is commonly used as an argument against Docker, TBH. You just don't give out access to the Docker socket to anything untrusted. Doesn't pretty much everyone know that by now?
I feel like people always say Docker is awfully insecure, but then the proofs-of-concept include flags like `--privileged`, or the socket is mounted, or / is mounted, or --net=host is set... etc. Docker by default always seemed pretty good to me, but I'm not very experienced in that realm, so I'm just wondering what I'm missing.
I agree with ports, working[0][1][2] on it.
[0] https://github.com/moby/moby/discussions/45524
* https://blog.aquasec.com/cve-2022-0185-linux-kernel-containe...
* https://securitylabs.datadoghq.com/articles/dirty-pipe-conta...
* https://snyk.io/learn/docker-security/top-5-vulnerabilities/
* https://www.container-security.site/attackers/container_brea...
The only escaping of jails that I've heard of in the last ~20 years is one not in the jails code, but tunnelling out through devfs:
* https://www.freebsd.org/security/advisories/FreeBSD-SA-14:07...
(to be more accurate, I knew that it was not "on top" in the same way as Linux containers are built on top of chroot. )
You can set up Linux in a jail.
You can nest bhyve virtual machines in a jail.
You can assign individual NICs to a jail.
You can encrypt a jail with ZFS.
You can run browsers in a jail with full set of features.
> You can set up Linux in a jail.
I don't see how this can be possible. Could you explain more how to boot a Linux kernel within a FreeBSD jail? Edit: I'm not talking about running binaries compiled for Linux under FreeBSD. The parent said it's possible to set up Linux within a jail, so I want to see instructions to boot an actual Linux kernel as a FreeBSD process.
With FreeBSD you can set up a standalone virtualized network stack:
* https://klarasystems.com/articles/virtualize-your-network-on...
But to be real: that's obviously not what was meant. If someone wants to install and run Ubuntu inside a FreeBSD jail, it is well possible to do that. Nobody really cares if it's running kernel.org code, just that the binaries are running as expected.
And what if my Linux distro doesn't use systemd, like Slackware?
All other things you have mentioned are possible with pretty much any of the Linux container runtimes. systemd-nspawn is just one of them, and I don't think it is actually used very much compared to the alternatives.
It's Slackware's choice not to use the systemd infrastructure. However, on Linux you have more than one choice for running containers; your next option would be LXC/LXD: https://docs.slackware.com/howtos:misc:lxc
I'm not claiming Linux can't. I was stating why. I know Linux can do what FreeBSD can do, but BSD does it better.
However, both pretty much mimic the concept of Solaris Zones.
Your timeline is off.
FreeBSD and Linux were within a year of each other (not in-tree on linux, but Debian packaged VServer kernels and VServer + GRSec kernels; we used vserver+grsec debian packaged kernels at work in the mid through late '00s).
Solaris containers came years later.
2000 - FreeBSD jails
2001 - Linux VServer
2004 - Solaris Containers
https://blog.aquasec.com/a-brief-history-of-containers-from-...
Specifically the suggestion to use lo# instead of vnet.
I ran FreeBSD servers with jails around ‘00. They worked fine.
Not for me, but in hindsight, maybe the FreeBSD daemon mascot had something to do with it.
But around that time BEA WebLogic and IBM WebSphere also became popular, and I don’t think Java was officially supported. It was also a period where people tended to run Oracle, but I’m not sure if that ran on Linux or something else. I think it was some custom Linux, “unbreakable Oracle”.
Ironically, just a few years later, SCO thought "hey that was a good idea" and did it to Linux too.
High info density, consistent look, “responsive” without being responsive.