Pyspawner
Subprocess that spawns children quickly, using clone().
How to use
Create a pyspawner.Client that preloads the “common” Python modules your sandboxed code will import. (These import statements aren’t sandboxed, so be sure you trust the Python modules.)
Then call pyspawner.Client.spawn_child() each time you want to create a new child. It will invoke the pyspawner’s child_main function with the given arguments.
Here’s pseudo-code for invoking the pyspawner part:
from pathlib import Path

import pyspawner

# pyspawner.Client() is slow; ideally, you'll just call it during startup.
with pyspawner.Client(
    child_main="mymodule.main",
    environment={"LC_ALL": "C.UTF-8"},
    preload_imports=["pandas"],  # put all your slow imports here
) as cloner:
    # cloner.spawn_child() is fast; call it as many times as you like.
    child_process: pyspawner.ChildProcess = cloner.spawn_child(
        args=["arg1", "arg2"],  # List of picklable Python objects
        process_name="child-1",
        sandbox_config=pyspawner.SandboxConfig(
            chroot_dir=Path("/path/to/chroot/dir"),
            network=pyspawner.NetworkConfig(),
        ),
    )
    # child_process has .pid, .stdin, .stdout, .stderr.
    # Read from its stdout and stderr, and then wait for it.
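For the child side, here is a minimal sketch of what mymodule.main could look like. It assumes pyspawner passes the args list to child_main as positional arguments; the module and function names mirror the placeholder values above and are otherwise hypothetical.

```python
# mymodule.py (hypothetical module named by child_main="mymodule.main")
import sys


def main(*args: str) -> None:
    """Entry point run inside each sandboxed child.

    Assumption: the list passed as ``args`` to spawn_child() arrives here
    as positional arguments.
    """
    # Results go to stdout; the parent reads them from child_process.stdout.
    for arg in args:
        sys.stdout.write(f"received {arg}\n")
    # Diagnostics go to stderr, which the parent reads from a separate pipe.
    sys.stderr.write(f"handled {len(args)} argument(s)\n")
```

Keeping results on stdout and diagnostics on stderr lets the parent treat the two pipes differently when it reads them.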
For each child, read from stdout and stderr until end-of-file; then wait() for the process to exit. Reading from two pipes at once is a standard exercise in UNIX, so the minutiae are left as an exercise. A safe approach:
Register both stdout and stderr in a selectors.DefaultSelector loop, calling selectors.BaseSelector.select() and reading from whichever file descriptors have data. Unregister each file descriptor when it reaches EOF, and read but _ignore_ data past a predetermined buffer size. Kill the child process if this is taking too long. (Keep reading after killing the child to avoid deadlock.)
Wait for the child process (using os.waitpid()) to clean up its system resources.
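The approach above can be sketched with the standard library alone. This is one possible shape, not pyspawner's own code: the function name read_child_output and the default limits are hypothetical, and it takes the pid and the two pipes as separate arguments so it works with any child process that exposes them.

```python
import os
import selectors
import signal
import time
from typing import BinaryIO, Dict, Tuple


def read_child_output(
    pid: int,
    stdout: BinaryIO,
    stderr: BinaryIO,
    max_bytes: int = 1024 * 1024,  # hypothetical per-pipe limit
    timeout: float = 30.0,  # hypothetical time limit, in seconds
) -> Tuple[bytes, bytes, int]:
    """Drain both pipes until EOF, then reap the child.

    Returns (stdout_bytes, stderr_bytes, wait_status).
    """
    buffers: Dict[int, bytearray] = {
        stdout.fileno(): bytearray(),
        stderr.fileno(): bytearray(),
    }
    deadline = time.monotonic() + timeout
    killed = False

    with selectors.DefaultSelector() as selector:
        selector.register(stdout, selectors.EVENT_READ)
        selector.register(stderr, selectors.EVENT_READ)

        # Loop until both pipes have reached EOF and been unregistered.
        while selector.get_map():
            if not killed and time.monotonic() >= deadline:
                # Too slow: kill the child, but keep reading to avoid deadlock.
                os.kill(pid, signal.SIGKILL)
                killed = True
            wait = None if killed else max(0.0, deadline - time.monotonic())
            for key, _events in selector.select(timeout=wait):
                chunk = os.read(key.fd, 65536)
                if not chunk:
                    selector.unregister(key.fileobj)  # EOF on this pipe
                    continue
                buffer = buffers[key.fd]
                # Keep data up to the limit; read but ignore the rest.
                buffer.extend(chunk[: max_bytes - len(buffer)])

    # Reap the child so it does not linger as a zombie.
    _pid, status = os.waitpid(pid, 0)
    return bytes(buffers[stdout.fileno()]), bytes(buffers[stderr.fileno()]), status
```

Reading with os.read() on the raw file descriptor avoids mixing buffered and unbuffered reads on the same pipe.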
Setting up your environment
Your system must have libcap.so.2 installed. In Debian, the libcap2 package provides it.
Pyspawner relies on Linux’s clone() system call to create child-process containers. If you’re using pyspawner from a Docker container, subcontainers are disabled by default. Run Docker with --security-opt seccomp=/path/to/pyspawner/docker/pyspawner-seccomp-profile.json to allow creating subcontainers.
By default, sandboxed children cannot access the Internet. If you want to enable networking for child processes, ensure your process has the CAP_NET_ADMIN capability (docker run --cap-add NET_ADMIN ...). Also, you’ll need to configure NAT in the parent-process environment … which is beyond the scope of this README. Finally, you may want to supply a chroot_dir to give child processes a custom /etc/resolv.conf.
Ideally, sandboxed children would not be able to write anywhere on the main filesystem. Unfortunately, the umount() and pivot_root() system calls are restricted in many environments. As a placeholder, you’re encouraged to supply a chroot_dir to provide an environment for your sandboxed child code. chroot_dir must be in a separate filesystem from the root filesystem. (In the future, when the Linux container ecosystem evolves enough, chroot_dir will make children unmount the root filesystem.) Again, chroot is beyond the scope of this README.
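A rough way to check the separate-filesystem requirement before startup is to compare device IDs. This is a sketch: the helper name is hypothetical, and a bind mount of the same underlying filesystem may report the same device ID even though it is a distinct mount.

```python
import os
from pathlib import Path


def is_on_separate_filesystem(chroot_dir: Path, root: Path = Path("/")) -> bool:
    """Return True if chroot_dir lives on a different device than root."""
    # st_dev identifies the device (filesystem) that holds each path.
    return os.stat(chroot_dir).st_dev != os.stat(root).st_dev
```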
-
class pyspawner.ChildProcess(pid: int, stdin: BinaryIO, stdout: BinaryIO, stderr: BinaryIO)
A handle for the parent to interact with a spawned child process.
This is akin to a subprocess.Popen object … but with fewer features. (Rationale: subprocess.Popen has too many features.)
-
kill()
Terminate the child process with SIGKILL.
- Return type
None
-
pid: int
Child process ID as seen from the parent. (The child process will see its own ID as 1.)
-
stderr: BinaryIO
Readable pipe, written in the child as sys.stderr.
-
stdin: BinaryIO
Writable pipe, readable in the child as sys.stdin.
-
stdout: BinaryIO
Readable pipe, written in the child as sys.stdout.
-
wait(options)
Wait for the child process to complete.
You must call this for every child process. Otherwise, children will become zombie processes when they terminate, consuming system resources.
- Return type
Tuple[int, int]
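Assuming the returned tuple mirrors what os.waitpid() returns, that is (pid, status), the status word can be decoded with the standard os helpers. The function name below is hypothetical.

```python
import os


def describe_wait_status(status: int) -> str:
    """Turn a waitpid()-style status word into a readable string."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # For example, signal 9 after kill() sent SIGKILL.
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unknown status {status}"
```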
-
-
class pyspawner.Client(*, child_main, environment={}, preload_imports=[], executable=sys.executable)
Launch Python quickly, sharing most memory pages.
The problem this solves: we want to spin up many children quickly; but as soon as a child starts running we can’t trust it. Starting Python with lots of imports like Pyarrow+Pandas can take ~2s and cost ~100MB RAM.
The solution: a mini-server process, the “pyspawner”, preloads Python modules. Then we clone() each time we need a subprocess. (clone() is near-instantaneous.) Beware: since clone() copies all memory, the “pyspawner” shouldn’t load anything sensitive before clone(). (No Django: it reads secrets!)
This is similar to Python’s multiprocessing.forkserver, except…:
Children are not managed. It’s up to the caller to kill and wait for the process. Children are direct children of the _caller_, not of the pyspawner. (We use CLONE_PARENT.)
asyncio-safe: we don’t listen for SIGCHLD, because asyncio’s subprocess-management routines override the signal handler.
Thread-safe: multiple threads may spawn multiple children, and they may all run concurrently (unless child code writes files or uses networking).
No multiprocessing.context. This Client is the context.
No Connection (or other high-level constructs).
The caller interacts with the pyspawner process via an _unnamed_ AF_UNIX socket, rather than a named socket. (multiprocessing writes a pipe to /tmp.) No messing with hmac. Instead, we mess with locks. (“Aren’t locks worse?” – [2019-09-30, adamhooper] probably not, because clone() is fast; and multiprocessing and asyncio have a race in Python 3.7.4 that causes forkserver children to exit with status code 255, so their named-pipe+hmac approach does not inspire confidence.)
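The preload-then-clone idea can be illustrated with the standard library alone. This sketch swaps in os.fork() for clone() and skips every sandboxing step, so it only shows why children start quickly: they inherit modules the parent already imported. The names spawn_preloaded and child_main are hypothetical and are not pyspawner's API.

```python
import json  # stands in for a slow import such as pandas
import os
from typing import Callable


def spawn_preloaded(child_main: Callable[..., None], *args: object) -> int:
    """Fork a child that runs child_main(*args); return its PID."""
    pid = os.fork()
    if pid == 0:
        # Child: preloaded modules are already in memory (copy-on-write).
        exit_code = 1
        try:
            child_main(*args)
            exit_code = 0
        finally:
            # Exit immediately, skipping the parent's cleanup handlers.
            os._exit(exit_code)
    # Parent: the caller must wait for the child, as with pyspawner.
    return pid


def child_main(path: str) -> None:
    # Uses the module that was imported before the fork.
    with open(path, "w") as f:
        json.dump({"pid": os.getpid()}, f)
```

As in pyspawner, the process that calls the spawn function is responsible for waiting on the returned PID.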
- Parameters
child_main (str) – The full name (including module name) of the function each child should run. (Must be importable.)
environment (Dict[str, str]) – Environment variables for child processes. (Must all be str.)
preload_imports (List[str]) – List of module names pyspawner should import at startup. These modules (plus pyspawner’s internal imports) will be preloaded in all child processes.
executable (str) – Python executable to invoke. (Default: current-process executable.)
-
close()
Kill the pyspawner.
Spawned child processes continue to run: they are entirely disconnected from their pyspawner.
- Return type
None
-
spawn_child(args=[], *, process_name=None, sandbox_config)
Make our server spawn a process, and return it.
- Parameters
args (List[Any]) – List of arguments to pass to the child-process function. (Must be picklable.)
process_name (Optional[str]) – Process name to display for the child process in ps and other sysadmin tools. (Useful for debugging.)
sandbox_config (pyspawner.SandboxConfig) – Sandbox settings.
- Raises
OSError – if the clone() system call fails.
pyroute2.NetlinkError – if network configuration fails.
- Return type
pyspawner.ChildProcess
-
class pyspawner.NetworkConfig(kernel_veth_name: str = 'veth-pyspawn', child_veth_name: str = 'veth-pyspawn-c', kernel_ipv4_address: str = '192.168.123.1', child_ipv4_address: str = '192.168.123.2')
Network configuration that lets children access the Internet.
Pyspawner will create a veth interface that may be used to route traffic from the child to the Internet via network address translation (NAT). You must write the iptables rules yourself! pyspawner does not invoke iptables! The intent is for you to set up iptables rules once, and then reuse the same rules for every clone.
One iptables rule to route network traffic from a child process to the Internet:
iptables -t nat -A POSTROUTING -s [child_ipv4_address] -j SNAT --to-source=[our IP address]
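If you generate such rules from Python, a small helper can build the argument list and validate the addresses first. This is a sketch only: the helper name is hypothetical, it builds the command without running it, and pyspawner itself never calls iptables.

```python
import ipaddress
from typing import List


def snat_rule_args(child_ipv4_address: str, source_ipv4_address: str) -> List[str]:
    """Build the argv for an SNAT rule covering one child address."""
    # Raises ValueError if either string is not a valid IPv4 address.
    child = ipaddress.IPv4Address(child_ipv4_address)
    source = ipaddress.IPv4Address(source_ipv4_address)
    return [
        "iptables", "-t", "nat", "-A", "POSTROUTING",
        "-s", str(child),
        "-j", "SNAT", "--to-source", str(source),
    ]
```

The resulting list can be handed to subprocess.run() by a process that has permission to change iptables rules.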
You should also firewall the traffic to secure the rest of your network from sandboxed processes. See tests/setup-sandbox.sh for a minimal set of iptables rules.
We do not yet support IPv6, because Kubernetes support is shaky. Follow https://github.com/kubernetes/kubernetes/issues/62822.
Here’s how networking works. When cloning, the child process gets a new, anonymous network namespace. pyspawner creates a veth pair, and it passes the “child” veth interface to the child process. The child process brings up its network interface and can only see the public Internet.
After the child dies, the Linux kernel will delete the network interface. (There’s a bit of a race here: the interface may exist a few milliseconds after the child dies. Pyspawner will explicitly ensure the interface is deleted before creating it.)
Beware if running multiple children at once that all access the Internet. Each child must have its own unique interface names and IP addresses.
The default values match those in tests/setup-sandbox.sh. Don’t edit one without editing the other.
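One way to keep concurrent children from colliding is to derive every value from a slot number. The scheme below is hypothetical (one private /24 per slot, starting at 192.168.124.0), and each subnet would need matching iptables rules of your own. The dictionary keys are the NetworkConfig field names from the signature above.

```python
import ipaddress
from typing import Dict


def network_kwargs_for_slot(slot: int) -> Dict[str, str]:
    """Return distinct veth names and addresses for child number `slot`."""
    if not 0 <= slot < 100:
        raise ValueError("slot must be in range 0..99")
    # Hypothetical scheme: slot 0 uses 192.168.124.0/24, slot 1 the next /24.
    network = ipaddress.IPv4Network(f"192.168.{124 + slot}.0/24")
    return {
        # Names stay well under the 15-character interface-name limit.
        "kernel_veth_name": f"veth-ps{slot}",
        "child_veth_name": f"veth-ps{slot}-c",
        "kernel_ipv4_address": str(network.network_address + 1),
        "child_ipv4_address": str(network.network_address + 2),
    }
```

The result can be expanded into the constructor, as in pyspawner.NetworkConfig(**network_kwargs_for_slot(3)).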
-
child_ipv4_address: str = '192.168.123.2'
IPv4 address of the child.
The kernel will maintain iptables rules to route from this IP address to the public Internet.
This must be in the same /24 network block as kernel_ipv4_address.
-
child_veth_name: str = 'veth-pyspawn-c'
Name of veth interface run by the child.
Maximum length is 15 characters. Any longer gives NetlinkError 34.
This name must not conflict with any other network device in the kernel’s container. (The kernel creates this device before sending it into the child’s network namespace.)
-
kernel_ipv4_address: str = '192.168.123.1'
IPv4 address of the kernel.
This must not conflict with any other IP address in the kernel’s container.
This should be a private address. Be sure it doesn’t conflict with your network’s addresses. Kubernetes uses 10.0.0.0/8; Docker uses 172.16.0.0/12. The hard-coded “192.168.123.0/24” should be safe for Docker and Kubernetes.
The child will use this address as its default gateway.
-
kernel_veth_name: str = 'veth-pyspawn'
Name of veth interface run by the kernel.
Maximum length is 15 characters. Any longer gives NetlinkError 34.
This name must not conflict with any other network device in the kernel’s container.
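The constraints on these four fields can be checked before a NetworkConfig is ever built. This sketch covers only the rules stated above (name length, same /24 block); the function name is hypothetical, and it cannot detect conflicts with devices or addresses that already exist on the host.

```python
import ipaddress
from typing import List

MAX_INTERFACE_NAME_LENGTH = 15  # longer names give NetlinkError 34


def check_network_settings(
    kernel_veth_name: str,
    child_veth_name: str,
    kernel_ipv4_address: str,
    child_ipv4_address: str,
) -> List[str]:
    """Return a list of problems; an empty list means the values look sane."""
    problems: List[str] = []
    for name in (kernel_veth_name, child_veth_name):
        if len(name) > MAX_INTERFACE_NAME_LENGTH:
            problems.append(f"interface name too long: {name!r}")
    if kernel_veth_name == child_veth_name:
        problems.append("kernel and child interface names must differ")
    # The /24 block that contains the kernel-side address.
    block = ipaddress.IPv4Interface(f"{kernel_ipv4_address}/24").network
    if ipaddress.IPv4Address(child_ipv4_address) not in block:
        problems.append(f"{child_ipv4_address} is not in {block}")
    if kernel_ipv4_address == child_ipv4_address:
        problems.append("kernel and child addresses must differ")
    return problems
```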
-
-
class pyspawner.SandboxConfig(chroot_dir: Union[pathlib.Path, NoneType] = None, network: Union[pyspawner.sandbox.NetworkConfig, NoneType] = None, skip_sandbox_except: FrozenSet[str] = <factory>)
-
chroot_dir: Optional[pathlib.Path] = None
Setting for “chroot” security layer.
If chroot_dir is set, it must point to a directory on the filesystem. Remember that we call setuid() to an extreme UID (>65535) by default: that means the child will only be able to read files that are world-readable (i.e., “chmod o+r”).
(TODO chroot_dir should use pivot_root, for security. When Kubernetes lets us modify our mount namespace in an unprivileged container, switch to pivot_root.)
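Because the child runs as an anonymous user, a quick audit of the chroot tree for missing “other” permissions can save debugging time. This is a sketch with a hypothetical helper name; it treats directories as needing both read and search (o+rx) permission, and it skips symlinks because their permissions come from the link target.

```python
import os
import stat
from pathlib import Path
from typing import List


def find_unreadable_paths(chroot_dir: Path) -> List[Path]:
    """List paths under chroot_dir that a non-owner, non-group user cannot read."""
    needed_for_dir = stat.S_IROTH | stat.S_IXOTH
    unreadable: List[Path] = []
    for dirpath, dirnames, filenames in os.walk(chroot_dir):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # permissions come from the link target
            needed = needed_for_dir if stat.S_ISDIR(mode) else stat.S_IROTH
            if mode & needed != needed:
                unreadable.append(path)
    return sorted(unreadable)
```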
-
network: Optional[pyspawner.sandbox.NetworkConfig] = None
If set, network configuration so child processes can access the Internet.
If None, child processes have no network interfaces.
-
skip_sandbox_except: FrozenSet[str]
Security layers to enable in child processes. (DO NOT USE IN PRODUCTION.)
MUST BE EXACTLY frozenset(). Other values are only for unit tests. See protocol.SpawnChild for details.
By default, child processes are sandboxed: user code should not be able to access the rest of the system. (In particular, it should not be able to access parent-process state; influence parent-process behavior in any way but its stdout, stderr and exit code; or communicate with any internal services.)
Our layers of sandbox security overlap. For instance, we (a) restrict the user code to run as non-root _and_ (b) disallow root from escaping its chroot. We can’t test layer (b) unless we disable layer (a); and that’s what this feature is for.
By default, all sandbox features are enabled. To enable only a subset, set skip_sandbox_except to a frozenset() with one or more of the following strings:
“drop_capabilities”: limit root’s capabilities
“setuid”: become an anonymous, non-root user
“no_new_privs”: prevent setuid-root programs from gaining capabilities
“seccomp”: filter system calls
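In unit tests, a small guard can catch misspelled layer names before they reach SandboxConfig. This sketch assumes the four names listed above are the complete set; the constant and function names are hypothetical.

```python
from typing import FrozenSet

# Layer names as listed above (assumed to be the complete set).
SANDBOX_LAYERS: FrozenSet[str] = frozenset(
    {"drop_capabilities", "setuid", "no_new_privs", "seccomp"}
)


def layers_for_test(*layers: str) -> FrozenSet[str]:
    """Build a skip_sandbox_except value for unit tests only."""
    unknown = set(layers) - SANDBOX_LAYERS
    if unknown:
        raise ValueError(f"unknown sandbox layers: {sorted(unknown)}")
    return frozenset(layers)
```

A test might then pass skip_sandbox_except=layers_for_test("seccomp") to exercise one layer in isolation. Production code should leave the field at its default.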
-