PHP's Resources and garbage collection

Posted on 10 Jul 2013
Tagged with: [ daemon ]  [ garbage collection ]  [ PHP ]  [ strace ]

Today, I’ve found a nice bug/feature/whatchamacallit in PHP. I was playing around with writing a daemon, and if you have any experience writing daemons (in any language), you know there are a few rules you have to live by. For instance: setting your effective uid and gid to a non-privileged user (in case you needed to do some privileged initialization, like opening a socket on a TCP port < 1024), making the process a session leader with posix_setsid(), and redirecting the stdio file descriptors. And here something went wrong which took a while to find and fix.

So, what’s the case? I’ve written a simple proof of concept for a bit of code that I wanted to run as a daemon. There are multiple ways to do this, ranging from bad to very bad: placing it into crontab, just adding a & when starting the app, and many other strange and not very secure or effective ways. Because it’s a PoC and not very OO’ish, I’ve decided to just create a daemonize() function which gets called before the “main loop”, so that the code runs as a daemon, nicely stowed away in the background. If I want to do some debugging, I only have to remove the daemonize() call and the system will run in the foreground. Easy-peasy.

The gist of the daemonize() function looked something like this:

function daemonize() {
    $pid = pcntl_fork();
    if ($pid == -1) die("error while fork()");

    if ($pid > 0) {
        exit();  // Parent just dies
    }

    setproctitle("Daemon - running");

    posix_setsid();
    posix_setegid(-1);
    posix_seteuid(-1);

    fclose(STDIN);
    fclose(STDOUT);
    fclose(STDERR);

    $STDIN = fopen('/dev/null', 'r');
    $STDOUT = fopen('/dev/null', 'wb');
    $STDERR = fopen('/dev/null', 'wb');
}

But whatever I tried, when I ran the application it just exited without leaving a daemon running. After some debugging with xdebug and commenting out code, I found that the issue had to do with the fclose() and fopen() lines.

Why open and close in the first place?

Suppose you have a daemon that reads something from stdin, for instance because it waits for a keypress. By redirecting stdin to /dev/null, your application will automatically receive an EOF upon read, so it will not wait indefinitely for something to read. The same goes for stdout: you can still write to it without any errors, but the output doesn’t end up anywhere.
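
To make that concrete, here is a small sketch of how /dev/null behaves when used from PHP. The variable names are made up for illustration, and it assumes a Unix-like system where /dev/null exists:

```php
<?php
// Illustrative only: variable names are hypothetical; assumes a Unix-like OS.

// Reading from /dev/null never blocks: the very first read hits EOF.
$in = fopen('/dev/null', 'r');
$data = fread($in, 1024);   // empty string, there is nothing to read
$atEof = feof($in);         // true after the read above

// Writing to /dev/null succeeds, but the bytes are simply discarded.
$out = fopen('/dev/null', 'wb');
$written = fwrite($out, "hello");   // reports 5 bytes written

var_dump($data, $atEof, $written);

fclose($in);
fclose($out);
```

So a daemon whose stdio points at /dev/null can keep calling fread() and fwrite() on it without blocking and without errors.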

Normally, this is done by redirecting the stdio handles (file descriptors 0, 1 and 2) to /dev/null, but PHP offers no such mechanism. The next best thing we can do is to actually close the stdio descriptors and open new ones directly. The OS always hands out the lowest-numbered free file descriptor, so when the first three file descriptors have just been closed, the next three fopen() calls will receive 0, 1 and 2. This also means that if you close stdio, you MUST open it again straight away, otherwise whatever file or socket you open next will end up as your stdin, stdout or stderr.

Debugging our script, Chuck Norris style

After debugging for a while, I tried to see what happens internally by using strace. This tool shows you what goes on under the hood by listing the system calls your program makes to the operating system. If you know how to interpret its output, it can save you literally hours of debugging:

$ strace -ff php daemon.php
   ....SNIP....
[pid 20334] munmap(0x7fe51b865000, 2201520) = 0
[pid 20334] munmap(0x7fe51d7a7000, 2293680) = 0
[pid 20336] set_robust_list(0x7fe527c8eac0, 0x18) = 0
[pid 20336] setsid()                    = 20336
[pid 20336] close(0)                    = 0
[pid 20336] munmap(0x7fe527aeb000, 4096) = 0
[pid 20336] close(1)                    = 0
[pid 20336] munmap(0x7fe527aea000, 4096) = 0
[pid 20336] close(2)                    = 0
[pid 20336] lstat("/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
[pid 20336] lstat("/dev", {st_mode=S_IFDIR|0755, st_size=3700, ...}) = 0
[pid 20336] open("/dev/null", O_RDONLY) = 0
[pid 20336] fstat(0, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
[pid 20336] lseek(0, 0, SEEK_CUR)       = 0
[pid 20336] open("/dev/null", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 1
[pid 20336] fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
[pid 20336] lseek(1, 0, SEEK_CUR)       = 0
[pid 20336] open("/dev/null", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 2
[pid 20336] fstat(2, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
[pid 20336] lseek(2, 0, SEEK_CUR)       = 0
[pid 20336] close(0)                    = 0
[pid 20336] close(1)                    = 0
[pid 20336] close(2)                    = 0
[pid 20336] write(1, ".", 1)            = -1 EBADF (Bad file descriptor)
[pid 20336] write(3, "173\0<?xml version=\"1.0\" encoding"..., 178) = -1 EPIPE (Broken pipe)
[pid 20336] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[pid 20336] recvfrom(3, "", 128, 0, NULL, NULL) = 0
[pid 20336] close(3)                    = 0
[pid 20336] munmap(0x7fe527aa9000, 266240) = 0
[pid 20336] munmap(0x7fe517821000, 2190464) = 0
[pid 20336] munmap(0x7fe518302000, 2129368) = 0

This is strange. After the setsid() call, the next lines do a close(0), close(1) and close(2), which close stdin, stdout and stderr respectively. So this is what actually gets called when you issue an fclose() in PHP. The next lines look pretty familiar as well: PHP does some stats on /dev/null and opens that file on the open("/dev/null", ... lines. The number at the end of each of those lines is the file descriptor returned for that file, so you can see that it allocates file descriptors 0, 1 and 2 respectively. Everything seems to be working!

However, a little bit further down we see a close(0), close(1) and close(2) AGAIN. The line after that is just some debugging output (printing a single dot), but you can see that it results in -1 (EBADF) because it tries to write to stdout, which was just closed.

Our issue is thus with the extra close() calls that get made. Where did they come from? Is it something to do with fork()? Some PHP magic? Something else?

PHP garbage collection

When you create a value in PHP, internally it holds a special counter for that value called a reference count. It’s a simple counter that keeps track of how many variables refer to it. For instance, if you instantiate a class and assign it to a variable $foo, the reference count for that object becomes 1. If you do "$bar = $foo", both $bar and $foo reference your object and its reference count becomes 2. When we do $foo = 1; afterwards, PHP sees that $foo doesn’t reference your object anymore, and decreases the reference count again.

This makes it very cheap to figure out whether PHP’s data, like variables or big objects, is still being referenced or not. If something doesn’t have any references left, PHP can free its memory. This keeps your memory usage as low as possible, so you are able to use lots of variables throughout your application without using tons of memory. The process of freeing up memory when no references are found is called garbage collection (GC).
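
The $foo / $bar story can be watched in action with a destructor, which PHP runs at the moment the last reference to an object goes away. This is only a sketch: the Tracked class and its $log property are names invented for this example, not part of PHP.

```php
<?php
// Illustrative only: Tracked and Tracked::$log are hypothetical names.
class Tracked
{
    public static $log = [];

    public function __destruct()
    {
        // Runs as soon as the last reference to this object disappears.
        self::$log[] = 'destroyed';
    }
}

$foo = new Tracked();   // reference count: 1
$bar = $foo;            // reference count: 2
$foo = 1;               // reference count: 1, the object survives
Tracked::$log[] = 'after $foo was overwritten';
$bar = null;            // reference count: 0, the object is freed right here
Tracked::$log[] = 'after $bar was cleared';

print_r(Tracked::$log);
```

The 'destroyed' entry lands between the other two: overwriting $foo alone is not enough, the object only goes away once $bar lets go of it as well.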

But what has got GC to do with our bug?

The reason is that we are using a function, daemonize(), in which we do our fclose() and fopen(). We assign the resources returned by fopen() to our $STDIN, $STDOUT and $STDERR variables, but these variables are local to the function. As soon as the function ends, PHP detects that these variables aren’t used anymore and cleans them up, because there are no references left. For resources, this means that PHP automatically closes them. This is why we get the extra close() calls: this is just PHP cleaning up.
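
The same effect can be reproduced without forking or touching stdio. The sketch below uses tmpfile(), whose file is removed when its handle is closed, which makes the otherwise invisible close observable. The function and variable names are made up for this example, and it assumes the system temp directory is writable.

```php
<?php
// Illustrative only: function and variable names are hypothetical.

function openInLocalVariable()
{
    $handle = tmpfile();
    $path = stream_get_meta_data($handle)['uri'];
    return $path;
    // $handle dies here: no references are left, so PHP closes the resource.
}

function openInGlobalVariable()
{
    global $keptHandle;
    $keptHandle = tmpfile();
    return stream_get_meta_data($keptHandle)['uri'];
    // $keptHandle outlives the function, so the resource stays open.
}

$localPath = openInLocalVariable();
$globalPath = openInGlobalVariable();

clearstatcache();
$localStillThere = file_exists($localPath);     // false: closed and removed
$globalStillThere = file_exists($globalPath);   // true: still open
var_dump($localStillThere, $globalStillThere);
```

The handle kept only in a local variable is gone the moment its function returns, which is exactly what happens to the freshly opened /dev/null handles in daemonize().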

So, now that we know the issue, we can actually fix it. There is only one way to fix this, and that is to make sure that we always keep a reference to these resources so they don’t get garbage collected. Because we are inside a function, we can use global variables, so that the variables still exist after we exit the function. In our case, we can fix it like this:

fclose(STDIN);
fclose(STDOUT);
fclose(STDERR);

global $STDIN, $STDOUT, $STDERR;
$STDIN = fopen('/dev/null', 'r');
$STDOUT = fopen('/dev/null', 'wb');
$STDERR = fopen('/dev/null', 'wb');

How about globals being evil, huh? Because we now use global variables, PHP will leave the resources alone and won’t close them. But make sure you don’t do something like $STDIN = "foo";, because that decreases the reference count of the resource held in $STDIN back to 0, and the resource will be cleaned up and closed again.
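
A quick way to convince yourself of that last warning, again as a sketch with tmpfile() standing in for the /dev/null handle ($kept is a hypothetical name playing the role of $STDIN, and a writable temp directory is assumed):

```php
<?php
// Illustrative only: $kept stands in for the global $STDIN of the daemon.
$kept = tmpfile();
$path = stream_get_meta_data($kept)['uri'];

$existsBefore = file_exists($path);   // true: the handle is still open

$kept = "foo";   // the last reference is overwritten, PHP closes the resource

clearstatcache();                     // drop PHP's cached stat result
$existsAfter = file_exists($path);    // false: the temp file went with the handle

var_dump($existsBefore, $existsAfter);
```

So the global only protects the resource for as long as nothing else is assigned to it.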

Obviously, you don’t have this issue when you fclose() and fopen() outside any function. This is because you are already in the global space, and there is nothing to exit from (apart from exiting the application).