Take a variable, increment it by 1. That sounds like a simple enough job, right? Well... from a PHP developer's point of view that might seem to be the case, but is it really? There are bound to be some catches (otherwise we wouldn't write a blog post about it). There are a few different ways to increment a value, and while they MIGHT seem similar, they work and behave differently under the hood of PHP, which can lead to, let's say, interesting results.
Let’s take a look:
There seem to be many different ways of adding 1 to a variable. Take a look at these three examples:
Different code, but all three blocks will increment the number. But will they all result in the same output?
All three print int(2). That seems intuitive enough, and they all look equivalent. So it seems that using $a++ is just as valid as using $a += 1 for incrementing. But let's take a look at another example:
I reckon most of you weren't expecting this outcome! Some of you probably knew that incrementing a string changes its characters, and guessed the fop string right. But the two int(1)'s? Where do they come from? From a PHP developer's point of view this looks very inconsistent, and now it seems that these three statements actually aren't equivalent. So let's take a look at what actually happens under the hood of PHP when executing the code.
When a PHP script runs, the first thing PHP does is compile your code into an intermediate format called byte code. This also debunks the myth that PHP is a truly interpreted language: it's the byte code that gets interpreted, not the actual PHP source code. Our example code compiles to the following byte code:
    compiled vars:  !0 = $a, !1 = $b, !2 = $c
    line     #* E I O op                 fetch  ext  return  operands
    -----------------------------------------------------------------
       3     0  E >   ASSIGN                                 !0, 1
       4     1        POST_INC                       ~1      !0
             2        FREE                                   ~1
       5     3        SEND_VAR                               !0
             4        DO_FCALL                  1            'var_dump'
       7     5        ASSIGN                                 !1, 1
       8     6        ASSIGN_ADD                0            !1, 1
       9     7        SEND_VAR                               !1
             8        DO_FCALL                  1            'var_dump'
      11     9        ASSIGN                                 !2, 1
      12    10        ADD                            ~7      !2, 1
            11        ASSIGN                                 !2, ~7
      13    12        SEND_VAR                               !2
            13        DO_FCALL                  1            'var_dump'
            14      > RETURN                                 1
You can generate these opcode listings yourself with Derick Rethans' VLD extension, or online at 3v4l.org. Don't worry about what it all means. If we get rid of everything uninteresting, we're left with these lines:
    compiled vars:  !0 = $a, !1 = $b, !2 = $c
    line     #* E I O op                 fetch  ext  return  operands
    -----------------------------------------------------------------
       4     1        POST_INC                       ~1      !0
             2        FREE                                   ~1
       8     6        ASSIGN_ADD                0            !1, 1
      12    10        ADD                            ~7      !2, 1
            11        ASSIGN                                 !2, ~7
$a++ results in two opcodes (POST_INC and FREE), $a += 1 results in one (ASSIGN_ADD), and $a = $a + 1 results in two again (ADD and ASSIGN). Notice that all three produce different opcodes. This already implies that the code PHP actually executes will differ as well.
Unary increment operator
Let's talk about the first way of incrementing: the unary increment operator ($a++). This PHP code results in a POST_INC opcode. Its partner, PRE_INC, is the result of ++$a, and you should know the difference between the two. The second opcode, FREE, frees up the result of POST_INC, since we don't use its return value (POST_INC changes the actual operand in place). We can ignore this opcode for our case.
The magic that defines what happens when these opcodes are executed lives in PHP's actual C source code. It is a large C header file full of macros, so it might be a bit hard to read, even if you know C. Let's take a look at what happens during a POST_INC opcode call, defined at line 971 of that file (don't worry, you don't need to know C).
In a nutshell, it does the following:
- Check whether the variable ($a in PHP code, referenced as !0 in our bytecode) is of the type long. Basically, this checks whether the variable contains a number. Even though PHP is loosely typed, every variable still has a "type", and it can switch between types, as we'll see later. If it's a long, it calls the C function fast_increment_function() and continues with the next opcode.
- If the variable is not a number, it does some basic checks to see whether incrementing is possible at all. For instance, you can't increment string offsets: $a = "foobar"; $a[0]++ results in an error.
- Next, it checks whether the variable is a non-existing property of an object that has the __get and __set magic methods. If so, it uses __get to fetch the current value, calls fast_increment_function(), and stores the value by calling __set. It actually calls these methods from C, not from within PHP.
- Finally, if the variable is not a property, it just calls increment_function().
As you can see, incrementing a number behaves differently based on the type of the variable. It pretty much boils down to calling fast_increment_function() when the variable is a number or a magic property, and calling increment_function() otherwise. We'll discuss these functions below, as the real work is done there.
fast_increment_function() is a function located in zend_operators and its job is to increment a certain
variable as fast as possible.
If the given variable is a long, it uses some very fast assembly code to increment the value (on supported platforms; elsewhere a plain C version is used). If the value has reached the maximum long value (LONG_MAX), the variable is automatically converted to a double. Since this code is written in assembly, it is the fastest way to increase a number (provided the compiler cannot optimize the equivalent C code better than this hand-written assembly). However, it only works when the variable is a long. If the given variable is not a long, it simply redirects to increment_function(). Incrementing (and decrementing) mostly happens in very tight inner loops, like for statements, so doing it as fast as possible is essential to keeping PHP quick.
So if fast_increment_function() is the fast way of incrementing a number, increment_function() is the slow way. How something is incremented from this point is, again, based on the type of the variable:
- If the variable is a long, it simply increases the number (and converts it to a double if it has reached the maximum value a long can hold). Most of the time this is already taken care of by fast_increment_function(), but we might still end up in this function with a long, so it is checked here as well.
- If the variable is a double, we simply increase the double.
- If the variable is NULL, we return a long 1 (always!).
- If the variable is a string, we do some magic that we'll discuss later.
- If the variable is an object that has internal operator functionality, it calls the add operator to add the long 1 to it. Note that this only works for internal classes that have explicitly defined these operator functions, as you cannot define operators on objects in userland PHP code. In a quick scan through the PHP source code, the only class I found that actually implements this is the GMP class, so you can do $a = gmp_init(1) + gmp_init(3); // gmp(4). Operator support for GMP objects is new as of PHP 5.6, but operator overloading is still not directly possible in userland PHP.
- If the variable is of any other type, it can't be incremented, and a failure code is returned.
So it takes care of objects, doubles, nulls, etc. It does not handle booleans, for instance, which means you cannot increment a boolean. $a = false; $a++ won't work, but it won't raise an error either. It just leaves the variable unchanged.
Now for the fun part: incrementing strings. Dealing with strings is always tricky, but here is what happens:
First, a check is done to see whether the string actually contains a number, like the string "123". Such a string-number is converted into an actual long (thus int(123)). There are a few catches, though, when converting:
- Leading whitespace is skipped.
- Hexadecimal numbers are supported.
- Octal and binary notations are not supported.
- Scientific notation (like 2E0) is supported.
- Doubles are supported.
- Strings with characters before or after the number (like 135abc or ab123) are not supported and are not considered a number.
If the result of this check is a long or a double, it simply increases the number. This means that when you increment string("123"), the result will be int(124). Note that this changes the variable's type from string to int.
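The numeric check can be approximated by a small scanner. This is a simplified sketch, not PHP's is_numeric_string(). The name toy_numeric_string is hypothetical, and it makes these assumptions: decimal notation only (no hex), leading whitespace skipped, and any leftover characters rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef enum { NUM_NONE, NUM_LONG, NUM_DOUBLE } num_kind;

/* Classify s as a long, a double, or "not a number" (simplified sketch). */
num_kind toy_numeric_string(const char *s, long *lval, double *dval)
{
    const char *p;
    const char *start;
    int digits = 0;
    int is_double = 0;

    assert(s != NULL && lval != NULL && dval != NULL);
    p = s;
    while (isspace((unsigned char)*p))        /* leading whitespace is skipped */
        p++;
    start = p;
    if (*p == '+' || *p == '-')
        p++;
    while (isdigit((unsigned char)*p)) { p++; digits++; }
    if (*p == '.') {                           /* optional fractional part */
        is_double = 1;
        p++;
        while (isdigit((unsigned char)*p)) { p++; digits++; }
    }
    if (digits == 0)                           /* "", "foo", "ab123", "." */
        return NUM_NONE;
    if (*p == 'e' || *p == 'E') {              /* scientific notation: 2E0 */
        const char *q = p + 1;
        if (*q == '+' || *q == '-')
            q++;
        if (isdigit((unsigned char)*q)) {
            while (isdigit((unsigned char)*q))
                q++;
            p = q;
            is_double = 1;
        }
    }
    if (*p != '\0')                            /* trailing junk: "135abc", "2D9" */
        return NUM_NONE;

    if (is_double) {
        *dval = strtod(start, NULL);
        return NUM_DOUBLE;
    }
    *lval = strtol(start, NULL, 10);           /* overflow is ignored in this sketch */
    return NUM_LONG;
}
```

With this check in place, "123" becomes a long that can be incremented to 124. "2D9" is rejected, and "2E0" is accepted as a double.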
If the string could not be converted into a long or double, PHP falls back to its string increment routine.
PHP uses a Perl-like string incrementing system. If the string is empty, it simply returns the string "1". Otherwise, it uses a carry system to increment the string. Start from the back of the string. If the character is between 'a' and 'z', increment it (a becomes b, b becomes c, etc.). If the character is z, wrap around to a and carry one over to the position on its left. So ab becomes ac (no carry needed), while az becomes ba (we carry one character).
The same goes for uppercase letters (A to Z) and for digits (0 to 9). Incrementing a 9 wraps it to 0 and carries one.
When we reach the beginning of the string and still need to carry, we just add another character IN FRONT of the string, of the same type as the one we carried: Z becomes AA, and zz becomes aaa.
So when incrementing a string, we can never change the type of a character: a lowercase letter will always stay a lowercase letter.
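The carry system fits in a few dozen lines of C. This sketch uses a hypothetical name, toy_increment_string, and assumes the input has already failed the numeric check. It returns a freshly allocated string. When the carry runs off the left edge, it prepends a, A or 1 (1 for digits). It also assumes PHP's rule that any character other than a letter or digit simply stops the carry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Perl-style string increment sketch (hypothetical name, not PHP's code).
 * Returns a newly malloc'd string that the caller must free(),
 * or NULL if allocation fails. */
char *toy_increment_string(const char *in)
{
    enum { NONE, LOWER, UPPER, DIGIT } last = NONE;
    size_t len;
    int carry = 0;
    char *buf;
    char *s;

    assert(in != NULL);
    len = strlen(in);
    if (len == 0) {                          /* empty string becomes "1" */
        buf = malloc(2);
        if (buf != NULL)
            strcpy(buf, "1");
        return buf;
    }

    /* Reserve one spare byte in front in case the carry falls off the left edge. */
    buf = malloc(len + 2);
    if (buf == NULL)
        return NULL;
    memcpy(buf + 1, in, len + 1);            /* copies the terminating NUL too */
    s = buf + 1;

    for (size_t pos = len; pos-- > 0; ) {    /* walk from the back */
        char ch = s[pos];
        if (ch >= 'a' && ch <= 'z') {
            carry = (ch == 'z');
            s[pos] = carry ? 'a' : (char)(ch + 1);
            last = LOWER;
        } else if (ch >= 'A' && ch <= 'Z') {
            carry = (ch == 'Z');
            s[pos] = carry ? 'A' : (char)(ch + 1);
            last = UPPER;
        } else if (ch >= '0' && ch <= '9') {
            carry = (ch == '9');
            s[pos] = carry ? '0' : (char)(ch + 1);
            last = DIGIT;
        } else {
            carry = 0;                       /* any other character stops the carry */
            break;
        }
        if (!carry)
            break;
    }

    if (carry) {                             /* prepend a character of the same kind */
        buf[0] = (last == DIGIT) ? '1' : (last == UPPER) ? 'A' : 'a';
        return buf;
    }
    memmove(buf, s, len + 1);                /* nothing prepended: shift to the front */
    return buf;
}
```

Feeding it "2D9" yields "2E0". That result is a string which does pass the numeric check on the next increment.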
But be careful when incrementing a "string-number" multiple times. Incrementing string("2D9") results in string("2E0"), because string("2D9") is not a number, so the regular string increment happens. But incrementing string("2E0") results in float(3): 2E0 is scientific notation for 2, so PHP converts it to a double and then increments that double to 3. So be careful with loops and increments!
This string-increment system might also explain why we can increment the string "Z" to "AA", but cannot decrement "AA" back to "Z". We could decrement the last "A" back to a "Z", but what would we do with the first "A"? Should it also decrement to a "Z" because of a (negative) carry? And what about "0A"? Would that become "Z"? If so, incrementing that again would result in "AA". In other words: we cannot simply remove characters when decrementing, the way we add characters when incrementing.
Add assignment expression
So let's take a look at the second PHP snippet, the add assignment expression (basically $a += 1). It looks similar to the unary increment operator, but behaves differently, both in the generated opcodes and in the actual execution. It is ultimately processed by zend_binary_assign_op_helper, which, after some checks, calls add_function (http://lxr.php.net/xref/phpng/Zend/zend_operators.c#921) with two operands: the variable and the value 1.
add_function behaves differently based on the types of its operands. It mainly consists of a type check on the operand pair, to see which types the two operands have:
- If both operands are longs, their values are simply added (and the result is converted to a double if it overflows).
- If one operand is a long and the other a double, both are converted to doubles and added.
- If the two operands are doubles, they are simply added together.
- If both operands are arrays, they are merged based on keys: $a = [ 'a', 'b' ] + [ 'c', 'd' ]; results in [ 'a', 'b' ]. The second array is merged in, BUT both arrays happen to have the same keys (0 and 1), so the values of the first array are kept. Note that it does NOT merge on values, only on keys.
- Next, it checks whether the operands are objects and whether the first operand has internal operator functionality (just like in increment_function()). Again, this is not something you can create yourself in PHP. It is only supported for internal classes like GMP.
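The key-based merge for arrays is easy to model. The sketch below is a deliberately simplified toy: real PHP arrays are ordered hash tables, while this one uses a small fixed-size list of integer keys. The names toy_array and toy_array_union are hypothetical. Keys already present on the left win, and only keys missing on the left are copied from the right.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16

typedef struct { long key; const char *val; } toy_entry;
typedef struct { toy_entry e[TOY_MAX]; size_t n; } toy_array;

static int toy_has_key(const toy_array *a, long key)
{
    for (size_t i = 0; i < a->n; i++)
        if (a->e[i].key == key)
            return 1;
    return 0;
}

/* left + right: start from the left array, then append only the
 * entries of the right array whose keys are not present yet. */
toy_array toy_array_union(const toy_array *left, const toy_array *right)
{
    toy_array out = *left;
    for (size_t i = 0; i < right->n; i++) {
        if (!toy_has_key(&out, right->e[i].key)) {
            assert(out.n < TOY_MAX);
            out.e[out.n++] = right->e[i];
        }
    }
    return out;
}
```

With keys 0 and 1 on both sides, the result is just the left array, which matches the [ 'a', 'b' ] outcome above.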
If all of these fail because the operands are of different types (like a string and a long), both operands are converted into numbers through zendi_convert_scalar_to_number. Once they are converted, add_function basically retries, but this time it will most likely match one of the pairs above.
Converting a scalar to a number depends on the scalar type. It basically boils down to this:
- If the scalar is a string, check whether it contains a number through is_numeric_string. If it does not contain a numerical value, return int(0).
- If the scalar is null or boolean false, return int(0).
- If the scalar is boolean true, return int(1).
- If the scalar is a resource, return the numerical value of the resource ID.
- If the scalar is an object, try to cast the object to a long. Just like with the internal operators, there can also be internal cast functionality. Again, it is not always implemented, and it is only available for core classes, not userland PHP classes.
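Putting the pair matching and the scalar conversion together gives a compact model of add_function. This sketch uses hypothetical names (toy_zval, toy_to_number, toy_add). It leaves out arrays, objects and resources, and reads strings as decimal numbers only. One assumption worth flagging: a leading number is used even when other characters follow it ("135abc" counts as 135 here). That is more lenient than the check used when incrementing.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical toy zval; strings are plain C strings. */
typedef enum { T_NULL, T_BOOL, T_LONG, T_DOUBLE, T_STRING } toy_type;

typedef struct {
    toy_type type;
    union { long l; double d; int b; const char *s; } v;
} toy_zval;

/* Scalar-to-number conversion, following the list above (decimal only). */
static toy_zval toy_to_number(toy_zval op)
{
    toy_zval r = { T_LONG, { .l = 0 } };

    switch (op.type) {
    case T_LONG:
    case T_DOUBLE:
        return op;
    case T_NULL:                              /* null -> 0 */
        return r;
    case T_BOOL:                              /* false -> 0, true -> 1 */
        r.v.l = op.v.b ? 1 : 0;
        return r;
    case T_STRING: {
        char *end;
        long l = strtol(op.v.s, &end, 10);    /* leading number, if any */
        if (end == op.v.s)                    /* no number at all: "foo" -> 0 */
            return r;
        if (*end == '.' || *end == 'e' || *end == 'E') {
            r.type = T_DOUBLE;                /* "1.5", "2E0": read as a double */
            r.v.d = strtod(op.v.s, NULL);
            return r;
        }
        r.v.l = l;                            /* "135abc" -> 135 in this sketch */
        return r;
    }
    }
    return r;
}

/* Try the known operand pairs; if none matches, convert both and retry. */
toy_zval toy_add(toy_zval a, toy_zval b)
{
    toy_zval r;

    if (a.type == T_LONG && b.type == T_LONG) {
        long x = a.v.l, y = b.v.l;
        if ((y > 0 && x > LONG_MAX - y) || (y < 0 && x < LONG_MIN - y)) {
            r.type = T_DOUBLE;                /* overflow: fall back to a double */
            r.v.d = (double)x + (double)y;
        } else {
            r.type = T_LONG;
            r.v.l = x + y;
        }
        return r;
    }
    if ((a.type == T_LONG || a.type == T_DOUBLE) &&
        (b.type == T_LONG || b.type == T_DOUBLE)) {
        double x = (a.type == T_LONG) ? (double)a.v.l : a.v.d;
        double y = (b.type == T_LONG) ? (double)b.v.l : b.v.d;
        r.type = T_DOUBLE;
        r.v.d = x + y;
        return r;
    }
    /* No matching pair (arrays and objects are not modelled here):
     * convert both operands to numbers and try again. */
    return toy_add(toy_to_number(a), toy_to_number(b));
}
```

Passing false and "foo" through toy_add with a long 1 yields a long 1 in both cases. That is the same int(1) result the examples produced.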
The add operator is the simplest of the three. It boils down to calling the function fast_add_function(). Just like fast_increment_function(), it uses some direct assembly code to add the numbers if both operands are longs or doubles. If that's not possible, it redirects to add_function(), the same function used by the add assignment expression.
Since the add operator and the add assignment expression use the same underlying functionality, $a + 1 and $a += 1 work the same way. The only exception is that the add operator CAN take the fast path if both operands are longs or doubles. So IF you want to do some micro-optimization, $a = $a + 1 will be faster than $a += 1. This is not only because of fast_add_function(), but also because it skips the additional checks zend_binary_assign_op_helper performs before storing the result back into $a.
Incrementing values behaves differently from adding values: add_function converts types into compatible pairs, while increment_function does not. We can now explain the following results:
increment_function does not convert the boolean value (it's not a number, or a string that can be converted into a number). It fails (silently) and does not increment the value, which leaves it false. add_function, on the other hand, tries to match a boolean/long pair, which doesn't exist. So it converts both values to longs: false becomes int(0), and int(1) just stays int(1). Now we have a long pair, so add_function can simply add them, resulting in int(1). (Question: what would a boolean
Another weird thing we can explain now:
The increment does a normal string increment, since it cannot convert the string into a number. The add expressions convert the strings into longs by checking whether a number is present. Since there isn't one, the string is converted to int(0), and int(1) is simply added to it.