If everything goes according to plan (which is never the case), I’ll try to highlight some of the fascinating stuff that can be found inside the SPL. I do a lot of presentations about the SPL, and one of the things I like to tell people is that even though the SPL, and its iterators in particular, is a magnificent piece of code that is often underused and misunderstood, it does come with some quirks and glitches that aren’t documented properly.
Today, I’ll explain the RegexIterator in some depth. This iterator extends the FilterIterator, meaning it can be used to filter out unwanted entries from parent iterators.
A simple use case would be to filter on certain names that are taken from a DirectoryIterator. This is very simple in usage and pretty obvious for most people: wrap the DirectoryIterator in a RegexIterator with a pattern anchored on “foo”, and the iterator will filter out all file names that do NOT start with “foo”.
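A minimal sketch of this use case follows. The scratch directory, the sample file names and the pattern /^foo/ are assumptions made up for illustration, not taken from the original listing.

```php
<?php
// Assumption: a throwaway directory with made-up file names, created only for this sketch.
$dir = sys_get_temp_dir() . '/regexiterator_demo_' . uniqid();
mkdir($dir);
foreach (['foo.txt', 'foobar.php', 'bar.txt'] as $name) {
    touch($dir . '/' . $name);
}

// Wrap the DirectoryIterator; only entries whose name matches /^foo/ pass through.
$iterator = new RegexIterator(new DirectoryIterator($dir), '/^foo/');

$names = [];
foreach ($iterator as $file) {
    // $file is a file info object, so ask it for the bare file name.
    $names[] = $file->getFilename();
}
sort($names); // directory order is not guaranteed

print_r($names);

// Release the directory handle, then clean up the scratch directory.
unset($iterator, $file);
array_map('unlink', glob($dir . '/*'));
rmdir($dir);
```

The "." and ".." entries that a DirectoryIterator also yields simply fail the pattern, so they never reach the loop body.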
How does it work:
First of all, the DirectoryIterator returns SplFileInfo objects by default, not file names. The accept() method, the method that does the filtering, will cast anything that is not a string into a string, since that’s something we can apply our regular expression on. From there, it will call the pcre_exec() function and return a boolean true or false, depending on whether or not any matches are found. When false is returned, the RegexIterator will not pass this element to the foreach, but continue with the next value.
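To make the cast-then-match step concrete, here is a rough userland approximation of the default behaviour. The class name SimpleRegexFilter is hypothetical, and the real RegexIterator does this in C rather than through preg_match(), so treat it as a sketch only.

```php
<?php
// Hypothetical userland approximation of the default matching behaviour.
// The real RegexIterator is implemented in C; this only mirrors the idea.
class SimpleRegexFilter extends FilterIterator
{
    private string $pattern;

    public function __construct(Iterator $iterator, string $pattern)
    {
        parent::__construct($iterator);
        $this->pattern = $pattern;
    }

    public function accept(): bool
    {
        // Cast the current element to a string, then test it against the pattern.
        $subject = (string) $this->getInnerIterator()->current();

        return preg_match($this->pattern, $subject) === 1;
    }
}

$values = new ArrayIterator(['foo1', 42, 'bar', 'foo2']);
$result = iterator_to_array(new SimpleRegexFilter($values, '/^foo/'));

print_r($result);
```

Elements for which accept() returns false are skipped, and the keys of the surviving elements are preserved.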
If you look at the php.net documentation for the RegexIterator constructor, you’ll find that the iterator has 3 additional arguments that can be passed during initialization: $mode, $flags and $preg_flags.
The mode can be one of the following modes, which are defined as constants inside the RegexIterator class: MATCH, GET_MATCH, ALL_MATCHES, SPLIT and REPLACE.
You can change this mode after you have constructed the iterator, by calling setMode($new_mode) on the iterator to change it on the fly. It’s even possible to change the mode inside a foreach() iteration if you like (even though I can’t find any reason why you would want to do this).
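As a small sketch of switching modes on an existing iterator (the sample values and the pattern are assumptions for illustration):

```php
<?php
$it = new RegexIterator(
    new ArrayIterator(['foo1', 'bar', 'foo2']),
    '/^foo(\d)/'
);

// No mode given: matching elements come through unchanged.
$asMatch = iterator_to_array($it);

// Switch the same iterator object to GET_MATCH and iterate again.
$it->setMode(RegexIterator::GET_MATCH);
$asGetMatch = iterator_to_array($it);

print_r($asMatch);
print_r($asGetMatch);
```

The second pass returns match arrays instead of the original strings, even though the iterator object itself is the same.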
The default RegexIterator mode is MATCH, meaning it will just check whether anything actually matched the regex. It doesn’t do anything with the results; it just returns true when there was a match, and false otherwise.
GET_MATCH mode behaves a bit differently. It will not only check whether the regex matches the current element, but it will also return information about the capture groups that matched.
It does not return the filtered elements directly, but an array whose first element is the complete text that was matched, optionally followed by one or more capture groups (sometimes called subpatterns), which can be added inside your regular expression through ().
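A short sketch of GET_MATCH with two capture groups; the file names and the pattern are made up for illustration.

```php
<?php
$files = new ArrayIterator(['foo12.txt', 'bar.txt', 'foo7.php']);

// Two capture groups: the digits after "foo" and the file extension.
$it = new RegexIterator($files, '/^foo(\d+)\.(\w+)$/', RegexIterator::GET_MATCH);

$matches = iterator_to_array($it);

print_r($matches);
```

Each surviving entry is an array: index 0 holds the full match, and indexes 1 and 2 hold the capture groups.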
GET_MATCH will only match once inside each element. If there are multiple matches available, you won’t find them. For this, you can use ALL_MATCHES. There is a catch though: empty elements, or elements that do not match, will not get filtered by the RegexIterator but will show up as empty arrays. This is most likely a bug (filed as bug #66703).
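A sketch of ALL_MATCHES on made-up input follows. Every element here contains at least one match, so the empty-array behaviour described above is deliberately not exercised.

```php
<?php
$data = new ArrayIterator(['a1b22', 'c333']);

// Collect every run of digits inside each element, not just the first one.
$it = new RegexIterator($data, '/\d+/', RegexIterator::ALL_MATCHES);

$all = iterator_to_array($it);

print_r($all);
```

The result for each element has the same shape that preg_match_all() produces by default: one sub-array per group, with index 0 holding all full matches.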
SPLIT mode will split your elements through the given regular expression, just like preg_split(). SPLIT mode does filter correctly. It will return an array with the split values, but there is no way of getting the original value (like you have with the first element in GET_MATCH).
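A minimal SPLIT sketch, with comma-separated sample strings assumed for illustration:

```php
<?php
$rows = new ArrayIterator(['a,b,c', 'nocomma', 'x,y']);

// Split every element on commas; elements that cannot be split are dropped.
$it = new RegexIterator($rows, '/,/', RegexIterator::SPLIT);

$split = iterator_to_array($it);

print_r($split);
```

The element without a comma is filtered out, and the remaining entries only contain the pieces, not the original string.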
The last mode is REPLACE, which allows you to replace values through regular expressions.
Out of the box, there isn’t much replacement going on. It seems that it just checks if there are matches and, if so, removes those matches and returns the result. This is because the default replacement string that is used for REPLACE is actually empty. You can change it manually, but this is implemented in a bit of a hackish way: through a public property named replacement on the RegexIterator that you can set.
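The following sketch shows both behaviours side by side: first the default (empty) replacement, then a replacement set through the public property. The sample values are assumptions for illustration.

```php
<?php
$it = new RegexIterator(
    new ArrayIterator(['foo1', 'foo2']),
    '/foo/',
    RegexIterator::REPLACE
);

// Default: the replacement string is empty, so matches are simply removed.
$stripped = iterator_to_array($it);

// Set the public property to get an actual replacement.
$it->replacement = 'bar';
$replaced = iterator_to_array($it);

print_r($stripped);
print_r($replaced);
```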
I wasn’t kidding about the hackish way. The documentation suggests that the REPLACE mode is actually still under construction and not fully implemented. It does, however, support capture groups and placeholders such as $1 in the replacement string, which is perfectly valid (and seems to work without problems).
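A sketch of capture-group placeholders in the replacement string, closely following the style of the example in the php.net constructor documentation:

```php
<?php
$it = new RegexIterator(
    new ArrayIterator(['test1', 'test2', 'test3']),
    '/^(test)(\d+)/',
    RegexIterator::REPLACE
);

// $1 and $2 refer to the first and second capture group of the pattern.
$it->replacement = '$2:$1';

$swapped = iterator_to_array($it);

print_r($swapped);
```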
Key or value?
Great! So I can filter out values through the RegexIterator, as it will check the current values taken from the parent iterator. But what if I want to filter on the parent iterator’s keys instead? This is possible too: just pass RegexIterator::USE_KEY as $flags in the constructor.
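A small sketch of key-based filtering; the array keys and the pattern are made up for illustration.

```php
<?php
$data = new ArrayIterator(['foo_a' => 1, 'bar_b' => 2, 'foo_c' => 3]);

// USE_KEY makes the iterator apply the pattern to the keys instead of the values.
$it = new RegexIterator($data, '/^foo_/', RegexIterator::MATCH, RegexIterator::USE_KEY);

$byKey = iterator_to_array($it);

print_r($byKey);
```

The values pass through untouched; only the keys decide which entries survive.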
Besides modes and flags, there is a $preg_flags argument inside the constructor (also available through setPregFlags()). The value of these flags depends on the actual mode that you are using. For instance, PREG_PATTERN_ORDER makes sense when using ALL_MATCHES, but not really when using the default MATCH mode. The PREG_SPLIT_* flags only make sense when using SPLIT mode. See the PCRE documentation to find the flags and what they actually do.
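As a sketch of how a mode-specific flag changes the outcome, the following compares SPLIT mode with and without PREG_SPLIT_NO_EMPTY. The input strings are assumptions for illustration.

```php
<?php
$data = ['a,,b', 'c,,,d'];

// Without preg flags, consecutive commas produce empty pieces.
$plain = iterator_to_array(
    new RegexIterator(new ArrayIterator($data), '/,/', RegexIterator::SPLIT)
);

// With PREG_SPLIT_NO_EMPTY as the fifth constructor argument, empty pieces are dropped.
$noEmpty = iterator_to_array(
    new RegexIterator(new ArrayIterator($data), '/,/', RegexIterator::SPLIT, 0, PREG_SPLIT_NO_EMPTY)
);

print_r($plain);
print_r($noEmpty);
```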
If you are looking for some more information about the SPL, or any of the iterators, why not try my book? It’s available through Amazon or through php|architect (http://www.phparch.com/books/mastering-the-spl-library/) and contains a full overview of the SPL and the iterators.