PREG_SPLIT_DELIM_CAPTURE can give surprising results when the pattern has no capturing parentheses.
<?php
$content = '<strong>Lorem ipsum dolor</strong> sit <img src="test.png" />amet <span class="test" style="color:red">consec<i>tet</i>uer</span>.';
$chars = preg_split('/<[^>]*[^\/]>/i', $content, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($chars);
?>
Produces:
Array
(
[0] => Lorem ipsum dolor
[1] => sit <img src="test.png" />amet
[2] => consec
[3] => tet
[4] => uer
)
The delimiters are missing from the result. If you want to get them back, remember to wrap the delimiter pattern in parentheses.
<?php
$chars = preg_split('/(<[^>]*[^\/]>)/i', $content, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($chars); //parentheses added
?>
Produces:
Array
(
[0] => <strong>
[1] => Lorem ipsum dolor
[2] => </strong>
[3] => sit <img src="test.png" />amet
[4] => <span class="test" style="color:red">
[5] => consec
[6] => <i>
[7] => tet
[8] => </i>
[9] => uer
[10] => </span>
[11] => .
)
preg_split
(PHP 4, PHP 5)
preg_split - Split a string by a regular expression
Description
Splits the given string by a regular expression.
Parameters
- pattern
The pattern to search for.
- subject
The input string.
- limit
If specified, only up to limit pieces of the string are returned. A limit of -1 means "no limit", which is useful when specifying flags.
- flags
flags can be any combination of the following flags (combined with the | bitwise operator):
- PREG_SPLIT_NO_EMPTY
- If this flag is set, only non-empty pieces are returned by preg_split().
- PREG_SPLIT_DELIM_CAPTURE
- If this flag is set, parenthesized expressions in the delimiter pattern are captured and returned as well.
- PREG_SPLIT_OFFSET_CAPTURE
- If this flag is set, the string offset is also returned for every piece. Note that this changes the return value into an array in which every element is an array consisting of the matched string at index 0 and its offset into subject at index 1.
Return Value
Returns an array containing the substrings of subject split along the boundaries matched by pattern.
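As a minimal sketch of how the flags combine (the input string and variable names are illustrative, not part of the manual), the following splits on commas, keeps the delimiters, and drops the empty piece between the doubled commas:

```php
<?php
// Illustrative input; the doubled comma would otherwise yield an empty piece.
$input = 'a,b,,c';

// limit is -1 ("no limit") so that the flags argument can be supplied.
// The parentheses around the comma make PREG_SPLIT_DELIM_CAPTURE return it.
$parts = preg_split('/(,)/', $input, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

print_r($parts);
?>
```

Without PREG_SPLIT_NO_EMPTY the same call would also return an empty string for the gap between the two adjacent commas.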
Changelog
| Version | Description |
|---|---|
| 4.3.0 | PREG_SPLIT_OFFSET_CAPTURE was added |
| 4.0.5 | PREG_SPLIT_DELIM_CAPTURE was added |
| 4.0.0 | The flags parameter was added |
Examples
Example #1 preg_split() example: getting the parts of a string
<?php
// split the phrase by any number of commas or whitespace characters,
// which include " ", \r, \t, \n and \f
$keywords = preg_split("/[\s,]+/", "hypertext language, programming");
?>
Example #2 Splitting a string into its component characters
<?php
$str = 'string';
$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($chars);
?>
Example #3 Splitting a string into pieces and their offsets
<?php
$str = 'hypertext language programming';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
print_r($chars);
?>
The above example will output:
Array
(
[0] => Array
(
[0] => hypertext
[1] => 0
)
[1] => Array
(
[0] => language
[1] => 10
)
[2] => Array
(
[0] => programming
[1] => 19
)
)
Notes
If you don't need the power of regular expressions, you can choose faster alternatives such as explode() or str_split().
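A small sketch of that trade-off, using made-up sample data: when the delimiter is a fixed string, explode() returns the same pieces as preg_split() without involving the regex engine, and str_split() covers fixed-size chunks.

```php
<?php
// Illustrative data with a fixed, literal delimiter.
$csv = 'red,green,blue';

// Fixed delimiter: explode() is enough.
$simple = explode(',', $csv);

// Same pieces via preg_split(), at the cost of running a regular expression.
$regex = preg_split('/,/', $csv);

// str_split() breaks a string into fixed-size chunks (one byte each by default).
$chars = str_split('abc');

var_dump($simple === $regex); // bool(true)
?>
```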
See Also
- spliti() - Split string into array by regular expression, case insensitive
- split() - Split string into array by regular expression
- implode() - Join array elements with a string
- preg_match() - Perform a regular expression match
- preg_match_all() - Perform a global regular expression match
- preg_replace() - Perform a regular expression search and replace
preg_split
24-Oct-2009 08:26
06-Oct-2009 06:23
To split a camel-cased string using preg_split() with lookaheads and lookbehinds:
<?php
function splitCamelCase($str) {
return preg_split('/(?<=\\w)(?=[A-Z])/', $str);
}
?>
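A short usage sketch for the pattern above. The function name splitCamelCaseDemo and the sample inputs are hypothetical. Note that runs of capitals are split letter by letter, which may or may not be what you want for acronyms.

```php
<?php
// Hypothetical helper name; same pattern as in the note above.
function splitCamelCaseDemo($str) {
    // Split at every position that is preceded by a word character
    // and followed by an uppercase letter. Nothing is consumed.
    return preg_split('/(?<=\w)(?=[A-Z])/', $str);
}

$words   = splitCamelCaseDemo('splitCamelCase'); // split, Camel, Case
$acronym = splitCamelCaseDemo('parseXMLFile');   // parse, X, M, L, File

print_r($words);
print_r($acronym);
?>
```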
24-Sep-2009 07:34
If you want to use something like explode(PHP_EOL, $string) but for all combinations of \r and \n, try this one:
<?php
$text = "A\nB\rC\r\nD\r\rE\n\nF";
$texts = preg_split("/((\r(?!\n))|((?<!\r)\n)|(\r\n))/", $text);
?>
result:
array("A", "B", "C", "D", "", "E", "", "F");
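The same split can probably be written more simply by listing the two-character sequence first, so that "\r\n" is consumed as a single line break before the single-character alternatives are tried. This is a sketch that compares the shorter pattern against the one above on the same sample text:

```php
<?php
$text = "A\nB\rC\r\nD\r\rE\n\nF";

// Try "\r\n" first so it counts as one line break, then lone "\r" or "\n".
$lines = preg_split("/\r\n|\r|\n/", $text);

// The lookaround-based pattern from the note above, for comparison.
$original = preg_split("/((\r(?!\n))|((?<!\r)\n)|(\r\n))/", $text);

var_dump($lines === $original); // bool(true)
?>
```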
01-Aug-2009 05:57
Extending m.timmermans's solution, you can use the following code as a search expression parser:
<?php
$search_expression = "apple bear \"Tom Cruise\" or 'Mickey Mouse' another word";
$words = preg_split("/[\s,]*\\\"([^\\\"]+)\\\"[\s,]*|" . "[\s,]*'([^']+)'[\s,]*|" . "[\s,]+/", $search_expression, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($words);
?>
The result will be:
Array
(
[0] => apple
[1] => bear
[2] => Tom Cruise
[3] => or
[4] => Mickey Mouse
[5] => another
[6] => word
)
1. Accepted delimiters: white spaces (space, tab, new line etc.) and commas.
2. You can use either single (') or double (") quotes for expressions that contain more than one word.
28-May-2009 02:36
Spacing out your CamelCase using preg_replace:
<?php
function spacify($camel, $glue = ' ') {
return preg_replace( '/([a-z0-9])([A-Z])/', "$1$glue$2", $camel );
}
echo spacify('CamelCaseWords'), "\n"; // 'Camel Case Words'
echo spacify('camelCaseWords'), "\n"; // 'camel Case Words'
?>
27-May-2009 08:11
Here's a helpful function to space out your CamelCase using preg_split:
<?php
function spacify($camel, $glue = ' ') {
return $camel[0] . substr(implode($glue, array_map('implode', array_chunk(preg_split('/([A-Z])/',
ucfirst($camel), -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE), 2))), 1);
}
echo spacify('CamelCaseWords'); // 'Camel Case Words'
echo spacify('camelCaseWords'); // 'camel Case Words'
?>
23-May-2009 12:56
If you need a function's argument list without the default values and references, you can try this code:
<?php
$func_args = '$node, $op, $a3 = NULL, $form = array(), $a4 = NULL';
// collect every variable name; defaults and "&" are left out of the match
preg_match_all('@(?<func_arg>\$[^,= ]+)@i', $func_args, $matches);
$call_arg = implode(',', $matches['func_arg']);
?>
Result: string = "$node,$op,$a3,$form,$a4"
27-Mar-2009 05:02
How to display a shortened text string with an ellipsis, breaking on word boundaries only:
<?php
function truncate($string, $max = 70, $rep = '...') {
$words = preg_split("/[\s]+/", $string);
$newstring = '';
$numwords = 0;
foreach ($words as $word) {
if ((strlen($newstring) + 1 + strlen($word)) < $max) {
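// add the separating space only between words, not before the first one
$newstring .= ($newstring === '' ? '' : ' ') . $word;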
++$numwords;
} else {
break;
}
}
if ($numwords < count($words)) {
$newstring .= $rep;
}
return $newstring;
}
?>
hope this helps someone! thanks for all the help from everyone else!!
17-Mar-2009 07:06
If the task is too complicated for preg_split, preg_match_all might come in handy, since preg_split is essentially a special case.
I wanted to split a string on a certain character (asterisk), but only if it wasn't escaped (by a preceding backslash). Thus, I should ensure an even number of backslashes before any asterisk meant as a splitter. Look-behind in a regular expression wouldn't work since the length of the preceding backslash sequence can't be fixed. So I turned to preg_match_all:
<?php
// split a string at unescaped asterisks
// where backslash is the escape character
$splitter = "/\\*((?:[^\\\\*]|\\\\.)*)/";
preg_match_all($splitter, "*$string", $aPieces, PREG_PATTERN_ORDER);
$aPieces = $aPieces[1];
// $aPieces now contains the exploded string
// and unescaping can be safely done on each piece
foreach ($aPieces as $idx=>$piece)
$aPieces[$idx] = preg_replace("/\\\\(.)/s", "$1", $piece);
?>
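To make the behaviour concrete, here is a worked sketch on a made-up input. The variable names $input, $m and $pieces are illustrative. The pattern is the same one as above, written in single quotes, which does not change the doubled backslashes.

```php
<?php
// Hypothetical input, i.e. the text  a\*b*c\\*d
// (in single quotes, \\\\ is two backslashes and \* stays as it is).
$input = 'a\*b*c\\\\*d';

// An asterisk followed by any run of ordinary characters
// or backslash-escaped characters.
$splitter = '/\*((?:[^\\\\*]|\\\\.)*)/';
preg_match_all($splitter, '*' . $input, $m, PREG_PATTERN_ORDER);

// Unescape each piece by dropping the escaping backslash.
$pieces = array();
foreach ($m[1] as $piece) {
    $pieces[] = preg_replace('/\\\\(.)/s', '$1', $piece);
}

print_r($pieces); // three pieces: a*b, c\, d
?>
```

The first asterisk is escaped and stays inside the first piece. The second is unescaped and splits. The third is preceded by an escaped backslash, so it splits as well.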
17-Jul-2008 06:17
<?php
$s = '<p>bleh blah</p><p style="one">one two three</p>';
$htmlbits = preg_split('/(<p( style="[-:a-z0-9 ]+")?>|<\/p>)/i', $s, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($htmlbits);
?>
Array
(
[0] =>
[1] => <p>
[2] => bleh blah
[3] => </p>
[4] =>
[5] => <p style="one">
[6] => style="one"
[7] => one two three
[8] => </p>
[9] =>
)
two interesting bits:
1. When using PREG_SPLIT_DELIM_CAPTURE, if you use more than one pair of parentheses, the result array can have members representing all pairs. See array indexes 5 and 6 to see two adjacent delimiter results in which the second is a subset match of the first.
2. If a parenthesised sub-expression is made optional by a following question mark (ex: '/abc (optional subregex)?/') some split delimiters may be captured in the result while others are not. See array indexes 1 and 2 to see an instance where the overall match succeeded and returned a delimiter while the optional sub-expression '( style="[-:a-z0-9 ]+")?' did not match, and did not return a delimiter. This means it's possible to have a result with an unpredictable number of delimiters in the result array.
This second aspect is true irrespective of the number of pairs of parentheses in the regex. This means: in a regular expression with a single optional parenthesised sub-expression, the overall expression can match without generating a corresponding delimiter in the result.
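If the extra sub-match is not wanted, one way around it is to make the inner group non-capturing with (?: ... ). This sketch reuses the string from above and also adds PREG_SPLIT_NO_EMPTY, so every delimiter appears exactly once and the empty pieces disappear:

```php
<?php
$s = '<p>bleh blah</p><p style="one">one two three</p>';

// (?: ... ) groups without capturing, so only the whole tag
// (the outer parentheses) is returned as a delimiter.
$htmlbits = preg_split(
    '/(<p(?: style="[-:a-z0-9 ]+")?>|<\/p>)/i',
    $s,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($htmlbits);
?>
```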
29-May-2008 08:56
For people who want to use the double quote to group words/fields, kind of like CSV does, you can use the following expression:
<?php
$keywords = preg_split( "/[\s,]*\\\"([^\\\"]+)\\\"[\s,]*|[\s,]+/", "textline with, commas and \"quoted text\" inserted", 0, PREG_SPLIT_DELIM_CAPTURE );
?>
Which will result in:
Array
(
[0] => textline
[1] => with
[2] => commas
[3] => and
[4] => quoted text
[5] => inserted
)
04-Sep-2007 06:29
I was having trouble getting the PREG_SPLIT_DELIM_CAPTURE flag to work because I missed reading the "parenthesized expression" in the documentation :-(
So the pattern should look like:
/(A)/
not just
/A/
and it works as described/expected.
23-Mar-2005 02:41
preg_split() behaves differently from Perl's split() if the string ends with a delimiter. This Perl snippet will print 5:
my @a = split(/ /, "a b c d e ");
print scalar @a;
The corresponding PHP code prints 6:
<?php print count(preg_split("/ /", "a b c d e ")); ?>
This is not necessarily a bug (nowhere does the documentation say that preg_split() behaves the same as Perl's split()), but it might surprise Perl programmers.
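If the Perl behaviour is what you want, a minimal sketch is to drop the trailing empty fields yourself after splitting (the variable name $fields is illustrative):

```php
<?php
$fields = preg_split('/ /', 'a b c d e ');

// Drop trailing empty fields, which Perl's split does by default.
while ($fields && end($fields) === '') {
    array_pop($fields);
}

echo count($fields); // 5
?>
```

PREG_SPLIT_NO_EMPTY is not quite the same thing, since it also removes empty pieces from the start and the middle of the result.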
25-Sep-2004 01:01
To clarify the "limit" parameter and the PREG_SPLIT_DELIM_CAPTURE option,
<?php
$result = preg_split('/( )/', '1 2 3 4 5 6 7 8', 4, PREG_SPLIT_DELIM_CAPTURE);
?>
returns:
('1', ' ', '2', ' ' , '3', ' ', '4 5 6 7 8')
So you actually get 7 array items not 4
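A side-by-side sketch of the same split with and without the flag (variable names are illustrative) shows that limit counts the pieces only, while captured delimiters are added on top:

```php
<?php
$subject = '1 2 3 4 5 6 7 8';

// Without the flag: exactly 4 pieces, the last one holding the remainder.
$plain = preg_split('/ /', $subject, 4);

// With the flag: the same 4 pieces plus the 3 captured delimiters between them.
$withDelims = preg_split('/( )/', $subject, 4, PREG_SPLIT_DELIM_CAPTURE);

echo count($plain), "\n";      // 4
echo count($withDelims), "\n"; // 7
?>
```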
29-May-2002 05:01
The above description for PREG_SPLIT_OFFSET_CAPTURE may be a bit confusing.
When the flag is OR'd into the 'flags' parameter of preg_split, each piece is returned as a two-element array: the first element is the piece of the string, and the second is that piece's zero-based offset in the input string.
For example, if you called preg_split like this:
preg_split('/foo/', 'matchfoomatch', -1, PREG_SPLIT_OFFSET_CAPTURE);
it would return an array of the form:
Array(
[0] => Array([0] => "match", [1] => 0),
[1] => Array([0] => "match", [1] => 8)
)
Note that OR'ing in PREG_SPLIT_DELIM_CAPTURE along with PREG_SPLIT_OFFSET_CAPTURE works as well.
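A minimal sketch of that combination, reusing the same input with the delimiter wrapped in parentheses so it can be captured (the variable names are illustrative). The delimiter comes back as an offset pair too:

```php
<?php
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE;

// The parentheses make "foo" a captured delimiter.
$parts = preg_split('/(foo)/', 'matchfoomatch', -1, $flags);

// Each element is array(string, offset): match@0, foo@5, match@8.
print_r($parts);
?>
```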
