Regular expressions in PHP: how they work, with examples. An example of obtaining images from HTML using regular expressions

Regular expressions let you find sequences in a string that match a pattern. For example, the pattern "Vasya(.*)Pupkin" finds a sequence in which any number of any characters stands between the words Vasya and Pupkin. To find six digits, write "[0-9]{6}" (or, for six to eight digits, "[0-9]{6,8}"). Each such element has two parts: a character set indicator ("[0-9]") and an indicator of the required number of characters ("{6}").

Instead of a set of characters you can use the symbol for any character, a dot. You can also list a specific set of characters (ranges are supported, such as the "0-9" mentioned above) or specify "anything except this set of characters".

The official PHP documentation calls the indicator of the number of characters a "quantifier". The term is convenient and unambiguous. A quantifier can be a specific value, either one fixed number ("{6}") or a numerical interval ("{6,8}"), or an abstract one: "any number, including 0" ("*"), "any natural number", from 1 to infinity ("+", as in "document[0-9]+.txt"), or "either 0 or 1" ("?"). By default a character set has a quantifier of one ("document[0-9].txt").
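As a minimal sketch of these quantifiers, preg_match can be run against a string that should match and one that should not. The sample strings are invented for illustration, and the dot before "txt" is escaped here so that it matches only a literal dot.

```php
<?php
// Each row: pattern, a string expected to match, a string expected not to match.
// The sample strings are invented for illustration.
$cases = [
    ['/^[0-9]{6}$/',            '123456',         '12345'],          // exactly six digits
    ['/^[0-9]{6,8}$/',          '1234567',        '123456789'],      // six to eight digits
    ['/^document[0-9]*\.txt$/', 'document.txt',   'documentX.txt'],  // any number, including 0
    ['/^document[0-9]+\.txt$/', 'document12.txt', 'document.txt'],   // one or more
    ['/^document[0-9]?\.txt$/', 'document1.txt',  'document12.txt'], // zero or one
];

$results = [];
foreach ($cases as list($pattern, $good, $bad)) {
    $matchGood = preg_match($pattern, $good);
    $matchBad  = preg_match($pattern, $bad);
    $results[] = [$matchGood, $matchBad];
    printf("%-26s %s => %d, %s => %d\n", $pattern, $good, $matchGood, $bad, $matchBad);
}
```

Each pattern is anchored with "^" and "$" so that the quantifier has to account for the whole string rather than any fragment of it.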

For more flexible searches, these "character set plus quantifier" pairs can be combined into larger structures.

Like any flexible tool, regular expressions have limits to where they are useful. For example, if you need to replace one fixed string in a text with another fixed string, use str_replace. The PHP developers urge you not to use heavier functions such as preg_replace (or ereg_replace, which was removed in PHP 7.0) for this purpose, because each call has to interpret the pattern, and this consumes noticeably more system resources. Unfortunately, this is a favorite pitfall of beginning PHP programmers.
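A small sketch of the difference, using an invented sample string: both calls give the same result, but only the second one involves the regular expression engine.

```php
<?php
// Sample text invented for illustration.
$text = 'The quick brown fox';

// Fixed string to fixed string: no pattern is involved, so str_replace is enough.
$plain = str_replace('quick', 'slow', $text);

// The same replacement through the regex engine: the pattern has to be
// interpreted first, which is wasted work for a fixed string.
$viaRegex = preg_replace('/quick/', 'slow', $text);

echo $plain, "\n";
var_dump($plain === $viaRegex);
```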

Use regular expression functions only when you do not know in advance exactly what string you will find. Examples: search code that strips service characters and short words from the search string and also compresses whitespace (every run of spaces matched by " +" is replaced by a single space). I also use these functions to check the email address of a user leaving a review.

A lot can be done, but keep in mind that regular expressions are not all-powerful. For example, it is better not to make complex replacements in large texts with them. The combination "(.*)" means, in program terms, a search through all the characters of the text. If the pattern is not anchored to the beginning or end of the string, the program also "moves" the pattern itself through the entire text, so the result is a double search, or rather a search squared. A second "(.*)" means a search cubed, and so on. Raise, say, 5 kilobytes of text to the third power: 5,000^3 = 125,000,000,000, that is, one hundred and twenty-five billion operations. A strict count gives four to eight times fewer operations, but it is the order of magnitude that matters.
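The search-string cleanup described above might look roughly like this. It is a sketch, not the author's code: the function name normalizeQuery(), the restriction to Latin letters and digits, and the two-character threshold for "short words" are all assumptions.

```php
<?php
// Hypothetical helper; the name and the thresholds are assumptions.
function normalizeQuery(string $query): string
{
    // Replace everything except Latin letters, digits and spaces with a space.
    $query = preg_replace('/[^a-z0-9 ]+/i', ' ', $query);
    // Drop words of one or two characters.
    $query = preg_replace('/\b[a-z0-9]{1,2}\b/i', ' ', $query);
    // Compress every run of spaces into one space, then trim the ends.
    return trim(preg_replace('/ +/', ' ', $query));
}

echo normalizeQuery('  How to  (quickly) fix   a bug?! '), "\n";
```

The three steps are kept as separate short patterns on purpose: each one is easy to read, and none of them contains an unanchored "(.*)".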

Character set
.               dot: any character
[]              square brackets: character class ("any of")
[^]             negative character class ("any except")
-               dash: range designation in a character class ("[0-9]" is the digits)
\d              [0-9]: digits only
\D              [^0-9]: anything except digits
\w              [a-zA-Z0-9_]: letters, digits and the underscore
\W              [^a-zA-Z0-9_]: anything except letters, digits and the underscore
\s              whitespace characters: space, tab, newline
\S              anything except whitespace characters
| (one|other)   alternation: one of the listed options may stand at this place, for example (Vasya|Petya|Masha). If you don't want the group to be captured, use (?: ...)

Don't use a character class to denote a single character (instead of "[ ]+", " +" will do just fine). Do not put a dot in a character class expecting it to mean "any character": inside square brackets a dot matches only a literal dot.
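The entries of the table can be tried out one by one. This is a sketch with invented sample strings; every check in it is expected to report a match.

```php
<?php
// Sample strings invented for illustration; each check should yield 1.
$checks = [
    'digits only (\d)'          => preg_match('/^\d+$/', '2024'),
    'same with a range [0-9]'   => preg_match('/^[0-9]+$/', '2024'),
    'a non-digit is present'    => preg_match('/\D/', '20a4'),
    'word characters (\w)'      => preg_match('/^\w+$/', 'user_01'),
    'whitespace is present'     => preg_match('/\s/', 'two words'),
    'negative class [^0-9]'     => preg_match('/^[^0-9]+$/', 'abc'),
    'alternation, not captured' => preg_match('/^(?:Vasya|Petya|Masha)$/', 'Petya'),
];

foreach ($checks as $label => $matched) {
    printf("%-27s %d\n", $label, $matched);
}
```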

Quantifier

A quantifier can indicate either a specific value or limits. If the number of repetitions falls within the limits of the quantifier, the expression fragment is considered to match the string being parsed. Syntax:

{n}

{min,max}

If you need only a required minimum and there is no maximum, put the comma and omit the second number: "{5,}" ("minimum 5"). There are special notations for the most frequently used quantifiers:

*    {0,}    any number, including 0
+    {1,}    one or more
?    {0,1}   either 0 or 1

In practice, these symbols are used more often than curly braces.

Anchors

^    beginning of the line
$    end of the line

These characters must appear at the very beginning and at the very end of the pattern, respectively.
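A short sketch of what anchoring changes, with invented file names: without anchors the pattern may match anywhere in the string, while "^" and "$" tie it to the start and the end.

```php
<?php
// File names invented for illustration.
$name = 'document.txt';

$anywhere = preg_match('/txt/', 'document.txt.bak');     // 1: "txt" occurs somewhere
$atStart  = preg_match('/^document/', $name);            // 1: the string starts with "document"
$atEnd    = preg_match('/\.txt$/', $name);               // 1: the string ends with ".txt"
$notAtEnd = preg_match('/\.txt$/', 'document.txt.bak');  // 0: ".txt" is not at the end
$whole    = preg_match('/^[a-z]+\.txt$/', $name);        // 1: the whole string fits the pattern

printf("%d %d %d %d %d\n", $anywhere, $atStart, $atEnd, $notAtEnd, $whole);
```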

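Greed

By default quantifiers are greedy: they capture as many characters as possible. The question mark also acts as a quantifier minimizer, making the quantifier lazy:

.*?

For a text that contains two fragments wrapped in [b] and [/b] tags, a greedy pattern captures everything from the first opening tag to the last closing tag, while the lazy version stops at the first closing tag.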
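The difference is easy to reproduce. The sample text here is an assumption made up for the sketch, not the text of the original example.

```php
<?php
// Sample text invented for illustration.
$text = '[b]bold text[/b] and here - [b]even bolder[/b]';

// Greedy: (.*) runs to the end of the text and backs up to the LAST [/b].
preg_match('/\[b\](.*)\[\/b\]/', $text, $greedy);

// Lazy: (.*?) grows one character at a time and stops at the FIRST [/b].
preg_match('/\[b\](.*?)\[\/b\]/', $text, $lazy);

echo 'Greedy version: ', $greedy[1], "\n"; // bold text[/b] and here - [b]even bolder
echo 'Lazy version:   ', $lazy[1], "\n";   // bold text
```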

The pattern string, as you have already noticed, begins and ends with slashes. Modifiers follow the second slash:

i    case-insensitive search
m    multiline mode. By default, PCRE treats the subject as a single line, and the "^" and "$" characters match only at the beginning and end of the entire text. When this option is set, "^" and "$" also match at the beginning and end of individual lines.
s    the "." (dot) also matches a line break (by default it does not)
A    anchors the pattern to the beginning of the text
D    forces the "$" character to match only at the very end of the text. Ignored if the m modifier is set.
U    inverts greediness: quantifiers are lazy by default and become greedy only when followed by "?".
e    the replacement string is interpreted as PHP code (preg_replace only; deprecated in PHP 5.5 and removed in PHP 7.0).
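A sketch of how the most common modifiers change the outcome of a match; the two-line sample text is invented for illustration.

```php
<?php
// Two-line sample text invented for illustration.
$text = "First line\nsecond LINE";

$plain = preg_match('/line$/', $text);    // 0: the text ends with "LINE", not "line"
$withI = preg_match('/line$/i', $text);   // 1: case is ignored

// With m, "^" matches at the start of every line, so both first words are found.
$withM = preg_match_all('/^\w+/m', $text, $lineStarts);

$dotDefault = preg_match('/First.+LINE/', $text);  // 0: "." does not cross "\n"
$withS      = preg_match('/First.+LINE/s', $text); // 1: "." also matches "\n"

// With U the quantifier is lazy, so the capture stops at the first ">".
preg_match('/<(.+)>/U', '<a><b>', $ungreedy);

printf("%d %d %d %d %d %s\n", $plain, $withI, $withM, $dotDefault, $withS, $ungreedy[1]);
```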
Functions for working with regular expressions
  • preg_grep
  • preg_match - Performs a match against a regular expression. This function only looks for the first match!
  • preg_match_all
  • preg_quote - Escapes characters in regular expressions, i.e. inserts backslashes before all service characters (for example, parentheses, square brackets, etc.) so that they are taken literally. If you check user input with regular expressions, it is better to escape the service characters in the incoming variable first
  • preg_replace
  • preg_replace_callback - Performs regular expression search and replacement
  • preg_split
preg_grep

preg_grep function - Returns an array of occurrences that match a pattern

Syntax

array preg_grep (string pattern, array input [, int flags])

preg_grep() returns an array consisting of elements of the input array that match the given pattern.

The flags parameter can take the following values:

PREG_GREP_INVERT
If this flag is set, the preg_grep() function returns those array elements that do not match the specified pattern.
The result returned by preg_grep() uses the same indexes as the original data array. If this behavior doesn't suit you, use array_values() on the array returned by preg_grep() to reindex.
Sample code:

// Returns all array elements
// containing floating point numbers
$fl_array = preg_grep("/^(\d+)?\.\d+$/", $array);
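A fuller sketch with the same pattern shows the preserved keys, the PREG_GREP_INVERT flag and reindexing with array_values(). The input array is invented for illustration.

```php
<?php
// Sample data invented for illustration.
$array = ['a' => '3.14', 'b' => 'abc', 'c' => '.5', 'd' => '42'];

// Elements that look like floating point numbers; the original keys are kept.
$floats = preg_grep('/^(\d+)?\.\d+$/', $array);

// The same pattern inverted: elements that do NOT match.
$others = preg_grep('/^(\d+)?\.\d+$/', $array, PREG_GREP_INVERT);

// Reindex from zero when the original keys are not needed.
$reindexed = array_values($floats);

print_r($floats);
print_r($others);
print_r($reindexed);
```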

preg_match

preg_match function - Performs a match against a regular expression

Syntax

int preg_match (string pattern, string subject [, array matches [, int flags [, int offset]]])

Searches the given text subject for matches with the pattern pattern.

If the additional matches parameter is specified, it will be filled with search results. The $matches[0] element will contain the portion of the string that matches the entire pattern, $matches[1] will contain the portion of the string that matches the first subpattern, and so on.

flags can take the following values:

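PREG_OFFSET_CAPTURE
If this flag is specified, the position of each found substring in the source string is also returned. This changes the format of the returned data: each occurrence is returned as an array whose zeroth element contains the found substring and whose first element contains the offset.

The search is carried out from left to right, from the beginning of the string. The optional offset parameter can be used to specify an alternative starting position for the search. A similar result can be achieved by replacing subject with substr($subject, $offset).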

The preg_match() function returns the number of matches found. This can be 0 (no matches found) or 1, since preg_match() stops working after the first match found. If you need to find or count all matches, you should use the preg_match_all() function. The preg_match() function returns FALSE if any errors occur during execution.

Recommendation: Do not use the preg_match() function if you need to check for the presence of a substring in a given string. Use strpos() or strstr() for this as they will do the job much faster.

Result of the example:

domain name is: site
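The parameters described above can be sketched as follows. The URL and the patterns are assumptions chosen for illustration, not the code of the original example.

```php
<?php
// Sample URL invented for illustration.
$url = 'http://www.example.com/index.html';

// $matches[0] holds the whole match, $matches[1] the first subpattern (the host).
$found = preg_match('/^(?:https?:\/\/)?([^\/]+)/i', $url, $matches);
echo $found, ': ', $matches[1], "\n";

// With PREG_OFFSET_CAPTURE every entry becomes [substring, offset].
preg_match('/example/', $url, $withOffset, PREG_OFFSET_CAPTURE);

// The fifth argument starts the search at the given byte offset.
preg_match('/e/', $url, $fromOffset, PREG_OFFSET_CAPTURE, 12);

// For a plain substring check no regular expression is needed.
$hasWord = strpos($url, 'example') !== false;

print_r($withOffset);
print_r($fromOffset);
```

Note that the offsets reported with an explicit starting offset are still counted from the beginning of the subject, which is one way this differs from passing substr($subject, $offset).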

preg_match_all

preg_match_all function - Performs a global pattern search in a string

Syntax

int preg_match_all (string pattern, string subject, array matches [, int flags [, int offset]])

Searches the string subject for all matches of the pattern pattern and places the result in the matches array in the order determined by the combination of flags.

After finding the first match, subsequent searches will be carried out not from the beginning of the string, but from the end of the last found occurrence.

The optional flags parameter can combine the following values (be aware that using PREG_PATTERN_ORDER at the same time as PREG_SET_ORDER is meaningless):

PREG_PATTERN_ORDER
If this flag is set, the result will be ordered as follows: the $matches[0] element contains an array of complete occurrences of the pattern, the $matches[1] element contains an array of occurrences of the first subpattern, and so on.

Result of the example:

<b>example: </b>, <div align=left>this is a test</div>
example: , this is a test

As we can see, $out[0] contains an array of complete occurrences of the pattern, and $out[1] contains an array of the substrings enclosed in the tags.
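A sketch of this ordering, modeled on the well-known example from the PHP manual. The input string and the pattern are assumptions, since the original code of this article is not available here.

```php
<?php
// Input string assumed for illustration.
$html = '<b>example: </b><div align=left>this is a test</div>';

// U makes the quantifiers lazy, so each tag pair is matched separately.
$count = preg_match_all('|<[^>]+>(.*)</[^>]+>|U', $html, $out, PREG_PATTERN_ORDER);

// $out[0] lists the complete matches, $out[1] the text inside the tags.
echo $out[0][0], ', ', $out[0][1], "\n";
echo $out[1][0], ', ', $out[1][1], "\n";
```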

PREG_SET_ORDER
If this flag is set, the result will be ordered as follows: the $matches[0] element contains the first set of occurrences, the $matches[1] element contains the second set of occurrences, and so on.

Result of the example:

<b>example: </b>, example:
<div align=left>this is a test</div>, this is a test

In this case, $matches[0] contains the first set of matches, namely: $matches[0][0] contains the first occurrence of the entire pattern, $matches[0][1] contains the first occurrence of the first subpattern, and so on. Similarly, $matches[1] contains the second set of matches, and so on for each match found.
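The same search with PREG_SET_ORDER groups the results by match instead of by subpattern. As before, this is a sketch with an assumed input string.

```php
<?php
// Input string assumed for illustration.
$html = '<b>example: </b><div align=left>this is a test</div>';

preg_match_all('|<[^>]+>(.*)</[^>]+>|U', $html, $sets, PREG_SET_ORDER);

// Each element of $sets is one match: [0] the whole match, [1] the first subpattern.
foreach ($sets as $set) {
    echo $set[0], ', ', $set[1], "\n";
}
```

This layout is usually the more convenient one when each match is processed in a loop, because everything that belongs to one occurrence arrives together.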

PREG_OFFSET_CAPTURE
If this flag is specified, for each found substring its position in the source string will be indicated. It is important to remember that this flag changes the format of the returned data: each occurrence is returned as an array, the zeroth element of which contains the found substring, and the first element contains the offset.

In case no flag is used, the default is PREG_PATTERN_ORDER.

The search is carried out from left to right, from the beginning of the string. The optional offset parameter can be used to specify an alternative starting position for the search. A similar result can be achieved by replacing subject with substr($subject, $offset).

Returns the number of occurrences of the pattern found (can be zero) or FALSE if any errors occurred during execution.

Result of the example:

matched: <b>bold text</b>
part 1: <b>
part 2: bold text
part 3: </b>

matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: click me
part 3: </a>

preg_quote

preg_quote function - Escapes characters in regular expressions

Syntax

string preg_quote (string str [, string delimiter])

The preg_quote() function takes the string str and adds a backslash before each special character. This can be useful if the pattern is composed of string variables whose values may change while the script is running.

If the additional delimiter parameter is specified, it will also be escaped. This is useful for escaping the delimiter that is used in PCRE functions. The most common delimiter is the "/" character.

In regular expressions, the following characters are considered service characters: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : (current PHP versions also escape - and #).
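A sketch of the typical use: a piece of text that contains service characters is escaped before it becomes part of a pattern. The variable names and sample strings are assumptions for illustration.

```php
<?php
// Pretend $word came from outside; the asterisks are regex service characters.
$word = '*very*';
$text = 'This book is *very* difficult to find.';

// Without preg_quote the asterisks would be read as quantifiers.
$quoted = preg_quote($word, '/');   // \*very\*

$result = preg_replace('/' . $quoted . '/', '<i>' . $word . '</i>', $text);
echo $result, "\n";

// The second argument makes preg_quote escape the delimiter as well.
echo preg_quote('1/2 (half)', '/'), "\n";
```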

Result of the example:

This book is <i>*very*</i> difficult to find.

preg_replace

preg_replace function - Performs search and replace using a regular expression

Syntax

mixed preg_replace (mixed pattern, mixed replacement, mixed subject [, int limit])

Searches the string subject for matches of pattern and replaces them with replacement. If the limit parameter is specified, at most limit occurrences of the pattern will be replaced; if limit is omitted or equal to -1, all occurrences of the pattern will be replaced.

Replacement can contain references of the form \\n or (since PHP 4.0.4) $n, the latter being preferable. Each such reference is replaced with the substring matched by the nth subpattern enclosed in parentheses. n can take values from 0 to 99, and the reference \\0 (or $0) corresponds to the entire match. Subpatterns are numbered from left to right, starting with one.

When a subpattern reference is immediately followed by a digit, a problem may arise. Notation like \\n becomes ambiguous: a reference to the first subpattern followed by the digit 1 would be written as \\11, which is interpreted as a reference to the eleventh subpattern. This ambiguity can be removed with the construction \${1}1, which denotes an isolated reference to the first subpattern followed by the digit 1.
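A sketch of the isolated reference, with an invented date string. Single quotes are used so that PHP itself does not try to interpolate "$1" as a variable.

```php
<?php
// Sample string and pattern invented for illustration.
$string  = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';

// "${1}1" means: first subpattern, then a literal digit 1.
$isolated = preg_replace($pattern, '${1}1,$3', $string);

// "$11" is read as a reference to an eleventh subpattern,
// so the month name is not inserted as intended.
$ambiguous = preg_replace($pattern, '$11,$3', $string);

echo $isolated, "\n";   // April1,2003
echo $ambiguous, "\n";
```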

If a pattern match is found during function execution, the modified subject value will be returned, otherwise the original subject will be returned.

The first three parameters of preg_replace() can be one-dimensional arrays. If an array uses keys, its elements are processed in the order in which they are stored in the array, not in key order. Specifying keys in the pattern and replacement arrays is optional. If you do use indexes to pair patterns with replacement strings, apply the ksort() function to each of the arrays. Otherwise, when the replacement array was filled in a different order than the pattern array, the pairs do not line up:

The bear black slow jumped over the lazy dog.

Using ksort() we get the desired result:

The slow black bear jumped over the lazy dog.
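Both results can be reproduced with a sketch like the following. The source sentence and the patterns are assumptions reconstructed from the two outputs.

```php
<?php
// Source sentence assumed for illustration.
$string = 'The quick brown fox jumped over the lazy dog.';

$patterns = [];
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';

// Filled in reverse: keys say 0, 1, 2, but the storage order is 2, 1, 0.
$replacements = [];
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';

// Pairs are formed by storage order, so "quick" is paired with "bear".
$unsorted = preg_replace($patterns, $replacements, $string);

// After ksort() storage order equals key order in both arrays.
ksort($patterns);
ksort($replacements);
$sorted = preg_replace($patterns, $replacements, $string);

echo $unsorted, "\n";
echo $sorted, "\n";
```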

If the subject parameter is an array, pattern search and replacement are performed for each of its elements. The returned result will also be an array.

If the pattern and replacement parameters are both arrays, preg_replace() takes a pair of elements from the two arrays in turn and uses them for the search and replace operation. If the replacement array contains fewer elements than pattern, empty strings are used in place of the missing elements. If pattern is an array and replacement is a string, each element of the pattern array is searched for and replaced with replacement (the elements of the array serve as the pattern in turn, while the replacement string stays fixed). Passing a string as pattern and an array as replacement does not make sense.
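The array combinations can be sketched with a few short calls; the sample strings are invented for illustration.

```php
<?php
// Sample string invented for illustration.
$subject = 'red green blue';

// Two patterns, one replacement: the pattern without a partner
// is replaced by an empty string.
$short = preg_replace(['/red/', '/green/'], ['RED'], $subject);

// Array of patterns, single replacement string: the string is used for every pattern.
$same = preg_replace(['/red/', '/green/'], '*', $subject);

// Array subject: each element is processed, and an array is returned.
$each = preg_replace('/e/', 'E', ['red', 'blue']);

var_dump($short, $same, $each);
```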

The /e modifier changes the behavior of the preg_replace() function so that the replacement parameter, after the necessary substitutions have been performed, is interpreted as PHP code and only then used for replacement. When using this modifier, be careful: the replacement parameter must contain valid PHP code, otherwise a syntax error will occur in the line containing the preg_replace() function call. The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; in current code use preg_replace_callback() instead.

Sample code: Replace by multiple patterns

This example will output:

$startDate = "5/27/1999";

Example code: Using the /e modifier

Example code: Converts all HTML tags to uppercase

preg_replace_callback

preg_replace_callback function - Performs regular expression search and replacement using a callback function

Syntax

mixed preg_replace_callback (mixed pattern, callback callback, mixed subject [, int limit])

The behavior of this function is in many ways similar to preg_replace(), except that instead of the replacement parameter you must specify a callback function, which receives an array of found occurrences as its input parameter. The callback must return the string that will be used as the replacement.
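As a sketch, here is one way to convert all HTML tag names to uppercase with a callback. The sample markup and the pattern are assumptions made for illustration.

```php
<?php
// Sample markup invented for illustration.
$html = '<p>Some <b>bold</b> text</p>';

// The callback receives the matches for ONE occurrence:
// $m[1] is "<" or "</", $m[2] the tag name, $m[3] the rest of the tag.
$upper = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function (array $m): string {
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html
);

echo $upper, "\n"; // <P>Some <B>bold</B> text</P>
```

Because the replacement is computed by ordinary PHP code rather than by a string that gets evaluated, this approach avoids the risks of the /e modifier.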

Example code