Regular Expression
Regular Expression, often shortened to RegEx, is a sequence of characters that forms a search pattern. RegEx can be used to check whether a string contains the specified search pattern.
Syntax
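In some languages, such as JavaScript, a regex literal is written between two forward slashes (/pattern/). Python has no such literal. A pattern is an ordinary string, usually written as a raw string such as r"\d+", and it is passed to the functions of the re module.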
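In Python a pattern is just a string handed to a re function, so slash delimiters borrowed from other languages are not special. The minimal sketch below uses made-up sample text. It shows that a slash-wrapped pattern is matched literally, and that a raw string and its escaped equivalent are the same pattern.

```python
import re

# A raw string and a normal string with a doubled backslash are identical.
assert r"\d+" == "\\d+"

slash_pattern = "/cat/"   # JavaScript-style delimiters: here just literal "/" characters
plain_pattern = r"cat"    # the Python way: the whole string is the pattern

text = "the cat sat"
print(re.search(slash_pattern, text))  # None, because the text contains no "/"
print(re.search(plain_pattern, text))  # <re.Match object; span=(4, 7), match='cat'>
```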
Regular Expressions Module
In Python, regular expressions are supported by the re module. You must import re when you want to use RegEx in your Python code.
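A minimal sketch of importing re and using it to check whether a string contains a pattern. The sample sentence is illustrative only. re.search returns a match object when the pattern occurs anywhere and None otherwise, so its result works directly in an if statement.

```python
import re

text = "The rain in Spain"

# A match object is truthy and None is falsy, so this reads as "if found".
if re.search(r"Spain", text):
    print("Found a match")   # this branch runs
else:
    print("No match")

contains_spain = re.search(r"Spain", text) is not None    # True
contains_france = re.search(r"France", text) is not None  # False
```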
How does RegEx work?
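Raw String: We usually write regex patterns as raw strings (r"text"). Raw strings treat backslashes (\) as literal characters, so a pattern such as r"\b" (a word boundary) reaches the re module unchanged instead of being turned into a backspace character by Python.
Match Method: Tries to match the regex pattern at the start of the string. If a match is found, it returns a match object; otherwise it returns None.
Search Method: Searches the entire string for the regex pattern and returns a match object for the first match, or None if there is no match.
Findall Method: Returns all non-overlapping matches of the regex pattern as a list of strings. If the pattern contains capturing groups, the list holds the captured groups instead of the whole matches.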
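The sketch below illustrates the raw-string and findall items from the list above. The phone-number-like text is an invented example. Without the r prefix, "\b" would become a single backspace character before re ever saw it.

```python
import re

# Normal string: "\b" is one backspace character. Raw string: two characters.
print(len("\b"), len(r"\b"))   # 1 2

text = "Call 555-1234 or 555-9876 today"

# Word boundary, three digits, a hyphen, four digits, word boundary.
numbers = re.findall(r"\b\d{3}-\d{4}\b", text)
print(numbers)   # ['555-1234', '555-9876']
```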
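The match and search methods differ in where they look for the pattern: match only checks the beginning of the string, while search scans the whole string.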
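A short sketch comparing match and search on the same illustrative sentence. match fails when the pattern is not at position 0, while search finds the first occurrence anywhere.

```python
import re

text = "The rain in Spain"

# match only succeeds at the very start of the string.
print(re.match(r"rain", text))   # None, because the text starts with "The"
print(re.match(r"The", text))    # <re.Match object; span=(0, 3), match='The'>

# search scans the whole string and returns the first match it finds.
m = re.search(r"ai", text)
print(m.start(), m.group())      # 5 ai
```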
Metacharacters
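Metacharacters are characters with a special meaning in a pattern: instead of matching themselves, they control how the pattern matches. They are [] . ^ $ * + ? {} () \ |. To match one of them literally, escape it with a backslash, as in \. or \$.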
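A minimal sketch showing most of the metacharacters listed above in action, using small made-up strings. Square brackets are left out here because they form sets.

```python
import re

# . matches any character except a newline; ^ and $ anchor the start and end.
print(bool(re.search(r"^h.llo$", "hello")))      # True

# * = zero or more, + = one or more, ? = zero or one of the preceding item.
print(re.findall(r"ab*", "a ab abbb"))           # ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))           # ['ab', 'abbb']
print(re.findall(r"colou?r", "color colour"))    # ['color', 'colour']

# {m,n} = between m and n repetitions (greedy, so it takes as many as it can).
print(re.findall(r"\d{2,3}", "7 42 1234"))       # ['42', '123']

# () groups part of a pattern; | means "or".
print(re.search(r"(cat|dog)s", "hotdogs").group(1))  # dog

# \ escapes a metacharacter so it matches literally.
print(re.findall(r"\$\d+", "costs $5 or $10"))   # ['$5', '$10']
```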
Special Sequences
Special sequences, written as a backslash followed by a character, make commonly used patterns easier to write and read. For example, \d matches any decimal digit, \D matches any character that is not a decimal digit, \s matches any whitespace character, and \S matches any non-whitespace character.
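A short sketch of the four special sequences named above, applied to an invented sentence.

```python
import re

text = "Order 66 shipped on 2024-05-01"

print(re.findall(r"\d+", text))   # ['66', '2024', '05', '01']
print(re.findall(r"\S+", text))   # ['Order', '66', 'shipped', 'on', '2024-05-01']
print(re.sub(r"\s", "_", text))   # Order_66_shipped_on_2024-05-01
print(re.sub(r"\D", "", text))    # 6620240501 (every non-digit removed)
```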
Sets
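A set is a group of characters inside a pair of square brackets [] that matches any single character from the group. For example, [aeiou] matches one vowel, [a-z] matches one lowercase letter from a to z, and [^0-9] (with a leading ^) matches any character that is not a digit.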
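A minimal sketch of sets on made-up strings. The last line reflects a detail of Python's re module: most metacharacters, such as . and +, lose their special meaning inside a set.

```python
import re

text = "Regex 101: sets match ONE character"

print(re.findall(r"[aeiou]", "regex"))     # ['e', 'e']
print(re.findall(r"[0-9]", text))          # ['1', '0', '1']
print(re.findall(r"[A-Z]+", text))         # ['R', 'ONE']
print(re.findall(r"[^a-z ]", "abc 12!"))   # ['1', '2', '!'] (negated set)
print(re.findall(r"[.+]", "1.5+2"))        # ['.', '+'] (literal inside a set)
```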
Flags
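A RegEx can include flags that change how it matches. Common ones are ignore case (re.I or re.IGNORECASE); multiline (re.M or re.MULTILINE), which makes ^ and $ match at the start and end of every line; and dotall (re.S or re.DOTALL), which makes . match newline characters as well.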
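A short sketch of the three flags on an illustrative two-line string. Flags are passed as the optional flags argument and can be combined with |.

```python
import re

text = "First line\nsecond LINE"

# re.I: ignore case.
print(re.findall(r"line", text, re.I))          # ['line', 'LINE']

# re.M: ^ matches at the start of every line, not only the first.
print(re.findall(r"^\w+", text, re.M))          # ['First', 'second']
print(re.findall(r"^\w+", text))                # ['First']

# re.S: . also matches "\n", so the pattern can span both lines.
print(re.search(r"First.*LINE", text))          # None
print(re.search(r"First.*LINE", text, re.S) is not None)  # True

# Combining flags with |.
print(re.findall(r"^s\w+", text, re.I | re.M))  # ['second']
```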
RegEx Functions
The re module offers functions including match(), findall(), search(), split(), sub() and finditer().
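The sketch below covers the three functions not yet illustrated: split(), sub() and finditer(). The sample strings are invented for illustration.

```python
import re

text = "apples, pears;bananas  cherries"

# split: break the string wherever the pattern matches.
print(re.split(r"[,;\s]+", text))      # ['apples', 'pears', 'bananas', 'cherries']

# sub: replace every match with a replacement string.
print(re.sub(r"\d", "#", "PIN 4821"))  # PIN ####

# finditer: yields match objects, which also carry positions.
spans = [(m.group(), m.span()) for m in re.finditer(r"\w+", "hi there")]
print(spans)                           # [('hi', (0, 2)), ('there', (3, 8))]
```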
Regular Expressions are powerful tools for text processing. They can be intricate and complicated, but understanding and using them effectively can save a lot of time and lines of code.