Regular Expression

A regular expression, often shortened to RegEx, is a sequence of characters that forms a search pattern. RegEx can be used to check whether a string contains the specified pattern.

Syntax

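In Python, a RegEx pattern is written as a string, usually a raw string, and passed to a function from the re module. Python does not use the forward-slash delimiters (/pattern/) found in languages such as JavaScript; a slash inside the pattern would be matched literally.

import re

# Illustrative sample text
text = "The rain in Spain"

# Search for the pattern "ai" anywhere in text
x = re.search(r"ai", text)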

Regular Expressions Module

In Python, regular expressions are supported by the re module. You must import re when you want to use RegEx in your Python code.

How does RegEx work?

  1. Raw String: We usually write regex patterns as raw strings (r"text"). Raw strings treat backslashes (\) as literal characters, so the pattern reaches the regex engine unchanged.

  2. Match Method: Tries to match the regex pattern at the start of the string. If a match is found, it returns a match object; otherwise it returns None.

  3. Search Method: Searches the entire string for the RegEx pattern and returns a match object for the first match, or None if there is no match.

  4. Findall Method: Returns all non-overlapping matches of the regex pattern as a list of strings (or a list of tuples if the pattern contains more than one group).

Here is an example using the match and search methods:

import re

text = "Hello, welcome to Codebay."

# Check if the string starts with "Hello":
x = re.match(r"Hello", text)

# Check if "Codebay" is present anywhere in the text:
y = re.search(r"Codebay", text)
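
To make the difference between the three methods concrete, here is a small sketch that reuses the same sample text. The variable names are illustrative only.

```python
import re

text = "Hello, welcome to Codebay."

# match only looks at the start of the string, so "Codebay" is not found.
at_start = re.match(r"Codebay", text)
print(at_start)  # None

# search scans the whole string and reports where the first match sits.
anywhere = re.search(r"Codebay", text)
print(anywhere.span())  # (18, 25)

# findall collects every non-overlapping match as a list of strings.
words = re.findall(r"\w+", text)
print(words)  # ['Hello', 'welcome', 'to', 'Codebay']

# A raw string keeps the backslash, so r"\w" is two characters long.
print(len(r"\w"))  # 2
```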

Metacharacters

Metacharacters are characters with a special meaning. Some of them include [] . ^ $ * + ? {} () \ |.
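
The short sketch below shows what most of these metacharacters do in practice. The sample strings are made up for illustration.

```python
import re

# . matches any single character except a newline.
dot = re.findall(r"c.t", "cat cot cut ct")        # ['cat', 'cot', 'cut']

# ^ anchors the pattern at the start, $ at the end.
starts = bool(re.search(r"^Hello", "Hello world"))  # True
ends = bool(re.search(r"world$", "Hello world"))    # True

# * means zero or more, + one or more, ? zero or one.
star = re.findall(r"ab*", "a ab abb")             # ['a', 'ab', 'abb']
plus = re.findall(r"ab+", "a ab abb")             # ['ab', 'abb']
optional = re.findall(r"colou?r", "color colour")  # ['color', 'colour']

# {n} repeats exactly n times, | chooses between alternatives.
three_digits = re.findall(r"\d{3}", "12 345 6789")  # ['345', '678']
either = re.findall(r"cat|dog", "cat, dog, bird")   # ['cat', 'dog']

# () captures part of the match as a group.
groups = re.search(r"(\w+)@(\w+)", "user@example").groups()  # ('user', 'example')

# \ escapes a metacharacter so it is matched literally.
literal_dots = re.findall(r"\.", "a.b.c")         # ['.', '.']

print(dot, star, plus, optional, three_digits, either, groups, literal_dots)
```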

Special Sequences

Special sequences make commonly used patterns easier to write and read. For example, \d matches any decimal digit, \D matches any character that is not a decimal digit, \s matches any whitespace character, and \S matches any non-whitespace character.
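
As a quick illustration, the four sequences can be tried on one sample string. The string itself is invented for this example.

```python
import re

sample = "Room 42\tis open"

digits = re.findall(r"\d", sample)       # ['4', '2']
non_digits = re.findall(r"\D+", sample)  # ['Room ', '\tis open']
whitespace = re.findall(r"\s", sample)   # [' ', '\t', ' ']
words = re.findall(r"\S+", sample)       # ['Room', '42', 'is', 'open']

print(digits, non_digits, whitespace, words)
```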

Sets

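A set is a group of characters inside a pair of square brackets []. It matches any single character from that group: [abc] matches a, b, or c, [a-z] matches any lowercase letter, and a leading ^ (as in [^0-9]) negates the set.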
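
A few sets in action, as a minimal sketch with made-up input strings:

```python
import re

# Any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")            # ['e', 'e']

# A range of characters; + repeats the set one or more times.
lowercase_runs = re.findall(r"[a-z]+", "Hi there")  # ['i', 'there']

# A leading ^ inside the brackets negates the set.
not_digits = re.findall(r"[^0-9]", "a1b2")          # ['a', 'b']

# Most metacharacters lose their special meaning inside a set.
symbols = re.findall(r"[.+]", "1.5+2")              # ['.', '+']

print(vowels, lowercase_runs, not_digits, symbols)
```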

Flags

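A RegEx function can take flags that change how the pattern is matched: ignore case (re.I) makes matching case-insensitive, multiline (re.M) makes ^ and $ match at the start and end of every line, and dot-all (re.S) makes the dot (.) match newline characters as well.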
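
The effect of each flag is easiest to see by running the same pattern with and without it. This sketch uses re.I, re.M, and re.S (the short names for re.IGNORECASE, re.MULTILINE, and re.DOTALL); the sample text is invented.

```python
import re

text = "First line\nsecond line"

# re.I: ignore case.
any_case = re.findall(r"line", "Line line LINE", re.I)  # ['Line', 'line', 'LINE']

# re.M: ^ matches at the start of every line, not only the start of the string.
first_only = re.findall(r"^\w+", text)        # ['First']
each_line = re.findall(r"^\w+", text, re.M)   # ['First', 'second']

# re.S: the dot also matches newline characters.
without_s = re.search(r"First.*second", text)        # None
with_s = re.search(r"First.*second", text, re.S)     # a match object

print(any_case, first_only, each_line, without_s, with_s is not None)
```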

RegEx Functions

The re module offers functions including match(), search(), findall(), finditer(), split(), and sub().
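
The functions not shown earlier, split(), sub(), and finditer(), can be sketched as follows. The input strings are illustrative.

```python
import re

# split cuts the string wherever the pattern matches.
parts = re.split(r"[,;]\s*", "apples, pears;  plums")
print(parts)  # ['apples', 'pears', 'plums']

# sub replaces every match with the given replacement string.
cleaned = re.sub(r"\s+", " ", "too    many   spaces")
print(cleaned)  # 'too many spaces'

# finditer yields match objects, which carry both the text and its position.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", "a1 b22 c333")]
print(positions)  # [('1', 1), ('22', 4), ('333', 8)]
```

finditer is often preferable to findall when the position of each match matters, since findall returns only the matched text.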

Regular Expressions are powerful tools for text processing. They can be intricate and complicated, but understanding and using them effectively can save a lot of time and lines of code.