Python Regex: Regular Expression in Python

Use import re to import the regex module into your Python code for pattern matching.

Here’s a cheat sheet for regular expressions (regex) in Python:

Basic Syntax:

  • . matches any single character except a newline
  • [] matches any character inside the brackets
  • [^] matches any character NOT inside the brackets
  • * matches 0 or more occurrences of the previous character/group
  • + matches 1 or more occurrences of the previous character/group
  • ? matches 0 or 1 occurrence of the previous character/group
  • | matches either the expression before or after the symbol
  • () groups characters together

Anchors:

  • ^ matches the beginning of a string
  • $ matches the end of a string

Character Classes:

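  • \d matches any digit (0-9; for str patterns, also other Unicode digits unless re.ASCII is set)
  • \D matches any non-digit
  • \w matches any word character (a-z, A-Z, 0-9, and _; for str patterns, also other Unicode letters and digits unless re.ASCII is set)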
  • \W matches any non-word character
  • \s matches any whitespace character (space, tab, newline, etc.)
  • \S matches any non-whitespace character

Quantifiers:

  • {n} matches exactly n occurrences of the previous character/group
  • {n,} matches at least n occurrences of the previous character/group
  • {n,m} matches between n and m occurrences of the previous character/group
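
A minimal sketch of the three brace quantifiers. It uses re.fullmatch so that the whole string has to fit the pattern; the sample strings and variable names are made up for illustration.

```python
import re

# {n}: exactly three digits
exactly_three = re.fullmatch(r"\d{3}", "123") is not None      # True
too_many = re.fullmatch(r"\d{3}", "1234") is not None          # False

# {n,}: two or more "a" characters
at_least_two = re.fullmatch(r"a{2,}", "aaaa") is not None      # True

# {n,m}: between two and four "a" characters
in_range = re.fullmatch(r"a{2,4}", "aaa") is not None          # True
out_of_range = re.fullmatch(r"a{2,4}", "aaaaa") is not None    # False

print(exactly_three, too_many, at_least_two, in_range, out_of_range)
```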

Special Characters:

  • \ escapes a special character
  • . matches any character except newline
  • \b matches a word boundary
  • \B matches a non-word boundary
  • \n matches a newline character
  • \t matches a tab character
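
The word boundary and the escaped dot are the two entries here that most often surprise people, so a small sketch may help. The sample text is invented for illustration.

```python
import re

text = "cat concat cats"

# Without \b, "cat" is also found inside "concat" and "cats"
loose = re.findall(r"cat", text)

# With \b on both sides, only the standalone word "cat" matches
whole_words = re.findall(r"\bcat\b", text)

# Escaping the dot makes it match a literal "." instead of any character
literal_dot = re.findall(r"3\.14", "3.14 3x14")

print(loose)        # ['cat', 'cat', 'cat']
print(whole_words)  # ['cat']
print(literal_dot)  # ['3.14']
```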

Flags:

  • re.I or re.IGNORECASE makes the regex case-insensitive
  • re.M or re.MULTILINE allows ^ and $ to match the beginning and end of each line (rather than just the whole string)

import re

# search for a pattern in a string
match = re.search(r'pattern', 'this is the string to search for the pattern')
if match:
    print('Found the pattern')

# find all occurrences of a pattern in a string
matches = re.findall(r'pattern', 'this is the string to search for the pattern')
print('Found', len(matches), 'occurrences of the pattern')

# replace all occurrences of a pattern with a new string
new_string = re.sub(r'pattern', 'replacement', 'this is the string to search for the pattern')
print('New string:', new_string)

# use flags
match = re.search(r'pattern', 'this is the string to search for the PATTERN', re.IGNORECASE)
if match:
    print('Found the pattern (case-insensitive)')

# use quantifiers
match = re.search(r'a{3,5}', 'aaabbbaaa')
if match:
    print('Found', match.group())
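
The re.MULTILINE flag from the list above can be sketched the same way. The log text below is a made-up example, not output from a real program.

```python
import re

log = "error: disk full\ninfo: ok\nerror: timeout"

# Without re.MULTILINE, ^ only matches at the start of the whole string
first_only = re.findall(r"^error", log)

# With re.MULTILINE, ^ also matches right after each newline
every_line = re.findall(r"^error", log, re.MULTILINE)

print(len(first_only))   # 1
print(len(every_line))   # 2
```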


Regex Cheat Sheet with Examples

A cheat sheet for regular expressions (regex) in Python, with an example for each case:

Basic Syntax:

  • . matches any single character except a newline

Example: a.c matches “abc”, “adc”, etc.

  • [] matches any character inside the brackets

Example: [aeiou] matches any vowel

  • [^] matches any character NOT inside the brackets

Example: [^aeiou] matches any character that is not a lowercase vowel (consonants, but also digits, spaces, and punctuation)

  • * matches 0 or more occurrences of the previous character/group

Example: ab*c matches “ac”, “abc”, “abbc”, etc.

  • + matches 1 or more occurrences of the previous character/group

Example: ab+c matches “abc”, “abbc”, etc.

  • ? matches 0 or 1 occurrence of the previous character/group

Example: ab?c matches “ac” or “abc”

  • | matches either the expression before or after the symbol

Example: a(b|c)d matches “abd” or “acd”

  • () groups characters together

Example: (ab)+ matches “ab”, “abab”, “ababab”, etc.
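
One way to check the examples above is to run each pattern against a string that should match and one that should not. This is only a sketch: the "bad" strings and the cases list are assumptions chosen for illustration, and re.fullmatch is used so the whole string must fit the pattern.

```python
import re

# Each entry: (pattern, string that should match, string that should not)
cases = [
    (r"a.c", "abc", "ac"),
    (r"[aeiou]", "e", "x"),
    (r"[^aeiou]", "x", "e"),
    (r"ab*c", "ac", "abd"),
    (r"ab+c", "abbc", "ac"),
    (r"ab?c", "abc", "abbc"),
    (r"a(b|c)d", "acd", "aed"),
    (r"(ab)+", "abab", "aba"),
]

results = []
for pattern, good, bad in cases:
    matched_good = re.fullmatch(pattern, good) is not None
    matched_bad = re.fullmatch(pattern, bad) is not None
    results.append((pattern, matched_good, matched_bad))
    print(f"{pattern!r}: {good!r} -> {matched_good}, {bad!r} -> {matched_bad}")
```

Every line should report True for the first string and False for the second.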

Anchors:

  • ^ matches the beginning of a string

Example: ^abc matches “abc” at the beginning of the string

  • $ matches the end of a string

Example: abc$ matches “abc” at the end of the string
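
A short sketch of both anchors with re.search, using invented sample strings. It also shows one detail of Python's $: it matches just before a newline that ends the string.

```python
import re

starts = re.search(r"^abc", "abcdef") is not None       # "abc" at the start
not_start = re.search(r"^abc", "xabc") is not None      # "abc" is not at the start
ends = re.search(r"abc$", "xyzabc") is not None         # "abc" at the end
exact = re.search(r"^abc$", "abc") is not None          # the string is exactly "abc"
before_newline = re.search(r"abc$", "abc\n") is not None  # $ also matches before a final newline

print(starts, not_start, ends, exact, before_newline)
```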

Character Classes:

  • \d matches any digit (0-9)

Example: \d{3} matches any three consecutive digits, such as “123”

  • \D matches any non-digit

Example: \D{3} matches any three consecutive non-digit characters, such as “abc”

  • \w matches any word character (a-z, A-Z, 0-9, and _)

Example: \w+ matches a run of one or more word characters, such as “hello_1”

  • \W matches any non-word character

Example: \W+ matches any non-word character(s)
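
The character classes are easiest to compare side by side on one string. A small sketch, with sample text made up for illustration:

```python
import re

text = "Order 66: 3 apples, 12 pears!"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # runs of word characters
non_words = re.findall(r"\W+", text)     # runs of everything else
non_digits = re.findall(r"\D+", "a1bc23")  # runs of non-digits

print(digits)      # ['66', '3', '12']
print(words)       # ['Order', '66', '3', 'apples', '12', 'pears']
print(non_words)
print(non_digits)  # ['a', 'bc']
```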

re.findall() vs re.match() vs re.search() vs re.finditer()

re.findall(pattern, string) is a function that returns all non-overlapping matches of the regular expression pattern in the string, as a list. For example, suppose we have a string “Hello World” and we want to find all occurrences of the letter “l” in it. We can use the following code:

import re

string = "Hello World"
matches = re.findall("l", string)
print(matches)

#Output of above code:
['l', 'l', 'l']


#Because there are three "l" characters in the string.
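
One detail worth knowing about re.findall: when the pattern contains capturing groups, the list holds the groups rather than the whole match. A minimal sketch, using a made-up sample string:

```python
import re

text = "x=1, y=22"

# No groups: a list of full matches
full = re.findall(r"\w=\d+", text)

# One group: a list of that group's contents
values = re.findall(r"\w=(\d+)", text)

# Two or more groups: a list of tuples, one tuple per match
pairs = re.findall(r"(\w)=(\d+)", text)

print(full)    # ['x=1', 'y=22']
print(values)  # ['1', '22']
print(pairs)   # [('x', '1'), ('y', '22')]
```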

re.match(pattern, string) is a function that checks if the regular expression pattern matches at the beginning of the string. It returns a match object if the pattern matches, or None if it does not. For example, suppose we have a string “Hello World” and we want to check if it starts with the word “Hello”. We can use the following code:

import re

string = "Hello World"
match = re.match("Hello", string)
print(match)


#Output of above code:
<re.Match object; span=(0, 5), match='Hello'>

#Because the pattern "Hello" matches at the beginning of the string.

If we want to check if the string contains the word “Hello” anywhere, not just at the beginning, we can use re.search instead of re.match. re.search searches for the pattern anywhere in the string.

import re

string = "Hello World"
match = re.search("Hello", string)
print(match)


#Output of above code:

<re.Match object; span=(0, 5), match='Hello'>

#Because the pattern "Hello" appears in the string.
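
With “Hello” both functions give the same result, so the difference is easier to see with a word that is not at the start of the string. A short sketch:

```python
import re

string = "Hello World"

# "World" is not at the start, so re.match finds nothing
at_start = re.match("World", string)
print(at_start)          # None

# re.search scans the whole string and finds it
anywhere = re.search("World", string)
print(anywhere.span())   # (6, 11)
```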

re.finditer() is a function provided by the Python re module that allows you to search for multiple occurrences of a regular expression pattern in a string.

When you call re.finditer(pattern, string), it returns an iterator that produces match objects, each representing a single match of the pattern in the string. You can then loop through the iterator to access each match and extract information from it.

For example, if you wanted to find all instances of the word “cat” in a string called my_string, you could use re.finditer() like this:

import re

my_string = "I have a cat named Mittens and another cat named Whiskers."

pattern = r"cat"

for match in re.finditer(pattern, my_string):
    print(f"Found '{match.group()}' at index {match.start()}.")

#Output of above code:

Found 'cat' at index 9.
Found 'cat' at index 39.

Note that the match.group() method returns the matched string, and match.start() returns the index where the match begins in the string.
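
Because re.finditer yields full match objects, it also gives access to capturing groups for each match. The sketch below extends the example with a named group; the group name "name" and the extended pattern are assumptions added for illustration.

```python
import re

my_string = "I have a cat named Mittens and another cat named Whiskers."

# (?P<name>...) captures the word that follows "cat named"
pattern = r"cat named (?P<name>\w+)"

found = []
for match in re.finditer(pattern, my_string):
    found.append((match.group("name"), match.start()))
    print(f"Found a cat called {match.group('name')} at index {match.start()}.")
```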

Using Python Regex to extract Domain name from a URL

import re

text = 'Extract the domain from the URL www.google.com'

pattern = r'(www\.([\w-]+)\.(\w+))'

# Use findall to get a list of all matches
matches = re.findall(pattern, text)

# Print the full matches
for match in matches:
    print(match[0])  # Output 1: full matches

# Print just the domain names
for match in matches:
    print(match[1])  # Output 2: domain names

The code starts by defining a string variable called “text” which contains some text that we want to search through.

Next, it defines a pattern that matches host names starting with “www.”, followed by a domain name and a top-level domain. Because the pattern contains capturing groups, re.findall returns one tuple per match: index 0 holds the full match (“www.google.com”), index 1 the domain name (“google”), and index 2 the top-level domain (“com”).

\.(\w+) : This part matches a literal dot (the backslash escapes the period, which would otherwise match any character) followed by one or more word characters (represented by the \w+). Word characters include any uppercase or lowercase letter, any digit, and the underscore character.
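
As an alternative sketch, the same extraction can be written with re.search and named groups, which avoids indexing into tuples. The group names "domain" and "tld" are assumptions introduced here; the original code uses numbered groups.

```python
import re

text = "Extract the domain from the URL www.google.com"

# The original pattern with findall: one tuple per match
tuples = re.findall(r"(www\.([\w-]+)\.(\w+))", text)
print(tuples)   # [('www.google.com', 'google', 'com')]

# Named groups make each part readable by name
named = re.compile(r"www\.(?P<domain>[\w-]+)\.(?P<tld>\w+)")
match = named.search(text)
if match:
    print(match.group(0))          # www.google.com
    print(match.group("domain"))   # google
    print(match.group("tld"))      # com
```

Like the original, this only handles hosts that start with “www.” and have a single-part top-level domain.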

Password Validation Program

Write a Python function that takes a string as input and checks whether it is a valid password or not. A password is considered valid if it meets the following criteria:

  1. The password should be at least eight characters long.
  2. The password should contain at least one uppercase letter.
  3. The password should contain at least one lowercase letter.
  4. The password should contain at least one digit.
  5. The password should contain at least one special character, which can be either $, #, or @.
  6. The password should not contain any whitespace characters.

If the input string satisfies all of the above criteria, the function should return 'Valid Password'. Otherwise, it should return 'Invalid Password'.

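The following Python function checks whether a password is valid based on the criteria above. It returns True or False rather than the two strings named in the task; the calling code turns the result into a message: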

def is_valid_password(password):
    # Check the length of the password
    if len(password) < 8:
        return False

    # Check if password contains at least one uppercase letter
    if not any(char.isupper() for char in password):
        return False

    # Check if password contains at least one lowercase letter
    if not any(char.islower() for char in password):
        return False

    # Check if password contains at least one digit
    if not any(char.isdigit() for char in password):
        return False

    # Check if password contains at least one special character
    if not any(char in ['$','#','@'] for char in password):
        return False

    # Check if password contains any whitespace (space, tab, newline, etc.)
    if any(char.isspace() for char in password):
        return False

    # If all conditions are met, the password is valid
    return True

You can call this function with a password string as an argument to check whether it is valid or not. For example:

password = "MyPassword@123"
if is_valid_password(password):
    print("Password is valid.")
else:
    print("Password is not valid.")
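
Since this article is about regular expressions, here is a sketch of the same check written as a single pattern with lookaheads. It returns the two strings from the task statement. The names PASSWORD_RE and check_password are made up for this example, and the pattern assumes ASCII letters only ([A-Z] and [a-z]), which is narrower than isupper() and islower().

```python
import re

# Each (?=...) lookahead checks one rule without consuming characters
PASSWORD_RE = re.compile(
    r"(?=.*[A-Z])"      # at least one uppercase letter
    r"(?=.*[a-z])"      # at least one lowercase letter
    r"(?=.*\d)"         # at least one digit
    r"(?=.*[$#@])"      # at least one of $, # or @
    r"\S{8,}"           # eight or more characters, none of them whitespace
)

def check_password(password):
    """Return 'Valid Password' or 'Invalid Password' for the given string."""
    if PASSWORD_RE.fullmatch(password):
        return "Valid Password"
    return "Invalid Password"

print(check_password("MyPassword@123"))    # Valid Password
print(check_password("My Password@123"))   # Invalid Password
```

The function-based version above is easier to extend and to debug rule by rule; the regex version is more compact.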

Here are a few websites for testing and learning regular expressions in Python:

  1. RegExr (by gskinner) – https://regexr.com/
  2. Regex101 – https://regex101.com/
  3. Regexpal – https://www.regexpal.com/
  4. RegexPlanet – https://www.regexplanet.com/
  5. Pythex – https://pythex.org/
  6. RegexOne – https://regexone.com/
  7. Debuggex – https://www.debuggex.com/
  8. Regex Tester – https://regex101.com/regex-tester/
  9. Regex Coach – http://www.weitz.de/regex-coach/

By Pankaj
