
Effortlessly Replace Text Using Python Regex


How to Replace a String in Python

If you need to remove or replace parts of a string in Python, this tutorial shows you how, using both the string .replace() method and the re.sub() function from the re module.

The .replace() method handles simple, fixed substitutions, while re.sub() matches patterns described by regular expressions. To demonstrate both, this tutorial uses a fictional chat room transcript. The goal is to sanitize the transcript by removing personal data and replacing any swear words with emojis.

Let’s start by looking at the chat transcript we will be working with:

[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!

The transcript contains user identifiers, ISO 8601 timestamps, and messages. The user identifiers and timestamps count as personal data, and words like “BLASTED” and “Blast” are the swear words to replace.

How to Remove or Replace a Python String or Substring

The simplest way to replace text in Python is the string .replace() method. You pass it the substring to find and the string to put in its place, and it returns a new string in which every occurrence has been replaced. Here’s an example:

"Fake Python".replace("Fake", "Real")

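Two details of .replace() are worth seeing in action: it hands back a new string instead of changing the original, and an optional third argument caps how many occurrences are replaced, counting from the left. The strings below are made-up examples:

```python
# str.replace() never modifies the original string, because strings are immutable
original = "Fake Python, Fake News"
updated = original.replace("Fake", "Real")

print(original)  # Fake Python, Fake News
print(updated)   # Real Python, Real News

# The optional third argument limits how many occurrences are replaced
first_only = original.replace("Fake", "Real", 1)
print(first_only)  # Real Python, Fake News
```

Passing the count positionally works on every Python 3 version.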
Let’s apply this knowledge to the chat transcript:

transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""
transcript.replace("BLASTED", "😤")

In this code snippet, we replace the word “BLASTED” with the emoji “😤”. Because Python strings are immutable, .replace() returns a new string with the replacement applied and leaves transcript itself unchanged.
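Running the same call and inspecting the result also shows that .replace() matches text exactly, including letter case. In this short sketch, the all-caps “BLASTED” is replaced, but “Blast!” on the last line is not, because its capitalization differs:

```python
transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

# Keep the new string; transcript itself still contains "BLASTED"
result = transcript.replace("BLASTED", "😤")

# Matching is case-sensitive: line 2 changes, line 4 ("Blast!") does not
print(result.splitlines()[1])
print(result.splitlines()[3])
```

Catching every capitalization with one rule is where regular expressions come in.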

Set Up Multiple Replacement Rules

In some cases, you may need to replace multiple strings in a single text. To handle multiple replacements, you can chain the .replace() method calls or use a loop. Here’s an example of chaining .replace() method calls:

text = "I love Python. Python is awesome!"
text = text.replace("Python", "JavaScript").replace("awesome", "amazing")

In this example, we replace “Python” with “JavaScript” and “awesome” with “amazing” in the text.
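The loop variant keeps all the rules in one place. This sketch assumes a dictionary named REPLACEMENTS (a hypothetical name, not part of any library) that maps each swear word in the transcript to an emoji, and applies the rules one after another:

```python
transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

# Hypothetical rule table: each key is replaced by its value
REPLACEMENTS = {
    "BLASTED": "😤",
    "Blast": "😡",
}

# Apply the rules in the dictionary's insertion order
sanitized = transcript
for old, new in REPLACEMENTS.items():
    sanitized = sanitized.replace(old, new)

print(sanitized)
```

Since Python 3.7, dictionaries keep insertion order, so the rules run top to bottom. Order matters when one target contains another: if a rule for “Blast” ran before a rule for “Blasted”, it would turn “Blasted” into “😡ed”, and the longer rule would never match.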

Leverage re.sub() to Make Complex Rules

The re.sub() function from the re module handles replacements that a fixed string can’t describe. Called as re.sub(pattern, repl, string), it finds every non-overlapping match of the regular expression pattern in string and replaces it with repl. Here’s an example:

import re
text = "I love apples, I love oranges, but I don't love pears."
# Replace all occurrences of "love" with "prefer"
text = re.sub(r"love", "prefer", text)

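In this example, re.sub() replaces every occurrence of “love” with “prefer”. Because the pattern r"love" contains no special characters, this call behaves just like text.replace("love", "prefer"), and it would also rewrite the “love” inside a longer word such as “lovely”. Adding word boundaries, as in r"\blove\b", restricts the match to the whole word.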
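Regular expressions pay off on the transcript itself. This sketch uses two re.sub() calls: one with the re.IGNORECASE flag, so that a single pattern catches both “BLASTED” and “Blast”, and one that uses a capture group and the \1 backreference to keep only the date part of each ISO 8601 timestamp. The patterns are written for this sample and assume every timestamp has the same shape:

```python
import re

transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

# "blast" followed by any further word characters (\w*), in any letter case
no_swearing = re.sub(r"blast\w*", "😤", transcript, flags=re.IGNORECASE)

# The parentheses capture the date; \1 in the replacement inserts that capture,
# so the time of day and the UTC offset are dropped
dates_only = re.sub(
    r"(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2}",
    r"\1",
    no_swearing,
)

print(dates_only)
```

Passing flags as a keyword argument avoids mixing it up with the optional count parameter that precedes it in the signature.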

Use a Callback With re.sub() for Even More Control

The re.sub() function also allows you to use a callback function for even more control over the replacement process. This callback function takes a match object as input and returns the desired replacement string. Here’s an example:

import re

def replace_with_emoji(match):
    # match.group(0) is the full text of the current match
    word = match.group(0)
    if word == "love":
        return "❤️"
    elif word == "apples":
        return "🍎"
    elif word == "oranges":
        return "🍊"
    elif word == "pears":
        return "🍐"
    # Return any other word unchanged so re.sub() puts it back as it was
    return word

text = "I love apples, I love oranges, but I don't love pears."
# Replace specific words with emojis using a callback function
text = re.sub(r"\b\w+\b", replace_with_emoji, text)

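In this example, the pattern \b\w+\b matches each run of word characters (roughly, each word), and re.sub() calls replace_with_emoji() with the match object for every match, using whatever the function returns as the replacement for that match.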
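A callback can also look up replacements in a dictionary instead of walking an if/elif chain. The sketch below anonymizes the usernames in the transcript; the ROLES table and the “Agent”, “Client”, and “User” labels are illustrative assumptions. Using dict.get() with a default guarantees the callback always returns a string:

```python
import re

transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

# Hypothetical mapping from usernames to anonymous roles
ROLES = {"support_tom": "Agent", "johndoe": "Client"}

def anonymize_user(match):
    # group(1) holds the username captured between the square brackets
    username = match.group(1)
    # Unknown usernames get a generic label instead of passing through
    return f"[{ROLES.get(username, 'User')}]"

# With re.MULTILINE, ^ matches at the start of every line, not only the string
anonymized = re.sub(r"^\[(\w+)\]", anonymize_user, transcript, flags=re.MULTILINE)
print(anonymized)
```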

Apply the Callback to the Script

Now that we have the necessary tools and techniques, let’s apply them to the chat transcript:

import re

def sanitize_chat(chat):
    # Replace personal data with placeholders
    # Email addresses (inside a character class, "|" is literal, so use [A-Za-z])
    chat = re.sub(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", "<EMAIL>", chat)
    # Phone numbers written as three digits, three digits, four digits
    chat = re.sub(r"\b\d{3}-\d{3}-\d{4}\b", "<PHONE>", chat)
    # A capitalized word followed by capitals, e.g. "JohnDOE" becomes "John <NAME>"
    chat = re.sub(r"\b([A-Z][a-z]+)[A-Z]+\b", r"\1 <NAME>", chat)
    # Replace swear words with emojis
    chat = chat.replace("BLASTED", "😤")
    chat = chat.replace("Blast!", "😡")
    return chat

transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

sanitized_transcript = sanitize_chat(transcript)

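In this code snippet, the sanitize_chat() function takes the chat transcript as input and applies its replacement rules in order. The three re.sub() calls replace email addresses, phone numbers written like 555-123-4567, and words matching the name pattern with placeholders, and the two .replace() calls swap the swear words for emojis. The sample transcript contains no email addresses, phone numbers, or matching names, so only the swear-word rules change it; the usernames and timestamps pass through unchanged.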

Finally, the sanitized chat transcript is stored in the sanitized_transcript variable.
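To put a callback to work on the transcript’s remaining personal data, one approach is to combine usernames, timestamps, and swear words into a single pattern with named groups, then let the callback decide what each match becomes. This is a sketch, not the only way to do it: the ROLES table is hypothetical, timestamps are reduced to their dates, and sanitize_chat_with_callback() is an illustrative name:

```python
import re

transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

# Hypothetical role table; usernames not listed here become "User"
ROLES = {"support_tom": "Agent", "johndoe": "Client"}

# Three alternatives, each with a named group. Only the alternative that
# matched has a value; the other groups return None from match.group().
PATTERN = re.compile(
    r"^\[(?P<user>\w+)\]"  # [username] at the start of a line
    r"|(?P<timestamp>\d{4}-\d{2}-\d{2})T[\d:]+\+\d{2}:\d{2}"  # ISO 8601 timestamp
    r"|(?P<swear>blast\w*)",  # "blast" plus any word characters after it
    flags=re.IGNORECASE | re.MULTILINE,
)


def sanitize_match(match):
    """Return the replacement text for one match of PATTERN."""
    if match.group("user") is not None:
        return f"[{ROLES.get(match.group('user'), 'User')}]"
    if match.group("timestamp") is not None:
        # Keep the date and drop the time of day and UTC offset
        return match.group("timestamp")
    # Otherwise the swear alternative matched; mirror the emojis used in
    # sanitize_chat(): 😤 for "BLASTED", 😡 for any other variant
    swear = match.group("swear")
    return "😤" if swear.lower() == "blasted" else "😡"


def sanitize_chat_with_callback(chat):
    # One pass over the text: output from the callback is never rescanned
    return PATTERN.sub(sanitize_match, chat)


print(sanitize_chat_with_callback(transcript))
# [Agent] 2022-08-24 : What can I help you with?
# [Client] 2022-08-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
# [Agent] 2022-08-24 : Are you sure it's not your caps lock?
# [Client] 2022-08-24 : 😡! You're right!
```

Checking which named group is not None is how the callback tells the three cases apart. Because everything happens in a single substitution pass, a replacement such as “[Agent]” can never be picked up again by a later rule.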

Conclusion

In this tutorial, we have explored different ways to remove or replace strings in Python. We have learned how to use the .replace() method for simple replacements and the re.sub() function for more complex replacements using regular expressions. Additionally, we have seen how to use a callback function with re.sub() to have even more control over the replacement process. By applying these techniques, you can easily sanitize text and replace specific strings or substrings with desired replacements.

Remember to experiment with different examples and explore more advanced usage of the .replace() method and re.sub() function to become proficient in replacing strings in Python.