How to remove repeated characters in pandas?

Updated: August 22, 2023

Reading time: 10.64 minutes

To remove repeated characters in a pandas Series of strings, use the Series.str.replace() method with a regular expression that contains a backreference. For example:

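import pandas as pd

# Create a sample Series of strings
s = pd.Series(['aabbcc', 'ddddd', 'effffee', 'gggghhhh'])

# Collapse each run of a repeated word character to a single character.
# regex=True is required: since pandas 2.0, str.replace treats the
# pattern as a literal string by default.
s = s.str.replace(r'(\w)\1+', r'\1', regex=True)

print(s)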

The regular expression pattern (\w)\1+ matches any word character (a letter, digit, or underscore) that is immediately followed by one or more copies of itself. The parentheses create a capturing group, and the backreference \1 in the pattern matches the same text that the group captured. In the replacement string, \1 stands for that single captured character, so each run of repeats is replaced by one copy of the character.
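To see what the pattern does and does not match, here is a minimal sketch using Python's built-in re module on a plain string. The sample text and the names WORD_RUNS and ANY_RUNS are made up for illustration. The second pattern, (.)\1+, is a variant not used in the original example; it also collapses repeated punctuation and spaces.

```python
import re

# Same pattern as in the pandas example, applied to a plain string.
WORD_RUNS = re.compile(r'(\w)\1+')  # runs of a repeated letter, digit, or underscore
ANY_RUNS = re.compile(r'(.)\1+')    # runs of any repeated character except a newline

text = 'wooow!!!  ok'  # illustrative input, not from the original article

print(WORD_RUNS.sub(r'\1', text))  # only the run of 'o' is collapsed
print(ANY_RUNS.sub(r'\1', text))   # the '!' run and the double space are collapsed too
```

If the strings in your Series contain punctuation or whitespace that should also be collapsed, the (.)\1+ variant may be the better fit.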

After running this code, the Series s will contain the following values:

0    abc
1      d
2    efe
3     gh
dtype: object

Notice that each run of repeated characters has been collapsed to a single character in every string in the Series.

Note: This method only removes consecutive duplicates, so it will not remove duplicates that are separated by other characters. For example, 'abab' is left unchanged.
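If the goal is instead to drop every later occurrence of a character, even when the duplicates are not adjacent, a regular expression is not the simplest tool. The following is one possible sketch, not part of the original method: it assumes a hypothetical helper named drop_duplicate_chars and applies it with Series.map, keeping the first occurrence of each character.

```python
import pandas as pd

def drop_duplicate_chars(text):
    # Hypothetical helper: dict.fromkeys keeps only the first occurrence
    # of each character and preserves insertion order.
    return ''.join(dict.fromkeys(text))

s = pd.Series(['banana', 'abab', 'effffee'])

# na_action='ignore' leaves missing values untouched instead of passing
# them to the helper.
result = s.map(drop_duplicate_chars, na_action='ignore')

print(result)
```

This runs a Python function on each element, so on very large Series it will likely be slower than the vectorized str.replace approach.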
