How to remove duplicate lines from a text file using Linux command line?

Updated: August 22, 2023

To remove duplicate lines from a text file using Linux command line, you can use the sort and uniq commands together. Here’s the command:

sort file.txt | uniq > output_file.txt

This command sorts the lines in file.txt and then passes them to the uniq command, which removes any duplicate lines. The result is then written to output_file.txt.
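The same result can be obtained in a single step with sort's -u option, which de-duplicates while sorting. A quick demonstration (the file names here are just examples):

```shell
# Create a small sample file with duplicate lines.
printf 'banana\napple\nbanana\ncherry\napple\n' > file.txt

# sort -u sorts and removes duplicates in one step,
# equivalent to `sort file.txt | uniq`.
sort -u file.txt > output_file.txt

cat output_file.txt
# apple
# banana
# cherry
```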

Alternatively, you can use the awk command to achieve the same result:

awk '!seen[$0]++' file.txt > output_file.txt

This command uses an awk associative array named “seen” to count how many times each line has appeared. The pattern !seen[$0]++ is true only the first time a line occurs, so each line is printed exactly once. Unlike the sort-based approach, this preserves the original order of the lines.
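Running the awk one-liner on the same kind of sample file shows the order-preserving behavior:

```shell
# Sample file with duplicates scattered through it.
printf 'banana\napple\nbanana\ncherry\napple\n' > file.txt

# Keep only the first occurrence of each line, in original order.
awk '!seen[$0]++' file.txt > output_file.txt

cat output_file.txt
# banana
# apple
# cherry
```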

Note that uniq only removes adjacent duplicate lines, so the sort step can be skipped only when duplicates are already next to each other — for example, when the file is already sorted.

Also, be careful when using these commands on large files. The awk approach keeps every unique line in memory, which can exhaust RAM on very large inputs; sort spills to temporary files on disk instead, but may take a long time to run. In such cases, you may need to tune these tools or use more specialized scripts to remove duplicates efficiently.
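For very large files, one option is to tune GNU sort, which already does an external (disk-backed) merge sort. A sketch, assuming GNU coreutils (the file names are placeholders):

```shell
# -S caps the in-memory sort buffer; -T chooses where temporary
# spill files are written (pick a disk with enough free space).
# -u removes duplicates during the merge, so no separate uniq pass.
sort -u -S 512M -T /tmp bigfile.txt > output_file.txt
```

Because the heavy lifting happens in temporary files rather than RAM, this scales to inputs far larger than available memory.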

Blog Author

Passionate about technology, enjoys sharing, and keeps learning. Focused on web development, system architecture design, and artificial intelligence.