Find and replace text in a very large file fast

I needed to replace a specific text string in a 6.5 GB file for one of my projects. This is a pretty easy task if you are on Linux (using a tool like sed), but it is not that easy if you are on Windows.

First, I tried my favorite tool, PowerShell. I stumbled upon a comment made by Rob Campbell here and quickly put together this script in PowerShell ISE (love it!):

$filepath = "input.csv"
$newfilepath = "input_fixed.csv"

# Filter that replaces every occurrence of "aaa" with "bbb" in each batch of lines
filter num2x { $_ -replace "aaa","bbb" }

# Read the file in batches of 1000 lines, transform, and append to the new file
measure-command {
    Get-Content -ReadCount 1000 $filepath | num2x | Add-Content $newfilepath
}


It took 19 minutes to run on my laptop, which was not too bad. Rob mentioned that using a filter and reading the file in batches (using ReadCount) would provide very good performance. I could have used the .NET StreamReader class to read lines one by one, but many people agreed that ReadCount works better since it does its reads and writes in batches.

Then I stumbled upon a free little command-line tool called FART - Find And Replace Text. Leaving the name of the tool aside, I was a bit skeptical about it but gave it a try. Well, it did the same thing in just 3 minutes! Quite a difference! All I had to do was download it and run this command, which replaces all occurrences of aaa with bbb:

fart.exe -c input.csv "aaa" "bbb"

Then I processed another big file - 21 GB this time. It took FART 21 minutes - still not bad!

Certainly a great little tool to add to your toolbox!

P.S. I have also tried a trial version of EmEditor, whose authors claim it can work with very large files of 200 GB and more. EmEditor opened my file just fine, but when I tried to do Find and Replace, it froze my laptop for 40 minutes and I had to kill the process.

Boris Tyukin

BI, Data Warehousing and ETL

Orlando, Florida