Processing large files in PHP

28 Feb

I’ve been using my own PHP web statistics script for over a year now, and I recently realized that some dates were missing from the reports. It turns out PHP has a limit of 2GB or so on files opened with fopen, even though the script reads the file line by line and doesn’t store any lines in memory.
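A quick way to tell whether a log has crossed that line is to compare its size in bytes against the limit. The sketch below assumes the "2GB or so" is the usual signed 32-bit ceiling of 2^31 - 1 bytes, which the post does not confirm; exceeds_limit is a made-up helper name, and the demo runs on a tiny temporary file rather than a real log.

```shell
#!/bin/sh
# Assumption: the fopen limit is 2^31 - 1 bytes (signed 32-bit file offset).
limit=2147483647

# Hypothetical helper: succeeds if the file is larger than the assumed limit.
exceeds_limit() {
    size=$(($(wc -c < "$1")))   # byte count; $(( )) strips any padding from wc
    [ "$size" -gt "$limit" ]
}

# Demo on a small temporary file (a real access log would go here instead).
demo_file=$(mktemp)
printf 'just one short line\n' > "$demo_file"

if exceeds_limit "$demo_file"; then
    echo "too big for fopen, split it first"
else
    echo "small enough to open directly"
fi
rm -f "$demo_file"
```

Pointing exceeds_limit at access_log.old instead of the temporary file would show whether splitting is needed at all.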

The solution is to use the Linux split command to break the file into manageable pieces and process them one by one. Don’t go crazy and try to split it into 2GB pieces, unless you have abundant RAM. If you’re splitting it into 2GB files, split will use about 2GB of RAM while doing it. Ouch!

Since I’m working with 1GB of RAM in total, I decided to go with 100MB files, which means using about 100MB of RAM for the split. I also wanted my files to have the prefix zzz_split_ (instead of the default x). The “zzz” just makes them sort nicely at the end of all the files in the directory.

split -C 100m access_log.old zzz_split_

This command split my Apache access_log file into 30 pieces of at most 100MB each, making sure that no line is broken across two pieces.
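To see the "lines are not broken" behaviour without a multi-gigabyte log, the same command can be tried on a small generated file. This is only a sketch: it assumes GNU split, and the sample log lines, the 4k piece size and the temporary directory are invented for the demo.

```shell
#!/bin/sh
# Sketch: split a small sample log with -C and check that no line is cut.
# Assumes GNU split; sample data and sizes are made up for illustration.
workdir=$(mktemp -d)

# Build a sample "log" of 1000 numbered lines.
i=1
while [ "$i" -le 1000 ]; do
    printf '127.0.0.1 - - [28/Feb/2007] "GET /page%d HTTP/1.0" 200 %d\n' "$i" "$i"
    i=$((i + 1))
done > "$workdir/sample_log"

# At most 4 KB of whole lines per piece, same prefix as in the post.
split -C 4k "$workdir/sample_log" "$workdir/zzz_split_"

# Reassembling the pieces in glob (alphabetical) order should give back
# the original file byte for byte.
if cat "$workdir"/zzz_split_* | cmp -s - "$workdir/sample_log"; then
    echo "pieces match the original"
fi

# Every piece should end with a newline, i.e. no line was cut in half.
broken=0
for piece in "$workdir"/zzz_split_*; do
    [ "$(tail -c 1 "$piece" | wc -l)" -eq 1 ] || broken=$((broken + 1))
done
echo "pieces with a broken last line: $broken"
```

With -b instead of -C the pieces would be exactly 4 KB each, but lines would be cut wherever the byte boundary happened to fall.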

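I then changed my PHP script to glob the split files in the directory and open them one at a time:

$logfiles = '/home/admin/webstats/zzz_split_*';
foreach (glob($logfiles) as $logfile) {
    // glob() already returns each path as a string, so use $logfile as-is
    $handle = fopen($logfile, 'r') or die("Can't open the log file");
    // ... read and process the file line by line here ...
    fclose($handle);
}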

Here’s a (wo)man page for split

split - split a file into pieces


Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
PREFIX is ‘x’. With no INPUT, or when INPUT is -, read standard input.

Mandatory arguments to long options are mandatory for short options
too.

-a, --suffix-length=N
    use suffixes of length N (default 2)

-b, --bytes=SIZE
    put SIZE bytes per output file

-C, --line-bytes=SIZE
    put at most SIZE bytes of lines per output file

-l, --lines=NUMBER
    put NUMBER lines per output file

--verbose
    print a diagnostic to standard error just before each output
    file is opened

--help
    display this help and exit

--version
    output version information and exit

SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
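As a worked example of those multipliers, here is a small sketch that turns a SIZE string into bytes. It handles only the three suffixes listed in the excerpt above, and size_to_bytes is a hypothetical helper written for this post, not something split provides.

```shell
#!/bin/sh
# Sketch: convert a split SIZE such as "100m" to bytes, using only the
# suffixes from the man page excerpt: b = 512, k = 1024, m = 1024 * 1024.
# size_to_bytes is a hypothetical helper, not part of split.
size_to_bytes() {
    number=${1%[bkm]}    # strip one trailing b, k or m, if present
    case "$1" in
        *b) echo $((number * 512)) ;;
        *k) echo $((number * 1024)) ;;
        *m) echo $((number * 1024 * 1024)) ;;
        *)  echo "$number" ;;
    esac
}

size_to_bytes 100m    # piece size used in the post: prints 104857600
size_to_bytes 2048k   # prints 2097152
size_to_bytes 4b      # prints 2048
```

So 30 pieces of at most 104857600 bytes each add up to at most 3145728000 bytes, a bit over 3GB, which fits with the original log being past the 2GB mark.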


Posted on February 28, 2007 in Linux


One response to “Processing large files in PHP”

  1. Karl Saynor

    November 8, 2007 at 10:44 am


    Thanks, this is a useful post. I just wondered: is there a funky version of the split command that I can use on an XML file, so that it splits at the end of a record rather than at just an arbitrary line?


