
Processing large files in PHP

28 Feb

I’ve been using my own PHP web statistics script for over a year now, and I realized that some dates were missing from its reports. It turns out PHP has a limit of about 2GB when fopen-ing files (typically on 32-bit builds, where file sizes and offsets are stored as signed 32-bit integers), even though the script reads the file line by line and doesn’t keep any lines in memory.
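If you’re not sure whether your PHP is affected, a quick first check is the integer size of the build. This is a minimal sketch, assuming the 2GB limit comes from a 32-bit build (where PHP_INT_MAX is 2147483647, i.e. 2^31 - 1 bytes); the function name mayHit2GBLimit is made up for this example.

```php
<?php
// Rough check: on 32-bit builds PHP_INT_SIZE is 4 and PHP_INT_MAX is
// 2147483647 (2^31 - 1), which is roughly where the ~2GB file limit sits.
// Assumption: the build has no large-file support beyond that.
function mayHit2GBLimit(): bool
{
    return PHP_INT_SIZE < 8;
}

printf("PHP_INT_SIZE = %d, PHP_INT_MAX = %d\n", PHP_INT_SIZE, PHP_INT_MAX);
echo mayHit2GBLimit()
    ? "32-bit build: files over ~2GB may fail to open or read fully\n"
    : "64-bit build: the 2GB offset limit should not apply\n";
```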

The solution is to use the Linux split command to break the file into manageable pieces and process them one by one. Don’t go crazy and split it into 2GB pieces unless you have abundant RAM: if you’re splitting into 2GB files, the process will use 2GB of RAM while doing it. Ouch!!!

Since I’m working with 1GB of RAM in total, I decided to go with 100MB files, so split uses only about 100MB of RAM. I also wanted my files to have the prefix zzz_split_ (instead of the default x), because “zzz” sorts nicely at the end of a directory listing.

split -C 100m access_log.old zzz_split_

This command split my Apache access_log file into 30 pieces of at most 100MB each, without breaking any line across two pieces.
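To make the “lines are not broken” part concrete, here is a small PHP sketch of the rule behind -C (“put at most SIZE bytes of lines per output file”). It works on an in-memory array purely for illustration (the real split streams the file), chunkLines is a made-up name, and hard-splitting a line longer than the limit is this sketch’s own choice, not a claim about split’s exact output.

```php
<?php
// Pack whole lines into chunks of at most $maxBytes bytes, so no line is
// cut across two chunks. A single line longer than $maxBytes cannot fit
// anywhere, so this sketch hard-splits it into $maxBytes-sized pieces.
function chunkLines(array $lines, int $maxBytes): array
{
    $chunks = [];
    $current = '';
    foreach ($lines as $line) {
        // Close the current chunk if this line would push it over the limit.
        if ($current !== '' && strlen($current) + strlen($line) > $maxBytes) {
            $chunks[] = $current;
            $current = '';
        }
        // Oversized line: emit full-size pieces until the rest fits.
        while (strlen($line) > $maxBytes) {
            $chunks[] = substr($line, 0, $maxBytes);
            $line = substr($line, $maxBytes);
        }
        $current .= $line;
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

print_r(chunkLines(["aaa\n", "bb\n", "c\n", "dddd\n"], 6));
```

In practice you’d let split do this; the sketch just shows why each zzz_split_ file ends on a line boundary, so the PHP loop never sees half a log line.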

I then changed my PHP script to glob the split files in the directory and process them one by one:

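$logfiles = '/home/admin/webstats/zzz_split_*';
foreach (glob($logfiles) as $logfile) {
    // glob() returns full path strings; $logfile[0] would be only the first character
    $handle = fopen($logfile, 'r') or die("Can't open the log file $logfile");
    while (($line = fgets($handle)) !== false) {
        // ... process $line here ...
    }
    fclose($handle);
}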
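Building on the loop above, here is a hypothetical example of the kind of per-day tally a stats script might keep, which is also a handy way to check that no dates go missing once the log is split. It assumes the Apache common or combined log format (timestamps like [28/Feb/2007:10:15:32 +0000]); countHitsPerDay is a made-up name and isn’t the script from this post.

```php
<?php
// Hypothetical example: count requests per day across all split log files.
// Assumes Apache common/combined log format timestamps, e.g.
// [28/Feb/2007:10:15:32 +0000].
function countHitsPerDay(string $pattern): array
{
    $perDay = [];
    // glob() sorts matches alphabetically, so zzz_split_aa comes before
    // zzz_split_ab; it can return false on error, hence the ?: [] fallback.
    foreach (glob($pattern) ?: [] as $logfile) {
        $handle = fopen($logfile, 'r');
        if ($handle === false) {
            throw new RuntimeException("Can't open the log file $logfile");
        }
        // Read one line at a time; only the running totals stay in memory.
        while (($line = fgets($handle)) !== false) {
            if (preg_match('#\[(\d{2}/\w{3}/\d{4}):#', $line, $m)) {
                $day = $m[1];
                $perDay[$day] = ($perDay[$day] ?? 0) + 1;
            }
        }
        fclose($handle);
    }
    return $perDay;
}
```

Because each split file ends on a line boundary, tallying file by file gives the same totals as reading the original log in one go.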

Here’s a (wo)man page for split

NAME
split – split a file into pieces

SYNOPSIS
split [OPTION] [INPUT [PREFIX]]

DESCRIPTION
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, …; default
PREFIX is ‘x’. With no INPUT, or when INPUT is -, read standard input.

Mandatory arguments to long options are mandatory for short options
too.

-a, --suffix-length=N
use suffixes of length N (default 2)

-b, --bytes=SIZE
put SIZE bytes per output file

-C, --line-bytes=SIZE
put at most SIZE bytes of lines per output file

-l, --lines=NUMBER
put NUMBER lines per output file

--verbose
print a diagnostic to standard error just before each output
file is opened

--help display this help and exit

--version
output version information and exit

SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
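The SIZE and naming rules from the man page can be sketched in PHP as follows. This follows the text as printed above (b, k and m multipliers; two-letter suffixes by default); newer coreutils releases accept more suffixes, and both function names are hypothetical.

```php
<?php
// Sketch of two rules from the split man page (as printed above):
//  - SIZE multipliers: b = 512, k = 1K (1024), m = 1 Meg (1024 * 1024)
//  - output names: PREFIXaa, PREFIXab, ... using N letters (-a N, default 2)

// Convert a SIZE argument such as "100m" into bytes.
function parseSplitSize(string $size): int
{
    $multipliers = ['b' => 512, 'k' => 1024, 'm' => 1024 * 1024];
    if (!preg_match('/^(\d+)([bkm]?)$/', $size, $m)) {
        throw new InvalidArgumentException("Unsupported SIZE: $size");
    }
    return (int) $m[1] * ($m[2] === '' ? 1 : $multipliers[$m[2]]);
}

// Name of the $index-th output file (0-based): 0 => PREFIXaa, 1 => PREFIXab, ...
function splitFileName(string $prefix, int $index, int $suffixLength = 2): string
{
    if ($index < 0 || $index >= 26 ** $suffixLength) {
        throw new RangeException("Only 26^$suffixLength suffixes are available");
    }
    $suffix = '';
    // Build the suffix right to left, like counting in base 26 with letters.
    for ($i = 0; $i < $suffixLength; $i++) {
        $suffix = chr(ord('a') + $index % 26) . $suffix;
        $index = intdiv($index, 26);
    }
    return $prefix . $suffix;
}
```

For the command used earlier, parseSplitSize('100m') gives 104857600 bytes, and 30 pieces are named zzz_split_aa through zzz_split_bd.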


Posted on February 28, 2007 in Linux

 

One response to “Processing large files in PHP”

  1. Karl Saynor

    November 8, 2007 at 10:44 am

    Hi,

    thanks, this is a useful post. I just wondered: is there a funky version of the split command that I can use on an XML file, so that it splits at the end of a record rather than at an arbitrary line?

    Cheers,
    Karl

     
