Mar 7, 2009

Tip: processing large data sets

When dealing with large data sets, ReadList is more efficient than Import.

From the documentation:

If file is not already open for reading, ReadList opens it, then closes it when it is finished. If the file is already open, ReadList does not close it at the end.

ReadList[stream] reads from an open input stream, as returned by OpenRead.

So we can use OpenRead to open the file, then use ReadList to load large data sets piece by piece.

str = OpenRead[datafile]

(* the first number in our data file is the total number of points *)

ReadList[str, Number, 1]

{17581099}

After that, we can read the actual data in much smaller pieces, one piece at a time. Otherwise there may not be enough memory to run the processing algorithm.

Here is an example. Each record in this file holds 7 numbers, so 7*5 numbers make up 5 records:

(* read first 5 records *)

ReadList[str, Number, 7*5, RecordLists -> True] // MatrixForm

data1

(* read the next 5 records *)

ReadList[str, Number, 7*5, RecordLists -> True] // MatrixForm

data2

At the end, close the file with:

Close[str]
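
The manual steps above can also be wrapped in a loop. This is only a rough sketch, assuming the same file layout (one header number, then 7 numbers per record). The names recordLength, recordsPerChunk and count are made up for illustration, and the counting line stands in for whatever processing you actually need. It relies on ReadList returning an empty list once nothing is left to read.

recordLength = 7;  (* numbers per record, as in the example above *)
recordsPerChunk = 100000;  (* hypothetical chunk size; tune it to the available memory *)

str = OpenRead[datafile];
npoints = First[ReadList[str, Number, 1]];  (* header: total number of points *)
count = 0;
While[True,
  chunk = ReadList[str, Number, recordLength*recordsPerChunk];
  If[chunk === {}, Break[]];  (* nothing left to read *)
  records = Partition[chunk, recordLength];  (* one row per record *)
  count += Length[records]  (* stand-in for the real processing step *)
];
Close[str];

Only one chunk is held in memory at a time, so the chunk size can be chosen to fit whatever the processing algorithm needs.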
