The CSV data file is 3.2 GB in total, with an unknown but very large number of rows and columns. It is a genomics file holding SNP data for a population of individuals, so it contains IDs such as TD102230 and genotype values such as A/A and A/T.
I have tried the Text::CSV and Array::Transpose modules but couldn't get it right (the computing cluster froze). Is there a specific module that would do this? I am new to Perl (not much experience in low-level programming; I have mostly used R and MATLAB before), so detailed explanations are especially welcome!
Break down the task into several steps to save memory.
As a direct answer: read the file line by line, parse each line with Text::CSV, push the values onto arrays (one array per original column), and then output each array with join or the like to get the transposed representation of the original. Disposing of each array right after its join will help with the memory problem too. Writing values to external files instead of arrays and joining them with OS facilities is another way around the memory requirements.
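The line-by-line approach above could look roughly like this. This is a minimal sketch, not production code: it reads from the `__DATA__` section for illustration (in real use you would open a filehandle on your CSV file), and note that the column arrays still hold the whole file in memory until each one is printed and emptied.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Read line by line; accumulate one array per original column.
my @columns;
while (my $row = $csv->getline(\*DATA)) {
    push @{ $columns[$_] }, $row->[$_] for 0 .. $#$row;
}

# Each original column becomes one line of the transposed output;
# empty the array right after join to release its memory.
for my $col (@columns) {
    print join(',', @$col), "\n";
    @$col = ();
}

__DATA__
ID,SNP1,SNP2
TD102230,A/A,A/T
TD102231,A/T,T/T
```

For the sample data this prints `ID,TD102230,TD102231`, then `SNP1,A/A,A/T`, then `SNP2,A/T,T/T`.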
You should also think about why you need this. Is there really no better way to solve the actual task at hand, since transposing by itself serves no real purpose?
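The "OS facilities" route mentioned above can be sketched with standard Unix tools: `cut` extracts one column, and `paste -s` flattens it into a single line. This assumes a plain comma-separated file (here called `snp_data.csv`, a placeholder name) with no quoted fields containing embedded commas; it trades time for memory, since the file is re-read once per column.

```shell
# Count the columns from the header line.
ncols=$(head -n1 snp_data.csv | awk -F, '{print NF}')

# One output line per original column: cut pulls the column,
# paste -s joins its values into a single comma-separated line.
for i in $(seq 1 "$ncols"); do
    cut -d, -f"$i" snp_data.csv | paste -s -d, -
done > transposed.csv
```

Because each `cut` pass streams the file, peak memory stays tiny regardless of file size, at the cost of as many passes as there are columns.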