Need to transpose a large CSV file in Perl [closed]

Posted 2019-10-19 02:40

The CSV data file is 3.2 GB in total, with an unknown number of rows and columns (assume both are very large). It holds genomics data: SNP data for a population of individuals. The file therefore contains IDs such as TD102230 and genotype values such as A/A and A/T.

I tried the Text::CSV and Array::Transpose modules but couldn't get it to work (the computing cluster froze). Is there a specific module that would do this? I am new to Perl (not much experience in low-level programming; I have mostly used R and MATLAB before), so detailed explanations are especially welcome!

2 answers
来,给爷笑一个
#2 · 2019-10-19 02:44

Break down the task into several steps to save memory.

  1. Read a line and write the fields into a file named after the line number. Output one line per field.
  2. Repeat step 1 until the input CSV file is exhausted.
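  3. Use paste (with -d, to set a comma delimiter) to merge all the output files side by side into one large file. Each output file holds one input row as a column, so the merged file is the transpose.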
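
The steps above can be sketched in Perl roughly as follows. This is a minimal sketch, not a tested solution for the real 3.2 GB file. The sub names split_rows and merge_parts are made up for illustration, and merge_parts is a Perl stand-in for `paste -d,` rather than a call to paste itself. It also assumes the fields contain no quoted commas or embedded newlines (plausible for values like TD102230 and A/T); if that does not hold, each row should be parsed with Text::CSV instead of split.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Steps 1 and 2: write every input row to its own file, one field per line.
# ASSUMPTION: no quoted commas or embedded newlines, so a plain split works.
sub split_rows {
    my ($in_path, $dir) = @_;
    open my $in, '<', $in_path or die "Cannot open $in_path: $!";
    my @parts;
    my $n = 0;
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;          # strip LF or CRLF line ending
        next if $line eq '';           # skip blank lines
        my $part = sprintf '%s/row%08d.txt', $dir, ++$n;
        open my $out, '>', $part or die "Cannot write $part: $!";
        # The -1 limit keeps trailing empty fields.
        print {$out} "$_\n" for split /,/, $line, -1;
        close $out or die "Cannot close $part: $!";
        push @parts, $part;
    }
    close $in;
    return @parts;
}

# Step 3: stand-in for `paste -d, row*.txt`. Reads one line from every part
# file in turn and joins those lines into one output row.
sub merge_parts {
    my ($out_path, @parts) = @_;
    my @fhs = map { open my $fh, '<', $_ or die "Cannot open $_: $!"; $fh } @parts;
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    while (1) {
        my @fields;
        my $seen = 0;
        for my $fh (@fhs) {
            my $value = <$fh>;
            if (defined $value) { chomp $value; $seen = 1 }
            else                { $value = '' }   # pad short rows, as paste does
            push @fields, $value;
        }
        last unless $seen;             # every part file is exhausted
        print {$out} join(',', @fields), "\n";
    }
    close $out or die "Cannot close $out_path: $!";
}

# Runs only when both paths are given: perl transpose.pl input.csv output.csv
if (@ARGV == 2) {
    my ($in_path, $out_path) = @ARGV;
    my $dir = tempdir(CLEANUP => 1);   # part files are removed on exit
    merge_parts($out_path, split_rows($in_path, $dir));
}

1;
```

Only one row is held in memory at a time, which is the point of the approach. The merge step, like paste, opens every part file at once, so an input with many thousands of rows may run into the per-process open-file limit; merging in batches would be one way around that.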
小情绪 Triste *
#3 · 2019-10-19 02:51

As a direct answer: read the file line by line, parse each line with Text::CSV, and push the values onto arrays, with one array per original column. Then output each array with join (or similar) to get the transposed representation of the original. Disposing of each array right after its join will help with the memory problem too.
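
A minimal sketch of that idea, under some loud assumptions: the sub name transpose_in_memory is hypothetical, every row is assumed to have the same number of fields, and a plain split stands in for Text::CSV on the assumption that no field contains a quoted comma or embedded newline. If that last assumption fails, the split line would be replaced with a Text::CSV parse of each row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the CSV row by row and append each field to the array for its column,
# then write each column array out as one row.
# ASSUMPTIONS: rectangular input; no quoted commas or embedded newlines.
sub transpose_in_memory {
    my ($in, $out) = @_;               # input and output filehandles
    my @columns;                       # $columns[$i] holds column $i's values
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;          # strip LF or CRLF line ending
        next if $line eq '';           # skip blank lines
        my @fields = split /,/, $line, -1;   # -1 keeps trailing empty fields
        push @{ $columns[$_] }, $fields[$_] for 0 .. $#fields;
    }
    # shift removes each column from @columns, so its memory can be reclaimed
    # as soon as the column has been written.
    while (@columns) {
        my $column = shift @columns;
        print {$out} join(',', @$column), "\n";
    }
}

# Runs only when both paths are given: perl transpose.pl input.csv output.csv
if (@ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $out, '>', $ARGV[1] or die "Cannot write $ARGV[1]: $!";
    transpose_in_memory($in, $out);
    close $out or die "Cannot close $ARGV[1]: $!";
}

1;
```

Note that the whole table is in memory by the time the read loop finishes, and Perl stores each value as a separate scalar with its own overhead, so a 3.2 GB file will likely need several times that much RAM. Freeing arrays during output only lowers usage after the peak has already been reached.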

Writing the values to external files instead of arrays, and joining those files with OS facilities, is another way around the memory requirements.

You should also think about why you need this. Transposing by itself serves no real purpose, so is there really no better way to solve the actual task at hand?
