Wednesday 10 May 2017

Processing Data

I have a couple of very large CSVs (several gigabytes each) and would like to compare the data between them to consolidate any matches. What is the least expensive way to iterate over and compare the values in these large files? My first thought was to use Redis: load in the smallest CSV, then compare each chunk of the other file against the Redis database. Is there a better way? I would also like to fork this process into clusters to process the files in parallel if possible.
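A minimal sketch of the Redis approach described above, in Python with the redis-py client. It assumes a local Redis server, that rows match on a single key column, and that keys fit comfortably in Redis; the file names, the key-column index, and the "small:keys" set name are all placeholders, not anything from the original post:

import csv
import redis

KEY_COL = 0                      # hypothetical: index of the matching key column
SMALL_CSV = "small.csv"          # placeholder paths
LARGE_CSV = "large.csv"
BATCH = 10_000                   # pipeline batch size, to cut network round trips

r = redis.Redis(host="localhost", port=6379)

# Pass 1: stream the smaller CSV into a Redis set, batched in a pipeline
# so the whole file is never held in memory at once.
with open(SMALL_CSV, newline="") as f:
    pipe = r.pipeline(transaction=False)
    for i, row in enumerate(csv.reader(f), 1):
        pipe.sadd("small:keys", row[KEY_COL])
        if i % BATCH == 0:
            pipe.execute()
    pipe.execute()

# Pass 2: stream the larger CSV and test each row's key against the set.
# SISMEMBER calls are also pipelined; results come back one batch at a time.
def flush(rows, pipe):
    for row, hit in zip(rows, pipe.execute()):
        if hit:
            print("match:", row)   # consolidate the matched rows here

with open(LARGE_CSV, newline="") as f:
    batch_rows, pipe = [], r.pipeline(transaction=False)
    for row in csv.reader(f):
        batch_rows.append(row)
        pipe.sismember("small:keys", row[KEY_COL])
        if len(batch_rows) == BATCH:
            flush(batch_rows, pipe)
            batch_rows, pipe = [], r.pipeline(transaction=False)
    flush(batch_rows, pipe)

To parallelize the second pass along the lines the post suggests, one option (again only a sketch of the idea) is to split the large file into row ranges and hand each range to a separate worker, for example via multiprocessing.Pool, with every worker opening its own Redis connection; Redis handles concurrent readers, so the membership checks can run side by side.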

Submitted May 11, 2017 at 02:03AM by Midicide
