Working on data conversion and data import projects means I am often faced with unmanageably large chunks of data. CSV files with 10,000+ entries are not uncommon, and many applications don't handle import files of that size very well. I got tired of breaking these files apart by hand over and over again, and, surprisingly, I found only one decent existing application that reliably split CSV files. It was a little out of date and didn't meet all of my requirements. Since the requirements were simple and the time saved would outweigh the time invested, I decided to create my own tool that would:
- Take an existing CSV file and break it into chunks of a defined size
- Provide an option to include the header line from the input file in each chunk
- Provide meaningful status feedback using a progress bar
- Communicate any problems to the user quickly
- Allow the process to be cancelled
- Handle files with 500K to 1M lines
- Be published as open source
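The splitting logic behind the first two requirements is simple enough to sketch. The tool itself is written in C#; what follows is a minimal Python illustration of the same idea, not the actual source. It streams the input one line at a time, so memory use stays flat even for 500K-line files. It assumes each physical line is one record (quoted fields containing line breaks would need a real CSV parser), and the function name `split_csv` and the `name_1.csv`, `name_2.csv` output naming are hypothetical.

```python
import os


def split_csv(input_path, output_dir, lines_per_chunk, include_header=True):
    """Split input_path into chunk files of at most lines_per_chunk data lines.

    Hypothetical helper. When include_header is True, the first line is treated
    as a header and copied to the top of every chunk. Otherwise it is written
    like any other line, so only the first chunk contains it.
    Returns the list of chunk file paths.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    os.makedirs(output_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(input_path))[0]
    chunk_paths = []
    out = None  # file handle for the chunk currently being written
    count = 0   # lines written to the current chunk, not counting a copied header
    try:
        # newline="" leaves the original line endings untouched on both sides
        with open(input_path, "r", newline="", encoding="utf-8") as src:
            header = src.readline() if include_header else ""
            for line in src:  # streams the file one line at a time
                if out is None or count == lines_per_chunk:
                    if out is not None:
                        out.close()
                    path = os.path.join(output_dir, f"{base}_{len(chunk_paths) + 1}.csv")
                    out = open(path, "w", newline="", encoding="utf-8")
                    chunk_paths.append(path)
                    count = 0
                    out.write(header)  # empty string when headers are not repeated
                out.write(line)
                count += 1
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

For example, `split_csv("contacts.csv", "out", 10000)` would write `out/contacts_1.csv`, `out/contacts_2.csv`, and so on, each starting with the header row and holding up to 10,000 data lines.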
Since my primary desktop runs Windows, I decided to develop the tool in C# with Visual Studio 2008. This gave me a rapid development cycle and easy-to-use multithreading, and it is an environment I am familiar with from a previous job.
The application has a simple Windows Forms interface with a progress bar and a small text box that reports any problems during the process. I used the standard BackgroundWorker class to run the splitting process on a separate thread, which keeps the user interface responsive while the progress bar updates.
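In BackgroundWorker terms, the worker typically calls `ReportProgress` and checks `CancellationPending`, while the form calls `CancelAsync` when the user asks to stop. A rough Python analogue of that pattern, assuming a plain thread plus a `threading.Event` as the cancel flag, is sketched below. Progress is measured as bytes read against the file size, which avoids a separate pass to count lines first. The per-line work is a placeholder that only counts lines, and all function names are hypothetical.

```python
import os
import threading


def process_with_progress(path, on_progress, cancel_event):
    """Read path line by line, reporting whole-percent progress and checking for cancel.

    Hypothetical helper: on_progress plays the role of ReportProgress and
    cancel_event the role of CancellationPending. The per-line work is a
    placeholder that only counts lines. Returns the line count, or None if cancelled.
    """
    total = os.path.getsize(path)
    done = 0
    lines = 0
    last_percent = -1
    with open(path, "rb") as src:  # binary mode so len(line) is an exact byte count
        for line in src:  # an empty file skips the loop, so total is never 0 here
            if cancel_event.is_set():
                return None
            done += len(line)
            lines += 1
            percent = done * 100 // total
            if percent != last_percent:  # report only when the whole percentage changes
                on_progress(percent)
                last_percent = percent
    return lines


def start_worker(path, on_progress, on_done):
    """Run process_with_progress on a background thread; returns (thread, cancel_event)."""
    cancel_event = threading.Event()

    def work():
        on_done(process_with_progress(path, on_progress, cancel_event))

    thread = threading.Thread(target=work, daemon=True)
    thread.start()
    return thread, cancel_event
```

Here `on_progress` runs on the worker thread, so a GUI would need to pass updates back to the UI thread. In Windows Forms, BackgroundWorker's `ProgressChanged` event takes care of that marshalling automatically.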
I created test files of various sizes using the data generation application at generatedata.com. I downloaded and installed a local copy so I could generate sample data sets larger than the default limit of 200 records. This PHP-based app lets you define columns and data types and then create sample data in HTML, XML, CSV, SQL and other formats for use in development and testing.
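If installing a local copy of the generator is not convenient, a short script can produce similarly shaped test data. This is a hypothetical stand-in, not a feature of generatedata.com: the `write_test_csv` name, the column names, and the value pools are all made up, and a fixed random seed keeps runs reproducible.

```python
import csv
import random


def write_test_csv(path, rows, seed=42):
    """Write a header plus `rows` synthetic records to path (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs produce identical files
    first_names = ["Ann", "Bob", "Cara", "Dev", "Eli"]
    cities = ["Austin", "Boston", "Denver", "Seattle"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # handles quoting if a value ever contains a comma
        writer.writerow(["id", "name", "email", "city"])
        for i in range(1, rows + 1):
            name = rng.choice(first_names)
            writer.writerow([i, name, f"{name.lower()}{i}@example.com", rng.choice(cities)])
```

Calling `write_test_csv("test_500k.csv", 500_000)` would produce a 500,001-line file (header plus rows) for timing the splitter.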
I generated a number of test files with up to 500K lines. The 500K-line file was split in just a couple of seconds, so I am confident the tool will handle fairly large files in future projects.
I am making this software available to the public to download and modify as needed. Please feel free to grab it, test it, and let me know how it works for you. I would be happy to add features that would benefit the community at large. The attached zip file contains both the source code and an executable. No installation is required; just double-click the .exe file and go.