Split Large File in Chunks using File Enumerator - Approach 3


Today in this article, we shall see an approach to split a large file into chunks using a file enumerator.

We will read a large file and break it into smaller chunk files using a connected approach, i.e., file enumeration.

This approach can be used in the below scenarios,

  • Dealing with large files (more than 1 GB in size).

  • The file is readily accessible and can be enumerated line by line.

  • You know the number of lines you want to process in each chunk.

In our previous article, we came up with logic based on the size of the data (in bytes) we want to deal with.

Today in this article, we will see a simple approach based on the number of lines.

Let’s use the same sample discussed in the previous article.

We have a text file with a total of ~150K records.

I would like to create chunk files with approximately 15K records in each file.

Expected result

Each generated file contains approximately 15K records.

We can achieve the above result by using the code below.




static void Main(string[] args)
{
    int fileSplitSize = 15000;

    // File.ReadLines streams the file lazily instead of loading it all into memory;
    // the enumerator lets us pull lines one at a time across multiple chunk writes.
    using (var fileLineReader = File.ReadLines("file-input-thecodebuzz.txt").GetEnumerator())
    {
        bool isFileReadOn = true;
        for (int chunk = 0; isFileReadOn; chunk++)
        {
            // Write one chunk file at a time until the enumerator is exhausted.
            isFileReadOn = WriteChunkToFile(fileLineReader, fileSplitSize, chunk);
        }
    }
}



In the above logic, File.ReadLines("file-input-thecodebuzz.txt").GetEnumerator() reads the lines of the file one at a time using an enumerator until the end of the file is reached.

It also lets you control the flow: you can write the current chunk out to a file once the number of lines read reaches the chunk size.

Below is the chunking logic,

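The Main method above delegates to a WriteChunkToFile helper. Below is a minimal sketch of what that helper could look like, assuming the output files are named file-output-thecodebuzz-{chunk}.txt (the file name pattern is an illustration, not from the original article) and that the System, System.Collections.Generic, and System.IO namespaces are imported. It writes up to fileSplitSize lines from the shared enumerator into a numbered chunk file and returns false once the end of the input is reached, which stops the loop in Main.

static bool WriteChunkToFile(IEnumerator<string> fileLineReader, int fileSplitSize, int chunk)
{
    // Hypothetical output file name; adjust the pattern to your needs.
    using (var writer = File.CreateText($"file-output-thecodebuzz-{chunk}.txt"))
    {
        for (int line = 0; line < fileSplitSize; line++)
        {
            // MoveNext() returns false at the end of the input file,
            // signaling Main to stop creating new chunk files.
            if (!fileLineReader.MoveNext())
            {
                return false;
            }
            writer.WriteLine(fileLineReader.Current);
        }
    }

    // A full chunk was written; there may be more lines to process.
    return true;
}

Because the same enumerator instance is passed across calls, each call resumes exactly where the previous chunk left off, so no lines are skipped or duplicated between files.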

Overall, when we break the file into multiple files based on the number of records, we want to make sure no data is truncated.

After a successful run, you shall see a total of 10 files of 15K records each generated in the selected output folder, and more importantly, you don't lose any line of records or data from the file.
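If you want to verify that no data was lost, a quick sanity check (not part of the original article) is to compare the total line count of the generated chunk files against the source file. The snippet below assumes the hypothetical output name pattern used above and requires System.Linq:

// Sanity check: total lines across all chunk files should equal the source line count.
long sourceLines = File.ReadLines("file-input-thecodebuzz.txt").LongCount();
long chunkLines = Directory.EnumerateFiles(".", "file-output-thecodebuzz-*.txt")
                           .Sum(f => File.ReadLines(f).LongCount());
Console.WriteLine($"Source: {sourceLines}, Chunks: {chunkLines}, Match: {sourceLines == chunkLines}");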

If you are interested in learning another approach based on chunk size (in bytes), please see our previous article.


Do you have any comments, ideas, or better suggestions to share?

Please sound off in the comments below.

Happy Coding !!



Please bookmark this page and share it with your friends. Please subscribe to the blog to receive notifications on freshly published (2024) best practices and guidelines for software design and development.


