Read a large file into a byte array with chunks in C#

Read a Large File in Chunks in C#

In this article, we look at one more approach to reading a large file in chunks in C#.

Breaking a file into chunks is always a challenge, especially when the logic relies on a fixed size in bytes.

Let's use the same sample discussed in the previous article: a 10 MB text file that we want to read in chunks of 1 MB.


Overall, when we break the file into chunks, we want to make sure that no data is truncated.


This approach is useful when dealing with large files (.TXT, .CSV or .XLSX), for example in the GB or TB range. When breaking up such a file, we want to preserve data integrity.

Create a file using WriteAllBytes

Note that the WriteAllBytes method creates a new file, writes the specified byte array to it, and then closes the file. If the target file already exists, it is overwritten.

This approach, already discussed in the previous article, is simple to use but does not solve the issue of records being truncated between chunks.
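To illustrate the limitation, here is a minimal sketch of a size-based split built on File.WriteAllBytes. The helper name SplitBySize, the tiny 8-byte chunk size, and the sample text are assumptions chosen for illustration; this is not the code from the previous article.

```csharp
using System;
using System.IO;

public static class NaiveSplitter
{
    // Splits a file into fixed-size pieces using File.WriteAllBytes.
    // Returns the paths of the pieces that were written.
    public static string[] SplitBySize(string filePath, int chunkSize, string outputPrefix)
    {
        // Loads the whole file into memory; acceptable for a small demo only.
        byte[] all = File.ReadAllBytes(filePath);
        int count = (all.Length + chunkSize - 1) / chunkSize;
        var paths = new string[count];

        for (int i = 0; i < count; i++)
        {
            int offset = i * chunkSize;
            int length = Math.Min(chunkSize, all.Length - offset);
            byte[] piece = new byte[length];
            Array.Copy(all, offset, piece, 0, length);

            paths[i] = outputPrefix + (i + 1) + ".bin";
            // Creates the file, or overwrites it if it already exists.
            File.WriteAllBytes(paths[i], piece);
        }
        return paths;
    }

    public static void Main()
    {
        string input = Path.Combine(Path.GetTempPath(), "naive-input.txt");
        File.WriteAllText(input, "alpha\nbravo\ncharlie\n");

        string prefix = Path.Combine(Path.GetTempPath(), "naive-chunk");
        foreach (string path in SplitBySize(input, 8, prefix))
        {
            // Show line breaks visibly so the cut records are easy to spot.
            string content = File.ReadAllText(path).Replace("\n", "\\n");
            Console.WriteLine(Path.GetFileName(path) + ": " + content);
        }
    }
}
```

With the 20-byte sample, the pieces come out as `alpha\nbr`, `avo\nchar` and `lie\n`. Both "bravo" and "charlie" are cut across two files, which is the truncation problem discussed next.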

Resolving the issue of file data truncated between chunks

If the chunking logic relies only on a size in bytes, a chunk boundary can fall in the middle of a record, so that the record is split (truncated) between two consecutive files.

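        // Requires: using System; using System.IO; using System.Linq; using System.Text;

        // Chunk size of 1 MB, as in the example above (assumed; not shown in the original listing).
        const int MAX_BUFFER = 1024 * 1024;
        static byte[] buffer = new byte[MAX_BUFFER];

        private static bool ReadFileInChunks(string filePath)
        {
            // Count the lines up front; used below to ignore stale data that a
            // short final read leaves at the end of the buffer.
            int lineCountActual = File.ReadLines(filePath).Count();
            int lineCountCurrent = 0;
            int noOfFiles = 0;
            using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read))
            using (BufferedStream bs = new BufferedStream(fs))
            {
                string currentRecord;
                // The StreamReader decodes text from the same buffer that bs.Read fills.
                var memoryStream = new MemoryStream(buffer);
                var stream = new StreamReader(memoryStream);

                var noOfTotalFiles = Math.Ceiling((double)bs.Length / MAX_BUFFER);
                Console.WriteLine($"Approximate number of files that will be generated = {noOfTotalFiles}");
                // Holds a partial line carried over from one chunk to the next.
                StringBuilder currentLine = new StringBuilder();

                while ((bs.Read(buffer, 0, MAX_BUFFER)) != 0)
                {
                    noOfFiles++;
                    // Rewind so the reader sees the freshly filled buffer.
                    memoryStream.Seek(0, SeekOrigin.Begin);
                    // One writer per chunk instead of reopening the file for every line.
                    using (StreamWriter w = File.AppendText($"Chunks{noOfFiles}.txt"))
                    {
                        while (!stream.EndOfStream)
                        {
                            // Returns null while the current line is still incomplete.
                            currentRecord = LineFormatter(stream, currentLine);
                            if (currentRecord != null)
                            {
                                lineCountCurrent++;
                                // Process the line; skip anything past the real line count.
                                if (lineCountCurrent <= lineCountActual)
                                {
                                    w.WriteLine(currentRecord);
                                }
                            }
                        }
                    }
                }
            }
            return true;
        }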

The LineFormatter method called above reads the content line by line, ensuring that no data is lost or truncated across chunk boundaries.

Read line without truncating

The charBuffer used by LineFormatter is defined as follows:

static char[] charBuffer = new char[1];
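The LineFormatter listing itself does not appear in this text, so here is a minimal sketch of how a helper with that signature could work. It assumes the helper reads one character at a time through charBuffer and returns null when the data ends in the middle of a line. The class name LineFormatterSketch and the sample data are hypothetical, and the author's actual implementation may differ.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineFormatterSketch
{
    static char[] charBuffer = new char[1];

    // Reads characters until a line break or the end of the available data.
    // Returns the completed line, or null when the data ran out mid-line;
    // in that case the partial text stays in currentLine for the next call.
    public static string LineFormatter(StreamReader stream, StringBuilder currentLine)
    {
        while (stream.Read(charBuffer, 0, 1) > 0)
        {
            char c = charBuffer[0];
            if (c == '\n')
            {
                // Drop the '\r' of a Windows-style "\r\n" line ending.
                if (currentLine.Length > 0 && currentLine[currentLine.Length - 1] == '\r')
                {
                    currentLine.Length--;
                }
                string line = currentLine.ToString();
                currentLine.Clear();
                return line;
            }
            currentLine.Append(c);
        }
        return null; // no complete line yet
    }

    public static void Main()
    {
        var carry = new StringBuilder();
        // Two "chunks" whose boundary falls in the middle of the second record.
        string[] chunks = { "id=1\nid", "=2\nid=3\n" };
        foreach (string chunk in chunks)
        {
            using (var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(chunk))))
            {
                while (!reader.EndOfStream)
                {
                    string line = LineFormatter(reader, carry);
                    if (line != null)
                    {
                        Console.WriteLine(line);
                    }
                }
            }
        }
    }
}
```

Because the partial text "id" stays in the StringBuilder when the first chunk ends, the second record is reassembled as "id=2" once the next chunk supplies the rest. Reading one character per call is simple but slow; a larger buffer would be a reasonable optimization.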

After a successful run, you should see a total of 10 files of about 1 MB each generated in the output folder and, more importantly, no line or record from the file is lost.


If you are dealing with a very large file, another approach is to use the enumerator exposed by the File class (for example File.ReadLines), which reads the file line by line until the end of the data.

Please see the below article for more details.
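As a rough illustration of the enumerator approach, the sketch below streams lines with File.ReadLines and starts a new output file whenever the next line would push the current one past a size limit. The names LineSplitter, SplitByLines, maxChunkBytes and outputPrefix are hypothetical, and UTF-8 output is an assumption.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineSplitter
{
    // Streams the file line by line and rolls over to a new output file
    // whenever adding the next line would exceed maxChunkBytes.
    // Returns the number of files written.
    public static int SplitByLines(string filePath, long maxChunkBytes, string outputPrefix)
    {
        var encoding = new UTF8Encoding(false); // UTF-8 without a byte order mark
        int newLineBytes = encoding.GetByteCount(Environment.NewLine);
        int fileNo = 0;
        long written = 0;
        StreamWriter writer = null;
        try
        {
            // File.ReadLines is lazy: only the current line is held in memory.
            foreach (string line in File.ReadLines(filePath))
            {
                long lineBytes = encoding.GetByteCount(line) + newLineBytes;
                bool chunkFull = written > 0 && written + lineBytes > maxChunkBytes;
                if (writer == null || chunkFull)
                {
                    writer?.Dispose();
                    fileNo++;
                    writer = new StreamWriter(outputPrefix + fileNo + ".txt", false, encoding);
                    written = 0;
                }
                // A line is always written whole, so records are never cut.
                writer.WriteLine(line);
                written += lineBytes;
            }
        }
        finally
        {
            writer?.Dispose();
        }
        return fileNo;
    }

    public static void Main()
    {
        string input = Path.Combine(Path.GetTempPath(), "lines-input.txt");
        File.WriteAllLines(input, new[] { "aaaa", "bbbb", "cccc" });

        string prefix = Path.Combine(Path.GetTempPath(), "lines-chunk");
        int files = SplitByLines(input, 12, prefix);
        Console.WriteLine("Files written: " + files);
    }
}
```

The trade-off is that chunks are no longer exactly the requested size: each file ends on a line boundary, and a single line longer than the limit still gets a file of its own.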

That’s all! Happy coding!

Does this help you fix your issue?

Do you have any better solutions or suggestions? Please sound off your comments below.


