Reducing memory allocations from 7.5GB to 32KB
Codeweavers is a financial services software company; part of what we do is enable our customers to bulk import their data into our platform. For our services we require up-to-date information from all our clients, which include lenders and manufacturers across the UK. Each of those imports can contain several hundred megabytes of uncompressed data, and imports will often happen on a daily basis.
This data is then used to power our real-time calculations. Currently this import process has to take place outside of business hours because of the impact it has on memory usage.
In this article we will explore potential optimisations to the import process, specifically in the context of reducing memory allocations. If you want to have a go yourself, you can use this code to generate a sample input file and you can find all of the code talked about here.
The current implementation uses StreamReader and passes each line to the lineParser.
using (StreamReader reader = File.OpenText(@"..\..\example-input.csv"))
{
    try
    {
        while (reader.EndOfStream == false)
        {
            lineParser.ParseLine(reader.ReadLine());
        }
    }
    catch (Exception exception)
    {
        throw new Exception("File could not be parsed", exception);
    }
}
The most naïve implementation of a line parser that we originally had looked something like this:-
public sealed class LineParserV01 : ILineParser
{
    public void ParseLine(string line)
    {
        var parts = line.Split(',');

        if (parts[0] == "MNO")
        {
            var valueHolder = new ValueHolder(line);
        }
    }
}
The ValueHolder class is used later on in the import process to insert information into the database:-
public class ValueHolder
{
    public int ElementId { get; }
    public int VehicleId { get; }
    public int Term { get; }
    public int Mileage { get; }
    public decimal Value { get; }

    public ValueHolder(string line)
    {
        var parts = line.Split(',');

        ElementId = int.Parse(parts[1]);
        VehicleId = int.Parse(parts[2]);
        Term = int.Parse(parts[3]);
        Mileage = int.Parse(parts[4]);
        Value = decimal.Parse(parts[5]);
    }
}
Running this example as a command line application and enabling monitoring:-
public static void Main(string[] args)
{
    AppDomain.MonitoringIsEnabled = true;

    // do the parsing

    Console.WriteLine($"Took: {AppDomain.CurrentDomain.MonitoringTotalProcessorTime.TotalMilliseconds:#,###} ms");
    Console.WriteLine($"Allocated: {AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize / 1024:#,#} kb");
    Console.WriteLine($"Peak Working Set: {Process.GetCurrentProcess().PeakWorkingSet64 / 1024:#,#} kb");

    for (int index = 0; index <= GC.MaxGeneration; index++)
    {
        Console.WriteLine($"Gen {index} collections: {GC.CollectionCount(index)}");
    }
}
Our main goal today is to reduce allocated memory. In short, the less memory we allocate, the less work the garbage collector has to do. There are three generations that the garbage collector operates against; we will be monitoring those as well. Garbage collection is a complex topic and outside of the scope of this article, but a good rule of thumb is that short-lived objects should never be promoted past generation 0.
We can see V01 has the following statistics:-
Took: 8,750 ms
Allocated: 7,412,303 kb
Peak Working Set: 16,720 kb
Gen 0 collections: 1809
Gen 1 collections: 0
Gen 2 collections: 0
Almost 7.5 GB of memory allocations to parse a three hundred megabyte file is less than ideal. Now that we have established the baseline, let us find some easy wins…
Eagle-eyed readers will have spotted that we call string.Split(',') twice; once in the line parser and again in the constructor of ValueHolder. This is wasteful; we can overload the constructor of ValueHolder to accept a string[] array and split the line once in the parser.
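A minimal sketch of what that overloaded constructor might look like (the exact V02 code lives in the linked repository; this version simply assumes the same fields shown earlier):

public ValueHolder(string[] parts)
{
    // The caller has already split the line, so we only parse each section here.
    ElementId = int.Parse(parts[1]);
    VehicleId = int.Parse(parts[2]);
    Term = int.Parse(parts[3]);
    Mileage = int.Parse(parts[4]);
    Value = decimal.Parse(parts[5]);
}

The line parser then passes the result of line.Split(',') straight into the constructor, so each line is only split once.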
After that simple change the statistics for V02 are now:-
Took: 6,922 ms
Allocated: 4,288,289 kb
Peak Working Set: 16,716 kb
Gen 0 collections: 1046
Gen 1 collections: 0
Gen 2 collections: 0
Great! We are down from 7.5GB to 4.2GB. But that is still a lot of memory allocations for processing a three hundred megabyte file.
Quick analysis of the input file reveals that there are 10,047,435 lines in total; we are only interested in lines prefixed with MNO, of which there are 10,036,466. That means we are unnecessarily processing an additional 10,969 lines. A quick change in V03 to only parse lines prefixed with MNO:-
public sealed class LineParserV03 : ILineParser
{
    public void ParseLine(string line)
    {
        if (line.StartsWith("MNO"))
        {
            var valueHolder = new ValueHolder(line);
        }
    }
}
This means we defer splitting the entire line until we know it is a line we are interested in. Unfortunately this did not save us much memory, mainly because we are interested in 99.89% of the lines in the file. The statistics for V03:-
Took: 8,375 ms
Allocated: 4,284,873 kb
Peak Working Set: 16,744 kb
Gen 0 collections: 1046
Gen 1 collections: 0
Gen 2 collections: 0
It is time to break out the trusty profiler, in this case dotTrace:-
Strings in the .NET ecosystem are immutable, meaning that anything we do to a string always returns a brand new copy. Therefore calling string.Split(',') on every line (remember there are 10,036,466 lines we are interested in) returns that line split into several smaller strings. Each line has at minimum five sections we want to process, which means over the lifetime of the import process we create at least 50,182,330 strings!
Next we will explore what we can do to eliminate the use of string.Split(',').
A typical line we are interested in looks something like this:-
MNO,3,813496,36,30000,78.19,,
Calling string.Split(',') on the above line will return a string[] containing:-
'MNO'
'3'
'813496'
'36'
'30000'
'78.19'
''
''
Now at this point we can make some guarantees about the file we are importing (for example, MNO is always the first section). Guarantees established, we can now build a short-lived index of the positions of all the commas for a given line:-
private List<int> FindCommasInLine(string line)
{
    var list = new List<int>();

    for (var index = 0; index < line.Length; index++)
    {
        if (line[index] == ',')
        {
            list.Add(index);
        }
    }
    return list;
}
Once we know the position of each comma, we can directly access the section we care about and manually parse that section.
private decimal ParseSectionAsDecimal(int start, int end, string line)
{
    var sb = new StringBuilder();

    for (var index = start; index < end; index++)
    {
        sb.Append(line[index]);
    }
    return decimal.Parse(sb.ToString());
}

private int ParseSectionAsInt(int start, int end, string line)
{
    var sb = new StringBuilder();

    for (var index = start; index < end; index++)
    {
        sb.Append(line[index]);
    }
    return int.Parse(sb.ToString());
}
Putting it all together:-
public void ParseLine(string line)
{
    if (line.StartsWith("MNO"))
    {
        var findCommasInLine = FindCommasInLine(line);

        var elementId = ParseSectionAsInt(findCommasInLine[0] + 1, findCommasInLine[1], line);     // equal to parts[1] - element id
        var vehicleId = ParseSectionAsInt(findCommasInLine[1] + 1, findCommasInLine[2], line);     // equal to parts[2] - vehicle id
        var term = ParseSectionAsInt(findCommasInLine[2] + 1, findCommasInLine[3], line);          // equal to parts[3] - term
        var mileage = ParseSectionAsInt(findCommasInLine[3] + 1, findCommasInLine[4], line);       // equal to parts[4] - mileage
        var value = ParseSectionAsDecimal(findCommasInLine[4] + 1, findCommasInLine[5], line);     // equal to parts[5] - value

        var valueHolder = new ValueHolder(elementId, vehicleId, term, mileage, value);
    }
}
Running V04 reveals these statistics:-
Took: 9,813 ms
Allocated: 6,727,664 kb
Peak Working Set: 16,872 kb
Gen 0 collections: 1642
Gen 1 collections: 0
Gen 2 collections: 0
Whoops, that is worse than expected. It is an easy mistake to make but dotTrace can help us here…
Constructing a StringBuilder for every section in every line is incredibly expensive. Luckily it is a quick fix: we construct a single StringBuilder when V05 is constructed and clear it before each usage.
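A minimal sketch of that change, assuming a private field on the parser (the field name is illustrative; the actual V05 code is in the linked repository):

// One StringBuilder, created when the parser is constructed, reused for every section.
private readonly StringBuilder _stringBuilder = new StringBuilder();

private int ParseSectionAsInt(int start, int end, string line)
{
    _stringBuilder.Clear();

    for (var index = start; index < end; index++)
    {
        _stringBuilder.Append(line[index]);
    }
    return int.Parse(_stringBuilder.ToString());
}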
V05 now has the following statistics:-
Took: 9,125 ms
Allocated: 3,199,195 kb
Peak Working Set: 16,636 kb
Gen 0 collections: 781
Gen 1 collections: 0
Gen 2 collections: 0
Phew, we are back on the downward trend. We started at 7.5GB and now we are down to 3.2GB.
At this point dotTrace becomes an essential part of the optimisation process. Looking at the V05 dotTrace output:-
Building the short-lived index of comma positions is expensive, as underneath any List<T> is just a standard T[] array, and the framework takes care of re-sizing the underlying array when elements are added. This is useful and very handy in typical scenarios. However, we know that there are six sections we need to process (but we are only interested in five of those sections), ergo there are at least seven commas we want indexes for. We can optimise for that:-
private int[] FindCommasInLine(string line)
{
    var nums = new int[7];
    var counter = 0;

    for (var index = 0; index < line.Length; index++)
    {
        if (line[index] == ',')
        {
            nums[counter++] = index;
        }
    }
    return nums;
}
V06 statistics:-
Took: 8,047 ms
Allocated: 2,650,318 kb
Peak Working Set: 16,560 kb
Gen 0 collections: 647
Gen 1 collections: 0
Gen 2 collections: 0
2.6GB is pretty good, but what happens if we force the compiler to use byte for this method instead of letting it default to int:-
private byte[] FindCommasInLine(string line)
{
    byte[] nums = new byte[7];
    byte counter = 0;

    for (byte index = 0; index < line.Length; index++)
    {
        if (line[index] == ',')
        {
            nums[counter++] = index;
        }
    }
    return nums;
}
Re-running V06:-
Took: 8,078 ms
Allocated: 2,454,297 kb
Peak Working Set: 16,548 kb
Gen 0 collections: 599
Gen 1 collections: 0
Gen 2 collections: 0
2.6GB was pretty good, 2.4GB is even better. This is because an int occupies four bytes whereas a byte occupies only one, so the seven-element index array we allocate for every line is considerably smaller. Note that indexing with a byte only works because every line we care about is shorter than 256 characters; otherwise the byte index would overflow.
V06 now has a byte[] array that holds the index of each comma for each line. It is a short-lived array, but it is created many times. We can eliminate the cost of creating a new byte[] for each line by using a recent addition to the .NET ecosystem: System.Buffers. Adam Sitnik has a great breakdown on using it and why you should. The important thing to remember when using ArrayPool<T>.Shared is that you must always return the rented buffer after you are done using it, otherwise you will introduce a memory leak into your application.
This is what V07 looks like:-
public void ParseLine(string line)
{
    if (line.StartsWith("MNO"))
    {
        var tempBuffer = _arrayPool.Rent(7);

        try
        {
            var findCommasInLine = FindCommasInLine(line, tempBuffer);
            // truncated for brevity
        }
        finally
        {
            _arrayPool.Return(tempBuffer, true);
        }
    }
}

private byte[] FindCommasInLine(string line, byte[] nums)
{
    byte counter = 0;

    for (byte index = 0; index < line.Length; index++)
    {
        if (line[index] == ',')
        {
            nums[counter++] = index;
        }
    }
    return nums;
}
And V07 has the following statistics:-
Took: 8,891 ms
Allocated: 2,258,272 kb
Peak Working Set: 16,752 kb
Gen 0 collections: 551
Gen 1 collections: 0
Gen 2 collections: 0
Down to 2.2GB, having started at 7.5GB. It is pretty good, but we are not done yet.
Profiling V07 reveals the next problem:-
Calling StringBuilder.ToString() inside of the decimal and int parsers is incredibly expensive. It is time to deprecate StringBuilder and write our own[1] int and decimal parsers without relying on strings or calling int.Parse() / decimal.Parse(). According to the profiler this should shave off around 1GB.
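The real V08 parsers are in the linked repository; a rough sketch of the idea for the int case, accumulating digits directly from the line's characters, might look something like this (a hypothetical illustration, not the author's exact implementation):

private int ParseSectionAsInt(int start, int end, string line)
{
    var value = 0;

    for (var index = start; index < end; index++)
    {
        // Shift the accumulated value one decimal place and add the next digit,
        // avoiding any intermediate string or StringBuilder allocations.
        value = (value * 10) + (line[index] - '0');
    }
    return value;
}

The decimal parser follows the same pattern, additionally tracking the position of the decimal point as it goes.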
After writing our own int and decimal parsers, V08 now clocks in at:-
Took: 6,047 ms
Allocated: 1,160,856 kb
Peak Working Set: 16,816 kb
Gen 0 collections: 283
Gen 1 collections: 0
Gen 2 collections: 0
1.1GB is a huge improvement from where we were last (2.2GB) and even better than the baseline (7.5GB).
[1] Code can be found here.
Until V08 our strategy had been to find the index of every comma on each line and then use that information to create a sub-string, which is then parsed by calling int.Parse() / decimal.Parse(). V08 deprecates the use of sub-strings but still uses the short-lived index of comma positions.
An alternative strategy would be to skip to the section we are interested in by counting the number of preceding commas, then parse anything after the required number of commas, and return when we hit the next comma. We have previously established guarantees about the file format (for example, MNO is always the first section). This also means we can deprecate the rented byte[] array because we are no longer building a short-lived index:-
public sealed class LineParserV09 : ILineParser
{
    public void ParseLine(string line)
    {
        if (line.StartsWith("MNO"))
        {
            int elementId = ParseSectionAsInt(line, 1);     // equal to parts[1] - element id
            int vehicleId = ParseSectionAsInt(line, 2);     // equal to parts[2] - vehicle id
            int term = ParseSectionAsInt(line, 3);          // equal to parts[3] - term
            int mileage = ParseSectionAsInt(line, 4);       // equal to parts[4] - mileage
            decimal value = ParseSectionAsDecimal(line, 5); // equal to parts[5] - value

            var valueHolder = new ValueHolder(elementId, vehicleId, term, mileage, value);
        }
    }
}
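The ParseSectionAsInt(line, section) helper is not shown above; a rough sketch of the comma-counting approach might look like this (again hypothetical, the real implementation is in the linked repository):

private static int ParseSectionAsInt(string line, int section)
{
    var commasSeen = 0;
    var value = 0;

    for (var index = 0; index < line.Length; index++)
    {
        var character = line[index];

        if (character == ',')
        {
            // The comma that closes our section means we are done.
            if (commasSeen == section)
            {
                break;
            }
            commasSeen++;
            continue;
        }

        // Only accumulate digits once we are inside the section we want.
        if (commasSeen == section)
        {
            value = (value * 10) + (character - '0');
        }
    }
    return value;
}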
Unfortunately V09 does not save us any memory; it does however reduce the time taken:-
Took: 5,703 ms
Allocated: 1,160,856 kb
Peak Working Set: 16,572 kb
Gen 0 collections: 283
Gen 1 collections: 0
Gen 2 collections: 0
Another benefit of V09 is that it reads much closer to the original implementation.
This blog post is not going to cover the difference or the pros/cons of classes vs structs. That topic has been covered many times.
In this particular context, it is beneficial to use a struct.
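A minimal sketch of ValueHolder as a struct, assuming the same fields as before (whether the original code used the readonly modifier is an assumption):

public readonly struct ValueHolder
{
    public int ElementId { get; }
    public int VehicleId { get; }
    public int Term { get; }
    public int Mileage { get; }
    public decimal Value { get; }

    public ValueHolder(int elementId, int vehicleId, int term, int mileage, decimal value)
    {
        ElementId = elementId;
        VehicleId = vehicleId;
        Term = term;
        Mileage = mileage;
        Value = value;
    }
}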
Changing ValueHolder to a struct in V10 has the following statistics:-
Took: 5,594 ms
Allocated: 768,803 kb
Peak Working Set: 16,512 kb
Gen 0 collections: 187
Gen 1 collections: 0
Gen 2 collections: 0
Finally, we are below the 1GB barrier. Also, a word of warning: please do not use a struct blindly; always test your code and make sure the use case is correct.
As of V10 the line parser itself is virtually allocation free. dotTrace reveals where the remaining allocations occur:-
Well, this is awkward: the framework itself is costing us memory allocations. We can interact with the file at a lower level than a StreamReader:-
private static void ViaRawStream(ILineParser lineParser)
{
    var sb = new StringBuilder();

    using (var reader = File.OpenRead(@"..\..\example-input.csv"))
    {
        try
        {
            bool endOfFile = false;

            while (reader.CanRead)
            {
                sb.Clear();

                while (endOfFile == false)
                {
                    var readByte = reader.ReadByte();

                    // -1 means end of file
                    if (readByte == -1)
                    {
                        endOfFile = true;
                        break;
                    }

                    var character = (char)readByte;

                    // this means the line is about to end so we skip
                    if (character == '\r')
                    {
                        continue;
                    }

                    // this line has ended
                    if (character == '\n')
                    {
                        break;
                    }

                    sb.Append(character);
                }

                if (endOfFile)
                {
                    break;
                }

                var buffer = new char[sb.Length];

                for (int index = 0; index < sb.Length; index++)
                {
                    buffer[index] = sb[index];
                }

                lineParser.ParseLine(buffer);
            }
        }
        catch (Exception exception)
        {
            throw new Exception("File could not be parsed", exception);
        }
    }
}
V11 statistics:-
Took: 5,594 ms
Allocated: 695,545 kb
Peak Working Set: 16,452 kb
Gen 0 collections: 169
Gen 1 collections: 0
Gen 2 collections: 0
Well, 695MB is still better than 768MB. Okay, that was not the improvement I was expecting (and rather anti-climactic). Until we remember that we have seen and solved this problem before: in V07 we used ArrayPool<T>.Shared to avoid allocating lots of small byte[] arrays. We can do the same here:-
private static void ViaRawStream(ILineParser lineParser)
{
    var sb = new StringBuilder();
    var charPool = ArrayPool<char>.Shared;

    using (var reader = File.OpenRead(@"..\..\example-input.csv"))
    {
        try
        {
            bool endOfFile = false;

            while (reader.CanRead)
            {
                // truncated for brevity

                char[] rentedCharBuffer = charPool.Rent(sb.Length);

                try
                {
                    for (int index = 0; index < sb.Length; index++)
                    {
                        rentedCharBuffer[index] = sb[index];
                    }

                    lineParser.ParseLine(rentedCharBuffer);
                }
                finally
                {
                    charPool.Return(rentedCharBuffer, true);
                }
            }
        }
        catch (Exception exception)
        {
            throw new Exception("File could not be parsed", exception);
        }
    }
}
The final version of V11 has the following statistics:-
Took: 6,781 ms
Allocated: 32 kb
Peak Working Set: 12,620 kb
Gen 0 collections: 0
Gen 1 collections: 0
Gen 2 collections: 0
Yes, only 32kb of memory allocations. That is the climax I was looking for.
| Version | Took (ms) | Allocated (kb) | Peak Working Set (kb) | Gen 0 Collections |
|---|---|---|---|---|
| 01 | 8,750 | 7,412,303 | 16,720 | 1,809 |
| 02 | 6,922 | 4,288,289 | 16,716 | 1,046 |
| 03 | 8,375 | 4,284,873 | 16,744 | 1,046 |
| 04 | 9,813 | 6,727,664 | 16,872 | 1,642 |
| 05 | 8,125 | 3,199,195 | 16,636 | 781 |
| 06 | 8,078 | 2,454,297 | 16,548 | 599 |
| 07 | 8,891 | 2,258,272 | 16,752 | 551 |
| 08 | 6,047 | 1,160,856 | 16,816 | 283 |
| 09 | 5,703 | 1,160,856 | 16,572 | 283 |
| 10 | 5,594 | 768,803 | 16,512 | 187 |
| 11 | 6,781 | 32 | 12,620 | 0 |