In the previous post we took an in-depth look at using list in C#. In addition to that, let’s take a look at real examples of how to use lists from the performance point of view. How does using a class or structure in lists affect the speed? Which loop is faster for
or foreach
? How many times faster is it to use the AddRange
method instead of the Add
method in a loop? Answers to these and other performance questions will be discussed in this post (in examples and charts). To compare performance in our examples we will use the BenchmarkDotNet nuget package. Well, let’s go further into the subtleties of working with lists in C#.
Initialize C# List with Capacity
If you know the number of elements that will be added to the list in advance, it’s more efficient to initialize the list with that capacity. The Capacity
property of a List<T>
is quite interesting. It represents the number of elements that the List<T>
can hold before it needs to resize its internal array.
When you first initialize a List<T>
without specifying any capacity, the default capacity is set, which is typically small. As you add elements to the list, and once the number of elements equals the current capacity, the list automatically resizes its internal array to accommodate more elements. This resizing operation involves allocating a new array (usually double the current size) and copying all elements from the old array to the new one. This process is computationally expensive, especially for large lists.
The Capacity
property is crucial for optimizing performance when dealing with large collections. If you know the number of elements you plan to store in a list in advance, you can set the capacity accordingly using the List<T>(int capacity)
constructor. This pre-allocation helps avoid frequent resizing, leading to better performance.
Now let’s measure with BenchmarkDotNet the speed of working with a list with and without pre-initialization of the list capacity.
public class BenchmarkListCapacity
{
[Params(100, 5000, 20000)]
public int count;
[Benchmark]
public void AddToList()
{
List<int> list = new List<int>();
for (int i = 0; i < count; i++)
{
list.Add(i);
}
}
[Benchmark]
public void AddToListWithCapacity()
{
List<int> list = new List<int>(count);
for (int i = 0; i < count; i++)
{
list.Add(i);
}
}
}
Let’s see the result of our measurements.
As you can see from the diagram, setting the capacity on average decrease the time of adding items to the list by from 20 to 55 percent, regard of the list size.
Remember, the Capacity
of a List<T>
is different from its Count
property. Count
indicates the actual number of elements in the list, whereas Capacity
is about the potential number of elements it can hold before resizing.
Use AddRange() Instead of Multiple Add() Calls
In C#, using the AddRange()
method when adding to a List<T>
instead of multiple Add()
calls can significantly improve performance. Especially when you are working with large datasets. The AddRange()
method is designed to add an entire collection of items to the end of a List, which is more efficient compared to adding items one at a time using Add()
.
Here’s why AddRange()
is a better choice:
- Reduced Overhead: Each call to
Add()
might trigger a check for the capacity of the List and potentially resize the underlying array if the capacity is exceeded.AddRange()
, however, calculates the required capacity once and resizes the array only if necessary, reducing the overhead of multiple capacity checks and resizes. - Better Memory Management:
AddRange()
can allocate the exact memory needed for the entire collection in one operation, which is more efficient than potentially multiple memory allocations that might occur with successiveAdd()
calls. - Improved Performance: Because
AddRange()
minimizes the number of resize operations and memory allocations, it generally offers better performance, especially when adding a large number of elements. - Cleaner Code: Using
AddRange()
leads to more concise and readable code. Instead of multiple lines ofAdd()
calls, you have a single line that clearly expresses the intent of adding a collection of items to the list.
Let’s now measure the speed of working with a list using the Add()
and AddRange()
methods.
public class BenchmarkListAddRange
{
private List<int> _items;
[Params(5000, 20000)]
public int count;
[GlobalSetup]
public void Setup()
{
_items = Enumerable.Range(1, count).ToList();
}
[Benchmark]
public void AddToList()
{
List<int> list = new List<int>();
for (int i = 0; i < _items.Count; i++)
{
list.Add(i);
}
}
[Benchmark]
public void AddRangeToList()
{
List<int> list = new List<int>();
list.AddRange(_items);
}
}
Let’s see the result of working with a list using the Add()
and AddRange()
methods.
As you can see from the diagram, using AddRange()
instead of multiple Add()
calls can on average decrease time of execution by 90 percent.
Consider Custom Data Structures for Specialized Needs
Using structures instead of classes in C# lists can have a positive impact on performance, especially when working with large collections of small data of value type. In C#, structures are value types and classes are reference types. This distinction has several implications:
- Memory Allocation: Structs are stored in the stack, which usually results in faster allocation and deallocation. Meanwhile, classes are stored in the heap, which may take longer due to the added task of garbage collection.
- Value Semantics: Structures are passed by value, which creates a copy when they are assigned to other variables or passed to methods. This approach can be more efficient for small data structures, as it avoids the overhead of creating and managing references.
- Performance: For small, immutable types, structs can improve performance by reducing heap allocation and garbage collection. This is particularly important for large collections.
However, there may be situations where a struct is not the optimal option:
- If the struct becomes large (the general recommendation is that the struct should be less than 16 byte), the overhead of the copy process can exceed the benefits of the value-type semantics.
- Structs do not have support for inheritance, which can be a limitation in certain design scenarios.
- Manipulating structs within a collection, such as a list, can lead to unexpected errors and glitches because the struct copy is likely to be used instead of the actual one.
Let’s test the efficiency of utilizing a list with struct and class:
public class BenchmarkListStructAndClass
{
[Params(5000, 20000)]
public int count;
[Benchmark]
public void AddStructToList()
{
List<ItemStruct> list = new List<ItemStruct>();
for (int i = 0; i < count; i++)
{
list.Add(new ItemStruct());
}
}
[Benchmark]
public void AddStructToListWithCapacity()
{
List<ItemStruct> list = new List<ItemStruct>(count);
for (int i = 0; i < count; i++)
{
list.Add(new ItemStruct());
}
}
[Benchmark]
public void AddClassToList()
{
List<ItemClass> list = new List<ItemClass>();
for (int i = 0; i < count; i++)
{
list.Add(new ItemClass());
}
}
[Benchmark]
public void AddClassToListWithCapacity()
{
List<ItemClass> list = new List<ItemClass>(count);
for (int i = 0; i < count; i++)
{
list.Add(new ItemClass());
}
}
}
Let’s see what we’ve got.
And diagram of result:
Prefer Count Property Over Any() for Checking Emptiness
In C# development, especially when dealing with collections like list, the choice between using the Count
property and the Any()
method for checking if a list is empty can significantly impact performance and readability. This choice is particularly relevant in performance-critical applications.
The Count
property, available on List<T>
and other collection types, directly returns the number of elements in the collection. When you need to check if a list is empty, using Count
is straightforward and efficient. For instance, if (myList.Count == 0)
is a clear and direct way to check for an empty list. This approach is extremely fast because Count
is a property that holds the current count of the list, and accessing it is a matter of reading this value.
On the other hand, the Any()
method, provided by LINQ (Language Integrated Query), is a more general-purpose tool. It’s designed to work with any IEnumerable<T>
, not just lists, and it determines whether any elements of a sequence satisfy a condition (or just any elements exist in the case of an empty predicate). When you call myList.Any()
, it starts iterating over the list to check if at least one element exists. While this is not a significant performance hit for small or moderate-sized lists, it can become more noticeable with larger collections or in performance-critical code paths, as it involves more overhead than simply reading a property.
Prefer myList.Count == 0
over myList.Any()
for checking the emptiness of a list in C#. This choice leads to more performant code, particularly in scenarios where this check is performed frequently or on large collections. It also enhances code readability by explicitly stating the intention to check for an empty collection.
Let’s see how fast Count
and Any()
work:
public class BenchmarkListAnyAndCount
{
private List<int> _items;
[Params(20000)]
public int count;
[GlobalSetup]
public void Setup()
{
_items = Enumerable.Range(1, 10000).ToList();
}
[Benchmark]
public void CheckEmptyByCount()
{
for (int i = 0; i < count; i++)
{
if (_items.Count > 0)
{
}
}
}
[Benchmark]
public void CheckEmptyByAny()
{
for (int i = 0; i < count; i++)
{
if (_items.Any())
{
}
}
}
}
Result of our test:
As you can see Count
works many times faster compared to Any()
.
Use foreach Loop for Iteration
Using a foreach
loop in C# is often a preferred choice over a for
loop when iterating over collections like list, especially when you don’t require the index of the current element. The foreach
loop stands out for its readability and simplicity. It automatically iterates through each element in the collection, eliminating the need for manual index tracking and bounds checking.
Here’s why choosing foreach
can be beneficial:
- Readability: The syntax of
foreach
is straightforward and self-explanatory. It clearly indicates that you’re iterating over each element in a collection, making the code easier to understand. - Safety: Since
foreach
handles the iteration process internally, it reduces the risk of errors like off-by-one mistakes or index out of bounds exceptions. - Efficiency: In some cases,
foreach
can be more efficient than afor
loop because it directly accesses the elements in the collection without needing to use an index to retrieve them. - Compatibility with Various Collections:
foreach
can iterate over any collection that implements theIEnumerable
orIEnumerable<T>
interface, making it versatile for use with arrays, lists, dictionaries, and other collection types.
Remember, foreach
is best used when you don’t need to modify the collection or require the index of each element during iteration. In scenarios where you need to modify the collection or the indexes are essential, a for
loop or other iteration methods might be more appropriate.
Let’s take a look at how work loop foreach
and for
:
public class BenchmarkListForeachAndFor
{
private List<ItemClass> _items;
[Params(2000000)]
public int count;
[GlobalSetup]
public void Setup()
{
_items = Enumerable.Range(1, count)
.Select(x => new ItemClass { ItemId = x })
.ToList();
}
[Benchmark]
public void CheckForeach()
{
int sum = 0;
foreach (var item in _items)
{
sum += item.ItemId;
}
}
[Benchmark]
public void CheckFor()
{
int sum = 0;
for (int i = 0; i < _items.Count; i++)
{
sum += _items[i].ItemId;
}
}
}
Binary Search for Sorted Lists
Using the BinarySearch
method with a C# List
is a powerful approach for optimized searching, especially when dealing with large, sorted collections. This method, inherent to the List<T>
class in C#, operates on the principle of the binary search algorithm, offering a significant performance advantage over linear search methods.
Here’s a breakdown of how to use BinarySearch
effectively:
- Precondition – Sorted List: Ensure your
List<T>
is sorted.BinarySearch
works only on sorted collections. If the list isn’t sorted, the results are undefined. UseList<T>.Sort()
method to sort the list if necessary. - Implementing BinarySearch: Call
BinarySearch(T item)
on yourList<T>
instance, whereT
is the type of elements in the list. This method returns the zero-based index of the item in the sorted list, if the item is found; otherwise, a negative number, which is the bitwise complement of the index of the next element that is larger or, if there is no larger element, the bitwise complement ofCount
. - Handling the Return Value: If the return value is non-negative, it denotes the index of the item in the list. If it’s negative, the item isn’t in the list. You can compute the insertion point (if needed) using
~returnValue
. - Performance Aspect:
BinarySearch
has a time complexity of O(log n), making it much faster for large datasets compared to O(n) for a linear search.
Let’s see the difference in performance between a normal search and using the BinarySearch
method:
public class BenchmarkListBinarySearch
{
private List<int> _items;
private int _findValue;
[Params(50000)]
public int count;
[GlobalSetup]
public void Setup()
{
_items = Enumerable.Range(1, count).ToList();
var random = new Random();
_findValue = random.Next(count);
}
[Benchmark]
public void CheckBinarySearch()
{
int index = _items.BinarySearch(_findValue);
}
[Benchmark]
public void CheckFirst()
{
int index = _items.First(x => x == _findValue);
}
}
As you can see from the result, searching a sorted list with BinarySearch
is many times more efficient.
The diagram shows that search with BinarySearch
is more than 100 times more efficient than with LINQ method First()
.
Conclusion
The right optimization strategy depends on the specific use case. Be sure to profile and test your application to ensure that these optimizations are beneficial in your particular scenario.
I’ve been working tirelessly on a comprehensive GitHub project that’s packed with an array of BenchmarkDotNet examples specifically focusing on List operations. These examples are designed to provide you with real insights and practical knowledge that can elevate your coding skills to the next level.
I’ve been working tirelessly on a comprehensive GitHub project that’s packed with an array of BenchmarkDotNet examples specifically focusing on List operations. These examples are designed to provide you with real insights and practical knowledge that can elevate your coding skills to the next level.
I’ve just released this GitHub project on my Telegram Channel! Don’t miss out on this opportunity to boost your .NET prowess.
π Head over to my Telegram channel, download the project, and start exploring today.
π Go to Telegram Channel
Let’s dive into the world of high-performance .NET together! π
Great job explaining the differences between Add() and AddRange(). The examples made it super clear.