BenchmarkDotNet Best Practices: Performance Testing in .NET

For beginner .NET and C# developers, one of the key skills is learning how to measure the performance of your code accurately. Without proper measurements, it’s hard to identify bottlenecks or to know how effective an optimization really is. One of the most popular tools for performance testing in .NET is BenchmarkDotNet. In this post, we’ll dive into the key aspects of working with BenchmarkDotNet, from configuring warmup and iterations to comparing benchmark results.

Setting Up Warmup and Iterations

The Importance of Warmup

Warmup refers to executing your code several times before starting to measure its performance. In the context of .NET, this is especially important for several reasons:

  • JIT Compilation: .NET uses a Just-In-Time (JIT) compiler that compiles Intermediate Language (IL) code into machine code upon the first method execution. This process takes time and can significantly skew initial measurements.
  • Initialization: The first execution of code may involve initializing static fields, loading assemblies, or other resources.
  • Caching: Data and instructions may be loaded into the CPU cache, speeding up subsequent executions.

Example of JIT Compilation Impact:

Let’s consider a simple method that calculates the sum of numbers from 1 to 1000.

using BenchmarkDotNet.Attributes;

public class SumBenchmark
{
    [Benchmark]
    public int CalculateSum()
    {
        int sum = 0;
        for (int i = 1; i <= 1000; i++)
        {
            sum += i;
        }
        return sum;
    }
}

If we run this benchmark without warmup, the initial measurements will include the time taken for JIT compilation, which can significantly increase the method’s overall execution time.

Configuring Warmup in BenchmarkDotNet:

BenchmarkDotNet performs warmup iterations by default, but you can customize their number using attributes.

Example of Setting Up Warmup:

[SimpleJob(warmupCount: 5, iterationCount: 10)] // 5 warmup iterations, 10 measured iterations
public class SumBenchmark
{
    [Benchmark]
    public int CalculateSum()
    {
        int sum = 0;
        for (int i = 1; i <= 1000; i++)
        {
            sum += i;
        }
        return sum;
    }
}

After running the benchmark with warmup, you’ll notice that the execution time stabilizes after a few iterations, providing more accurate and reproducible results.
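
To actually execute a benchmark class, run it through BenchmarkRunner from a console application’s entry point. A minimal sketch, assuming the SumBenchmark class above lives in the same project:

using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Spawns a Release-mode host process, performs the warmup and
        // measured iterations, and prints the summary table.
        BenchmarkRunner.Run<SumBenchmark>();
    }
}

Always build in Release: BenchmarkDotNet warns about Debug builds because JIT optimizations are disabled there, which invalidates the numbers.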

Choosing the Right Number of Iterations

The number of iterations (both warmup and target) affects the accuracy and reliability of your results.

Factors Influencing the Number of Iterations:

  • Method Execution Time: Very fast methods need more iterations (and more invocations per iteration) before the timer’s resolution stops dominating the result.
  • Result Variability: If measurements fluctuate significantly, increasing iterations can help.
  • Available Time: If benchmarks take too long, you might need to reduce iterations, potentially impacting accuracy.

Example of Configuring Iterations for a Fast Method:

[SimpleJob(warmupCount: 10, iterationCount: 20, invocationCount: 1024)]
public class FastMethodBenchmark
{
    private int x = 1;

    [Benchmark]
    public int FastMethod()
    {
        // Extremely fast code; returning the value keeps the JIT from
        // eliminating it as dead code (see the next section). Note that
        // invocationCount must be a multiple of the unroll factor
        // (16 by default), which is why 1024 is used rather than 1000.
        return x + 1;
    }
}

Explanation:

  • invocationCount: 1024 specifies that the method will be invoked 1024 times in each iteration, increasing the total measured time and improving accuracy for very fast methods. The value must be a multiple of the unroll factor (16 by default).

Analyzing the Results:

  • Increasing invocationCount helps achieve more stable results by measuring the cumulative execution time of many calls.
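
The same settings can also be expressed programmatically through the fluent Job API instead of attributes. A minimal sketch, assuming the FastMethodBenchmark class above:

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main()
    {
        // Builds a config equivalent to the SimpleJob attribute above.
        var config = ManualConfig.Create(DefaultConfig.Instance)
            .AddJob(Job.Default
                .WithWarmupCount(10)
                .WithIterationCount(20)
                .WithInvocationCount(1024)
                .WithUnrollFactor(16));

        BenchmarkRunner.Run<FastMethodBenchmark>(config);
    }
}

The fluent form is handy when the iteration counts need to come from configuration or command-line arguments rather than compile-time constants.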

Accurate Measurements

Avoiding Side Effects

Side effects can skew your benchmark results. These include:

  • Input/Output (I/O): Accessing the file system, network, or database.
  • Global State: Modifying static or global variables.
  • External Dependencies: Using external services or systems.

Why This Matters:

  • Unpredictability: I/O operation times can vary greatly.
  • System Load: Side effects may consume resources, affecting other measurements.

Example of a Benchmark with Side Effects:

[Benchmark]
public void WriteToDisk()
{
    File.WriteAllText("temp.txt", "Benchmarking is fun!");
}

Issues:

  • Disk access times are unpredictable and can vary based on system state.
  • There may be conflicts when multiple benchmarks write to the same file.

How to Avoid Side Effects:

  • Emulate Operations: Instead of performing actual I/O, simulate the work.

Corrected Example:

[Benchmark]
public int SimulateWrite()
{
    var data = "Benchmarking is fun!";
    // Simulate data processing; returning the value keeps the
    // compiler from optimizing the work away (see the next section).
    return data.GetHashCode();
}

Analyzing the Results:

  • Eliminating I/O leads to more stable and reproducible measurements.
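
When expensive preparation is genuinely required (building test data, reading a file), move it into a [GlobalSetup] method so it runs once, outside the measured region. A minimal sketch, with an arbitrary payload size:

using BenchmarkDotNet.Attributes;

public class ParsingBenchmark
{
    private string payload;

    [GlobalSetup]
    public void Setup()
    {
        // Runs once before any measured iteration, so its cost
        // is excluded from the results.
        payload = new string('x', 100_000);
    }

    [Benchmark]
    public int ProcessPayload()
    {
        // Only this work is measured.
        return payload.GetHashCode();
    }
}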

Avoiding Dead Code Elimination

Compilers and the JIT may remove or optimize code that doesn’t affect the program’s observable behavior. This can result in your benchmark not measuring what you intended.

Example of Dead Code Elimination Issue:

[Benchmark]
public void DoWork()
{
    int result = 0;
    for (int i = 0; i < 1000; i++)
    {
        result += i;
    }
    // 'result' is not used anywhere
}

Problem:

  • The compiler may notice that ‘result’ is unused and remove the entire loop.

Solution:

  • Return the Result or use GC.KeepAlive.

Corrected Example (Returning the Result):

[Benchmark]
public int DoWork()
{
    int result = 0;
    for (int i = 0; i < 1000; i++)
    {
        result += i;
    }
    return result;
}

Or Using GC.KeepAlive:

[Benchmark]
public void DoWork()
{
    int result = 0;
    for (int i = 0; i < 1000; i++)
    {
        result += i;
    }
    // GC.KeepAlive takes an object, so the int is boxed here; the call
    // makes the value observable and defeats dead code elimination.
    GC.KeepAlive(result);
}

Analyzing the Results:

  • By returning the result or preserving it with GC.KeepAlive, we ensure the compiler doesn’t remove our code.

Using the Consumer Class from BenchmarkDotNet:

using BenchmarkDotNet.Engines;

public class DoWorkBenchmark
{
    // Consumer must be a field: benchmark methods cannot take arbitrary parameters.
    private readonly Consumer consumer = new Consumer();

    [Benchmark]
    public void DoWork()
    {
        int result = 0;
        for (int i = 0; i < 1000; i++)
        {
            result += i;
        }
        consumer.Consume(result);
    }
}

Explanation:

  • Consumer (from the BenchmarkDotNet.Engines namespace) is a helper class that signals to BenchmarkDotNet that the result is observed, preventing the optimization.
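
Note that when a benchmark method returns a value, as in the int-returning DoWork variant above, BenchmarkDotNet consumes the return value automatically, so an explicit Consumer is mainly useful for void methods or for sinking intermediate values inside a loop.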

Comparing Benchmarks

The Baseline Attribute

When comparing multiple methods, it’s important to have a reference point. The [Benchmark(Baseline = true)] attribute sets a method as the baseline for comparison.

Example of Comparing Sorting Methods:

public class SortingBenchmark
{
    private int[] data;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random();
        data = Enumerable.Range(1, 1000).OrderBy(_ => random.Next()).ToArray();
    }

    [Benchmark(Baseline = true)]
    public int[] ArraySort()
    {
        int[] copy = (int[])data.Clone();
        Array.Sort(copy);
        return copy;
    }

    [Benchmark]
    public int[] LinqOrderBy()
    {
        return data.OrderBy(x => x).ToArray();
    }

    [Benchmark]
    public int[] CustomSort()
    {
        int[] copy = (int[])data.Clone();
        // A hand-written quicksort, implemented below
        QuickSort(copy, 0, copy.Length - 1);
        return copy;
    }

    private void QuickSort(int[] array, int left, int right)
    {
        // Standard recursive quicksort (Lomuto partition scheme)
        if (left >= right) return;

        int pivot = array[right];
        int i = left - 1;
        for (int j = left; j < right; j++)
        {
            if (array[j] < pivot)
            {
                i++;
                (array[i], array[j]) = (array[j], array[i]);
            }
        }
        (array[i + 1], array[right]) = (array[right], array[i + 1]);

        QuickSort(array, left, i);
        QuickSort(array, i + 2, right);
    }
}

Analyzing the Results:

  • In the BenchmarkDotNet report, methods will be compared against the baseline method ArraySort.
  • You’ll see the relative performance ratio (Ratio) compared to the baseline.

Sample Report:

|     Method    |     Mean |    Error |   StdDev | Ratio |
|---------------|---------:|---------:|---------:|------:|
|    ArraySort  | 1.234 ms | 0.012 ms | 0.011 ms |  1.00 |
|  LinqOrderBy  | 1.567 ms | 0.015 ms | 0.014 ms |  1.27 |
|   CustomSort  | 1.890 ms | 0.018 ms | 0.017 ms |  1.53 |

Summary:

  • ArraySort is the fastest and serves as the baseline.
  • LinqOrderBy is 27% slower.
  • CustomSort is 53% slower.
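
Execution time is not the only axis worth comparing: adding the [MemoryDiagnoser] attribute extends the report with GC and allocation columns, which is telling here because LINQ’s OrderBy allocates intermediate buffers while Array.Sort works in place. A minimal sketch with a reduced copy of the benchmark above:

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // adds Gen0/Gen1/Gen2 collection counts and Allocated bytes
public class SortingAllocationBenchmark
{
    private int[] data = Enumerable.Range(1, 1000).Reverse().ToArray();

    [Benchmark(Baseline = true)]
    public int[] ArraySort()
    {
        int[] copy = (int[])data.Clone();
        Array.Sort(copy);
        return copy;
    }

    [Benchmark]
    public int[] LinqOrderBy() => data.OrderBy(x => x).ToArray();
}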

Noise and Statistical Significance

Noise refers to random variations in measurements caused by external factors:

  • Background processes
  • CPU frequency changes
  • Operating system activities

How to Reduce Noise:

  1. Use Configuration Attributes (see the sketch after this list):
[SimpleJob(RunStrategy.Monitoring, launchCount: 3, warmupCount: 5, iterationCount: 15)]
  • launchCount: 3 runs the benchmark in three separate processes and aggregates the results, averaging out per-process variance; RunStrategy.Monitoring skips the overhead-estimation stage and suits long-running macro-benchmarks.
  2. Pin the Processor Frequency:
  • Disable CPU frequency scaling features like Turbo Boost and power-saving modes in your BIOS or OS settings.
  3. Use Diagnostic Tools:
  • Add the [ThreadingDiagnoser] attribute to detect threading issues.
  • Use hardware counters and disassembly diagnosers for deeper analysis.
  4. Minimize System Load:
  • Close unnecessary applications and services during benchmarking.
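
Putting the attribute-based options together, a benchmark class hardened against noise might look like this minimal sketch (class and method names are placeholders):

using BenchmarkDotNet.Attributes;

[SimpleJob(launchCount: 3, warmupCount: 5, iterationCount: 15)]
[ThreadingDiagnoser] // reports lock contentions and completed work items (.NET Core 3.0+)
public class NoiseHardenedBenchmark
{
    [Benchmark]
    public int Work()
    {
        int acc = 0;
        for (int i = 0; i < 10_000; i++) acc += i;
        return acc;
    }
}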

Example of Results with High Noise:

|     Method   |     Mean |    Error |   StdDev |
|--------------|---------:|---------:|---------:|
|  TestMethod  | 1.000 ms | 0.500 ms | 0.400 ms |

Here, the standard deviation (StdDev) is 40% of the mean, indicating high variance in the measurements.

Summary:

  • Measures must be taken to reduce noise; otherwise, the results are unreliable.

Key Takeaways:

  • Warmup and Iterations: Configure them appropriately for stable results.
  • Accurate Measurements: Avoid side effects and dead code elimination.
  • Comparing Benchmarks: Use the Baseline attribute and reduce noise for reliable comparisons.
  • Handling Errors: Prevent exceptions and failures to ensure valid benchmarking.

Additional Recommendations:

  • Analyze BenchmarkDotNet Reports: They contain valuable information such as execution time distribution and statistical metrics.
  • Use Charts and Visualizations: They help you better understand the results; see the exporter sketch after this list.
  • Experiment with Different Settings: This will help you find the optimal parameters for your specific case.
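
BenchmarkDotNet can produce chart-ready output through exporters. As a minimal sketch, the [RPlotExporter] attribute emits R scripts that plot timing distributions (R must be installed), and [MarkdownExporter] writes a shareable markdown report; the benchmark body here is a placeholder:

using BenchmarkDotNet.Attributes;

[RPlotExporter]    // writes R scripts that plot the timing distributions
[MarkdownExporter] // writes a markdown report next to the other artifacts
public class ExportedBenchmark
{
    [Benchmark]
    public string Concat() => string.Join(",", "a", "b", "c");
}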

Example of a Comprehensive BenchmarkDotNet Report:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1237 (21H1)
Intel Core i7-9700K CPU 3.60GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=5.0.400
  [Host]     : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT
  DefaultJob : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT

|     Method   |     Mean |    Error |   StdDev |   Median |
|--------------|---------:|---------:|---------:|---------:|
|   ArraySort  | 1.234 ms | 0.024 ms | 0.023 ms | 1.230 ms |
| LinqOrderBy  | 1.567 ms | 0.030 ms | 0.029 ms | 1.565 ms |
|  CustomSort  | 1.890 ms | 0.037 ms | 0.035 ms | 1.885 ms |

How to Read the Report:

  • Mean: The arithmetic mean of all measurements.
  • Error: Half of the 99.9% confidence interval.
  • StdDev: The standard deviation of all measurements.
  • Median: The value separating the faster half of the measurements from the slower half.

Use This Information To:

  • Assess Measurement Stability: A low standard deviation indicates stable results.
  • Compare Methods: Easily see which method is faster or slower.

Performance optimization is an iterative process. By using BenchmarkDotNet and following best practices, you can enhance your application’s performance, making it more responsive and efficient. Remember to continually learn, experiment, and share your findings with the community.

Useful Resources:

  • Official BenchmarkDotNet documentation: https://benchmarkdotnet.org/
  • BenchmarkDotNet on GitHub: https://github.com/dotnet/BenchmarkDotNet

Happy benchmarking and high-performance coding!
