Miłosz Orzeł

.net, js, html, arduino, java... no rants or clickbaits.

Vanilla div performance

INTRO

I've recently checked whether using a Box in Material UI 5, instead of a plain div, has a significant performance impact. I don't think it matters much in practice, but read the post if you are interested in the details... Doing this check got me thinking: how much faster would making the little square divs be if we dropped React and did the DOM manipulation with plain JS? Let's find out!

 

DISCLAIMER

I'm not suggesting that you should ditch the comforts of React/MUI and write your programs in vanilla JS just because it's faster to create tens of thousands of DOM elements this way. It's best to avoid making too many elements in the first place (think paging and virtualization), and if you must, try to memoize to avoid repeating expensive operations...

 

STRESS TEST APP

Just like the program from the previous post, the test app for this post is about making lots of small squares with varying colors (through randomized opacity). It was made with the Vite Vanilla + TypeScript template (vite: 5.4.1, typescript: 5.5.3).
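
If you want to scaffold the same kind of setup, something like this should do it (the exact project configuration is in the linked repo):

npm create vite@latest div-stress-test -- --template vanilla-ts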

Live demo is here: https://morzel85.github.io/blog-post-vanilla-div-performance

The code is here: https://github.com/morzel85/blog-post-vanilla-div-performance

Div stress test app... Click to enlarge...

 

The app has an input for choosing the number of squares to create inside a container div (a flexbox with wrapping), a few buttons that pick different ways of making the divs, and a button for clearing all squares. Clearing is done by setting container.innerHTML to an empty string and is invoked before making the squares so the items don't add up.
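
For reference, the clear helper boils down to something like this (a minimal sketch; the exact implementation is in the linked repo):

const clear = () => {
  containerDiv.innerHTML = ''; // Drops all child squares in one go
  clearButton.disabled = true; // Assumed here - the make functions re-enable it
};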

The Make: createElement (inline) + appendChild button calls this function (inline styles are used here, just like in the previous app):

const makeCreateElementInline = () => {
  clear();

  for (let i = 0; i < amount; i++) {
    const div = document.createElement('div');

    div.style.backgroundColor = 'darkorange';
    div.style.opacity = Math.random().toString();
    div.style.margin = '1px';
    div.style.width = '15px';
    div.style.height = '15px';

    containerDiv.appendChild(div);
  }

  clearButton.disabled = false;
};

The Make: createElement + appendChild button calls this function:

const makeCreateElement = () => {
  clear();

  for (let i = 0; i < amount; i++) {
    const div = document.createElement('div');
    div.className = 'item';
    div.style.opacity = Math.random().toString();
    containerDiv.appendChild(div);
  }

  clearButton.disabled = false;
};

This time a CSS class is used to apply repetitive styles (all except opacity):

.item {
  background-color: darkorange;
  margin: 1px;
  width: 15px;
  height: 15px;
}

The functions above use an old-school for loop to add items, while in the previous post I used Array.from({ length: amount }, (_, i) => ... ) to generate them. Array.from feels more natural in a functional component but is slower. It is still fast enough that it just doesn't matter here: around 200 ops/s for Array.from vs 1300 ops/s for the classic for loop on a test that fills an array with 100K items.
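
To make the comparison concrete, here's roughly the shape of the two loops being measured (a simplified sketch that fills a plain array instead of making divs; the ops/s numbers above come from a micro-benchmark, not this exact snippet):

// Functional style: reads naturally in a component, but slower
const withArrayFrom = Array.from({ length: 100_000 }, (_, i) => i);

// Classic loop: faster, though both are fast enough at this scale
const withForLoop: number[] = [];
for (let i = 0; i < 100_000; i++) {
  withForLoop.push(i);
}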

Both functions use createElement and appendChild from DOM API.

The Make: string + innerHTML button calls this function:

const makeInnerHTML = () => {
  clear();

  let s = '';

  for (let i = 0; i < amount; i++) {
    const d = `<div class="item" style="opacity: ${Math.random()}"></div>`;
    s += d;
  }

  containerDiv.innerHTML = s;

  clearButton.disabled = false;
};

If you have a C#/Java background like me, you may think the code above contains a classic blunder: not using a string builder. Well, it turns out that JS engines are pretty good at string concatenation. Check out this great post.
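
If you want the "builder" style anyway, the closest JS equivalent is collecting parts in an array and joining them once; in modern engines this is usually no faster than plain += (a sketch reusing amount and containerDiv from the snippets above):

const parts: string[] = [];
for (let i = 0; i < amount; i++) {
  parts.push(`<div class="item" style="opacity: ${Math.random()}"></div>`);
}
containerDiv.innerHTML = parts.join(''); // Single join instead of repeated +=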

The final Make: createDocumentFragment + appendChild button calls this function:

const makeCreateDocumentFragment = () => {
  clear();

  const fragment = document.createDocumentFragment();

  for (let i = 0; i < amount; i++) {
    const div = document.createElement('div');
    div.className = 'item';
    div.style.opacity = Math.random().toString();
    fragment.appendChild(div);
  }

  containerDiv.appendChild(fragment);

  clearButton.disabled = false;
};

The idea here is to first append all items to a fragment (obtained with createDocumentFragment) that is not part of the DOM tree, and only when that's done, append the fragment to the DOM. It used to help in older browsers, just like the innerHTML solution shown before...

I've included a couple of different methods of adding elements, but the results below show only the behavior of Make: createElement (inline) + appendChild, to stay close to the React version and to keep this post from getting too long.

I've clicked around a bit and, to be honest, I haven't noticed a difference significant enough to justify switching from the simplest createElement + appendChild to innerHTML or createDocumentFragment... But the buttons are there for you to try it yourself.

 

DESKTOP RESULTS

Just like before, I'm running the test on a 7-year-old PC with an Intel Core i7-7700K, 16 GB RAM and an MSI GTX 1080 Ti, with Chrome 128 on Ubuntu 20.04.

Here's what my eyes could see (the time between the button press and the items appearing/page becoming responsive):

  • 500, 1K, and 2K: items appear instantly.
  • Barely perceptible lag starts to appear around 4K items, but even at 8K it could pass as instant.
  • For 15K: about 0.3s.
  • For 50K: around a second.

Here's the performance trace for 10K items (just mind that DevTools instrumentation is not free):

Performance report for 10K items. Click to enlarge...

 

Not bad, right? 

OK, let's check 25K:

Performance report for 25K items. Click to enlarge...

 

Half a second, sweet!

How about a 100K?

Performance report for 100K items. Click to enlarge...

Ok, finally we got Chrome to sweat a bit, but still... 2.3s for one hundred thousand divs! Surely the browser is smart enough not to paint the squares that are far off-screen, but the divs are there (check the Nodes count on the screenshot; document.querySelectorAll('*').length gives more than 100K too).

One thing worth mentioning here is scrolling performance: scrolling through the absurd block of 100K squares could get really slow, freezing the browser for a few seconds depending on the distance. For the more "reasonable" 20K it was pretty smooth...

Quick note about Firefox: I've run the abusive 100K check in Firefox 130 and div making was even a bit faster than in Chrome, plus there was no scrolling issue! The flip side: getting back to a tab with this many elements was a bit laggy.

 

MOBILE RESULTS

Like before, tests were done with my 3+ years old, non-flagship, Samsung Galaxy A52 on Android 14 (this time in Chrome 128 instead of 127).

This is the perceived performance I got (time between clicking a button and page with generated items being responsive):

  • 500, 1K, 2K items: instant.
  • 3K: barely perceptible delay.
  • 5K: about 0.2s.
  • 10K: about 0.4s.
  • 50K: about 1.2s.
  • 100K: about 3s.

Honestly, I'm shocked how fast it is on a phone worth about 250 USD (1000 PLN) - I mean a new one; mine fell without a case a bit too many times to be worth that much ;)

There's not much difference between the phone and my (old) PC in div-making speed. My greatest surprise is the scrolling: it didn't freeze the browser even with 100K elements! The worst I've seen was empty space shown for a while, but the scrollbar stayed responsive all the time (it actually performed way better than desktop Chrome)!

 

Material UI 5 Box performance

INTRO

I've been relying on Material UI Box components quite a lot, because doing so allows use of the theme-aware sx property and common attributes such as display or gap. It makes the code more consistent with other uses of MUI components.

The Box output is lightweight (it's just a div), but I was wondering how making plenty of these can impact performance, so I've built a test app to (ab)use the Box...

TL;DR: It's quite unlikely that Box vs div performance will become an issue in a real application.

 

MUI v6

Just as I was writing this text, the MUI team released v6.0.0 of Material UI. The release announcement blog post mentions runtime performance improvements and the experimental availability of Pigment CSS (a zero-runtime CSS-in-JS solution that will eventually replace the use of Emotion and allow the sx property on plain divs).

I'm keeping this post focused on v5 (to be precise: v5.16.7, which was released less than 3 weeks ago). Many projects use v5 and will stay with it for a while. Plus, it might be useful to compare the two major versions in the future...

 

STRESS TEST APP

I've made a small application (vite: 5.4.1, react: 18.3.1, @mui/material: 5.16.7, typescript: 5.5.3) to generate lots of squares with random colors (either by rendering plain div elements or by using the MUI Box).

Live demo is here: https://morzel85.github.io/blog-post-mui-5-box-performance

The code is here: https://github.com/morzel85/blog-post-mui-5-box-performance

Box stress test app... Click to enlarge...

 

When the MAKE button is pressed, the app generates the chosen number of items. Use the CLEAR button to remove all items. Toggling between the Plain div and MUI Box options rerenders all the items. Divs are green, Boxes are purple. Simple.

Here's how the divs are created: 

import { memo } from 'react';

export const PlainDivs = memo(({ amount }: { amount: number }) =>
  Array.from({ length: amount }, (_, i) => (
    <div
      key={i}
      style={{
        background: 'darkgreen',
        opacity: Math.random(),
        margin: '1px',
        width: '15px',
        height: '15px'
      }}
    />
  ))
);

Yeah, inline styles are used even for static properties. These could be extracted to a single class, but this keeps the code closer to the Box/sx version.
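
For comparison, the extracted variant could look something like this (hypothetical; it's not what the test app ships):

import { memo } from 'react';
// Assumes a stylesheet rule like:
// .plain-item { background: darkgreen; margin: 1px; width: 15px; height: 15px; }

export const PlainDivsWithClass = memo(({ amount }: { amount: number }) =>
  Array.from({ length: amount }, (_, i) => (
    <div key={i} className="plain-item" style={{ opacity: Math.random() }} />
  ))
);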

This is how Box items are done:

import { memo } from 'react';
import { Box } from '@mui/material';

export const MuiBoxes = memo(({ amount }: { amount: number }) =>
  Array.from({ length: amount }, (_, i) => (
    <Box
      key={i}
      sx={{
        background: 'purple',
        opacity: Math.random(),
        margin: '1px',
        width: '15px',
        height: '15px'
      }}
    />
  ))
);

Notice that the random opacity rule is quite unfavorable for MUI/Emotion, as it will generate a lot of different CSS classes that must be injected into the page at runtime! A generated CSS rule might look like this:

.css-s5s1br {
    background: purple;
    opacity: 0.846957;
    margin: 1px;
    width: 15px;
    height: 15px;
}

 

DESKTOP RESULTS

Here's a couple of results from my 7 years old PC with Intel Core i7-7700K and 16 GB RAM (MSI GTX 1080 Ti still going strong!) with Chrome 128 on Ubuntu 20.04.

  • For the default 500 items, generating feels practically instant for both Plain div and MUI Box. Same for 1K.
  • Let's go for 5K: divs take about 0.3s, Boxes about 0.4s.
  • How about 15K? Ok, now there's about 1.2s of lag for the div version and maybe 1.4s for the Box.
  • Well... 50K? 5.5s for divs and about 6.5s for Boxes.
    Quick sanity check: document.querySelectorAll('*').length -> 50056. It really created all those elements. Nice job React, nice job MUI! Aren't modern browsers a marvel of engineering? I remember the times when we had to worry about not putting too many JS validations on a form...

The above highly scientific results were collected by my eyeballs and a stopwatch (time between pressing the MAKE button and the page becoming responsive).

If you want something more precise, here's a performance trace for 10K items:

Plain div vs Box performance report... Click to enlarge...

 

Speed-wise there's not that much difference between the Plain div and Box versions, although you can see that the Box version uses about 3.5x more memory. Sounds like a lot, but the 10K Boxes (with an unnaturally large number of unique classes) took about 30 MB.

Watch out for tests with too many items (especially if you open the Elements tab); DevTools might choke a bit...

 

MOBILE RESULTS

Ok, how about a phone? This is how my 3+ years old, non-flagship, Samsung Galaxy A52 performs (Chrome 127 on Android 14):

  • 500 and 1K items: instant for both divs and Boxes. 
  • 5K: about 0.5s for divs and 0.9s for Boxes.
  • 15K: about 1.7s for divs and 3.3s for Boxes.
  • and finally the absurd 50K: about 15s for divs and 22s for Boxes (hope you never have to render this many elements on a desktop, let alone mobile)...

Speaking of DOM size, Lighthouse provides warnings and errors for excessive DOM size (as of this writing the thresholds are about 800 and 1400 elements, respectively). It also reports DOM depth, which is a performance factor too (my little app doesn't check it, but the Box doesn't increase it). The largest size I've seen in practice was about 25K elements. When stuff like this happens, it's usually caused by a data grid with complex cell renderers and lack of virtualization (column virtualization is important too).
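
If you're curious about the depth of your own page, a quick console snippet like this can walk the tree (a hypothetical helper, not part of the test app):

const maxDepth = (el: Element): number =>
  1 + Array.from(el.children).reduce((m, c) => Math.max(m, maxDepth(c)), 0);

console.log(maxDepth(document.documentElement)); // Depth of the current DOM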

 

BONUS INFO

When the application is running in release mode (the result of npm run build), MUI/Emotion doesn't create individual style elements for each class.

When you click on <style> to see where the CSS rule is defined:

Finding style source... Click to enlarge...

you will land on a style element that appears empty:

Emotion style element... Click to enlarge...

 

This is a bit confusing: where are the classes? Emotion uses the insertRule API, which is very fast, but the disadvantage is the lack of DevTools support (check this GitHub issue and this answer in particular).
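
Here's a tiny standalone illustration of the effect (not MUI's actual code): the rule applies to the page, but the style element's text stays empty, which is exactly what DevTools shows.

const styleEl = document.createElement('style');
document.head.appendChild(styleEl); // The sheet property is available only after insertion
styleEl.sheet?.insertRule('.css-demo { background: purple; }', 0);
console.log(styleEl.textContent); // '' - the rule lives in the CSSOM, not in the markup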

 

Update 2024-09-08:
I have a follow-up post that checks div performance without React: https://en.morzel.net/post/vanilla-div-performance

ag-Grid (React) Cell Renderers Performance

TL;DR

Vanilla JS cell renderers are faster than framework renderers (just like the docs suggest), but renderers created with React class components are quite decent. Unfortunately, cell renderers built with function components are noticeably worse. The grid team is aware of the issue and might fix it soon (I've done my tests on ag-Grid 23.1.0, released just 2 days ago)... See the live demo that compares different ways to render cell content and check the source code on GitHub.

 

THE COMPARISON APP

I needed to create some advanced cell renderers to work around the limitations of cell editors (to get full control over when row editing stops, to allow multi-row edit, and to make it play nicely with Redux)... But before that, I wanted to get an idea of how much slower React-based cell renderers are compared to a plain JS cell rendering function, so I tested rudimentary cell renderers to gauge the overhead added by using frameworkComponents for cell rendering...

Click here to see an app that lets you compare cell renderer performance. The app builds a grid with 100 rows and 400 columns, but only 100 of these columns are visible at any given moment, based on the renderer type choice. Switching the renderer type (that is, switching column visibility) or scrolling through the grid gives you a chance to see how the renderers perform. There's also a checkbox that lets you assess the impact of the disableStaticMarkup option.

The app was done with ag-Grid 23.1.0 and React 16.13.1 (the most recent versions at the time of this writing) and tested in Chrome 81, Firefox 75, Edge 44.

Here's how the test application looks (showing cells rendered with React class component):

Cell renderers test app... Click to enlarge...

 

If you want to run the app on your machine: clone the repo, do npm install and npm start (just like with any other project initiated with create-react-app)...

How does it work?
The "Not set" option means that the column definition lacks the cellRenderer option completely. "Vanilla JS function" causes the cell to be rendered with this renderer:

export default function vanillaFunctionRenderer(params) {
    return `<span>VF: ${params.value}</span>`;
}

"React class component" means such renderer:

import React, { Component } from 'react';

export default class ReactClassRenderer extends Component {
    render() {        
        return (
            <span>RC: {this.props.value}</span>
        );
    }
}

and this is the renderer for the "React function component" option:

import React from 'react';

export default function ReactFunctionRenderer(props) {    
    return (
        <span>RF: {props.value}</span>
    );
}

Please notice that all 3 renderers do the same simple thing: render a span with the cell value prefixed with "VF: ", "RC: " or "RF: " to make it easier to see which renderer is actually used by the columns visible on the grid. Notice also that React state is not used (neither class state nor the state hook).

Below you can see how ag-Grid is configured (mind the components, frameworkComponents and disableStaticMarkup options) and how column visibility is controlled with calls to the column API's setColumnsVisible (a bulk visibility change is a lot faster than individual calls to setColumnVisible).

// Imports skipped

class Grid extends Component {
    constructor(props) {
        super(props);

        const [columnDefs, rowData] = generateColumnsAndRows(100, 100);

        this.state = {
            columnDefs,
            rowData,
            disableStaticMarkup: false
        }
    }

    setColumnsVisibility = rendererType => {
        const allColumns = this.gridColumnApi.getAllColumns();

        const columnsToHide = allColumns.filter(c => c.colDef.cellRenderer !== rendererType);
        const columnsToShow = allColumns.filter(c => c.colDef.cellRenderer === rendererType);

        this.gridColumnApi.setColumnsVisible(columnsToHide, false);
        this.gridColumnApi.setColumnsVisible(columnsToShow, true);
    }

    handleGridReady = params => {
        this.gridColumnApi = params.columnApi;
    }

    handleRendererTypeChange = event => {
        this.setColumnsVisibility(event.target.value || undefined);
    }

    handleDisableStaticMarkupOptionChange = event => {
        this.setState({
            disableStaticMarkup: event.target.checked
        });
    }

    render() {
        return (
            <>
                {/* Option controls skipped */}
                <div className="ag-theme-balham">
                    <AgGridReact
                        defaultColDef={{
                            width: 90
                        }}
                        components={{
                            vanillaFunction: vanillaFunctionRenderer
                        }}
                        frameworkComponents={{
                            reactClass: ReactClassRenderer,
                            reactFunction: ReactFunctionRenderer
                        }}
                        columnDefs={this.state.columnDefs}
                        rowData={this.state.rowData}
                        disableStaticMarkup={this.state.disableStaticMarkup}
                        onGridReady={this.handleGridReady}
                    />
                </div>
            </>
        );
    }
}

export default Grid;

The last relevant bit of code is the generateColumnsAndRows function used to populate columnDefs and rowData:

const generateColumnsAndRows = (columnsPerTypeCount, rowsCount) => {
    const columnDefs = [];
    const rowData = [];

    for (let i = 0; i < columnsPerTypeCount; i++) {
        columnDefs.push({
            field: 'field_' + i,
            headerName: 'Col ' + i
        }, {
            field: 'field_vf_' + i,
            headerName: 'Col VF ' + i,
            cellRenderer: 'vanillaFunction',
            hide: true
        }, {
            field: 'field_rc_' + i,
            headerName: 'Col RC ' + i,
            cellRenderer: 'reactClass',
            hide: true
        }, {
            field: 'field_rf_' + i,
            headerName: 'Col RF ' + i,
            cellRenderer: 'reactFunction',
            hide: true
        });
    }

    // Row generation skipped

    return [columnDefs, rowData];
}

export default generateColumnsAndRows;

Notice the cellRenderer property and that initially only the columns without any cell renderer assigned are visible.

 

THE RESULTS

Unsurprisingly, the fastest way to have a cell value shown is not to define any cell renderer at all. Assuming you need one, the fastest is a plain JS renderer assigned to the components property. If you need to use a React component (assigned to the frameworkComponents property), you should stick to a class-based component until the ag-Grid team improves function component handling. And BTW, there's nothing wrong with a class; a few other things are also easier with class components when it comes to ag-Grid...

Cell rendering performance is one of those things that are best "measured" by eye. Just play with the demo app, switch the columns and scroll the grid, and you will notice that in the case of the React function component a duplicated value might be briefly visible, spoiling the UX:

Duplicated value... Click to enlarge...


Enabling the disableStaticMarkup option helps with the duplicated value flicker for the function component, but causes an empty value flicker for the class-based renderer...

If you need something more than your own impression from using the test app, here's a performance measurement done in Chrome 81 DevTools:

Performance measurement in Chrome DevTools... Click to enlarge...

The measurement was done while the grid was showing 560 cells on screen with disableStaticMarkup set to false. The timelines show how long it took to fully rerender the grid after the renderer type was chosen. The vanilla renderer took about 1300ms, the React class 1900ms, and the React function 2600ms. Of course DevTools instrumentation is not free and the grid would rerender faster without it - anyway, the relative speeds seem to match the impression my eyes get.

How fast is .NET Garbage Collector? Part 2.

Read the first part of the article if you haven't done so already. Part 1 has a brief overview of what GC is and how it performs its magic. It contains a test of GC performance with regard to a large array of bytes. You can also find there detailed information about my test environment…

This part will focus on scenarios which put a lot more pressure on GC and appear more commonly in real world applications. You will see that even a tree of more than 100 million objects can be handled quickly… But first, let's see how GC responds to a big array of type object:

static void TestSpeedForLargeObjectArray()
{
    Stopwatch sw = new Stopwatch();

    Console.Write(string.Format("GC.GetTotalMemory before creating array: {0:N0}. Press key any to create array...", GC.GetTotalMemory(true)));
    Console.Read();
    object[] array = new object[100 * 1000 * 1000]; // About 800 MB (3.2 GB when filled with object instances)
             
    sw.Start();
    for (int i = 0; i < array.Length; i++)
    {
        array[i] = new object();
    }
    Console.WriteLine("Setup time: " + sw.Elapsed);
    Console.WriteLine(string.Format("GC.GetTotalMemory after creating array: {0:N0}. Press Enter to set array to null...", GC.GetTotalMemory(true)));

    if (Console.ReadKey().Key == ConsoleKey.Enter)
    {
        Console.WriteLine("Setting array to null");
        array = null;
    }
    
    sw.Restart();
    GC.Collect();
    Console.WriteLine("Collection time: " + sw.Elapsed);
    Console.WriteLine(string.Format("GC.GetTotalMemory after GC.Collect: {0:N0}. Press any key to finish...", GC.GetTotalMemory(true)));

    Console.WriteLine(array); // To avoid compiler optimization...
    Console.ReadKey();
}

The above test creates an array of 100 million items. Initially such an array takes about 800 megabytes of memory (on the x64 platform). This part is allocated on the LOH. When the object instances are created, total heap allocation jumps to 3.2 GB. The array items are tiny, so they are part of the Small Object Heap and initially belong to Gen 0.

Here are the test results for the situation when the array is set to null:

GC.GetTotalMemory before creating array: 41,736. Press any key to create array...
Press any key to fill array...
Setup time: 00:00:07.7910574
GC.GetTotalMemory after creating array: 3,200,057,616. Press Enter to set array to null...
Setting array to null
Collection time: 00:00:00.7481998
GC.GetTotalMemory after GC.Collect: 57,624. Press any key to finish...

It took only about 700 ms to reclaim over 3 GB of memory!

Take a look at this graph from Performance Monitor:

Managed memory counters for large object[] array; setting array to null... Click to enlarge...

You can see that while the program was filling the array, Gen 0 and Gen 1 changed size (notice though that the scale for these is 100x bigger than the scale for the other counters). This means that GC cycles were triggered while items were being created - this is expected behavior. Notice how the Gen 2 and LOH sizes add up to the total bytes on the managed heap.

What if instead of setting the array reference to null we set the array items to null?

Let’s see. Here’s the graph:

Managed memory counters for large object[] array; setting array items to null... Click to enlarge...

Notice that after GC.Collect is done, 800 MB are still allocated - this is LOH memory held by the array itself…

Here are the results:

GC.GetTotalMemory before creating array: 41,752. Press any key to create array...
Press any key to fill array...
Setup time: 00:00:07.7707024
GC.GetTotalMemory after creating array: 3,200,057,632. Press Enter to set array elements to null...
Setting array elements to null
Collection time: 00:00:01.0926220
GC.GetTotalMemory after GC.Collect: 800,057,672. Press any key to finish...

Ok, enough with arrays. One can argue that, as contiguous blocks of memory, they are easier to handle than the complex object structures that are abundant in real world programs.

Let’s create a very big tree of small reference types:

static int _itemsCount = 0;

class Item
{
    public Item ChildA { get; set; }
    public Item ChildB { get; set; }
    
    public Item()
    {
        _itemsCount++;
    }           
}

static void AddChildren(Item parent, int depth) 
{
    if (depth == 0)
    {
        return;
    }
    else
    {
        parent.ChildA = new Item();
        parent.ChildB = new Item();

        AddChildren(parent.ChildA, depth - 1);
        AddChildren(parent.ChildB, depth - 1);                
    }
}

static void TestSpeedForLargeTreeOfSmallObjects()
{
    Stopwatch sw = new Stopwatch();

    Console.Write(string.Format("GC.GetTotalMemory before building object tree: {0:N0}. Press any key to build tree...", GC.GetTotalMemory(true)));
    Console.ReadKey();

    sw.Start();
    _itemsCount = 0;       
    Item root = new Item();            
    AddChildren(root, 26);
    Console.WriteLine("Setup time: " + sw.Elapsed);
    Console.WriteLine("Number of items: " + _itemsCount.ToString("N0"));

    Console.WriteLine(string.Format("GC.GetTotalMemory after building object tree: {0:N0}. Press Enter to set root to null...", GC.GetTotalMemory(true)));

    if (Console.ReadKey().Key == ConsoleKey.Enter)
    {
        Console.WriteLine("Setting tree root to null");
        root = null;                
    }
    
    sw.Restart();
    GC.Collect();
    Console.WriteLine("Collection time: " + sw.Elapsed);
    Console.WriteLine(string.Format("GC.GetTotalMemory after GC.Collect: {0:N0}. Press any key to finish...", GC.GetTotalMemory(true)));
                
    Console.WriteLine(root); // To avoid compiler optimization...            
    Console.ReadKey();
}

The test presented above creates a tree with over 130 million nodes which take almost 4.3 GB of memory. Quick math: a depth of 26 below the root gives 2^27 - 1 = 134,217,727 nodes, and at 32 bytes per Item on x64 (a 16-byte object header plus two 8-byte references) that adds up to about 4.3 GB.

Here’s what happens when tree root is set to null:

GC.GetTotalMemory before building object tree: 41,616. Press any key to build tree...
Setup time: 00:00:14.3355583
Number of items: 134,217,727
GC.GetTotalMemory after building object tree: 4,295,021,160. Press Enter to set root to null...
Setting tree root to null
Collection time: 00:00:01.1069927
GC.GetTotalMemory after GC.Collect: 53,856. Press any key to finish...

Managed memory counters for large tree of small objects; setting tree root to null... Click to enlarge...

It took only 1.1 seconds to clear all the garbage! When the root reference was set to null, all nodes below it became useless as defined by the mark and sweep algorithm… Notice that this time the LOH is not utilized, as no single object instance is over the 85 KB threshold.

Now let’s see what happens when the root is not set to null and all the objects survive GC cycle:

GC.GetTotalMemory before building object tree: 41,680. Press any key to build tree...
Setup time: 00:00:14.3915412
Number of items: 134,217,727
GC.GetTotalMemory after building object tree: 4,295,021,224. Press Enter to set root to null...
Collection time: 00:00:03.7172580
GC.GetTotalMemory after GC.Collect: 4,295,021,184. Press any key to finish...

This time it took 3.7 sec (less than 28 nanoseconds per reference) for GC.Collect to run – remember that reachable references put more work on GC than dead ones!

There is one more scenario we should test. Instead of setting root = null, let's set root.ChildA = null. This way half of the tree becomes unreachable. GC will have a chance to reclaim memory and compact it to avoid fragmentation. Check the results:

GC.GetTotalMemory before building object tree: 41,696. Press any key to build tree...
Setup time: 00:00:15.1326459
Number of items: 134,217,727
GC.GetTotalMemory after building object tree: 4,295,021,240. Press Enter to set root.ChildA to null...
Setting tree root.ChildA to null
Collection time: 00:00:02.5062596
GC.GetTotalMemory after GC.Collect: 2,147,537,584. Press any key to finish...

Time for the final test. Let's create a tree of over 2 million complex nodes that contain some object references, a small array and a unique string. Additionally, let's fill some of the MixedItem instances with a byte array big enough to be put on the Large Object Heap.

static int _itemsCount = 0;

class MixedItem
{
    byte[] _smallArray;
    byte[] _bigArray;
    string _uniqueString;

    public MixedItem ChildA { get; set; }
    public MixedItem ChildB { get; set; }

    public MixedItem()
    {
        _itemsCount++;

        _smallArray = new byte[1000];
        if (_itemsCount % 1000 == 0)
        {
            _bigArray = new byte[1000 * 1000];
        }

        _uniqueString = DateTime.Now.Ticks.ToString();
    }
}

static void AddChildren(MixedItem parent, int depth)
{
    if (depth == 0)
    {
        return;
    }
    else
    {
        parent.ChildA = new MixedItem();
        parent.ChildB = new MixedItem();

        AddChildren(parent.ChildA, depth - 1);
        AddChildren(parent.ChildB, depth - 1);
    }
}

static void TestSpeedForLargeTreeOfMixedObjects()
{
    Stopwatch sw = new Stopwatch();

    Console.Write(string.Format("GC.GetTotalMemory before building object tree: {0:N0}. Press any key to build tree...", GC.GetTotalMemory(true)));
    Console.ReadKey();

    sw.Start();
    _itemsCount = 0;
    MixedItem root = new MixedItem();
    AddChildren(root, 20);
    Console.WriteLine("Setup time: " + sw.Elapsed);
    Console.WriteLine("Number of items: " + _itemsCount.ToString("N0"));

    Console.WriteLine(string.Format("GC.GetTotalMemory after building object tree: {0:N0}. Press Enter to set root to null...", GC.GetTotalMemory(true)));

    if (Console.ReadKey().Key == ConsoleKey.Enter)
    {
        Console.WriteLine("Setting tree root to null");
        root = null;
    }

    sw.Restart();
    GC.Collect();
    Console.WriteLine("Collection time: " + sw.Elapsed);
    Console.WriteLine(string.Format("GC.GetTotalMemory after GC.Collect: {0:N0}. Press any key to finish...", GC.GetTotalMemory(true)));

    Console.WriteLine(root); // To avoid compiler optimization...
    Console.ReadKey();
}

How will GC perform when subjected to almost 4.5 GB of managed heap memory with such a complex structure? Test results for setting root to null:

GC.GetTotalMemory before building object tree: 41,680. Press any key to build tree...
Setup time: 00:00:11.5479202
Number of items: 2,097,151
GC.GetTotalMemory after building object tree: 4,496,245,632. Press Enter to set root to null...
Setting tree root to null
Collection time: 00:00:00.5055634
GC.GetTotalMemory after GC.Collect: 54,520. Press any key to finish...

Managed memory counters for large tree of mixed objects; setting tree root to null... Click to enlarge...

And in case you wonder, here's what happens when the root is not set to null:

GC.GetTotalMemory before building object tree: 41,680. Press any key to build tree...
Setup time: 00:00:11.6676969
Number of items: 2,097,151
GC.GetTotalMemory after building object tree: 4,496,245,632. Press Enter to set root to null...
Collection time: 00:00:00.5617486
GC.GetTotalMemory after GC.Collect: 4,496,245,592. Press any key to finish...

So what does it all mean? The conclusion is that unless you are writing applications which require extreme efficiency or a total guarantee of uninterrupted execution, you should be really glad that .NET uses automatic memory management. GC is a great piece of software that frees you from mundane and error prone memory handling. It lets you focus on what really matters: providing features for application users. I've been professionally writing .NET applications for the past 8 years (enterprise stuff, mainly web apps and Windows services) and I'm yet to witness¹ a situation when GC cost would be a major factor. Usually the performance bottleneck lies in things like: bad DB configuration, inefficient SQL/ORM queries, slow remote services, bad network utilization, lack of parallelism, poor caching, sluggish client side rendering etc. If you avoid basic mistakes like creating too many strings, you probably won't even notice that there is a Garbage Collector :)

Update 31.08.2014: I've just run the most demanding test (big tree of small reference types with all objects surviving the GC cycle) on my new laptop. The result is 3.3s, compared to the 3.7s presented in the post. Test program: .NET 4.5 console app in Release mode run without debugger attached. Hardware: i7-4700HQ 2.4-3.4GHz 4 Core CPU, 8GB DDR3/1600MHz RAM. System: Windows 7 Home Premium x64.

1. I’ve met some out of memory exceptions related to LOH fragmentation. The good thing is that LOH algorithms are improving and x86 platform, which is especially susceptible to such errors, is becoming a thing of the past…

How fast is .NET Garbage Collector? Part 1.

.NET GC is very fast! Well... I hope you need more than this reassuring statement; if so, read on :) I will show you some test results to prove that I'm not lying, but first a quick reminder about what GC is:

The Garbage Collector is a fundamental component of the .NET CLR. It takes care of freeing managed heap memory so a programmer doesn't have to think about deallocation. Contrary to popular belief, GC handles memory occupied by both reference and value types. Why? Because quite often space for things like structures or integers is allocated on the heap. It happens, for example, when a value type is an item of an array or a field in a class instance.

GC in .NET uses a mark and sweep algorithm: it walks through the object graph starting from root elements (such as statics, or references on the stack or in registers) and marks every object it can reach. When the walk is done, GC knows it can safely free the memory of objects that have not been marked – because, being unreachable, they are useless for the application.

For efficiency purposes GC supports the notion of generations. When an object is created it belongs to Gen 0 (except for so called large objects). If a Gen 0 object survives a GC cycle, it's moved to Gen 1. If it survives one more cycle, it goes to Gen 2 and stays there (there's no Gen 3¹). Most objects are short lived – they don't get past Gen 0 or Gen 1, so .NET GC tries to free memory from the lower generations first. It does Gen 2 collection (aka full collection) far less often than Gen 0 collection. If an object is big – above 85,000² bytes – it is allocated on the Large Object Heap (LOH) and goes straight to Gen 2. Treating big objects the same way as small objects would have a negative impact on heap defragmentation³ algorithms because moving such objects is time-consuming.

GC supports different modes for workstation and server configurations, is able to do some work in background threads, has to consider special cases like finalizers or pinned memory buffers, and its settings vary between platforms (e.g. the CLR on a PC is not the same as the one on the Xbox)… well, let's stop here – I've promised a "quick reminder"!

Ok, time for the test! 

In this first post I will show you how fast .NET GC can handle large arrays of value (non-reference) types. The test will involve an array of bytes that occupies about two gigabytes of memory. Despite its large size, such an object does not put much pressure on the Garbage Collector. This is because the only reference GC has to check is the one for the array itself. If the array becomes unreachable, all memory occupied by its elements can be safely reused. Additionally, such an array, being part of the LOH, is not copied (to avoid heap fragmentation) when it survives a collection cycle… In the second installment of this article I will show you how GC handles more complex scenarios. We will examine performance for an object array and, more importantly, a tree of objects with millions of references…

Note about the test environment: I've run the test code on a desktop PC with an Intel i5-2400 3.1 GHz 4 Core CPU and 12 GB of DDR3/1333 RAM, running Windows 7 Ultimate x64. The program was a .NET 4.0 console application compiled in Release mode and run without a debugger attached.

Here’s the test code:

static void TestSpeedForLargeByteArray()
{
    Stopwatch sw = new Stopwatch();

    Console.Write(string.Format("GC.GetTotalMemory before creating array: {0:N0}. Press any key to create array...", GC.GetTotalMemory(true)));
    Console.ReadKey();
    byte[] array = new byte[2000 * 1000 * 1000]; // About 2 GB
   
    sw.Start();
    for (int i = 0; i < array.Length; i++)
    {
        array[i] = 1; // Touch array elements to fill working set              
    }          
    Console.WriteLine("Setup time: " + sw.Elapsed);

    Console.WriteLine(string.Format("GC.GetTotalMemory after creating array: {0:N0}. Press Enter to set array to null...", GC.GetTotalMemory(true)));
    if (Console.ReadKey().Key == ConsoleKey.Enter)
    {
        Console.WriteLine("Setting array to null");
        array = null;
    }
               
    sw.Restart();
    GC.Collect();                 
    Console.WriteLine("Collection time: " + sw.Elapsed);
    Console.WriteLine(string.Format("GC.GetTotalMemory after GC.Collect: {0:N0}. Press any key to finish...", GC.GetTotalMemory(true)));

    Console.WriteLine(array); // To avoid compiler optimization...
    Console.ReadKey();
}

As you can see, the test is very simple. The code uses two methods of the static GC class: GC.GetTotalMemory and GC.Collect. The former returns the amount of allocated managed memory and the latter forces the Garbage Collector to do its job and free unused memory. The only thing that might be surprising is the loop that touches the array items. Without it you will witness a "strange" phenomenon: after the array is defined, GC.GetTotalMemory reports about 2 GB, but you will not see the memory usage increase in Task Manager! This is because taskmgr.exe shows Working Set data. You can run a more advanced tool, Resource Monitor (resmon.exe), to see what's happening:

This is the screenshot before the “touch”:

Memory usage before access to array items. Click to enlarge...

And this is the one after the loop is executed:

Memory usage after access to array items. Click to enlarge...

Here are the test results (yeah, finally):

GC.GetTotalMemory before creating array: 41,568. Press any key to create array...
Setup time: 00:00:01.0633903
GC.GetTotalMemory after creating array: 2,000,053,864. Press Enter to set array to null...
Setting array to null
Collection time: 00:00:00.1443391
GC.GetTotalMemory after GC.Collect: 53,800. Press any key to finish...

What you can see here is that it took GC just around 150 milliseconds to free about 2 GB of memory. Nice, huh? You can appreciate the speed especially if you compare it with the 1 second it took just to iterate over the array!

Below is a screenshot from Performance Monitor (perfmon.exe) running a couple of .NET memory counters:

Performance Monitor with managed memory counters. Click to enlarge...

Our array is a big object (well over the LOH threshold) – hence after it is defined the LOH size increases, yet the Gen 0 heap size remains flat.

Below are the results for a GC.Collect run when the reference to the array was not set to null:

GC.GetTotalMemory before creating array: 41,568. Press any key to create array...
Setup time: 00:00:01.0385779
GC.GetTotalMemory after creating array: 2,000,053,864. Press Enter to set array to null...
Collection time: 00:00:00.0001287
GC.GetTotalMemory after GC.Collect: 2,000,053,824. Press any key to finish...

A fraction of a millisecond. Totally negligible! Why am I even mentioning a test where memory is not collected? Well, in part two you will see that GC usually has more work to do when heap items survive a collection cycle…

I hope to post the second part of the article in about a week or two. No promises though – you know, life… ;)

Update 26.07.2014: I've changed the screenshot from perfmon (Gen 0 and Gen 1 scale is bigger now).

Update 24.06.2014: I wrote the second part of this article few days ago. Click here to read it.

1. You can use the GC.MaxGeneration property to check it.

2. The LOH threshold is an implementation detail. Most sources mention 85KB as the limit, but it's not always the case - an array of doubles with as few as 1000 items goes on the LOH (on x86)...

3. .NET 4.5 introduces LOH improvements for better fragmentation prevention through balancing and enhanced free list usage. Future releases will probably have a compaction option too.