Cutting down on garbage collection

Another big update for the Meteor Engine: it now creates almost zero garbage at runtime. The only exceptions are reading the current mouse and keyboard input, which always allocates memory. Since those are PC-only inputs and the PC barely hiccups over a single 1 MB collection, these exceptions are a non-issue. I figured the engine needed a good tune-up to avoid any unexpected stalls, especially if I plan to port it to Xbox 360, and it's better to do it now before the engine becomes any more complex.

Figuring out how to optimize for garbage collection has really helped me write more efficient C# code. I'll admit that before using XNA, I had never coded in C#; I've mainly been a C++ guy. I only got serious about XNA this July, so I have about five months of experience with how the C# language works, with all its peculiarities around value types, reference types, and memory allocation. With that said, I'm far from the best person to explain how C#, the CLR, and the .NET components work. Still, I have learned a lot so far, thanks to the XNA veterans on the App Hub Forums and some of their blogs, and I'll be learning a lot more in the time to come.

Reducing garbage creation actually wasn't too difficult. In a few cases I had to write my own functions to replace ones that had no way of avoiding memory allocation, but that helped me understand how to do these things on my own. I tracked down several major sources of garbage in my engine:

  • String creation (usually for debug output)
  • Updating mesh transformation matrices for rendering
  • Creating arrays and lists immediately as needed instead of storing them for later
  • Calculating BoundingBoxes with CreateFromPoints() on each frame for culling and rendering
  • Using BoundingBox.GetCorners() to update the view frustum for directional lights

So there wasn't much to do, though some fixes took more work than others. I had to discover these issues one by one, and I started with the most obvious culprit.
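
To hunt these down one at a time, it helps to watch GC.GetTotalMemory every frame. Below is a minimal sketch of a hypothetical AllocationMeter class (not part of the Meteor Engine) that adds up how much the managed heap grew over each one-second window and counts the frames where the total dropped, which usually means a collection ran. It is only a rough estimate: allocations made during a frame that also contains a collection are not counted.

```csharp
using System;

// Hypothetical helper: estimates managed bytes allocated per second by
// sampling GC.GetTotalMemory(false) once per frame.
public sealed class AllocationMeter
{
    private readonly Func<long> readTotalMemory;
    private long lastTotal;
    private long bytesThisWindow;
    private double windowSeconds;

    // Bytes allocated during the last completed one-second window.
    public long BytesPerSecond { get; private set; }

    // Number of frames where the total went down (a collection probably ran).
    public int CollectionsSeen { get; private set; }

    public AllocationMeter(Func<long> readTotalMemory = null)
    {
        // Default to the real GC counter; tests can inject their own source.
        this.readTotalMemory = readTotalMemory ?? (() => GC.GetTotalMemory(false));
        lastTotal = this.readTotalMemory();
    }

    // Call once per frame with the frame's elapsed time in seconds.
    public void Update(double elapsedSeconds)
    {
        long total = readTotalMemory();
        long delta = total - lastTotal;

        if (delta >= 0)
            bytesThisWindow += delta;
        else
            CollectionsSeen++; // heap shrank, so this frame's allocations are lost

        lastTotal = total;
        windowSeconds += elapsedSeconds;

        if (windowSeconds >= 1.0)
        {
            BytesPerSecond = bytesThisWindow;
            bytesThisWindow = 0;
            windowSeconds = 0;
        }
    }
}
```

In an XNA game this would be updated from Game.Update with gameTime.ElapsedGameTime.TotalSeconds and its BytesPerSecond drawn in the debug display.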

Creating strings

This one was pretty straightforward, since string manipulation is one of the most commonly reported sources of memory allocation in real-time XNA applications. SpriteBatch.DrawString needs to be used carefully, but thanks to all the stern warnings about String and StringBuilder objects, there are a few existing code bases that can help you out. I ended up using Gavin Pugh's garbage-free StringBuilder extension for formatting numerical values. To make sure I'd caught all the string problems, I disabled rendering and updating in all other areas of the program. Then I simply added the class to my engine code, rewrote a few lines in the debug display function, and it was ready to go.
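
The idea behind garbage-free number formatting is easy to sketch. On the desktop .NET Framework, StringBuilder.Append(int) converts the value with ToString() internally, which creates a temporary string. Writing the digits straight into the builder as characters avoids that. This is a minimal sketch of the technique, not Gavin Pugh's code; the AppendIntNoGarbage name is hypothetical, and it only stays allocation-free if the builder's capacity is already large enough.

```csharp
using System;
using System.Text;

// Hypothetical extension illustrating the technique; not Gavin Pugh's actual code.
public static class StringBuilderNumberExtensions
{
    // Appends an int one char at a time, so no temporary string is created.
    public static StringBuilder AppendIntNoGarbage(this StringBuilder sb, int value)
    {
        if (value == 0)
            return sb.Append('0');

        // Widen to long so that negating int.MinValue cannot overflow.
        long v = value;
        if (v < 0)
        {
            sb.Append('-');
            v = -v;
        }

        // Count the digits so we know how much room to reserve.
        int digits = 0;
        for (long tmp = v; tmp > 0; tmp /= 10)
            digits++;

        // Reserve the slots, then fill them from the rightmost digit leftwards.
        int start = sb.Length;
        sb.Append('0', digits);
        for (int i = start + digits - 1; i >= start; i--)
        {
            sb[i] = (char)('0' + (v % 10));
            v /= 10;
        }
        return sb;
    }
}
```

A typical use is one StringBuilder created up front with enough capacity, cleared each frame with sb.Length = 0, filled with Append and AppendIntNoGarbage, and handed to the SpriteBatch.DrawString overload that takes a StringBuilder.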

Mesh bone transformations

Next came the first real challenge: making the rendering code garbage-free. This is where GC.GetTotalMemory was going nuts, with about 1 MB of trash being scooped up almost every second. As I said before, this didn't cause any noticeable stalls on the PC, but I'm not going to take any chances with the memory-limited Xbox. By paring down and commenting out code here and there, I found that allocating a new Matrix array every frame just to copy the bone transforms into it was the culprit. Instead of creating new matrices, I pre-allocated a Matrix array for all the bones in the model and update them there. Here's the before code:

		/// <summary>
		/// Draw all visible meshes for this model.
		/// </summary>

		private void DrawModel(InstancedModel instancedModel, Camera camera, string tech)
		{
			// Draw the model. A new Matrix array is allocated on every call.
			Matrix[] transforms = new Matrix[instancedModel.model.Bones.Count];
			instancedModel.model.CopyAbsoluteBoneTransformsTo(transforms);

			foreach (ModelMesh mesh in instancedModel.VisibleMeshes)
			{
				foreach (ModelMeshPart meshPart in mesh.MeshParts)
				{
					Matrix world = transforms[mesh.ParentBone.Index] * instancedModel.Transform;

					/* .... */
				}
			}
			// End model rendering
		}

Here’s the improved version:


		private void DrawModel(InstancedModel instancedModel, Camera camera, string tech)
		{
			// Draw the model.
			instancedModel.model.CopyAbsoluteBoneTransformsTo(instancedModel.boneMatrices);

			foreach (ModelMesh mesh in instancedModel.VisibleMeshes)
			{
				foreach (ModelMeshPart meshPart in mesh.MeshParts)
				{
					Matrix world =
						instancedModel.boneMatrices[mesh.ParentBone.Index] * instancedModel.Transform;

					/* .... */
				}
			}
			// End model rendering
		}

The boneMatrices array is allocated just once, right after the model has loaded successfully:

boneMatrices = new Matrix[model.Bones.Count];

This also gives a cleaner separation between the data and the functions that process it. By the way, the foreach loops shouldn't cause any allocation here, since the newer version of the CLR handles foreach iteration much better, as explained in this article about memory profiling. Nothing here should need to go on the heap.
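
Whether a particular foreach allocates mostly comes down to what GetEnumerator returns for the type being looped over. The sketch below uses plain .NET collections rather than XNA's mesh collections: looping over a List<T> variable binds to the struct List<T>.Enumerator, while looping over the same list through IEnumerable<T> goes through the interface and normally boxes that enumerator on the heap. The AllocatedBytes helper relies on GC.GetAllocatedBytesForCurrentThread, which exists on .NET Core 3.0 and later but not on the runtimes XNA uses.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachAllocations
{
    // foreach over a List<T>-typed variable binds to List<T>.GetEnumerator(),
    // which returns the List<T>.Enumerator struct, so nothing goes on the heap.
    public static int SumList(List<int> values)
    {
        int sum = 0;
        foreach (int v in values)
            sum += v;
        return sum;
    }

    // The same loop over an IEnumerable<int> calls the interface method,
    // which normally boxes the struct enumerator for every loop.
    public static int SumEnumerable(IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int v in values)
            sum += v;
        return sum;
    }

    // Bytes allocated on this thread while running the action (.NET Core 3.0+).
    public static long AllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

After calling both sums once to warm up the JIT, AllocatedBytes(() => ForeachAllocations.SumList(list)) should report zero, while the IEnumerable version typically reports a small boxed enumerator, although recent JITs can sometimes optimize that box away.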

List and array creation

This one was just plain dumb on my part. Most of the garbage-creating arrays came from my modular rendering system, which passes render targets around in arrays as inputs and outputs. Each shader component passes its finished render targets to the next one (usually just one, but the GBuffer needs to pass several), and I was building a brand-new array for the OutputTargets property to return on every frame. To my surprise, this didn't make the GC memory counter tick up as fast as the other culprits, but it was still a very obvious fix.

All shader components derive from the BaseRenderer class, which is where OutputTargets comes from, but I kept overriding that property. Then I realized I already had a base class to work with right there, so why not use it? Now every component pre-assigns its outputs once so they are always ready.

		// GBuffer example (before)

		public override RenderTarget2D[] OutputTargets
		{
			get
			{
				// A new array is created every time the property is read
				RenderTarget2D[] rtArray =
				{
					normalRT, depthRT, diffuseRT
				};
				return rtArray;
			}
		}

Now with no garbage:

		// In the BaseRenderer class

		protected RenderTarget2D[] outputTargets;

		public virtual RenderTarget2D[] OutputTargets
		{
			get
			{
				return outputTargets;
			}
		}

		// In the GBuffer renderer's constructor, after the render targets are created

		outputTargets = new RenderTarget2D[]
		{
			normalRT, depthRT, diffuseRT
		};
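
A quick way to check that a property like OutputTargets no longer allocates is to read it twice and compare references: a property that builds a new array hands back two different objects. The sketch below uses hypothetical FakeTarget, AllocatingRenderer, and CachedRenderer types in place of RenderTarget2D and the real renderers so that it runs without XNA.

```csharp
using System;

// Hypothetical stand-in for RenderTarget2D so the check runs outside XNA.
public sealed class FakeTarget { }

public class AllocatingRenderer
{
    private readonly FakeTarget normalRT = new FakeTarget();
    private readonly FakeTarget depthRT = new FakeTarget();

    // Builds a fresh array on every read: garbage each frame.
    public FakeTarget[] OutputTargets => new[] { normalRT, depthRT };
}

public class CachedRenderer
{
    private readonly FakeTarget[] outputTargets;

    public CachedRenderer()
    {
        // Built once, like the outputTargets array assigned in the constructor.
        outputTargets = new[] { new FakeTarget(), new FakeTarget() };
    }

    // Returns the same array every time: no per-frame allocation.
    public FakeTarget[] OutputTargets => outputTargets;
}
```

The trade-off of the cached version is that callers receive the live array and could overwrite its elements, so it is best treated as read-only by convention.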

Passing multiple render targets as a series of parameters also wasn't playing well with memory. When setting them, put them all into a RenderTargetBinding array once and pass that array instead.
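
The reason is the params keyword: in XNA 4.0, GraphicsDevice.SetRenderTargets takes a params RenderTargetBinding[] argument, and the C# compiler builds a new array for every call that lists the targets inline. The sketch below shows that behavior with a hypothetical SetTargets method standing in for the XNA call, recording the array it receives so the two calling styles can be compared.

```csharp
using System;

public static class ParamsDemo
{
    // The array received by the most recent call, kept for identity checks.
    public static string[] LastArray { get; private set; }

    // Hypothetical stand-in for GraphicsDevice.SetRenderTargets(params RenderTargetBinding[]).
    public static void SetTargets(params string[] targets)
    {
        LastArray = targets;
    }
}
```

Calling SetTargets("normal", "depth", "diffuse") twice hands the method two different arrays, while passing the same pre-built array twice hands it the same object. In the engine that means building a RenderTargetBinding[] once (RenderTarget2D converts implicitly to RenderTargetBinding) and passing that array to SetRenderTargets each frame.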

Bounding boxes and mesh culling

Here there were more unnecessary object creations, plus calculations that referenced temporary copies. Culling meshes is easy once each mesh's BoundingBox has been transformed along with the mesh, but those “temporary” boxes can be made less temporary by pre-allocating them in the custom model objects. Here is what my code looked like before:

		/// <summary>
		/// Cull meshes from a specified list.
		/// </summary>

		private void CullFromModelList(Scene scene, Camera camera, Dictionary<String, InstancedModel> modelList)
		{
			// Pre-cull mesh parts

			foreach (InstancedModel instancedModel in modelList.Values)
			{
				int meshIndex = 0;
				instancedModel.VisibleMeshes.Clear();

				foreach (BoundingBox box in instancedModel.BoundingBoxes)
				{
					BoundingBox tempBox = box;
					tempBox.Min = Vector3.Transform(box.Min, instancedModel.Transform);
					tempBox.Max = Vector3.Transform(box.Max, instancedModel.Transform);

					// GetCorners() allocates a new Vector3[8] on every call
					tempBox = BoundingBox.CreateFromPoints(tempBox.GetCorners());

					// Add the mesh to the visible list if it's inside the frustum
					if (camera.Frustum.Contains(tempBox) != ContainmentType.Disjoint)
					{
						instancedModel.VisibleMeshes.Add(instancedModel.model.Meshes[meshIndex]);
					}

					meshIndex++;
				}
				// Finished culling this model
			}
		}

Now the InstancedModel class keeps a second, pre-allocated array of BoundingBoxes (tempBoxes) alongside the original untransformed boxes, so culling only has to update and test the model's own boxes:

		private void CullFromModelList(Scene scene, Camera camera, Dictionary<String, InstancedModel> modelList)
		{
			// Pre-cull mesh parts

			foreach (InstancedModel instancedModel in modelList.Values)
			{
				int meshIndex = 0;
				instancedModel.VisibleMeshes.Clear();

				foreach (BoundingBox box in instancedModel.BoundingBoxes)
				{
					Vector3 a = Vector3.Transform(box.Min, instancedModel.Transform);
					Vector3 b = Vector3.Transform(box.Max, instancedModel.Transform);

					// Re-sort the components so Min <= Max after the transform,
					// matching what CreateFromPoints(GetCorners()) produced before
					instancedModel.tempBoxes[meshIndex].Min = Vector3.Min(a, b);
					instancedModel.tempBoxes[meshIndex].Max = Vector3.Max(a, b);

					// Add the mesh to the visible list if it's inside the frustum
					if (camera.Frustum.Contains(instancedModel.tempBoxes[meshIndex]) !=
						ContainmentType.Disjoint)
					{
						instancedModel.VisibleMeshes.Add(instancedModel.model.Meshes[meshIndex]);
					}

					meshIndex++;
				}
				// Finished culling this model
			}
		}
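
Transforming only a box's Min and Max corners is exact for translation and scaling, but under rotation the result no longer encloses the mesh: at 45 degrees, a square's transformed Min and Max end up on the same vertical line. A garbage-free alternative that stays conservative is Arvo's method, which moves the box center and rebuilds the half-extents from the absolute values of the matrix's rotation and scale part. This sketch uses System.Numerics Vector3 and Matrix4x4 as stand-ins for XNA's types (both use the same row-vector convention and M11..M44 field names), plus a hypothetical Aabb struct in place of BoundingBox.

```csharp
using System;
using System.Numerics;

// Hypothetical minimal box type standing in for XNA's BoundingBox.
public struct Aabb
{
    public Vector3 Min;
    public Vector3 Max;

    public Aabb(Vector3 min, Vector3 max) { Min = min; Max = max; }
}

public static class AabbTransform
{
    // Arvo's method for an affine matrix in row-vector convention (v * M):
    // transform the center, then rebuild the half-extents from |M|.
    // Everything is a struct, so nothing is allocated on the heap.
    public static Aabb Transform(in Aabb box, in Matrix4x4 m)
    {
        Vector3 center = (box.Min + box.Max) * 0.5f;
        Vector3 extent = (box.Max - box.Min) * 0.5f;

        Vector3 newCenter = Vector3.Transform(center, m);
        Vector3 newExtent = new Vector3(
            Math.Abs(m.M11) * extent.X + Math.Abs(m.M21) * extent.Y + Math.Abs(m.M31) * extent.Z,
            Math.Abs(m.M12) * extent.X + Math.Abs(m.M22) * extent.Y + Math.Abs(m.M32) * extent.Z,
            Math.Abs(m.M13) * extent.X + Math.Abs(m.M23) * extent.Y + Math.Abs(m.M33) * extent.Z);

        return new Aabb(newCenter - newExtent, newCenter + newExtent);
    }
}
```

The same arithmetic works with XNA's Matrix and Vector3, writing the result into the pre-allocated tempBoxes entry instead of returning a new box.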

XNA 4.0 dual paraboloid reflection mapping

Graphics Runner did a tutorial on dual paraboloid reflection mapping three years ago, and its sample code used an older version of XNA. I've ported it completely to XNA 4.0, using the current conventions for graphics rendering and pixel shaders.

Dual paraboloid maps are simpler to implement and more efficient than traditional cube maps for reflections, since the environment is rendered in two passes (one per hemisphere) instead of six. Like cube maps, they are a view-independent way of rendering reflections. The trade-off is lower-quality reflections in exchange for the speed, but the results are still pretty good. The original blog article does a good job of explaining the math behind the mapping.
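
The core lookup is short: each hemisphere of directions is projected onto a paraboloid, which reduces to dividing x and y by 1 + |z| and remapping the result from [-1, 1] to [0, 1]. Below is a CPU-side C# sketch of that lookup (the sample does it in the pixel shader), assuming +z is the front hemisphere and a plain remap with no axis flips; the original sample's conventions may differ. System.Numerics types stand in for XNA's Vector2 and Vector3.

```csharp
using System;
using System.Numerics;

public static class DualParaboloid
{
    // Maps a direction to texture coordinates on one of the two paraboloid maps.
    // Assumed convention: front map covers z >= 0, back map covers z < 0.
    public static (bool IsFront, Vector2 Uv) DirectionToUv(Vector3 dir)
    {
        Vector3 d = Vector3.Normalize(dir);
        bool isFront = d.Z >= 0f;

        // Paraboloid projection: x / (1 + z) for the front, x / (1 - z) for the back.
        float denom = 1f + Math.Abs(d.Z);
        Vector2 p = new Vector2(d.X / denom, d.Y / denom); // lies in [-1, 1]

        // Remap from [-1, 1] to [0, 1] texture space.
        return (isFront, new Vector2(p.X * 0.5f + 0.5f, p.Y * 0.5f + 0.5f));
    }
}
```

A direction straight along either axis of the paraboloid lands in the center of its map, and directions on the z = 0 rim land on the edge of the front map.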

Here are some of the notable differences with the updated version:

  • A simple quad mesh replaces the quad rendering class
  • The deprecated ColorClamp sampler state was removed

You can view the source at GitHub, or download the sample here. It is ready to work with your XNA 4.0 projects.

XNA 4.0 parallax occlusion mapping

The XNACommunity CodePlex site, run by a group of Spanish hobbyist game developers, has a huge collection of programs and code samples that you can use freely in your own projects. However, most of them have not been updated for XNA 4.0. I was mostly interested in the Parallax Occlusion Mapping sample, so I decided to see if I could update the code.

Parallax mapping is a more complex way to make textures pop out. It differs from normal and bump mapping in that it offsets the texture coordinates per pixel, based on the view direction and a height map, so surface details appear to have real depth and shift as the viewpoint changes, instead of only changing the lighting through the normals.
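
The heart of parallax occlusion mapping is a short search along the view ray through the height map. Below is a CPU-side C# sketch of that per-pixel loop (the sample's effect file does this in HLSL): it steps through fixed-size depth layers until the ray dips below the surface, then interpolates between the last two samples to refine the hit point. The method name, the depthAt callback, and the convention that depth 0 is the top surface and 1 the deepest point are assumptions for illustration, not the sample's code.

```csharp
using System;

public static class ParallaxOcclusion
{
    // (vx, vy, vz): tangent-space direction from the surface toward the eye, vz > 0.
    // depthAt(u, v): assumed depth map, 0 at the top surface, 1 at the deepest point.
    public static (float U, float V) OffsetUv(
        float u, float v, float vx, float vy, float vz,
        Func<float, float, float> depthAt, float heightScale, int layers)
    {
        float layerDepth = 1f / layers;

        // UV shift per layer: march across the surface away from the viewer.
        float du = -vx / vz * heightScale * layerDepth;
        float dv = -vy / vz * heightScale * layerDepth;

        float cu = u, cv = v, currentLayer = 0f;
        float depth = depthAt(cu, cv);
        if (depth <= currentLayer)
            return (u, v); // the ray hits the surface immediately

        // Step until the ray's depth passes the sampled surface depth.
        int steps = 0;
        while (currentLayer < depth && steps < layers)
        {
            cu += du;
            cv += dv;
            currentLayer += layerDepth;
            depth = depthAt(cu, cv);
            steps++;
        }

        // Linear interpolation between the last two samples.
        float pu = cu - du, pv = cv - dv;
        float after = depth - currentLayer;
        float before = depthAt(pu, pv) - (currentLayer - layerDepth);
        float denom = after - before;
        if (denom == 0f)
            return (cu, cv);

        float weight = after / denom;
        return (pu * weight + cu * (1f - weight), pv * weight + cv * (1f - weight));
    }
}
```

More layers give a more accurate hit at a higher per-pixel cost, which is why shaders often scale the layer count with the viewing angle.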

The original sample included a particle generator, which added a bit of flair to the scene. I couldn't get it working because point sprites were removed in XNA 4.0 and there's no drop-in equivalent. The best I could do was make the generator produce black lines :-/ Its particle system was too complex to simply switch from point sprites to billboarded quads, so I decided to omit it altogether, since it isn't relevant to the real purpose of the sample.
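
For reference, the usual replacement for a point sprite is a quad whose corners are offset from the particle's center along the camera's right and up vectors (for example, the first two rows of the inverse view matrix), so it always faces the viewer. This is a minimal sketch of that corner expansion, not code from the sample; the Billboard class is hypothetical and System.Numerics Vector3 stands in for XNA's. It writes into a caller-owned array so the buffer can be reused every frame without creating garbage.

```csharp
using System;
using System.Numerics;

public static class Billboard
{
    // Fills a caller-owned array with the four corners of a camera-facing quad.
    // cameraRight and cameraUp are assumed to be unit vectors in world space.
    public static void BuildCorners(Vector3 center, float size,
        Vector3 cameraRight, Vector3 cameraUp, Vector3[] corners)
    {
        if (corners == null || corners.Length < 4)
            throw new ArgumentException("Need room for four corners.", nameof(corners));

        Vector3 r = cameraRight * (size * 0.5f);
        Vector3 u = cameraUp * (size * 0.5f);

        corners[0] = center - r + u; // top-left
        corners[1] = center + r + u; // top-right
        corners[2] = center + r - u; // bottom-right
        corners[3] = center - r - u; // bottom-left
    }
}
```

Each particle then needs four vertices and two triangles instead of one point, which is the main reason the conversion touches so much of a particle system.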

Aside from the particles, porting the program was a breeze: just a few tweaks to the effect files, plus removing unneeded rendering functions and replacing others. Press the space bar to switch between no mapping, normal mapping, and parallax mapping, and use WASD or the arrow keys to move around.

You can download the updated sample here, or check out the source at GitHub. It is ready to work with your XNA 4.0 projects. Please drop a line if you found it helpful!