2018-06-19

Optimizing Metal graphics and compute code on an iPad Pro

TL;DR: I got the Sponza scene with 2048 point lights running on an iPad Pro at 60 FPS.

My engine has supported Metal for a long time, but I haven't really optimised the Metal renderer before. In this post I go through the process of optimising the Sponza scene with 2048 dynamic point lights from non-interactive frame rates to 60 FPS on an iPad Pro 10.5". I learned a lot and want to share what I found.

Apple's profiling tools

I started my optimisation journey by taking a GPU frame capture in Xcode. Right away I saw that Apple has added a lot of useful features that were missing or less informative in the past. I was delighted to see that it now has GPU counters, timings on source lines and optimisation tips ("Remarks"). Setting a conditional breakpoint to capture a GPU frame at a fixed frame number got me slightly more consistent results than randomly pressing the capture button. A very good feature is the ability to edit a shader during the capture and see the results without restarting. Xcode and its associated tools are still buggy and crash often. I also often got an annoying "No capture boundary detected" error.

Initial capture

The starting point doesn't look good: 16 FPS. My Forward+ light culling takes a whopping 55.95 ms and uses 3 600 628 000 ALU ops. Let's see what we can do. From the source line timing view I saw that calculating the minimum and maximum z-values for a tile is very slow (59.9 %).
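For readers unfamiliar with the depth bounds step: a Forward+ culler typically finds the nearest and farthest depth inside each screen tile so that lights outside that range can be rejected. Below is a minimal CPU-side C++ sketch of the idea, not my actual Metal kernel. All names (DepthBounds, tileDepthBounds, sphereOverlapsBounds) are hypothetical, and it assumes the depth buffer holds linear view-space depth. It shows why the step can be costly: every depth sample of the tile has to be read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical names; a CPU-side illustration, not the engine's shader.
struct DepthBounds
{
    float minZ;
    float maxZ;
};

// Scans every depth sample of one tile to find its nearest and farthest value.
// depth is a row-major width * height buffer of linear view-space depth (assumption).
DepthBounds tileDepthBounds( const std::vector< float >& depth, std::size_t width, std::size_t height,
                             std::size_t tileX, std::size_t tileY, std::size_t tileSize )
{
    assert( tileSize > 0 && depth.size() == width * height );

    const std::size_t x0 = tileX * tileSize;
    const std::size_t y0 = tileY * tileSize;
    const std::size_t x1 = std::min( x0 + tileSize, width );  // Clamp edge tiles.
    const std::size_t y1 = std::min( y0 + tileSize, height );

    DepthBounds bounds{ std::numeric_limits< float >::max(), std::numeric_limits< float >::lowest() };

    for (std::size_t y = y0; y < y1; ++y)
    {
        for (std::size_t x = x0; x < x1; ++x)
        {
            const float z = depth[ y * width + x ];
            bounds.minZ = std::min( bounds.minZ, z );
            bounds.maxZ = std::max( bounds.maxZ, z );
        }
    }

    return bounds;
}

// A point light's sphere can be skipped for a tile when its depth extent misses the tile's range.
bool sphereOverlapsBounds( float centerZ, float radius, const DepthBounds& bounds )
{
    return centerZ + radius >= bounds.minZ && centerZ - radius <= bounds.maxZ;
}
```

Skipping this step means lights are culled against the full depth range of the tile's frustum, which trades more lights per tile for much less work in the culler.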



Optimization

What if I don't use depth bounds? Light culling now takes 13.53 ms and 2 234 827 000 ALUs, and I already got 60 FPS. But let's not stop here! I watched the excellent WWDC 2016 talk Advanced Metal Shader Optimization. Xcode's Remarks section warned about buffer preloading. I tried to fix the warnings but couldn't find a way. My first optimisation was to provide the horizontal and vertical tile counts as uniforms instead of calculating them in the shader. The share of time spent on those code lines decreased from 1.8 % to 1.4 % and the ALU count from 2 234 819 000 to 2 208 126 000. The WWDC video suggested using shorts instead of ints, which brought the count down to 2 205 648 000 ALUs.
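The tile count and narrow integer changes can be sketched on the CPU side roughly like this. It's an illustration in C++, not my engine's code: CullerUniforms, makeCullerUniforms and tileIndex are made-up names, and the tile size is whatever the culler uses. The idea is that the divisions happen once on the CPU and the shader only reads two 16-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical uniform block; the field names are made up for this sketch.
struct CullerUniforms
{
    std::uint16_t tilesX;
    std::uint16_t tilesY;
};

// Computes the tile grid once on the CPU (e.g. on resize), rounding up so
// partially covered tiles at the right and bottom edges are included.
CullerUniforms makeCullerUniforms( unsigned widthPixels, unsigned heightPixels, unsigned tileSize )
{
    assert( tileSize > 0 );

    const unsigned tilesX = (widthPixels + tileSize - 1) / tileSize;
    const unsigned tilesY = (heightPixels + tileSize - 1) / tileSize;

    // The counts must fit into the 16-bit fields.
    assert( tilesX <= UINT16_MAX && tilesY <= UINT16_MAX );

    return { static_cast< std::uint16_t >( tilesX ), static_cast< std::uint16_t >( tilesY ) };
}

// What a shader would then do with the precomputed count: flatten a 2D tile coordinate.
std::uint32_t tileIndex( std::uint16_t tileX, std::uint16_t tileY, const CullerUniforms& uniforms )
{
    return static_cast< std::uint32_t >( tileY ) * uniforms.tilesX + tileX;
}
```

For example, assuming rendering at the iPad Pro 10.5" native resolution of 2224 x 1668 with a hypothetical tile size of 16 pixels, this gives a grid of 139 x 105 tiles, comfortably within 16 bits.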

Future

There's a lot of room for improving the results further, and it's needed. I didn't focus on my material shader in this post because my current engine's shading model is not yet physically based, unlike my old engine's, and I'm not using post-processing effects. I use floats in many places where a half would do. I also don't use texture compression, but my engine already supports ASTC, so it's just a matter of encoding the Sponza textures. My light culler could use parallel reduction or clustered culling.
Metal 2 has some features that could help, like tile shading, function specialization, resource heaps and argument buffers. My plan is to study them next, as the concepts are already familiar to me from other graphics APIs. The engine is open source and can be downloaded from GitHub.
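To illustrate the parallel reduction idea mentioned above, here is a small CPU-side C++ sketch of a tree-style minimum reduction. It's not code from the engine and reduceMin is a hypothetical name. On a GPU each pass of the outer loop would be one barrier-separated step, and the inner loop iterations would be independent threads of a threadgroup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tree-style minimum reduction over a power-of-two number of values.
// The vector is taken by value because it's used as scratch space,
// similar to threadgroup memory on a GPU.
float reduceMin( std::vector< float > values )
{
    // This sketch only handles non-empty, power-of-two sizes.
    assert( !values.empty() && (values.size() & (values.size() - 1)) == 0 );

    for (std::size_t stride = values.size() / 2; stride > 0; stride /= 2)
    {
        // On a GPU these iterations would run in parallel, followed by a barrier.
        for (std::size_t i = 0; i < stride; ++i)
        {
            values[ i ] = std::min( values[ i ], values[ i + stride ] );
        }
    }

    return values[ 0 ];
}
```

With 256 depth samples per tile this takes 8 parallel steps instead of 255 sequential comparisons, which is why it's a common way to compute per-tile depth bounds.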

2018-04-10

My computing habits

I thought my personal computing habits differ from the norm, so I decided to write about how I do stuff, focusing on hardware in this post.

Choosing which device to use

When I decide to do a computing task, e.g. reading a website, this is my preferred order of devices to use: Phone or iPad -> old laptop (2010) -> laptop (2013) -> desktop
I use my iPad Pro 10.5" more than any other device, including PCs. I like its responsiveness, security, small form factor and UX in most of my computing uses. Sadly, I can't really code on it, but I'll keep searching for solutions to that. Maybe WebGL development is possible. The desktop is my least used device. I practically only use it for VR, Vulkan and D3D12 development and some games I don't have on consoles. One of the reasons for this preferred order is power consumption: an iPad uses less power than a laptop. My preferred order of PC operating systems is Ubuntu -> macOS -> Windows.

Slow hardware upgrade cycle

I rarely buy new hardware. For example, some of my HDDs are spinning disks because they still work and I don't want to discard working stuff. Modern hardware is so fast that upgrade cycles can be long. My current laptop is from 2013 and I see no reason to replace it in the foreseeable future. I don't judge people who buy new stuff often, but I personally just don't have a reason to buy a new phone every 2 years or so. My last phone cycle was 4 years and the old one is still in good condition. I only buy new stuff when I have a compelling reason. It's also more environmentally friendly to use devices for a longer time before getting a new one. People often criticise Apple for planned obsolescence, but I beg to differ. Their devices get security updates for a longer time than competitors' and I still actively use a MacBook Pro from 2010, even without an SSD.

If people replaced their hardware less often, software developers would have more incentive to optimise their code and use resources more efficiently. Viznut has written more on this topic and I recommend his writings to every eco-conscious or performance-minded reader.