GUIMark 3 released.

GUIMark 3 has just been released, and it’s aimed squarely at benchmarking mobile devices. The debate this time around concerns the relative speed of Flash and HTML5 on your phone or tablet. With a month and a half in the making, 4 unique tests, and over 200 results, this was a big one, and I hope you guys enjoy reading and dissecting the results. You can read the article here.

GUIMark 3 – Mobile Showdown

Introduction

It’s been exactly one year since GUIMark 2 was created and it seems the natives are growing restless. Over the past few years I have spoken and worked with a few vendors about performance issues in web technologies. Most of this stuff usually stays pretty internal, but this time I’ve gotten a new request straight from Adobe’s QA team: build a new version of GUIMark that’s more comprehensive, focused on mobile, and open to the community. With 200 test results this is definitely the biggest GUIMark yet.

The philosophy for this benchmark is the same as before. Each test has been designed to find the breaking point of your phone or tablet. By forcing the devices to render at less than 60 frames per second we can ensure that stated performance matches actual throughput on the device. In fact, most of these tests were designed to average around 30 fps so there would be plenty of room for future growth. As with previous tests, I’ve only had the time to build these in Flash and HTML5; however, it may be interesting to test native apps at a later date.
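To make the frame rate numbers concrete, here is a minimal sketch of how an average could be computed from per-frame timestamps. The name averageFps and the idea of collecting one timestamp per rendered frame are assumptions for illustration; this is not the code the actual tests use.

```javascript
// Hypothetical helper: average frames per second over a run,
// given one millisecond timestamp per rendered frame.
function averageFps(timestampsMs) {
  if (timestampsMs.length < 2) return 0;
  const elapsedMs = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
  if (elapsedMs <= 0) return 0;
  // N timestamps bound N - 1 frame intervals.
  return ((timestampsMs.length - 1) * 1000) / elapsedMs;
}
```

In a browser you could push Date.now() into the array at the end of every draw call and report the average once the run finishes.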

For the sake of disclosure, I should state that Adobe funded time through my employer EffectiveUI to enable me to write these tests. While the ideas, code, results, and analysis are entirely my own, I understand that people may read in some bias as a result. Please keep in mind that this was designed to be a third-party analysis, not a pat on the back, and I think the results reflect this. Also, if you look at the source code for the two platforms, you’ll see that in most cases the code is line for line identical, only diverging when it comes to platform-specific APIs.

Setup

Nine devices across a range of hardware and firmware were used to run the tests. While the devices provided a good sampling of what’s available on the market, it is by no means a definitive list. Each device was running the latest software available to it, and in the case of Android this represents a moderate amount of fragmentation. While it would be ideal to only test devices with the latest version of Android, the reality of mobile right now demands that we all work against a wide variety of firmwares. Tests on the Honeycomb platform were originally done with the 3.0 version, but with the release of 3.1 at Google I/O I wanted to see how this affected performance. I’ve only listed 3.1 results in the main article below, but the 3.0 results are also preserved in the Google spreadsheet linked at the end.

Lastly, the tests have been designed to force the device into a 480 pixel wide viewport which I feel is a good median resolution for interactive content like a game or chart. They have also been designed to run in portrait mode. The source code for all the tests is contained here, and the directory on my webserver containing all the runnable tests is also available to browse through. Keep in mind that these tests were designed to run on a mobile device, and if you view them in your desktop browser you will likely see them all running at the maximum framerate of 60 fps.

Bitmap Test

First up is a bitmap drawing test that was designed to simulate a scrolling shooter similar to the old Raiden arcade game. The game logic is minimal so this is all about pushing pixels around. Unlike the GUIMark 2 bitmap test, this new version doesn’t include scaling, anti-aliasing, rotation or half pixel compositing. This is just a straight up blitting test, which is more in line with how game developers would optimize their drawing code for handheld games. Like previous tests I’ve done, this one runs on absolute timing for the position of elements in the game. This means slower devices don’t run the test slower; instead the rendering just looks more choppy.
Also, one of the comments on GUIMark 2 was that HTML5 should draw faster when the source image was cached to a separate canvas first, so I’ve included 2 versions of the HTML5 test to investigate this theory.
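The absolute timing idea mentioned above can be sketched in a few lines. This is only an illustration under my own assumptions: positionAt, simulate, and the 120 pixels per second speed are made-up names and values, not taken from the test source.

```javascript
// Position depends only on elapsed time, never on how many frames were drawn.
function positionAt(startY, speedPxPerSec, elapsedMs) {
  return startY + speedPxPerSec * (elapsedMs / 1000);
}

// Simulate a device that manages one frame every frameIntervalMs.
// The speed of 120 px/s is an arbitrary value picked for illustration.
function simulate(frameIntervalMs, durationMs) {
  let framesDrawn = 0;
  let y = 0;
  for (let t = 0; t <= durationMs; t += frameIntervalMs) {
    y = positionAt(0, 120, t);
    framesDrawn++;
  }
  return { framesDrawn, y };
}
```

A fast device (simulate(20, 1000)) and a slow one (simulate(100, 1000)) end up at the same position after one second. The slow one simply drew fewer frames along the way, which is what shows up as choppiness.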

Device        HTML5     HTML5 Cache   Flash
Phones
Droid X       14.98     15.93         31.7
Nexus One     16.57     17.86         36.89
Desire HD     19.6      23.08         43.66
Atrix         26.47     30.19         32.34
iPhone 4      14.57     17.24         -
Tablets
PlayBook      10.7      10.68         25.78
Galaxy Tab    *19.17    *19.98        35.99
Xoom          *24.07    *19.66        26.62
iPad 2        *19.88    *28.01        -
Numbers in frames per second (- = no result)

I actually expected HTML5 to do much better on this test. This is blitting 101 stuff here, no fancy transforms or anti-aliasing, just straight up compositing. Flash on the other hand chews through it without a problem. For the most part HTML5 does seem to benefit from caching image data to a canvas first and copying pixel data from there to the final canvas, although the benefits weren’t universal. The asterisks in the results for the 3 tablets are explained further below.
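For anyone who hasn’t tried the caching approach, here is a rough sketch of what it can look like. The function names and the createCanvas parameter are my own assumptions for illustration (the parameter only exists so the code can run outside a browser), and the rounding to whole pixels is likewise an assumption rather than something lifted from the test source.

```javascript
// Copy a source image into an offscreen canvas once, so later frames
// can blit from the canvas instead of from the original image.
// createCanvas is a hypothetical injection point; in a browser it
// defaults to document.createElement("canvas").
function cacheToCanvas(image, createCanvas = () => document.createElement("canvas")) {
  const canvas = createCanvas();
  canvas.width = image.width;
  canvas.height = image.height;
  canvas.getContext("2d").drawImage(image, 0, 0);
  return canvas;
}

// Per-frame blit from the cached canvas to the visible canvas context.
function drawSprite(ctx, cachedCanvas, x, y) {
  // Rounding keeps the blit on whole pixels (an assumption made here
  // to avoid sub-pixel compositing).
  ctx.drawImage(cachedCanvas, Math.round(x), Math.round(y));
}
```

The cache is built once at load time, and drawSprite is the only call left in the render loop.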

Vector Test

I think that in terms of ‘real world’ tests the original GUIMark 2 vector test better represents the type of things people will use the vector APIs for, so this time I wanted to do something more fun. This new test is more akin to a Processing demo, something I imagine accompanied by a cool audio tone generator and posted on a site like chromeexperiments.com. It also gives us the chance to compare complex vector fills and gradients that were left off the GUIMark 2 vector test. This test runs off absolute timing just like the bitmap test.

Device        HTML5     Flash
Phones
Droid X       8.47      26.59
Nexus One     7.59      26.07
Desire HD     8.96      28.12
Atrix         11.86     57.85
iPhone 4      9.7       -
Tablets
PlayBook      10.4      26.36
Galaxy Tab    4.53      20.27
Xoom          11.47     19.81
iPad 2        14.65     -
Numbers in frames per second (- = no result)

Even with only a handful of shapes on screen at a time, this test is pretty devastating to the drawing APIs on both platforms. You can barely even detect the complexity of the gradient in the HTML5 version on mobile. Without having a desktop browser to validate it with, I would have thought the gradients were completely missing when testing on the phones. Flash manages to keep a pretty sizable lead over HTML on this test. Honestly I’m not surprised by this fact since vector drawing has been the keystone of the Flash runtime since its inception.

Compute Test

Since I’m a graphics geek I’ve resisted doing straight up compute tests like the popular SunSpider before. I tend to be more interested in visual complexity than algorithmic complexity in my day to day work. Despite this I wanted to find a test that would really stress number crunching while still providing a good visual metaphor. This left me with two obvious choices, physics and AI simulations, and ultimately I went with a flocking simulation that proved to be easy to strip down and port to JavaScript. This test is very heavy on Euclidean vector math and array iterations, and is an absolute killer on the processor even with relatively few boids being simulated. Plus, it allows for a lightweight visual component to validate whether the output is behaving correctly. This is the only test that provides an option to disable rendering to see the performance difference between code execution and rendering.
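To give a feel for why flocking is so heavy on vector math and array iteration, here is a minimal sketch of a single simulation step. It is a generic boids update (cohesion, alignment, separation) with a naive all-pairs neighbor check. The name stepFlock and every radius and weight are assumptions for illustration, and this is not the code used in the benchmark.

```javascript
// Minimal 2D flocking step: cohesion, alignment, and separation.
// Every radius and weight below is an illustrative assumption.
function stepFlock(boids, dt, opts = {}) {
  const {
    radius = 50,      // how far a boid can "see" neighbors
    minDist = 10,     // closer than this triggers separation
    cohesion = 0.01,
    alignment = 0.05,
    separation = 0.1,
  } = opts;

  // map() builds a new array, so every boid reads the same previous state.
  return boids.map((b) => {
    let sumX = 0, sumY = 0, sumVx = 0, sumVy = 0, pushX = 0, pushY = 0, n = 0;
    for (const o of boids) {
      if (o === b) continue;
      const dx = o.x - b.x;
      const dy = o.y - b.y;
      const dist = Math.hypot(dx, dy);
      if (dist > radius) continue;
      sumX += o.x; sumY += o.y;
      sumVx += o.vx; sumVy += o.vy;
      if (dist < minDist) { pushX -= dx; pushY -= dy; }
      n++;
    }
    let vx = b.vx;
    let vy = b.vy;
    if (n > 0) {
      vx += (sumX / n - b.x) * cohesion + (sumVx / n - b.vx) * alignment + pushX * separation;
      vy += (sumY / n - b.y) * cohesion + (sumVy / n - b.vy) * alignment + pushY * separation;
    }
    return { x: b.x + vx * dt, y: b.y + vy * dt, vx, vy };
  });
}
```

The inner loop visits every other boid for each boid, so the work grows with the square of the flock size. That is one reason even a small flock can keep a mobile CPU busy.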

Device        HTML5     HTML5 Disabled   Flash     Flash Disabled
Phones
Droid X       3.36      4.77             13.36     16.87
Nexus One     4.93      6.66             20.12     26.77
Desire HD     5.61      9.54             21.75     27.99
Atrix         7.77      11.09            29.29     46.24
iPhone 4      7.51      10.46            -         -
Tablets
PlayBook      10.03     21.45            24.26     27.85
Galaxy Tab    1.71      4.66             17.38     23.06
Xoom          11.24     16.47            26.28     46.29
iPad 2        13.91     21.38            -         -
Numbers in frames per second (- = no result; "Disabled" = rendering disabled)

This test was especially punishing on the Galaxy Tab for some reason, and the general deltas between Flash and HTML5 are larger here than in any other test. While AS3 and JavaScript are nearly identical as languages, I’m guessing what we see here is the real difference between static and dynamic languages. Browser vendors have been putting a lot of effort into making JavaScript as fast as possible, but at the end of the day the biggest limitation for speed gains may be the lack of explicit typing. I was also surprised to see the performance gain just from disabling rendering for this test. Those 100 or so small lines nearly halved the performance in some cases. It really illustrates just how much visuals affect general software performance.

Video Test

Last time I gave up on my attempt to compare video performance because there was no way to retrieve frame rate information from the system. This time around I decided the only way to make this work was with a high speed camera. By encoding the frame data directly into the video and putting it under a high speed camera, we can objectively record how often frame data is being dropped from the render queue. The better the decoding engine, the fewer dropped frames we should see.
This test is a bit different from the others for a couple of reasons. Video tends to follow standards and decoding chips are designed around those standards. Performance doesn’t scale linearly like standard CPU bound benchmarks, and you’ll reach a point where the decoder hits a brick wall. It’s more important to test those standards than to compare everything against a single heavy stream (which would be more akin to the tests above). With that in mind I’ve created four tests that stick close to YouTube encoding standards, using the following video profiles.

H.264 Base Profile, 360p video at 768 kbps: Level 1.3 video, good for ‘lowest common denominator’ testing.
H.264 Base Profile, 480p video at 1250 kbps: maximum video size you’d expect to see delivered to a phone, between Level 1.3 and Level 2.0.
H.264 Base Profile, 720p video at 2000 kbps: large size Base Profile encoding, appropriate for tablet devices but good for stressing phones.
H.264 High Profile, 720p video at 2000 kbps: High Profile video at 2 Mbps, maximum detail that you can expect to see on a mobile device for years to come.

The chart below lists the percentage of video frames that are displayed by each platform. If you want to look at the high speed results for each device, you can view them all in the results directory.

              HTML5                                     Flash
Device        360p    480p    720p    720p High         360p    480p    720p    720p High
Phones
Droid X       100%    100%    99%     69%               100%    100%    6%      9%
Nexus One     100%    100%    99%     won’t play        100%    100%    7%      11%
Desire HD     100%    100%    100%    99%               99%     72%     7%      10%
Atrix         100%    100%    100%    won’t play        100%    100%    100%    34%
iPhone 4      100%    100%    100%    99%               -       -       -       -
Tablets
PlayBook      100%    100%    100%    100%              100%    100%    100%    98%
Galaxy Tab    100%    100%    100%    100%              99%     64%     8%      9%
Xoom          100%    100%    100%    100%              100%    100%    100%    97%
iPad 2        100%    100%    100%    100%              -       -       -       -
Numbers in percentage of frames played (- = no result)

Please note: the Gingerbread release for the Galaxy Tab enabled hardware acceleration for Flash video. While the numbers have been near 100% since the update on 5/16, I didn’t have time to rerecord the test and parse the results.

Before you ask, yes I actually sat through all of these high speed videos and counted individual frame skips, and it was thoroughly painful. Maybe next time I’ll wise up and write an image analysis program to do it for me. Subjectively, I would argue that video that stayed above 70% looked good during playback. Anything below that mark will have too much stutter and really starts looking like crap.
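If the frame numbers read off the high speed footage were typed into a list, the percentage could be computed rather than counted by hand. The sketch below assumes frames are numbered 1 through totalFrames and that each camera capture yields one frame number; both function names and that numbering scheme are my assumptions, not part of the actual test setup.

```javascript
// observedFrames: frame numbers read off successive high speed captures
// (repeats are expected, since the camera outpaces the video).
// totalFrames: frames in the source video, assumed numbered 1..totalFrames.
function percentFramesPlayed(observedFrames, totalFrames) {
  if (totalFrames <= 0) return 0;
  const seen = new Set();
  for (const f of observedFrames) {
    // Ignore misreads that fall outside the valid frame range.
    if (Number.isInteger(f) && f >= 1 && f <= totalFrames) seen.add(f);
  }
  return (seen.size * 100) / totalFrames;
}

// Lists the frame numbers that never appeared in the footage.
function skippedFrames(observedFrames, totalFrames) {
  const seen = new Set(observedFrames);
  const missing = [];
  for (let f = 1; f <= totalFrames; f++) {
    if (!seen.has(f)) missing.push(f);
  }
  return missing;
}
```

Reading the frame numbers out of the footage automatically is the hard part, and this sketch doesn’t attempt it.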

Flash really takes a beating in this category, as many of these devices only allow software decoding for Flash video. You can clearly see which devices are enabled for hardware decoding, like the PlayBook, Xoom, and Atrix. Adobe has informed me that exposing hardware requires Google and the manufacturer to deliver the appropriate drivers, which becomes evident when viewing the performance differences between Xoom 3.0 and 3.1. HTML5 video on the other hand seems to be fully hardware accelerated on all of the phones, although interestingly HTML5 won’t fall back to a software renderer for certain files, and simply refuses to play the video.

What’s wrong with the tablet results?

Every time I build these tests there’s always some hidden problem that I didn’t expect, and this time is no different. You will notice that in the HTML5 bitmap tests I had to place an asterisk next to the frame rate numbers for three of the tablets. The reason is that the frame rate reported by the device is extremely inaccurate. With the high speed camera we can see just how far off the numbers really are.

*Note that the Xoom in the video is running 3.0, and while this affects 3.1 as well, I didn’t have time to recapture it on video.

While the tablets showed the most dramatic problems here, I’m pretty sure I saw it manifest on the Atrix as well, just to a lesser degree. The behavior doesn’t seem to show up in either the vector or compute tests, and none of the Flash tests show this problem either. The PlayBook also doesn’t seem to have this problem. My best guess is we’re seeing a problem with WebKit image rendering, with the browser run loop falling out of sync with the GPU somehow. Hopefully someone out there can shed some light on this problem.

Stats Roundup

This test was much bigger than anything I’ve done before, and we’re not done yet. I went back to my old GUIMark 2 tests and ran them as well to provide even more numbers to slice and dice. I think those old tests are still perfectly valid and even show how a couple of the devices have improved since they were first tested.

The results of all the tests are broken down on this Google spreadsheet. GM3 refers to the current tests and GM2 refers to the original GUIMark 2 mobile tests.

The Motorola Atrix clearly stands out for overall performance among the phones. On the tablet side the PlayBook took the lead for Flash performance, and while the Xoom posted the highest numbers for HTML5, the truth is that a few of those tests should have their numbers halved since the device isn’t rendering to the screen at the same rate as the listed fps. In terms of interactive content overall, it’s safe to say that Flash maintains a 2x performance lead over HTML5 on average.

The video side tells a different story. All of the devices are able to chew through the full suite of HTML5 videos with only a few exceptions. Flash however is riding out a transition period in which some devices offer hardware acceleration while others fall back to software decoding.

Final Thoughts

There’s a lot of information to absorb here, and hopefully some of the finer points will be fleshed out in the comments, but here’s a quick summary of my thoughts after working on this test.

1. The Flash VM performs really well on mobile chipsets and I don’t see any evidence here to support the idea that Flash is slow on smartphones and tablets. High end videos are below par at the moment, but the 3.1 release of Honeycomb illustrates that firmware updates are the key to solving this issue.

2. I have a sinking feeling that browser vendors are happy enough with current Canvas 2D performance. The performance deltas between Flash and Canvas are nearly the same as they were a year ago when I released GUIMark 2. Maybe I’m wrong but all I hear about in tech circles is improvements in CSS and SunSpider performance.

3. If you’re going to make a JavaScript game, create a Canvas-based sprite sheet of all your assets. The performance boost may only be marginal, but it seems to be worth it. Also be aware of this issue that is causing WebKit to get out of sync with the rendering engine.

4. I wanted to include a Windows Phone 7 device in this review, but the browser couldn’t handle any of these tests. If anyone has access to BlackBerry or Palm phones I’d be happy to include them in the spreadsheet as well; just add your results to the comments below.
