Good afternoon everyone.

My name is John Hess. Today I'm going to be joined by Matthew Lucas, and we are going to be talking to all of you about practical approaches to great app performance.

Now, I'm an engineer on the Xcode team, and I've had the luxury of spending the last several years focused on performance work.

First, with Project Find, and Open Quickly, two areas of Xcode that treat performance as the primary feature.

Most recently, I've had the opportunity to do a survey of Xcode GUI responsiveness, and I want to share with you the approaches that I take to performance work, both in code that I'm intimately familiar with, and in code that I'm just experiencing for the first time.

Now, if I could get everyone in today's presentation to just take one lesson away, it is that all of your performance work should be based on measurement.

Before you start solving a performance problem, you should measure, to establish a baseline so you know where you stand.

As you iterate on solving a performance problem, you should measure it each step of the way to ensure that your performance changes are having the impact that you expect.

When you're done solving a performance problem, you should measure again, so that you can compare to your original baseline, and make a quantified claim about just how much you've improved the performance of your application.

You want to share this with your boss, your colleagues, and your users.

Now, when you think about improving performance for your users, you need to think about what I like to call the total performance impact.

If you improve the functionality and performance of one area of your application, by 50%, but it's something that just 1% of your users encounter, that does not have nearly the breadth of impact as improving some other feature by just 10% that all of your users use all the time.

So make sure you're not optimizing edge cases, and make sure that your changes are impacting all of your users.
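As a rough worked example of the total performance impact idea, the comparison above can be put into numbers. This is only a sketch: weighting an improvement by the share of users who encounter it is a simplifying assumption, and `total_impact` is a hypothetical helper, not something from the talk.

```python
def total_impact(improvement: float, share_of_users: float) -> float:
    """Improvement averaged over all users; both inputs are fractions."""
    return improvement * share_of_users


# 50% faster, but only 1% of users ever hit this feature.
edge_case = total_impact(0.50, 0.01)

# 10% faster, and every user hits this feature.
common_path = total_impact(0.10, 1.00)

print(f"edge case:   {edge_case:.3f}")
print(f"common path: {common_path:.3f}")
```

Under this weighting the 10% improvement to the common path is worth about twenty times the 50% improvement to the edge case.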

Now how do we fix performance bugs? Well, how do we fix regular bugs? Normally it starts with some sort of defect report from users, and we take this report of the application not behaving the way that people expect, and we find some way to synthesize steps to reproduce so that we can cause the failure at will.

Once we've done this, we attach a debugger to our program, so that we can see just what our program is doing while it is misbehaving.

We combine that with our knowledge of how the code is supposed to work, to modify it as necessary and eliminate the undesired behavior.

We verify that we haven't introduced any unwanted side effects, and we repeat as necessary until we've completely solved the bug.

I've fixed performance bugs in just the same way.

Except instead of using a debugger, I use a profiler, and a profiler is just a fancy tool for measuring.

I find some set of steps to reproduce the program being slow.

And I run those steps with a profiler attached, so that I can get an insight into what my code is doing while it's running slowly.

I combine that knowledge with what my program has to do to accomplish the task at hand, and I find steps that are happening unnecessarily and remove them, because the primary way you make your code faster is you remove redundant steps from whatever it is that it is calculating.

Now, I make the modifications to the source code, and I repeat and measure as necessary until I'm happy with the total result.

When I'm doing this type of performance work, I often find myself in one of a handful of scenarios.

And these different scenarios change the way that I go about testing the code in question to reproduce the bugs.

Sometimes I'm up against a big performance regression, right? Everything was moving along smoothly, then someone checked something in on our team, maybe it was me, and performance has fallen through the floor, and now we have to go back and find out what caused this regression.

If this regression is very pronounced, or it's in an area that I don't think it's likely to regress again in the immediate future, I may just test it with my hands, manually, with the profiler attached.

However, your performance victories are going to be hard-won battles, and they can easily be lost through a slow stream of regressions.

I would encourage all of you to write automated performance tests to capture your app's performance, so that you can ensure that it's not regressing over time.
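A minimal sketch of what such an automated performance test might look like follows. In an Xcode project you would more likely use XCTest's measure blocks; the Python below only illustrates the shape of the idea. The names `measure`, `has_regressed`, and `work_under_test`, and the 10% tolerance, are all assumptions made for illustration.

```python
import statistics
import time


def measure(fn, runs: int = 5) -> float:
    """Return the median wall-clock time of `fn` over several runs, in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def has_regressed(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """True when `current` is slower than `baseline` by more than `tolerance`."""
    return current > baseline * (1.0 + tolerance)


def work_under_test():
    # Hypothetical stand-in for the feature being measured.
    sum(i * i for i in range(10_000))


elapsed = measure(work_under_test)
print(f"median: {elapsed * 1000:.3f} ms")
```

The baseline would be recorded once and stored alongside the test, so that later runs can call `has_regressed(baseline, elapsed)` and fail when performance slips.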

Another scenario I often find myself in is that our application is performing the same as it has been for a long time. Maybe it is running at 45 frames a second in some drawing test, but we expect it to run at 60. It needs to be improved marginally, and we have reason to believe through our previous performance work that we can get there through spot fixes and incremental changes.

Now, in this type of scenario, I probably also have automated tests already in play, because I understand my performance over time. And in a third scenario, our application is just suffering from a poor design and performance is orders of magnitude worse than it should be.

We know that we can't improve it with simple spot fixes, because we've tried them in the past, and we are still stuck here with a very sub-par performance.

In a situation like this, you'd want to do a total performance overhaul, where you are redesigning some core part of the feature, or the algorithms in question, so that performance is a primary constraint.

And definitely in these cases, you would have performance tests to measure that you're actually hitting your performance targets.

Now, it is important that you know just what to test.

I want to caution you that I don't ever immediately jump to these sort of performance overhauls as a way of fixing a performance problem.

I love to do that. It's sort of greenfield engineering, where you get to design things from the ground up, but it's very risky.

You're going to end up with a better product at the end, but it's going to be a turbulent path getting there as you rework an entire feature. When you're doing this style of work, it is imperative you understand not only the functional constraints of the code in question, but also the performance constraints, and the typical use patterns that your users are most frequently applying to this feature, and you only get that by having done performance work in the area in the past.

I'd like to share an anecdote about our work on a situation like this, within Xcode.

In Xcode 9, we reworked Project Find, with performance as a primary goal.

It was our goal to deliver search results in just tens of milliseconds.

When we were going to discuss this feature with our colleagues, we were often challenged to perform searches across large projects for things like string, or even the letter E. Things that produce millions of results, right? And certainly if our application could produce millions of results quickly, it would be fast on anything.

But if you consider what typical patterns are, we search for APIs we use, the names of our own classes, the names of, you know, images that we're referencing. Things like that. They produce dozens, maybe hundreds of results.

Certainly, it is essential that the application works decently when you get a million results, but the normal use case is hundreds of results.

Now, some of your work in doing a task like search is going to be proportional to things like generating the raw results, and other work is going to be based on how efficiently you can index the text in the project, and avoid work in the first place.

In these two scenarios, you're likely to have completely different targets for what you would optimize to make one of these searches faster than the other, right? So it's essential that you understand how your users are going to use the product, so that you can optimize for the right cases.

Now, in all of these cases, I need to do some form of testing, whether it's manual, or automated.

I want to share with you two types of performance tests that I will typically write to measure the performance of Xcode.

We will either do unit tests, or integration tests.

Let's compare and contrast them.

In a performance unit test, it's your goal to isolate some feature of your application and measure it all by itself.

You might mock out its dependencies, and you might launch it in a context where it has been isolated.

If I were to write performance unit tests for Xcode's code completion, I might write a series of three small tests.

One of these tests would measure talking to the compiler and getting the raw results, the raw set of code completion candidates back.

Another performance test would measure correlating, ranking and scoring those results, so we knew which ones to display to the user.

A third test might take those already prepared results, and measure putting them into UI elements for final display. And in covering all three of these areas, I would have pretty good coverage over the major components of code completion in the IDE.

Now, there are some great aspects to these performance unit tests.

They're going to be highly focused, which means if they regress in the future, I'm going to have a very good idea on where the regression is, because the code that is running has been scoped so well.

They are also going to produce much more repeatable results from run to run. They're not going to have a big variance in the times that they produce.

Again, because the code is so focused.

Now, let's contrast that to an integration test.

In an integration test, your job is to measure the performance of your application as your users experience it.

Holistically.

So, if I was writing code completion unit tests for Xcode, I'm sorry, integration tests, I would launch the full Xcode app. I would open a source file. I would navigate to the source file, and I would type, and I would bring up code completion over and over again.

When I profile this, to see what Xcode is doing, and how much time it is taking, I am going to find that this test is anything but focused and quiet.

Xcode is going to be doing drawing and layout as I type. It is going to be doing syntax coloring as I type.

In the background, it might be indexing, fetching Git status, deciding to show new files in the Assistant Editor, and all of these things are going to be competing for CPU resources, along with code completion.

Maybe when I look in the Profiler, I'll see that we spend 80% of our time syntax coloring, and 20% of our time in code completion.

And with this data, I would know that the best way to improve code completion performance would be to defer syntax coloring.

I will never gain that type of knowledge with a highly focused unit test. So if I can get everyone here to take two things away from this presentation, the second one should be that your performance investigations should absolutely start with these wide integration tests that measure how the users experience your application.

So I'm talking about testing, measuring and profiling.

And right now, I'd like to introduce you to profiling in Xcode with Instruments. Let's head over to the demo machine.

Today we are going to be looking at a performance problem that we fixed between Xcode 9 and Xcode 10. I want to show it to you.

I'm going to launch Xcode 9, and open our solar system application.

Now the problem that we are going to be looking at is creating tabs.

I'm going to just press Command-T quickly a couple of times, and as you can see, the whole screen flashes black, and it takes several seconds to create those tabs.

That definitely doesn't meet my expectations as far as performance goes, and we need to fix this.

So let's take a look at how you would do that.

First, I'm going to launch Instruments. That is our profiling tool.

You can do that from the Xcode menu, under Open Developer Tool, Instruments.

Now, I'm currently in Xcode 9, so if I choose this, it's going to launch the Instruments from Xcode 9, and of course, I want the Instruments from Xcode 10, which I've put here in my Dock.

So I'm going to hide Xcode, and bring up Instruments. Now, when Instruments launches, we're presented with a list of profiling tools that we could use to measure our application.

There's all kinds of tools here.

They can measure graphics utilization, memory consumption, IO, and time in general.

It can be intimidating to know which one of these profilers to start with.

I would encourage all of you, if you just learn one of these tools, it should be the Time Profiler.

I use it for 95% or more of my performance work.

When your users complain about your app being slow, they're complaining about it taking too long, and long is time.

If it turns out that you're slow because you're doing too much IO, that is going to correlate with time, and you will be able to see this with the Time Profiler. So if you learn just one instrument, it should be the Time Profiler.

Let's take a look at how that works.

I'm going to launch the Time Profiler by just double-clicking on it here, and make Instruments take the full desktop.

Now, we'd like to record Xcode.

In the upper left-hand corner of the Instruments window, you can control which process you're going to attach to and record.

By default, hitting this record button would record all processes on my Mac.

I just want to focus on Xcode.

I'll switch this popover to Xcode and hit record.

Now, I like to keep an eye on this area of the window, the track view, while I'm recording.

So I'm going to resize the Xcode window to be a little shorter, so I can still see that, and then I'm going to do the thing that was slow. I'm going to create a couple more tabs.

And you can see the graph changed here. Now, I'm going to go ahead and quit, and return to Instruments.

So what just happened? While the Profiler was running, it was attached to our process like a debugger. And it stopped it, thousands of times per second, and as it was stopping it, it gathered back traces.

Now, just a reminder, a back trace is a description of how your program got to where it currently is.

So if you're on line 6 of function C and you got there because main called A, called B, called C, then your back trace is Main, A, B, C.

When Instruments captures one of these back traces, it notes, hey, we just spent one millisecond in function C. It says one millisecond because that is our sampling interval: we are recording once every millisecond.

Now, on the main thread, all these back traces are going to start with the Main function, and they're probably going to call Application Main, and they're going to branch out, all through your source code after that.

We can collapse these back traces together, and overlay them into a prefix tree, so they start at Main and work their way out. And we can bubble up those millisecond counters that we captured at the top, so that we can hierarchically see how much time was spent in all the different areas of our source code.
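To make the prefix tree idea concrete, here is a small sketch of how sampled back traces could be overlaid into a call tree with per-frame counters. It is an illustration under stated assumptions, not how Instruments is implemented: `Node`, `build_call_tree`, and `dump` are hypothetical names, and each sample is treated as one millisecond to match the sampling interval described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    samples: int = 0  # samples taken in this frame or in anything it called
    children: dict = field(default_factory=dict)


def build_call_tree(backtraces):
    """Overlay back traces (outermost frame first) into one prefix tree."""
    root = Node("<root>")
    for trace in backtraces:
        root.samples += 1
        node = root
        for frame in trace:
            node = node.children.setdefault(frame, Node(frame))
            node.samples += 1
    return root


def dump(node, depth=0, lines=None):
    """Render the tree, heaviest children first, one sample counted as 1 ms."""
    lines = [] if lines is None else lines
    lines.append(f"{'  ' * depth}{node.samples} ms  {node.name}")
    for child in sorted(node.children.values(), key=lambda n: -n.samples):
        dump(child, depth + 1, lines)
    return lines


samples = [
    ["main", "A", "B", "C"],
    ["main", "A", "B", "C"],
    ["main", "A", "B"],
    ["main", "A", "D"],
]
tree = build_call_tree(samples)
print("\n".join(dump(tree)))
```

Because every sample increments each frame along its path, a parent's counter always includes the time spent in its children, which is what lets you read the tree hierarchically.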

And we are going to look at this data to try and find redundant and unnecessary operations that we can make faster, and that is our primary method that we are going to use to improve the performance of our application.

Now, as you can imagine, we're capturing thousands of back traces per second. There is an overwhelming amount of data for you to wade through in Instruments.

My primary advice to you is that you want to filter this data as much as possible so that you can see the coarse-grained performance leads, and not focus on minutiae. All right? So I want to show you how to apply a bunch of powerful filters in Instruments.

So as I did the recording, you remember, I had the track view visible.

I did that because I wanted to see how the CPU utilization changed and where it was changing, while I was creating new tabs, and I noted to myself that it was right here.

I simply dragged and selected over that area of the trace, and I've caused Instruments to only focus its back trace data on just that time interval.

Everything over here, this is before I was creating tabs.

Everything over here, this is after I was creating tabs, when I was quitting the application.

That's not what I'm trying to optimize right now, so I don't need to see that data. Now, in the bottom area of the Instruments window, Instruments is showing me all the traces it collected.

By default, there is one row per thread that was running.

And in this example it looks like there were only four threads running. Sometimes you'll have many more. It depends on how concurrent your application is.

I often like to collapse these in the name of focusing, and I also like to collapse them so they're based on the top level functions executing in each of the threads, rather than the thread IDs, because that corresponds better with how I use Grand Central Dispatch.

Down in the bottom of the Instruments window, I'm going to click on this button that says Call Tree, and I'm going to zoom in on it, so you can see what I'm about to do.

There are several filters available here.

One of them is Separate by Thread. It is on by default. I am going to go ahead and disable that, and instead, all of the threads are going to be grouped by their top level entry point, rather than their thread ID.

Now, looking at this trace, I can see that of all these threads running, which by the way, below the main trace, which is the aggregate CPU usage, the CPU usage is broken down per thread, I can see that almost all the other threads were largely inactive during this trace.

I can focus on just the main thread by selecting it here, and now I'm only looking at traces from the main thread during this time period.

I'm ready to start digging into this call hierarchy, so I can see what my application was doing.

Often, I'll walk this with the keyboard, by just pressing right arrow and down, over and over again.

But I'd like to show you the heaviest back trace inspector that Instruments offers.

If your Inspector is not visible, you can toggle it with this button, and the heaviest back trace will be available here, in this tab, Extended Detail.

Now, the heaviest back trace is just the trace that occurred most frequently. It's the back trace that happened most frequently while we were recording under the current selection.

And you can use this to quickly navigate many frames deep at a time.

I typically look through here, looking for my own APIs, and things that would surprise me for taking up this amount of time, or for areas where we make a significant branching point in the number of samples.

Now, looking through here, I see this call, which is to IDE Navigator, replacement view, did install view controller.

Now, I'm familiar with this API, because it's an internal API of Xcode.

And in the trace, I can see over here on the left-hand side of the window that it is responsible for 1.19 seconds of the total time we're recording, or 45% of the time.

That is far and away above my expectations for how much this method should cost. However, it's hard to focus on what is happening here. Right? There is all this other stuff at the bottom of the trace, and it looks like I'm, you know, 30 or 40 stack frames deep.

That can be intimidating.

I want to show you how to focus.

The first technique is back here in that call tree popover again. I'm going to use this popover to choose Flatten Recursion.

Let's go ahead and do that. And now you can see that, that repeated set of method calls that was right here, oops, has been collapsed.

I'm sorry, let me scroll down.

That has been collapsed.

In fact, I'm confident that I want to continue my performance investigation inside of this IDE Navigator API call, and I can refocus the entire call tree by context-clicking here, and choosing Focus on Subtree.

And Instruments is going to take that symbol up to the top of the call graph, it's going to remove everything else, and it is going to reset the percentages at 100% so I can focus on just this.

Now, I can continue to walk this sample with the arrow keys to see what we're doing. And I'm familiar with these APIs. And it looks like we're doing state restoration.

And as I continue to expand this, I can see that we are sort of deep inside the table view, and in addition to there being this sort of hot call path, you know, that is taking a large share of the total time, there are all these other incidental samples as well.

It's easy to get distracted by these.

One of them here is objc_msgSend.

This can occur all over your traces if you're writing Objective-C.

Even if you're writing Swift code, as you work your way into the system libraries, you'll see this.

You'll often see its counterpart functions in the Objective-C runtime as well (load strong, load weak, retain, and so on). You can remove all that content from the call tree by context-clicking on it, and choosing Charge libobjc to Callers.

That's going to tell Instruments to take all the samples that came from libobjc and remove them from the call data, but keep the time as attributed to the parent frames that called them.

I tend to treat those Objective-C runtime functions as just the cost of doing business when writing Objective-C code.

It's rarely the case that I'm going to attempt to optimize them out, so I just prefer to remove them from the data, so I can focus on the things that I'm likely to take action on.
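The effect of charging a library to its callers can be sketched on a toy call tree. This is a simplified model, not the Instruments implementation: the dictionary layout, the `charge_to_callers` helper, and the sample numbers are all made up for illustration, and the sketch assumes hidden frames never call back into code you want to keep.

```python
def total(node):
    """Self time plus the time of everything this frame called."""
    return node["self"] + sum(total(child) for child in node["children"])


def charge_to_callers(node, is_hidden):
    """Return a copy of the tree with hidden frames removed and their total
    time added to the self time of the frame that called them."""
    kept, absorbed = [], 0
    for child in node["children"]:
        if is_hidden(child["name"]):
            absorbed += total(child)
        else:
            kept.append(charge_to_callers(child, is_hidden))
    return {"name": node["name"], "self": node["self"] + absorbed, "children": kept}


# Hypothetical sample counts, in milliseconds.
call_tree = {"name": "restoreState", "self": 2, "children": [
    {"name": "objc_msgSend", "self": 5, "children": []},
    {"name": "expandItem", "self": 10, "children": [
        {"name": "objc_retain", "self": 3, "children": []},
    ]},
]}

cleaned = charge_to_callers(call_tree, lambda name: name.startswith("objc_"))
print(total(call_tree), total(cleaned))
```

The total time is unchanged; the runtime frames simply disappear as separate rows and their cost shows up on the caller.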

Another very powerful filter that you can apply, and one that I'm going to use to remove all these small samples that occurred during this set of frames, is here in the call tree constraint section.

Let me show you.

I'm going to tell Instruments that I would only like to see areas of the trace that accounted for let's say 20 or more samples.

I'm picking 20 because I know that I've selected about a two second interval and 20 milliseconds is going to represent about 1% of the total work, and that is about the granularity that I like to work at by default.

So with call tree constraints set to a minimum of 20, I now focus this down much more significantly.
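The arithmetic behind that threshold, and the pruning it implies, can be sketched as follows. The tree layout, the `prune` helper, and most sample counts are hypothetical; only the 1 ms sampling interval, the roughly two second selection, and the minimum of 20 samples come from the demo.

```python
SAMPLE_INTERVAL_MS = 1   # one sample per millisecond
selection_ms = 2_000     # roughly the two second interval selected in the trace
threshold = 20           # minimum sample count for a row to stay visible

# 20 samples * 1 ms out of 2000 ms is about 1% of the selected work.
share = threshold * SAMPLE_INTERVAL_MS / selection_ms


def prune(node, min_samples):
    """Drop every subtree whose total sample count is below `min_samples`."""
    kept = [prune(child, min_samples)
            for child in node["children"]
            if child["samples"] >= min_samples]
    return {"name": node["name"], "samples": node["samples"], "children": kept}


trace = {"name": "root", "samples": 2000, "children": [
    {"name": "hot path", "samples": 1190, "children": [
        {"name": "incidental", "samples": 3, "children": []},
    ]},
    {"name": "noise", "samples": 5, "children": []},
]}

focused = prune(trace, threshold)
print(f"threshold is {share:.0%} of the selection")
```

Only rows that account for at least the threshold survive, which is what removes the small incidental samples from view.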

Now, I mentioned here that we were expanding outline view items. I see that in the fact that we're calling NSOutlineView's expandItem:expandChildren:.

Now, a lot of people would stop with the call graph at this point.

They'd see I'm calling into a system framework, and I'm spending a lot of time there.

This isn't my fault, right? What can I do about this? I can't optimize NSOutlineView's expandItem.

You absolutely have the power to influence these situations.

For example, the system framework could be spending all of this time because it's operating on data that you provided it.

It could be taking a lot of time because you are calling this method thousands or millions of times.

It could be taking a lot of time because it's calling back into your code through delegation. And most importantly, you can get an insight into what the system framework is doing by expanding down through the Instruments tree, and looking at the names of functions that are being called.

In fact, that's exactly how I learned to fix this bug. As I expand the trace into the outline view, I can see that it is calling these two methods here.

Batch Expand Items with item entries, expand children, and do work after end updates.

Now, those are big clues to me that there is probably some opportunity for efficiency through batching. As you could imagine, the outline view starts with a small set of items, and then we are trying to restore expansion state in this area of our code, and so we are telling it to open, for example, the top item. And when we tell it to open the top item, internally you might imagine that it moves all the other items down.

Then you ask me to expand the second item.

It moves all the items down again.

And the third item, and so on. And by the time you're done, you've moved those bottom items down thousands of times.

That is all redundant work, and that is exactly the sort of thing I'm looking to eliminate when I'm trying to improve performance.

Now, the fact that these method calls talk about batching leads me to believe that there is probably some API where I can ask the outline view to do the work in bulk so it computes all the positions just once, instead of over and over again as I make the calls.

I also see a call that says to do the work after end updates.

Now, sometimes an API will offer a sort of bulk method that operates on an array, and other times, it will offer a sort of transactional API that says I'm going to begin making changes, then you make a bunch of changes, and then you say you're done, and it computes something that happened for the whole range of your changes, more efficiently than if it had done them all individually.

So at this point, I would head over to the NSOutlineView or NSTableView API, and I would look for some such method.

And there is exactly one there. In NSTableView, there are methods for beginning and ending updates, which allow the table view to coalesce, and make all this work significantly more efficient.
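The difference between expanding items one at a time and wrapping the same calls in a begin/end transaction can be sketched with a toy model. This is not how the table view is implemented: `OutlineModel` and its methods are hypothetical, and the layout pass is reduced to a counter so the redundant work is easy to see.

```python
class OutlineModel:
    """Toy outline whose layout pass is the expensive side effect."""

    def __init__(self):
        self.expanded = set()
        self.layout_passes = 0
        self._update_depth = 0
        self._dirty = False

    def _relayout(self):
        self.layout_passes += 1  # stand-in for repositioning every row

    def expand(self, item):
        self.expanded.add(item)
        if self._update_depth > 0:
            self._dirty = True   # inside a transaction: just note layout is stale
        else:
            self._relayout()     # outside a transaction: pay for layout right away

    def begin_updates(self):
        self._update_depth += 1

    def end_updates(self):
        self._update_depth -= 1
        if self._update_depth == 0 and self._dirty:
            self._dirty = False
            self._relayout()     # one layout pass for the whole batch


one_by_one = OutlineModel()
for item in range(1000):
    one_by_one.expand(item)

batched = OutlineModel()
batched.begin_updates()
for item in range(1000):
    batched.expand(item)
batched.end_updates()

print(one_by_one.layout_passes, batched.layout_passes)
```

Both models end up with the same expanded items; the transactional version just computes the side effect once at the end instead of after every call.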

Of course, we adopted that in Xcode 10.

Let me show you. I'm going to launch Xcode 10.

I'm going to open the solar system application, and I'm going to create a couple of tabs.

And you can see, there is no awful flashing, and the tabs open much more quickly.

Now, I'd like the tabs to open even quicker than that, right? So what am I going to do next? I got lucky here.

It's not every day that you're going to go into the trace, and find something so obvious and easy to fix, that is responsible for 50% of the sample.

Right? In fact, there is not going to be any other huge lead sitting there waiting for me.

Instead, what I'm going to need to do is go through that whole sample, with those coarse filters applied, so I'm only looking at operations that take about 1% of the time or more, and I'm going to look for every single thing that I see that I think I can come up with some mechanism for making a little bit faster.

I'm going to note them all down on a piece of paper or in a text document or something, and then I'm going to start solving them.

Now, I need to pick an order to solve them in, right? Because sometimes fixing the fifth thing on the list will obsolete whatever fix you would do for the second thing on the list, and it feels bad to do them in the wrong order, such that you did redundant work, because that's the whole thing we're trying to remove in the first place, redundant work.

But it's very hard to predict how these things are all going to play out. And you often can't know until you've already done the work.

So do not let this stop you from getting started, because you're going to get your second 30% improvement by stacking 10 3% improvements.

Okay? Now, I want to go back to the slides, and show you some of the techniques we typically use to make those continued improvements.

Far and away, the thing that comes up the most frequently is using those same techniques the outline view was using. Batching and deferring, right? You have an API, and when the API is called, it has some side effect.

And then you have some code calling your API in a loop. Each call is doing the primary piece of work that is being requested, and having a side effect.

Well, if no one was reading the result of the side effect, then you're doing that work redundantly, over and over again.

You can often get a much more efficient interface by using a batch interface, where a client gives you an array or some sort of collection of all the work to be done, so that you can compute that side effect just once.

Now, sometimes you have many clients, right? And they can't batch across each other, but you can still get that same style of performance through deferring the work and doing it lazily.
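Deferring can be sketched as a dirty flag: independent clients only mark the work as needed, and the expensive step runs once when the result is actually required. The class and method names below are hypothetical and simply echo the familiar "needs layout" style of API.

```python
class DeferredLayout:
    """Clients mark the layout stale; the work happens lazily, at most once."""

    def __init__(self):
        self.needs_layout = False
        self.layout_passes = 0

    def set_needs_layout(self):
        self.needs_layout = True      # cheap: just records that work is pending

    def layout_if_needed(self):
        if self.needs_layout:
            self.needs_layout = False
            self.layout_passes += 1   # stand-in for the expensive side effect


view = DeferredLayout()

# Three unrelated clients, none of which knows about the others.
for client in ("sidebar", "toolbar", "inspector"):
    view.set_needs_layout()

view.layout_if_needed()   # e.g. once, just before drawing
view.layout_if_needed()   # nothing pending, so nothing happens

print(view.layout_passes)
```

No client has to coordinate with any other, yet the side effect is still computed only once per round of changes.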

A third easy way to improve performance, as you look through that Instruments trace, is to find areas where you see the same thing being computed over and over again.

For example, you have a method and it's computing the size of some text, then you see the same thing happening several frames later, for the same text, and again, and again.

Now, in this situation, of course, you want to try to just compute that value one time.

Compute it at the top, and pass it down or maybe cache it.
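A minimal sketch of the caching option follows, using memoization from the Python standard library. `text_width` is a hypothetical stand-in for a real text measurement call, and the counter exists only to show how often the expensive path actually runs.

```python
from functools import lru_cache

measure_calls = 0


@lru_cache(maxsize=None)
def text_width(text: str, point_size: int) -> int:
    """Pretend text measurement; only runs on a cache miss."""
    global measure_calls
    measure_calls += 1
    return len(text) * point_size  # hypothetical stand-in for real measurement


# The same text is "measured" a thousand times, as in the repeated frames above.
widths = [text_width("Solar System", 12) for _ in range(1000)]

print(measure_calls)
```

A cache like this trades memory for time and needs an invalidation story if the inputs can change, which is why computing the value once at the top and passing it down is often the simpler fix.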

Another technique you have available in your UI applications is considering how many views you are using to render your UI. It can be very great for your source code organization to use very small views, with small sets of functionality, and to compose them together into larger pieces.

But the more views you use, the harder you tax the rendering and layout systems.

Now, this is a two-way street, because smaller views often let you have more fine-grained caching, which can be good for performance as well.

But generally, you can tweak the number of views that you have in order to have a significant impact on performance. It is not always best to have fewer views, otherwise all of our applications would just have one giant view for the whole thing.

Another technique that comes up pretty frequently is using direct observation.

We often have two areas of our source code that are loosely coupled. Maybe one area knows about the other, and they're communicating with each other through some indirect mechanism.

Maybe they're using NSNotificationCenter, some block-based callbacks, delegation, or key-value observing.

Now, something that I see very frequently is we'll have some model code, and it's going in a loop, being changed, and every time it is going through that loop, it is firing lots of KVO notifications. You can't actually see that in the model code, of course, but over in some other controller, it's madly responding and trying to keep up with whatever is changing in the model, and you're burning lots of CPU time doing this, that ends up being redundant when you consider the whole scope of changes.

Now, if this was direct callouts from the model code, either through notifications, delegation or manual block-based callbacks, it would be much more obvious that this was happening as you edited that model code.

And you might decide that it is totally appropriate to pull some of those notifications out from inside the loop to outside the loop, to have a big impact on performance.
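Here is a small sketch of what pulling a notification from inside the loop to outside the loop looks like when the callout is explicit in the model code. `Model`, its observer list, and the callback are hypothetical; plain callbacks stand in for whichever observation mechanism you use.

```python
class Model:
    def __init__(self):
        self.values = []
        self.observers = []

    def _notify(self):
        for callback in self.observers:
            callback(self)

    def append(self, value):
        self.values.append(value)
        self._notify()            # one notification per change

    def extend(self, values):
        self.values.extend(values)
        self._notify()            # one notification for the whole batch


refreshes = []
model = Model()
# Hypothetical controller: records each refresh it is asked to perform.
model.observers.append(lambda m: refreshes.append(len(m.values)))

for value in range(100):
    model.append(value)           # the controller refreshes on every iteration
per_item = len(refreshes)

refreshes.clear()
model.extend(range(100))          # the controller refreshes once
batched = len(refreshes)

print(per_item, batched)
```

Because the callout is visible in `append` and `extend`, the cost of notifying inside a loop is obvious at the point where the model is edited.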

Now, alternatively, on the controller side, you could use one of these deferring and batching techniques to avoid the redundant work and just not respond synchronously.

Last, this is an easy one.

Once your code is already on the happy path, you know, it's already linear, and it's not going to get any better than linear. That's sort of the minimum performance that you're going to get.

You're after all the constant time improvements that you can get. Now, an easy one is if you're using dictionaries like they were objects. You probably know you're doing this if you have a bunch of string constants for all the keys. In that case you can get a big improvement to code clarity, to code completion, to refactoring, and to validating your source code, by using specific types. It couldn't be easier with structs in Swift, with their implicit initializers and conformance to Equatable and Hashable.

And this can just be hands-down an improvement to your source code, and you'd be surprised at how much time you're spending in string hashing and string equality if you were doing this millions of times on lots of small objects.
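The shape of that change can be sketched in Python with a frozen dataclass, which is a rough analogue of a Swift struct with synthesized equality and hashing. `Planet` and its fields are hypothetical. Note that this only illustrates the clarity benefit: Python attribute access still involves string-keyed lookups internally, so the speed-up described above for compiled struct access is not demonstrated here.

```python
from dataclasses import dataclass

# Dictionary used like an object: every field access goes through a string key,
# and a typo in a key is only discovered at run time.
planet_dict = {"name": "Mars", "moons": 2}


@dataclass(frozen=True)
class Planet:
    """A specific type: fields are declared once, and equality and hashing
    are generated from them."""
    name: str
    moons: int


mars = Planet(name="Mars", moons=2)

# Generated equality and hashing let values be compared and used in sets.
seen = {mars}
print(Planet("Mars", 2) in seen, mars.moons == planet_dict["moons"])
```

With the specific type, tools can complete and rename the fields, and a misspelled field name is flagged by static checkers rather than silently producing a missing key.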

So with that, I'd like to turn it over to Matthew to talk to you about how we've applied these techniques inside of Photos.

Thanks, John.

Hi everyone. I'm Matthew Lucas, an engineer on the Photos team, and today I want to give you some practical examples on performance directly from Photos.

So first, let's talk about Photos for a second.

We are all familiar with this app.

It lets you store, browse, and experience your favorite moments.

So you can browse your favorite moments from the moments view, that you can see here. It is the default view.

But you can also get another view from the collection, or the years.

And I'll talk more about this view later.

Now, libraries today can go from 1,000 to 100,000 assets previous depending on your love for photography.

And we all love capturing those fun and precious moments we live every day.

So we are patient enough to capture them, but we are less patient when something like this appears.

How would you feel if something like this were displayed in Photos the first time you launched the app? Now, you may also experience something like this, where we are showing a lot of placeholders, and that's really not great.

Maybe you start scrolling and you get lost in this gray area. The images start to load, but then you keep scrolling and you experience some frame drops because the views are being updated.

Well, our goal is to not show views like this.

We think this is not providing a great user experience, but we understand that sometimes it's unavoidable.

But when it's too frequent, this isn't really great.

Now, when you work on an app, you want to make sure that it's responsive, and usable at once.

You also want to make sure that the animations are smooth.

And these two attributes are really crucial to providing a great user experience.

If users don't find your app reliable or performant, they might stop using it.

Now, to illustrate these two points, I would like to give you two examples.

And the first one is going to be how we optimized launching to this Moments view.

The second one is how we built the collections and years views for good scrolling performance.

First, let's look at launching.

So what is launch? There are three kinds of launches.

The first and most expensive one is referred to as cold, and it happens the first time you launch your app after the device reboots.

So basically, nothing has been cached yet, and it might require some background processes or some libraries to load.

Now, it also happens when the system goes under memory pressure and starts reclaiming some memory.

Now, if you kill an app, it might not trigger a cold launch, because the system decides when the resources should be paged out. And when you kill an app and relaunch it a few seconds later, it's almost guaranteed that you'll hit a warm launch.

And we call it warm because the resources or the dependencies are still in the cache, so it's faster to launch.

Now, the last type we call hot, and it's basically a resume, because it's when your app is already running and is being brought back to the foreground.

So when you start measuring launch, you should start by measuring the warm launch.

The time it takes to launch during a warm launch is less variable than during a cold launch, and the test iteration is much faster, as you don't need to reboot your device.

Now, the way we measure launch is by evaluating the time it takes from the moment you hit the application icon until you can start interacting with the app.

And what I mean by interacting is really using the app, not interacting with a spinner.

A common pattern is to dispatch some work and display a spinner in the meantime. Well, that doesn't make the app usable sooner, so we are trying to avoid that here.

Now, there are three goals that we are shooting for at Photos. The first one is that we want to be instant. We also don't want to display any spinner, and we don't want to display any placeholder.

And I have to be honest with you: you might see some placeholders the first time you synchronize with iCloud, but when the data is local, we really try our best to not display any. Now, what do we mean by instant? Well, the time it takes to launch should be the same as the duration of the zoom animation from the home screen.

That is usually between 500 and 600 milliseconds, and that way, the transition from the home screen to the application is seamless for the user, and the user can start interacting with it, as soon as the animation is done.

And by the way, this is a general recommendation, not something just for Photos, so it's valid for any app. Now, let's look at how Photos launches today.

If we look more closely at what is happening exactly, you can see that Photos is all set up and ready before the animation is done.

And if we dive into the launch anatomy, you will see there are mainly two parts.

The first part is spent in dyld. This is the loader that is going to load and link all of your dependent libraries, but it's also going to run your static initializers.

And your control over that part is limited, but it's not impossible.

I would encourage you to watch the dyld session from last year in order to get more details on that part.

Now, dyld also calls main in your executable, which leads us to the second part here, over which you have a lot of control. You need to make sure that this part stays under 500 milliseconds.

Now, the first pass that is scheduled right after didFinishLaunching marks the end of your launch, and this is basically when your app should be usable.

There are a few principles that we will be referring to during this session, and these are really the common pillars of the performance work that we achieved.

The first one is that we want to be lazy and defer the work that we don't need.

The second one is that we want to be proactive, in two ways. We want to anticipate the work that we are going to need later, and we also want to catch regressions quickly, so you should make sure that you have continuous integration testing in place.

And the last point is we want to be constant, regardless of the total amount of data that we need to load.

Now, if we were taking a naïve approach, and we were loading everything we needed during launch, this is how long it would take roughly for a 30,000 item library.

First you need to initialize the database, then you need to prepare some view controllers.

You need to configure the data sources, load some library images, and fetch the cloud status.

And keep in mind that this might vary as the data grows, and in fact, the data will grow forever as people take pictures every day.

So at Photos, we really keep in mind that we are dealing with unbounded data sets. Now, let's see how we optimized each of these steps for Photos, and let's start with initializing the database.

So first, usually, the database is initialized and loaded when the first query is being fired.

One optimization that we found was to do it as early as possible on a background thread, so that the initialization doesn't have to happen when the first query is fired.

Initializing on the first query is an issue especially if that first query is made from the main thread.
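Here is a minimal sketch of that early-initialization idea. `Database` and `DatabaseProvider` are hypothetical stand-ins, not the Photos implementation, and the sleep only simulates an expensive open.

```swift
import Foundation

// Hypothetical stand-in for a real database handle; opening it is the expensive step.
final class Database {
    init() {
        // Simulates opening files and loading the schema.
        Thread.sleep(forTimeInterval: 0.01)
    }
    // Placeholder query; the value 42 is made up.
    func countAssets() -> Int { return 42 }
}

final class DatabaseProvider {
    // A serial queue guards the handle and runs the initialization off the main thread.
    private let queue = DispatchQueue(label: "database.init", qos: .userInitiated)
    private var database: Database?

    // Call as early as possible during launch; returns immediately.
    func warmUp() {
        queue.async {
            if self.database == nil { self.database = Database() }
        }
    }

    // The first query waits only for whatever part of the initialization is still outstanding.
    func shared() -> Database {
        return queue.sync { () -> Database in
            if let existing = self.database { return existing }
            let created = Database()
            self.database = created
            return created
        }
    }
}

let provider = DatabaseProvider()
provider.warmUp()                       // kicked off first thing at launch
let assetCount = provider.shared().countAssets()
print(assetCount)
```

If `warmUp()` has already finished by the time the first query arrives, `shared()` returns the existing handle without paying the initialization cost on the calling thread.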

Now, we spent a lot of time, and we are still spending a lot of time, reviewing all the queries that we're doing during launch. We want to make sure that we are doing only the necessary work and no more.

Now, lastly, we want to ensure that all the queries that we are doing are as efficient as possible, and we want to avoid complex queries as much as possible as well.

Sometimes we do need complex queries, and for these cases we set up some indexes so that we can speed them up.

Now we are aiming for, at most, 30 milliseconds spent in that initialization.

So next, let's look at how we are preparing our view controllers.

So we have four tabs representing the main features of the app. And so the first thing that we need to be careful of is we want to minimize the work that is being done in the initialization of these three non-visible ones, and the rule that we are trying to follow here is to do as little work as possible in the initializers.

We really want to do the bare minimum, and load all the data in viewDidLoad.

This also allows us to initialize our controllers in constant time.

Now, lastly, we also want to ensure that only the visible views are loaded.

It's easy to regress on that part, and we often do, so you should really be careful about that.
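The idea of a constant-time initializer with deferred loading can be sketched in plain Swift. `TabController` is a hypothetical stand-in for a view controller, and in UIKit the deferred work would live in `viewDidLoad` rather than in a lazy property.

```swift
// Plain-Swift stand-in for a view controller; all names here are hypothetical.
final class TabController {
    let title: String
    private(set) var loadCount = 0

    // Constant-time initializer: store configuration only, touch no data.
    init(title: String) { self.title = title }

    // Built on first access, which should happen only when the tab becomes visible.
    private(set) lazy var items: [String] = {
        self.loadCount += 1
        return (1...3).map { "\(self.title) item \($0)" }
    }()
}

// Four tabs are created at launch, but only the visible one loads its data.
let tabs = ["First", "Second", "Third", "Fourth"].map { TabController(title: $0) }
let visibleItemCount = tabs[0].items.count
let loadedTabs = tabs.filter { $0.loadCount > 0 }.count
print(visibleItemCount, loadedTabs)
```

A counter like `loadCount` is also a cheap way to catch the regression mentioned above in a test: if a non-visible tab ever loads during launch, the count goes up.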

So preparing the view controllers, we are now aiming for 120 milliseconds.

But preparing view controllers implies configuring the data sources, and let's look at that chunk next.

So the Moments view is a representation of the events in your life, and the UI represents that with these groups of photos and these headers.

In this library, for example, we might have 500 moments, and in order to build a view, we need to load all the moments up front.

But the only thing we really need for these moments is the metadata, so that we can build the view.

We don't need their content.

So the first thing we do is we fire that query, which is super fast. And then we are only loading the content that we need here.

In this case, we are only going to load the visible content, which is going to be between 7 and 10 moments.

Since that data set is limited and finite, we can allow ourselves to do it synchronously on the main thread.

Now, we also want to anticipate and schedule the work so that we can start loading the remaining data asynchronously.

And we do that on a background thread, with the right quality of service to make sure that it doesn't preempt the main thread.

Now we are aiming at 100 milliseconds here.
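The split between synchronous visible content and asynchronous remaining content might look roughly like this. `MomentsDataSource` and its fake asset names are assumptions for illustration, and `.utility` is just one reasonable quality-of-service choice.

```swift
import Foundation

// Hypothetical data source: metadata for every moment is cheap, content is expensive.
final class MomentsDataSource {
    let momentCount: Int
    private let lock = NSLock()
    private var content: [Int: [String]] = [:]

    init(momentCount: Int) { self.momentCount = momentCount }

    // Stand-in for the expensive content fetch.
    private func fetchContent(for moment: Int) -> [String] {
        return ["asset-\(moment)-0", "asset-\(moment)-1"]
    }

    private func store(_ assets: [String], for moment: Int) {
        lock.lock(); defer { lock.unlock() }
        if content[moment] == nil { content[moment] = assets }
    }

    var loadedCount: Int {
        lock.lock(); defer { lock.unlock() }
        return content.count
    }

    // Visible moments are a small, bounded set, so load them synchronously.
    func loadVisible(_ visible: Range<Int>) {
        for moment in visible { store(fetchContent(for: moment), for: moment) }
    }

    // Everything else loads in the background at a lower quality of service.
    func loadRemaining(excluding visible: Range<Int>, completion: @escaping () -> Void) {
        DispatchQueue.global(qos: .utility).async {
            for moment in 0..<self.momentCount where !visible.contains(moment) {
                self.store(self.fetchContent(for: moment), for: moment)
            }
            completion()
        }
    }
}

let dataSource = MomentsDataSource(momentCount: 500)
let visible = 0..<10
dataSource.loadVisible(visible)
let loadedAtLaunch = dataSource.loadedCount

let done = DispatchSemaphore(value: 0)
dataSource.loadRemaining(excluding: visible) { done.signal() }
done.wait()   // only so this sketch can read a final count; a real app would not block here
let loadedEventually = dataSource.loadedCount
print(loadedAtLaunch, loadedEventually)
```

The synchronous part costs the same whether the library has 500 moments or 50,000, which is what keeps this step constant.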

So lastly, our data sources are also providing some images and let's see how we optimize that part.

So this was by far the biggest chunk that we were attacking, and when we saw that we were spending multiple seconds loading these images during launch, we realized that we were doing too much work.

So the first thing that we did was evaluate the number of images that we needed during launch, and we now load only those during that first transaction.

In that case, that can be up to 60, including some padding above and below.

And next, in order to load those images fast, we need to make sure that we are loading only low-resolution ones.

That way we are loading fewer pixels in memory, and it is much more efficient.

That chunk is now representing 200 milliseconds.

And this is, by far, the biggest gain that we had.

It is also now constant time, and that's really great.

Now, sometimes you have to ask yourself the question, is this really needed during launch? And one of our examples here is this footer view.

It pulls information from the network or the database, and our first design decision was to not show it during launch, in order to prioritize all the images that we are seeing here.

We wanted to show as many images as possible.

So that made it simpler.

We now schedule that work only post-launch, and we cache the processed information for reuse later.

Now, if we had had the requirement of displaying this information, one approach could have been to leverage the background app refresh API from UIKit, which will proactively wake your app so that you can start preparing some content before the user launches it.

So now, that part is gone from launch, and that saves us 400 milliseconds of CPU time.

If we look at the updated breakdown here, we can see that we now have only 450 milliseconds worth of work.
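The 450 millisecond figure is the sum of the per-step targets quoted earlier. A small sketch of that arithmetic, with the step labels paraphrased from the talk:

```swift
// Per-step targets in milliseconds, as quoted in the talk.
let launchBudget: [(step: String, milliseconds: Int)] = [
    ("Initialize database", 30),
    ("Prepare view controllers", 120),
    ("Configure data sources", 100),
    ("Load images", 200),
]
let total = launchBudget.reduce(0) { $0 + $1.milliseconds }
let window = 500   // upper bound for the part of launch the app controls
print("total \(total) ms, headroom \(window - total) ms")
```

Keeping a table like this next to your measurements makes it easy to see which step ate the headroom when launch regresses.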

We are now fitting into that 500 millisecond time window, and regardless of how things can be represented concurrently here, the most important part of that is to really make sure that you think about the cost of preparing your content.

And what I mean by think is really measure it.

Now, you should strive for doing work in constant time, regardless of the total amount of data you are loading.

In our case, we really have unbounded data sets, and we need to stay constant.

Now that we have launched the app, we need to start using it. Let's see how we built the collections and years views for good scrolling performance. As I mentioned earlier, our users can seamlessly transition, with animation, from the Moments view, through the collections, to the years view.

And this is a complex hierarchy.

We have thousands of pictures to display.

We need to support live updates, we need to also support animation between these layers, and we also have some gestures.

Now, we also have some goals here for the experience we want to provide to our users.

The first one is the same as before, we don't want to have any spinner. We don't want to have placeholders, but we also want to have smooth animations. And by smooth animations, I mean 60 or 120 frames per second, depending on the screen you're running on.

Now, remember the principles that we've seen before.

Well, they are all applicable here.

We want to be lazy and defer the work we don't need up front.

We want to be proactive and catch regressions quickly, but we also want to be constant in our layout passes, regardless of the amount of data that we are loading.

Now, this time, we also want to be timely, and we want to remember the rendering loop cycle.

And what I mean by that is that I want you to remember that we only have 8 or 16 milliseconds to render that frame, so we need to make sure that we are not going over that time, otherwise we would start dropping frames.
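Those two numbers fall straight out of the refresh rate. A minimal sketch of the calculation, with function names that are made up for this example:

```swift
// Time available to produce one frame at a given refresh rate.
func frameBudgetMilliseconds(framesPerSecond: Double) -> Double {
    return 1000.0 / framesPerSecond
}

let budget60 = frameBudgetMilliseconds(framesPerSecond: 60)    // about 16.7 ms
let budget120 = frameBudgetMilliseconds(framesPerSecond: 120)  // about 8.3 ms

// A frame is dropped when the work for it exceeds the budget.
func dropsFrame(workMilliseconds: Double, framesPerSecond: Double) -> Bool {
    return workMilliseconds > frameBudgetMilliseconds(framesPerSecond: framesPerSecond)
}

print(budget60, budget120, dropsFrame(workMilliseconds: 10, framesPerSecond: 120))
```

Note that 10 milliseconds of layout work is fine on a 60 Hz display but already drops frames at 120 Hz.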

Now, let's take a step back, and look at what we are trying to achieve here.

We wanted to have this scrollable view, with sections and many cells in it.

And that is basically what a collection view provides, right? Except that in this extreme case, we were stretching the limits of what we could achieve with a basic approach. That resulted in too many views and too many layers.

It also resulted in increased layout complexity and an increased memory cost.

So we needed to innovate here, and we did that by reducing the number of views drastically while still using a collection view.

We used a technique more commonly used in video games, that is called atlasing.

And it basically consists of combining a set of images into a single one.

We do that efficiently by using only very small thumbnails first, then we stamp all the raw image data on the canvas we are using as a strip.

Now, we use raw image data so that we can avoid decoding each thumbnail as we stamp it.

So basically we are displaying a rendered strip of images.

Now, we generate and cache them on the fly so that we can be more flexible.

And as we render multiple images into a single one, we are reducing the number of cells, layers, and objects drastically, which simplifies the layout and reduces the time spent building it.

Now, this technique works well, but it has trade-offs to consider as well, and this is one of them.

So if someone tries to long press or force touch an item here, we need to figure out its position so that we can present the preview correctly.

And as we display a single image, we need to maintain the mapping between each individual image and its rendered strip.
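One way to keep that mapping is to describe the strip's layout and derive the image under a touch from it. This is only a sketch under simple assumptions (equally sized thumbnails in a single row); `RenderedStrip` and its members are hypothetical names, not the Photos implementation.

```swift
// Hypothetical description of one rendered strip: equally sized thumbnails laid out left to right.
struct RenderedStrip {
    let assetIdentifiers: [String]
    let thumbnailWidth: Double
    let spacing: Double

    // Total width of the strip, with spacing only between thumbnails.
    var width: Double {
        guard !assetIdentifiers.isEmpty else { return 0 }
        let count = Double(assetIdentifiers.count)
        return count * thumbnailWidth + (count - 1) * spacing
    }

    // Horizontal offset of the thumbnail at `index` within the strip.
    func originX(ofThumbnailAt index: Int) -> Double {
        return Double(index) * (thumbnailWidth + spacing)
    }

    // Maps a touch location inside the strip back to the asset under it.
    // Returns nil for touches outside the strip or in the gap between thumbnails.
    func assetIdentifier(atX x: Double) -> String? {
        let step = thumbnailWidth + spacing
        guard step > 0, x >= 0, x < width else { return nil }
        let index = Int(x / step)
        let offsetInCell = x - Double(index) * step
        guard offsetInCell < thumbnailWidth else { return nil }
        return assetIdentifiers[index]
    }
}

let strip = RenderedStrip(assetIdentifiers: ["a", "b", "c"], thumbnailWidth: 20, spacing: 2)
let touched = strip.assetIdentifier(atX: 23)
print(strip.width, touched ?? "none")
```

The same `originX(ofThumbnailAt:)` value gives the source frame for a preview animation, so the single-image cell can still behave as if each thumbnail were its own view.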

Now, you might be thinking, why are we generating them on the fly? Well, we need to support live updates, that's the reason.

We need also to support different view sizes.

For example, we have landscape here.

But we also have portrait.

And we can also do that because our users' libraries typically grow organically over a long period of time, and the cases where we might need to generate thousands of them are pretty rare.

Now, you may also be wondering why we are not generating the whole section, then. Well, the answer is that our design requirement is to do this cool animation, where you can see that the collections are expanding into their own sections or collapsing into grouped ones, and the other way around.

So if there is one thing that you should remember from this second part, it is that you should really think about the layout cost of your hierarchy and measure it.

Lastly, you should always think about performance.

At Photos, we care deeply about it, and this is really part of our daily job.

For more information, you can come and see us in these three labs that are mentioned here, and I hope that you have a great conference.

Thank you.