In Part 1 of this blog post I discussed why I chose to implement this application using the Ada Web Server to serve the computed fractal to a web browser. In this part I will discuss a bit more about the backend of the application, the Ada part. Why do we care about performance? The Ada backend will compute a new image each time one is requested from the front-end. The front-end immediately requests a new image right after it receives the last one it requested. Ideally, this will present to the user as an animation of a Mandelbrot fractal changing colors. If the update is too slow, the animation will look slow and terrible. So we want to minimize the compute time as much as possible. We have a few ways to do that: optimize the computation, and parallelize the computation. Parallelizing the Fractal Computation An interesting feature of the Mandelbrot calculation is that the computation of any pixel is independent of any other pixel. That means we can completely parallelize the computation of each pixel! So if our requested image is 1920x1280 pixels we can spawn 2457600 tasks right? Theoretically, yes we can do that. But it doesn’t necessarily speed up our application compared to, let’s say, 8 or 16 tasks, each of which computes a row or selection of rows. Either way, we know we need to create what’s called a task pool, or a group of tasks that can be queued up as needed to do some calculations. We will create our task pool by creating a task type, which will implement the actual activity that the task will complete, and we will use a Synchronous_Barrier to synchronize all of the tasks back together.

task type Chunk_Task_Type is pragma Priority (0); entry Go (Start_Row : Natural; Stop_Row : Natural; Buf : Stream_Element_Array_Access); end Chunk_Task_Type; type Chunk_Task is record T : Chunk_Task_Type; Start_Row : Natural; Stop_Row : Natural; end record; type Chunk_Task_Pool is array (1 .. Task_Pool_Size) of Chunk_Task; S_Task_Pool : Chunk_Task_Pool; S_Sync_Obj : Synchronous_Barrier (Release_Threshold => Task_Pool_Size + 1);

This snippet is from fractal.ads. S_Task_Pool will be our task pool and S_Sync_Obj will be our synchronization object. If you notice, each Chunk_Task_Type in the S_Task_Pool takes an access to a buffer in its Go entry procedure. In our implementation, each task will have an access to the same buffer. Isn’t this a race condition? Shouldn’t we use a protected object? The downside of protected object The answer to both of those questions, is yes. This is a race condition and we should be using a protected object. If you were to run CodePeer on this project, it identifies this as a definite problem. But using a protected object is going to destroy our performance. The reason for this is because under the hood, the protected object will be using locks each time we access data from the buffer. Each lock, unlock, and wait-on-lock call is going to make the animation of our fractal look worse and worse. However, there is a way to get around this race-condition issue. By design, we can guarantee that each task is going to access the buffer from Start_Row to Stop_Row. So, by design we can make sure that each task’s rows don’t overlap another task’s rows thereby avoiding a race condition. Parallelization Implementation Now that we understand the specification of the task pool, let’s look at the implementation.

task body Chunk_Task_Type is Start : Natural; Stop : Natural; Buffer : Stream_Element_Array_Access; Notified : Boolean; begin loop accept Go (Start_Row : Natural; Stop_Row : Natural; Buf : Stream_Element_Array_Access) do Start := Start_Row; Stop := Stop_Row; Buffer := Buf; end Go; for I in Start .. Stop loop Calculate_Row (Y => I, Idx => Buffer'First + Stream_Element_Offset ((I - 1) * Get_Width * Pixel'Size / 8), Buffer => Buffer); end loop; Wait_For_Release (The_Barrier => S_Sync_Obj, Notified => Notified); end loop; end Chunk_Task_Type;

The body of each Chunk_Task_Type waits for a Go signal from the main task. When it receives the go, it also receives an access to the buffer and the start and stop row to work on. It then calculates the pixels for the rows. When it finishes it calls the Wait_For_Release procedure of the Synchronous_Barrier object which blocks until all tasks have completed their work. Once all the tasks have checked in with the Synchronous_Barrier using the Wait_For_Release call, all tasks are released. The main task grabs the buffer with the new image and hands it back off to the AWS server to be sent out and the worker tasks go back to hanging on their entry calls. The bigger the task pool… Now we come back to the size of the task pool. Based on the implementation above we can chunk the processing up to the granularity of at least the number of rows being requested. So in the case of a 1920x1280 image, we could have 1280 tasks! But we have to ask ourselves, is that going to give us better performance than 8 or 16 tasks? The answer, unfortunately, is probably not. If we create 8 tasks, and we have an 8 core processor, we can assume that some of our tasks are going to execute on different cores in parallel. If we create 1280 tasks and we use the same 8 core processor, we don’t get much more parallelization than with 8 tasks. This is a place where tuning and best judgement will give you the best performance. Fixed vs Floating Point Now that we have the parallelization component, let’s think about optimizing the maths. In most fractal computations floating point complex numbers are used. Based on our knowledge of processors, we can assume that in most cases floating point calculations will be slower than integer calculations. So theoretically, using fixed point numbers might give us better performance. For more information on Ada fixed point types check out the Fixed-point types section of the Introduction to Ada course on learn.adacore.com . The Generic Real Type Because we are going to use the same algorithm for both floating and fixed point math, we can implement the algorithm using a generic type called Real. The Real type is defined in computation_type.ads.

generic type Real is private; with function "*" (Left, Right : Real) return Real is <>; with function "/" (Left, Right : Real) return Real is <>; with function To_Real (V : Integer) return Real is <>; with function F_To_Real (V : Float) return Real is <>; with function To_Integer (V : Real) return Integer is <>; with function To_Float (V : Real) return Float is <>; with function Image (V : Real) return String is <>; with function "+" (Left, Right : Real) return Real is <>; with function "-" (Left, Right : Real) return Real is <>; with function ">" (Left, Right : Real) return Boolean is <>; with function "<" (Left, Right : Real) return Boolean is <>; with function "<=" (Left, Right : Real) return Boolean is <>; with function ">=" (Left, Right : Real) return Boolean is <>; package Computation_Type is end Computation_Type;

We can then create instances of the Julia_Set package using a floating point and fixed point version of the computation_type package.

type Real_Float is new Float; function Integer_To_Float (V : Integer) return Real_Float is (Real_Float (V)); function Float_To_Integer (V : Real_Float) return Integer is (Natural (V)); function Float_To_Real_Float (V : Float) return Real_Float is (Real_Float (V)); function Real_Float_To_Float (V : Real_Float) return Float is (Float (V)); function Float_Image (V : Real_Float) return String is (V'Img); D_Small : constant := 2.0 ** (-21); type Real_Fixed is delta D_Small range -100.0 .. 201.0 - D_Small; function "*" (Left, Right : Real_Fixed) return Real_Fixed; pragma Import (Intrinsic, "*"); function "/" (Left, Right : Real_Fixed) return Real_Fixed; pragma Import (Intrinsic, "/"); function Integer_To_Fixed (V : Integer) return Real_Fixed is (Real_Fixed (V)); function Float_To_Fixed (V : Float) return Real_Fixed is (Real_Fixed (V)); function Fixed_To_Float (V : Real_Fixed) return Float is (Float (V)); function Fixed_To_Integer (V : Real_Fixed) return Integer is (Natural (V)); function Fixed_Image (V : Real_Fixed) return String is (V'Img); package Fixed_Computation is new Computation_Type (Real => Real_Fixed, "*" => Router_Cb."*", "/" => Router_Cb."/", To_Real => Integer_To_Fixed, F_To_Real => Float_To_Fixed, To_Integer => Fixed_To_Integer, To_Float => Fixed_To_Float, Image => Fixed_Image); package Fixed_Julia is new Julia_Set (CT => Fixed_Computation, Escape_Threshold => 100.0); package Fixed_Julia_Fractal is new Fractal (CT => Fixed_Computation, Calculate_Pixel => Fixed_Julia.Calculate_Pixel, Task_Pool_Size => Task_Pool_Size); package Float_Computation is new Computation_Type (Real => Real_Float, To_Real => Integer_To_Float, F_To_Real => Float_To_Real_Float, To_Integer => Float_To_Integer, To_Float => Real_Float_To_Float, Image => Float_Image); package Float_Julia is new Julia_Set (CT => Float_Computation, Escape_Threshold => 100.0); package Float_Julia_Fractal is new Fractal (CT => Float_Computation, Calculate_Pixel => Float_Julia.Calculate_Pixel, Task_Pool_Size => Task_Pool_Size);