The Mind-Bending World of Heuristic Emergence Might Mean We're Artificial

By David Mercer (@smepals). David contributes to SME Pals, a blog aimed at helping startups and online businesses.

Imagine winning the lottery!

I know. What are the chances, right?

For a sec, humor me.

Pretend we’re playing to win the Powerball jackpot. In your mind, pick 5 correct numbers plus the correct Powerball. Calculating the odds is not hard provided you have a calculator handy:

5/69 x 4/68 x 3/67 x 2/66 x 1/65 x 1/26 = 120/35 064 160 560 = 1/292 201 338
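
The same arithmetic can be checked exactly in Python. This is a minimal sketch assuming the game format described above (5 white balls drawn from 69, plus 1 Powerball drawn from 26); `math.comb` counts the unordered ways to choose the five white balls, and `Fraction` keeps everything exact.

```python
from fractions import Fraction
from math import comb

WHITE_BALLS = 69   # white balls in the first drum
PICKS = 5          # white balls drawn
POWERBALLS = 26    # Powerballs in the second drum

# Ordered-draw form: 5/69 x 4/68 x 3/67 x 2/66 x 1/65 x 1/26
p_ordered = Fraction(1, POWERBALLS)
for i in range(PICKS):
    p_ordered *= Fraction(PICKS - i, WHITE_BALLS - i)

# Combinatorial form: one winning combination out of C(69, 5) x 26
p_comb = Fraction(1, comb(WHITE_BALLS, PICKS) * POWERBALLS)

print(p_ordered == p_comb)    # True
print(p_ordered.denominator)  # 292201338
```

Both forms agree because the ordered product is 5! / (69 x 68 x 67 x 66 x 65), which is exactly 1 / C(69, 5).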

For argument’s sake let’s say it’s two hundred and ninety million to one. Two hundred and ninety million can be written in a far more succinct way using scientific notation:

2.9 x 10⁸

That’s 2.9 multiplied by 10 eight times in a row.

That’s a pretty big number. So big that it’s easy to simply discard it as a “big number” without really appreciating how big it is.

To get some perspective we can compare it to, say, the chances of being struck by lightning. Most of us aren’t particularly worried about being struck by lightning because the chances are too small.

According to National Geographic, the chances of being struck by lightning in the U.S. over the course of a single year are 700 000 to 1, or 7 x 10⁵.

So how do the odds stack up?

In other words, you are a shade over 417 times more likely to be struck by lightning in any given year than to win the jackpot with a single ticket.
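
The factor of roughly 417 comes from dividing one set of odds by the other. A quick check, taking the National Geographic figure quoted above as given:

```python
LOTTERY_ODDS = 292_201_338         # 1 in this many, from the Powerball calculation
LIGHTNING_ODDS_PER_YEAR = 700_000  # 1 in this many per year (U.S. estimate quoted above)

ratio = LOTTERY_ODDS / LIGHTNING_ODDS_PER_YEAR
print(f"{ratio:.1f}")  # 417.4
```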

With me so far?

Great, that’s our warm-up done. Let’s start looking at some real numbers.

Off the top of your head, what’s the smallest thing you can think of?

Did a Hydrogen atom come to mind?

OK, sure, there are plenty of things smaller than a Hydrogen atom: electrons, quarks, a host of other sub-atomic particles, the Planck length, and so on.

For our purposes a single atom of Hydrogen will suffice, primarily because it is the most abundant element in our Universe (assuming there isn't Dark Hydrogen), so it makes a great building block for big-number analogies.

To get an idea of just how small Hydrogen is, we can think about how many molecules of water (H₂O) there are in a single 250ml cup. Using the molar mass of water (about 18 grams per mole) and Avogadro's number, we know that 18 grams of water contains approximately:

6.022 x 10²³

molecules of water. A 250ml cup holds roughly 250 grams of water, so the total number of molecules will be:

250/18 x 6.022 x 10²³ = 8.36 x 10²⁴

Of course, a water molecule has two Hydrogen atoms, so our total amount of Hydrogen atoms comes to a grand total of:

1.67 x 10²⁵
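
The cup-of-water arithmetic can be reproduced directly. This sketch assumes, as the text does, that 250ml of water weighs about 250 grams (a density of 1 g/ml) and uses a molar mass of 18 g/mol:

```python
AVOGADRO = 6.022e23       # molecules per mole
MOLAR_MASS_WATER = 18.0   # grams per mole of H2O (approximate)
CUP_GRAMS = 250.0         # 250ml cup, assuming 1 g/ml

molecules_per_cup = CUP_GRAMS / MOLAR_MASS_WATER * AVOGADRO
hydrogen_per_cup = 2 * molecules_per_cup  # two Hydrogen atoms per H2O molecule

print(f"{molecules_per_cup:.2e}")  # 8.36e+24
print(f"{hydrogen_per_cup:.2e}")   # 1.67e+25
```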

Now that’s a real number.

Estimates based on the volume of all the water in all the oceans on Earth put the number of 250ml cups of water in the oceans at:

5 x 10²¹

This yields a rather startling result.

There are more molecules in a cup of water than there are cups of water in all the oceans.

We can go ahead and calculate the total number of Hydrogen atoms in all the water in the oceans by simple multiplication:

1.67 x 10²⁵ x 5 x 10²¹ = 8.35 x 10⁴⁶
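
Scaling up to the oceans is a single multiplication using the text's rough estimate of 5 x 10²¹ cups. The same figures also show how many times more molecules there are in one cup than there are cups in the oceans:

```python
MOLECULES_PER_CUP = 8.36e24  # from the cup-of-water calculation
HYDROGEN_PER_CUP = 1.67e25   # two Hydrogen atoms per molecule
CUPS_IN_OCEANS = 5e21        # rough estimate quoted in the text

hydrogen_in_oceans = HYDROGEN_PER_CUP * CUPS_IN_OCEANS
print(f"{hydrogen_in_oceans:.2e}")  # 8.35e+46

# Molecules in one cup divided by cups in the oceans
print(round(MOLECULES_PER_CUP / CUPS_IN_OCEANS))  # 1672
```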

That’s getting up there, but we can still think bigger.

The Sun.

Around 99.8% of all the mass in our solar system is held in the Sun. It’s pretty big.

It’s also mainly Hydrogen (although it has been fusing this into Helium and other heavier elements), which makes it perfect for our purposes. Estimates for the number of Hydrogen atoms in the Sun sit at slightly over:

10⁵⁷

You might be tempted to think that the number of Hydrogen atoms in all the Earth's water is pretty close to that number. It's only about 10¹⁰ times smaller, after all. In fact, that factor of 10¹⁰ is still about 34 times larger than the odds against winning the lottery, so the gap is bigger than it appears.
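
The size of that gap is easy to check. A short sketch using the rounded figures from the text:

```python
HYDROGEN_IN_SUN = 1e57        # rough estimate quoted above
HYDROGEN_IN_OCEANS = 8.35e46  # from the ocean calculation
LOTTERY_ODDS = 292_201_338    # 1 in this many, from the Powerball calculation

factor = HYDROGEN_IN_SUN / HYDROGEN_IN_OCEANS
print(f"{factor:.1e}")              # 1.2e+10
print(round(1e10 / LOTTERY_ODDS))   # 34 (using the rounded factor of 10^10)
```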

Still, the Sun is only one star in a pretty big galaxy.

Our Milky Way is a barred spiral galaxy approximately 100 000 light years across, containing on the order of 100 billion stars. Treating each of them as Sun-like, approximating the number of Hydrogen atoms in our entire galaxy is simple:

10⁵⁷ x 10¹¹ = 10⁶⁸

The total number of stars in the observable Universe is estimated at around 10²³. That puts the total number of Hydrogen atoms in all the stars that exist at roughly:

10⁵⁷ x 10²³ = 10⁸⁰
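
The galaxy and Universe estimates are just powers of ten multiplied together, so the exponents add. Python's arbitrary-precision integers keep the arithmetic exact (the inputs themselves are, of course, only order-of-magnitude estimates):

```python
H_PER_STAR = 10**57          # every star treated as Sun-like (rough assumption)
STARS_IN_MILKY_WAY = 10**11  # ~100 billion stars
STARS_IN_UNIVERSE = 10**23   # estimate quoted in the text

h_in_galaxy = H_PER_STAR * STARS_IN_MILKY_WAY
h_in_universe = H_PER_STAR * STARS_IN_UNIVERSE

print(h_in_galaxy == 10**68)    # True
print(h_in_universe == 10**80)  # True
```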

Now that’s a pretty rough estimate, but it’s also a pretty big number.

Except that it’s not.

It’s a tiny number. Negligible. Minuscule.

Compared to the number of candidate solutions in NP-Hard (Non-deterministic Polynomial-time Hard) problems, it's nothing.

10⁸⁰ would have to go to Kindergarten, school, college and get ten years of work experience before it could even be compared to the most modest NP-Hard problems.

One famous NP-Hard problem is the Travelling Salesman Problem (TSP). Essentially, the goal is to work out the shortest (or cheapest) route that visits every location in a set once and returns to the start. For real-world purposes we generally want to consider a variation known as the multiple Travelling Salesman Problem, or mTSP, in which several vehicles share the locations between them.

The mTSP has a lot of practical applications, most notably in the field of route optimization. Delivery companies want to ensure that their fleet of vehicles is operating as efficiently as possible in order to save time and money on fuel, labor, wear and tear, and so on.

Consider a small delivery company with a fleet of 5 vehicles and 100 deliveries to fulfill. By no means an unreasonable real-world scenario.

For this type of problem there is no known efficient way to guarantee that a given solution is the best. The most straightforward way to be sure is to check every single possibility.

On the face of it this doesn’t seem like it should be too hard. All we need to do is work out the number of possibilities, try each one and pick the best.

To work out how many potential solutions there are we can repeat the following process until all the locations have been visited:

1. Pick 1 of 5 vehicles to move to 1 of the 100 locations.

2. Pick 1 of 5 vehicles to move to 1 of the remaining 99 locations.

3. Pick 1 of 5 vehicles to move to 1 of the remaining 98 locations.

4. …

100. Pick 1 of 5 vehicles to move to the final remaining location

Written out in full, the calculation is pretty long:

(5 x 100) x (5 x 99) x (5 x 98) x (5 x 97) x … x (5 x 1)

To get the total number of permutations we can write this far more succinctly:

Vᴸ x L!

Where V is the number of vehicles and L is the number of locations. The exclamation mark denotes a factorial, shorthand for multiplying every whole number from L down to 1 (in our case starting at 100):

100 x 99 x 98 x 97 x … x 1

Plugging in the number of vehicles as 5 and the number of locations as 100 we get the following number:

5¹⁰⁰ x 100! = 7.89 x 10⁶⁹ x 9.33 x 10¹⁵⁷ = 7.36 x 10²²⁷
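
Python's integers have no fixed size, so the count can be computed exactly rather than estimated. Below is a minimal sketch of the Vᴸ x L! formula with V = 5 and L = 100. One caveat is worth flagging: because the step-by-step process above can interleave the vehicles in many different orders that produce the same set of routes, Vᴸ x L! overcounts. If each vehicle's route is treated simply as an ordered list of stops (an assumption; real formulations differ), the number of distinct route sets is L! x C(L + V - 1, V - 1), roughly 4.3 x 10¹⁶⁴. That is far smaller, but still vastly larger than 10⁸⁰, so the argument stands.

```python
from math import comb, factorial

V = 5    # vehicles
L = 100  # locations

# Count used in the text: pick a vehicle (V ways) and a remaining location at every step
text_count = V**L * factorial(L)
print(f"{text_count:.2e}")  # 7.36e+227

# Distinct route sets, assuming each vehicle's route is just an ordered list of stops
distinct_routes = factorial(L) * comb(L + V - 1, V - 1)
print(f"{distinct_routes:.2e}")  # 4.29e+164
print(distinct_routes > 10**80)  # True
```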

Bear in mind that the total number of Hydrogen atoms in all the stars in all the galaxies in the entire known Universe only comes to 10⁸⁰. Even if we’ve underestimated the size of the Universe by a factor of one million the number only goes up to 10⁸⁶.

We don’t really have the language to describe how much smaller that number is than the complexity of our run-of-the-mill mTSP.

With that many permutations to check we’re going to need to start thinking about how long it might take us.

The SUMMIT supercomputer built by IBM for the Oak Ridge National Laboratory has the fastest processing capability in history (for now). This is measured in something called FLOPS, which stands for Floating Point Operations Per Second.

SUMMIT manages a respectable 200 petaflops. Peta denotes 10¹⁵, so we can write SUMMIT's speed as 2 x 10¹⁷ FLOPS. Impressively fast.

Let’s be extremely generous and assume that a single flop is equivalent to a complete check of one possibility in our problem. Essentially, this means we could test 2 x 10¹⁷ possibilities every single second.

After calling in some favors, we secure some processing time, load up the problem and go grab a cup of coffee while we wait. And wait. And wait. And …

Here’s the thing. At 2 x 10¹⁷ checks per second we’re going to take:

7.36 x 10²²⁷ / (2 x 10¹⁷) = 3.68 x 10²¹⁰ seconds

Dividing by the number of seconds in a year (60 x 60 x 24 x 365 = 31 536 000), we get:

1.16 x 10²⁰³ years
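
The waiting-time arithmetic can be sketched end to end, keeping the same generous assumption that one floating point operation checks one complete candidate solution, and assuming a 365-day year:

```python
from math import factorial

CANDIDATES = 5**100 * factorial(100)  # V^L x L! from the mTSP count
CHECKS_PER_SECOND = 2 * 10**17        # SUMMIT at 200 petaflops, one check per flop
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

seconds = CANDIDATES / CHECKS_PER_SECOND  # exact big-int division, rounded to a float
years = seconds / SECONDS_PER_YEAR
print(f"{seconds:.2e} seconds")  # 3.68e+210 seconds
print(f"{years:.2e} years")      # 1.17e+203 years
```

Rounded rather than truncated, the result is about 1.17 x 10²⁰³ years; at this scale the difference is irrelevant.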

This is going to make us unpopular with other scientists waiting for their turn.

Our best estimates tell us that the Universe is currently 13.8 billion years old, and that the Sun, which makes life on Earth possible, has only around another 5 billion years left as a stable, Hydrogen-burning star.

5 billion years, or 5 x 10⁹, is what we have. 1.16 x 10²⁰³ is what we need to complete our calculations. That’s a bit depressing.

Trillions of SUMMIT supercomputers running for trillions of years longer than the lifespan of the Universe would not be able to check every permutation in our modest-sized NP-Hard problem.

We’re going to need to find a different way of tackling this problem in order to help the delivery company lower their costs.

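One possibility would be quantum computing. Unfortunately, a quantum computer does not simply check every possibility at once: the best known general-purpose speed-up for this kind of brute-force search (Grover's algorithm) is only quadratic, which would still leave more than 10¹¹³ steps for our problem. Practical quantum computers are also still some years from being production ready.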

We’re stuck with traditional computing for the moment.

If the hardware can’t change, our methods must.

Instead of using brute force to check every possibility we can try comparing guessed solutions (approximations) over and over. If we’re sensible about how to generate those guesses and clever about the lessons derived from the comparisons, it might be possible to produce a really good outcome (even if we can never prove it’s the best possible outcome).

Algorithms that compare alternatives (as opposed to traditional procedural algorithms, i.e. ones that take a predefined set of steps to arrive at an outcome) are known as heuristic algorithms. These emerge solutions over time and often take their inspiration from nature.

The word emerge here is pivotal.

Emergence occurs when a system exhibits properties not observed in any of its constituent parts.

Our solar system is a good example. From a cloud of gas and dust acted on by gravity emerged a star with planets and moons, and eventually life itself, none of which existed initially.

Ant Colony Optimization (ACO) mimics the behavior of ants foraging for resources around their nest. Initially ants head out in all directions. If one finds food it heads back to the nest, leaving a faint trail of pheromones that may be picked up by successive waves of ants.

If other ants find food or resources in the same place, they also leave a trail of pheromones leading back to the nest. Over time these pheromone trails build up causing more and more ants to follow them directly. After a while very direct and efficient lines of transport emerge between the nest and surrounding resources.

How might this apply to the mTSP?

One way would be to make 10 (or a hundred, or a thousand) random guesses at the solution and record how much each one cost. The lowest cost guess could have each constituent route marked with a digital pheromone.

Repeat this process again and again to build up ever stronger trails of digital pheromones that can start influencing successive guesses ever so slightly — with a bias towards lower cost (since we only ever add pheromones to the lowest cost result from each iteration).

After hundreds, thousands, or tens of thousands of iterations fairly strong pheromone trails build up over what (hopefully) turns out to be a pretty efficient solution.
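
Here is a deliberately small sketch of that idea in Python. To keep it short it tackles the single-vehicle TSP rather than the full mTSP, and it is only one simple variant. It assumes random 2-D points as locations and straight-line distances, and adds two standard ACO details the text does not spell out: pheromone evaporation each iteration, and a preference for shorter hops (weight proportional to pheromone / distance) when an ant chooses its next stop. As described above, only the best tour of each iteration deposits pheromone. The names `ant_colony_tsp`, `n_ants` and `evaporation` are illustrative, not from any library.

```python
import math
import random


def tour_length(tour, pts):
    """Total length of a closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[i - 1]]) for i in range(len(tour)))


def ant_colony_tsp(pts, n_ants=20, n_iters=100, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    n = len(pts)
    pheromone = [[1.0] * n for _ in range(n)]  # start with uniform, faint trails
    best_tour, best_len = None, math.inf

    for _ in range(n_iters):
        iteration_best, iteration_len = None, math.inf
        for _ in range(n_ants):
            # Each ant builds a full tour, biased by pheromone and by shorter hops
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                cur = tour[-1]
                choices = list(unvisited)
                weights = [pheromone[cur][j] / (math.dist(pts[cur], pts[j]) + 1e-9)
                           for j in choices]
                nxt = rng.choices(choices, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, pts)
            if length < iteration_len:
                iteration_best, iteration_len = tour, length

        # Evaporate every trail a little, then reinforce only this iteration's best tour
        for row in pheromone:
            for j in range(n):
                row[j] *= 1.0 - evaporation
        for i in range(n):
            a, b = iteration_best[i - 1], iteration_best[i]
            pheromone[a][b] += 1.0 / iteration_len
            pheromone[b][a] += 1.0 / iteration_len

        if iteration_len < best_len:
            best_tour, best_len = iteration_best, iteration_len

    return best_tour, best_len


rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(15)]
tour, length = ant_colony_tsp(points)
print(round(length, 3))
```

Evaporation matters: without it, early and possibly poor trails would never fade, and the colony could lock onto them.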

ACO is not the only heuristic algorithm we could use. Simulated annealing (annealing is the process of heating metal and then cooling it slowly, so that its atoms settle into a more orderly crystalline arrangement with fewer weaknesses) and genetic algorithms also bring their own unique heuristic opportunities to generate better solutions.
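
To make the annealing analogy concrete, here is a minimal simulated annealing sketch for the single-vehicle TSP (a simplification of the mTSP). It assumes a simple move (reversing a random segment of the tour, often called a 2-opt move), a geometric cooling schedule, and the standard Metropolis acceptance rule: a worse tour is sometimes accepted, with a probability that shrinks as the "temperature" falls, which helps the search escape local minima early on. The function name `anneal_tsp` and all parameter values are illustrative, not tuned.

```python
import math
import random


def tour_length(tour, pts):
    """Total length of a closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[i - 1]]) for i in range(len(tour)))


def anneal_tsp(pts, start_temp=1.0, cooling=0.999, steps=20_000, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(pts)))
    rng.shuffle(tour)  # start from a random guess
    cur_len = tour_length(tour, pts)
    best, best_len = tour[:], cur_len
    temp = start_temp
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse one segment
        cand_len = tour_length(candidate, pts)
        # Always accept improvements; accept worse tours with probability exp(-delta / T)
        if cand_len < cur_len or rng.random() < math.exp((cur_len - cand_len) / temp):
            tour, cur_len = candidate, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        temp *= cooling  # cool down gradually
    return best, best_len


rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(15)]
tour, length = anneal_tsp(points)
print(round(length, 3))
```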

Different heuristic algorithms come with their own unique set of strengths and weaknesses so combining them can help produce better solutions faster.

It helps to be able to picture what a 5-vehicle, 100-location problem might look like in the real world.

5 vehicle 100 location route optimization map courtesy of Optergon

This screenshot shows 100 locations dotted around London (in fact, it’s a big list of museums for anyone interested in a whirlwind historical tour).

Note that, as in the real world, there are limitations placed on the formulation of the problem itself.

For example, the vehicles have operating hours, in this case between 9am and 4pm. They also have costs associated with them: Fixed, Distance and Time. In this example, distance costs (fuel, wear and tear, etc.) are set to $1 per kilometer and time costs (such as driver pay) to $10 per hour. These would change to match the specific operational costs of the company concerned (for example, lightweight pickups might have lower distance-based costs than large trucks).

Locations have a time associated with them. It’s important to take into account the amount of time a vehicle would need to stop at a location in order to fulfill its task (i.e. a delivery, pickup or service).
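
The cost side of this formulation can be written down as a small function. This is a sketch, assuming cost is simply fixed + (rate per km x distance) + (rate per hour x time), with time covering both driving and the service time spent at each stop. The $1/km and $10/hour rates are the ones used above; the fixed cost of zero, the example legs and stops, and the 40 km/h average speed are hypothetical values for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    fixed_cost: float     # $ per route (hypothetical value in the example)
    cost_per_km: float    # $ per kilometer (fuel, wear and tear, ...)
    cost_per_hour: float  # $ per hour (driver pay, ...)


def route_cost(vehicle, leg_km, service_minutes, speed_kmh=40.0):
    """Cost of one route: legs driven at an assumed average speed, plus time on site."""
    distance = sum(leg_km)
    hours = distance / speed_kmh + sum(service_minutes) / 60.0
    return vehicle.fixed_cost + vehicle.cost_per_km * distance + vehicle.cost_per_hour * hours


pickup = Vehicle(fixed_cost=0.0, cost_per_km=1.0, cost_per_hour=10.0)
# Hypothetical route: four legs totalling 60 km, three stops of 20 minutes each
print(route_cost(pickup, leg_km=[10, 20, 15, 15], service_minutes=[20, 20, 20]))  # 85.0
```

An optimizer compares candidate solutions by summing this kind of cost over every vehicle's route, which is why changing a single vehicle's rates can reshape the whole solution.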

Here’s the result.

5 vehicle 100 location route optimization solution courtesy of Optergon

There are some interesting characteristics of this result that may not be immediately apparent.

First, it is possible to visit all the locations with the vehicles available in the time-frame given. This is important. Companies that are not able to optimize their routes well need to expand their fleet in order to manage their workload at a significantly greater cost.

Second, there is some overlap in the routes taken by each vehicle. More often than not we intuitively expect each vehicle to work within its own little partition of the map. In fact, this is often the result of short-cuts built into optimization algorithms intended to reduce the complexity of the problem.

In essence, the complexity of an mTSP can be drastically reduced by dividing up the map and optimizing V smaller problems (where V is the number of vehicles). Assigning each vehicle its own partition of the map, with the locations divided more or less equally, gives us the following complexity (only roughly, since there are many ways to partition a surface):

(L/V)!ⱽ

Plugging in 5 vehicles and 100 locations gives:

(100/5)!⁵ = 20!⁵ = 8.5 x 10⁹¹

A negligible fraction of the complexity of the initial formulation of the problem that came to 7.36 x 10²²⁷.
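
The two counts are easy to compare exactly. A sketch using the text's formulas, (L/V)!ⱽ for the partitioned problem and Vᴸ x L! for the original formulation:

```python
from math import factorial

V, L = 5, 100

full = V**L * factorial(L)            # original formulation
partitioned = factorial(L // V) ** V  # one equal-sized partition per vehicle

print(f"{partitioned:.2e}")           # 8.52e+91
print(f"{full / partitioned:.1e}")    # 8.6e+135
```

In other words, partitioning shrinks the search space by a factor of roughly 10¹³⁶, which is exactly why it is tempting, and why it can quietly throw away good solutions.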

That’s a massive reduction in the size of the potential solution space. However, it doesn’t come for free. There’s always a trade-off. What you gain in reduced complexity may be lost in accuracy since you are drastically limiting the solution space available to explore.

There are many scenarios in which optimized routes will cross over and/or overlap, especially in a city like London, with its numerous one-way streets, arterial routes that move significantly faster than smaller, more direct routes, restrictions on time, distance and vehicle capacities, and so on.

The particular problem we have used here has a large, tight cluster of locations in the center of town with more distant outlying points surrounding the central cluster. Since there are more locations in the dense, central cluster than one vehicle can cater for, more than one needs to visit the same area.

The two vehicles that handle the bulk of this dense cluster need to travel up the same highway from the depot, before visiting their respective locations.

Partially differentiated routes courtesy of Optergon

The three other vehicles handle the bulk of the outlying locations while picking up a few of the clustered locations as part of their longer (in terms of distance) routes.

Partially overlapping routes courtesy of Optergon

What would you expect if the vehicles were allowed more time? Instead of working from 9am to 4pm, they could work from 9am to 5pm.

Here’s the result.

Reduced cost & vehicle routes courtesy of Optergon

This solution is about 5% cheaper, and fewer vehicles were required to meet the demands of the problem posed. At the very least, using fewer vehicles reduces the redundant time and distance traveled to and from the depot.

Again, the result shows that it is possible for the company to meet its objectives within the specified time frame using only 4 vehicles, even though 5 were available. This is another important result, since it shows how a smaller, more efficient fleet can potentially lower costs.

Until now, each of the vehicles used incurred precisely the same cost. In reality there may be significant variation over a large fleet of different vehicle types. Since this entire optimization is an exercise in reducing costs, it’s important to see the effect of non-uniform costs.

Let’s assume that one vehicle is a bit older and tends to have worse fuel consumption. Its distance cost moves up to $1.50 per kilometer. Another vehicle is being driven by a replacement driver who is on time-and-a-half, $15 per hour.

The new problem formulation looks like this.

Unique time and cost parameters courtesy of Optergon

Green Pickup I (top of the list) now has a distance cost of $1.50 per km. Blue Pickup III (second in the list) now has an hourly cost of $15. What, if any, differences do you expect in the result?

Here’s the new result.

Distance cost optimization result courtesy of Optergon

A slightly more expensive result. To be expected since costs increased.

Green Pickup I (with a 50% increase in distance costs) was used, but traveled only 49km in total, less than half the distance of any other vehicle.

Blue Pickup III was not used at all.

Recall that we have five vehicles available, so only one of the two more expensive vehicles had to be used in order to meet all the objectives.

Can you guess what might happen if we shorten the working day back to 9am to 4pm in order to force all five vehicles to be used?

Here’s the result.

Asymmetric distance & time cost optimization result courtesy of Optergon

It's easy to see that Blue Pickup III had to be used, but it spent roughly 30% less time operating than any other vehicle. Meanwhile, Green Pickup I traveled significantly less distance than any other vehicle, further reducing costs.

Nuances between distance and time costs can play a significant role in shaping the optimal result, and both are important. Factoring in two cost parameters adds complexity to the system, since it must now balance asymmetric costs.

On the face of things these results seem agreeable. They’re what you’d expect, right? Common sense.

Think about this for a second.

The underlying algorithms have no concept of common sense. They only manipulate and compare alternatives. Yet they reliably emerge the results we expect, except in cases where they outsmart us (most likely due to conditions we could not possibly have foreseen at the start of the problem).

Agreeing with heuristic algorithms says more about us than it does about the algorithms.

Humans cannot hope to perform the calculations required to optimize these types of problem. This doesn’t prevent us from intuitively knowing what to expect using a wide array of abstractions, tricks and mental leaps we take for granted.

It turns out that we're heuristics in the flesh. Our brains implement them, and both our brains and our bodies evolved through them.

An everyday task like catching a ball is a skill we have to emerge by practicing it over and over again in order to build up a significant number of “alternatives” that we compare to our current situation in real-time in order to predict the most likely outcome (and avoid getting hit on the nose).

Heuristics aren’t only used by our minds; they’re used to build our minds.

DNA is the product of billions of years of genetic heuristics that emerge new, well-adapted individuals likely to survive and procreate.

What’s weird here is that one natural heuristic process (i.e. genetic evolution) emerged another, organic heuristic process (common sense, hard-wired into our brains).

If nature has been using heuristics and emergence to produce intelligent life from nothing more than gravity and clouds of gas, perhaps tackling NP-Hard problems using heuristics is a gateway to helping us produce something far more profound than solutions to the mTSP.

Right now, heuristic techniques can help you be more effective in a number of day-to-day ways, especially when it comes to creative problems (which can't be brute forced procedurally). The SubMerge Technique, explained in section 12 of How to Make Money Blogging, is a great example of how a heuristic methodology can be applied to a range of everyday creative problems, such as coming up with new article ideas.

Ultimately, however, heuristics might be a great way to emerge artificial intelligence (artificial common sense, if you like).

“Iterative changes acted on by selective pressures” is the exact scenario under which evolution emerges.

With only very simple rules (comparable in spirit to those codified in Conway’s Game of Life) governing which heuristic programs “live” or “die” based on their fitness-for-purpose, a rich and diverse ecosystem of rapidly evolving programs may establish itself.
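
“Iterative changes acted on by selective pressures” can be shown in miniature. The sketch below is a toy, not a model of the program ecosystem imagined here: candidate bit strings are copied with random mutations, and a simple fitness rule (the number of 1 bits, a standard textbook toy problem known as OneMax) decides which of them "live" into the next generation. The population size, mutation rate and fitness function are all illustrative assumptions.

```python
import random


def fitness(genome):
    """Toy fitness-for-purpose: the number of 1 bits (the 'OneMax' problem)."""
    return sum(genome)


def evolve(length=40, pop_size=30, mutation_rate=0.02, generations=200, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Iterative change: every individual produces one randomly mutated copy
        offspring = [[1 - bit if rng.random() < mutation_rate else bit for bit in parent]
                     for parent in population]
        # Selective pressure: only the fittest half of parents + offspring survive
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)


best = evolve()
print(fitness(best), "of", len(best))
```

Nothing in the loop "knows" the goal is all ones; random variation plus a survival rule is enough for the population to climb toward it.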

A good example of fitness-for-purpose in this instance might be the design of better chips, better architecture, creativity, more accurate weather pattern predictions, stock market changes, and so on.

A diverse ecosystem of programs competing to be of value to us would provide one potential definition of fitness-for-purpose for a burgeoning ecosystem. Fail to be useful and lose processing resources. Be of use and capture additional resources. An analog for competition (food & resources, such as drinking water, salt, shelter, etc) in the natural world.

Heuristic emergence in software would become the driver of artificial evolution that may ultimately lead to self-aware artificial entities, in the same way that physical heuristics and emergence led to self-aware natural organisms like us.

The only difference being that evolution in software would occur rapidly.

Extremely rapidly.

What took nature millions of years might take software months, then weeks, then days…

Whether self-aware software would want to explore our physical Universe, building Von Neumann machines to manufacture wormholes and/or ships with FTL travel (hopefully taking us along for the ride), or simply inhabit its own virtual Universe is anyone’s guess.

The former option would seem unnecessarily archaic for software unless it was concerned about solar system level extinction events, so I favor the latter (virtual Universe) option.

The Fermi paradox would suggest this is the most likely scenario, anyway.

This leads to an extremely unsettling conclusion.

Consider this principle:

One element of an unordered set of known size greater than one should not be assumed set-wise unique.

Full disclosure: I made that principle up. It may well exist in one form or another elsewhere.

To demonstrate it in action, imagine a large tin of cookies.

The first cookie you grab has chocolate chips.

Is it reasonable to assume it is unique and that no other cookies in that large tin have chocolate chips? Or is it more reasonable to assume that at least some other cookies have chocolate chips?

We can apply the same argument to life in the Universe.

Consider a Universe consisting of around 10²⁴ planets. The very first planet observed (called Earth) has life in abundance. Should you assume it is unique, or is it more likely that at least some other planets have life?

We can apply this principle to Universes in turn.

Consider the following scenario (for argument’s sake, let’s call it the heuristic intelligence Universe scenario):

Heuristic intelligence sufficient to create an artificial Universe emerges in one Universe. This implies a set of Universes greater than one (it is not necessary for this to have happened here on Earth, only that it is possible for it to happen anywhere in the Universe).

With a set of known Universes greater than one, we should not assume uniqueness.

Therefore, our Universe is probably not the first one (since this would make it set-wise unique).

This leaves us with the mildly vertiginous conclusion that we may exist in some iteration of the above-mentioned process in which artificial Universes are emerged by intelligences emerged from heuristic processes.

David contributes to SME Pals, a blog for online startups, and consults to a wide array of technology startups, including Optergon. He can't shake the feeling we're running in a simulation.