I ran my own analysis of several of the algorithms in this thread and came up with some new results. You can see the old results in the edit history of this answer, but they are not accurate: I made a mistake and wasted time analyzing several algorithms that aren't close. However, pulling lessons from several different answers, I now have two algorithms that crush the "winner" of this thread. Here's the core thing I do differently from everyone else:

    // This is faster because a number is divisible by 2^4 or more only 6% of the time
    // and more than that a vanishingly small percentage.
    while((x & 0x3) == 0) x >>= 2;
    // This is effectively the same as the switch-case statement used in the original
    // answer.
    if((x & 0x7) != 1) return false;

This simple loop, which most of the time adds only one or two very fast instructions, reduces the switch-case statement to a single if statement. It can add to the runtime, however, if many of the tested numbers have significant power-of-two factors.
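Why one comparison is enough: every odd square has the form (2k+1)^2 = 4k(k+1) + 1, and k(k+1) is even, so an odd square is always 1 modulo 8. Once all factors of 4 are stripped, a remaining even value contains exactly one factor of 2 and cannot be a square either. The sketch below isolates just this filter; the class and method names are illustrative and not part of the algorithms in this answer.

```java
final class Mod8Filter {
    /** Returns false only when n is certainly not a perfect square; true means "maybe". */
    static boolean mightBeSquare(long n) {
        if (n < 0) return false;
        if (n == 0) return true;            // also avoids an endless loop below
        long x = n;
        while ((x & 0x3) == 0) x >>= 2;     // strip factors of 4
        return (x & 0x7) == 1;              // odd squares are 1 mod 8
    }

    public static void main(String[] args) {
        // Every perfect square must pass the filter.
        for (long r = 1; r <= 1000; r++) {
            if (!mightBeSquare(r * r)) {
                throw new AssertionError("rejected square " + (r * r));
            }
        }
        // Count how many of the first million positive longs survive the filter.
        int survivors = 0;
        for (long n = 1; n <= 1_000_000; n++) {
            if (mightBeSquare(n)) survivors++;
        }
        System.out.println("survivors out of 1000000: " + survivors);
    }
}
```

A `true` result is only a "maybe" (17 passes, for instance), so the square-root check still has to follow.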

The algorithms below are as follows:

- Internet - Kip's posted answer
- Durron - My modified answer using the one-pass answer as a base
- DurronTwo - My modified answer using the two-pass answer (by @JohnnyHeggheim), with some other slight modifications.

Here is a sample runtime if the numbers are generated using Math.abs(java.util.Random.nextLong()):

     0% Scenario{vm=java, trial=0, benchmark=Internet} 39673.40 ns; σ=378.78 ns @ 3 trials
    33% Scenario{vm=java, trial=0, benchmark=Durron} 37785.75 ns; σ=478.86 ns @ 10 trials
    67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 35978.10 ns; σ=734.10 ns @ 10 trials

    benchmark   us linear runtime
     Internet 39.7 ==============================
       Durron 37.8 ============================
    DurronTwo 36.0 ===========================

    vm: java
    trial: 0

And here is a sample runtime if it's run on the first million longs only:

     0% Scenario{vm=java, trial=0, benchmark=Internet} 2933380.84 ns; σ=56939.84 ns @ 10 trials
    33% Scenario{vm=java, trial=0, benchmark=Durron} 2243266.81 ns; σ=50537.62 ns @ 10 trials
    67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 3159227.68 ns; σ=10766.22 ns @ 3 trials

    benchmark   ms linear runtime
     Internet 2.93 ===========================
       Durron 2.24 =====================
    DurronTwo 3.16 ==============================

    vm: java
    trial: 0

As you can see, DurronTwo does better for large inputs, because it gets to use the magic trick very often, but on the first million longs it gets clobbered by the first algorithm and by Math.sqrt, because the numbers are so much smaller. Meanwhile, the simpler Durron is a huge winner there, because in the first million numbers it never has to divide by 4 many times.

Here's Durron:

    public final static boolean isPerfectSquareDurron(long n) {
        if(n < 0) return false;
        if(n == 0) return true;

        long x = n;
        // This is faster because a number is divisible by 16 only 6% of the time
        // and more than that a vanishingly small percentage.
        while((x & 0x3) == 0) x >>= 2;
        // This is effectively the same as the switch-case statement used in the original
        // answer.
        if((x & 0x7) == 1) {

            long sqrt;
            if(x < 410881L) {
                int i;
                float x2, y;

                x2 = x * 0.5F;
                y = x;
                i = Float.floatToRawIntBits(y);
                i = 0x5f3759df - ( i >> 1 );
                y = Float.intBitsToFloat(i);
                y = y * ( 1.5F - ( x2 * y * y ) );

                sqrt = (long)(1.0F/y);
            } else {
                sqrt = (long) Math.sqrt(x);
            }
            return sqrt*sqrt == x;
        }
        return false;
    }
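The float branch above is the well-known bit-level inverse square root estimate followed by one Newton step. The following sketch pulls that step out on its own so the approximation can be compared with Math.sqrt. The class and method names are illustrative. Note that the cutoff 410881 equals 641^2; my reading (an assumption, not something the benchmarks prove) is that below it the single-step estimate stays within less than 1 of the true root, so truncating still lands on the right integer.

```java
final class FastSqrtSketch {
    /** Approximates sqrt(x) for x > 0: bit-level guess for 1/sqrt(x), one Newton step, then invert. */
    static float approxSqrt(float x) {
        float half = 0.5F * x;
        int bits = Float.floatToRawIntBits(x);
        bits = 0x5f3759df - (bits >> 1);       // initial guess for 1/sqrt(x)
        float y = Float.intBitsToFloat(bits);
        y = y * (1.5F - half * y * y);         // one Newton-Raphson refinement
        return 1.0F / y;
    }

    public static void main(String[] args) {
        float[] samples = {9F, 25F, 1024F, 410880F};
        for (float x : samples) {
            System.out.println(x + " -> approx " + approxSqrt(x)
                    + ", Math.sqrt " + Math.sqrt(x));
        }
    }
}
```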

And DurronTwo:

    public final static boolean isPerfectSquareDurronTwo(long n) {
        if(n < 0) return false;
        // Needed to prevent infinite loop
        if(n == 0) return true;

        long x = n;
        while((x & 0x3) == 0) x >>= 2;
        if((x & 0x7) == 1) {
            long sqrt;
            if (x < 41529141369L) {
                int i;
                float x2, y;

                x2 = x * 0.5F;
                y = x;
                i = Float.floatToRawIntBits(y);
                // using the magic number from
                // http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
                // since it is more accurate
                i = 0x5f375a86 - (i >> 1);
                y = Float.intBitsToFloat(i);
                y = y * (1.5F - (x2 * y * y));
                y = y * (1.5F - (x2 * y * y)); // Newton iteration, more accurate
                sqrt = (long) ((1.0F/y) + 0.2);
            } else {
                // Carmack hack gives incorrect answer for n >= 41529141369.
                sqrt = (long) Math.sqrt(x);
            }
            return sqrt*sqrt == x;
        }
        return false;
    }
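Since both methods depend on hand-picked cutoffs (410881 and 41529141369), it is worth cross-checking any variant against a slow reference before trusting a benchmark. Below is a minimal sketch of such a check. `SquareCheckHarness`, `isPerfectSquareReference`, and `firstMismatch` are hypothetical names, and the candidate in `main` is only a stand-in; pass `SquareRootAlgs::isPerfectSquareDurron` or similar if those methods are on the classpath.

```java
import java.util.function.LongPredicate;

final class SquareCheckHarness {
    private static final long MAX_ROOT = 3037000499L;   // floor(sqrt(Long.MAX_VALUE))

    /** Reference test: tries the integers next to Math.sqrt(n) without overflowing. */
    static boolean isPerfectSquareReference(long n) {
        if (n < 0) return false;
        long r = (long) Math.sqrt((double) n);
        for (long c = r - 1; c <= r + 1; c++) {
            if (c >= 0 && c <= MAX_ROOT && c * c == n) return true;
        }
        return false;
    }

    /** Returns the first n in [from, to] where candidate disagrees with the reference, or -1. */
    static long firstMismatch(LongPredicate candidate, long from, long to) {
        for (long n = from; n <= to; n++) {
            if (candidate.test(n) != isPerfectSquareReference(n)) return n;
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Stand-in candidate; replace with the method under test.
        LongPredicate naive = v -> {
            long s = (long) Math.sqrt((double) v);
            return s * s == v;
        };
        System.out.println("first mismatch: " + firstMismatch(naive, 0L, 1_000_000L));
    }
}
```

Checking ranges just below and just above each cutoff is the most useful place to start, since that is where the float approximation is expected to break down.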

And my benchmark harness: (Requires Google caliper 0.1-rc5)

    import java.util.Random;

    import com.google.caliper.Runner;
    import com.google.caliper.SimpleBenchmark;

    public class SquareRootBenchmark {
        public static class Benchmark1 extends SimpleBenchmark {
            private static final int ARRAY_SIZE = 10000;
            long[] trials = new long[ARRAY_SIZE];

            @Override
            protected void setUp() throws Exception {
                Random r = new Random();
                for (int i = 0; i < ARRAY_SIZE; i++) {
                    trials[i] = Math.abs(r.nextLong());
                }
            }

            public int timeInternet(int reps) {
                int trues = 0;
                for(int i = 0; i < reps; i++) {
                    for(int j = 0; j < ARRAY_SIZE; j++) {
                        if(SquareRootAlgs.isPerfectSquareInternet(trials[j])) trues++;
                    }
                }
                return trues;
            }

            public int timeDurron(int reps) {
                int trues = 0;
                for(int i = 0; i < reps; i++) {
                    for(int j = 0; j < ARRAY_SIZE; j++) {
                        if(SquareRootAlgs.isPerfectSquareDurron(trials[j])) trues++;
                    }
                }
                return trues;
            }

            public int timeDurronTwo(int reps) {
                int trues = 0;
                for(int i = 0; i < reps; i++) {
                    for(int j = 0; j < ARRAY_SIZE; j++) {
                        if(SquareRootAlgs.isPerfectSquareDurronTwo(trials[j])) trues++;
                    }
                }
                return trues;
            }
        }

        public static void main(String... args) {
            Runner.main(Benchmark1.class, args);
        }
    }

UPDATE: I've made a new algorithm that is faster in some scenarios and slower in others; I've gotten different benchmarks based on different inputs. If we calculate modulo 0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241, we can rule out 97.82% of numbers as non-squares. This can be (sort of) done in one line, with 5 bitwise operations:

    if (!goodLookupSquares[(int) ((n & 0xFFFFFFL) + ((n >> 24) & 0xFFFFFFL) + (n >> 48))])
        return false;

The resulting index is either 1) the residue, 2) the residue + 0xFFFFFF, or 3) the residue + 0x1FFFFFE. Of course, we need a lookup table for residues modulo 0xFFFFFF, which is about a 3 MB file (in this case stored as ASCII text decimal numbers, which is not optimal but clearly improvable with a ByteBuffer and so forth). But since that is precalculation, it doesn't matter so much. You can find the file here (or generate it yourself):

    public final static boolean isPerfectSquareDurronThree(long n) {
        if(n < 0) return false;
        if(n == 0) return true;

        long x = n;
        while((x & 0x3) == 0) x >>= 2;
        if((x & 0x7) == 1) {
            if (!goodLookupSquares[(int) ((n & 0xFFFFFFL) + ((n >> 24) & 0xFFFFFFL) + (n >> 48))])
                return false;

            long sqrt;
            if(x < 410881L) {
                int i;
                float x2, y;

                x2 = x * 0.5F;
                y = x;
                i = Float.floatToRawIntBits(y);
                i = 0x5f3759df - ( i >> 1 );
                y = Float.intBitsToFloat(i);
                y = y * ( 1.5F - ( x2 * y * y ) );

                sqrt = (long)(1.0F/y);
            } else {
                sqrt = (long) Math.sqrt(x);
            }
            return sqrt*sqrt == x;
        }
        return false;
    }
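The lookup index works because 2^24 is congruent to 1 modulo 0xFFFFFF, so adding the three 24-bit chunks of n gives a value with the same residue as n, without any division. A small sketch to confirm this (the class and method names are illustrative):

```java
final class Fold24Demo {
    static final long M = 0xFFFFFFL;   // 2^24 - 1

    /** Sum of the 24-bit chunks of a non-negative long; congruent to n modulo M. */
    static int foldedIndex(long n) {
        return (int) ((n & M) + ((n >> 24) & M) + (n >> 48));
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, M, M + 1, 123456789012345L, Long.MAX_VALUE};
        for (long n : samples) {
            int idx = foldedIndex(n);
            System.out.println(n + ": index=" + idx
                    + ", index % M=" + (idx % M) + ", n % M=" + (n % M));
        }
    }
}
```

For inputs near Long.MAX_VALUE the top chunk contributes up to 0x7FFF, so the folded index can reach 0x1FFFFFE + 0x7FFF; the lookup table has to be at least that long.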

I load the residue file into a boolean array like this:

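    import java.io.File;
    import java.util.Scanner;

    // Members of the class that holds isPerfectSquareDurronThree:
    private static boolean[] goodLookupSquares = null;

    public static void initGoodLookupSquares() throws Exception {
        Scanner s = new Scanner(new File("24residues_squares.txt"));

        // Room for three shifted copies of the residues. The lookup index can be
        // as large as 0x1FFFFFE + 0x7FFF, and the third write below goes past
        // 0x1FFFFFE, so an array of length 0x1FFFFFE is too short.
        goodLookupSquares = new boolean[3 * 0xFFFFFF];

        while(s.hasNextLine()) {
            int residue = Integer.parseInt(s.nextLine().trim());
            goodLookupSquares[residue] = true;
            goodLookupSquares[residue + 0xFFFFFF] = true;
            goodLookupSquares[residue + 0x1FFFFFE] = true;
        }

        s.close();
    }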
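If you would rather generate the residues than read them from a file, squaring every value below the modulus and recording the result is enough. The sketch below does that with a BitSet, which is my choice here for compactness and not what the loader above uses; the class and method names are illustrative.

```java
import java.util.BitSet;

final class SquareResidues {
    /** Marks every r in [0, modulus) that equals k*k mod modulus for some k. */
    static BitSet squareResidues(int modulus) {
        BitSet residues = new BitSet(modulus);
        for (long k = 0; k < modulus; k++) {
            residues.set((int) ((k * k) % modulus));   // k*k fits in a long for modulus < 2^31
        }
        return residues;
    }

    public static void main(String[] args) {
        BitSet residues = squareResidues(0xFFFFFF);
        double passRate = 100.0 * residues.cardinality() / 0xFFFFFF;
        System.out.println(residues.cardinality() + " square residues, "
                + passRate + "% of all residues pass the filter");
    }
}
```

Each set bit can then be written out as a decimal line to match the text file format, or copied straight into the boolean array at the three shifted positions.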

Example runtime. It beat Durron (version one) in every trial I ran.