Just adding outline of how i'm doing this problem - a little hacky but is quickest way i've found that scales well.

Input table looks like:

{ "ip": "130.211.149.140", "ip_int": "2194904460", "ip_part1": "130", "ip_part2": "211", "ip_part3": "149", "ip_part4": "140", "num_requests": "6811" }

And lookup table is like:

{ "de_ip_key": "DE18_92.66.156.93_92.66.156.112", "ip_key": "92.66.156.93_92.66.156.112", "ip_from_int": "1547869277", "ip_to_int": "1547869296", "ip_from": "92.66.156.93", "ip_to": "92.66.156.112", "naics_code": "518210", "ip_from_part1": "92", "ip_from_part2": "66", "ip_from_part3": "156", "ip_from_part4": "93", "ip_to_part1": "92", "ip_to_part2": "66", "ip_to_part3": "156", "ip_to_part4": "112" }

So using part 1 and part 2 of the ip address to join as a way of reducing the search space (the from and to ranges in my lookup table don't tend to span as wide as to have different part 1's and 2's - if so this approach fails).

select ip, ip_int, -- pick first info from de first(ip_key) as ip_key, first(de_ip_key) as de_ip_key, first(naics_code) as naics_code from ( select ip as ip, ip_int as ip_int, ip_key as ip_key, de_ip_key as de_ip_key, naics_code as naics_code, from -- join based on part 1 and 2 of ip from range ( select input.ip as ip, input.ip_int as ip_int, if(input.ip_int between de.ip_from_int and de.ip_to_int,de.ip_key,null) as ip_key, if(input.ip_int between de.ip_from_int and de.ip_to_int,de.de_ip_key,null) as de_ip_key, if(input.ip_int between de.ip_from_int and de.ip_to_int,de.naics_code,null) as naics_code, from [ip.lookup_input_tbl] input left outer join each [digital_element.data_naics_code] de on input.ip_part1=de.ip_from_part1 and input.ip_part2=de.ip_from_part2 group by 1,2,3,4,5 ), -- join based on part 1 and 2 of ip to range ( select input.ip as ip, input.ip_int as ip_int, if(input.ip_int between de.ip_from_int and de.ip_to_int,de.ip_key,null) as ip_key, if(input.ip_int between de.ip_from_int and de.ip_to_int,de.de_ip_key,null) as de_ip_key, if(input.ip_int between de.ip_from_int and de.ip_to_int,de.naics_code,null) as naics_code, from [ip.lookup_input_tbl] input left outer join each [digital_element.data_naics_code] de on input.ip_part1=de.ip_to_part1 and input.ip_part2=de.ip_to_part2 group by 1,2,3,4,5 ), group by 1,2,3,4,5 -- order so null records from either join go to bottom and get left behind on the first group by order by ip_int,ip_key desc ) group by 1,2

So it basically blows out the data (by equality join on part 1 and part 2 of ip address and ip_from and ip_to addresses) and then reduces it on the group by using the if between statement (doing this instead of a where condition ensures you get a proper left outer join so you can also see which records you processed but that had no info in the lookup table).

Defo not the prettiest and probably one or two more ways to optimize it but is working for me now and looking up 500K input ip addresses against a lookup table of 16M records in 10-20 seconds.