Update: This was logged as a bug to Apple and has been resolved in iOS 10 and macOS 10.12

See http://www.openradar.me/radar?id=4958891762778112 for details

Background

Apple Caching Server is pretty cool and it really makes a lot of sense in a large environment.

However, large environments often have a rather complex network topology which makes configuration and troubleshooting a little more difficult.

I just happen to work in a very large environment with a complex network topology.

We have many public WAN IP’s which our client devices and Apple caching servers use to get out to the internet – via authenticated proxies no less.

Apple has some pretty good, although a bit ambiguous in parts, documentation on configuring Apple Caching for complex networks here: http://help.apple.com/serverapp/mac/5.1/#/apd6015d9573

Essentially we have a network that looks a little bit like this:

Apple Caching server supports this network topology, however we need to provide our client devices access to a DNS TXT service record in their default search domain so the client device will know all of our WAN IP ranges.

So how does this caching server thing work on the client anyway?

There is a small binary/framework on the client device that does a ‘discovery’ of Apple caching servers approximately every hour – or if it has not yet populated a special file on disk, it will run immediately when requested by a content downloading service such as the App Store.

This special binary does this discovery by grabbing some info about the client device such as the LAN IP address and subnet range, and then it looks up our special DNS Text record ( _aaplcache._tcp. ) and sends all of this data to the Apple Locator service at: lcdn-locator.apple.com

Apple then matches the WAN IP ranges and the LAN IP ranges provided and sends back a config that the special process writes out to disk. This config file contains the URL of a caching server that it should use (if one has been registered)

This special file on disk is called diskcache.plist, if it has been able to successfully locate a caching server, you should see in this file a line like this:

"localAddressAndPort" => "10.10.10.10:49313"

Where 10.10.10.10:49313 is the IP address and port of the caching server the client should use.

Now this diskcache.plist file exists in a folder called com.apple.AssetCacheLocatorService inside /var/folders. The exact location is stored in the DARWIN_USER_CACHE_DIR variable. This can be revealed by running:

getconf DARWIN_USER_CACHE_DIR

Which should output a directory path like this:

/var/folders/yd/y87k7kk14494j_9c0y814r8c0000gp/C/

Then you can just use plutil -p to read the diskCache.plist

sudo plutil -p /var/folders/yd/y87k7kk14494j_9c0y814r8c0000gp/C/com.appleAssetCacheLocatorService/diskCache.plist

And it should give you some output like this

*Thanks to n8felton for the info about the /var/folders !

Now all of this is fine and no problem, it all works as expected.

Except when it doesn’t.

At some sites, we were seeing a failure of client devices to pull their content from their caching server. The client device would simply pull its content over the WAN.

After a lot of trial and error and wire-sharking (is that a thing?) we found the problem.

As I mentioned earlier we were having _some_ client devices not able to pull their content from the caching server. After investigation on the client we found that they were not populating their diskcache.plist with the information we need from the apple locator service.

How come?

Well in our environment, we utilise a RODC at each site. This AD RODC (Read only domain controller) also operates as a DNS server. It is also the primary DNS server that is provided to clients via DHCP.

We have a few “issues” with our RODCs from time to time and quite often we just shut them down and let the clients talk to our main DC’s and DNS servers over the WAN. However, when we shutdown the RODC’s we don’t remove them from the DHCP servers DNS option. So clients still receive a DHCP packet with a primary DNS server of our now turned off RODC DNS server, they also receive a secondary, and third DNS server that they will use.

As expected the clients seem quite happy with this, the clients are able to perform DNS lookups and browse the internet as expected even though their primary DNS server is non-responsive.

BUT it seems that the special little caching service discovery tool on the client devices does not fail over and use the secondary (or third) DNS server. It seems that this tool only does the DNS lookup for our TXT record against the primary DNS server.

So because this DNS TXT record lookup fails, the caching service discovery tool doesn’t get a list of WAN IP address ranges to send to the Apple locator URL and thus never gets a response back about which caching server it should use!

The fix.

Once we manually remove the non-responsive primary DNS server from the DHCP packet, so the client device now only gets our 2 functional DNS servers as the primary and secondary servers, the caching service discovery tool is able to lookup our DNS TXT record and receive the correct caching server URL from the Apple locator service and everything is right in the world again!