Troubleshooting Mac OS X Timeout & Login Failure With Active Directory Credentials After Locking the System

Issue

Students at Holberton School (myself included) reported that the provided iMacs would freeze and refuse login after the system had been locked. The reports were inconsistent and showed no clear pattern by time or user. Some students noted that if they left the machine and returned after 20–30 minutes, it would work again. This was not the case for all students, however, and many resorted to hard-resetting the system.

Environment

We have 30+ iMac systems managed using Casper Suite. The systems are bound to Active Directory (AD) and students use their AD credentials to access those systems and other linked services. The iMacs are currently running OS X 10.11 (El Capitan).

Investigation

Our first attempt to reproduce the issue was to initiate a Casper policy update to see whether a policy configuration was interfering with access to the system. This did not reproduce the problem, and we had to wait for the next report. Daniel, another student member of the DevOps team, was able to reproduce the problem by locking his system for 25 minutes. I used SSH to access the system with a local admin account and ran “tail -f /var/log/system.log” in the terminal to begin monitoring the system log. Daniel attempted to log in to the machine, and we recorded the following error messages.

loginwindow[1877]: in od_record_create(): failed: 13

loginwindow[1877]: in od_record_create_cstring(): failed: 13

The error messages appeared only after a timeout rather than immediately. This matched the delay experienced on the user side. The symptoms indicated a network or connection issue between the iMacs and the Active Directory server.
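The loginwindow messages recorded above are easy to lose in a busy system log. A minimal sketch of a filter for them follows; the function name filter_od_failures is hypothetical, and the pattern is assumed from the two log lines quoted in this post rather than from any documented log format.

```shell
# Print only the directory-record failures that loginwindow logs.
# Live use:  tail -f /var/log/system.log | filter_od_failures
# From file: filter_od_failures < /var/log/system.log
filter_od_failures() {
    # --line-buffered keeps output flowing when the input is a live tail.
    grep --line-buffered -E \
        'loginwindow\[[0-9]+\]: in od_record_create(_cstring)?\(\): failed: [0-9]+'
}
```

Piping the live tail through the filter while someone attempts a login should make it easier to see whether the failures line up with the timeout.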

Network Configurations

The next step of the investigation was to analyze the network configuration. First, I wanted to know more about the directory binding, so I opened System Preferences > Users & Groups > Login Options and noted the domain address. I then used that address to test domain resolution in the terminal: I ran “nslookup <domain>” and it resolved successfully. Not having worked with the domain configuration for Holberton School, I was not certain whether there was an alternate domain controller address. I logged into JAMF to review the binding policy, but there did not seem to be any alternate address that I could use for testing.

When there is an issue connecting to a server, the first step in the network process that I investigate is resolution of the domain name to an IP address. Our systems are configured with two DNS servers. The first address pointed to a DNS server hosted at Holberton School, and the second was Google’s public DNS address. The earlier nslookup for the domain had automatically used our own DNS server. To test the secondary DNS configuration, I ran “nslookup <domain> 8.8.8.8”, which forced the use of Google’s public DNS. The domain could not be resolved.
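The two lookups described above can be wrapped into a small comparison. This is only a sketch: the function names are hypothetical, and the domain and local server address in the usage comment are placeholders, not Holberton's real values. Because nslookup's exit status differs between versions, the sketch inspects the output for a “Name:” line instead.

```shell
# resolves_via SERVER DOMAIN
# Succeeds when SERVER returns an answer for DOMAIN.
resolves_via() {
    nslookup "$2" "$1" 2>/dev/null | grep -q '^Name:'
}

# compare_resolvers DOMAIN SERVER...
# Prints one line per DNS server saying whether it can resolve DOMAIN.
compare_resolvers() {
    domain=$1
    shift
    for server in "$@"; do
        if resolves_via "$server" "$domain"; then
            echo "$server: resolves $domain"
        else
            echo "$server: cannot resolve $domain"
        fi
    done
}

# Example with placeholder values:
#   compare_resolvers ad.example.internal 10.0.0.2 8.8.8.8
```

A domain that resolves on the local server but not on the public one points to an internal-only name, which is the situation found here.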

This seemed odd as we had external services that relied on the Active Directory information including Google Apps and Casper Suite hosted on jamfcloud.net. I re-accessed Casper to review the connection to our AD. In System Settings > LDAP Servers I found that the AD connection was made using a static IP address rather than the domain name. This explained why external services could connect even though the domain could not be resolved using Google’s public DNS.

Problem

The public DNS address had been added to the configuration as a failover in case our locally hosted DNS server went down. This worked well while students were connected and actively using the system. However, when students locked the machine and left it for 20–30 minutes, the system would go to sleep and disconnect from the network. On wake, if the system used the public DNS server, login would time out and ultimately fail because the system could not connect to Active Directory. This explains why the failures occurred inconsistently and without a clear pattern.
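To see which resolvers a machine actually holds after waking, the output of “scutil --dns” can be reduced to a list of nameserver addresses. The sketch below assumes the usual “nameserver[0] : <address>” layout of that output; the function name active_nameservers is hypothetical.

```shell
# List each configured nameserver once, in the order scutil reports them.
active_nameservers() {
    scutil --dns | awk '/nameserver\[[0-9]+\]/ && !seen[$3]++ { print $3 }'
}
```

If 8.8.8.8 appears in the list on an affected machine, that machine can end up asking a server that does not know the AD domain.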

Confirming Through Tests

To test this theory, I modified my DNS configuration to point only to the public DNS server. Using SSH with a local admin account, I tried to switch to my AD user account using ‘su <userid>’. The attempt timed out and failed. I then modified the system to use only the Holberton DNS server and entered the same command; this time it logged in immediately.
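The test above can be sketched with networksetup, which reads and sets the DNS servers for a network service. Several things here are assumptions: the service name “Ethernet”, the function names, and the use of “id” as a lighter-weight stand-in for ‘su <userid>’ (it only checks that the account can be looked up, and a locally cached account may still be found).

```shell
# Assumed service name; list the real ones with:
#   networksetup -listallnetworkservices
SERVICE="Ethernet"

# Show the DNS servers currently configured for the service.
show_dns() {
    networksetup -getdnsservers "$SERVICE"
}

# Point the service at exactly the given DNS servers (requires admin rights).
# Example: use_dns 8.8.8.8
use_dns() {
    sudo networksetup -setdnsservers "$SERVICE" "$@"
}

# Report whether an account can be looked up on this machine.
check_ad_user() {
    if id "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: lookup failed"
    fi
}
```

Running check_ad_user once with only the public server configured and once with only the local server configured mirrors the two halves of the test.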

Temporary Workaround

We have removed the public DNS server from the configuration so that systems point only to the local DNS server; systems should pick up the change after a reboot or network reconnection. If a system is already locked, the workaround is to disconnect and reconnect the Ethernet cable so that the system reconfigures its network settings. Upon reconnection, the DNS configuration resets and login may be attempted again. A full resolution, including DNS failover, is scheduled for discussion at the next DevOps meeting.