A First Approach

The language used for this project is Go. It provides native synchronization mechanisms such as channels and locks, and it can spin up lightweight threads (goroutines) for concurrent processing.

gophers hacking together (credit: Ashley McNamara)

You can start by designing the structures that represent the session and the timeout handlers.

type Session struct {
	State    SessionState
	Id       SessionId
	RemoteIp string
}

type TimeoutHandler struct {
	callback func(Session)
	session  Session
	duration int         // timeout in seconds
	timer    *time.Timer // set by Register
}

Session identifies the connection session, with the session ID, neighboring link IP, and the current session state.

TimeoutHandler holds the callback function, the session for which it should run, the duration, and a pointer to the scheduled timer.

There is a global map that will store, per neighboring link session, the scheduled timeout handler.

var sessionTimeout = make(map[Session]*TimeoutHandler)

Registering and cancelling a timeout is achieved by the following methods:

// Register schedules the timeout callback function.
func (timeout *TimeoutHandler) Register() {
	timeout.timer = time.AfterFunc(time.Duration(timeout.duration)*time.Second, func() {
		timeout.callback(timeout.session)
	})
}

// Cancel stops the scheduled timer, if there is one.
func (timeout *TimeoutHandler) Cancel() {
	if timeout.timer == nil {
		return
	}
	timeout.timer.Stop()
}

To create and store timeouts, you can use a function like the following:

func CreateTimeoutHandler(callback func(Session), session Session, duration int) *TimeoutHandler {
	// Reuse the handler already stored for this session, or create one.
	if sessionTimeout[session] == nil {
		sessionTimeout[session] = new(TimeoutHandler)
	}

	timeout := sessionTimeout[session]
	timeout.session = session
	timeout.callback = callback
	timeout.duration = duration
	return timeout
}

Once the timeout handler is created and registered, it runs the callback after duration seconds have elapsed. However, some events require you to reschedule a timeout handler (as happens in the SYN state, every 3 seconds).

For that, you can have the callback reschedule a new timeout:

func synCallback(session Session) {
	sendSynPacket(session)

	// Reschedule the same callback.
	newTimeout := CreateTimeoutHandler(synCallback, session, SYN_TIMEOUT_DURATION)
	newTimeout.Register()
	sessionTimeout[session] = newTimeout
}

This callback reschedules itself in a new timeout handler and updates the global sessionTimeout map.

Data Race and References

Your solution is ready. One simple test is to check that a timeout callback is executed after the timer has expired. For that you register a timeout, sleep for its duration, and then check whether the callback actions were done. After the test is executed, it is a good idea to cancel the scheduled timeout (as it reschedules), so it won’t have side effects between tests.

Surprisingly, this simple test found a bug in the solution. Cancelling timeouts using the cancel method was just not doing its job. The following order of events would cause a data race condition:

You have one scheduled timeout handler.

Thread 1:

a) You receive a control packet, and you want to cancel the registered timeout and move on to the next session state. (E.g. received a SYN-ACK after you sent a SYN).

b) You call timeout.Cancel(), which calls a timer.Stop(). (Note that a Golang timer stop doesn’t prevent an already expired timer from running.)

Thread 2:

a) Right before that cancel call, the timer has expired, and the callback was about to execute.

b) The callback is executed, it schedules a new timeout and updates the global map.

Thread 1:

a) Thread 1 transitions to a new session state and registers a new timeout, updating the global map.

Both threads were updating the timeout map concurrently. The end result is that you failed to cancel the registered timeout, and you also lost the reference to the timeout rescheduled by thread 2. This leaves a handler that keeps executing and rescheduling for a while, causing unwanted behavior.
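The note about timer.Stop() can be checked in isolation. The sketch below relies on the documented return value of Stop, which is false once the timer has already expired. The helper names stopBeforeExpiry and stopAfterExpiry are made up for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// stopBeforeExpiry stops a timer that has not fired yet.
func stopBeforeExpiry() bool {
	timer := time.AfterFunc(time.Hour, func() {})
	return timer.Stop() // true: the callback will never run
}

// stopAfterExpiry waits until the callback has started and only then
// calls Stop, which is the situation thread 1 ends up in.
func stopAfterExpiry() bool {
	fired := make(chan struct{})
	timer := time.AfterFunc(10*time.Millisecond, func() {
		close(fired)
	})
	<-fired             // the timer has expired and the callback is running
	return timer.Stop() // false: there is nothing left to stop
}

func main() {
	fmt.Println(stopBeforeExpiry()) // prints "true"
	fmt.Println(stopAfterExpiry())  // prints "false"
}
```

A false return value tells you the cancellation came too late, but by then the callback is already on its way.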

When Locking Is Not Enough

Using locks also doesn’t fix the issue completely. If you add locks before processing any event and before executing a callback, it still doesn’t prevent an expired callback from running:

func (timeout *TimeoutHandler) Register() {
	timeout.timer = time.AfterFunc(time.Duration(timeout.duration)*time.Second, func() {
		stateLock.Lock()
		defer stateLock.Unlock()

		timeout.callback(timeout.session)
	})
}

The difference now is that the updates to the global map are synchronized, but this doesn’t prevent the callback from running after you call timeout.Cancel(). That is the case when the scheduled timer has expired but hasn’t grabbed the lock yet. You would again lose the reference to one of the registered timeouts.
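This window can be reproduced deterministically. In the sketch below, the event handler holds the lock while the timer expires, calls Stop, and then releases the lock. The stateLock variable mirrors the one used above, while lateCallback and its channels are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

var stateLock sync.Mutex

// lateCallback reports what Stop returned and whether the callback still
// ran once the lock was released.
func lateCallback() (stopped, ran bool) {
	expired := make(chan struct{})
	done := make(chan struct{})

	stateLock.Lock() // the event handler is processing a control packet
	timer := time.AfterFunc(10*time.Millisecond, func() {
		close(expired)   // the timer has fired...
		stateLock.Lock() // ...but the callback must wait for the lock
		defer stateLock.Unlock()
		close(done) // stands in for timeout.callback(timeout.session)
	})

	<-expired
	stopped = timer.Stop() // too late: the callback is waiting on the lock
	stateLock.Unlock()

	select {
	case <-done:
		ran = true
	case <-time.After(time.Second):
	}
	return stopped, ran
}

func main() {
	fmt.Println(lateCallback()) // prints "false true"
}
```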

Using Cancellation Channels

Instead of relying on Go’s timer.Stop(), which doesn’t prevent an expired timer from executing, you can use cancellation channels.

It is a slightly different approach. Now you won’t do a recursive re-scheduling through callbacks; instead, you register an infinite loop that waits for cancellation signals or timeout events.

The new Register() spawns a new goroutine that runs your callback after a timeout and schedules a new timeout after the previous one has been executed. A cancellation channel is returned to the caller to control when the loop should stop.

func (timeout *TimeoutHandler) Register() chan struct{} {
	cancelChan := make(chan struct{})
	// Keep the channel on the handler so Cancel can reach it
	// (assumes TimeoutHandler has a cancelChan field).
	timeout.cancelChan = cancelChan

	go func() {
		select {
		case <-cancelChan:
			return
		case <-time.After(time.Duration(timeout.duration) * time.Second):
			func() {
				stateLock.Lock()
				defer stateLock.Unlock()

				timeout.callback(timeout.session)
			}()
		}
	}()

	return cancelChan
}

func (timeout *TimeoutHandler) Cancel() {
	if timeout.cancelChan == nil {
		return
	}
	timeout.cancelChan <- struct{}{}
}

This approach gives you a cancellation channel for each timeout you register. A cancel call sends an empty struct to the channel and triggers the cancellation. However, this doesn’t resolve the previous issue: the timeout can expire right before you call cancel over the channel, and before the lock is grabbed by the timeout thread.

The solution here is to check the cancellation channel inside the timeout scope after you grab the lock.

case <-time.After(time.Duration(timeout.duration) * time.Second):
	func() {
		stateLock.Lock()
		defer stateLock.Unlock()

		// Check for a pending cancellation now that the lock is held.
		select {
		case <-timeout.cancelChan:
			return
		default:
			timeout.callback(timeout.session)
		}
	}()
}

Finally, this guarantees that the callback is executed only after you grab the lock and only if no cancellation was triggered.

Beware of Deadlocks

This solution seems to work; however there is one hidden pitfall here: deadlocks.

Please read the code above again and try to find it yourself. Think of concurrent calls to any of the methods described.

The last problem here is with the cancellation channel itself. We made it an unbuffered channel, which means that sending is a blocking call. Once you call cancel on a timeout handler, you only proceed once that handler is cancelled. The problem appears when several calls send to the same cancellation channel, because a cancel request is consumed only once and every other sender keeps waiting. This can easily happen if concurrent events, like a link down or a control packet event, try to cancel the same timeout handler. The result is a deadlock, possibly bringing the application to a halt.
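The blocking behavior is easy to reproduce on its own. In this sketch, a receiver consumes exactly one cancel request from an unbuffered channel, as the timeout goroutine does, and a second sender never gets through. The helper name secondCancelBlocks is hypothetical, and the blocked goroutine is leaked on purpose to keep the illustration short.

```go
package main

import (
	"fmt"
	"time"
)

// secondCancelBlocks reports whether a second send on an unbuffered
// cancellation channel is still blocked after the only receiver has
// consumed the first one.
func secondCancelBlocks() bool {
	cancelChan := make(chan struct{})

	// The timeout goroutine: it consumes one cancel request and exits.
	go func() { <-cancelChan }()

	cancelChan <- struct{}{} // first Cancel: delivered

	sent := make(chan struct{})
	go func() {
		cancelChan <- struct{}{} // second Cancel: nobody is receiving
		close(sent)
	}()

	select {
	case <-sent:
		return false
	case <-time.After(100 * time.Millisecond):
		return true // the second sender is stuck
	}
}

func main() {
	fmt.Println(secondCancelBlocks()) // prints "true"
}
```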

is anyone listening? (credit: Trevor Forrey)

The solution here is to give the channel a buffer of at least one, so sends are not always blocking, and also to make the send explicitly non-blocking in case of concurrent calls. This guarantees that the cancellation is sent once and that subsequent cancel calls won’t block.
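Putting the pieces together, a minimal sketch of this last step could look like the following. It makes a few assumptions that are not in the code above: Session is reduced to an ID, duration is a time.Duration, the timeout goroutine runs in a for loop as the text describes, and runUnlessCancelled and callbacksAfterCancel are hypothetical helper names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Session is reduced to an ID for this sketch.
type Session struct{ Id int }

type TimeoutHandler struct {
	callback   func(Session)
	session    Session
	duration   time.Duration
	cancelChan chan struct{}
}

var stateLock sync.Mutex

// Register starts the timeout loop. The channel has a buffer of one, so a
// cancel request can be stored even when nobody is receiving yet.
func (timeout *TimeoutHandler) Register() {
	cancelChan := make(chan struct{}, 1)
	timeout.cancelChan = cancelChan
	go func() {
		for {
			select {
			case <-cancelChan:
				return
			case <-time.After(timeout.duration):
				if !timeout.runUnlessCancelled(cancelChan) {
					return
				}
			}
		}
	}()
}

// runUnlessCancelled grabs the lock, then runs the callback only if no
// cancel request is pending. It reports whether the loop should go on.
func (timeout *TimeoutHandler) runUnlessCancelled(cancelChan chan struct{}) bool {
	stateLock.Lock()
	defer stateLock.Unlock()
	select {
	case <-cancelChan:
		return false
	default:
		timeout.callback(timeout.session)
		return true
	}
}

// Cancel never blocks: if a request is already buffered, it does nothing.
func (timeout *TimeoutHandler) Cancel() {
	if timeout.cancelChan == nil {
		return
	}
	select {
	case timeout.cancelChan <- struct{}{}:
	default:
	}
}

// callbacksAfterCancel cancels twice while holding the lock, after the
// timer has expired, and returns how many times the callback ran.
func callbacksAfterCancel() int {
	count := 0 // only touched while stateLock is held
	handler := &TimeoutHandler{
		callback: func(Session) { count++ },
		session:  Session{Id: 1},
		duration: 10 * time.Millisecond,
	}

	stateLock.Lock()
	handler.Register()
	time.Sleep(50 * time.Millisecond) // the timer expires and waits for the lock
	handler.Cancel()
	handler.Cancel() // a concurrent event cancelling again does not block
	stateLock.Unlock()

	time.Sleep(50 * time.Millisecond)
	stateLock.Lock()
	defer stateLock.Unlock()
	return count
}

func main() {
	fmt.Println(callbacksAfterCancel()) // prints "0"
}
```

Because the cancel request sits in the buffer, the timeout goroutine finds it as soon as it grabs the lock, and the event handler never has to wait for it.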