At a theoretical level, they are no different semantically. You can implement a mutex using semaphores or vice versa (see here for an example). In practice, the implementations are different and they offer slightly different services.

The practical difference (in terms of the system services surrounding them) is that the implementation of a mutex is aimed at being a more lightweight synchronisation mechanism. In oracle-speak, mutexes are known as latches and semaphores are known as waits.

At the lowest level, they use some sort of atomic test and set mechanism. This reads the current value of a memory location, computes some sort of conditional and writes out a value at that location in a single instruction that cannot be interrupted. This means that you can acquire a mutex and test to see if anyone else had it before you.

A typical mutex implementation has a process or thread executing the test-and-set instruction and evaluating whether anything else had set the mutex. A key point here is that there is no interaction with the scheduler, so we have no idea (and don't care) who has set the lock. Then we either give up our time slice and attempt it again when the task is re-scheduled or execute a spin-lock. A spin lock is an algorithm like:

Count down from 5000: i. Execute the test-and-set instruction ii. If the mutex is clear, we have acquired it in the previous instruction so we can exit the loop iii. When we get to zero, give up our time slice.

When we have finished executing our protected code (known as a critical section) we just set the mutex value to zero or whatever means 'clear.' If multiple tasks are attempting to acquire the mutex then the next task that happens to be scheduled after the mutex is released will get access to the resource. Typically you would use mutexes to control a synchronised resource where exclusive access is only needed for very short periods of time, normally to make an update to a shared data structure.

A semaphore is a synchronised data structure (typically using a mutex) that has a count and some system call wrappers that interact with the scheduler in a bit more depth than the mutex libraries would. Semaphores are incremented and decremented and used to block tasks until something else is ready. See Producer/Consumer Problem for a simple example of this. Semaphores are initialised to some value - a binary semaphore is just a special case where the semaphore is initialised to 1. Posting to a semaphore has the effect of waking up a waiting process.

A basic semaphore algorithm looks like:

(somewhere in the program startup) Initialise the semaphore to its start-up value. Acquiring a semaphore i. (synchronised) Attempt to decrement the semaphore value ii. If the value would be less than zero, put the task on the tail of the list of tasks waiting on the semaphore and give up the time slice. Posting a semaphore i. (synchronised) Increment the semaphore value ii. If the value is greater or equal to the amount requested in the post at the front of the queue, take that task off the queue and make it runnable. iii. Repeat (ii) for all tasks until the posted value is exhausted or there are no more tasks waiting.

In the case of a binary semaphore the main practical difference between the two is the nature of the system services surrounding the actual data structure.

EDIT: As evan has rightly pointed out, spinlocks will slow down a single processor machine. You would only use a spinlock on a multi-processor box because on a single processor the process holding the mutex will never reset it while another task is running. Spinlocks are only useful on multi-processor architectures.