Shared Memory Access

About the Author: Lammert Bies is a dad, husband and polyglot. He is developing embedded systems since the eighties. Used machine learning before it had a name. Specializes in interconnecting computers, robots and humans. Was a Google Mapmaker Advocate and speaker on several international Google conferences from 2011 until the plug was pulled on Mapmaker in 2017. Bughunter with Google. Currently spreading artificial intelligence to the wildest locations in production environments. He never stops learning.

In other pages I have already explained that running multiple parallel goroutines is one of the core features of the Go language. While starting multiple goroutines is extremely easy, it becomes difficult when these routines have to use the same resources. With a single threaded procedural language like C, the programmer has full control over what happens when. The sync package in Go contains a lot of tools which can help to control the way Go routines use shared resources. One way to do it is using a mutex.

Mutex, for Exclusive Access

Practically said, a mutex is key which can open a door to a resource. There is only one key for a door, and the goroutine that owns the key at a given moment is the only goroutine with access rights to that resource. This sounds great, but comes with a drawback. During the period a mutex is used by a goroutines, other goroutines that want to access the same resource have their execution postponed until the mutex is freed. A programmer must therefore be very careful to release a mutex after it has been used, and to used it for the smallest duration possible.

A mutex in the sync package has two methods, sync.Lock() and sync.Unlock(). The sync.Lock() method reserves the mutex for the calling routine. If no other routines has an outstanding call to the sync.Lock() method, the call ends immediately and all code executed after the sync.Lock() method is guaranteed to not interfere with other goroutines. Other goroutines calling the sync.Lock() method will be blocked inside the sync.Lock() call until the running goroutine gives up its access rights.

Problems Arising from Mutex Use

A goroutine no longer needing exclusive access to a resource can release the mutex with the method sync.Unlock(). After this release the resource should no longer be accessed by the goroutine, because exclusive access is no longer guaranteed. A call to sync.Unlock() will cause another goroutine blocked inside a sync.Lock() call to be released. Which goroutine will gain control is not well defined and the programmer should not make assumptions about the sequence in which sync.Lock() requests are granted.

Example Program: a Marathon Race

To show how this works, I will use the running race example used earlier, but with a twist. Instead of a running race with a well defined number of participants, the example program will simulate a marathon run. If you have ever watched a marathon, or maybe even been part of one, you will know that there can be many thousands participants. Starting in a time frame of a few seconds to several minutes.

In our example program, every marathon runner is represented by a goroutine. The starting period is 3 seconds in which the main loop will fire-up as much as goroutines as possible. The main routine increments a counter each time a new goroutine is started. Each goroutine runs for at least five seconds and increments two counters to signal it has passed the finish line.

The main program waits until all goroutines are finished and prints the number of started and finished marathon runners on the screen. Those number would be equal you would think. Unfortunately that is not the case. The variables to count the number of finished runners is stored in memory and each goroutine receives a pointer to the location where the values are stored. One variable can be locked with a mutex, the other variable uses free access. When the program finishes, the mutex-protected counter contains the same value as the counter of started goroutines. The value in the unprotected counter is less.

package main
import (
        "fmt"
        "sync"
        "time"
)

type protected_counter_tp struct {
        mux     sync.Mutex
        val     int
}

func marathon( tr1 *int, tr2 *protected_counter_tp, wg *sync.WaitGroup ) {

        defer wg.Done()

        time.Sleep( 5*time.Second )

        *tr1++;
        tr2.mux.Lock()
        tr2.val++;
        tr2.mux.Unlock()

}  /* marathon */

func main() {
        var wg sync.WaitGroup
        var start_time time.Time
        var curr_time time.Time
        var elapsed time.Duration
        var started_runners int
        var total_runners1 int
        var total_runners2 protected_counter_tp

        started_runners    = 0
        total_runners1     = 0
        total_runners2.val = 0
        start_time         = time.Now()

        for {
                wg.Add( 1 )
                go marathon( & total_runners1, & total_runners2, & wg )
                started_runners++

                curr_time = time.Now()
                elapsed   = curr_time.Sub(start_time)
                if elapsed >= 3*time.Second { break }
        }

        fmt.Printf( "%d runners started\n", started_runners )

        wg.Wait()

        fmt.Printf( "All %d / %d runners have finished\n", total_runners1, total_runners2.val )

}  /* main */

The reason is, that the increment of a variable accessed through a pointer is not an atomic operation in Go. In C this might be the case when the code compiles to a single machine language opcode. But in most Go implementations the increment of a variable through a pointer is translated to intermediate code which requires more machine code statements to execute. Between fetching the previous value, incrementing it and storing the new value, a goroutine may be interrupted by another routines that tries to do the same. One of the two increment actions will then be lost in the process. Spawning a lot of parallel processes increases the chance that some goroutines have simultaneous access causing the counter value to be wrong. On my 64 Bit Centos 8 system, the call to this Go program resulted in the following output:

$ go run marathon.go
1331440 runners started
All 1314521 / 1331440 runners have finished

The second counter–the one protected with a mutex–contains the right value. In the first counter a total of 16919 counter increments are missing. That means that one out of every 79 increments was interrupted by another goroutine causing the outcome of the increment to be overwritten. In our example with more than a million concurrent goroutines, it is obvious that this type of errors would occur. But with programs where just a few goroutines access shared resources, problems may not surface as easily as in this example. Concurrent access and data corruption may occur sporadically, but not often enough to pinpoint the problem. These issues are extremely hard to debug, especially because debugging parallel execution is extremely difficult because debugging itself alters the timing of the goroutines. Therefore, shared memory access should be limited in Go programs. When necessary use always the proper methods to guarantee unique access.

Memory Usage of Parallel Goroutines

You may have noticed that on my development system, a total of 1.3 million parallel goroutines have been spawned. Despite that, the program didn’t crash and executed in a timely matter. Now try this in C with threads. According to the pthread_create() manual page, each thread created on an x86 32 or 64 bit uses a stack size of 2 MB. Multiply this by 1331440, and we get a staggering memory footprint of 2.5 TB. Yes, when programmed in C, with the default options for multi-threading, this simple program would need 2600 GB memory to execute smoothly. Even if most of it was marked as unused virtual memory by the memory manager, this would have put an enormous strain on the system due to the overhead of managing more than a million parallel threads. Why is it that Go can handle this so elegantly?

The answer can be found in the way goroutines are implemented in the language core. At startup, each goroutine only uses a stack of a few kilobytes, which is enough for many practical function implementations. Only when during runtime additional memory is needed, the Go core will assign on-the-fly additional stack memory. I checked with htop the running process, and it peeked at 4028 MB virtual memory address space and 3462 MB physical memory in use. That is 4 GB virtual memory allocation vs 2600 GB in a C implementation. Goroutines are not started as a single thread each, but many goroutines can share one thread where additional threads are only spawned when a blocking goroutine prevents further execution of a thread. That saves an enormous amount of thread management on the OS level which would otherwise have been necessary. On my Centos 8 system with 48 GB, the thread limit in the kernel is set to 384040 as the information in /proc/sys/kernel/threads-max shows, so starting more than a million threads would not even have been possible without changing the kernel parameters.

The concurrent running of multiple goroutines in threads all happens in the background without the programmer or end user having any idea what is going on. Go is the first language I have encountered where you can start a million parallel processes in literally a heartbeat without crippling the system it is running on. While being simple and intuitive, the language has tremendous multi-tasking power. Imagine how you could use this functionality to run a network server which allows many thousands concurrent connections with just a few lines of code? Ah wait, that will be one of the next examples.

The secret to creativity is knowing how to hide your sources.

ALBERT EINSTEIN