Research and Analysis on the Performance of Golang Locks
In software development, measuring the performance of Golang locks is a practical task. Recently, a friend raised a question: when performing thread-safe reads and writes on a slice, should one choose a read-write lock (rwlock) or a mutex (mutex), and which performs better? This question triggered the in-depth discussion below.
I. Background and Purpose of the Lock Performance Test
In multi-threaded programming, ensuring thread safety is critical. For reads and writes on data structures such as slices, the choice of locking mechanism can significantly affect program performance. This research compares the performance of read-write locks and mutexes in different scenarios to give developers a practical reference when choosing between them.
II. Performance Analysis of Different Locking Mechanisms in Different Scenarios
(I) Theoretical Discussion on the Performance Comparison between Read-Write Locks (RWMutex) and Mutex Locks (Mutex)
In which scenarios do read-write locks outperform mutexes? This question is worth careful analysis. If the critical section between lock and unlock contains no I/O and no complex computation, a mutex should in theory be more efficient than a read-write lock, because the read-write lock does strictly more bookkeeping. The community has produced various read-write lock designs, most of which are built by combining two locks with a reader counter.
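For concreteness, here is a minimal sketch of that community-style design, two locks plus a reader counter. It is illustrative only (names are invented for this sketch) and is not how Go's standard library implements RWMutex, which, as shown later, uses atomic counters instead:

package rwlock

import "sync"

// naiveRWLock: the first reader takes the write lock on behalf of all
// readers; the last reader releases it. Writers simply take the write lock.
type naiveRWLock struct {
	counterMu sync.Mutex // guards the readers counter
	writeMu   sync.Mutex // held while any reader or a writer is active
	readers   int
}

func (l *naiveRWLock) RLock() {
	l.counterMu.Lock()
	l.readers++
	if l.readers == 1 {
		l.writeMu.Lock() // first reader blocks writers
	}
	l.counterMu.Unlock()
}

func (l *naiveRWLock) RUnlock() {
	l.counterMu.Lock()
	l.readers--
	if l.readers == 0 {
		l.writeMu.Unlock() // last reader lets writers in
	}
	l.counterMu.Unlock()
}

func (l *naiveRWLock) Lock()   { l.writeMu.Lock() }
func (l *naiveRWLock) Unlock() { l.writeMu.Unlock() }

Note that even a read-only pass through this design costs two mutex operations plus counter updates, which is why a plain mutex can win when the critical section is trivial.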
(II) Reference for the Performance Comparison of Locks in the C++ Environment
A comparison between mutexes (lock) and read-write locks (rwlock) has previously been done in the C++ environment. With simple assignment as the critical section, the benchmark results match expectations: the mutex outperforms the read-write lock. When the critical section is an empty I/O read-write operation, the read-write lock outperforms the mutex, which also matches intuition. When the critical section is a map lookup, the read-write lock again wins. This is because the map is a complex data structure: looking up a key requires computing its hash, locating the corresponding bucket through the hash, and then searching that bucket's linked list for the key. The specific performance data are as follows:
- Simple assignment: raw_lock takes 1.732199s; raw_rwlock takes 3.420338s
- io operation: simple_lock takes 13.858138s; simple_rwlock takes 8.94691s
- map lookup: lock takes 2.729701s; rwlock takes 0.300296s
(III) Performance Test of sync.RWMutex and sync.Mutex in the Golang Environment
To explore the performance of read-write locks and mutexes in the Golang environment in depth, the following tests were carried out. The test code is as follows:
package main

import (
	"fmt"
	"sync"
	"time"
)

var (
	num  = 1000 * 10 // operations per goroutine
	gnum = 1000      // number of concurrent goroutines
)

func main() {
	fmt.Println("only read")
	testRwmutexReadOnly()
	testMutexReadOnly()

	fmt.Println("write and read")
	testRwmutexWriteRead()
	testMutexWriteRead()

	fmt.Println("write only")
	testRwmutexWriteOnly()
	testMutexWriteOnly()
}

func testRwmutexReadOnly() {
	var w = &sync.WaitGroup{}
	var rwmutexTmp = newRwmutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		go func() {
			defer w.Done()
			for in := 0; in < num; in++ {
				rwmutexTmp.get(in)
			}
		}()
	}
	w.Wait()
	fmt.Println("testRwmutexReadOnly cost:", time.Now().Sub(t1).String())
}

func testRwmutexWriteOnly() {
	var w = &sync.WaitGroup{}
	var rwmutexTmp = newRwmutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		go func() {
			defer w.Done()
			for in := 0; in < num; in++ {
				rwmutexTmp.set(in, in)
			}
		}()
	}
	w.Wait()
	fmt.Println("testRwmutexWriteOnly cost:", time.Now().Sub(t1).String())
}

func testRwmutexWriteRead() {
	var w = &sync.WaitGroup{}
	var rwmutexTmp = newRwmutex()
	w.Add(gnum)
	t1 := time.Now()
	// Half of the goroutines read, half write.
	for i := 0; i < gnum; i++ {
		if i%2 == 0 {
			go func() {
				defer w.Done()
				for in := 0; in < num; in++ {
					rwmutexTmp.get(in)
				}
			}()
		} else {
			go func() {
				defer w.Done()
				for in := 0; in < num; in++ {
					rwmutexTmp.set(in, in)
				}
			}()
		}
	}
	w.Wait()
	fmt.Println("testRwmutexWriteRead cost:", time.Now().Sub(t1).String())
}

func testMutexReadOnly() {
	var w = &sync.WaitGroup{}
	var mutexTmp = newMutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		go func() {
			defer w.Done()
			for in := 0; in < num; in++ {
				mutexTmp.get(in)
			}
		}()
	}
	w.Wait()
	fmt.Println("testMutexReadOnly cost:", time.Now().Sub(t1).String())
}

func testMutexWriteOnly() {
	var w = &sync.WaitGroup{}
	var mutexTmp = newMutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		go func() {
			defer w.Done()
			for in := 0; in < num; in++ {
				mutexTmp.set(in, in)
			}
		}()
	}
	w.Wait()
	fmt.Println("testMutexWriteOnly cost:", time.Now().Sub(t1).String())
}

func testMutexWriteRead() {
	var w = &sync.WaitGroup{}
	var mutexTmp = newMutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		if i%2 == 0 {
			go func() {
				defer w.Done()
				for in := 0; in < num; in++ {
					mutexTmp.get(in)
				}
			}()
		} else {
			go func() {
				defer w.Done()
				for in := 0; in < num; in++ {
					mutexTmp.set(in, in)
				}
			}()
		}
	}
	w.Wait()
	fmt.Println("testMutexWriteRead cost:", time.Now().Sub(t1).String())
}

func newRwmutex() *rwmutex {
	var t = &rwmutex{}
	t.mu = &sync.RWMutex{}
	t.ipmap = make(map[int]int, 100)
	// Pre-fill 100 keys so reads hit existing entries.
	for i := 0; i < 100; i++ {
		t.ipmap[i] = 0
	}
	return t
}

type rwmutex struct {
	mu    *sync.RWMutex
	ipmap map[int]int
}

func (t *rwmutex) get(i int) int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.ipmap[i]
}

func (t *rwmutex) set(k, v int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	k = k % 100 // keep writes within the 100 pre-filled keys
	t.ipmap[k] = v
}

func newMutex() *mutex {
	var t = &mutex{}
	t.mu = &sync.Mutex{}
	t.ipmap = make(map[int]int, 100)
	for i := 0; i < 100; i++ {
		t.ipmap[i] = 0
	}
	return t
}

type mutex struct {
	mu    *sync.Mutex
	ipmap map[int]int
}

func (t *mutex) get(i int) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.ipmap[i]
}

func (t *mutex) set(k, v int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	k = k % 100
	t.ipmap[k] = v
}
The test results are as follows. With mutex and rwmutex each driven by many goroutines across three scenarios (read-only, write-only, and mixed read-write), only the write-only scenario shows mutex performing slightly better than rwmutex.
only read:
- testRwmutexReadOnly cost: 455.566965ms
- testMutexReadOnly cost: 2.13687988s
write and read:
- testRwmutexWriteRead cost: 1.79215194s
- testMutexWriteRead cost: 2.62997403s
write only:
- testRwmutexWriteOnly cost: 2.6378979159s
- testMutexWriteOnly cost: 2.39077869s
Further, when the map read/write logic is replaced with global increments and decrements of a counter, the results resemble the run above: only in the write-only scenario is mutex slightly faster than rwmutex.
only read:
- testRwmutexReadOnly cost: 10.483448ms
- testMutexReadOnly cost: 10.808006ms
write and read:
- testRwmutexWriteRead cost: 12.405655ms
- testMutexWriteRead cost: 14.571228ms
write only:
- testRwmutexWriteOnly cost: 13.453028ms
- testMutexWriteOnly cost: 13.782282ms
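The numbers above come from hand-rolled timing loops. The same comparison can also be expressed with Go's standard benchmarking harness and run via go test -bench. Below is a minimal sketch under that assumption; the package and function names are illustrative, not from the original experiment:

package lockbench

import (
	"sync"
	"testing"
)

// Compare read cost under parallel load:
//   go test -bench=. -cpu=1,4,8
func BenchmarkMutexRead(b *testing.B) {
	var mu sync.Mutex
	m := map[int]int{0: 0}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.Lock()
			_ = m[0] // trivial critical section
			mu.Unlock()
		}
	})
}

func BenchmarkRWMutexRead(b *testing.B) {
	var mu sync.RWMutex
	m := map[int]int{0: 0}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.RLock()
			_ = m[0]
			mu.RUnlock()
		}
	})
}

b.RunParallel distributes iterations across GOMAXPROCS goroutines, which mirrors the multi-goroutine contention in the hand-written tests while letting the harness handle timing and iteration counts.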
III. Source Code Analysis of sync.RWMutex in Golang
The sync.RWMutex structure in Golang consists of a mutex for serializing writers, two semaphores, and two reader counters. The biggest difference from common community implementations is that it manipulates the reader counter with atomic instructions (atomic). The structure is defined as follows:
type RWMutex struct {
w Mutex // held if there are pending writers
writerSem uint32 // semaphore for writers to wait for completing readers
readerSem uint32 // semaphore for readers to wait for completing writers
readerCount int32 // number of pending readers
readerWait int32 // number of departing readers
}
(I) The Process of Acquiring the Read Lock
Acquiring the read lock is a single atomic increment of readerCount. A negative readerCount means a writer is pending (Lock pushes the counter down by rwmutexMaxReaders), so the reader must wait on the reader semaphore. The code implementation is as follows:
func (rw *RWMutex) RLock() {
if race.Enabled {
_ = rw.w.state
race.Disable()
}
if atomic.AddInt32(&rw.readerCount, 1) < 0 {
// A writer is pending, wait for it.
runtime_Semacquire(&rw.readerSem)
}
if race.Enabled {
race.Enable()
race.Acquire(unsafe.Pointer(&rw.readerSem))
}
}
(II) The Process of Releasing the Read Lock
Releasing the read lock likewise updates the counter atomically. If the result is negative, a writer is pending; the last departing reader (the one that drops readerWait to zero) releases the writer semaphore and unblocks the writer. The relevant code is as follows:
func (rw *RWMutex) RUnlock() {
if race.Enabled {
_ = rw.w.state
race.ReleaseMerge(unsafe.Pointer(&rw.writerSem))
race.Disable()
}
if r := atomic.AddInt32(&rw.readerCount, -1); r < 0 {
if r+1 == 0 || r+1 == -rwmutexMaxReaders {
race.Enable()
throw("sync: RUnlock of unlocked RWMutex")
}
// A writer is pending.
if atomic.AddInt32(&rw.readerWait, -1) == 0 {
// The last reader unblocks the writer.
runtime_Semrelease(&rw.writerSem, false)
}
}
if race.Enabled {
race.Enable()
}
}
(III) The Process of Acquiring and Releasing the Write Lock
To acquire the write lock, the writer first takes the internal mutex to exclude other writers, announces itself by turning readerCount negative, and, if active readers remain, sleeps until the last of them wakes it. Releasing the write lock reverses this: readerCount is restored, any readers that queued up behind the writer are awakened, and finally the internal mutex is released. The relevant code is as follows:
func (rw *RWMutex) Lock() {
if race.Enabled {
_ = rw.w.state
race.Disable()
}
// First, resolve competition with other writers.
rw.w.Lock()
// Announce to readers there is a pending writer.
r := atomic.AddInt32(&rw.readerCount, -rwmutexMaxReaders) + rwmutexMaxReaders
// Wait for active readers.
if r != 0 && atomic.AddInt32(&rw.readerWait, r) != 0 {
runtime_Semacquire(&rw.writerSem)
}
if race.Enabled {
race.Enable()
race.Acquire(unsafe.Pointer(&rw.readerSem))
race.Acquire(unsafe.Pointer(&rw.writerSem))
}
}
func (rw *RWMutex) Unlock() {
if race.Enabled {
_ = rw.w.state
race.Release(unsafe.Pointer(&rw.readerSem))
race.Release(unsafe.Pointer(&rw.writerSem))
race.Disable()
	}
	// Announce to readers there is no active writer.
r := atomic.AddInt32(&rw.readerCount, rwmutexMaxReaders)
if r >= rwmutexMaxReaders {
race.Enable()
throw("sync: Unlock of unlocked RWMutex")
}
// Unblock blocked readers, if any.
for i := 0; i < int(r); i++ {
runtime_Semrelease(&rw.readerSem, false)
}
// Allow other writers to proceed.
rw.w.Unlock()
if race.Enabled {
race.Enable()
}
}
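To see these semantics in action, the following small demonstration (not part of the article's benchmark; identifiers are illustrative) shows several readers holding RLock concurrently while a writer's Lock blocks until all of them leave, exactly the readerCount/readerWait dance analyzed above:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	var wg sync.WaitGroup

	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			mu.RLock()
			defer mu.RUnlock()
			fmt.Println("reader", id, "in") // all three readers overlap
			time.Sleep(100 * time.Millisecond)
		}(i)
	}

	time.Sleep(10 * time.Millisecond) // let the readers enter first
	mu.Lock()                         // blocks until every reader leaves
	fmt.Println("writer in")
	mu.Unlock()
	wg.Wait()
}

Note that once a writer is waiting, new RLock calls also block (readerCount has gone negative), which is how Go prevents writer starvation.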
IV. Summary and Suggestions
Lock contention has always been one of the key challenges in high-concurrency systems. For the map-plus-mutex scenario above, sync.Map is worth considering as a replacement in Go 1.9 and later. In read-heavy, write-light workloads, sync.Map significantly outperforms the combination of sync.RWMutex and a plain map.
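As an illustration, a minimal sketch of replacing the map-plus-lock pattern with sync.Map (the keys and values here are invented for the example):

package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Map

	// Store and Load replace the map[int]int guarded by an RWMutex.
	m.Store(1, 100)

	if v, ok := m.Load(1); ok {
		fmt.Println("key 1 =", v.(int)) // values come back as interface{}
	}

	// LoadOrStore gives initialize-once semantics without extra locking.
	actual, loaded := m.LoadOrStore(2, 200)
	fmt.Println(actual, loaded) // 200 false
}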
A deeper look at sync.Map's implementation shows that its write performance is relatively low: reads can proceed without locking thanks to a copy-on-write read-only view, but writes still go through a lock. To relieve the resulting contention, the segmented-lock approach of Java's ConcurrentHashMap is worth borrowing.
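A minimal sketch of such a segmented (sharded) map in Go, under the assumption of int keys and an arbitrarily chosen 16 shards; each shard has its own lock, so writers contend only within a shard:

package shardedmap

import "sync"

const shardCount = 16 // power of two, picked arbitrarily for the sketch

type shard struct {
	mu sync.RWMutex
	m  map[int]int
}

type ShardedMap struct {
	shards [shardCount]shard
}

func New() *ShardedMap {
	s := &ShardedMap{}
	for i := range s.shards {
		s.shards[i].m = make(map[int]int)
	}
	return s
}

// shardFor picks a shard by key; for int keys a simple modulo is enough,
// a string key would first be hashed.
func (s *ShardedMap) shardFor(k int) *shard {
	return &s.shards[uint(k)%shardCount]
}

func (s *ShardedMap) Get(k int) (int, bool) {
	sh := s.shardFor(k)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	v, ok := sh.m[k]
	return v, ok
}

func (s *ShardedMap) Set(k, v int) {
	sh := s.shardFor(k)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	sh.m[k] = v
}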
Beyond segmented locks, atomic compare-and-swap (CAS) instructions can be used to implement optimistic locking, which avoids blocking altogether and further improves system performance in high-concurrency scenarios.
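For example, a lock-free counter can be built on a CAS retry loop, the core pattern of any optimistic-locking scheme. A minimal sketch:

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var counter int64
	var wg sync.WaitGroup

	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// Optimistic update: read, compute, CAS, retry on conflict.
				// (atomic.AddInt64 would do this directly; the explicit
				// loop shows the optimistic pattern.)
				for {
					old := atomic.LoadInt64(&counter)
					if atomic.CompareAndSwapInt64(&counter, old, old+1) {
						break
					}
				}
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 8000
}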
Leapcell: The Next-Gen Serverless Platform for Golang app Hosting
Finally, I would like to recommend the most suitable platform for deploying Golang services: Leapcell
1. Multi-Language Support
- Develop with JavaScript, Python, Go, or Rust.
2. Deploy unlimited projects for free
- Pay only for usage: no requests, no charges.
3. Unbeatable Cost Efficiency
- Pay-as-you-go with no idle charges.
- Example: $25 supports 6.94M requests at a 60ms average response time.
4. Streamlined Developer Experience
- Intuitive UI for effortless setup.
- Fully automated CI/CD pipelines and GitOps integration.
- Real-time metrics and logging for actionable insights.
5. Effortless Scalability and High Performance
- Auto-scaling to handle high concurrency with ease.
- Zero operational overhead — just focus on building.
Explore more in the documentation!
Leapcell Twitter: https://x.com/LeapcellHQ