Research and Analysis on the Performance of Golang Locks
In software development, measuring the performance of Golang locks is a practical task. Recently, a friend raised a question: when performing thread-safe reads and writes on a slice, should one choose a read-write lock (rwlock) or a mutex (mutex), and which performs better? This question triggered the in-depth discussion below.
I. Background and Purpose of the Lock Performance Test
In multi-threaded programming, ensuring thread safety is critical. For reads and writes on data structures such as slices, the choice of locking mechanism can significantly affect program performance. This research compares the performance of read-write locks and mutexes in different scenarios to give developers a practical reference when choosing between them.
II. Performance Analysis of Different Locking Mechanisms in Different Scenarios
(I) Theoretical Discussion on the Performance Comparison between Read-Write Locks (RWMutex) and Mutex Locks (Mutex)
In which scenarios do read-write locks outperform mutexes? This question is worth careful analysis. If the critical section between lock and unlock contains no I/O and no complex computation, a mutex should in theory be more efficient than a read-write lock, because the read-write lock does strictly more bookkeeping. The community has produced various read-write lock designs, most of which are built by combining two locks with a reader counter.
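For concreteness, here is a minimal sketch of that community-style design, two locks plus a reader counter. It is illustrative only (names are invented for this sketch) and is not how Go's standard library implements RWMutex, which, as shown later, uses atomic counters instead:

package rwlock

import "sync"

// naiveRWLock: the first reader takes the write lock on behalf of all
// readers; the last reader releases it. Writers simply take the write lock.
type naiveRWLock struct {
	counterMu sync.Mutex // guards the readers counter
	writeMu   sync.Mutex // held while any reader or a writer is active
	readers   int
}

func (l *naiveRWLock) RLock() {
	l.counterMu.Lock()
	l.readers++
	if l.readers == 1 {
		l.writeMu.Lock() // first reader blocks writers
	}
	l.counterMu.Unlock()
}

func (l *naiveRWLock) RUnlock() {
	l.counterMu.Lock()
	l.readers--
	if l.readers == 0 {
		l.writeMu.Unlock() // last reader lets writers in
	}
	l.counterMu.Unlock()
}

func (l *naiveRWLock) Lock()   { l.writeMu.Lock() }
func (l *naiveRWLock) Unlock() { l.writeMu.Unlock() }

Note that even a read-only pass through this design costs two mutex operations plus counter updates, which is why a plain mutex can win when the critical section is trivial.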
(II) Reference for the Performance Comparison of Locks in the C++ Environment
A comparison between mutexes (lock) and read-write locks (rwlock) has previously been done in the C++ environment. With simple assignment as the critical section, the benchmark results match expectations: the mutex outperforms the read-write lock. When the critical section is an empty I/O read-write operation, the read-write lock outperforms the mutex, which also matches intuition. When the critical section is a map lookup, the read-write lock again wins. This is because the map is a complex data structure: looking up a key requires computing its hash, locating the corresponding bucket through the hash, and then searching that bucket's linked list for the key. The specific performance data are as follows:
- Simple assignment: raw_lock takes 1.732199s; raw_rwlock takes 3.420338s
- io operation: simple_lock takes 13.858138s; simple_rwlock takes 8.94691s
- map lookup: lock takes 2.729701s; rwlock takes 0.300296s
(III) Performance Test of sync.RWMutex and sync.Mutex in the Golang Environment
To explore the performance of read-write locks and mutexes in the Golang environment in depth, the following tests were carried out. The test code is as follows:
package main

import (
	"fmt"
	"sync"
	"time"
)

var (
	num  = 1000 * 10 // operations per goroutine
	gnum = 1000      // number of concurrent goroutines
)

func main() {
	fmt.Println("only read")
	testRwmutexReadOnly()
	testMutexReadOnly()

	fmt.Println("write and read")
	testRwmutexWriteRead()
	testMutexWriteRead()

	fmt.Println("write only")
	testRwmutexWriteOnly()
	testMutexWriteOnly()
}

func testRwmutexReadOnly() {
	var w = &sync.WaitGroup{}
	var rwmutexTmp = newRwmutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		go func() {
			defer w.Done()
			for in := 0; in < num; in++ {
				rwmutexTmp.get(in)
			}
		}()
	}
	w.Wait()
	fmt.Println("testRwmutexReadOnly cost:", time.Now().Sub(t1).String())
}

func testRwmutexWriteOnly() {
	var w = &sync.WaitGroup{}
	var rwmutexTmp = newRwmutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		go func() {
			defer w.Done()
			for in := 0; in < num; in++ {
				rwmutexTmp.set(in, in)
			}
		}()
	}
	w.Wait()
	fmt.Println("testRwmutexWriteOnly cost:", time.Now().Sub(t1).String())
}

func testRwmutexWriteRead() {
	var w = &sync.WaitGroup{}
	var rwmutexTmp = newRwmutex()
	w.Add(gnum)
	t1 := time.Now()
	// Half of the goroutines read, half write.
	for i := 0; i < gnum; i++ {
		if i%2 == 0 {
			go func() {
				defer w.Done()
				for in := 0; in < num; in++ {
					rwmutexTmp.get(in)
				}
			}()
		} else {
			go func() {
				defer w.Done()
				for in := 0; in < num; in++ {
					rwmutexTmp.set(in, in)
				}
			}()
		}
	}
	w.Wait()
	fmt.Println("testRwmutexWriteRead cost:", time.Now().Sub(t1).String())
}

func testMutexReadOnly() {
	var w = &sync.WaitGroup{}
	var mutexTmp = newMutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		go func() {
			defer w.Done()
			for in := 0; in < num; in++ {
				mutexTmp.get(in)
			}
		}()
	}
	w.Wait()
	fmt.Println("testMutexReadOnly cost:", time.Now().Sub(t1).String())
}

func testMutexWriteOnly() {
	var w = &sync.WaitGroup{}
	var mutexTmp = newMutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		go func() {
			defer w.Done()
			for in := 0; in < num; in++ {
				mutexTmp.set(in, in)
			}
		}()
	}
	w.Wait()
	fmt.Println("testMutexWriteOnly cost:", time.Now().Sub(t1).String())
}

func testMutexWriteRead() {
	var w = &sync.WaitGroup{}
	var mutexTmp = newMutex()
	w.Add(gnum)
	t1 := time.Now()
	for i := 0; i < gnum; i++ {
		if i%2 == 0 {
			go func() {
				defer w.Done()
				for in := 0; in < num; in++ {
					mutexTmp.get(in)
				}
			}()
		} else {
			go func() {
				defer w.Done()
				for in := 0; in < num; in++ {
					mutexTmp.set(in, in)
				}
			}()
		}
	}
	w.Wait()
	fmt.Println("testMutexWriteRead cost:", time.Now().Sub(t1).String())
}

func newRwmutex() *rwmutex {
	var t = &rwmutex{}
	t.mu = &sync.RWMutex{}
	t.ipmap = make(map[int]int, 100)
	// Pre-fill 100 keys so reads hit existing entries.
	for i := 0; i < 100; i++ {
		t.ipmap[i] = 0
	}
	return t
}

type rwmutex struct {
	mu    *sync.RWMutex
	ipmap map[int]int
}

func (t *rwmutex) get(i int) int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.ipmap[i]
}

func (t *rwmutex) set(k, v int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	k = k % 100 // keep writes within the 100 pre-filled keys
	t.ipmap[k] = v
}

func newMutex() *mutex {
	var t = &mutex{}
	t.mu = &sync.Mutex{}
	t.ipmap = make(map[int]int, 100)
	for i := 0; i < 100; i++ {
		t.ipmap[i] = 0
	}
	return t
}

type mutex struct {
	mu    *sync.Mutex
	ipmap map[int]int
}

func (t *mutex) get(i int) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.ipmap[i]
}

func (t *mutex) set(k, v int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	k = k % 100
	t.ipmap[k] = v
}
The test results are as follows. With mutex and rwmutex each driven by many goroutines across three scenarios (read-only, write-only, and mixed read-write), only the write-only scenario shows mutex performing slightly better than rwmutex.
only read:
- testRwmutexReadOnly cost: 455.566965ms
- testMutexReadOnly cost: 2.13687988s
write and read:
- testRwmutexWriteRead cost: 1.79215194s
- testMutexWriteRead cost: 2.62997403s
write only:
- testRwmutexWriteOnly cost: 2.6378979159s
- testMutexWriteOnly cost: 2.39077869s
Further, when the map read/write logic is replaced with global increments and decrements of a counter, the results resemble the run above: only in the write-only scenario is mutex slightly faster than rwmutex.
only read:
- testRwmutexReadOnly cost: 10.483448ms
- testMutexReadOnly cost: 10.808006ms
write and read:
- testRwmutexWriteRead cost: 12.405655ms
- testMutexWriteRead cost: 14.571228ms
write only:
- testRwmutexWriteOnly cost: 13.453028ms
- testMutexWriteOnly cost: 13.782282ms
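The numbers above come from hand-rolled timing loops. The same comparison can also be expressed with Go's standard benchmarking harness and run via go test -bench. Below is a minimal sketch under that assumption; the package and function names are illustrative, not from the original experiment:

package lockbench

import (
	"sync"
	"testing"
)

// Compare read cost under parallel load:
//   go test -bench=. -cpu=1,4,8
func BenchmarkMutexRead(b *testing.B) {
	var mu sync.Mutex
	m := map[int]int{0: 0}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.Lock()
			_ = m[0] // trivial critical section
			mu.Unlock()
		}
	})
}

func BenchmarkRWMutexRead(b *testing.B) {
	var mu sync.RWMutex
	m := map[int]int{0: 0}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.RLock()
			_ = m[0]
			mu.RUnlock()
		}
	})
}

b.RunParallel distributes iterations across GOMAXPROCS goroutines, which mirrors the multi-goroutine contention in the hand-written tests while letting the harness handle timing and iteration counts.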
III. Source Code Analysis of sync.RWMutex in Golang
The sync.RWMutex structure in Golang consists of a mutex for serializing writers, two semaphores, and two reader counters. The biggest difference from common community implementations is that it manipulates the reader counter with atomic instructions (atomic). The structure is defined as follows:
type RWMutex struct {
w Mutex // held if there are pending writers
writerSem uint32 // semaphore for writers to wait for completing readers
readerSem uint32 // semaphore for readers to wait for completing writers
readerCount int32 // number of pending readers
readerWait int32 // number of departing readers
}
(I) The Process of Acquiring the Read Lock
Acquiring the read lock is a single atomic increment of readerCount. A negative readerCount means a writer is pending (Lock pushes the counter down by rwmutexMaxReaders), so the reader must wait on the reader semaphore. The code implementation is as follows:
func (rw *RWMutex) RLock() {
if race.Enabled {
_ = rw.w.state
race.Disable()
}
if atomic.AddInt32(&rw.readerCount, 1) < 0 {
// A writer is pending, wait for it.
runtime_Semacquire(&rw.readerSem)
}
if race.Enabled {
race.Enable()
race.Acquire(unsafe.Pointer(&rw.readerSem))
}
}
(II) The Process of Releasing the Read Lock
Releasing the read lock likewise updates the counter atomically. If the result is negative, a writer is pending; the last departing reader (the one that drops readerWait to zero) releases the writer semaphore and unblocks the writer. The relevant code is as follows:
func (rw *RWMutex) RUnlock() {
if race.Enabled {
_ = rw.w.state
race.ReleaseMerge(unsafe.Pointer(&rw.writerSem))
race.Disable()
}
if r := atomic.AddInt32(&rw.readerCount, -1); r < 0 {
if r+1 == 0 || r+1 == -rwmutexMaxReaders {
race.Enable()
throw("sync: RUnlock of unlocked RWMutex")
}
// A writer is pending.
if atomic.AddInt32(&rw.readerWait, -1) == 0 {
// The last reader unblocks the writer.
runtime_Semrelease(&rw.writerSem, false)
}
}
if race.Enabled {
race.Enable()
}
}
(III) The Process of Acquiring and Releasing the Write Lock
To acquire the write lock, the writer first takes the internal mutex to exclude other writers, announces itself by turning readerCount negative, and, if active readers remain, sleeps until the last of them wakes it. Releasing the write lock reverses this: readerCount is restored, any readers that queued up behind the writer are awakened, and finally the internal mutex is released. The relevant code is as follows:
func (rw *RWMutex) Lock() {
if race.Enabled {
_ = rw.w.state
race.Disable()
}
// First, resolve competition with other writers.
rw.w.Lock()
// Announce to readers there is a pending writer.
r := atomic.AddInt32(&rw.readerCount, -rwmutexMaxReaders) + rwmutexMaxReaders
// Wait for active readers.
if r != 0 && atomic.AddInt32(&rw.readerWait, r) != 0 {
runtime_Semacquire(&rw.writerSem)
}
if race.Enabled {
race.Enable()
race.Acquire(unsafe.Pointer(&rw.readerSem))
race.Acquire(unsafe.Pointer(&rw.writerSem))
}
}
func (rw *RWMutex) Unlock() {
if race.Enabled {
_ = rw.w.state
race.Release(unsafe.Pointer(&rw.readerSem))
race.Release(unsafe.Pointer(&rw.writerSem))
race.Disable()
	}
	// Announce to readers there is no active writer.
r := atomic.AddInt32(&rw.readerCount, rwmutexMaxReaders)
if r >= rwmutexMaxReaders {
race.Enable()
throw("sync: Unlock of unlocked RWMutex")
}
// Unblock blocked readers, if any.
for i := 0; i < int(r); i++ {
runtime_Semrelease(&rw.readerSem, false)
}
// Allow other writers to proceed.
rw.w.Unlock()
if race.Enabled {
race.Enable()
}
}
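To see these semantics in action, the following small demonstration (not part of the article's benchmark; identifiers are illustrative) shows several readers holding RLock concurrently while a writer's Lock blocks until all of them leave, exactly the readerCount/readerWait dance analyzed above:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	var wg sync.WaitGroup

	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			mu.RLock()
			defer mu.RUnlock()
			fmt.Println("reader", id, "in") // all three readers overlap
			time.Sleep(100 * time.Millisecond)
		}(i)
	}

	time.Sleep(10 * time.Millisecond) // let the readers enter first
	mu.Lock()                         // blocks until every reader leaves
	fmt.Println("writer in")
	mu.Unlock()
	wg.Wait()
}

Note that once a writer is waiting, new RLock calls also block (readerCount has gone negative), which is how Go prevents writer starvation.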
IV. Summary and Suggestions
Lock contention has always been one of the key challenges in high-concurrency systems. For the map-plus-mutex scenario above, sync.Map is worth considering as a replacement in Go 1.9 and later. In read-heavy, write-light workloads, sync.Map significantly outperforms the combination of sync.RWMutex and a plain map.
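As an illustration, a minimal sketch of replacing the map-plus-lock pattern with sync.Map (the keys and values here are invented for the example):

package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Map

	// Store and Load replace the map[int]int guarded by an RWMutex.
	m.Store(1, 100)

	if v, ok := m.Load(1); ok {
		fmt.Println("key 1 =", v.(int)) // values come back as interface{}
	}

	// LoadOrStore gives initialize-once semantics without extra locking.
	actual, loaded := m.LoadOrStore(2, 200)
	fmt.Println(actual, loaded) // 200 false
}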
A deeper look at sync.Map's implementation shows that its write performance is relatively low: reads can proceed without locking thanks to a copy-on-write read-only view, but writes still go through a lock. To relieve the resulting contention, the segmented-lock approach of Java's ConcurrentHashMap is worth borrowing.
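A minimal sketch of such a segmented (sharded) map in Go, under the assumption of int keys and an arbitrarily chosen 16 shards; each shard has its own lock, so writers contend only within a shard:

package shardedmap

import "sync"

const shardCount = 16 // power of two, picked arbitrarily for the sketch

type shard struct {
	mu sync.RWMutex
	m  map[int]int
}

type ShardedMap struct {
	shards [shardCount]shard
}

func New() *ShardedMap {
	s := &ShardedMap{}
	for i := range s.shards {
		s.shards[i].m = make(map[int]int)
	}
	return s
}

// shardFor picks a shard by key; for int keys a simple modulo is enough,
// a string key would first be hashed.
func (s *ShardedMap) shardFor(k int) *shard {
	return &s.shards[uint(k)%shardCount]
}

func (s *ShardedMap) Get(k int) (int, bool) {
	sh := s.shardFor(k)
	sh.mu.RLock()
	defer sh.mu.RUnlock()
	v, ok := sh.m[k]
	return v, ok
}

func (s *ShardedMap) Set(k, v int) {
	sh := s.shardFor(k)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	sh.m[k] = v
}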
Beyond segmented locks, atomic compare-and-swap (CAS) instructions can be used to implement optimistic locking, which avoids blocking altogether and further improves system performance in high-concurrency scenarios.
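For example, a lock-free counter can be built on a CAS retry loop, the core pattern of any optimistic-locking scheme. A minimal sketch:

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var counter int64
	var wg sync.WaitGroup

	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// Optimistic update: read, compute, CAS, retry on conflict.
				// (atomic.AddInt64 would do this directly; the explicit
				// loop shows the optimistic pattern.)
				for {
					old := atomic.LoadInt64(&counter)
					if atomic.CompareAndSwapInt64(&counter, old, old+1) {
						break
					}
				}
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 8000
}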
Leapcell: The Next-Gen Serverless Platform for Golang app Hosting
Finally, I would like to recommend the most suitable platform for deploying Golang services: Leapcell
1. Multi-Language Support
- Develop with JavaScript, Python, Go, or Rust.
2. Deploy unlimited projects for free
- Pay only for usage: no requests, no charges.
3. Unbeatable Cost Efficiency
- Pay-as-you-go with no idle charges.
- Example: $25 supports 6.94M requests at a 60ms average response time.
4. Streamlined Developer Experience
- Intuitive UI for effortless setup.
- Fully automated CI/CD pipelines and GitOps integration.
- Real-time metrics and logging for actionable insights.
5. Effortless Scalability and High Performance
- Auto-scaling to handle high concurrency with ease.
- Zero operational overhead — just focus on building.
Explore more in the documentation!
Leapcell Twitter: https://x.com/LeapcellHQ