bigcache 高性能内存缓存库 Golang 手撕

2024-11-15

1. 前言

你好哇！我是小翔。之前写了三篇#Golang 并发编程的文章了，这次来换换口味，开个手撕源码的新坑！一同来扒一扒 Go 言语高性能 local cache 库 bigcache，看看能不能把开源大佬们的骚操作带到名目里去装一装（？）

2. 为什么要学习开源名目

团体以为学习开源名目的收益：

3. bigcache 简介

3.1 本地缓存与散布式缓存

缓存是系统优化并发才干、降低时延的利器，依据存储介质和经常使用场景，咱们普通又会经常使用本地缓存与散布式缓存两种手腕。本地缓存普通是在进程内的，最便捷的，用 go 的sync.Map就能成功一个便捷的并发安保的本地缓存了。经常出现的，将一些静态的、性能类的数据搁置在本地缓存中，能有效降低到下游存储的压力。散布式缓存普通会用 redis 或 memcached 等散布式内存数据库来成功，能做到散布式、有形态。这次先钻研下 bigcache 后续无时机再挖一挖这里。

3.2 bigcache 降生背景

bigcache 的开发者是 allegro，是波兰的一个电商网站，参考资料中给出了他们的技术博客的原文，文中具体形容了他们疑问的背景以及思索，值得钻研。他们的需求关键是：

开发团队经过了一番对比，选用了 go 言语（高并发度、带内存治理安保性比 C/C++ 好），放弃了散布式缓存组件（redis/memcached/couchbase），关键理由是多一跳网络开支。这里我示意疑心，P999 ms 的时延其实不至于担忧到 redis 网络那点期间，散布式环境下 local cache 不同机器间的数据不分歧带来的 cache miss 或许更蛋疼。最终开发团队选用了成功一个允许以下个性的内存缓存库：

4. 关键设计

4.1 并发与 sharding

设计上如何做到并发安保呢？最便捷的思绪就是给 map 上一把sync.RWMutex即读写锁。但是当缓存项过多时，并发恳求会形成锁抵触，因此须要降低锁粒度。bigcache 驳回了散布式系统里罕用的 sharding 思绪，行将一个大 map 拆分红 N 个小 map，咱们称为一个shard(分片)

如bigcache.go的申明，咱们初始化获取的BigCache，外围实践上是一个[]*cacheShard，缓存的写入、淘汰等内围逻辑都在cacheShard中了

type BigCache struct {shards[]*cacheShardlifeWindow uint64clockclockhashHasherconfigConfigshardMaskuint64closechan struct{}}

那么在写入一个 key value 缓存时，是如何做分片的呢？

func (c *BigCache) Set(key string, entry []byte) error {hashedKey := c.hash.Sum64(key)shard := c.getShard(hashedKey)return shard.set(key, hashedKey, entry)}

这里会首先启动一次性 hash 操作，将 string key hash 到一个uint64类型的 key。再依据这个数字 key 去做 sharding

func (c *BigCache) getShard(hashedKey uint64) (shard *cacheShard) {return c.shards[hashedKey&c.shardMask]}

这里把取余的操作用位运算来成功了，这也解释了为什么在经常使用 bigcache 的时刻须要经常使用 2 的幂来初始化 shard num 了

cache := &BigCache{shards:make([]*cacheShard, config.Shards),lifeWindow: uint64(config.LifeWindow.Seconds()),clock:clock,hash:config.Hasher,config:config,// config.Shards 必定是 2 的幂// 减一后获取一个二进制结果全为 1 的 maskshardMask:uint64(config.Shards - 1),close:make(chan struct{}),}

例如经常使用 1024 作为 shard num 时，mask 值为1024 - 1即二进制的 '111111111'，经常使用num & mask时，即可取得num % mask的成果

须要留意，这里的 hash 或许是会抵触的，只管概率极小，当出现 hash 抵触时，bigcache 将间接前往结果不存在：

func (s *cacheShard) get(key string, hashedKey uint64) ([]byte, error) {s.lock.RLock()wrappedEntry, err := s.getWrappedEntry(hashedKey)if err != nil {s.lock.RUnlock()return nil, err}// 这里会将二进制 buffer 按顺序解开// 在打包时将 key 打包的作用就表现进去了// 假设这次操作的 key 和打包时的 key 不相反// 则说明出现了抵触，不会失误地前往另一个 key 的缓存结果if entryKey := readKeyFromEntry(wrappedEntry); key != entryKey {s.lock.RUnlock()s.collision()if s.isVerbose {s.logger.Printf("Collision detected. Both %q and %q have the same hash %x", key, entryKey, hashedKey)}return nil, ErrEntryNotFound}entry := readEntry(wrappedEntry)s.lock.RUnlock()s.hit(hashedKey)return entry, nil}

4.2 cacheShard 与 bytes queue 设计

bigcache 对每个 shard 经常使用了一个相似ringbuffer的BytesQueue结构，定义如下：

type cacheShard struct {// hashed key => bytes queue indexhashmapmap[uint64]uint32entriesqueue.BytesQueuelocksync.RWMutexentryBuffer []byteonRemoveonRemoveCallbackisVerboseboolstatsEnabled boolloggerLoggerclockclocklifeWindowuint64hashmapStats map[uint64]uint32statsStats}

下图很好地解释了cacheShard的底层结构~

图片来自

在处置完 sharding 后，bigcache 会将整个 value 与 key、hashedKey 等消息序列化后存进一个 byte array，这里的设计是不是有点相似网络协定里的 header 呢？

// 将整个 entry 打包到 shard 的// byte array 中w := wrapEntry(currentTimestamp, hashedKey, key, entry, &s.entryBuffer)func wrapEntry(timestamp uint64, hash uint64, key string, entry []byte, buffer *[]byte) []byte {keyLength := len(key)blobLength := len(entry) + headersSizeInBytes + keyLengthif blobLength > len(*buffer) {*buffer = make([]byte, blobLength)}blob := *buffer// 小端字节序binary.LittleEndian.PutUint64(blob, timestamp)binary.LittleEndian.PutUint64(blob[timestampSizeInBytes:], hash)binary.LittleEndian.PutUint16(blob[timestampSizeInBytes+hashSizeInBytes:], uint16(keyLength))copy(blob[headersSizeInBytes:], key)copy(blob[headersSizeInBytes+keyLength:], entry)return blob[:blobLength]}

这里存原始的 string key，我了解单纯是为了处置 hash 抵触用的。

每一个cacheShard底层的缓存数据都会存储在bytes queue中，即一个FIFO的 bytes 队列，新进入的 entry 都会 push 到开端，假设空间无余，则会发生内存调配的环节，初始的 queue 的大小，是可以在性能中指定的：

func initNewShard(config Config, callback onRemoveCallback, clock clock) *cacheShard {// 1. 初始化指定好大小可以缩小内存调配的次数bytesQueueInitialCapacity := config.initialShardSize() * config.MaxEntrySizemaximumShardSizeInBytes := config.maximumShardSizeInBytes()if maximumShardSizeInBytes > 0 && bytesQueueInitialCapacity > maximumShardSizeInBytes {bytesQueueInitialCapacity = maximumShardSizeInBytes}return &cacheShard{hashmap:make(map[uint64]uint32, config.initialShardSize()),hashmapStats: make(map[uint64]uint32, config.initialShardSize()),// 2. 初始化 bytes queue，这里用到了上方读取的性能entries:*queue.NewBytesQueue(bytesQueueInitialCapacity, maximumShardSizeInBytes, config.Verbose),entryBuffer:make([]byte, config.MaxEntrySize+headersSizeInBytes),onRemove:callback,isVerbose:config.Verbose,logger:newLogger(config.Logger),clock:clock,lifeWindow:uint64(config.LifeWindow.Seconds()),statsEnabled: config.StatsEnabled,}}

留意到这点，在初始化时经常使用正确的性能，就能缩小从新调配内存的次数了。

4.3 GC 优化

bigcache 实质上就是一个大的哈希表，在 go 里，由于 GC STW(Stop the World) 的存在大的哈希表是十分要命的，看看 bigcache 开发团队的博客的测试数据：

With an empty cache, this endpoint had maximum responsiveness latency of 10ms for 10k rps. When the cache was filled, it had more than a second latency for 99th percentile. Metrics indicated that there were over 40 mln objects in the heap and GC mark and scan phase took over four seconds.

缓存塞满后，堆上有 4 千万个对象，GC 的扫描环节就超越了 4 秒钟，这就不能忍了。

关键的优化思绪有：

最终他们驳回了map[uint64]uint32作为cacheShard中的关键存储。key 是 sharding 时获取的 uint64 hashed key，value 则只存 offset ，全体经常使用FIFO的 bytes queue，也合乎依照时序淘汰的需求，十分精美。

经过优化，bigcache 在 2000w 条记载下 GC 的表现

成果挺清楚，但是关于低延时的服务来说，22ms 的 GC 期间还是很致命的，对象数还是尽量能控制住比拟好。

5. 小结

仔细学完 bigcache 的代码，咱们至少有以下几点收获：

参考资料

高性能缓存库 Golang

<<Vue3 Element

Room成功数据存储耐久性一篇聊聊Jetpack>>