aboutsummaryrefslogtreecommitdiff
path: root/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'README.md')
-rw-r--r--README.md179
1 files changed, 155 insertions, 24 deletions
diff --git a/README.md b/README.md
index 3c9e465..cc968ce 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
-Bolt [![Build Status](https://drone.io/github.com/boltdb/bolt/status.png)](https://drone.io/github.com/boltdb/bolt/latest) [![Coverage Status](https://coveralls.io/repos/boltdb/bolt/badge.png?branch=master)](https://coveralls.io/r/boltdb/bolt?branch=master) [![GoDoc](https://godoc.org/github.com/boltdb/bolt?status.png)](https://godoc.org/github.com/boltdb/bolt) ![Version](http://img.shields.io/badge/version-1.0-green.png)
+Bolt [![Build Status](https://drone.io/github.com/boltdb/bolt/status.png)](https://drone.io/github.com/boltdb/bolt/latest) [![Coverage Status](https://coveralls.io/repos/boltdb/bolt/badge.png?branch=master)](https://coveralls.io/r/boltdb/bolt?branch=master) [![GoDoc](https://godoc.org/github.com/boltdb/bolt?status.png)](https://godoc.org/github.com/boltdb/bolt) ![Version](https://img.shields.io/badge/version-1.0-green.png)
====
-Bolt is a pure Go key/value store inspired by [Howard Chu's][hyc_symas] and
-the [LMDB project][lmdb]. The goal of the project is to provide a simple,
+Bolt is a pure Go key/value store inspired by [Howard Chu's][hyc_symas]
+[LMDB project][lmdb]. The goal of the project is to provide a simple,
fast, and reliable database for projects that don't require a full database
server such as Postgres or MySQL.
@@ -13,7 +13,6 @@ and setting values. That's it.
[hyc_symas]: https://twitter.com/hyc_symas
[lmdb]: http://symas.com/mdb/
-
## Project Status
Bolt is stable and the API is fixed. Full unit test coverage and randomized
@@ -22,6 +21,35 @@ Bolt is currently in high-load production environments serving databases as
large as 1TB. Many companies such as Shopify and Heroku use Bolt-backed
services every day.
+## Table of Contents
+
+- [Getting Started](#getting-started)
+ - [Installing](#installing)
+ - [Opening a database](#opening-a-database)
+ - [Transactions](#transactions)
+ - [Read-write transactions](#read-write-transactions)
+ - [Read-only transactions](#read-only-transactions)
+ - [Batch read-write transactions](#batch-read-write-transactions)
+ - [Managing transactions manually](#managing-transactions-manually)
+ - [Using buckets](#using-buckets)
+ - [Using key/value pairs](#using-keyvalue-pairs)
+ - [Autoincrementing integer for the bucket](#autoincrementing-integer-for-the-bucket)
+ - [Iterating over keys](#iterating-over-keys)
+ - [Prefix scans](#prefix-scans)
+ - [Range scans](#range-scans)
+ - [ForEach()](#foreach)
+ - [Nested buckets](#nested-buckets)
+ - [Database backups](#database-backups)
+ - [Statistics](#statistics)
+ - [Read-Only Mode](#read-only-mode)
+- [Resources](#resources)
+- [Comparison with other databases](#comparison-with-other-databases)
+ - [Postgres, MySQL, & other relational databases](#postgres-mysql--other-relational-databases)
+ - [LevelDB, RocksDB](#leveldb-rocksdb)
+ - [LMDB](#lmdb)
+- [Caveats & Limitations](#caveats--limitations)
+- [Reading the Source](#reading-the-source)
+- [Other Projects Using Bolt](#other-projects-using-bolt)
## Getting Started
@@ -180,8 +208,8 @@ and then safely close your transaction if an error is returned. This is the
recommended way to use Bolt transactions.
However, sometimes you may want to manually start and end your transactions.
-You can use the `Tx.Begin()` function directly but _please_ be sure to close the
-transaction.
+You can use the `Tx.Begin()` function directly but **please** be sure to close
+the transaction.
```go
// Start a writable transaction.
@@ -256,7 +284,7 @@ db.View(func(tx *bolt.Tx) error {
```
The `Get()` function does not return an error because its operation is
-guarenteed to work (unless there is some kind of system failure). If the key
+guaranteed to work (unless there is some kind of system failure). If the key
exists then it will return its byte slice value. If it doesn't exist then it
will return `nil`. It's important to note that you can have a zero-length value
set to a key which is different than the key not existing.
@@ -268,6 +296,50 @@ transaction is open. If you need to use a value outside of the transaction
then you must use `copy()` to copy it to another byte slice.
+### Autoincrementing integer for the bucket
+By using the `NextSequence()` function, you can let Bolt determine a sequence
+which can be used as the unique identifier for your key/value pairs. See the
+example below.
+
+```go
+// CreateUser saves u to the store. The new user ID is set on u once the data is persisted.
+func (s *Store) CreateUser(u *User) error {
+ return s.db.Update(func(tx *bolt.Tx) error {
+ // Retrieve the users bucket.
+ // This should be created when the DB is first opened.
+ b := tx.Bucket([]byte("users"))
+
+ // Generate ID for the user.
+ // This returns an error only if the Tx is closed or not writeable.
+ // That can't happen in an Update() call so I ignore the error check.
+ id, _ = b.NextSequence()
+ u.ID = int(id)
+
+ // Marshal user data into bytes.
+ buf, err := json.Marshal(u)
+ if err != nil {
+ return err
+ }
+
+ // Persist bytes to users bucket.
+ return b.Put(itob(u.ID), buf)
+ })
+}
+
+// itob returns an 8-byte big endian representation of v.
+func itob(v int) []byte {
+ b := make([]byte, 8)
+ binary.BigEndian.PutUint64(b, uint64(v))
+ return b
+}
+
+type User struct {
+ ID int
+ ...
+}
+
+```
+
### Iterating over keys
Bolt stores its keys in byte-sorted order within a bucket. This makes sequential
@@ -300,10 +372,15 @@ Next() Move to the next key.
Prev() Move to the previous key.
```
-When you have iterated to the end of the cursor then `Next()` will return `nil`.
-You must seek to a position using `First()`, `Last()`, or `Seek()` before
-calling `Next()` or `Prev()`. If you do not seek to a position then these
-functions will return `nil`.
+Each of those functions has a return signature of `(key []byte, value []byte)`.
+When you have iterated to the end of the cursor then `Next()` will return a
+`nil` key. You must seek to a position using `First()`, `Last()`, or `Seek()`
+before calling `Next()` or `Prev()`. If you do not seek to a position then
+these functions will return a `nil` key.
+
+During iteration, if the key is non-`nil` but the value is `nil`, that means
+the key refers to a bucket rather than a value. Use `Bucket.Bucket()` to
+access the sub-bucket.
#### Prefix scans
@@ -382,8 +459,11 @@ func (*Bucket) DeleteBucket(key []byte) error
Bolt is a single file so it's easy to backup. You can use the `Tx.WriteTo()`
function to write a consistent view of the database to a writer. If you call
this from a read-only transaction, it will perform a hot backup and not block
-your other database reads and writes. It will also use `O_DIRECT` when available
-to prevent page cache trashing.
+your other database reads and writes.
+
+By default, it will use a regular file handle which will utilize the operating
+system's page cache. See the [`Tx`](https://godoc.org/github.com/boltdb/bolt#Tx)
+documentation for information about optimizing for larger-than-RAM datasets.
One common use case is to backup over HTTP so you can use tools like `cURL` to
do database backups:
@@ -500,7 +580,7 @@ they are libraries bundled into the application, however, their underlying
structure is a log-structured merge-tree (LSM tree). An LSM tree optimizes
random writes by using a write ahead log and multi-tiered, sorted files called
SSTables. Bolt uses a B+tree internally and only a single file. Both approaches
-have trade offs.
+have trade-offs.
If you require a high random write throughput (>10,000 w/sec) or you need to use
spinning disks then LevelDB could be a good choice. If your application is
@@ -536,9 +616,8 @@ It's important to pick the right tool for the job and Bolt is no exception.
Here are a few things to note when evaluating and using Bolt:
* Bolt is good for read intensive workloads. Sequential write performance is
- also fast but random writes can be slow. You can add a write-ahead log or
- [transaction coalescer](https://github.com/boltdb/coalescer) in front of Bolt
- to mitigate this issue.
+ also fast but random writes can be slow. You can use `DB.Batch()` or add a
+ write-ahead log to help mitigate this issue.
* Bolt uses a B+tree internally so there can be a lot of random page access.
SSDs provide a significant performance boost over spinning disks.
@@ -568,11 +647,13 @@ Here are a few things to note when evaluating and using Bolt:
can in memory and will release memory as needed to other processes. This means
that Bolt can show very high memory usage when working with large databases.
However, this is expected and the OS will release memory as needed. Bolt can
- handle databases much larger than the available physical RAM.
+ handle databases much larger than the available physical RAM, provided its
+ memory-map fits in the process virtual address space. It may be problematic
+ on 32-bits systems.
* The data structures in the Bolt database are memory mapped so the data file
will be endian specific. This means that you cannot copy a Bolt file from a
- little endian machine to a big endian machine and have it work. For most
+ little endian machine to a big endian machine and have it work. For most
users this is not a concern since most modern CPUs are little endian.
* Because of the way pages are laid out on disk, Bolt cannot truncate data files
@@ -587,6 +668,56 @@ Here are a few things to note when evaluating and using Bolt:
[page-allocation]: https://github.com/boltdb/bolt/issues/308#issuecomment-74811638
+## Reading the Source
+
+Bolt is a relatively small code base (<3KLOC) for an embedded, serializable,
+transactional key/value database so it can be a good starting point for people
+interested in how databases work.
+
+The best places to start are the main entry points into Bolt:
+
+- `Open()` - Initializes the reference to the database. It's responsible for
+ creating the database if it doesn't exist, obtaining an exclusive lock on the
+ file, reading the meta pages, & memory-mapping the file.
+
+- `DB.Begin()` - Starts a read-only or read-write transaction depending on the
+ value of the `writable` argument. This requires briefly obtaining the "meta"
+ lock to keep track of open transactions. Only one read-write transaction can
+ exist at a time so the "rwlock" is acquired during the life of a read-write
+ transaction.
+
+- `Bucket.Put()` - Writes a key/value pair into a bucket. After validating the
+ arguments, a cursor is used to traverse the B+tree to the page and position
+ where they key & value will be written. Once the position is found, the bucket
+ materializes the underlying page and the page's parent pages into memory as
+ "nodes". These nodes are where mutations occur during read-write transactions.
+ These changes get flushed to disk during commit.
+
+- `Bucket.Get()` - Retrieves a key/value pair from a bucket. This uses a cursor
+ to move to the page & position of a key/value pair. During a read-only
+ transaction, the key and value data is returned as a direct reference to the
+ underlying mmap file so there's no allocation overhead. For read-write
+ transactions, this data may reference the mmap file or one of the in-memory
+ node values.
+
+- `Cursor` - This object is simply for traversing the B+tree of on-disk pages
+ or in-memory nodes. It can seek to a specific key, move to the first or last
+ value, or it can move forward or backward. The cursor handles the movement up
+ and down the B+tree transparently to the end user.
+
+- `Tx.Commit()` - Converts the in-memory dirty nodes and the list of free pages
+ into pages to be written to disk. Writing to disk then occurs in two phases.
+ First, the dirty pages are written to disk and an `fsync()` occurs. Second, a
+ new meta page with an incremented transaction ID is written and another
+ `fsync()` occurs. This two phase write ensures that partially written data
+ pages are ignored in the event of a crash since the meta page pointing to them
+ is never written. Partially written meta pages are invalidated because they
+ are written with a checksum.
+
+If you have additional notes that could be helpful for others, please submit
+them via pull request.
+
+
## Other Projects Using Bolt
Below is a list of public, open source projects that use Bolt:
@@ -597,25 +728,25 @@ Below is a list of public, open source projects that use Bolt:
* [Skybox Analytics](https://github.com/skybox/skybox) - A standalone funnel analysis tool for web analytics.
* [Scuttlebutt](https://github.com/benbjohnson/scuttlebutt) - Uses Bolt to store and process all Twitter mentions of GitHub projects.
* [Wiki](https://github.com/peterhellberg/wiki) - A tiny wiki using Goji, BoltDB and Blackfriday.
-* [ChainStore](https://github.com/nulayer/chainstore) - Simple key-value interface to a variety of storage engines organized as a chain of operations.
+* [ChainStore](https://github.com/pressly/chainstore) - Simple key-value interface to a variety of storage engines organized as a chain of operations.
* [MetricBase](https://github.com/msiebuhr/MetricBase) - Single-binary version of Graphite.
* [Gitchain](https://github.com/gitchain/gitchain) - Decentralized, peer-to-peer Git repositories aka "Git meets Bitcoin".
* [event-shuttle](https://github.com/sclasen/event-shuttle) - A Unix system service to collect and reliably deliver messages to Kafka.
* [ipxed](https://github.com/kelseyhightower/ipxed) - Web interface and api for ipxed.
* [BoltStore](https://github.com/yosssi/boltstore) - Session store using Bolt.
-* [photosite/session](http://godoc.org/bitbucket.org/kardianos/photosite/session) - Sessions for a photo viewing site.
+* [photosite/session](https://godoc.org/bitbucket.org/kardianos/photosite/session) - Sessions for a photo viewing site.
* [LedisDB](https://github.com/siddontang/ledisdb) - A high performance NoSQL, using Bolt as optional storage.
* [ipLocator](https://github.com/AndreasBriese/ipLocator) - A fast ip-geo-location-server using bolt with bloom filters.
* [cayley](https://github.com/google/cayley) - Cayley is an open-source graph database using Bolt as optional backend.
* [bleve](http://www.blevesearch.com/) - A pure Go search engine similar to ElasticSearch that uses Bolt as the default storage backend.
* [tentacool](https://github.com/optiflows/tentacool) - REST api server to manage system stuff (IP, DNS, Gateway...) on a linux server.
* [SkyDB](https://github.com/skydb/sky) - Behavioral analytics database.
-* [Seaweed File System](https://github.com/chrislusf/weed-fs) - Highly scalable distributed key~file system with O(1) disk read.
-* [InfluxDB](http://influxdb.com) - Scalable datastore for metrics, events, and real-time analytics.
+* [Seaweed File System](https://github.com/chrislusf/seaweedfs) - Highly scalable distributed key~file system with O(1) disk read.
+* [InfluxDB](https://influxdata.com) - Scalable datastore for metrics, events, and real-time analytics.
* [Freehold](http://tshannon.bitbucket.org/freehold/) - An open, secure, and lightweight platform for your files and data.
* [Prometheus Annotation Server](https://github.com/oliver006/prom_annotation_server) - Annotation server for PromDash & Prometheus service monitoring system.
* [Consul](https://github.com/hashicorp/consul) - Consul is service discovery and configuration made easy. Distributed, highly available, and datacenter-aware.
-* [Kala](https://github.com/ajvb/kala) - Kala is a modern job scheduler optimized to run on a single node. It is persistant, JSON over HTTP API, ISO 8601 duration notation, and dependent jobs.
+* [Kala](https://github.com/ajvb/kala) - Kala is a modern job scheduler optimized to run on a single node. It is persistent, JSON over HTTP API, ISO 8601 duration notation, and dependent jobs.
* [drive](https://github.com/odeke-em/drive) - drive is an unofficial Google Drive command line client for \*NIX operating systems.
* [stow](https://github.com/djherbis/stow) - a persistence manager for objects
backed by boltdb.