User guide

This user guide gives an overview of Plyvel. It covers:

  • opening and closing databases,

  • storing and retrieving data,

  • working with write batches,

  • using snapshots,

  • iterating over your data,

  • using prefixed databases, and

  • implementing custom comparators.

Note: this document assumes basic familiarity with LevelDB; visit the LevelDB homepage for more information about its features and design.

Getting started

After installing Plyvel, we can simply import plyvel:

>>> import plyvel

Let’s open a new database by creating a new DB instance:

>>> db = plyvel.DB('/tmp/testdb/', create_if_missing=True)

That’s all there is to it. At this point /tmp/testdb/ contains a fresh LevelDB database (assuming the directory did not contain a LevelDB database already).

For real-world applications, you will probably want to tune settings such as the size of the memory cache and the number of bits to use for the (optional) Bloom filter. These and many other settings can be passed as arguments to the DB constructor. For this tutorial we’ll just use the LevelDB defaults.

To close the /tmp/testdb/ database, use DB.close(); the closed property tells you whether a database has been closed:

>>> db.closed
False
>>> db.close()
>>> db.closed
True

DB instances can also be used as context managers, which will cause the database to close itself at the end of the with block:

with plyvel.DB('/tmp/testdb/') as db:
    ...

Alternatively, you can just delete the variable that points to it, but this might not close the database immediately, e.g. because active iterators are using it:

>>> del db

Note that the remainder of this tutorial assumes an open database, so you probably want to skip the above if you’re performing all the steps in this tutorial yourself.

Basic operations

Now that we have our database, we can use the basic LevelDB operations: putting, getting, and deleting data. Let’s look at these in turn.

First we’ll add some data to the database by calling DB.put() with a key/value pair:

>>> db.put(b'key', b'value')
>>> db.put(b'another-key', b'another-value')

To get the data out again, use DB.get():

>>> db.get(b'key')
b'value'

If you try to retrieve a key that does not exist, a None value is returned:

>>> print(db.get(b'yet-another-key'))
None

Optionally, you can specify a default value, just like dict.get():

>>> db.get(b'yet-another-key', b'default-value')
b'default-value'

Finally, to delete data from the database, use DB.delete():

>>> db.delete(b'key')
>>> db.delete(b'another-key')

At this point our database is empty again. Note that, in addition to the basic use shown above, the put(), get(), and delete() methods accept optional keyword arguments that influence their behaviour, e.g. for synchronous writes or reads that will not fill the cache.

Write batches

LevelDB provides write batches for bulk data modification. Because a batch is faster than calling DB.put() or DB.delete() repeatedly, batches are well suited to bulk loading data. Let’s write some data:

>>> wb = db.write_batch()
>>> for i in range(100000):
...     wb.put(str(i).encode(), str(i).encode() * 100)
...
>>> wb.write()

Write batches are committed atomically: either the complete batch is written, or nothing is. If your machine crashes while LevelDB writes the batch to disk, the database will not end up containing partial or inconsistent data. This makes write batches very useful for groups of modifications that must be applied together.

Write batches can also act as context managers. The following code does the same as the bulk-loading example above, but without an explicit call to WriteBatch.write(); the batch is written automatically when the with block exits:

>>> with db.write_batch() as wb:
...     for i in range(100000):
...         wb.put(str(i).encode(), str(i).encode() * 100)

If the with block raises an exception, pending modifications in the write batch will still be written to the database. This means each modification using put() or delete() that happened before the exception was raised will be applied to the database:

>>> with db.write_batch() as wb:
...     wb.put(b'key-1', b'value-1')
...     raise ValueError("Something went wrong!")
...     wb.put(b'key-2', b'value-2')

At this point the database contains key-1, but not key-2. Sometimes this behaviour is undesirable. To discard all pending modifications in the write batch when an exception occurs, set the transaction argument to True:

>>> with db.write_batch(transaction=True) as wb:
...     wb.put(b'key-3', b'value-3')
...     raise ValueError("Something went wrong!")
...     wb.put(b'key-4', b'value-4')

In this case the database will not be modified, because the with block raised an exception. In this example this means that neither key-3 nor key-4 will be saved.

Note

Write batches will never silently suppress exceptions. Exceptions will be propagated regardless of the value of the transaction argument, so in the examples above you will still see the ValueError.

Snapshots

A snapshot is a consistent, read-only view of the entire database. Modifications made after the snapshot was taken are not visible through the snapshot. Let’s store a value:

>>> db.put(b'key', b'first-value')

Now we’ll make a snapshot using DB.snapshot():

>>> sn = db.snapshot()
>>> sn.get(b'key')
b'first-value'

From now on, modifications to the database will not be visible through the snapshot:

>>> db.put(b'key', b'second-value')
>>> sn.get(b'key')
b'first-value'

Long-lived snapshots may consume significant resources in your LevelDB database, since the snapshot prevents LevelDB from cleaning up old data that is still accessible by the snapshot. This means that you should never keep a snapshot around longer than necessary. The snapshot and its associated resources will be released automatically when the snapshot reference count drops to zero, which (for local variables) happens when the variable goes out of scope (or after you’ve issued a del statement). If you want explicit control over the lifetime of a snapshot, you can also clean it up yourself using Snapshot.close():

>>> sn.close()

Alternatively, you can use the snapshot as a context manager:

>>> with db.snapshot() as sn:
...     sn.get(b'key')

Iterators

All key/value pairs in a LevelDB database are sorted by key, so data can be retrieved efficiently in sorted order. This is what iterators are for: they let you efficiently iterate over all key/value pairs in the database or, more commonly, over a range of keys.

Let’s fill the database with some data first:

>>> db.put(b'key-1', b'value-1')
>>> db.put(b'key-5', b'value-5')
>>> db.put(b'key-3', b'value-3')
>>> db.put(b'key-2', b'value-2')
>>> db.put(b'key-4', b'value-4')

Now we can iterate over all data using a simple for loop, which will return all key/value pairs in lexicographical key order:

>>> for key, value in db:
...     print(key)
...     print(value)
...
b'key-1'
b'value-1'
b'key-2'
b'value-2'
b'key-3'
b'value-3'
b'key-4'
b'value-4'
b'key-5'
b'value-5'

While the complete database can be iterated over by just looping over the DB instance, this is generally not useful. The DB.iterator() method allows you to obtain more specific iterators. This method takes several optional arguments to specify how the iterator should behave.

Iterating over a key range

To limit the range of keys that the iterator covers, supply start and/or stop arguments:

>>> for key, value in db.iterator(start=b'key-2', stop=b'key-4'):
...     print(key)
...
b'key-2'
b'key-3'

Any combination of start and stop arguments is possible. For example, to iterate from a specific start key until the end of the database:

>>> for key, value in db.iterator(start=b'key-3'):
...     print(key)
...
b'key-3'
b'key-4'
b'key-5'

By default the start key is inclusive and the stop key is exclusive. This matches the behaviour of Python’s built-in range() function. If you want different behaviour, you can use the include_start and include_stop arguments:

>>> for key, value in db.iterator(start=b'key-2', include_start=False,
...                               stop=b'key-5', include_stop=True):
...     print(key)
...
b'key-3'
b'key-4'
b'key-5'

Instead of specifying start and stop keys, you can also specify a prefix for keys. In this case the iterator will only return key/value pairs whose key starts with the specified prefix. In our example, all keys have the same prefix, so this will return all key/value pairs:

>>> for key, value in db.iterator(prefix=b'ke'):
...     print(key)
...
b'key-1'
b'key-2'
b'key-3'
b'key-4'
b'key-5'
>>> for key, value in db.iterator(prefix=b'key-4'):
...     print(key)
...
b'key-4'
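A prefix scan covers the same keys as a range scan from the prefix up to (but excluding) the smallest byte string that sorts after every key starting with that prefix. The pure-Python helper below computes that stop key; prefix_upper_bound is a hypothetical name, and this is a sketch of the idea rather than a description of how Plyvel implements the prefix argument:

```python
def prefix_upper_bound(prefix: bytes):
    """Return the smallest key greater than every key starting with `prefix`.

    Returns None when no such key exists (empty prefix, or only 0xff bytes),
    meaning the scan should run to the end of the key space.
    """
    stripped = prefix.rstrip(b'\xff')  # trailing 0xff bytes cannot be incremented
    if not stripped:
        return None
    # Increment the last remaining byte and drop everything after it.
    return stripped[:-1] + bytes([stripped[-1] + 1])


print(prefix_upper_bound(b'key-'))  # b'key.' because '-' is 0x2d and '.' is 0x2e
```

With the default bytewise ordering, db.iterator(start=b'key-', stop=b'key.') therefore visits the same keys as db.iterator(prefix=b'key-'), and a stop of None leaves the range open-ended.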

Limiting the returned data

If you’re only interested in either the key or the value, you can use the include_key and include_value arguments to omit data you don’t need:

>>> list(db.iterator(start=b'key-2', stop=b'key-4', include_value=False))
[b'key-2', b'key-3']
>>> list(db.iterator(start=b'key-2', stop=b'key-4', include_key=False))
[b'value-2', b'value-3']

Only requesting the data that you are interested in results in slightly faster iterators, since Plyvel will avoid unnecessary memory copies and object construction in this case.

Iterating in reverse order

LevelDB also supports reverse iteration. Just set the reverse argument to True to obtain a reverse iterator:

>>> list(db.iterator(start=b'key-2', stop=b'key-4', include_value=False, reverse=True))
[b'key-3', b'key-2']

Note that the start and stop keys are the same; the only difference is the reverse argument.

Iterating over snapshots

In addition to directly iterating over the database, LevelDB also supports iterating over snapshots using the Snapshot.iterator() method. This method works exactly the same as DB.iterator(), except that it operates on the snapshot instead of the complete database.

Closing iterators

It is generally not required to close an iterator explicitly, since it will be closed when it goes out of scope (or is garbage collected). However, due to the way LevelDB is designed, each iterator operates on an implicit database snapshot, which can be an expensive resource depending on how the database is used during the iterator’s lifetime. The Iterator.close() method gives explicit control over when those resources are released:

>>> it = db.iterator()
>>> it.close()

Alternatively, to ensure that an iterator is immediately closed after use, you can also use it as a context manager using the with statement:

>>> with db.iterator() as it:
...     for k, v in it:
...         pass

Non-linear iteration

In the examples above, we’ve only used Python’s standard iteration mechanisms: a for loop and the list() constructor. This suffices for most applications, but sometimes more advanced iterator tricks can be useful. Plyvel exposes nearly all features of LevelDB iterators through extra methods on the Iterator instances returned by DB.iterator() and Snapshot.iterator().

For instance, you can step forward and backward over the same iterator. For forward stepping, Python’s standard next() built-in function can be used (this is also what a standard for loop does). For backward stepping, you will need to call the prev() method on the iterator:

>>> it = db.iterator(include_value=False)
>>> next(it)
b'key-1'
>>> next(it)
b'key-2'
>>> next(it)
b'key-3'
>>> it.prev()
b'key-3'
>>> next(it)
b'key-3'
>>> next(it)
b'key-4'
>>> next(it)
b'key-5'
>>> next(it)
Traceback (most recent call last):
  ...
StopIteration

>>> it.prev()
b'key-5'

Note that for reverse iterators, the definition of ‘forward’ and ‘backward’ is inverted, i.e. calling next(it) on a reverse iterator will return the key that sorts before the key that was most recently returned.

Additionally, Plyvel supports seeking on iterators:

>>> it = db.iterator(include_value=False)
>>> it.seek(b'key-3')
>>> next(it)
b'key-3'
>>> it.seek_to_start()
>>> next(it)
b'key-1'

See the Iterator API reference for more information about advanced iterator usage.

Raw iterators

In addition to the iterators described above, which adhere to the Python iterator protocol, there is also a raw iterator API that mimics the C++ iterator API provided by LevelDB. Since this interface is only intended for advanced use cases, it is not covered in this user guide. See the API reference for DB.raw_iterator() and RawIterator for more information.

Prefixed databases

LevelDB databases have a single key space. A common way to split a LevelDB database into separate partitions is to use a prefix for each partition. Plyvel makes this very easy to do using the DB.prefixed_db() method:

>>> my_sub_db = db.prefixed_db(b'example-')

The my_sub_db variable in this example points to an instance of the PrefixedDB class. This class behaves mostly like a normal Plyvel DB instance, but all operations will transparently add the key prefix to all keys that it accepts (e.g. in PrefixedDB.get()), and strip the key prefix from all keys that it returns (e.g. from PrefixedDB.iterator()). Examples:

>>> my_sub_db.get(b'some-key')  # this looks up b'example-some-key'
>>> my_sub_db.put(b'some-key', b'value')  # this sets b'example-some-key'

Almost all functionality available on DB is also available from PrefixedDB: write batches, iterators, snapshots, and also iterators over snapshots. A PrefixedDB is simply a lightweight object that delegates to the real DB, which is accessible using the db attribute:

>>> real_db = my_sub_db.db

You can even nest key spaces by creating a prefixed database on top of another prefixed database, using PrefixedDB.prefixed_db():

>>> my_sub_sub_db = my_sub_db.prefixed_db(b'other-prefix')

Custom comparators

LevelDB provides an ordered data store, which means all keys are stored in sorted order. By default, LevelDB uses a byte-wise comparator that orders keys lexicographically by their raw bytes (similar to memcmp()), but this behaviour can be changed by providing a custom comparator. Plyvel allows you to use a Python callable as a custom LevelDB comparator.

The signature for a comparator callable is simple: it takes two byte strings and returns a positive number, zero, or a negative number, depending on whether the first byte string is greater than, equal to, or less than the second. (These are the same semantics as Python 2’s built-in cmp(), which was removed in Python 3 in favour of the so-called ‘rich’ comparison methods.)

A simple comparator function for case-insensitive comparisons might look like this:

def comparator(a, b):
    a = a.lower()
    b = b.lower()

    if a < b:
        # a sorts before b
        return -1

    if a > b:
        # a sorts after b
        return 1

    # a and b are equal
    return 0

(This is a toy example. It only works properly for byte strings with characters in the ASCII range.)
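Because a faulty comparator can abort your program once LevelDB uses it, it is worth exercising it in plain Python first. functools.cmp_to_key lets sorted() use the same cmp-style callable; the sample keys below are arbitrary:

```python
import functools


def comparator(a, b):
    # Same case-insensitive comparator as above, written compactly.
    a = a.lower()
    b = b.lower()
    return (a > b) - (a < b)


keys = [b'banana', b'Apple', b'cherry', b'apple']
ordered = sorted(keys, key=functools.cmp_to_key(comparator))
print(ordered)  # [b'Apple', b'apple', b'banana', b'cherry']
```

Note that b'Apple' and b'apple' compare equal here, so with this comparator LevelDB would treat them as one key; sorted() merely keeps their input order because it is stable.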

To use this comparator, pass the comparator and comparator_name arguments to the DB constructor:

>>> db = plyvel.DB('/path/to/database/',
...                comparator=comparator,  # the function defined above
...                comparator_name=b'CaseInsensitiveComparator')

The comparator name, which must be a byte string, will be stored in the database. LevelDB refuses to open existing databases if the provided comparator name does not match the one in the database.

LevelDB invokes the comparator callable repeatedly during many of its operations, including storing and retrieving data, but also during background compactions. Background compaction runs in threads that are ‘invisible’ to Python. This means that custom comparator callables must not raise any exceptions, since there is no proper way to recover from them. If an exception happens nonetheless, Plyvel will print the traceback to stderr and immediately abort your program to avoid database corruption.

A final thing to keep in mind is that custom comparators written in Python come with a considerable performance impact. Experiments with simple Python comparator functions like the example above show a 4× slowdown for bulk writes compared to the built-in LevelDB comparator.

Next steps

The user guide should be enough to get you started with Plyvel. A complete description of the Plyvel API is available from the API reference.