Paging S3 Files With Glommio Intro

by Jonathan Strong 2021-06-04T18:10:04.609562999Z

Glommio: Rust's answer to ScyllaDB

Glommio is a brand-new Rust IO framework designed around the new io_uring interface in Linux, high-performance NVMe drives, and "thread-per-core" execution. It is at the bleeding edge of high-performance IO-bound software, and will likely be a core component of the next generation of data-intensive applications.

Glommio is led by Glauber Costa, who previously worked on the Seastar C++ library, a similar IO framework that is the foundation of ScyllaDB, a blazing fast NoSQL database.

Rust developers have it pretty good when it comes to people building cool things in their language of choice, with rg being a prime example. But in the last few years, ScyllaDB and Seastar have been high-profile reminders that C++ has serious chops and an impressive legacy when it comes to ultra-high-performance software.

ScyllaDB is pretty badass. It has a userspace networking stack. It provides compatible front-end interfaces for multiple rival databases (Cassandra, AWS DynamoDB), each of which it obliterates in benchmarks. The ScyllaDB team also publishes excellent content about the database's design.

So, when I heard that Costa had begun work on a Seastar-inspired Rust library, I was thrilled!

Putting the 'bleeding' into 'bleeding edge'

Now, the downside of the bleeding edge is instability and software immaturity. ScyllaDB, for all its strengths, is a huge pain to build: I have spent hours trying, and failing, to build it on previous occasions.

Glommio is so bleeding edge that the first step of using it, for me, will be installing a newer Linux kernel on my workstation.

Here's what happens when you try to run the Glommio tests with a 5.4 kernel:

$ uname -r
5.4.0-72-generic

$ cargo test

# ...

...panicked at 'Failed to register a probe.  The most likely reason is that
your kernel witnessed Romulus killing Remus (too old!! kernel should be at
least 5.8)', glommio/src/sys/uring.rs:214:13
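
If you would rather find this out before running anything, a small pre-flight check is easy to write. The following is only a sketch, not something Glommio provides: the function names are made up for illustration, the 5.8 minimum is taken straight from the panic message above, and it assumes a Linux system where /proc/sys/kernel/osrelease holds the same string that `uname -r` prints.

```rust
/// Parse the leading "major.minor" out of a kernel release string such as
/// "5.4.0-72-generic". Returns None if the string does not start that way.
/// (Hypothetical helper, not part of Glommio.)
fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    // Split on anything that is not a digit: "5.4.0-72-generic" -> "5", "4", ...
    let mut parts = release.trim().split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True if `release` parses and is at least `min` (compared as (major, minor)).
fn kernel_is_new_enough(release: &str, min: (u32, u32)) -> bool {
    match parse_kernel_version(release) {
        Some(version) => version >= min,
        None => false,
    }
}

fn main() {
    // Assumption: Linux, where this file mirrors the output of `uname -r`.
    let release = std::fs::read_to_string("/proc/sys/kernel/osrelease")
        .unwrap_or_default();
    let release = release.trim();

    // 5.8 is the minimum named in the panic message above.
    if kernel_is_new_enough(release, (5, 8)) {
        println!("kernel {} looks new enough", release);
    } else {
        println!("kernel {:?} is older than 5.8 (or could not be parsed)", release);
    }
}
```

This only compares version numbers. A distribution kernel can backport or disable features, so the real test is still whether the Glommio test suite runs.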

But before we get into all that, let's talk big picture.

Objectives and Expectations

In this article, we will walk through building a Rust library for paging S3 documents with keys that start with a given prefix. By paging, I mean downloading each file and performing some arbitrary operation on the data.

Our program will:

retrieve a list of keys that begin with a given prefix
retrieve the S3 files that correspond to a list of keys
provide both sequential and non-sequential means of iterating over the downloaded file data, as it arrives
utilize the Glommio library for network, file and other IO work, which uses io_uring under the hood
use Rust's async/await syntax and idioms
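
To make the shape of the first three items concrete, here is a deliberately simplified, synchronous sketch that uses only the standard library. Everything in it is a stand-in: `InMemoryBucket`, `list_keys`, `get`, and `page_prefix` are hypothetical names invented for illustration, the "bucket" is just a map held in memory, and nothing here touches S3, Glommio, or async/await. It only shows the list-then-fetch-then-process flow, in key order.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an S3 bucket: maps keys to file contents.
struct InMemoryBucket {
    objects: BTreeMap<String, Vec<u8>>,
}

impl InMemoryBucket {
    /// Step 1: list the keys that begin with `prefix`, in sorted order.
    fn list_keys(&self, prefix: &str) -> Vec<String> {
        self.objects
            .keys()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect()
    }

    /// Step 2: fetch the contents stored under one key, if it exists.
    fn get(&self, key: &str) -> Option<&[u8]> {
        self.objects.get(key).map(|v| v.as_slice())
    }
}

/// Step 3: "page" every file under `prefix`, handing each one to `op`
/// sequentially, in key order.
fn page_prefix<F: FnMut(&str, &[u8])>(bucket: &InMemoryBucket, prefix: &str, mut op: F) {
    for key in bucket.list_keys(prefix) {
        if let Some(data) = bucket.get(&key) {
            op(&key, data);
        }
    }
}

fn main() {
    let mut objects = BTreeMap::new();
    objects.insert("logs/2021-06-01.csv".to_string(), b"a,b\n1,2\n".to_vec());
    objects.insert("logs/2021-06-02.csv".to_string(), b"a,b\n3,4\n".to_vec());
    objects.insert("other/readme.txt".to_string(), b"hello".to_vec());
    let bucket = InMemoryBucket { objects };

    // The "arbitrary operation" here just counts bytes per file.
    let mut total_bytes = 0;
    page_prefix(&bucket, "logs/", |key, data| {
        println!("{}: {} bytes", key, data.len());
        total_bytes += data.len();
    });
    println!("total: {} bytes", total_bytes);
}
```

The real version has to differ in the ways that matter: listing and fetching become network IO, the calls become async, and the non-sequential mode hands over files as they arrive rather than in key order.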

My (our) goals are:

test drive Glommio
learn more about designing high performance software around io_uring
give async/await a serious, open-minded look

This is not:

the easiest way to page S3 files in your program
something that will result in an open source library well-designed for general use
guaranteed to succeed