
Building a Redis Clone in Rust: Parsing the RESP Protocol

I’ve spent some time working with Redis: optimizing cache hit rates, debugging latency issues, tuning configurations. But there’s a difference between knowing how to use something well and understanding how it actually works under the hood. So I’m building a clone from scratch (in Rust) to dig deeper. I’m calling it Rudis because it sounds catchy. I don’t expect it to grow into anything; I just want something to tinker with.

This post covers Phase 1: a TCP server that understands the Redis wire protocol and can respond to PING. Not exactly groundbreaking, but I ran into more issues than expected.


The Protocol

Redis uses RESP (REdis Serialization Protocol) for client-server communication. It’s pretty minimal — each message starts with a type byte, has some data, and ends with \r\n:

+OK\r\n                          # Simple String
-Error message\r\n               # Error
:1000\r\n                        # Integer
$6\r\nfoobar\r\n                 # Bulk String (length-prefixed)
*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n # Array

Here’s how a Bulk String like $6\r\nfoobar\r\n breaks down:

 $    6    \r\n   f  o  o  b  a  r   \r\n
 │    │     │     └───────┬───────┘   │
 │    │     │             │           └── terminator
 │    │     │             └── data (exactly 6 bytes)
 │    │     └── separator
 │    └── length
 └── type marker

When parsing, the offsets matter: skip the $, read until \r\n to get the length, then read exactly that many bytes starting after the separator.
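The offsets in the diagram can be computed directly. This is a minimal sketch that uses only the standard library and a plain byte slice rather than BytesMut; the BulkOffsets struct and bulk_string_offsets function are hypothetical names used for illustration, not part of Rudis.

```rust
/// Offsets of one complete bulk string inside a buffer (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct BulkOffsets {
    len: usize,        // declared payload length
    data_start: usize, // index of the first payload byte
    total: usize,      // size of the whole message, including both \r\n pairs
}

/// Returns None if `buf` does not hold a complete, non-null bulk string.
fn bulk_string_offsets(buf: &[u8]) -> Option<BulkOffsets> {
    // Type marker.
    if buf.first() != Some(&b'$') {
        return None;
    }
    // First \r\n in the buffer: the separator that follows the length digits.
    let sep = buf.windows(2).position(|w| w == b"\r\n")?;
    // Length digits sit between the marker and the separator.
    let len = std::str::from_utf8(&buf[1..sep]).ok()?.parse::<usize>().ok()?;
    let data_start = sep + 2;
    // Payload plus the 2-byte terminator; checked to avoid overflow on absurd lengths.
    let total = data_start.checked_add(len)?.checked_add(2)?;
    if buf.len() < total {
        return None; // payload or terminator has not fully arrived yet
    }
    Some(BulkOffsets { len, data_start, total })
}
```

For $6\r\nfoobar\r\n the separator is found at index 2, so the data starts at index 4 and the whole message is 12 bytes long.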

I got a basic parser working quickly. Then I spent the rest of the day fixing bugs in it.


Null vs Empty

RESP distinguishes between “empty” and “doesn’t exist”:

$0\r\n\r\n    # Empty string (zero bytes)
$-1\r\n       # Null (nothing)

Type            RESP Representation    Rust Representation
Empty String    $0\r\n\r\n             BulkString(Some(vec![]))
Null String     $-1\r\n                BulkString(None)

When Redis can’t find a key, it sends $-1\r\n. If you treat that as an empty string, you can’t tell the difference between “key not found” and “key is empty.” In Rust:

pub enum RespValue {
    SimpleString(String),
    Error(String),
    Integer(i64),
    BulkString(Option<Vec<u8>>),   // None = null
    Array(Option<Vec<RespValue>>),
}

Future refinement: This enum owns its data via String and Vec<u8>, which means allocation on every parse. For Redis-level performance, a later phase could use &'a [u8] with lifetimes to borrow directly from the input buffer — zero-copy parsing. For Phase 1, owned data keeps the borrow checker happy and the code simple.
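To make the null/empty distinction concrete, here is a minimal sketch of how the enum could be written back out as RESP bytes. The serialize name matches the call used in the server loop later in this post, but the body below is an assumption and may not match what Rudis actually does.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum RespValue {
    SimpleString(String),
    Error(String),
    Integer(i64),
    BulkString(Option<Vec<u8>>), // None = null
    Array(Option<Vec<RespValue>>),
}

impl RespValue {
    /// Encode the value as RESP bytes (illustrative sketch, not the Rudis source).
    pub fn serialize(&self) -> Vec<u8> {
        match self {
            RespValue::SimpleString(s) => format!("+{s}\r\n").into_bytes(),
            RespValue::Error(e) => format!("-{e}\r\n").into_bytes(),
            RespValue::Integer(n) => format!(":{n}\r\n").into_bytes(),
            // Null: a length of -1 and nothing else.
            RespValue::BulkString(None) => b"$-1\r\n".to_vec(),
            // Empty or non-empty: length, payload, terminator.
            RespValue::BulkString(Some(data)) => {
                let mut out = format!("${}\r\n", data.len()).into_bytes();
                out.extend_from_slice(data);
                out.extend_from_slice(b"\r\n");
                out
            }
            RespValue::Array(None) => b"*-1\r\n".to_vec(),
            RespValue::Array(Some(items)) => {
                let mut out = format!("*{}\r\n", items.len()).into_bytes();
                for item in items {
                    out.extend(item.serialize());
                }
                out
            }
        }
    }
}
```

With this shape, BulkString(Some(vec![])) and BulkString(None) produce different bytes on the wire, which is exactly what lets a client tell "key is empty" from "key not found".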


Binary Data in Bulk Strings

Bulk strings are length-prefixed, so they can contain anything — including \r\n:

#[test]
fn parse_bulk_string_with_binary_data() {
    // The data is: null byte, \r, \n, 0x01, 0x02
    let mut buffer = BytesMut::from("$5\r\n\0\r\n\x01\x02\r\n");
    let result = RespValue::parse(&mut buffer).unwrap().unwrap();
    assert_eq!(
        result.0,
        RespValue::BulkString(Some(vec![0, b'\r', b'\n', 1, 2]))
    );
}

That \r\n in the middle is data, not a delimiter. If you scan for \r\n without checking the length first, you’ll cut the message in the wrong place.

The “fix” is straightforward — parse the length, then read exactly that many bytes. The find_crlf helper just scans the buffer for the first \r\n sequence and returns its position:

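fn parse_bulk_string(buffer: &mut BytesMut) -> Result<Option<(RespValue, usize)>> {
    // `pos` is relative to buffer[1..], so the header's \r\n starts at pos + 1 in `buffer`.
    if let Some(pos) = find_crlf(&buffer[1..]) {
        let len_str = String::from_utf8(buffer[1..pos + 1].to_vec())?;
        let len = len_str.parse::<i64>()?;

        // $-1\r\n is the null bulk string: no payload and no trailing \r\n.
        if len == -1 {
            return Ok(Some((RespValue::BulkString(None), pos + 3)));
        }

        // Reject any other negative length instead of letting `as usize` wrap around.
        let len = usize::try_from(len)?;
        // Header (pos + 3 bytes) + payload + trailing \r\n.
        let total_needed = pos + 3 + len + 2;

        if buffer.len() < total_needed {
            return Ok(None); // payload not fully received yet
        }

        let data_start = pos + 3;
        let data = buffer[data_start..data_start + len].to_vec();
        Ok(Some((RespValue::BulkString(Some(data)), total_needed)))
    } else {
        Ok(None) // length line is still incomplete
    }
}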
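The find_crlf helper is referenced above but not shown. A minimal sketch, assuming it takes a byte slice and returns the index of the first \r\n (the real Rudis helper may differ):

```rust
/// Index of the first "\r\n" in `buf`, or None if no complete pair is present yet.
fn find_crlf(buf: &[u8]) -> Option<usize> {
    // Slide a 2-byte window over the buffer and stop at the first match.
    buf.windows(2).position(|pair| pair == b"\r\n")
}
```

Called on the bytes after the $ in the binary-data test (5\r\n\0\r\n...), it returns 1, the end of the length digits. The scan is only used to find the end of the length line; the payload is then sliced by length, so the \r\n inside the data is never looked at.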

I only caught this because I wrote a test for it after reading the spec carefully. It might never have shown up in normal testing, but the fix is really straightforward, so the lesson is that I should read more carefully.


Partial TCP Reads

TCP is a stream. A single read() might give you half a message, or three messages, or something in between.

My parser returns how many bytes it consumed, so the caller knows where the next message starts:

pub fn parse(buffer: &mut BytesMut) -> Result<Option<(RespValue, usize)>> {
    // Ok(Some((value, bytes_consumed))) - complete message
    // Ok(None) - need more data
    // Err(...) - malformed
}

The server loop reads into a buffer and keeps parsing until it needs more data:

loop {
    let n = socket.read_buf(&mut buffer).await?;
    if n == 0 { return Ok(()); }

    while !buffer.is_empty() {
        match RespValue::parse(&mut buffer)? {
            Some((value, consumed)) => {
                let response = handle_command(value);
                socket.write_all(&response.serialize()).await?;
                buffer.advance(consumed);
            }
            None => break,
        }
    }
}
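The same buffering idea can be tried without a socket. This sketch uses only the standard library; parse_simple_string is a hypothetical toy stand-in for RespValue::parse that understands only simple strings, and a Vec<u8> plays the role of BytesMut.

```rust
/// Toy parser: handles only "+...\r\n". Returns the text and the bytes consumed,
/// or None when the message is not complete yet.
fn parse_simple_string(buf: &[u8]) -> Option<(String, usize)> {
    if buf.first() != Some(&b'+') {
        return None;
    }
    let end = buf.windows(2).position(|w| w == b"\r\n")?;
    let text = String::from_utf8_lossy(&buf[1..end]).into_owned();
    Some((text, end + 2))
}

/// Pull every complete message out of `buf`, leaving any partial tail in place.
fn drain_complete(buf: &mut Vec<u8>) -> Vec<String> {
    let mut messages = Vec::new();
    while let Some((msg, consumed)) = parse_simple_string(&buf[..]) {
        messages.push(msg);
        // Equivalent of buffer.advance(consumed): drop the bytes just parsed.
        buf.drain(..consumed);
    }
    messages
}
```

Feeding it +OK\r\n+PO yields one message and leaves +PO in the buffer; appending NG\r\n on the next "read" completes the second message. The partial tail simply waits in the buffer until more bytes arrive.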

Nested Arrays

RESP arrays can contain other arrays, so the parser has to handle nesting. The parsing is recursive, but you need to track the total bytes consumed across the whole structure:

fn parse_array(buffer: &mut BytesMut) -> Result<Option<(RespValue, usize)>> {
    // ... parse array length ...
    
    let mut consumed = pos + 3;
    let mut elements = Vec::new();
    let mut remaining = buffer.clone();
    remaining.advance(consumed);

    for _ in 0..len {
        match RespValue::parse(&mut remaining)? {
            Some((value, bytes)) => {
                elements.push(value);
                consumed += bytes;
                remaining.advance(bytes);
            }
            None => return Ok(None),
        }
    }

    Ok(Some((RespValue::Array(Some(elements)), consumed)))
}

If any nested element is incomplete, the whole array parse returns Ok(None) and waits for more data.
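One possible variation is to recurse over a borrowed slice and an offset instead of cloning the buffer for each array. The sketch below is standard-library only and covers just integers, bulk strings, and arrays; the ParseResult alias, the String error type, and the parse_header helper are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum RespValue {
    Integer(i64),
    BulkString(Option<Vec<u8>>),
    Array(Option<Vec<RespValue>>),
}

// Ok(Some((value, bytes_consumed))), Ok(None) = need more data, Err = malformed.
type ParseResult = Result<Option<(RespValue, usize)>, String>;

fn find_crlf(buf: &[u8]) -> Option<usize> {
    buf.windows(2).position(|w| w == b"\r\n")
}

/// Parse "<type byte><number>\r\n"; returns the number and the header length.
/// Callers must already have checked that buf[0] is a type byte.
fn parse_header(buf: &[u8]) -> Result<Option<(i64, usize)>, String> {
    let Some(pos) = find_crlf(buf) else { return Ok(None) };
    let text = std::str::from_utf8(&buf[1..pos]).map_err(|e| e.to_string())?;
    let n = text.parse::<i64>().map_err(|e| e.to_string())?;
    Ok(Some((n, pos + 2)))
}

fn parse(buf: &[u8]) -> ParseResult {
    let Some(&kind) = buf.first() else { return Ok(None) };
    match kind {
        b':' => Ok(parse_header(buf)?.map(|(n, used)| (RespValue::Integer(n), used))),
        b'$' => parse_bulk(buf),
        b'*' => parse_array(buf),
        other => Err(format!("unsupported type byte {other:#04x}")),
    }
}

fn parse_bulk(buf: &[u8]) -> ParseResult {
    let Some((len, header)) = parse_header(buf)? else { return Ok(None) };
    if len == -1 {
        return Ok(Some((RespValue::BulkString(None), header)));
    }
    let len = usize::try_from(len).map_err(|e| e.to_string())?;
    let total = header + len + 2;
    if buf.len() < total {
        return Ok(None);
    }
    Ok(Some((RespValue::BulkString(Some(buf[header..header + len].to_vec())), total)))
}

fn parse_array(buf: &[u8]) -> ParseResult {
    let Some((len, header)) = parse_header(buf)? else { return Ok(None) };
    if len == -1 {
        return Ok(Some((RespValue::Array(None), header)));
    }
    let len = usize::try_from(len).map_err(|e| e.to_string())?;
    let mut consumed = header;
    let mut elements = Vec::new();
    for _ in 0..len {
        // Recurse on the unread tail; the buffer itself is never copied.
        match parse(&buf[consumed..])? {
            Some((value, used)) => {
                elements.push(value);
                consumed += used;
            }
            None => return Ok(None), // an element is incomplete, so the array is too
        }
    }
    Ok(Some((RespValue::Array(Some(elements)), consumed)))
}
```

Each nested call reports its own byte count, and the outer array just adds them up. Two caveats of this sketch: an incomplete array is re-parsed from the start on every new read, and recursion depth is not bounded.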


Testing with redis-cli

Unit tests only go so far. The real check was whether redis-cli could talk to my server:

$ redis-cli -p 6379 PING
PONG

$ redis-cli -p 6379 PING "Hello, Rudis!"
"Hello, Rudis!"

This caught a couple of formatting issues that my tests missed. redis-cli is picky about the exact response format.
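For context, redis-cli sends each command as an array of bulk strings, so PING arrives as *1\r\n$4\r\nPING\r\n. Redis answers a bare PING with the simple string PONG and a PING with one argument with that argument as a bulk string. Here is a minimal sketch of a PING-only handler under those rules; the handle_command signature is assumed from the server loop above, and the error texts for malformed input are illustrative rather than what Rudis or Redis return.

```rust
#[derive(Debug, Clone, PartialEq)]
enum RespValue {
    SimpleString(String),
    Error(String),
    BulkString(Option<Vec<u8>>),
    Array(Option<Vec<RespValue>>),
}

/// PING-only dispatcher (sketch; the real Rudis handler may be organised differently).
fn handle_command(request: RespValue) -> RespValue {
    // A command is an array of bulk strings: [name, arg1, arg2, ...].
    let RespValue::Array(Some(parts)) = request else {
        return RespValue::Error("ERR expected an array of bulk strings".into());
    };
    let mut args: Vec<Vec<u8>> = Vec::with_capacity(parts.len());
    for part in parts {
        match part {
            RespValue::BulkString(Some(bytes)) => args.push(bytes),
            _ => return RespValue::Error("ERR expected an array of bulk strings".into()),
        }
    }
    let Some((name, rest)) = args.split_first() else {
        return RespValue::Error("ERR empty command".into());
    };
    // Command names are case-insensitive, so normalise before matching.
    let upper = name.to_ascii_uppercase();
    match (upper.as_slice(), rest) {
        (b"PING", []) => RespValue::SimpleString("PONG".into()),
        (b"PING", [msg]) => RespValue::BulkString(Some(msg.clone())),
        (b"PING", _) => {
            RespValue::Error("ERR wrong number of arguments for 'ping' command".into())
        }
        _ => RespValue::Error("ERR unknown command".into()),
    }
}
```

The two reply types explain the redis-cli output above: the simple string prints as PONG, while the bulk string reply prints in quotes.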


What’s Next

Phase 1 is done — RESP parsing works, PING works, and I have 41 unit tests and 7 integration tests covering the edge cases above.

Phase 2 is GET and SET with an actual key-value store. That means thread-safe data structures, which is where Rust’s Arc<RwLock<HashMap>> pattern comes in.
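As a rough preview of that pattern, here is a minimal standard-library sketch. The Store alias, the String/Vec<u8> key and value types, and the set/get helpers are all assumptions for illustration, not the Phase 2 design.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical shared store: Arc for shared ownership, RwLock for synchronised access.
type Store = Arc<RwLock<HashMap<String, Vec<u8>>>>;

fn set(store: &Store, key: &str, value: &[u8]) {
    // Writers take the exclusive lock.
    store.write().unwrap().insert(key.to_string(), value.to_vec());
}

fn get(store: &Store, key: &str) -> Option<Vec<u8>> {
    // Readers share the lock, so concurrent GETs do not block each other.
    store.read().unwrap().get(key).cloned()
}
```

Each connection task would hold its own Arc::clone of the store. Note that get returns an Option, which maps naturally onto the null versus empty bulk string distinction from earlier.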

Code is here: github.com/aleksandar-had/rudis

This blog post was written by me, then optimized and polished by the LLMs.

