Building a Redis Clone in Rust: Parsing the RESP Protocol
I’ve spent some time working with Redis: optimizing cache hit rates, debugging latency issues, tuning configurations. But there’s a difference between knowing how to use something well and understanding how it actually works under the hood. So I’m building a clone from scratch, in Rust, to dig deeper. I’m calling it Rudis because it sounds catchy. I’m not expecting it to grow into anything; it’s just an excuse to tinker.
This post covers Phase 1: a TCP server that understands the Redis wire protocol and can respond to PING. Not exactly groundbreaking, but I ran into more issues than expected.
The Protocol
Redis uses RESP (REdis Serialization Protocol) for client-server communication. It’s pretty minimal — each message starts with a type byte, has some data, and ends with \r\n:
```
+OK\r\n                              # Simple String
-Error message\r\n                   # Error
:1000\r\n                            # Integer
$6\r\nfoobar\r\n                     # Bulk String (length-prefixed)
*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n     # Array
```
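Because the first byte alone decides how the rest of the message is read, a parser typically starts by dispatching on it. Here is a minimal sketch of that dispatch step, written as a hypothetical `resp_kind` helper that isn't part of the Rudis code:

```rust
/// Hypothetical helper: classify a RESP message by its leading type byte.
/// Returns None for an empty buffer or an unrecognized marker.
fn resp_kind(msg: &[u8]) -> Option<&'static str> {
    match msg.first().copied()? {
        b'+' => Some("simple string"),
        b'-' => Some("error"),
        b':' => Some("integer"),
        b'$' => Some("bulk string"),
        b'*' => Some("array"),
        _ => None,
    }
}
```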
Here’s how a Bulk String like $6\r\nfoobar\r\n breaks down:
```
$ 6 \r\n f o o b a r \r\n
│ │ │    └────┬────┘ │
│ │ │         │      └── terminator
│ │ │         └── data (exactly 6 bytes)
│ │ └── separator
│ └── length
└── type marker
```
When parsing, the offsets matter: skip the $, read until \r\n to get the length, read exactly that many bytes starting right after the separator, and then expect the closing \r\n.
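Here are concrete offsets for $6\r\nfoobar\r\n. The length digit sits at index 1, the separator at 2..4, the data at 4..10 and the terminator at 10..12, so the message is 12 bytes in total. The sketch below computes the same layout for any complete, non-null bulk string. `bulk_layout` is a hypothetical name used only for illustration:

```rust
/// Hypothetical helper: for a complete, non-null bulk string at the start of
/// `buf`, return (data_start, data_end, total_len) as byte offsets.
/// Returns None if the bytes are incomplete, malformed, or a null ($-1).
fn bulk_layout(buf: &[u8]) -> Option<(usize, usize, usize)> {
    if buf.first() != Some(&b'$') {
        return None; // not a bulk string
    }
    // Index of the \r in the separator that ends the length field.
    let sep = buf.windows(2).position(|w| w == b"\r\n")?;
    // "-1" fails to parse as usize, so null bulk strings return None here.
    let len: usize = std::str::from_utf8(&buf[1..sep]).ok()?.parse().ok()?;
    let data_start = sep + 2; // skip the \r\n separator
    let data_end = data_start + len; // exactly `len` payload bytes
    let total = data_end + 2; // plus the trailing \r\n terminator
    if buf.len() < total {
        return None; // not all bytes have arrived yet
    }
    Some((data_start, data_end, total))
}
```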
I got a basic parser working quickly. Then I spent the rest of the day fixing bugs in it.
Null vs Empty
RESP distinguishes between “empty” and “doesn’t exist”:
```
$0\r\n\r\n    # Empty string (zero bytes)
$-1\r\n       # Null (nothing)
```
| Type | RESP Representation | Rust Representation |
|---|---|---|
| Empty String | $0\r\n\r\n | BulkString(Some(vec![])) |
| Null String | $-1\r\n | BulkString(None) |
When Redis can’t find a key, it sends $-1\r\n. If you treat that as an empty string, you can’t tell the difference between “key not found” and “key is empty.” In Rust:
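```rust
// Debug + PartialEq let the tests compare parsed values with assert_eq!.
#[derive(Debug, PartialEq)]
pub enum RespValue {
    SimpleString(String),
    Error(String),
    Integer(i64),
    BulkString(Option<Vec<u8>>), // None = null
    Array(Option<Vec<RespValue>>),
}
```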
Future refinement: this enum owns its data via String and Vec<u8>, which means an allocation on every parse. For Redis-level performance, a later phase could use &'a [u8] with lifetimes to borrow directly from the input buffer (zero-copy parsing). For Phase 1, owned data keeps the borrow checker happy and the code simple.
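Here is a rough sketch of that zero-copy direction. It assumes a hypothetical borrowed type `RespRef<'a>`, which is not part of Rudis, and covers only bulk strings. The payload is a slice into the input instead of a freshly allocated Vec<u8>:

```rust
/// Hypothetical borrowed counterpart of RespValue: payloads are slices into
/// the input, so parsing allocates nothing. Other variants omitted.
#[derive(Debug, PartialEq)]
enum RespRef<'a> {
    BulkString(Option<&'a [u8]>), // None = null ($-1)
}

/// Parse a bulk string without copying its payload.
/// Returns Some((value, bytes_consumed)); returns None both when more data is
/// needed and when the input is malformed (a real parser would separate these).
fn parse_bulk_ref(input: &[u8]) -> Option<(RespRef<'_>, usize)> {
    if input.first() != Some(&b'$') {
        return None;
    }
    // Index of the \r that ends the length field.
    let crlf = input.windows(2).position(|w| w == b"\r\n")?;
    let len: i64 = std::str::from_utf8(&input[1..crlf]).ok()?.parse().ok()?;
    let header = crlf + 2;
    if len == -1 {
        return Some((RespRef::BulkString(None), header));
    }
    let len = usize::try_from(len).ok()?; // rejects other negative lengths
    let end = header + len;
    if input.len() < end + 2 || &input[end..end + 2] != b"\r\n" {
        return None;
    }
    // The payload is a view into `input`: no allocation, no copy.
    Some((RespRef::BulkString(Some(&input[header..end])), end + 2))
}
```

The trade-off is that the input buffer has to outlive every parsed value. That gets awkward when the same buffer is constantly advanced and refilled in a read loop, and it's the borrow-checker friction the owned version avoids.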
Binary Data in Bulk Strings
Bulk strings are length-prefixed, so they can contain anything — including \r\n:
```rust
#[test]
fn parse_bulk_string_with_binary_data() {
    // The data is: null byte, \r, \n, 0x01, 0x02
    let mut buffer = BytesMut::from("$5\r\n\0\r\n\x01\x02\r\n");
    let result = RespValue::parse(&mut buffer).unwrap().unwrap();
    assert_eq!(
        result.0,
        RespValue::BulkString(Some(vec![0, b'\r', b'\n', 1, 2]))
    );
}
```
That \r\n in the middle is data, not a delimiter. If you scan for \r\n without checking the length first, you’ll cut the message in the wrong place.
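The hypothetical naive reader below shows how a delimiter scan goes wrong on that test input. It ignores the length prefix and stops at the next \r\n. It exists only to demonstrate the bug:

```rust
/// Hypothetical naive reader: returns everything between the header's \r\n
/// and the *next* \r\n, ignoring the declared length. Illustrates the bug only.
fn naive_bulk_data(buf: &[u8]) -> Option<&[u8]> {
    // Find the next \r\n at or after `from`, as an index into `buf`.
    let find = |from: usize| {
        buf[from..]
            .windows(2)
            .position(|w| w == b"\r\n")
            .map(|p| p + from)
    };
    let header_end = find(0)? + 2; // skip "$5\r\n"
    let data_end = find(header_end)?; // stops at the \r\n *inside* the data
    Some(&buf[header_end..data_end])
}
```

On "$5\r\n\0\r\n\x01\x02\r\n" it returns only the null byte, one byte out of five. The leftover \x01\x02\r\n would then be misread as the start of the next message.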
The fix is straightforward: parse the length, then read exactly that many bytes. The find_crlf helper just scans the buffer for the first \r\n sequence and returns its position:
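```rust
// Assumes `use bytes::BytesMut;` and a Result alias (e.g. anyhow::Result)
// that the `?` conversions below can target.
fn parse_bulk_string(buffer: &mut BytesMut) -> Result<Option<(RespValue, usize)>> {
    // pos is relative to buffer[1..], i.e. counted from just after the '$'.
    if let Some(pos) = find_crlf(&buffer[1..]) {
        // The length digits sit between the '$' and the first \r\n.
        let len_str = String::from_utf8(buffer[1..pos + 1].to_vec())?;
        let len = len_str.parse::<i64>()?;
        if len == -1 {
            // Null bulk string: "$-1\r\n" is pos + 3 bytes long.
            return Ok(Some((RespValue::BulkString(None), pos + 3)));
        }
        // Reject other negative lengths instead of letting `as usize` wrap.
        let len = usize::try_from(len)?;
        let data_start = pos + 3; // '$' + digits + \r\n
        let total_needed = data_start + len + 2; // data + trailing \r\n
        if buffer.len() < total_needed {
            return Ok(None); // incomplete: wait for more bytes
        }
        // A stricter parser would also check that the final two bytes are \r\n.
        let data = buffer[data_start..data_start + len].to_vec();
        Ok(Some((RespValue::BulkString(Some(data)), total_needed)))
    } else {
        Ok(None)
    }
}
```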
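The post doesn't show find_crlf itself. The one-line sketch below matches its description, but it is an assumption about the real implementation:

```rust
/// Sketch of find_crlf as described: index of the first "\r\n" in `buf`
/// (the position of the '\r'), or None if there isn't one yet.
fn find_crlf(buf: &[u8]) -> Option<usize> {
    buf.windows(2).position(|w| w == b"\r\n")
}
```

parse_bulk_string calls it on buffer[1..], so the returned pos is relative to index 1. The length digits occupy buffer[1..pos + 1], the separator is buffer[pos + 1..pos + 3], and the data starts at pos + 3. For $6\r\nfoobar\r\n, pos is 1, so the data starts at index 4 and total_needed is 4 + 6 + 2 = 12.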
I only caught this because I wrote a test for it after reading the spec carefully. It probably wouldn’t have shown up in normal testing, but the rule itself is simple, so the real lesson is to read specs more carefully up front.
Partial TCP Reads
TCP is a stream. A single read() might give you half a message, or three messages, or something in between.
My parser returns how many bytes it consumed, so the caller knows where the next message starts:
```rust
pub fn parse(buffer: &mut BytesMut) -> Result<Option<(RespValue, usize)>> {
    // Ok(Some((value, bytes_consumed))) - complete message
    // Ok(None)                          - need more data
    // Err(...)                          - malformed
}
```
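The self-contained simulation below shows the consumed-bytes contract handling both cases from the TCP paragraph: half a message in one read, then more than one message in the next. `parse_simple` and `frame` are hypothetical names. The parser only understands simple strings, and Vec::drain stands in for BytesMut::advance:

```rust
/// Hypothetical mini-parser with the same contract as RespValue::parse,
/// restricted to simple strings ("+...\r\n"): Some((text, consumed)) for a
/// complete message, None when more bytes are needed.
fn parse_simple(buf: &[u8]) -> Option<(String, usize)> {
    if buf.first() != Some(&b'+') {
        return None; // a real parser would return an error for other bytes
    }
    let end = buf.windows(2).position(|w| w == b"\r\n")?;
    let text = String::from_utf8_lossy(&buf[1..end]).into_owned();
    Some((text, end + 2))
}

/// Simulate successive read() calls: append each chunk, then pull out every
/// complete message. Returns the messages and any leftover partial bytes.
fn frame(chunks: &[&[u8]]) -> (Vec<String>, Vec<u8>) {
    let mut buffer: Vec<u8> = Vec::new();
    let mut messages = Vec::new();
    for chunk in chunks {
        buffer.extend_from_slice(chunk);
        // Keep parsing until the parser reports that it needs more data.
        while let Some((msg, consumed)) = parse_simple(&buffer) {
            messages.push(msg);
            buffer.drain(..consumed); // advance past the parsed message
        }
    }
    (messages, buffer)
}
```

Feeding it "+PO" and then "NG\r\n+OK\r\n+HEL" yields PONG and OK, with "+HEL" left in the buffer for the next read.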
The server loop reads into a buffer and keeps parsing until it needs more data:
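```rust
// Needs bytes::Buf (for advance) and tokio::io::{AsyncReadExt, AsyncWriteExt}.
loop {
    // read_buf appends to the BytesMut; 0 bytes read means the peer closed.
    let n = socket.read_buf(&mut buffer).await?;
    if n == 0 {
        return Ok(());
    }
    // Handle every complete message currently sitting in the buffer.
    while !buffer.is_empty() {
        match RespValue::parse(&mut buffer)? {
            Some((value, consumed)) => {
                let response = handle_command(value);
                socket.write_all(&response.serialize()).await?;
                buffer.advance(consumed); // drop the bytes we just handled
            }
            None => break, // partial message: go back to reading
        }
    }
}
```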
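The loop calls response.serialize(), which the post doesn't show. The sketch below shows what such a method could look like for this enum. It re-declares RespValue so the block compiles on its own, and the method's shape is an assumption. It also makes the null/empty distinction visible on the wire:

```rust
#[derive(Debug, PartialEq)]
pub enum RespValue {
    SimpleString(String),
    Error(String),
    Integer(i64),
    BulkString(Option<Vec<u8>>), // None = null
    Array(Option<Vec<RespValue>>),
}

impl RespValue {
    /// Encode this value in RESP wire format (sketch).
    pub fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::new();
        self.write_to(&mut out);
        out
    }

    fn write_to(&self, out: &mut Vec<u8>) {
        match self {
            // Simple strings and errors must not contain \r\n (not checked here).
            RespValue::SimpleString(s) => {
                out.push(b'+');
                out.extend_from_slice(s.as_bytes());
                out.extend_from_slice(b"\r\n");
            }
            RespValue::Error(msg) => {
                out.push(b'-');
                out.extend_from_slice(msg.as_bytes());
                out.extend_from_slice(b"\r\n");
            }
            RespValue::Integer(n) => out.extend_from_slice(format!(":{n}\r\n").as_bytes()),
            // Null and empty encode differently: $-1\r\n vs $0\r\n\r\n.
            RespValue::BulkString(None) => out.extend_from_slice(b"$-1\r\n"),
            RespValue::BulkString(Some(data)) => {
                out.extend_from_slice(format!("${}\r\n", data.len()).as_bytes());
                out.extend_from_slice(data);
                out.extend_from_slice(b"\r\n");
            }
            RespValue::Array(None) => out.extend_from_slice(b"*-1\r\n"),
            RespValue::Array(Some(items)) => {
                out.extend_from_slice(format!("*{}\r\n", items.len()).as_bytes());
                // Elements are written recursively, so nested arrays just work.
                for item in items {
                    item.write_to(out);
                }
            }
        }
    }
}
```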
Nested Arrays
RESP arrays can contain other arrays, so the parser has to handle nesting. The parsing is recursive, but you need to track the total bytes consumed across the whole structure:
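```rust
// Needs bytes::Buf for advance().
fn parse_array(buffer: &mut BytesMut) -> Result<Option<(RespValue, usize)>> {
    // ... parse array length ...
    let mut consumed = pos + 3; // '*' + digits + \r\n
    let mut elements = Vec::new();
    // Walk the elements with a separate cursor so the caller's buffer isn't
    // advanced until the whole array is known to be complete.
    let mut remaining = buffer.clone();
    remaining.advance(consumed);
    for _ in 0..len {
        // Recursive call: an element may itself be an array.
        match RespValue::parse(&mut remaining)? {
            Some((value, bytes)) => {
                elements.push(value);
                consumed += bytes; // running total across the whole structure
                remaining.advance(bytes);
            }
            None => return Ok(None), // one incomplete element -> whole array incomplete
        }
    }
    Ok(Some((RespValue::Array(Some(elements)), consumed)))
}
```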
If any nested element is incomplete, the whole array parse returns Ok(None). The caller then waits for more data and retries the array from its first byte.
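parse_array clones the BytesMut, which copies its bytes on every call. One alternative is to recurse on sub-slices of a borrowed &[u8] and add up the offsets. The self-contained sketch below makes some simplifying assumptions: it handles only integers, bulk strings and arrays, errors are plain Strings, and `Value` and `parse_value` are stand-ins rather than Rudis's types:

```rust
/// Stand-in value type for this sketch (not Rudis's RespValue).
#[derive(Debug, PartialEq)]
enum Value {
    Integer(i64),
    Bulk(Option<Vec<u8>>),
    Array(Option<Vec<Value>>),
}

/// Parse one value from the start of `buf`, with the same contract as the post:
/// Ok(Some((value, consumed))) = complete, Ok(None) = need more data, Err = malformed.
fn parse_value(buf: &[u8]) -> Result<Option<(Value, usize)>, String> {
    let Some(&kind) = buf.first() else {
        return Ok(None);
    };
    // Header line: everything between the type byte and the first \r\n.
    let Some(p) = buf[1..].windows(2).position(|w| w == b"\r\n") else {
        return Ok(None);
    };
    let eol = p + 1; // index of the \r in the full buffer
    let header = std::str::from_utf8(&buf[1..eol]).map_err(|e| e.to_string())?;
    let body = eol + 2; // first byte after the header's \r\n
    let num = |s: &str| s.parse::<i64>().map_err(|e| e.to_string());
    match kind {
        b':' => Ok(Some((Value::Integer(num(header)?), body))),
        b'$' => {
            let len = num(header)?;
            if len == -1 {
                return Ok(Some((Value::Bulk(None), body)));
            }
            let len = usize::try_from(len).map_err(|e| e.to_string())?;
            let end = body + len;
            if buf.len() < end + 2 {
                return Ok(None);
            }
            if &buf[end..end + 2] != b"\r\n" {
                return Err("bulk string not terminated by \\r\\n".to_string());
            }
            Ok(Some((Value::Bulk(Some(buf[body..end].to_vec())), end + 2)))
        }
        b'*' => {
            let len = num(header)?;
            if len == -1 {
                return Ok(Some((Value::Array(None), body)));
            }
            let len = usize::try_from(len).map_err(|e| e.to_string())?;
            let mut consumed = body;
            let mut items = Vec::new();
            for _ in 0..len {
                // Recurse on a sub-slice: just a new starting offset, no copy.
                match parse_value(&buf[consumed..])? {
                    Some((item, used)) => {
                        items.push(item);
                        consumed += used;
                    }
                    None => return Ok(None), // an element is incomplete
                }
            }
            Ok(Some((Value::Array(Some(items)), consumed)))
        }
        other => Err(format!("unsupported type byte {other}")),
    }
}
```

The running `consumed` total works exactly as in parse_array. The difference is that each recursive call gets &buf[consumed..] instead of an advanced copy of the buffer.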
Testing with redis-cli
Unit tests only go so far. The real check was whether redis-cli could talk to my server:
```
$ redis-cli -p 6379 PING
PONG
$ redis-cli -p 6379 PING "Hello, Rudis!"
"Hello, Rudis!"
```
This caught a couple of formatting issues that my tests missed. redis-cli is picky about the exact response format.
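The transcript hints at which reply types redis-cli expects. It prints the bare PONG without quotes, which is a simple string, and the echoed message in quotes, which is a bulk string. The minimal sketch below produces those exact bytes. `ping_reply` is a hypothetical name, and it returns raw bytes instead of a RespValue so the block stands alone. The error text for extra arguments is modeled on the wording real Redis uses:

```rust
/// Hypothetical PING handler: `args` are the arguments after "PING".
/// Returns the raw RESP bytes to write back to the client.
fn ping_reply(args: &[&[u8]]) -> Vec<u8> {
    match args {
        // Bare PING: simple string reply.
        [] => b"+PONG\r\n".to_vec(),
        // PING <message>: echo the message back as a bulk string.
        [msg] => {
            let mut out = format!("${}\r\n", msg.len()).into_bytes();
            out.extend_from_slice(msg);
            out.extend_from_slice(b"\r\n");
            out
        }
        _ => b"-ERR wrong number of arguments for 'ping' command\r\n".to_vec(),
    }
}
```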
What’s Next
Phase 1 is done — RESP parsing works, PING works, and I have 41 unit tests and 7 integration tests covering the edge cases above.
Phase 2 is GET and SET with an actual key-value store. That means thread-safe data structures, which is where Rust’s Arc<RwLock<HashMap>> pattern comes in.
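Here is a rough preview of that pattern. This is a sketch under assumptions, not the Phase 2 design: the `Db` alias and the `set`, `get` and `fill_concurrently` functions are hypothetical, and it uses plain threads instead of the async server:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical shared store: RwLock allows many readers or one writer at a time.
type Db = Arc<RwLock<HashMap<String, Vec<u8>>>>;

fn set(db: &Db, key: &str, value: &[u8]) {
    // write() blocks until no other reader or writer holds the lock;
    // unwrap() panics only if another thread panicked while holding it.
    db.write().unwrap().insert(key.to_string(), value.to_vec());
}

fn get(db: &Db, key: &str) -> Option<Vec<u8>> {
    // A missing key (None) would go back to the client as $-1\r\n.
    db.read().unwrap().get(key).cloned()
}

/// Spawn `n` threads that each SET a distinct key, then wait for all of them.
fn fill_concurrently(db: &Db, n: usize) {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let db = Arc::clone(db); // each thread gets its own handle to the same map
            thread::spawn(move || set(&db, &format!("key{i}"), b"value"))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

In an async server the lock guard shouldn't be held across an .await. With short critical sections like these, std::sync::RwLock is usually fine, and tokio::sync::RwLock exists for cases where a guard must live across an await point.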
Code is here: github.com/aleksandar-had/rudis
This blog post was written by me, then optimized and polished by LLMs.