Meet the borrow checker | Whisper of the Heartman

This article was originally published by LogRocket on their blog under
the title /Understanding the Rust borrow checker/. You can find their
version here.

You've heard a lot about it, you've bought into the hype, and the day has finally come. It's time for you to start writing Rust!

So you sit down---hands on the keyboard, heart giddy with anticipation---and write a few lines of code. You run the cargo run command, excited to see whether the program works as expected. You've heard Rust is one of those 'once it compiles, it works' languages and want to test it for yourself. The compiler starts up, you follow the output, when suddenly:

error[E0382]: borrow of moved value

Uh-oh. Seems like you've run into ... (puts on scary voice) the /borrow checker/! Dun, dun, DUUUUUUN!

The ... borrow checker?

That's right. The borrow checker is an essential part of the Rust language and part of what makes Rust Rust. The borrow checker helps you (or forces you to) manage ownership. As chapter 4 (/Understanding Ownership/) of the Rust Programming Language puts it: "Ownership is Rust's most unique feature, and it enables Rust to make memory safety guarantees without needing a garbage collector."

Ownership, borrow checker, and garbage collectors: There's a lot to unpack in the above paragraph, so let's break it down a bit. We'll look at what the borrow checker does for us (and what it stops us from doing), what guarantees it gives us, and how it compares to other forms of memory management. I'll assume that you have some experience with writing code in higher level languages such as Python, JavaScript, or C#, but not necessarily that you're familiar with how computer memory works.

Garbage collection vs manual memory allocation vs the borrow checker

Let's talk about memory and memory management for a minute. In most popular programming languages, you don't need to think about where your variables are stored. You simply declare them and the language runtime takes care of the rest via a garbage collector. This abstracts away how the computer memory actually works and makes it easier and more uniform to work with. This is a good thing.

However, we need to peel back a layer to talk about how this compares to the borrow checker. We'll start by looking at the stack and the heap.

The stack and the heap

Your programs have access to two kinds of memory where they can store values: the stack and the heap. These differ in a number of ways, but for our sake, the most important difference is that data stored on the stack must have a known, fixed size. Data on the heap can be of any arbitrary size.

What do I mean by size? The size is how many bytes it takes to store the data. In broad terms, certain data types, such as booleans, characters, and integers, have a fixed size. These are easy to put on the stack. On the other hand, data types such as strings, lists, and other collections, can be of any arbitrary size. As such, they cannot be stored on the stack, and we must instead use the heap.
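To make the idea of size a bit more concrete, here is a minimal sketch that asks Rust how many bytes a few types take up, using std::mem::size_of from the standard library. The fixed_sizes helper is a name invented for this example. One nuance worth hedging: a Vec does keep a small, fixed-size handle (pointer, capacity, length) wherever the variable lives; it is the elements themselves that go on the heap.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical helper: sizes in bytes of a few fixed-size types.
fn fixed_sizes() -> (usize, usize, usize) {
    (size_of::<bool>(), size_of::<char>(), size_of::<i32>())
}

fn main() {
    let (b, c, i) = fixed_sizes();
    println!("bool: {} byte, char: {} bytes, i32: {} bytes", b, c, i);

    // The handle of a Vec has the same size no matter how many
    // elements it holds; the elements are stored on the heap.
    let short = vec![0u8; 1];
    let long = vec![0u8; 10_000];
    println!(
        "handle of short: {} bytes, handle of long: {} bytes",
        size_of_val(&short),
        size_of_val(&long)
    );
}
```

The two handle sizes printed at the end are identical, even though one vector holds ten thousand times more data than the other.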

Because data of arbitrary size can be stored on the heap, the computer needs to find a chunk of memory large enough to fit whatever we are looking to store. This is time-consuming, and the program also doesn't have direct access to the data as with the stack, but is instead left with a pointer to where the data is stored.

A pointer is pretty much what it says on the tin. It points to some memory address on the heap where the data you're looking for can be found. There's any number of pointer tutorials out there on the web, and which one works for you may depend on your background. But for a quick primer, here's one I found that explains C pointers pretty well (by Jason C. McDonald).

What is the point of having these two different memory stores? Because of the way the stack works, data access on the stack is very fast and easy, but requires the data to conform to certain standards. The heap is slower, but more versatile, and is thus useful for when you can't use the stack.

Garbage collection

In garbage collected languages, you don't need to worry about what goes on the stack and what goes on the heap. Data that goes on the stack gets dropped once it goes out of scope. Data that lives on the heap is taken care of by the garbage collector once it's no longer needed.

In languages like C, on the other hand, you need to manage memory yourself. Where you might simply initialize a list in higher level languages, you need to manually allocate memory on the heap in C. And when you've allocated memory, you should also free the memory once you're done with it to avoid memory leaks. But take care: Memory should only be freed once.

This process of manual allocation and freeing is error-prone. In fact, a Microsoft representative has said that 70% of all of Microsoft's vulnerabilities and exploits are memory-related! So why would you use manual memory management? Because it allows for more control and can often give better performance characteristics than garbage collection. The program doesn't need to stop what it's doing and spend time finding out what it needs to clean up before cleaning it up.

Rust's ownership model feels like something in between. By keeping track of where data is used throughout the program and by following a set of rules, the borrow checker is able to determine where data needs to be initialized and where it needs to be freed (or dropped, in Rust terms). It's like it auto-inserts memory allocations and frees for you, giving you the convenience of a garbage collector, but with the speed and efficiency of manual management.
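If you want to see those automatic frees happen, one way is to implement the standard Drop trait on a type and record when it runs. This is only a sketch: Noisy and run are made-up names, and the log is there purely so we can observe the order of events.

```rust
use std::cell::RefCell;

/// Hypothetical type that writes to a shared log when it is dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

/// Creates two nested values and returns the order in which things happened.
fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
            log.borrow_mut().push("end of inner scope".to_string());
        } // `_inner` goes out of scope and is dropped here
        log.borrow_mut().push("end of outer scope".to_string());
    } // `_outer` goes out of scope and is dropped here
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

Nobody calls drop by hand here. Each value is cleaned up at the closing brace of the scope that owns it.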

In practice, this means that when passing variables around, you can do one of three things. You can move the data itself and give up ownership in the process. You can create a copy of the data and pass that along. Or you can pass a reference to the data and retain ownership, letting the recipient borrow it for a while. Which one is more appropriate depends entirely on the situation.
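Here is a rough sketch of those three options side by side. The functions consume and inspect are hypothetical names chosen for this illustration; the only thing that matters is whether their parameter is an owned String or a reference.

```rust
/// Takes ownership of the string; the caller gives it up.
fn consume(s: String) -> usize {
    s.len()
}

/// Only borrows the string; the caller keeps ownership.
fn inspect(s: &str) -> usize {
    s.len()
}

fn demo() -> (usize, usize, usize) {
    let original = String::from("borrow checker");

    // Copy: hand over a clone and keep the original.
    let from_clone = consume(original.clone());

    // Borrow: lend the original out and get it back afterwards.
    let from_borrow = inspect(&original);

    // Move: give the original away. Using `original` after this
    // line would be rejected with error E0382.
    let from_move = consume(original);

    (from_clone, from_borrow, from_move)
}

fn main() {
    println!("{:?}", demo());
}
```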

Other borrow checker superpowers: paralyzed or parallelized?

In addition to handling memory allocation and freeing for the programmer, the borrow checker also prevents data races (though not general race conditions) through its set of sharing rules.

These same borrowing rules also help you work with concurrent and parallel code without having to worry about memory safety, enabling Rust's fearless concurrency.
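As a small taste of what that looks like, here is a sketch that increments a shared counter from several threads. It leans on Arc and Mutex from the standard library, which the article hasn't covered, and parallel_count is a name made up for this example. The point is that the compiler won't let the threads touch the counter without some form of synchronization like this.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: each thread adds `per_thread` to a shared counter.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Arc shares ownership across threads; Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // `move` transfers this thread's Arc handle into the closure.
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total: {}", parallel_count(4, 1000));
}
```

If you try to replace the Arc<Mutex<usize>> with a plain mutable integer captured by every thread, the program no longer compiles. That refusal is the safety net in action.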

Drawbacks

But as with all good things in life, Rust's ownership system comes with its set of drawbacks. Indeed, without any drawbacks, this article probably wouldn't exist. The borrow checker can be tricky to understand and work with. So much so that it's pretty common for newcomers to the Rust community to get stuck 'fighting the borrow checker' (and yes, I've personally lost many hours of my life to that struggle).

For instance, sharing data can suddenly become a problem, especially if you need to mutate it at the same time. Certain data structures that are super easy to create from scratch in other languages are very hard to get right in Rust. For a good example of the latter, check out the book Learn Rust With Entirely Too Many Linked Lists. It goes through a number of ways to implement a linked list in Rust and details all the issues the author ran into on the way there. It's both informative and very entertaining, so it's well worth a look.

But once you get on board with the borrow checker, things start to improve. I quite like Reddit user dnkndnts' explanation from this comment:

[The borrow checker] operates by a few simple rules. If you don't understand or at least have some intuition for what those rules are, then it's going to be about as useful as using a spell checker to help you write in a language you don't even know: it'll just reject everything you say. Once you know the rules the borrow checker is based on, you'll find it useful rather than oppressive and annoying, just like a spell checker.

And what are these rules? Here are the two most important ones to remember concerning variables that are stored on the heap:

1. When passing a variable (instead of a reference to the variable) to another function, you are giving up ownership. The other function is now the owner of this variable and you can't use it anymore.

2. When passing references to a variable (lending it out), you can have either as many immutable borrows as you want or a single mutable borrow. Once you start borrowing mutably, there can be only one.
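The borrowing rule can be sketched in a few lines. The function name borrow_rules is just an illustrative choice, and the commented-out line shows one way to break the rule on purpose.

```rust
fn borrow_rules() -> Vec<i32> {
    let mut v = vec![1, 2, 3];

    // Any number of immutable borrows may exist at the same time.
    let first = &v;
    let second = &v;
    assert_eq!(first.len(), second.len());

    // Once the immutable borrows are no longer used, a single
    // mutable borrow is allowed.
    let exclusive = &mut v;
    exclusive.push(4);

    // Uncommenting the next line makes the compiler reject the program,
    // because `first` would then still be alive during the mutable borrow.
    // println!("{:?}", first);

    v
}

fn main() {
    println!("{:?}", borrow_rules());
}
```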

In practice

With some understanding of what the borrow checker is and how it works, let's examine how it affects us in practice. We'll be working with the Vec<T> type, which is Rust's version of a growable list (analogous to Python's lists or JavaScript's arrays). Because it doesn't have a fixed size, a Vec needs to be heap-allocated.

The example may be contrived, but it demonstrates the basic principles. We'll create a vector, call a function that simply accepts it as an argument, and then try and see what's inside later.

Note: this code sample does not compile.

fn hold_my_vec<T>(_: Vec<T>) {}

fn main() {
    let v = vec![2, 3, 5, 7, 11, 13, 17];
    hold_my_vec(v);
    let element = v.get(3);

    println!("I got this element from the vector: {:?}", element);
}

When trying to run this, you'll get the following compiler error:

error[E0382]: borrow of moved value: `v`
 --> src/main.rs:6:19
  |
4 |     let v = vec![2, 3, 5, 7, 11, 13, 17];
  |         - move occurs because `v` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
5 |     hold_my_vec(v);
  |                 - value moved here
6 |     let element = v.get(3);
  |                   ^ value borrowed here after move

The above message tells us that Vec<i32> doesn't implement the ~Copy~ trait, and as such must be moved (or borrowed). The Copy trait can only be implemented by types whose values can be duplicated by simply copying their bits, which in practice means simple data that lives entirely on the stack. Because a Vec owns a buffer on the heap, it cannot implement Copy. We need to find another way around this.
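For contrast, here is a sketch of what happens with types that do implement Copy. The functions hold_my_int and hold_my_point and the Point struct are hypothetical, modeled on the hold_my_vec example above.

```rust
fn hold_my_int(_: i32) {}

/// Hypothetical small type made only of integers, so it can opt in to `Copy`.
#[derive(Debug, Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

fn hold_my_point(_: Point) {}

fn copy_demo() -> (i32, i32) {
    let n = 7;
    hold_my_int(n); // `n` is copied, not moved

    let p = Point { x: 1, y: 2 };
    hold_my_point(p); // `p` is copied as well

    // Both values are still usable here, unlike the vector above.
    (n, p.x + p.y)
}

fn main() {
    println!("{:?}", copy_demo());
}
```

Adding a Vec field to Point would make the derive fail, because a struct can only be Copy if all of its fields are.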

Attack of the clones

Even though a Vec can't implement the Copy trait, it can (and does) implement the ~Clone~ trait. In Rust, cloning is another way to make duplicates of data. But while copying can only be done on stack-based values and is always very cheap, cloning also works on heap-based values and can be very expensive.

So if the function takes ownership of the value, why don't we just give it a clone of our vector? That'll make it happy, right? Indeed, the below code works just fine.

fn hold_my_vec<T>(_: Vec<T>) {}

fn main() {
    let v = vec![2, 3, 5, 7, 11, 13, 17];
    hold_my_vec(v.clone());
    let element = v.get(3);

    println!("I got this element from the vector: {:?}", element);
}

However, we have now done a lot of extra work for nothing! The hold_my_vec function doesn't even use the vector for anything; it just takes ownership of it. In this case, our vector (v) is pretty small, so it's not a big deal to clone it, and in the just-getting-things-to-work-stage of development this may be the quickest and easiest way to see results. However, there is a better, more idiomatic way to do this. Let's have a look.
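To see that a clone really is a separate, full duplicate (which is where the extra work comes from), consider this small sketch. The function name is made up for the example.

```rust
fn clone_is_independent() -> (Vec<i32>, Vec<i32>) {
    let original = vec![2, 3, 5];

    // Cloning allocates a new heap buffer and copies every element into it.
    let mut duplicate = original.clone();

    // Changing the duplicate leaves the original untouched.
    duplicate.push(7);

    (original, duplicate)
}

fn main() {
    let (original, duplicate) = clone_is_independent();
    println!("original: {:?}, duplicate: {:?}", original, duplicate);
}
```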

References

As mentioned previously, rather than giving away our variable to the other function, we can lend it to them. To do this, we need to change the signature of hold_my_vec to instead accept a reference by changing the type of the incoming parameter from Vec<T> to &Vec<T>.

We also need to change how we call the function and let Rust know that we're only giving the function a reference: a borrowed value. This way, we let the function borrow the vector for a little bit, but make sure that we get it back before continuing the program:

fn hold_my_vec<T>(_: &Vec<T>) {}

fn main() {
    let v = vec![2, 3, 5, 7, 11, 13, 17];
    hold_my_vec(&v);
    let element = v.get(3);

    println!("I got this element from the vector: {:?}", element);
}
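If the function needs to change the vector rather than just look at it, the same idea works with a mutable reference. The following is a sketch under that assumption; extend_my_vec and lend_mutably are hypothetical names that don't appear in the original example.

```rust
/// Hypothetical companion to `hold_my_vec`: borrows mutably and appends a value.
fn extend_my_vec(v: &mut Vec<i32>) {
    v.push(19);
}

fn lend_mutably() -> (Option<i32>, usize) {
    // The variable itself must be declared `mut` to be lent out mutably.
    let mut v = vec![2, 3, 5, 7, 11, 13, 17];

    // The mutable borrow only lasts for the duration of the call.
    extend_my_vec(&mut v);

    // We own the vector again and can read the change.
    (v.last().copied(), v.len())
}

fn main() {
    println!("{:?}", lend_mutably());
}
```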

Summary

It's worth noting that this is only a very brief overview of the borrow checker, what it does, and why it does it. A lot of the finer details have been left out to make this article as easy to digest as possible.

As your programs grow, you'll often run into more intricate problems that require more thinking and fiddling with ownership and borrows. Sometimes you'll even have to rethink how you've structured your program to make it work with Rust's borrow checker. It's a learning curve, for sure, but if you stick around and make your way to the top, you're sure to have learned a thing or two about memory along the way.