Memory leaks in Rust

When we talk about memory in Rust, we first need to identify four main regions, each of which affects how the language works with memory: the stack, the heap, processor registers and static memory. Static memory holds little interest for us here: static variables in Rust are similar to global and static variables in other low-level languages (such as C++). They are created when the program starts and destroyed when it exits, and they live at a fixed memory address, so referring to them is safe in most cases. The main goal for now is to show how working with memory in Rust differs from other languages (C++, Haskell) and what challenges can await us.

Stack and heap

In languages such as C and C++, the stack is used for local variables and function arguments. To allocate heap memory, you call the malloc function in C or use the new operator in C++.

In Rust, a local variable is created in the stack memory, e.g.:

let a: u32 = 3;
let n: u8 = 200;

Conceptually, variables are placed on the stack in the order in which they are declared. The stack is very convenient, but it has its drawbacks: it is usually small, just a few megabytes. Rust only allows values whose size is known at compile time (primitive types, fixed-size arrays) to be placed on the stack. Data whose size is determined at runtime, such as the contents of a vector, lives on the heap, and only a fixed-size handle to it is kept on the stack.

We cannot explicitly place an object on the stack or remove it from there. Stack memory is allocated and freed automatically, and we cannot override this behavior, which can lead to errors. Here is one of them:

const SIZE: usize = 100_000;
const N_ARRAY: usize = 1_000_000;

// Returns the array by value, so it ends up in the caller's stack frame.
fn create_ar() -> [u8; SIZE] { [0u8; SIZE] }

fn recursive_func(n: usize) {
    let a = create_ar();
    println!("{} {}", N_ARRAY - n + 1, a[0]);
    if n > 1 { recursive_func(n - 1) }
}

fn main() {
    recursive_func(N_ARRAY);
}

The program aborts with a stack overflow error.

So what is happening to the stack? Each call to recursive_func stores the 100,000-byte array returned by create_ar in its own stack frame, and a frame is released only when its call returns. Because the function calls itself before returning, the frames pile up and the process needs more and more stack. The stack has a fixed size limit (commonly a few megabytes for the main thread), so after a fairly small number of calls no room is left and the program is terminated.
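To see that the limit is a property of the stack rather than of the machine's total memory, the same kind of recursion can be run on a thread with a larger stack. The sketch below is only an illustration: the helper names recursive_sum and run_with_stack, the 64 MiB stack size and the depth of 100 are assumptions chosen for the example, not values from the text above.

```rust
use std::thread;

const SIZE: usize = 100_000;

// Each call keeps a SIZE-byte array alive in its own stack frame
// until the nested call returns.
fn recursive_sum(n: usize) -> usize {
    let a = [1u8; SIZE];
    let here = a[n % SIZE] as usize;
    if n > 1 { here + recursive_sum(n - 1) } else { here }
}

// Runs the recursion on a new thread whose stack size we choose ourselves
// (hypothetical helper for this example).
fn run_with_stack(stack_bytes: usize, depth: usize) -> usize {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || recursive_sum(depth))
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

fn main() {
    // About 100 frames of 100,000 bytes need roughly 10 MB of stack,
    // which fits comfortably into the 64 MiB requested here.
    let total = run_with_stack(64 * 1024 * 1024, 100);
    println!("{}", total); // prints 100
}
```

A larger stack only postpones the overflow; unbounded recursion with big frames will exhaust any fixed limit.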

Now let’s move on to the description of the next section of memory – heap.

Let’s start studying this region of memory with an example similar to the one above, changing just one line in it:

const SIZE: usize = 100_000;
const N_ARRAY: usize = 1_000_000;

// Returns a Box: the array lives on the heap, the caller keeps only a pointer.
fn create_ar() -> Box<[u8; SIZE]> { Box::new([0u8; SIZE]) }

fn recursive_func(n: usize) {
    let a = create_ar();
    println!("{} {}", N_ARRAY - n + 1, a[0]);
    if n > 1 { recursive_func(n - 1) }
}

fn main() {
    recursive_func(N_ARRAY);
}

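This program will also crash, but not as soon as the previous one. The difference is in the create_ar function, which now returns a Box. Where the program used to keep an array of 100,000 bytes in each stack frame, it now keeps only a pointer-sized Box, a generic smart pointer type whose contents are located in another part of memory: neither static memory nor the stack, but the heap. Each stack frame is therefore much smaller, but the recursion still keeps every allocation alive, so sooner or later either the stack or the available heap memory runs out.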
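The difference in stack usage can be checked with std::mem::size_of. This is a minimal sketch; the helper name handle_sizes is made up for the example, and the exact size of the Vec handle is an implementation detail that is only printed, not relied upon.

```rust
use std::mem::size_of;

const SIZE: usize = 100_000;

// Returns how many bytes each kind of value occupies in a stack frame:
// the array itself, a Box pointing to such an array, and a Vec handle.
fn handle_sizes() -> (usize, usize, usize) {
    (
        size_of::<[u8; SIZE]>(),
        size_of::<Box<[u8; SIZE]>>(),
        size_of::<Vec<u8>>(),
    )
}

fn main() {
    let (array, boxed, vector) = handle_sizes();
    println!("array on the stack:      {} bytes", array);
    println!("Box handle on the stack: {} bytes", boxed);
    println!("Vec handle on the stack: {} bytes", vector);
}
```

The array takes all 100,000 bytes in the frame, while the Box takes the size of one pointer; the data it points to is on the heap.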

When our program starts, its heap is almost empty. At any given time, each byte of the heap is in one of two states: “used” or “unused”. When the program needs to allocate an object on the heap, the heap manager (allocator) first checks whether the heap contains a contiguous run of free bytes large enough for the object. If it does, the program reserves that run, and the bytes occupied by the object become “used”. If it does not, the allocator asks the operating system for additional memory, after which the heap is large enough to “accept” the object. Note that the heap usually does not shrink back to its original size after objects are removed, since allocators tend to keep freed memory for reuse rather than return it to the operating system.

The main problem when working with the heap is fragmentation: after many small objects have been allocated and freed, free memory is scattered in small pieces, and finding a suitable free block takes more time. There are techniques designed to reduce this problem, but they all eventually come down to a scheme similar to a stack (LIFO: last in, first out).

Summing up, we mainly work with two types of memory: the stack and the heap. The stack is small but fast. The heap is slower, but more flexible and capacious. Understanding these differences will help us see the causes of memory leaks better.

Rust has tools that make it easier to work with memory, among them Rc<T>, RefCell<T> and RawVec<T> (the latter is an internal building block of the standard library). These are so-called smart pointers and wrapper types; roughly speaking, they play the role that pointers play in C++: we can refer to an object without copying it, or, on the contrary, take it into possession. Now we need to talk about an important feature of Rust, borrowing, since its incorrect use can lead to errors later on, including memory leaks.

Ownership and borrowing

Every book, article or tutorial on Rust has a chapter or section whose title contains the words ownership, lifetime and borrowing. That is not surprising, since these innovations made it possible to do without a garbage collector and turned Rust into a safe language. Ownership makes the language safer not only in terms of memory management, but also when it comes to multithreaded code. Static (compile-time) memory safety checking is one of the most important qualities of Rust, and it rests on three main language concepts:

  1. Lifetime – the span during which access to an object's value is valid. For example, local variables can be cleaned up as soon as the function has returned (static variables live for as long as the program is running).
  2. Ownership – every value has an owner that is responsible for its memory (stack or heap, it doesn't matter). The compiler makes sure that values that are no longer needed do not linger in memory and frees them when their owner goes out of scope.
  3. Borrowing – getting temporary access to a value through a reference, without taking ownership; the owner keeps the right to move, change or destroy the value once the borrow ends. The examples below show how ownership is transferred by a move and how clone avoids that.
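The three concepts can be sketched in a few lines. The function names length, shout and consume are hypothetical and exist only for this illustration: the first borrows immutably, the second borrows mutably, and the third takes ownership.

```rust
// Shared borrow: reads the value, the caller keeps ownership.
fn length(s: &str) -> usize {
    s.len()
}

// Mutable borrow: changes the value in place, the caller keeps ownership.
fn shout(s: &mut String) {
    s.push('!');
}

// Takes ownership: the String is freed when this function returns.
fn consume(s: String) -> usize {
    s.len()
}

fn main() {
    let mut greeting = String::from("World");
    let n = length(&greeting);
    shout(&mut greeting);
    println!("{} had {} bytes", greeting, n); // prints "World! had 5 bytes"
    let m = consume(greeting);
    println!("consumed {} bytes", m); // prints "consumed 6 bytes"
    // println!("{}", greeting); // would not compile: greeting was moved
}
```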

Consider a trivial example:

let one = String::from("World");
let two = one; // ownership of the string moves from `one` to `two`
println!("our value = {}", one); // error[E0382]: borrow of moved value: `one`

In this example, you will get a compile-time error because ownership has moved from the variable one to the variable two. This rule ensures that you cannot use a variable whose value has been moved away, and that every value has exactly one owner responsible for freeing it. As soon as the owner goes out of scope, the program cleans up the memory occupied by the value. It is very important to understand what happens to the variable here: a String on the stack consists of a pointer to a heap buffer that contains the data, plus the data's length and capacity. When we write let two = one; we do not copy the data from one place on the heap to another; only the pointer, length and capacity are copied into two, and the compiler “forgets” the original owner. As a result, the same buffer is never freed twice. Again, this is a very important element of Rust, and ignoring it can lead to serious errors later on. Here is another example:

let one = String::from("World");
let two = one.clone(); // deep copy: `two` gets its own heap buffer
println!("our value = {} and next our value {}", one, two);

In the first example, we do not copy the heap data for the second variable. If we do need a copy, we call the clone() method, which duplicates the data. After that there are two identical data sets in memory (on the heap, not on the stack), each owned by its own variable. These are very simple concepts, but it is important to understand them well, as we will see later how small gaps in understanding can lead to errors in memory handling.
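The difference between a move and a clone can be observed by comparing the addresses of the heap buffers. This is a small sketch; the helper names move_keeps_buffer and clone_makes_new_buffer are invented for the example.

```rust
// After a move, the new owner points at the very same heap buffer.
fn move_keeps_buffer() -> bool {
    let one = String::from("World");
    let before = one.as_ptr();
    let two = one; // only pointer, length and capacity are copied
    before == two.as_ptr()
}

// After a clone, each variable owns a separate buffer with equal contents.
fn clone_makes_new_buffer() -> bool {
    let one = String::from("World");
    let two = one.clone();
    one.as_ptr() != two.as_ptr() && one == two
}

fn main() {
    println!("move keeps buffer: {}", move_keeps_buffer()); // prints true
    println!("clone makes new buffer: {}", clone_makes_new_buffer()); // prints true
}
```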

You might be interested in a very handy tool with which you can track the ownership of a variable and its value, as well as see its changes:
https://github.com/rustviz/rustviz

Memory leaks and unsafe Rust

As mentioned above, Rust is full of protection mechanisms that help prevent things like invalid memory access or the use of uninitialized variables. But sometimes we need to write code that breaks the usual rules of Rust. Such code is usually called “unsafe” or “unsafe Rust”. It allows you to dereference raw pointers (which may be null), call unsafe functions, and do other things that are prohibited in safe Rust. The key point is that inside a block marked with the word unsafe, responsibility for the correctness of these operations falls on the shoulders of the developer. But why do we need such code? There are four common use cases:

  1. get values by dereferencing a raw pointer;
  2. create and call an unsafe function, method, or closure;
  3. access or modify mutable static variables;
  4. work with unsafe traits.

Let’s consider the following example:

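let los = 5;
let losraw = &los as *const i32; // creating a raw pointer is allowed in safe code
println!("our raw {}", *losraw); // error[E0133]: dereferencing it requires an unsafe block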

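This code does not compile: the compiler reports that dereferencing a raw pointer is unsafe and requires an unsafe function or block, noting that raw pointers may be null, dangling or unaligned. In other words, the operation violates Rust’s safety requirements. But if we change the code a little and move the dereference into an unsafe block, the error disappears: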

 
let los = 5;
let losraw = &los as *const i32;
// SAFETY: losraw points to the live local variable `los`.
unsafe {
    println!("my raw {}", *losraw);
}

This is the simplest example of how an unsafe block lets us bypass the compiler's memory safety checks. If such code accesses memory that has already been freed, we get a dangling pointer and undefined behavior, and if it takes over responsibility for freeing memory and never does so, we get the textbook version of a memory leak.

Let’s consider one more example of how unsafe gives us access to things that standard Rust hides from us. The extern keyword lets us call code written in other programming languages. Since Rust cannot vouch for other languages, every such call has to be made inside an unsafe block:
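The same pattern works for writing through a raw pointer. The sketch below uses a hypothetical helper, increment_via_raw, and shows the usual convention of documenting with a SAFETY comment why the unsafe block is sound.

```rust
// Increments the value behind a mutable reference by going through a raw pointer.
fn increment_via_raw(value: &mut i32) -> i32 {
    // Turning a reference into a raw pointer is allowed in safe code.
    let p: *mut i32 = value;
    // SAFETY: p was just derived from a live exclusive reference, so it is
    // non-null, aligned, points to an initialized i32, and nothing else
    // accesses the value while we use it.
    unsafe {
        *p += 1;
        *p
    }
}

fn main() {
    let mut los = 5;
    let seen = increment_via_raw(&mut los);
    println!("{} {}", seen, los); // prints "6 6"
}
```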

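extern "C" {
    // abs from the C standard library; the 2024 edition requires
    // writing `unsafe extern "C"` for this block
    fn abs(input: i32) -> i32;
}

fn main() {
    // SAFETY: abs is defined for every i32 except i32::MIN.
    unsafe {
        println!("calling c function {}", abs(-3));
    }
}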

In the unsafe block, we can perform these otherwise forbidden operations, and only we are responsible for their safety.

Modifying static variables is not safe when they are accessed from different threads, so we will only mention this option here without examining it. The same goes for unsafe traits, which, without going deep, also come down to obviously unsafe operations.

Now consider a case where, formally, we do not use the word unsafe, but the effect is similar. The documentation explains it this way: Rust’s safety guarantees do not include a guarantee that destructors will always run. This is what std::mem::forget relies on: the function “forgets” a value by taking ownership of it and not running its destructor. If the value owns a resource such as a heap buffer, that resource is never released, which is a memory leak. In effect, the function leaks on purpose whatever the value owns. Here is a simple example:

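use std::mem::forget;
let s = String::from("this error");
forget(s); // takes ownership of s; its destructor never runs

The variable s can no longer be used after this call, since it has been moved into forget, and the heap buffer that held the string is never freed: we get a memory leak.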
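That the destructor really is skipped can be checked with a type that counts its own drops. This is only a sketch: the Noisy type and the drops_after helper are made up for the illustration.

```rust
use std::cell::Cell;
use std::mem;

// A value that records in a shared counter when its destructor runs.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Creates a Noisy value, then either forgets it or drops it normally,
// and reports how many destructor runs were observed.
fn drops_after(forget_it: bool) -> u32 {
    let drops = Cell::new(0);
    let value = Noisy { drops: &drops };
    if forget_it {
        mem::forget(value); // destructor is skipped
    } else {
        drop(value); // destructor runs
    }
    drops.get()
}

fn main() {
    println!("dropped normally: {}", drops_after(false)); // prints 1
    println!("forgotten:        {}", drops_after(true)); // prints 0
}
```

Noisy holds no heap memory itself, so nothing is actually lost here; with a String or a Vec in its place, the skipped destructor would mean a leaked buffer.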

As you can see, the main memory problems arise from access to raw pointers, that is, from something that lets you work with memory directly, bypassing the language's safety checks. Consider how this shows up in the design of probably the most accessible smart pointer, Box<T>.

Namely, in the function leak, which returns a mutable reference to the object, &'a mut T:

let x = Box::new(41);
let static_ref: &'static mut usize = Box::leak(x);
*static_ref += 1;
assert_eq!(*static_ref, 42);

Box::leak is a safe function, yet once this reference is dropped, the memory it points to can never be freed: we have lost the last reference to the allocation without releasing it. Thus we see that not only code in unsafe blocks can influence memory management.

Leaks are not limited to unsafe code. Sometimes we stay in the safe code space and nevertheless get a classic memory leak.

Let’s consider the case where we have created several reference-counted pointers, made them point to each other, and exited a loop or function. See the example:
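If a leaked allocation does need to be released later, the reference can be turned back into a Box with Box::from_raw, which is an unsafe operation. The sketch below assumes a hypothetical helper named leak_and_reclaim.

```rust
// Leaks a boxed integer, modifies it through the 'static reference,
// then takes ownership back so that the memory is freed after all.
fn leak_and_reclaim() -> u32 {
    let leaked: &'static mut u32 = Box::leak(Box::new(41));
    *leaked += 1;

    // SAFETY: the pointer comes from Box::leak on a Box<u32>, it is
    // converted back exactly once, and `leaked` is not used afterwards.
    let reclaimed: Box<u32> = unsafe { Box::from_raw(leaked as *mut u32) };

    *reclaimed // the Box is dropped at the end of the function, freeing the memory
}

fn main() {
    println!("{}", leak_and_reclaim()); // prints 42
}
```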

use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}

impl List {
    // Returns the link to the next element, if there is one.
    fn tail(&self) -> Option<&RefCell<Rc<List>>> {
        match self {
            Cons(_, item) => Some(item),
            Nil => None,
        }
    }
}

fn main() {
    let first_ref = Rc::new(Cons(5, RefCell::new(Rc::new(Nil))));

    // next_ref points to first_ref
    let next_ref = Rc::new(Cons(10, RefCell::new(Rc::clone(&first_ref))));

    // make first_ref point back to next_ref, closing the cycle
    if let Some(link) = first_ref.tail() {
        *link.borrow_mut() = Rc::clone(&next_ref);
    }
}

We get a reference cycle: each of the two nodes is kept alive by the other, so when first_ref and next_ref go out of scope, their reference counts drop to 1 instead of 0 and the memory is never freed. Such errors do not crash the program instantly, but if the leak happens in a loop, or if the leaked structure holds a lot of memory, the probability of eventually running out of memory is very high.
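The leak can be observed from safe code by keeping a non-owning Weak handle to one of the nodes. This is a simplified sketch: the Node type and the leak_cycle helper are made up for the example and are not the List type used above.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Builds a two-node cycle, lets both local handles go out of scope,
// and returns a weak observer of the first node.
fn leak_cycle() -> Weak<Node> {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    // Here both nodes have a strong count of 2:
    // one from the local variable, one from the other node.
    Rc::downgrade(&a)
}

fn main() {
    let observer = leak_cycle();
    // The locals are gone, but the nodes still hold each other.
    println!("strong count: {}", observer.strong_count()); // prints 1
    println!("still allocated: {}", observer.upgrade().is_some()); // prints true
}
```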

There is another important point about ownership: if we change the code a little and express the ownership relations differently (for example, by making one of the links a non-owning Weak<T> pointer), the memory will be freed and we will remain in the safe space of Rust.

Look at a handy tool for catching exactly this kind of error:
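One common way to change the ownership relations is to keep the forward link strong and make the back link weak. The following is a sketch under that assumption; the Node type, its fields and the build_pair helper are hypothetical and differ from the List type above.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    next: RefCell<Option<Rc<Node>>>, // owning link to the next node
    prev: RefCell<Weak<Node>>,       // non-owning link back
}

// Builds first -> second with a weak back link second -> first.
fn build_pair() -> (Rc<Node>, Rc<Node>) {
    let first = Rc::new(Node {
        value: 5,
        next: RefCell::new(None),
        prev: RefCell::new(Weak::new()),
    });
    let second = Rc::new(Node {
        value: 10,
        next: RefCell::new(None),
        prev: RefCell::new(Rc::downgrade(&first)),
    });
    *first.next.borrow_mut() = Some(Rc::clone(&second));
    (first, second)
}

fn main() {
    let (first, second) = build_pair();
    println!("values: {} {}", first.value, second.value);
    // The weak back link does not count as an owner of `first`.
    println!("first strong = {}", Rc::strong_count(&first)); // prints 1
    drop(first);
    // With its only owner gone, `first` is freed and the back link is dead.
    println!("prev alive: {}", second.prev.borrow().upgrade().is_some()); // prints false
}
```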
https://github.com/saethlin/miri-tools

Conclusion

Rust is designed to be memory safe, but memory leaks are still possible. Since Rust does not provide garbage collection, reference cycles can still lead to leaks, just as they do with reference-counted pointers in C++. Sometimes developers have to use unsafe Rust, which can also be a source of leaks, because the compiler cannot verify the operations performed in unsafe blocks.