
How does Rust treat Strings and Vectors internally


1. Strings

In Rust, strings can be represented in two ways:

a) String type

b) String slice

String type:

The String type is a struct with the following fields.

Its size depends on the architecture (on x86-64, which I use, it is 24 bytes):

{

   pointer to the heap buffer where the string characters are stored (8 bytes)

   capacity (8 bytes)

   length (8 bytes)

}

Example:

let my_string = String::from("hello");
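The layout above can be checked without reading raw memory. The following is a minimal sketch, assuming a 64-bit target; string_parts is a hypothetical helper, and the language guarantees that a String holds a pointer, a capacity and a length, but not the order of those fields in memory.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (heap pointer, capacity, length) via the public String API.
fn string_parts(s: &String) -> (*const u8, usize, usize) {
    (s.as_ptr(), s.capacity(), s.len())
}

fn main() {
    let my_string = String::from("hello");

    // Three machine words: 24 bytes on a 64-bit target.
    println!("size_of::<String>() = {}", size_of::<String>());

    let (ptr, cap, len) = string_parts(&my_string);
    // `ptr` refers to the heap buffer holding the bytes of "hello".
    println!("ptr = {:p}, capacity = {}, length = {}", ptr, cap, len);
}
```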

String slice (type &str):

A string slice can be obtained in two ways. Both have the same type, &str, and the same layout.

a) From a String:

let my_string_slice = &my_string[3..];

The goal of such a slice is to point to some part of the string. It does not store a separate starting index: the pointer itself is shifted by the starting index, so the slice consists of:

{

   pointer to the first character of the slice (8 bytes)

   length (8 bytes)

}

b) From a string literal:

let my_str = "Hello";

It has exactly the same two-field layout:

{

   pointer to the address where the string characters are stored (8 bytes)

   length (8 bytes)

}
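Both kinds of slices can be inspected the same way. This is a minimal sketch, assuming a 64-bit target; offset_in is a hypothetical helper that only makes sense when the slice was taken from the given string.

```rust
use std::mem::size_of;

/// Hypothetical helper: byte offset of `slice`'s data pointer from the start of `owner`.
/// Assumes `slice` was taken from `owner`, so its data starts at or after `owner`'s.
fn offset_in(owner: &str, slice: &str) -> usize {
    slice.as_ptr() as usize - owner.as_ptr() as usize
}

fn main() {
    let my_string = String::from("hello");
    let my_string_slice: &str = &my_string[3..];
    let my_str: &str = "Hello";

    // Every &str is two words (pointer + length): 16 bytes on a 64-bit target.
    println!("size_of::<&str>() = {}", size_of::<&str>());

    // No separate start index is stored: the pointer itself is shifted by 3.
    println!(
        "offset = {}, len = {}",
        offset_in(&my_string, my_string_slice),
        my_string_slice.len()
    );
    println!("literal: ptr = {:p}, len = {}", my_str.as_ptr(), my_str.len());
}
```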

Memory layout:

So let’s look at how these two types are laid out in memory.

Let’s consider a couple of definitions:

let my_str: &str = "hello";
let my_string1: String = my_str.to_string();
let my_string2: String = String::from("hello");
let my_string_slice: &str = &my_string2[1..4];

We have defined 4 local variables:

my_str – a slice pointing to the string literal “hello”. String literals are embedded in the compiled binary, more precisely in its read-only data section.

my_string1 – a String created from the slice with the to_string method.

my_string2 – a String created with the String::from method.

my_string_slice – a slice of my_string2 (bytes 1 to 3, i.e. “ell”).

The following picture shows the memory layout of each variable:

As you can see, the first variable (&str) lives on the stack and points to the static location in the binary where the literal “hello” is stored.

Then, when we call to_string(), the new variable my_string1 is created, and memory is allocated on the heap into which the “hello” characters are copied.

The same happens when we create my_string2.

Creating my_string_slice from my_string2 does not allocate anything: the slice points into the same heap area where my_string2's characters are stored, with the address shifted by the starting index (1).
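The relationships described above can be observed by printing the data addresses of the four variables. This is a minimal sketch; example_addresses is a hypothetical helper, and the concrete addresses will differ from run to run and from the picture.

```rust
/// Hypothetical helper: data addresses of the four variables from the example,
/// in declaration order: [my_str, my_string1, my_string2, my_string_slice].
fn example_addresses() -> [usize; 4] {
    let my_str: &str = "hello";
    let my_string1: String = my_str.to_string();
    let my_string2: String = String::from("hello");
    let my_string_slice: &str = &my_string2[1..4];

    [
        my_str.as_ptr() as usize,          // read-only data in the binary
        my_string1.as_ptr() as usize,      // first heap buffer
        my_string2.as_ptr() as usize,      // second heap buffer
        my_string_slice.as_ptr() as usize, // inside my_string2's buffer
    ]
}

fn main() {
    let [lit, s1, s2, slice] = example_addresses();
    println!("my_str          -> {:#x}", lit);
    println!("my_string1      -> {:#x}", s1);
    println!("my_string2      -> {:#x}", s2);
    println!("my_string_slice -> {:#x} (my_string2 + {})", slice, slice - s2);
}
```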

2. Vectors

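Vectors in Rust are dynamically growable arrays whose elements are stored on the heap.

They have the same structure as String (in fact, the standard library defines String as a wrapper around Vec<u8>):

{

   pointer to the address in the heap where the data is stored (8 bytes)

   capacity (8 bytes)

   length (8 bytes)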

}
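Because String wraps a Vec<u8>, both share the same three-word header, and turning a String into its bytes reuses the heap buffer. A minimal sketch; into_bytes_reuses_buffer is a hypothetical helper built on the real String::into_bytes method.

```rust
use std::mem::size_of;

/// Hypothetical helper: turns a String into its underlying Vec<u8> and reports
/// whether the heap buffer was reused rather than copied.
fn into_bytes_reuses_buffer(s: String) -> bool {
    let before = s.as_ptr();
    let bytes: Vec<u8> = s.into_bytes(); // consumes the String
    bytes.as_ptr() == before
}

fn main() {
    // All three headers are pointer + capacity + length.
    println!("String:      {} bytes", size_of::<String>());
    println!("Vec<u8>:     {} bytes", size_of::<Vec<u8>>());
    println!("Vec<String>: {} bytes", size_of::<Vec<String>>());

    println!("buffer reused: {}", into_bytes_reuses_buffer(String::from("hello")));
}
```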

Let’s look at what happens when we create a new vector and push some data into it (continuing the previous example with strings).

let mut str_vec = Vec::new();
str_vec.push(my_string1);
str_vec.push(my_string2);

When we create str_vec, a variable of type Vec<String>, no heap memory is allocated yet: Vec::new() does not allocate, so the vector's pointer does not point to real heap space. Allocation happens only on the first push:

Memory dump:

let mut str_vec = Vec::new();

pointer = 0x3a772ff2f8 (&str_vec)

08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

str_vec.push(my_string1);

50 5e ca db db 01 00 00 04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00

str_vec.push(my_string2);

50 5e ca db db 01 00 00 04 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00

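As you can see, a real heap address appears only after the first push: the bytes 50 5e ca db db 01 00 00, read in little-endian order, give 0x000001dbdbca5e50. Before that, the pointer field holds 0x8, a dangling but properly aligned placeholder. An initial capacity of 4 is reserved at the same time. When a fifth element is pushed, a new, larger buffer is allocated (the current standard library doubles the capacity) and the existing elements are moved into it.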
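The growth pattern can be watched through Vec::capacity instead of a memory dump. A minimal sketch; capacities_after_pushes is a hypothetical helper, and the exact capacities are an implementation detail of the current standard library rather than a language guarantee.

```rust
/// Hypothetical helper: pushes `n` Strings into a fresh Vec and records
/// the capacity after each push.
fn capacities_after_pushes(n: usize) -> Vec<usize> {
    let mut str_vec: Vec<String> = Vec::new();
    // Vec::new() does not allocate: capacity stays 0 until the first push.
    assert_eq!(str_vec.capacity(), 0);

    let mut caps = Vec::with_capacity(n);
    for i in 0..n {
        str_vec.push(i.to_string());
        caps.push(str_vec.capacity());
    }
    caps
}

fn main() {
    // With the current standard library this prints [4, 4, 4, 4, 8]:
    // 4 slots on the first push, then doubling when the fifth element arrives.
    println!("{:?}", capacities_after_pushes(5));
}
```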

When my_string1 and my_string2 were pushed into the vector, two things happened:

1) Heap memory was allocated to hold the two String structs (pointer, capacity and length for each). Note that the strings' pointers still point to the same addresses where their characters were stored; the characters themselves are not copied.

2) Ownership was moved: my_string1 and my_string2 are no longer accessible as local variables (see the official Rust book for more details on ownership transfer). The move of the variables into the heap is shown with a dotted arrow in the picture.
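Both points can be checked by comparing each string's character pointer before and after the push. A minimal sketch; push_keeps_buffers is a hypothetical helper.

```rust
/// Hypothetical helper: moves two Strings into a Vec and checks that each
/// element still points at the character buffer it had before the move.
fn push_keeps_buffers() -> bool {
    let my_string1 = String::from("hello");
    let my_string2 = String::from("hello");
    let (p1, p2) = (my_string1.as_ptr(), my_string2.as_ptr());

    let mut str_vec = Vec::new();
    str_vec.push(my_string1); // copies only the String header (pointer, capacity, length)
    str_vec.push(my_string2);
    // Using my_string1 or my_string2 here would not compile: they were moved.

    str_vec[0].as_ptr() == p1 && str_vec[1].as_ptr() == p2
}

fn main() {
    println!("character buffers unchanged: {}", push_keeps_buffers());
}
```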

3. Magic transformation during a function call

Let’s consider the following code:

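use std::env;

// Hypothetical helper (the article does not show print_mem): prints the
// address and `len` raw bytes starting at `ptr`, in hex.
fn print_mem(ptr: *const u8, len: usize) {
    println!("pointer = {:p}", ptr);
    // SAFETY: the caller guarantees `ptr` is valid for reads of `len` bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    let hex: Vec<String> = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    println!("{}", hex.join(" "));
}

fn main() {
    let args: Vec<String> = env::args().collect();

    // View the Vec header (pointer, capacity, length) on the stack as raw bytes.
    let ptr: *mut u8 = &args as *const Vec<String> as *mut u8;
    print_mem(ptr, std::mem::size_of::<Vec<String>>()); // 24 bytes on x86-64

    let (query, filename) = parse_config(&args);
    println!("query = {}, filename = {}", query, filename);
}

fn parse_config(args: &[String]) -> (&str, &str) {
    println!("args val {:p}", args);

    let query = &args[1];
    let filename = &args[2];

    (query, filename)
}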

Here we declare a vector variable “args” that collects the command-line arguments passed to the program (in Rust, main takes no parameters; the arguments are read through env::args()):

let args: Vec<String> = env::args().collect();

As a result, we have a local variable “args” on the stack.

Let’s print its contents (run the program with two arguments: cargo run param1 param2):

let ptr: *mut u8 = &args as *const Vec<String> as *mut u8;
print_mem(ptr, 24);

pointer = 0xfa9a4ff730

40 2a 6f b3 6a 02 00 00 03 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00

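As we can see, “args” consists of three parts:

1) A pointer to the heap, 0x26ab36f2a40 (the bytes 40 2a 6f b3 6a 02 00 00 read in little-endian order), where the String structs holding the argument values are stored

2) The capacity of the vector (3)

3) The length of the vector (3): the program path plus the two arguments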
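The decoding of the dump can be done mechanically. A minimal sketch, assuming a 64-bit little-endian target and the pointer/capacity/length field order observed in the dump; decode_vec_header is a hypothetical helper and the bytes are copied from the output above.

```rust
/// Hypothetical helper: splits a 24-byte Vec header dump into (pointer, capacity, length),
/// assuming a 64-bit little-endian target and the field order seen in the dump.
fn decode_vec_header(bytes: [u8; 24]) -> (u64, u64, u64) {
    let word = |i: usize| {
        let mut w = [0u8; 8];
        w.copy_from_slice(&bytes[i * 8..i * 8 + 8]);
        u64::from_le_bytes(w)
    };
    (word(0), word(1), word(2))
}

fn main() {
    // Bytes copied from the dump of `args` above.
    let dump: [u8; 24] = [
        0x40, 0x2a, 0x6f, 0xb3, 0x6a, 0x02, 0x00, 0x00,
        0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    ];
    let (ptr, cap, len) = decode_vec_header(dump);
    // Prints: ptr = 0x26ab36f2a40, capacity = 3, length = 3
    println!("ptr = {:#x}, capacity = {}, length = {}", ptr, cap, len);
}
```

The same helper applied to the earlier str_vec dump taken right after Vec::new() yields a pointer of 0x8 with capacity and length 0.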

Now let’s look at the function call:

let (query, filename) = parse_config(&args);

and the function definition:

fn parse_config(args: &[String]) -> (&str, &str) {
…
}

You may notice a mismatch between the type of the argument at the call site and the parameter type in the function definition.

The function is declared with a &[String] parameter but is called with a reference to the local variable “args”, whose type is Vec<String> (so the argument has type &Vec<String>).

So how can this work?

Here the Rust compiler does some magic. The function does not receive the address of “args” itself (its location on the stack). The compiler sees that the parameter expects &[String], a slice of Strings, and instead passes the address of the start of the String array on the heap, i.e. the pointer stored inside “args” (0x26ab36f2a40 in our example). Because &[String] is a fat pointer, the length of the slice (3) is passed along with that address.

We can see this by printing the value of the parameter “args” inside parse_config:

println!("args val {:p}", args);

output: 0x26ab36f2a40

Note: in Rust this is called deref coercion (see "Implicit Deref Coercions with Functions and Methods" in the official Rust book). It relies on the Deref trait: Vec<T> implements Deref<Target = [T]>, so the compiler automatically converts &Vec<String> into &[String].
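The coercion can be compared with its explicit equivalents. A minimal sketch; first_element_addr is a hypothetical helper with the same parameter type as parse_config, and the argument values are made up for illustration.

```rust
use std::mem::size_of;
use std::ops::Deref;

/// Hypothetical helper with the same parameter type as parse_config.
fn first_element_addr(args: &[String]) -> *const String {
    args.as_ptr()
}

fn main() {
    let args: Vec<String> = vec!["prog".to_string(), "param1".to_string(), "param2".to_string()];

    // Implicit: &Vec<String> is coerced to &[String] through Vec's Deref impl.
    let implicit = first_element_addr(&args);

    // Explicit equivalents of the same conversion.
    let via_deref: &[String] = args.deref();
    let via_index: &[String] = &args[..];

    println!("heap buffer:     {:p}", args.as_ptr());
    println!("implicit:        {:p}", implicit);
    println!("via deref():     {:p}", via_deref.as_ptr());
    println!("via &args[..]:   {:p}", via_index.as_ptr());
    println!("Vec on stack at: {:p}", &args);

    // &[String] is a fat pointer: data address plus length.
    println!("size_of::<&[String]>() = {}", size_of::<&[String]>());
}
```

All three slice forms print the heap address, while the last address line shows the separate stack location of the Vec header.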
