Rust - My note of Getting Started (19/21)

What’s this page?

As of Dec. 2021, I work as a DevOps engineer, but I like to solve problems with code (front end, back end, whatever; it depends on the purpose). These days, my motivation to learn Rust has surged. This page is my memo from working through the official Rust documentation, so that I can easily recall and refer to the key features. Most of this post consists of quotes from the documentation, but I also leave my own opinions (which could be wrong).

I have experience with

  • Python,
  • C++,
  • Java,
  • Go,
  • and Fortran (!!)

1. Getting started

1.1. Installation

  • Official installation goes well.
  • I leave my installation process to another page.

1.2 Hello, World!

  • Rust files always end with the .rs extension.
  • The main function is special: it is always the first code that runs in every executable Rust program.
  • Rust style is to indent with four spaces, not a tab.
  • Using a ! means that you’re calling a macro instead of a normal function.

1.3 Hello, Cargo!

  • In Rust, packages of code are referred to as crates.
  • cargo new hello_cargo
  • Cargo expects your source files to live inside the src directory.
  • The cargo build command creates an executable file in target/debug/hello_cargo.
  • Cargo.lock: This file keeps track of the exact versions of dependencies in your project.
  • cargo check: This command quickly checks your code to make sure it compiles but doesn’t produce an executable.
  • When your project is finally ready for release, you can use cargo build --release to compile it with optimizations.

2. Programming a Guessing Game

  • A comment line starts with //.

  • Create variables.

    let foo = 5; // immutable
    let mut foo = 5; // mutable
    
  • let mut guess = String::new(); The :: syntax indicates that new is an associated function of the String type. An associated function is implemented on a type, in this case String, rather than on a particular instance of a String.

  • User input.

    use std::io;
    let mut guess = String::new();
    io::stdin()
          .read_line(&mut guess)
          .expect("Failed to read line");
    

    The code stores a line of standard input in the variable guess as a String.

  • std::io::stdin function returns an instance of std::io::Stdin, which is a type that represents a handle to the standard input for your terminal.

  • The job of read_line is to take whatever the user types into standard input and place that into a string, so it takes that string as an argument.

  • The & indicates that this argument is a reference, which gives you a way to let multiple parts of your code access one piece of data without needing to copy that data into memory multiple times.

  • References are immutable by default. Hence, you need to write &mut guess rather than &guess to make it mutable.

  • .expect() handles a potential failure: if the Result is an Err, the program panics and prints the given message.

  • It’s often wise to introduce a newline and other whitespace to help break up long lines.

  • read_line returns io::Result

  • Rust has a number of types named Result in its standard library

  • The Result types are enumerations (enums). An enum is a type that can have a fixed set of values, called variants.

  • For Result, the variants are Ok or Err.

  • An instance of io::Result has an expect method.

  • If you don’t call expect, the program will compile, but you’ll get a warning:

    $ cargo build
       Compiling guessing_game v0.1.0 (file:///projects/guessing_game)
    warning: unused `std::result::Result` that must be used
      --> src/main.rs:10:5
       |
    10 |     io::stdin().read_line(&mut guess);
       |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |
       = note: `#[warn(unused_must_use)]` on by default
       = note: this `Result` may be an `Err` variant, which should be handled
    
        Finished dev [unoptimized + debuginfo] target(s) in 0.59s
    
  • The set of curly brackets, {}, is a placeholder.

    let x = 5;
    let y = 10;
    
    println!("x = {} and y = {}", x, y);
    
  • From Rust version 1.58, format strings can capture variables directly by name!

    let x = 5;
    let y = 10;
    println!("x = {x} and y = {y}");
    
  • Rust doesn’t yet include random number functionality in its standard library. However, the Rust team does provide a rand crate.

  • Using a crate in Cargo.toml.

    [dependencies]
    rand = "0.5.5"
    
    • The number 0.5.5 is actually shorthand for ^0.5.5, which means “any version that has a public API compatible with version 0.5.5.”
    • (My note): Building with this line downloads the crate and starts compiling it. See under /target/debug/deps/.
    • When you build a project for the first time, Cargo figures out all the versions of the dependencies that fit the criteria and then writes them to the Cargo.lock file. When you build your project in the future, Cargo will see that the Cargo.lock file exists and use the versions specified there rather than doing all the work of figuring out versions again.
    • When you do want to update a crate, Cargo provides another command, cargo update, which will ignore the Cargo.lock file and figure out all the latest versions that fit your specifications in Cargo.toml. If that works, Cargo will write those versions to the Cargo.lock file.
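    • By default, Cargo will only look for (middle) versions greater than 0.5.5 and less than 0.6.0.
  • (My note, reminder) In rand::Rng, Rng is a trait defined in the rand crate, not an associated function. It must be brought into scope with use so that methods such as gen_range can be called.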

    use rand::Rng;
    ...
    let secret_number = rand::thread_rng().gen_range(1, 101);
    
  • Note: A trait is a collection of methods defined for an unknown type: Self. They can access other methods declared in the same trait. https://doc.rust-lang.org/rust-by-example/trait.html

  • Simple match example.

    match guess.cmp(&secret_number) {
        Ordering::Less => println!("Too small!"),
        Ordering::Greater => println!("Too big!"),
        Ordering::Equal => println!("You win!"),
    }
    
    • std::cmp::Ordering is another enum, but the variants for Ordering are Less, Greater, and Equal.
    • The cmp method compares two values and can be called on anything that can be compared.
    • A match expression is made up of arms. An arm (=>) consists of a pattern and the code that should be run if the value given to the beginning of the match expression fits that arm’s pattern.
  • Rust has a strong, static type system. However, it also has type inference.

  • Integer type examples: i32, u32, i64.

  • Read the following code, paying attention to the type of guess.

    let mut guess = String::new();
    
    io::stdin()
        .read_line(&mut guess)
        .expect("Failed to read line");
    
    let guess: u32 = guess.trim().parse().expect("Please type a number!");
    
    • Rust allows us to shadow the previous value of guess with a new one. This feature is often used in situations in which you want to convert a value from one type to another type.
    • The trim method on a String instance will eliminate any whitespace at the beginning and end.
    • When the user presses enter, a newline character is added to the string.
    • The parse method on strings parses a string into some kind of number, and it can easily fail (if the string contained A👍%, there would be no way to convert that to a number).
    • The colon : after guess tells Rust we’ll annotate the variable’s type.
  • Make the above code more Rust-like.

    // from
    //let guess: u32 = guess.trim().parse().expect("Please type a number!");
    //
    // to
    let guess: u32 = match guess.trim().parse() {
        Ok(num) => num,
        Err(_) => continue,
    };
    
    • Switching from an expect call to a match expression is how you generally move from crashing on an error to handling the error.
    • The underscore, _, is a catchall value; in this example, we’re saying we want to match all Err values, no matter what information they have inside them.
  • loop repeats forever until a break; is reached.
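
To tie the parsing and comparing steps of this chapter together, here is a minimal sketch of my own that leaves out stdin and the rand crate. The helper name check_guess and its Option return type are assumptions for illustration, not something from the book; where the book uses continue, this sketch returns None.

```rust
use std::cmp::Ordering;

// Hypothetical helper: parse one raw input line and compare it with the secret.
// Returns None when the input is not a valid u32.
fn check_guess(input: &str, secret: u32) -> Option<Ordering> {
    let guess: u32 = match input.trim().parse() {
        Ok(num) => num,
        Err(_) => return None,
    };
    Some(guess.cmp(&secret))
}

fn main() {
    let secret = 42;
    // Simulated user input, including the trailing newline that read_line keeps.
    for line in ["7\n", "abc\n", "99\n", "42\n"] {
        match check_guess(line, secret) {
            Some(Ordering::Less) => println!("Too small!"),
            Some(Ordering::Greater) => println!("Too big!"),
            Some(Ordering::Equal) => {
                println!("You win!");
                break;
            }
            None => println!("Please type a number!"),
        }
    }
}
```

Keeping the parsing in a small function makes the match on Result easy to try out without typing into a terminal.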

3. Common Programming Concepts

3.1 Variables and Mutability

  • By default, variables are immutable. This lets you take advantage of the safety and easy concurrency that Rust offers.

  • Why does Rust encourage you to favor immutability?

    • It’s important that we get compile-time errors when we attempt to change a value that we previously designated as immutable because this very situation can lead to bugs.
    • But mutability can be very useful.
  • Like immutable variables, constants are values that are bound to a name and are not allowed to change, but there are a few differences between constants and variables.

    • First, you aren’t allowed to use mut with constants.
    • Constants can be declared in any scope, including the global scope, which makes them useful for values that many parts of code need to know about.
    • The last difference is that constants may be set only to a constant expression, not the result of a function call or any other value that could only be computed at runtime.
  • An example of constants.

    const MAX_POINTS: u32 = 100_000;
    
    • Use all uppercase with underscores between words.
    • Underscores can be inserted in numeric literals to improve readability.
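
A small sketch of the "constant expression" rule. THREE_HOURS_IN_SECONDS is just an illustrative name; the point is that the arithmetic is evaluated at compile time, so it is allowed on the right-hand side of a const.

```rust
// Constants always need a type annotation.
const MAX_POINTS: u32 = 100_000;

// A constant expression: the compiler evaluates 60 * 60 * 3 at compile time.
const THREE_HOURS_IN_SECONDS: u32 = 60 * 60 * 3;

fn main() {
    println!("{} {}", MAX_POINTS, THREE_HOURS_IN_SECONDS);
}
```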
  • Shadowing

    fn main() {
        let x = 5;
        let x = x + 1;
        let x = x * 2;
    
        println!("The value of x is: {}", x);
    }
    
  • Shadowing is different from marking a variable as mut, because we’ll get a compile-time error if we accidentally try to reassign to this variable without using the let keyword.

  • The other difference between mut and shadowing is that, because we’re effectively creating a new variable when we use the let keyword again, we can change the type of the value but reuse the same name.

  • Shadowing thus spares us from having to come up with different names, such as spaces_str and spaces_num; instead, we can reuse the simpler spaces name.
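
The spaces example mentioned above, as a minimal sketch. The wrapper function count_spaces is my own addition so the result is easy to check.

```rust
// Hypothetical wrapper around the `spaces` example.
fn count_spaces() -> usize {
    let spaces = "   ";        // &str
    let spaces = spaces.len(); // usize: same name, new variable, new type
    spaces
}

fn main() {
    println!("{}", count_spaces());

    // With `mut` instead of shadowing this would not compile,
    // because a variable's type cannot change:
    // let mut spaces = "   ";
    // spaces = spaces.len(); // mismatched types
}
```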

My summary of shadowing

This is a good StackOverflow answer: https://stackoverflow.com/a/48696415/9923806

  • When you shadow a variable, you create a new variable with the same name.
  • The value of the original shadowed variable still exists in memory. When you assign a new value to a mut variable, the old value is dropped (freed), but shadowing does not drop the original value; it lives until the end of its scope.
  • So, shadowing JUST creates a new variable with the same name! (Let me repeat.)
  • cf. std::mem::drop
fn main() {
    let x = 1;
    println!("Value: {x}  Address: {:p}", &x);

    // Save address before shadowing
    let addr_first = &x;

    let x = 2;
    println!("Value: {x}  Address: {:p}", &x);
    println!("Old value: {}", *addr_first);
}

// Result:
//
// Value: 1  Address: 0x7ffeb2c0bf54
// Value: 2  Address: 0x7ffeb2c0bfc4
// Old value: 1

Unless I put the shadowing lines inside an inner scope ({}), I couldn’t find a way to get the original variable back by name.

You can find shadowing using scope from this official example.
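
A minimal sketch of shadowing inside an inner scope (my own version, assuming nothing beyond plain blocks; the function name shadow_in_scope is hypothetical). Once the inner block ends, the name x refers to the outer value again.

```rust
// Returns (value seen inside the inner block, value seen after it).
fn shadow_in_scope() -> (i32, i32) {
    let x = 5;
    let inner = {
        let x = x * 2; // shadows the outer x only inside this block
        x
    };
    // The inner x is gone here; x is the outer variable again.
    (inner, x)
}

fn main() {
    let (inner, outer) = shadow_in_scope();
    println!("inner x = {inner}, outer x = {outer}");
}
```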

3.2 Data Types

  • A scalar type represents a single value.

    • Examples) integers, floating-point numbers, booleans, and characters.
  • Integer types.

    Length    Signed    Unsigned
    8-bit     i8        u8
    16-bit    i16       u16
    32-bit    i32       u32
    64-bit    i64       u64
    128-bit   i128      u128
    arch      isize     usize
  • Signed numbers are stored using two’s complement representation.

  • Integer literals in Rust.

    Number literals    Example
    Decimal            98_222
    Hex                0xff
    Octal              0o77
    Binary             0b1111_0000
    Byte (u8 only)     b'A'
    • We can use underscores in decimals.
  • Integer types default to i32: this type is generally the fastest, even on 64-bit systems.

  • When you’re compiling in debug mode, Rust includes checks for integer overflow that cause your program to panic at runtime if this behavior occurs.

  • Rust uses the term panicking when a program exits with an error.

  • When you’re compiling in release mode with the --release flag, Rust does not include checks for integer overflow that cause panics.
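
Since the overflow behaviour differs between debug and release builds, the standard library offers methods that make the intent explicit. A minimal sketch with u8; the function name overflow_demo is my own.

```rust
// Returns the result of adding a and b under four explicit overflow policies.
fn overflow_demo(a: u8, b: u8) -> (u8, Option<u8>, (u8, bool), u8) {
    (
        a.wrapping_add(b),    // wraps around on overflow
        a.checked_add(b),     // None on overflow
        a.overflowing_add(b), // wrapped value plus an overflow flag
        a.saturating_add(b),  // clamps at u8::MAX
    )
}

fn main() {
    println!("{:?}", overflow_demo(255, 1));
    println!("{:?}", overflow_demo(1, 2));
}
```

These methods behave the same way in debug and release mode, so the result does not depend on the build profile.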

  • Rust’s floating-point types are f32 and f64.

  • The default type is f64 because on modern CPUs it’s roughly the same speed as f32 but is capable of more precision.

  • Floating-point numbers are represented according to the IEEE-754 standard.

  • Booleans are one byte in size.

  • Rust’s char type is four bytes in size and represents a Unicode Scalar Value, which means it can represent a lot more than just ASCII. …. your human intuition for what a “character” is may not match up with what a char is in Rust.
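
To see why a char does not always match the intuition of "one character = one byte", here is a small sketch. The helper byte_and_char_len is a name I made up; it compares the UTF-8 byte length of a string with its number of chars.

```rust
use std::mem::size_of;

// Hypothetical helper: (length in bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    println!("bool: {} byte, char: {} bytes", size_of::<bool>(), size_of::<char>());
    // "h\u{e9}llo" is "héllo": é takes two bytes in UTF-8.
    println!("{:?}", byte_and_char_len("h\u{e9}llo"));
}
```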

  • Compound types: tuple and array.

Tuple

  • Tuples have a fixed length: once declared, they cannot grow or shrink in size.
  • Example: let tup: (i32, f64, u8) = (500, 6.4, 1);
fn main() {
    let tup = (500, 6.4, 1);

    let (x, y, z) = tup;

    println!("The value of y is: {}", y);
}
// The value of y is: 6.4
  • We can access a tuple element directly by using a period (.) followed by the index of the value we want to access.
let x: (i32, f64, u8) = (500, 6.4, 1);
let five_hundred = x.0;

The tuple without any values, (), is a special type that has only one value, also written (). The type is called the unit type and the value is called the unit value. Unit-like structs (structs without fields) are named after it because they behave similarly to (). Another common use of the unit value is Ok(()), which signals success without carrying any data. Expressions implicitly return the unit value if they don’t return any other value.
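
A minimal sketch of the unit value in practice. Both function names are made up for illustration: say_hi returns () implicitly, and check_positive returns Ok(()) because success carries no data.

```rust
// No return type written: the function returns the unit value ().
fn say_hi() {
    println!("hi");
}

// Hypothetical validator: nothing to return on success, so Ok holds ().
fn check_positive(n: i32) -> Result<(), String> {
    if n > 0 {
        Ok(())
    } else {
        Err(format!("{n} is not positive"))
    }
}

fn main() {
    let unit: () = say_hi();
    println!("{:?} {:?}", unit, check_positive(3));
}
```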

Array

  • Every element of an array must have the same type.
  • We can define like let a: [i32; 5] = [1, 2, 3, 4, 5];
  • If you want to create an array that contains the same value for each element, you can specify the initial value, followed by a semicolon, and then the length of the array in square brackets:
let a = [3; 5];
  • You can access elements of an array using indexing, let first = a[0];.
  • What happens if you try to access an element of an array that is past the end of the array? … The compilation didn’t produce any errors, but the program resulted in a runtime error and didn’t exit successfully (panic).
  • In many low-level languages, this kind of check is not done, and when you provide an incorrect index, invalid memory can be accessed.
  • An array is allocated on the stack.
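
If a panic is not what you want for an out-of-range index, the get method returns an Option instead. A minimal sketch; element_at is a hypothetical helper name.

```rust
// Returns Some(value) for a valid index and None for one past the end.
fn element_at(a: &[i32; 5], index: usize) -> Option<i32> {
    a.get(index).copied()
}

fn main() {
    let a = [1, 2, 3, 4, 5];
    println!("{:?}", element_at(&a, 0));
    println!("{:?}", element_at(&a, 10));
}
```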

3.3 Functions

  • Function definitions in Rust start with fn and have a set of parentheses after the function name. The curly brackets tell the compiler where the function body begins and ends.
  • Rust code uses snake case as the conventional style for function and variable names.
  • Rust doesn’t care where you define your functions, only that they’re defined somewhere.
  • Rust is an expression-based language; the distinction between statements and expressions is important to understand.
    • Creating a variable and assigning a value to it with the let keyword is a statement.
    • Function definitions are also statements.
    • Statements do not return values.
    • 5 + 6 is an expression that evaluates to the value 11.
    • Calling a function is an expression. Calling a macro is an expression. The block that we use to create new scopes, {}, is an expression:
    {
        let x = 3;
        x + 1
    }
    
    • The x + 1 line has no semicolon at the end, which is unlike most of the lines you’ve seen so far. Expressions do not include ending semicolons; adding one turns the expression into a statement, which does not return a value.
fn main() {
    let y = {
        let x = 3;
        x + 1
    };

    println!("The value of y is: {}", y);
    // The value of y is: 4
}

Functions with Return Values

fn main() {
    let x = plus_one(5);

    println!("The value of x is: {}", x);
}

fn plus_one(x: i32) -> i32 {
    x + 1
}
  • We don’t name return values, but we do declare their type after an arrow (->).
  • The return value of the function is synonymous with the value of the final expression in the block of the body of a function. You can return early from a function by using the return keyword and specifying a value, but most functions return the last expression implicitly.
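
A small sketch showing an early return next to an implicit one. The function distance is my own example, not from the book.

```rust
// Absolute difference of two integers.
fn distance(a: i32, b: i32) -> i32 {
    if a < b {
        return b - a; // early return with the return keyword
    }
    a - b // final expression without a semicolon: the implicit return value
}

fn main() {
    println!("{}", distance(3, 10));
    println!("{}", distance(10, 3));
}
```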

3.4 Comments

Pass ;)

3.5 Control Flow

if number < 5 {
    println!("condition was true");
} else {
    println!("condition was false");
}
  • Blocks of code associated with the conditions in if expressions are sometimes called arms, just like the arms in match expressions.
  • It’s also worth noting that the condition in this code must be a bool. If the condition isn’t a bool, we’ll get an error.
  • You can have multiple conditions by combining if and else in an else if expression.
  • Because if is an expression, we can use it on the right side of a let statement.
    let condition = true;
    let number = if condition { 5 } else { 6 };
    
  • The values that have the potential to be results from each arm of the if must be the same type.
    • Decided at compile time.
    • The compiler would be more complex and would make fewer guarantees about the code if it had to keep track of multiple hypothetical types for any variable.
  • loop and break;.
  • You can add the value you want returned after the break expression you use to stop the loop; that value will be returned out of the loop so you can use it:
let mut counter = 0;

let result = loop {
    counter += 1;

    if counter == 10 {
        break counter * 2;
    }
};

println!("The result is {}", result);
//The result is 20
  • while -> Runs the loop body as long as the condition is true; when the condition becomes false, the loop exits.
fn main() {
    let mut number = 3;

    while number != 0 {
        println!("{}!", number);

        number -= 1;
    }

    println!("LIFTOFF!!!");
}
//3!
//2!
//1!
//LIFTOFF!!!
  • You could use the while construct to loop over the elements of a collection, such as an array.
fn main() {
    let a = [10, 20, 30, 40, 50];
    let mut index = 0;

    while index < 5 {
        println!("the value is: {}", a[index]);

        index += 1;
    }
}
// the value is: 10
// the value is: 20
// the value is: 30
// the value is: 40
// the value is: 50
  • And there is for also.
fn main() {
    let a = [10, 20, 30, 40, 50];

    for element in a.iter() {
        println!("the value is: {}", element);
    }
}
  • Prefer for when looping over an array: it is safer, because there is no index that could go out of bounds.
fn main() {
    for number in (1..4).rev() {
        println!("{}!", number);
    }
    println!("LIFTOFF!!!");
}
  • rev reverses the iteration.

4. Understanding Ownership

  • Rust has no GC; it has ownership instead.

4.1 What Is Ownership?

General programming knowledge: stack and heap

  • The stack stores values in the order it gets them and removes the values in the opposite order. LIFO = FILO.
  • All data stored on the stack must have a known, fixed size. Data with an unknown size at compile time or a size that might change must be stored on the heap instead.
    • cf. In computer science, a heap is also a tree with a special property: the value of a node must be >= (or <=) the values of its children. In the context of memory management, however, you can think of the heap as a free memory area that is assigned to a program (process) at execution time.
  • The heap is less organized: when you put data on the heap, you request a certain amount of space. The memory allocator finds an empty spot in the heap that is big enough, marks it as being in use, and returns a pointer, which is the address of that location. This process is called allocating on the heap and is sometimes abbreviated as just allocating. Pushing values onto the stack is not considered allocating. Because the pointer is a known, fixed size, you can store the pointer on the stack, but when you want the actual data, you must follow the pointer.
  • Pushing to the stack is faster than allocating on the heap because the allocator never has to search for a place to store new data; that location is always at the top of the stack.
  • When your code calls a function, the values passed into the function (including, potentially, pointers to data on the heap) and the function’s local variables get pushed onto the stack. When the function is over, those values get popped off the stack.

Ownership addresses the following problems:

  1. Keeping track of what parts of code are using what data on the heap,
  2. minimizing the amount of duplicate data on the heap,
  3. and cleaning up unused data on the heap so you don’t run out of space

Once you understand ownership, you won’t need to think about the stack and the heap very often, but knowing that managing heap data is why ownership exists can help explain why it works the way it does.

In Rust, memory is managed through a system of ownership with a set of rules that the compiler checks at compile time. None of the ownership features slow down your program while it’s running.

This blog post is a good reference about GC in Rust:

YES, RUST HAS GARBAGE COLLECTION, AND A FAST ONE - https://blog.akquinet.de/2020/10/09/yes-rust-has-garbage-collection-and-a-fast-one/

Side note: Heap fragmentation in Rust

https://internals.rust-lang.org/t/jemalloc-was-just-removed-from-the-standard-library/8759

… the std::alloc::System type to represent the system’s default allocator.

https://stackoverflow.com/questions/40658045/does-rusts-memory-management-result-in-fragmented-memory

Stack or heap in Rust

All values in Rust are stack allocated by default. Values can be boxed (allocated on the heap) by creating a Box<T>.

We will learn about Box laaaater (chapter 15).

Ownership Rules

  1. Each value in Rust has a variable that’s called its owner.
  2. There can only be one owner at a time.
  3. When the owner goes out of scope, the value will be dropped.
  • The types covered previously are all stored on the stack and popped off the stack when their scope is over, but we want to look at data that is stored on the heap and explore how Rust knows when to clean up that data.
let s = String::from("hello");
  • The String type is allocated on the heap and as such is able to store an amount of text that is unknown to us at compile time.
  • In the case of a string literal (like, let literal = "I'm a string literal"), we know the contents at compile time, so the text is hardcoded directly into the final executable. This is why string literals are fast and efficient. But these properties only come from the string literal’s immutability.
  • With the String type, in order to support a mutable, growable piece of text, we need to allocate an amount of memory on the heap, unknown at compile time, to hold the contents. This means:
    • The memory must be requested from the memory allocator at runtime.
    • We need a way of returning this memory to the allocator when we’re done with our String.
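    • That first part is done by us: when we call String::from, its implementation requests the memory it needs. However, the second part is different. In languages with a garbage collector (GC), the GC keeps track of and cleans up memory that is no longer used.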
  • Rust takes a different path: the memory is automatically returned (~free) once the variable that owns it goes out of scope.
  • When a variable goes out of scope, Rust calls a special function for us. This function is called drop, and it’s where the author of String can put the code to return the memory. Rust calls drop automatically at the closing curly bracket.
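
To watch drop being called at the closing curly bracket, here is a minimal sketch with a custom type that implements the Drop trait. The type Noisy and the function drop_order are hypothetical names of mine; the log only exists to make the order visible.

```rust
use std::cell::RefCell;

// Hypothetical type that records its own drop in a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    // Rust calls this automatically when the owner goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _a = Noisy { name: "a", log: &log };
        {
            let _b = Noisy { name: "b", log: &log };
        } // _b goes out of scope here, so "b" is logged first
        log.borrow_mut().push("after inner block");
    } // _a goes out of scope here
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```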

Example.1: Stack

let x = 5;
let y = x;
  1. The value 5 is bound to x and stored on the stack.
  2. Make a copy of the value in x and bind it to y.
  • Integers are simple values with a known, fixed size, and these two 5 values are pushed onto the stack.
  • My note: variable names like x and y have no meaning in the compiled code (assembly). Only the values 5 are stored on the actual stack, and the Rust compiler keeps track of the location of each variable.

Example.2: Heap

let s1 = String::from("hello");
let s2 = s1;
  • A String is made up of three parts:
    • A pointer to the memory that holds the contents of the string,
    • The length is how much memory, in bytes, the contents of the String is currently using, and
    • The capacity is the total amount of memory, in bytes, that the String has received from the allocator.
    • When we assign s1 to s2, the String data is copied, meaning we copy the pointer, the length, and the capacity that are on the stack. We do not copy the data on the heap that the pointer refers to.

[Figure: How a String is stored in memory]

ptr, len and capacity are stored in stack.
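
A small sketch to check this from code: as_ptr, len and capacity expose the three parts, and comparing pointers shows that a move keeps the same heap buffer while clone allocates a new one. The two function names are my own.

```rust
// After a move, the new owner points at the same heap buffer.
fn move_keeps_buffer() -> bool {
    let s1 = String::from("hello");
    let ptr_before = s1.as_ptr(); // address of the heap data
    let s2 = s1;                  // move: only ptr, len and capacity are copied
    ptr_before == s2.as_ptr()
}

// clone copies the heap data, so the two Strings have different buffers.
fn clone_makes_new_buffer() -> bool {
    let s1 = String::from("hello");
    let s2 = s1.clone();
    s1.as_ptr() != s2.as_ptr()
}

fn main() {
    let s = String::from("hello");
    println!("ptr = {:p}, len = {}, capacity = {}", s.as_ptr(), s.len(), s.capacity());
    println!("{} {}", move_keeps_buffer(), clone_makes_new_buffer());
}
```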

The following code fails at compile time.

let s1 = String::from("hello");
let s2 = s1;
println!("{}, world!", s1);
  • Note: shallow copy and deep copy: from the Python documentation.
    • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
    • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
    • … OK, a deep copy makes its own copy of the object in memory, and a shallow copy just refers to the original values.
  • The concept of copying the pointer, length, and capacity without copying the data probably sounds like making a shallow copy. But because Rust also invalidates the first variable, instead of being called a shallow copy, it’s known as a move.
  • Only s2 is valid; when it goes out of scope, it alone will free the memory.
  • Rust will never automatically create “deep” copies of your data. Therefore, any automatic copying can be assumed to be inexpensive in terms of runtime performance.
  • If we do want to deeply copy the heap data of the String, not just the stack data, we can use a common method called clone.
fn main() {
    let s1 = String::from("hello");
    let s2 = s1.clone();

    println!("s1 = {}, s2 = {}", s1, s2);
}
// s1 = hello, s2 = hello
let x = 5;
let y = x;

println!("x = {}, y = {}", x, y);
  • The code above compiles without error because types such as integers that have a known size at compile time are stored entirely on the stack, so copies of the actual values are quick to make.
  • Rust has a special annotation called the Copy trait that we can place on types like integers that are stored on the stack.
  • As a general rule, any group of simple scalar values can be Copy, and nothing that requires allocation or is some form of resource is Copy.
    • Examples: u32, bool, f64, char, and tuples (if they only contain types that are also Copy).
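
The Copy annotation can also be placed on our own types with derive, as long as every field is Copy. A minimal sketch; Point and copies_point are hypothetical names.

```rust
// Both fields are i32 (Copy), so the struct may derive Copy.
// Adding a String field would make this derive fail to compile.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn copies_point() -> (Point, Point) {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // copied, not moved: p1 stays valid
    (p1, p2)
}

fn main() {
    let (p1, p2) = copies_point();
    println!("{:?} {:?} x = {}, y = {}", p1, p2, p1.x, p1.y);
}
```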

Ownership and Functions

  • The following code fails at compile time at the line println!("{}", s).
fn main() {
    let s = String::from("hello");  // s comes into scope

    takes_ownership(s);             // s's value moves into the function...
                                    // ... and so is no longer valid here
    println!("{}", s)
}

fn takes_ownership(some_string: String) { // some_string comes into scope
    println!("{}", some_string);
} // Here, some_string goes out of scope and `drop` is called. The backing
  // memory is freed.

Return Values and Scope

  • Returning values can also transfer ownership.
  • When a variable that includes data on the heap goes out of scope, the value will be cleaned up by drop unless the data has been moved to be owned by another variable.

Example:

fn main() {
    let s1 = gives_ownership();         // gives_ownership moves its return
                                        // value into s1

    let s2 = String::from("hello");     // s2 comes into scope

    let s3 = takes_and_gives_back(s2);  // s2 is moved into
                                        // takes_and_gives_back, which also
                                        // moves its return value into s3
} // Here, s3 goes out of scope and is dropped. s2 goes out of scope but was
  // moved, so nothing happens. s1 goes out of scope and is dropped.

fn gives_ownership() -> String {             // gives_ownership will move its
                                             // return value into the function
                                             // that calls it

    let some_string = String::from("hello"); // some_string comes into scope

    some_string                              // some_string is returned and
                                             // moves out to the calling
                                             // function
}

// takes_and_gives_back will take a String and return one
fn takes_and_gives_back(a_string: String) -> String { // a_string comes into
                                                      // scope

    a_string  // a_string is returned and moves out to the calling function
}
  • What if we want to let a function use a value but not take ownership? It’s quite annoying that anything we pass in also needs to be passed back if we want to use it again -> The solution is references.
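
Before references enter the picture, this is roughly what the "pass it in and pass it back" approach looks like: a sketch in which the function returns the String together with the result in a tuple.

```rust
// Takes ownership of s and hands it back together with its length.
fn calculate_length(s: String) -> (String, usize) {
    let length = s.len();
    (s, length)
}

fn main() {
    let s1 = String::from("hello");
    let (s2, len) = calculate_length(s1); // s1 is moved in, s2 gets it back
    println!("The length of '{}' is {}.", s2, len);
}
```

It works, but the extra tuple is exactly the ceremony that references remove.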

4.2 References

let s1 = String::from("hello");
let len = calculate_length(&s1);

fn calculate_length(s: &String) -> usize {
    s.len()
}

[Figure: Reference.]

  • &s1 is a reference. It refers to the value of s1 but does not own it.
  • We call having references as function parameters borrowing.
  • So what happens if we try to modify something we’re borrowing?
    • Through an immutable reference (&), we can’t. To change the value, the variable must be mutable and we must borrow it with &mut.
    • But there is a restriction: you can have only one mutable reference to a particular piece of data in a particular scope.

Mutable reference (pass compiling):

fn main() {
    let mut s = String::from("hello");

    change(&mut s);
}

fn change(some_string: &mut String) {
    some_string.push_str(", world");
}

Mutable reference, but double borrowing (compile error):

fn main() {
    let mut s = String::from("hello");

    let r1 = &mut s;
    let r2 = &mut s;

    println!("{}, {}", r1, r2);
}
  • A mutable reference cannot coexist with immutable references to the same value.
  • Multiple immutable references are okay.
  • Note that a reference’s scope starts from where it is introduced and continues through the last time that reference is used.
fn main() {
    let mut s = String::from("hello");

    let r1 = &s; // no problem
    let r2 = &s; // no problem (multiple immutable references.)
    println!("{} and {}", r1, r2);
    // r1 and r2 are no longer used after this point

    let r3 = &mut s; // no problem
    println!("{}", r3);
}
  • In Rust, the compiler guarantees that references will never be dangling references.
  • The two rules of references
    • At any given time, you can have either one mutable reference or any number of immutable references.
    • References must always be valid.
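
A sketch of what "never dangling" means in practice. The commented-out function is rejected by the compiler; the working alternative returns the String itself, so ownership moves to the caller.

```rust
// This version does not compile: it would return a reference to a String
// that is dropped at the end of the function.
// fn dangle() -> &String {
//     let s = String::from("hello");
//     &s
// }

// Returning the String moves ownership out, so nothing is deallocated.
fn no_dangle() -> String {
    let s = String::from("hello");
    s
}

fn main() {
    println!("{}", no_dangle());
}
```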

4.3 The Slice Type

  • The slice is another data type that does not have ownership.
  • Slices let you reference a contiguous sequence of elements in a collection rather than the whole collection (since a slice is a reference, it doesn’t have ownership).
  • For example, in let bytes = s.as_bytes();, s is a String and bytes is a slice of bytes (&[u8]).
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' { //search for the byte that represents the space by using the byte literal syntax.
            return i;  //If we find a space, we return the position
        }
    }

    s.len() // Otherwise, we return the length of the string by using s.len()
}

fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s); // word will get the value 5

    s.clear(); // this empties the String, making it equal to ""

    // word still has the value 5 here, but there's no more string that
    // we could meaningfully use the value 5 with. word is now totally "invalid"!
}
  • Because we get a reference to the element from .iter().enumerate(), we use & in the pattern.
  • Because word isn’t connected to the state of s at all, word still contains the value 5.

String Slice

fn main() {
    let s = String::from("hello world");

    let hello = &s[0..5];
    let world = &s[6..11];
}

[Figure: String slice.]

  • world contains a pointer to the byte at index 6 of s (the 7th byte) and a length of 5 (a slice is a reference).
  • Rust’s range syntax is .. (two dots).
  • String slice range indices must occur at valid UTF-8 character boundaries. If you attempt to create a string slice in the middle of a multibyte character, your program will exit with an error. For the purposes of introducing string slices, we are assuming ASCII only in this section.
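
To avoid the panic on a bad boundary, str has a get method that returns None instead. A minimal sketch; safe_prefix is a name I made up.

```rust
// Returns the first `end` bytes as a string slice, or None if `end` is out of
// range or falls in the middle of a multibyte character.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    // "h\u{e9}llo" is "héllo": é occupies byte indices 1 and 2.
    let s = "h\u{e9}llo";
    println!("{:?}", safe_prefix(s, 1));
    println!("{:?}", safe_prefix(s, 2));
    println!("{:?}", safe_prefix(s, 3));
    // &s[..2] would panic here because index 2 is not a char boundary.
}
```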
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s); // immutable reference

    s.clear(); // error! Because clear needs to truncate the String, it needs to get a mutable reference.

    println!("the first word is: {}", word);
}
  • My note: word holds an immutable borrow of s that is still used by println!, so s.clear() cannot take the mutable borrow it needs to empty the string.
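
The same chapter of the Rust book goes one step further and takes &str instead of &String as the parameter, so the function accepts string literals and slices of a String as well. A minimal sketch of that variant:

```rust
// Taking &str lets callers pass a literal, a slice, or a &String
// (a &String is coerced to &str automatically).
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    // No space found: the whole input is one word.
    s
}

fn main() {
    let owned = String::from("hello world");
    let literal = "good morning";

    println!("{}", first_word(&owned)); // &String coerces to &str
    println!("{}", first_word(&owned[6..])); // slice of a String
    println!("{}", first_word(literal)); // a literal is already &str
}
```

The body is unchanged except for the last line; only the signature became more general.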

Example: frequently used slice

fn main() {
    let a = [1, 2, 3, 4, 5];

    let slice = &a[1..3];
}

This slice has the type &[i32].

5. Using Structs to Structure Related Data

A struct, or structure, is a custom data type that lets you name and package together multiple related values that make up a meaningful group.

5.1 Defining and Instantiating Structs

  • The pieces of a struct can be different types.
  • Unlike with tuples, you’ll name each piece of data so it’s clear what the values mean.
  • You don’t have to rely on the order of the data to specify or access the values of an instance.
  • To use a struct after we’ve defined it, we create an instance of that struct by specifying concrete values for each of the fields, written as key: value pairs.
struct User {
    username: String,
    email: String,
    sign_in_count: u64,
    active: bool,
}

fn main() {
    let user1 = User {
        email: String::from("someone@example.com"),
        username: String::from("someusername123"),
        active: true,
        sign_in_count: 1,
    };
}
  • To get a specific value from a struct, we can use dot notation.
  • If the instance is mutable, we can change a value by using the dot notation and assigning into a particular field.
  • user1.email = String::from("anotheremail@example.com");
  • The entire instance must be mutable; Rust doesn’t allow us to mark only certain fields as mutable.
  • Sample: creating an instance with a function.
fn build_user(email: String, username: String) -> User {
    User {
        email: email,
        username: username,
        active: true,
        sign_in_count: 1,
    }
}
  • Because the parameter names and the struct field names are exactly the same, we can use the field init shorthand syntax to rewrite build_user so that it behaves exactly the same but doesn’t have the repetition of email and username.
fn build_user(email: String, username: String) -> User {
    User {
        email,
        username,
        active: true,
        sign_in_count: 1,
    }
}
  • The syntax .. specifies that the remaining fields not explicitly set should have the same value as the fields in the given instance.
let user2 = User {
    email: String::from("another@example.com"),
    username: String::from("anotherusername567"),
    ..user1
};

user2 has a different value for email and username but has the same values for the active and sign_in_count fields from user1.

  • You can also define structs that look similar to tuples, called tuple structs. Tuple structs have the added meaning the struct name provides but don’t have names associated with their fields.
struct Color(i32, i32, i32);
let black = Color(0, 0, 0);

Unit-Like Structs Without Any Fields

You can define a struct without fields:

struct AlwaysEqual;
let subject = AlwaysEqual;

This is called a unit-like struct.

Unit structs are most commonly used as markers. They have a size of zero bytes, but unlike empty enums they can be instantiated, making them isomorphic to the unit type (). Unit structs are useful when you need to implement a trait on something, but don’t need to store any data inside it.

Ownership of Struct Data

  • We can’t simply use &str instead of String for a struct field. &str is a “string slice”, so it is a reference, and a struct that stores a reference needs a lifetime specifier; without one the compiler reports an error. A struct field can be a reference, but only when lifetimes are declared.
  • (From chapter 10): Every reference in Rust has a lifetime, which is the scope for which that reference is valid.
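
As a rough sketch of what the lifetime requirement looks like: a struct can hold &str fields if it declares a lifetime parameter (covered in chapter 10). The UserRef type and the longest_field function are hypothetical names used only for illustration.

```rust
// A struct that borrows its text instead of owning it.
// The lifetime 'a says: a UserRef cannot outlive the strings it points to.
struct UserRef<'a> {
    username: &'a str,
    email: &'a str,
}

// Returns whichever field is longer; the result borrows from the same data.
fn longest_field<'a>(user: &UserRef<'a>) -> &'a str {
    if user.username.len() >= user.email.len() {
        user.username
    } else {
        user.email
    }
}

fn main() {
    let name = String::from("someusername123");
    let user = UserRef {
        username: &name, // borrowed from a String
        email: "someone@example.com", // a literal lives for the whole program
    };
    println!("{}", longest_field(&user));
}
```

Removing the <'a> and writing username: &str makes the compiler report a missing lifetime specifier. Using String fields, as in the User struct above, avoids the question because the struct then owns its data.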

5.2 An Example Program Using Structs

  • Practical tips
  • We use structs to add meaning by labeling the data.
struct Rectangle {
    width: u32,
    height: u32,
}

fn main() {
    let rect1 = Rectangle {
        width: 30,
        height: 50,
    };

    println!(
        "The area of the rectangle is {} square pixels.",
        area(&rect1)
    );
}

fn area(rectangle: &Rectangle) -> u32 {
    rectangle.width * rectangle.height
}
  • We want to borrow the struct rather than take ownership of it. This way, main retains its ownership and can continue using rect1, which is the reason we use the & in the function signature and where we call the function.

  • By default, the curly brackets {} tell println! to use formatting known as Display: output intended for direct end user consumption. For a struct there is more than one reasonable way to display it (with or without commas, with or without braces, all fields or only some). Due to this ambiguity, Rust doesn’t try to guess what we want, and structs don’t have a provided implementation of Display.

  • Use {:?} for debug output or {:#?} for pretty-printed debug output. Both require #[derive(Debug)] just before the struct definition as shown below.

#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

fn main() {
    let rect1 = Rectangle {
        width: 30,
        height: 50,
    };

    println!("rect1 is {:?}", rect1);
    // rect1 is Rectangle { width: 30, height: 50 }
}

I added the annotation to derive the Debug trait and printed the Rectangle instance using debug formatting. Rust has provided a number of traits for us to use with the derive annotation that can add useful behavior to our custom types.

#[derive(Debug)] is called an attribute. See https://doc.rust-lang.org/rust-by-example/attribute.html
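
Since structs get no Display implementation for free, we can write one ourselves when {} output is wanted. A minimal sketch; the output format "30 x 50 px" is my own choice, not something from the book:

```rust
use std::fmt;

struct Rectangle {
    width: u32,
    height: u32,
}

// We decide ourselves what the end-user output looks like.
impl fmt::Display for Rectangle {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} x {} px", self.width, self.height)
    }
}

fn main() {
    let rect1 = Rectangle {
        width: 30,
        height: 50,
    };

    // {} now works because Display is implemented.
    println!("rect1 is {}", rect1);
}
```

Implementing Display also makes to_string() available on the type.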

5.3 Method syntax

  • Methods are different from functions in that they’re defined within the context of a struct (or an enum or a trait object).
  • The first parameter of methods is always self, which represents the instance of the struct the method is being called on.
  • How to add a method to a struct? -> impl
#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    fn area(&self) -> u32 {
        self.width * self.height
    }
}
  • How to call a method? -> dot notation
  • Methods can take ownership of self, borrow self immutably as we’ve done above, or borrow self mutably, just as they can any other parameter.
  • In C and C++, . calls a method on an object directly and -> calls it on a pointer to the object. In other words, if object is a pointer, object->something() is similar to (*object).something().
  • Rust doesn’t have an equivalent to the -> operator; instead, Rust has a feature called automatic referencing and dereferencing.
  • When you call a method with object.something(), Rust automatically adds in &, &mut, or * so object matches the signature of the method.
impl Rectangle {
    fn area(&self) -> u32 {
        self.width * self.height
    }

    fn can_hold(&self, other: &Rectangle) -> bool {
        self.width > other.width && self.height > other.height
    }
}
  • We’re allowed to define functions within impl blocks that don’t take self as a parameter. These are called associated functions because they’re associated with the struct. It’s similar concept to a static method.
  • Associated functions are often used for constructors that will return a new instance of the struct.
#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    fn square(size: u32) -> Rectangle {
        Rectangle {
            width: size,
            height: size,
        }
    }
}

fn main() {
    let sq = Rectangle::square(3);
}
  • To call this associated function, we use the :: syntax with the struct name; let sq = Rectangle::square(3); is an example. This function is namespaced by the struct.
  • Each struct is allowed to have multiple impl blocks. (My memo) I can add new functions later.
  • My note: Why do we need methods and associated functions? -> My answer: first, the main benefit of methods over plain functions is code organization, because we can put everything related to a struct in one place. If we wrote free functions instead, we would have to search the whole code base to find what is available for the struct. Second, some functions relate to an instance of the struct, while others relate to the struct itself and don’t need an instance to work with, like String::from.
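
To make the three kinds of self concrete, here is a small sketch. The methods scale and into_square are hypothetical additions of mine, not from the book.

```rust
#[derive(Debug, PartialEq)]
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    // &self: read-only borrow, the instance stays usable afterwards.
    fn area(&self) -> u32 {
        self.width * self.height
    }

    // &mut self: mutable borrow, the method changes the instance in place.
    fn scale(&mut self, factor: u32) {
        self.width *= factor;
        self.height *= factor;
    }

    // self: takes ownership, the original instance is consumed.
    fn into_square(self) -> Rectangle {
        let side = self.width.max(self.height);
        Rectangle {
            width: side,
            height: side,
        }
    }
}

fn main() {
    let mut rect = Rectangle {
        width: 2,
        height: 3,
    };
    println!("area: {}", rect.area());

    rect.scale(2); // Rust borrows rect mutably for us (automatic referencing)
    println!("scaled: {:?}", rect);

    let sq = rect.into_square(); // rect is moved here and can't be used again
    println!("square: {:?}", sq);
}
```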

6 Enums and Pattern Matching

6.1 Defining an Enum

An enum definition is a kind of custom data type. This YouTube video explains what an enumeration type is in C (the video is very understandable).

Introduction to Enumerations in C

I can regard an enum as a kind of lookup table in this simplest case.

Example 1: IP (the enum in this example is similar to an enum in C).

enum IpAddrKind {
    V4,
    V6,
}
  • The custom data type is IpAddrKind.

  • A value of type IpAddrKind can be either V4 or V6.

  • The variants of the enum are namespaced under its identifier, and we use a double colon to separate the two:

    let four = IpAddrKind::V4;
    let six = IpAddrKind::V6;
    
  • Like C, you can map each variant to an integer (let x = IpAddrKind::V4 as i32;).

  • The reason this is useful is that both values IpAddrKind::V4 and IpAddrKind::V6 are of the same type: IpAddrKind. We can then, for instance, define a function that takes any IpAddrKind:

    fn route(ip_kind: IpAddrKind) {}
    

    And we can call this function with either variant:

    route(IpAddrKind::V4);
    route(IpAddrKind::V6);
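
A small sketch of the integer mapping mentioned above. Without explicit values the variants are numbered from 0; explicit values can be assigned as in C. IpVersion is a hypothetical second enum added only to show explicit values, and the cast is shown here for variants that carry no data.

```rust
// Default numbering: V4 is 0, V6 is 1.
enum IpAddrKind {
    V4,
    V6,
}

// Explicit values, as in C. (IpVersion is a made-up name for this example.)
enum IpVersion {
    V4 = 4,
    V6 = 6,
}

fn main() {
    println!("{}", IpAddrKind::V4 as i32);
    println!("{}", IpAddrKind::V6 as i32);
    println!("{}", IpVersion::V4 as i32);
    println!("{}", IpVersion::V6 as i32);
}
```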
    

Example 2: IP with the address.

We can attach data to each enum variant:

enum IpAddr {
    V4(String),
    V6(String),
}

let home = IpAddr::V4(String::from("127.0.0.1"));
let loopback = IpAddr::V6(String::from("::1"));
  • There’s another advantage to using an enum rather than a struct: each variant can have different types and amounts of associated data:
    enum IpAddr {
        V4(u8, u8, u8, u8),
        V6(String),
    }
    
    let home = IpAddr::V4(127, 0, 0, 1);
    let loopback = IpAddr::V6(String::from("::1"));
    
  • The following code is essentially how the Rust standard library defines it (wanting to store IP addresses and encode which kind they are is that common).
    struct Ipv4Addr {
        // --snip--
    }
    
    struct Ipv6Addr {
        // --snip--
    }
    
    enum IpAddr {
        V4(Ipv4Addr),
        V6(Ipv6Addr),
    }
    

Example 3: Message.

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}
  • If we use the different structs, which each have their own type, we couldn’t as easily define a function to take any of these kinds of messages as we could with the Message enum defined above, which is a single type.
  • We can also define methods on an enum with impl.
    impl Message {
        fn call(&self) {
            // method body would be defined here
        }
    }
    
    let m = Message::Write(String::from("hello"));
    m.call();
    

My homework: how does the Rust compiler compile an enum into machine code?

The Option Enum and Its Advantages Over Null Values

Try to learn this right away, even if you don’t understand Option the first time.

YouTube video: Rust Programming Tutorial #37 - Option (Enum)

  • Option is another enum defined by the standard library.
  • The Option type is used in many places because it encodes the very common scenario in which a value could be something or it could be nothing. Expressing this concept in terms of the type system means the compiler can check whether you’ve handled all the cases you should be handling; this functionality can prevent bugs that are extremely common in other programming languages.
    • Rust doesn’t have the null feature. … In languages with null, variables can always be in one of two states: null or not-null.
    • The problem with null values is that if you try to use a null value as a not-null value, you’ll get an error of some kind.
    • However, the concept that null is trying to express is still a useful one: a null is a value that is currently invalid or absent for some reason.
    • The problem isn’t really with the concept but with the particular implementation.
  • Rust does not have nulls, but it does have an enum that can encode the concept of a value being present or absent. This enum is Option<T>, and it is defined by the standard library as follows:
    enum Option<T> {
        Some(T),
        None,
    }
    
  • You can use Some and None directly without the Option:: prefix.
  • For now, all you need to know is that <T> means the Some variant of the Option enum can hold one piece of data of any type.
let some_number = Some(5);
let some_string = Some("a string");

let absent_number: Option<i32> = None;
  • If we use None rather than Some, we need to tell Rust what type of Option<T> we have.

Why is having Option<T> any better than having null?

Because Option<T> and T (where T can be any type) are different types, the compiler won’t let us use an Option<T> value as if it were definitely a valid value.

In the following code, the line let sum = x + y; causes a compile error because Rust doesn’t understand how to add an i8 and an Option<i8>.

fn main() {
    let x: i8 = 5;
    let y: Option<i8> = Some(5);

    let sum = x + y;
}

This means, when we have a value of a type like i8 in Rust, the compiler will ensure that we always have a valid value. In other words, you have to convert an Option<T> to a T before you can perform T operations with it (usually done by match in the next section). Generally, this helps catch one of the most common issues with null.
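
A minimal sketch of that conversion: the Option<i8> has to be unpacked before the addition. The function name add_optional is hypothetical, and treating None as "add nothing" is just one possible choice.

```rust
// Unpack the Option first; only then can we do i8 arithmetic.
fn add_optional(x: i8, y: Option<i8>) -> i8 {
    match y {
        Some(value) => x + value,
        None => x, // nothing to add
    }
}

fn main() {
    let x: i8 = 5;
    let y: Option<i8> = Some(5);

    println!("{}", add_optional(x, y));

    // The same idea with a helper method: fall back to 0 when y is None.
    println!("{}", x + y.unwrap_or(0));
}
```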

6.2 The match Control Flow Operator

Here is an example. (Tips. From 1999 through 2008, the United States minted quarters with different designs for each of the 50 states on one side.)

#[derive(Debug)] // so we can inspect the state in a minute
enum UsState {
    Alabama,
    Alaska,
    // --snip--
}

enum Coin {
    Penny,
    Nickel,
    Dime,
    Quarter(UsState),
}

fn value_in_cents(coin: Coin) -> u8 {
    match coin {
        Coin::Penny => 1,
        Coin::Nickel => 5,
        Coin::Dime => 10,
        Coin::Quarter(state) => {
            println!("State quarter from {:?}!", state);
            25
        }
    }
}

Matching with Option<T>

  • Especially in the case of Option<T>, when Rust prevents us from forgetting to explicitly handle the None case, it protects us from assuming that we have a value when we might have null, thus making the billion-dollar mistake (Tony Hoare’s name for his invention of the null reference) impossible.
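
The Rust book illustrates this with a plus_one function, reproduced here as a small runnable sketch. Deleting the None arm makes the compiler reject the match as non-exhaustive.

```rust
fn plus_one(x: Option<i32>) -> Option<i32> {
    match x {
        None => None, // nothing inside, nothing to add to
        Some(i) => Some(i + 1),
    }
}

fn main() {
    let five = Some(5);
    let six = plus_one(five);
    let none = plus_one(None);

    println!("{:?} {:?}", six, none);
}
```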

The _ Placeholder

The _ will match all the possible cases that aren’t specified before it.
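
For example, a u8 has 256 possible values, and _ saves us from listing them all. A short sketch (the function name describe is hypothetical):

```rust
fn describe(value: u8) -> &'static str {
    match value {
        1 => "one",
        3 => "three",
        5 => "five",
        7 => "seven",
        _ => "something else", // every value not listed above
    }
}

fn main() {
    println!("{}", describe(3));
    println!("{}", describe(42));
}
```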

6.3 Concise Control Flow with if let

The if let syntax lets you combine if and let into a less verbose way to handle values that match one pattern while ignoring the rest.

let some_u8_value = Some(0u8);

// no if let syntax
match some_u8_value {
    Some(3) => println!("three"),
    _ => (),
}

// same as above (with if let syntax)
// note: the pattern is its first arm.
if let Some(3) = some_u8_value {
    println!("three");
}
  • We can include an else with an if let.
match coin {
    Coin::Quarter(state) => println!("State quarter from {:?}!", state),
    _ => count += 1,
}

// same as
if let Coin::Quarter(state) = coin {
    println!("State quarter from {:?}!", state);
} else {
    count += 1;
}

When do we use if let?

Using if let means less typing, less indentation, and less boilerplate code. However, you lose the exhaustive checking that match enforces. Choosing between match and if let depends on what you’re doing in your particular situation and whether gaining conciseness is an appropriate trade-off for losing exhaustive checking.

With if let, we don’t need to write the _ arm that match requires. The difference from a plain if is that, in place of a condition expression, if let expects the keyword let followed by a pattern, an =, and a scrutinee expression.
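
The coin snippet above leaves out the definitions of coin and count. Here is a self-contained sketch of the same if let ... else pattern; the function count_non_quarters is a hypothetical wrapper of mine.

```rust
#[derive(Debug)]
enum UsState {
    Alabama,
    Alaska,
}

enum Coin {
    Penny,
    Nickel,
    Dime,
    Quarter(UsState),
}

// Prints the state of every quarter and counts all other coins.
fn count_non_quarters(coins: Vec<Coin>) -> u32 {
    let mut count = 0;
    for coin in coins {
        if let Coin::Quarter(state) = coin {
            println!("State quarter from {:?}!", state);
        } else {
            count += 1;
        }
    }
    count
}

fn main() {
    let coins = vec![
        Coin::Penny,
        Coin::Quarter(UsState::Alaska),
        Coin::Nickel,
        Coin::Quarter(UsState::Alabama),
        Coin::Dime,
    ];
    println!("non-quarters: {}", count_non_quarters(coins));
}
```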

7. Managing Growing Projects with Packages, Crates, and Modules

  • Packages: A Cargo feature that lets you build, test, and share crates
  • Crates: A tree of modules that produces a library or executable
  • Modules and use: Let you control the organization, scope, and privacy of paths
  • Paths: A way of naming an item, such as a struct, function, or module

A package can contain multiple binary crates and optionally one library crate.

7.1 Packages and Crates

Packages

  • A package is one or more crates that provide a set of functionality.
  • A package contains a Cargo.toml file that describes how to build those crates.
  • A package must contain zero or one library crates, and no more.
  • When you enter cargo new hello_cargo, it creates the package hello_cargo, and this is described in Cargo.toml file.
    • We have a package that only contains src/main.rs, meaning it only contains a binary crate named hello_cargo.

Crates

  • A crate is a binary or library.
  • The crate root is a source file that the Rust compiler starts from and makes up the root module of your crate.

Sample: Cargo.toml

[package]
name = "hello_cargo"
version = "0.1.0"
authors = ["atlex <itsme@myemail.com>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]

Conventions

  • src/main.rs is the crate root of a binary crate with the same name as the package.
  • If the package directory contains src/lib.rs, the package contains a library crate with the same name as the package, and src/lib.rs is its crate root.
  • If a package contains src/main.rs and src/lib.rs, it has two crates: a library and a binary, both with the same name as the package.
  • A package can have multiple binary crates by placing files in the src/bin directory.
  • A crate will group related functionality together in a scope so the functionality is easy to share between multiple projects.

My note: in terms of a crate, library and binary can be regarded as different elements.

My summary of chapter 7

The section was too verbose for me when it comes to practice, so I summarized some practical notes.

  • If you want to create a binary, you need src/main.rs.
    • src/main.rs handles running the program, and src/lib.rs handles all the logic of the task at hand. We can see how it works in the section 12.3.
  • You can create a module by creating src/{{ name_of_module }}.rs or src/{{ name_of_module }}/mod.rs.
    • Having both files is not allowed.
  • mod {{ name_of_module }}; declares the module and loads its contents from that file.

For details, see the following sections.

7.2 Defining Modules to Control Scope and Privacy

  • By using modules, we can group related definitions together and name why they’re related.
  • The use keyword brings a path into scope. It is written in the code that uses the item (the parent side).
  • The pub keyword makes items public. It is written where the item is defined (the child side).
  • Privacy of an item is whether item can be used by outside code (public) or is an internal implementation detail and not available for outside use (private).

Sample

cargo new --lib restaurant

restaurant
├── Cargo.toml
└── src
    └── lib.rs

Write lib.rs as follows.

mod front_of_house {
    mod hosting {
        fn add_to_waitlist() {}
        fn seat_at_table() {}
    }

    mod serving {
        fn take_order() {}
        fn serve_order() {}
        fn take_payment() {}
    }
}

Module tree in this example

crate
 └── front_of_house
     ├── hosting
     │   ├── add_to_waitlist
     │   └── seat_at_table
     └── serving
         ├── take_order
         ├── serve_order
         └── take_payment
  • If module A is contained inside module B, we say that module A is the child of module B and that module B is the parent of module A.
  • The module tree might remind you of the filesystem’s directory tree on your computer; this is a very apt comparison! Just like directories in a filesystem, you use modules to organize your code. And just like files in a directory, we need a way to find our modules.

7.3 Paths for Referring to an Item in the Module Tree

  • If we want to call a function, we need to know its path.
  • A path can take two forms:
    • An absolute path starts from a crate root (like src/lib.rs) by using a crate name or a literal crate.
    • A relative path starts from the current module and uses self, super, or an identifier in the current module.
  • Both absolute and relative paths are followed by one or more identifiers separated by double colons (::).
  • Our preference is to specify absolute paths because it’s more likely that we will move code definitions and item calls independently of each other.
  • Rust’s privacy boundary: the line that encapsulates the implementation details external code isn’t allowed to know about, call, or rely on. So, if you want to make an item like a function or struct private, you put it in a module.
  • The way privacy works in Rust is that all items (functions, methods, structs, enums, modules, and constants) are private by default.
  • Items in a parent module can’t use the private items inside child modules, but items in child modules can use the items in their ancestor modules.
  • Making the module public doesn’t make its contents public.
Sample

src/lib.rs

mod front_of_house {
    pub mod hosting {
        pub fn add_to_waitlist() {}
    }
}

pub fn eat_at_restaurant() {
    // Absolute path
    crate::front_of_house::hosting::add_to_waitlist();

    // Relative path
    front_of_house::hosting::add_to_waitlist();
}
  • We can also construct relative paths that begin in the parent module by using super at the start of the path. This is like starting a filesystem path with the .. syntax.
fn serve_order() {}

mod back_of_house {
    fn fix_incorrect_order() {
        cook_order();
        super::serve_order();
    }

    fn cook_order() {}
}
  • We think the back_of_house module and the serve_order function are likely to stay in the same relationship to each other and get moved together should we decide to reorganize the crate’s module tree. Therefore, we used super so we’ll have fewer places to update code in the future if this code gets moved to a different module.

  • If we use pub before a struct definition, we make the struct public, but the struct’s fields will still be private. We can make each field public or not on a case-by-case basis.

mod back_of_house {
    pub struct Breakfast {
        pub toast: String,
        seasonal_fruit: String,
    }

    impl Breakfast {
        pub fn summer(toast: &str) -> Breakfast {
            Breakfast {
                toast: String::from(toast),
                seasonal_fruit: String::from("peaches"),
            }
        }
    }
}
pub fn eat_at_restaurant() {
    let mut meal = back_of_house::Breakfast::summer("Rye");
    meal.toast = String::from("Wheat");
    println!("I'd like {} toast please", meal.toast);

}

We’ve defined a public back_of_house::Breakfast struct with a public toast field but a private seasonal_fruit field. This models the case in a restaurant where the customer can pick the type of bread that comes with a meal, but the chef decides which fruit accompanies the meal based on what’s in season and in stock. The available fruit changes quickly, so customers can’t choose the fruit or even see which fruit they’ll get.

  • In contrast, if we make an enum public, all of its variants are then public. We only need the pub before the enum keyword.
mod back_of_house {
    pub enum Appetizer {
        Soup,
        Salad,
    }
}

pub fn eat_at_restaurant() {
    let order1 = back_of_house::Appetizer::Soup;
    let order2 = back_of_house::Appetizer::Salad;
}

7.4 Bringing Paths into Scope with the use Keyword

  • We can bring a path into a scope once and then call the items in that path as if they’re local items with the use keyword.
mod front_of_house {
    pub mod hosting {
        pub fn add_to_waitlist() {}
    }
}

use crate::front_of_house::hosting;
//or
//use self::front_of_house::hosting;

pub fn eat_at_restaurant() {
    hosting::add_to_waitlist();
    hosting::add_to_waitlist();
    hosting::add_to_waitlist();
}

Creating Idiomatic use Paths convention

The following use is bad.

use crate::front_of_house::hosting::add_to_waitlist;

pub fn eat_at_restaurant() {
    add_to_waitlist();
    add_to_waitlist();
    add_to_waitlist();
}

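It is unclear where add_to_waitlist is defined; writing hosting::add_to_waitlist() keeps the parent module visible at the call site.

A related case: two items with the same name. Bringing both Result types into scope directly would conflict, so one of them is renamed with as.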

use std::fmt::Result;
use std::io::Result as IoResult;
  • When we bring a name into scope with the use keyword, the name available in the new scope is private. -> pub use is called re-exporting; with this syntax, external code can also use the name.

  • Note that the standard library (std) is also a crate that’s external to our package. Because the standard library is shipped with the Rust language, we don’t need to change Cargo.toml to include std. But we do need to refer to it with use to bring items from there into our package’s scope.

  • Here are more compact ways to write use (nested paths).

// old
//use std::cmp::Ordering;
//use std::io;

// New!
use std::{cmp::Ordering, io};

// How about this?
//use std::io;
//use std::io::Write;

// Here!
use std::io::{self, Write};
  • If we want to bring all public items defined in a path into scope, we can specify that path followed by *, the glob operator:
use std::collections::*;

The glob operator is often used when testing to bring everything under test into the tests module.

7.5 Separating Modules into Different Files

src/lib.rs

mod front_of_house;
pub use crate::front_of_house::hosting;
// --snip--

src/front_of_house.rs

pub mod hosting {
    pub fn add_to_waitlist() {
        // --snip--
    }
}

Using a semicolon after mod front_of_house rather than using a block tells Rust to load the contents of the module from another file with the same name as the module.

My note: sample of an available depth structure

  1. src/lib.rs: pub use crate::front_of_house::hosting
  2. src/front_of_house.rs: pub mod hosting;
  3. src/front_of_house/hosting.rs: pub fn add_to_waitlist() {}

8. Common Collections

  • Collections: a number of very useful data structures included in Rust’s standard library.
  • The data these collections point to is stored on the heap, which means the amount of data does not need to be known at compile time and can grow or shrink as the program runs.
  • Three main collections: vector, string, hash map

8.1 Storing Lists of Values with Vectors

Vector

  • Vec<T>
  • Vectors can only store values of the same type.
  • How to create a new empty vector:
    let v: Vec<i32> = Vec::new();
    
  • Rust can infer the type.
  • Rust provides the vec! macro for convenience. The macro will create a new vector that holds the values you give it.
    let v = vec![1, 2, 3];
    
  • Updating a vector (input a value to a vector) -> push
    let mut v = Vec::new();
    
    v.push(5);
    
  • A vector is freed when it goes out of scope. When the vector gets dropped, all of its contents are also dropped, meaning those integers it holds will be cleaned up.
  • There are two ways to read an element. &v[2] and v.get(2).
  • &v[2] gives a reference to the element, and v.get(2) returns Option<&T>.
  • &v[100] will cause the program to panic when it references a nonexistent element (i.e. there is no element at index 100 in v). When the get method is passed an index that is outside the vector, it returns None without panicking.
  • You would use get method if accessing an element beyond the range of the vector happens occasionally under normal circumstances.

Sample code of v.get():

let v = vec![1, 2, 3, 4, 5];

let third: &i32 = &v[2];
println!("The third element is {}", third);

match v.get(2) {
    Some(third) => println!("The third element is {}", third),
    None => println!("There is no third element."),
}
  • Mutability of elements: The following code returns a compile error at the line v.push(6);.
    let mut v = vec![1, 2, 3, 4, 5];
    let first = &v[0]; //immutable borrow
    v.push(6); //mutable borrow
    println!("The first element is: {}", first); // immutable borrow
    
    • Details about the error: adding a new element at the end of the vector might require allocating new memory and copying the old elements to the new space, if there isn’t enough room to put all the elements next to each other where the vector currently is. In that case, the reference to the first element would be pointing to deallocated memory. The borrowing rules prevent programs from ending up in that situation.
  • Note: the push and pop methods operate on the last element of the vector.
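
A short sketch of two practical consequences, under my own assumptions: element_or_zero is a hypothetical helper that uses get with a fallback value, and copying the first element out (instead of keeping a reference) lets the push compile.

```rust
// Returns the element at `index`, or 0 when the index is out of range.
fn element_or_zero(v: &[i32], index: usize) -> i32 {
    v.get(index).copied().unwrap_or(0)
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];

    // Copy the value out instead of holding a reference,
    // so no borrow of v is alive when we push.
    let first = v[0];
    v.push(6);
    println!("The first element is: {}", first);

    // Out-of-range access without a panic.
    println!("{}", element_or_zero(&v, 100));

    // pop removes and returns the last element as an Option.
    println!("{:?}", v.pop());
}
```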

Iterating over the Values in a Vector

// Just referencing
let v = vec![100, 32, 57];
for i in &v {
    println!("{}", i);
}

// Change elements
let mut v = vec![100, 32, 57];
for i in &mut v {
    *i += 50;
}

* is called the “dereference operator”; *i gives access to the value that i refers to. (Details are in Chapter 15)

There are definitely use cases for storing a list of items of different types. -> Use an enum!

enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

let row = vec![
    SpreadsheetCell::Int(3),
    SpreadsheetCell::Text(String::from("blue")),
    SpreadsheetCell::Float(10.12),
];
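
Reading such a vector back means matching on each element. A minimal sketch; the function render and its output format are hypothetical.

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

// Turns any cell into a printable description.
fn render(cell: &SpreadsheetCell) -> String {
    match cell {
        SpreadsheetCell::Int(i) => format!("int {}", i),
        SpreadsheetCell::Float(f) => format!("float {}", f),
        SpreadsheetCell::Text(s) => format!("text {}", s),
    }
}

fn main() {
    let row = vec![
        SpreadsheetCell::Int(3),
        SpreadsheetCell::Text(String::from("blue")),
        SpreadsheetCell::Float(10.12),
    ];

    for cell in &row {
        println!("{}", render(cell));
    }
}
```

Because match must be exhaustive, adding a new variant later forces every such function to handle it.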

8.2 Storing UTF-8 Encoded Text with Strings

  • Rust has only one string type in the core language, which is the string slice str that is usually seen in its borrowed form &str.
  • When Rustaceans refer to “strings” in Rust, they usually mean the String and the string slice &str types, not just one of those types.
  • Both String and a string slice &str are UTF-8 encoded.
  • We use the to_string method, which is available on any type that implements the Display trait, as string literals do.
  • Using the to_string method to create a String from a string literal.
    // the method works on a literal directly:
    let s = "initial contents".to_string();
    // same as
    let s = String::from("initial contents");
    
  • We can grow a String by using the push_str method to append a string.
    let mut s = String::from("foo");
    s.push_str("bar");
    // s ~ "foobar"
    
  • The push_str method takes a string slice because we don’t necessarily want to take ownership of the parameter. Therefore, the following code prints s2 is bar instead of failing to compile.
    let mut s1 = String::from("foo");
    let s2 = "bar";
    s1.push_str(s2); // push_str() doesn't take ownership of s2
    println!("s2 is {}", s2);
    

Concatenation with the + Operator or the format! Macro

The following code contains a lot of knowledge.

let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2;

Before discussing the code above, we should know that the + operator uses the add method, whose “signature” looks something like this (but isn’t exact):

fn add(self, s: &str) -> String {

Two discussions: let s3 = s1 + &s2;

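  1. add takes ownership of s1 (s1 becomes self in the add function), appends a copy of the contents of s2, and returns the result, which is bound to s3. s1 can no longer be used afterwards.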
  2. The + operator uses the add method, whose input is &str, not &String. The reason we’re able to use &s2 in the call to add is that the compiler can coerce the &String argument into a &str. When we call the add method, Rust uses a deref coercion, which here turns &s2 into &s2[..].
  • Tip: Append multiple Strings. With format! macro.
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");
    
    let s = format!("{}-{}-{}", s1, s2, s3);
    
    The version of the code using format! is much easier to read and doesn’t take ownership of any of its parameters. format! macro works in the same way as println!, but instead of printing the output to the screen, it returns a String with the contents.
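
A small sketch of the ownership behavior of +, assuming we want to keep the left-hand String: cloning it first leaves the original usable. The function join is a hypothetical helper with the same shape as add.

```rust
// Same shape as add: the left side is taken by value, the right side is borrowed.
fn join(left: String, right: &str) -> String {
    left + right
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");

    // clone() keeps s1 usable; s2 is only borrowed.
    let s3 = s1.clone() + &s2;
    println!("{} / {} / {}", s1, s2, s3);

    // Here s1 is moved into join and cannot be used afterwards.
    let s4 = join(s1, &s2);
    println!("{}", s4);
}
```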

Indexing into Strings

Rust doesn’t allow us to get the n-th character with an index. The following code returns a compile error.

let s1 = String::from("hello");
let h = s1[0];

The reason is…?

  • A String is a wrapper over a Vec<u8>.
    • Both String and a string slice &str are UTF-8 encoded.
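  • In UTF-8, one character can take several bytes, and in some languages one letter as a human reads it consists of several chars. The same string can therefore be viewed in three ways:
    // The u8 values of the String (18 bytes)
    [224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]
    // the same data as Unicode scalar values (char); the 4th and 6th are diacritics
    ['न', 'म', 'स', '्', 'त',  'े']
    // the same data as grapheme clusters (what we would call letters)
    ["न", "म", "स्", "ते"]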
    

Slicing Strings

Example 1. Specifying a range in bytes.

let hello = "Здравствуйте";
let s = &hello[0..4];
// s will be Зд
// &hello[0..1] panics at runtime:
// thread 'main' panicked at 'byte index 1 is not a char boundary;

Example 2. Iterating by characters.

for c in "नमस्ते".chars() {
    println!("{}", c);
}
// न
// म
// स
// ्
// त
// े

Example 3. Iterating over bytes.

for b in "नमस्ते".bytes() {
    println!("{}", b);
}
//224
//164
// --snip--
//165
//135

Be sure to remember that valid Unicode scalar values may be made up of more than 1 byte.
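My sketch of slicing without the risk of a panic. It assumes only std: str::get returns None instead of panicking on a bad boundary, and char_indices gives the byte offset where each char starts. first_chars is a hypothetical helper, not a standard-library function.

```rust
/// Returns the first `n` characters (Unicode scalar values) of `s`.
fn first_chars(s: &str, n: usize) -> &str {
    // char_indices yields the byte offset at which each char starts.
    match s.char_indices().nth(n) {
        Some((byte_offset, _)) => &s[..byte_offset],
        None => s, // fewer than n + 1 chars: return the whole string
    }
}

fn main() {
    let hello = "Здравствуйте";

    // get returns an Option instead of panicking.
    println!("{:?}", hello.get(0..4)); // Some("Зд")
    println!("{:?}", hello.get(0..1)); // None

    println!("{}", first_chars(hello, 2)); // Зд
    println!("{}", hello.chars().count()); // 12 (chars)
    println!("{}", hello.len()); // 24 (bytes)
}
```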

8.3 Storing Keys with Associated Values in Hash Maps

  • Terminology: hash ~ map ~ hash table ~ dictionary ~ associative array
  • Hashmap ~ key-value

Example:

use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
  • The type HashMap<K, V> stores a mapping of keys of type K to values of type V.
  • Just like vectors, hash maps store their data on the heap.
  • Like vectors, hash maps are homogeneous: all of the keys must have the same type, and all of the values must have the same type.
  • insert takes ownership of owned values such as String. Values of types that implement Copy, such as i32, are copied into the hash map.

Example: Combining two Vec into a HashMap.

use std::collections::HashMap;

let teams = vec![String::from("Blue"), String::from("Yellow")];
let initial_scores = vec![10, 50];

let mut scores: HashMap<_, _> =
    teams.into_iter().zip(initial_scores.into_iter()).collect();

Accessing Values in a Hash Map

Use the get method.

use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

let team_name = String::from("Blue");
let score = scores.get(&team_name);

Note that the result of scores.get(&team_name) is Some(&10) because get returns an Option<&V>; if there’s no value for that key in the hash map, get will return None.
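My sketch of two ways to handle the Option returned by get: a match, and a fallback value with copied().unwrap_or(). score_or_zero is a hypothetical helper for this note.

```rust
use std::collections::HashMap;

/// Looks up a team's score, falling back to 0 when the team is unknown.
fn score_or_zero(scores: &HashMap<String, i32>, team: &str) -> i32 {
    // get returns Option<&i32>; copied() turns it into Option<i32>.
    scores.get(team).copied().unwrap_or(0)
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);

    // Handle both variants explicitly.
    match scores.get("Blue") {
        Some(score) => println!("Blue: {}", score),
        None => println!("Blue has no score"),
    }

    // Or fall back to a default value.
    println!("Red: {}", score_or_zero(&scores, "Red"));
}
```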

Iteration

use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

for (key, value) in &scores {
    println!("{}: {}", key, value);
}

Updating a value (3 cases)

Case 1. Overwriting a value. Calling insert again with the same key replaces the old value, because each key in a HashMap is unique.

use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Blue"), 25);

println!("{:?}", scores); // {"Blue": 25}

Case 2. Only inserting a value if the key has no value. Use the entry and or_insert methods.

use std::collections::HashMap;

let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);

scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(50);

println!("{:?}", scores); // {"Yellow": 50, "Blue": 10} (the order may vary)

The entry method returns an enum called Entry that represents a value that might or might not exist.

Case 3. Updating a value based on the old value. This needs the dereference operator * (it is covered in Chap. 15; for now, just note how * is used).

use std::collections::HashMap;

fn main(){
    let text = "hello world wonderful world";

    let mut map = HashMap::new();

    for word in text.split_whitespace() {
        let count = map.entry(word).or_insert(0);
        *count += 1;
    }
    println!("{:?}", map);
}

The or_insert method actually returns a mutable reference (&mut V) to the value for this key. Here we store that mutable reference in the count variable, so in order to assign to that value, we must first dereference count using the asterisk (*).

Hashing Functions

For its hashing algorithm, HashMap uses SipHash by default (as of Apr. 2021).

My note: a slide about SipHash.

https://de.slideshare.net/ASF-WS/asfws2012-jean-philippeaumassonmartinbosslethashfloodingdosreloaded1

9. Error Handling

Rust groups errors into two major categories: recoverable and unrecoverable errors.

  • For a recoverable error, such as a file not found error, it’s reasonable to report the problem to the user and retry the operation.
  • Unrecoverable errors are always symptoms of bugs, like trying to access a location beyond the end of an array.

Rust doesn’t have exceptions. Instead, it has the type Result<T, E> for recoverable errors and the panic! macro that stops execution when the program encounters an unrecoverable error.

9.1 Unrecoverable Errors with panic!

  • When the panic! macro executes, your program will print a failure message, unwind and clean up the stack, and then quit.

There are two types of panic behavior, unwinding and aborting.

  • Unwinding: Rust walks back up the stack and cleans up the data from each function it encounters.
  • Abort: The program ends immediately without cleaning up. Memory that the program was using will then need to be cleaned up by the operating system.

Unwinding is the default, but the walking back and cleanup is a lot of work. Aborting is the alternative.

Panic example: Buffer overread

fn main() {
    let v = vec![1, 2, 3];

    v[99];
}

The key to reading the backtrace is to start from the top and read until you see files you wrote. To display the backtrace, run RUST_BACKTRACE=1 cargo run.

9.2 Recoverable Errors with Result

Recall Result enum.

enum Result<T, E> {
    Ok(T),
    Err(E),
}
  • <T, E> means “T and E are generic type parameters”.

A good error handling example: opening a file.

use std::fs::File;

fn main() {
    let f = File::open("hello.txt");

    let f = match f {
        Ok(file) => file,
        Err(error) => panic!("Problem opening the file: {:?}", error),
    };
}

Run without the file hello.txt.

$ cargo run
... (warning about _f)
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/panic`
thread 'main' panicked at 'Problem opening the file: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/main.rs:8:23
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Example: handling different kinds of errors in different ways.

use std::fs::File;
use std::io::ErrorKind;

fn main() {
    let f = File::open("hello.txt");

    let f = match f {
        Ok(file) => file,
        Err(error) => match error.kind() {
            ErrorKind::NotFound => match File::create("hello.txt") {
                Ok(fc) => fc,
                Err(e) => panic!("Problem creating the file: {:?}", e),
            },
            other_error => {
                panic!("Problem opening the file: {:?}", other_error)
            }
        },
    };
}

The following version is more concise because it avoids the nested match expressions.

unwrap_or_else: Here it is the method implemented on Result<T, E> (Option<T> has one as well). It returns the contained Ok value or computes a value from the error with a closure.

use std::fs::File;
use std::io::ErrorKind;

fn main() {
    let f = File::open("hello.txt").unwrap_or_else(|error| {
        if error.kind() == ErrorKind::NotFound {
            File::create("hello.txt").unwrap_or_else(|error| {
                panic!("Problem creating the file: {:?}", error);
            })
        } else {
            panic!("Problem opening the file: {:?}", error);
        }
    });
}

unwrap <- Used frequently (IMO)

The Result<T, E> type has many helper methods defined on it to do various tasks. One of those methods, called unwrap, is a shortcut method that is implemented just like the match expression. If the Result value is the Ok variant, unwrap will return the value inside the Ok. If the Result is the Err variant, unwrap will call the panic! macro for us.

use std::fs::File;

fn main() {
    let f = File::open("hello.txt").unwrap();
}

expect

Similar to unwrap, but it lets us also choose the panic! error message.

use std::fs::File;

fn main() {
    let f = File::open("hello.txt").expect("Failed to open hello.txt");
}

? operator

use std::fs::File;
use std::io;
use std::io::Read;

fn read_username_from_file() -> Result<String, io::Error> {
    let mut f = File::open("hello.txt")?;
    let mut s = String::new();
    f.read_to_string(&mut s)?;
    Ok(s)
}

The ? placed after a Result value works as follows:

  • If the value of the Result is an Ok, the value inside the Ok will get returned from this expression, and the program will continue.
  • If the value is an Err, the Err will be returned from the whole function so the error value gets propagated to the calling code.

Error values that have the ? operator called on them go through the from function, defined in the From trait in the standard library, which is used to convert errors from one type into another.
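My sketch of that From conversion. ConfigError, parse_port, and read_port are hypothetical names for this note; only the From trait, fs::read_to_string, and str::parse come from std.

```rust
use std::fs;
use std::io;
use std::num::ParseIntError;

// Hypothetical error type that wraps two different std error types.
#[derive(Debug)]
enum ConfigError {
    Io(io::Error),
    Parse(ParseIntError),
}

impl From<io::Error> for ConfigError {
    fn from(e: io::Error) -> Self {
        ConfigError::Io(e)
    }
}

impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::Parse(e)
    }
}

fn parse_port(text: &str) -> Result<u16, ConfigError> {
    // On Err, ? converts ParseIntError into ConfigError through From.
    let port = text.trim().parse::<u16>()?;
    Ok(port)
}

fn read_port(path: &str) -> Result<u16, ConfigError> {
    // On Err, ? converts io::Error into ConfigError through From.
    let text = fs::read_to_string(path)?;
    parse_port(&text)
}

fn main() {
    println!("{:?}", parse_port("8080"));
    println!("{:?}", parse_port("not a number"));
    println!("{:?}", read_port("no_such_file.txt"));
}
```

Both kinds of failure end up in one return type, so the caller has a single error type to match on.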

The ? operator can be used in functions that have a return type of Result. We’re only allowed to use the ? operator in a function that returns Result or Option or another type that implements std::ops::Try. When you’re writing code in a function that doesn’t return one of these types, and you want to use ? when you call other functions that return Result<T, E>, one technique is to change the return type of your function to be Result<T, E> if you have no restrictions preventing that.
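A small example of ? in a function that returns Option, along the lines of the one in the book: when the value is None, the function returns None early.

```rust
/// Returns the last character of the first line, if there is one.
fn last_char_of_first_line(text: &str) -> Option<char> {
    // next() returns Option<&str>; ? returns None early for empty text.
    text.lines().next()?.chars().last()
}

fn main() {
    println!("{:?}", last_char_of_first_line("Hello, world\nHow are you?"));
    println!("{:?}", last_char_of_first_line(""));
    println!("{:?}", last_char_of_first_line("\nhi"));
}
```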

The main function is special, and there are restrictions on what its return type must be. One valid return type for main is (), and conveniently, another valid return type is Result<T, E>.

use std::error::Error;
use std::fs::File;

fn main() -> Result<(), Box<dyn Error>> {
    let f = File::open("hello.txt")?;

    Ok(())
}

For now, you can read Box<dyn Error> to mean “any kind of error.”

Tip: Reading a file into a string

Rust provides the convenient fs::read_to_string function that opens the file, creates a new String, reads the contents of the file, puts the contents into that String, and returns it.

use std::fs;
use std::io;

fn read_username_from_file() -> Result<String, io::Error> {
    fs::read_to_string("hello.txt")
}

9.3 To panic! or Not to panic!

Returning Result is a good default choice when you’re defining a function that might fail. (My note: the caller can then decide how to handle the error, whereas panic! stops the program.)

The unwrap and expect methods are very handy when prototyping, before you’re ready to decide how to handle errors.

In tests, panic! is how a test is marked as a failure. (My note: a single panic fails the whole test function.)

panic! is often appropriate if you’re calling external code that is out of your control and it returns an invalid state that you have no way of fixing. However, when failure is expected, it’s more appropriate to return a Result than to make a panic! call.

Functions often have contracts: their behavior is only guaranteed if the inputs meet particular requirements. Panicking when the contract is violated makes sense because a contract violation always indicates a caller-side bug and it’s not a kind of error you want the calling code to have to explicitly handle. … Contracts for a function, especially when a violation will cause a panic, should be explained in the API documentation for the function.

My note: for validation, use Rust’s type system.

Creating Custom Types for Validation

We can make a new type and put the validations in a function to create an instance of the type rather than repeating the validations everywhere. That way, it’s safe for functions to use the new type in their signatures and confidently use the values they receive.

Example:

pub struct Guess {
    value: i32,
}

impl Guess {
    pub fn new(value: i32) -> Guess {
        if value < 1 || value > 100 {
            panic!("Guess value must be between 1 and 100, got {}.", value);
        }

        Guess { value }
    }

    pub fn value(&self) -> i32 {
        self.value
    }
}

pub fn value(&self) -> i32 is called a getter. This public method is necessary because the value field of the Guess struct is private.
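My variation, in case an out-of-range value is an expected failure rather than a bug: the constructor can return a Result instead of panicking. try_new is a hypothetical name, not part of the book's listing.

```rust
#[derive(Debug)]
pub struct Guess {
    value: i32,
}

impl Guess {
    /// Non-panicking constructor: returns Err for out-of-range values.
    pub fn try_new(value: i32) -> Result<Guess, String> {
        if !(1..=100).contains(&value) {
            return Err(format!(
                "Guess value must be between 1 and 100, got {}.",
                value
            ));
        }
        Ok(Guess { value })
    }

    pub fn value(&self) -> i32 {
        self.value
    }
}

fn main() {
    match Guess::try_new(200) {
        Ok(g) => println!("valid guess: {}", g.value()),
        Err(msg) => println!("invalid guess: {}", msg),
    }
}
```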

10. Generic Types, Traits, and Lifetimes

Generics are abstract stand-ins for concrete types or other properties.

Similar to the way a function takes parameters with unknown values to run the same code on multiple concrete values, functions can take parameters of some generic type instead of a concrete type, like i32 or String.

The core concept is “removing duplication by extracting a function.”

In case of a function:

  1. Identify duplicate code.
  2. Extract the duplicate code into the body of the function and specify the inputs and return values of that code in the function signature.
  3. Update the two instances of duplicated code to call the function instead.

10.1 Generic Data Types

Tips: By convention, type parameter names in Rust are short, often just a letter, and Rust’s type-naming convention is CamelCase. Short for “type,” T is the default choice of most Rust programmers.

Motivation

Practice: We combine the two functions below into one generic function.

fn largest_i32(list: &[i32]) -> &i32 {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn largest_char(list: &[char]) -> &char {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

First, define a generic function.

fn largest<T>(list: &[T]) -> &T {
  • To define a generic function, place type name declarations inside angle brackets, <>
  • This function has one parameter named list.
  • The list is a slice of values of type T

Example

fn largest<T>(list: &[T]) -> &T {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest(&number_list);
    println!("The largest number is {}", result);

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest(&char_list);
    println!("The largest char is {}", result);
}

It looks fine, but unfortunately, it fails to compile.

error[E0369]: binary operation `>` cannot be applied to type `&T`
 --> src/main.rs:5:17
  |
5 |         if item > largest {
  |            ---- ^ ------- &T
  |            |
  |            &T
  |
help: consider restricting type parameter `T`
  |
1 | fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> &T {
  |             ^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

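The root cause is that T can be any type, and not every type can be ordered. The > operator is only available for types that implement the trait std::cmp::PartialOrd, so we must restrict T to such types.

The final answer is as follows; trait bounds are covered in the next section.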

fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
    let mut largest = list[0];

    for &item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

In Struct Definitions

We can define structs to use a generic type parameter in one or more fields using the <> syntax.

struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    let integer = Point { x: 5, y: 10 };
    let float = Point { x: 1.0, y: 4.0 };
}

To define a Point struct where x and y are both generics but could have different types…

struct Point<T, U> {
    x: T,
    y: U,
}

In Enum Definitions

Recall Option from Chapter 6.

enum Option<T> {
    Some(T),
    None,
}

Recall Result from Chapter 9.

enum Result<T, E> {
    Ok(T),
    Err(E),
}

When to use generic types

When you recognize situations in your code with multiple struct or enum definitions that differ only in the types of the values they hold, you can avoid duplication by using generic types instead.

Implementation (In Method Definitions)

impl<T>. By declaring T as a generic type after impl, Rust can identify that the type in the angle brackets in Point is a generic type rather than a concrete type.

struct Point<T> {
    x: T,
    y: T,
}

impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

fn main() {
    let p = Point { x: 5, y: 10 };

    println!("p.x = {}", p.x());
}

This defines a method named x on Point<T> that returns a reference to the data in the field x.

When we write impl Point<f32> instead, the methods are implemented only for Point<f32>, not for Point<T> with any other T.
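A runnable sketch of both kinds of impl block side by side, following the book's distance_from_origin example:

```rust
struct Point<T> {
    x: T,
    y: T,
}

// Available for every Point<T>.
impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

// Available only for Point<f32>.
impl Point<f32> {
    fn distance_from_origin(&self) -> f32 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main() {
    let p = Point { x: 3.0_f32, y: 4.0_f32 };
    println!("{}", p.distance_from_origin()); // 5

    let q = Point { x: 5, y: 10 };
    println!("q.x = {}, q.y = {}", q.x(), q.y);
    // q.distance_from_origin(); // does not compile: q is a Point<i32>
}
```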

Performance of Code Using Generics

The good news is that Rust implements generics in such a way that your code doesn’t run any slower using generic types than it would with concrete types.

Monomorphization

Monomorphization is the process of turning generic code into specific code by filling in the concrete types that are used when compiled.

For example, when Rust compiles the following code, it performs monomorphization.

let integer = Some(5);
let float = Some(5.0);
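A hand-written sketch of what the compiler conceptually generates for these two lines. The names Option_i32 and Option_f64 are illustrative only (the book uses similar names); the real generated names are an internal detail of the compiler.

```rust
// Specialized copy of Option<T> for T = i32.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, PartialEq)]
enum Option_i32 {
    Some(i32),
    None,
}

// Specialized copy of Option<T> for T = f64.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, PartialEq)]
enum Option_f64 {
    Some(f64),
    None,
}

fn main() {
    let integer = Option_i32::Some(5);
    let float = Option_f64::Some(5.0);
    println!("{:?} {:?}", integer, float);
}
```

Because the specialized versions exist at compile time, there is no run-time cost for using the generic definition.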

10.2 Traits: Defining Shared Behavior

A trait tells the Rust compiler about functionality a particular type has and can share with other types.

pub trait Summary {
    fn summarize(&self) -> String;
}

Interpret as “any type that has the Summary trait will have the method summarize.”

Implementing the trait on a type

pub struct NewsArticle {
    pub headline: String,
    pub location: String,
    pub author: String,
    pub content: String,
}

impl Summary for NewsArticle {
    fn summarize(&self) -> String {
        format!("{}, by {} ({})", self.headline, self.author, self.location)
    }
}

How to use traits to define functions that accept many different types.

pub fn notify(item: &impl Summary) {
    println!("Breaking news! {}", item.summarize());
}

Instead of a concrete type for the item parameter, we specify the impl keyword and the trait name. This parameter accepts any type that implements the specified trait.
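My runnable sketch of one function accepting two unrelated types. Memo, Alert, and notification are hypothetical names; notification returns the message instead of printing it so that it is easy to check.

```rust
pub trait Summary {
    fn summarize(&self) -> String;
}

// Two unrelated hypothetical types that both implement Summary.
pub struct Memo {
    pub text: String,
}

pub struct Alert {
    pub level: u8,
}

impl Summary for Memo {
    fn summarize(&self) -> String {
        format!("memo: {}", self.text)
    }
}

impl Summary for Alert {
    fn summarize(&self) -> String {
        format!("alert level {}", self.level)
    }
}

// Accepts a reference to any type that implements Summary.
pub fn notification(item: &impl Summary) -> String {
    format!("Breaking news! {}", item.summarize())
}

fn main() {
    let memo = Memo { text: String::from("buy milk") };
    let alert = Alert { level: 3 };
    println!("{}", notification(&memo));
    println!("{}", notification(&alert));
}
```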

Trait Bound Syntax

The above is actually syntax sugar for a longer form,

pub fn notify<T: Summary>(item: &T) {
    println!("Breaking news! {}", item.summarize());
}

Multiple parameters.

// item1 and item2 may have different types
pub fn notify(item1: &impl Summary, item2: &impl Summary)
// item1 and item2 must have the same type
pub fn notify<T: Summary>(item1: &T, item2: &T)

Specifying Multiple Trait Bounds with the + Syntax

We specify in the notify definition that item must implement both Display and Summary. We can do so using the + syntax:

pub fn notify(item: &(impl Summary + Display)) {...
//or
pub fn notify<T: Summary + Display>(item: &T) {...

where clause

More readable, less cluttered.

fn some_function<T, U>(t: &T, u: &U) -> i32
    where T: Display + Clone,
          U: Clone + Debug
{

// is equal to
fn some_function<T: Display + Clone, U: Clone + Debug>(t: &T, u: &U) -> i32 {

Returning Types that Implement Traits

fn returns_summarizable() -> impl Summary {
    Tweet {
        username: String::from("horse_ebooks"),
        content: String::from(
            "of course, as you probably already know, people",
        ),
        reply: false,
        retweet: false,
    }
}

By using impl Summary for the return type, we specify that the returns_summarizable function returns some type that implements the Summary trait without naming the concrete type.

However, you can only use impl Trait if you’re returning a single type.

A simple example of trait

Here is the answer to the problem that arose in Section 10.1.

fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
    let mut largest = list[0];

    for &item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest(&number_list);
    println!("The largest number is {}", result);

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest(&char_list);
    println!("The largest char is {}", result);
}

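Implementations of a trait on any type (blanket implementations)

We can implement a trait for every type that satisfies a trait bound, including custom types (structs, enums, etc.). For example, the standard library implements ToString for any type that implements Display.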

impl<T: Display> ToString for T {
    // --snip--
}
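My sketch of the same pattern with a trait of my own. Shout is a hypothetical trait; the blanket impl gives shout() to every type that implements Display.

```rust
use std::fmt::Display;

// Hypothetical trait for illustration.
trait Shout {
    fn shout(&self) -> String;
}

// Blanket implementation: any T that implements Display gets shout().
impl<T: Display> Shout for T {
    fn shout(&self) -> String {
        format!("{}!", self.to_string().to_uppercase())
    }
}

fn main() {
    println!("{}", 42_i32.shout()); // 42!
    println!("{}", String::from("hello").shout()); // HELLO!
}
```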

My note: Trait, associated function, method

In “Rust by Example”, there are good examples of associated functions and methods.

From the reference document:

Associated functions whose first parameter is named self are called methods and may be invoked using the method call operator, for example, x.foo(), as well as the usual function call notation.

cf. How instance methods and static methods are stored in memory:

https://stackoverflow.com/questions/8376953/how-are-instance-methods-stored

https://stackoverflow.com/questions/34149386/are-static-methods-always-held-in-memory

#![allow(unused)]
fn main() {
    struct Example {
        number: i32,
    }

    impl Example {
        fn boo() {
            println!("boo! Example::boo() was called!");
        }

        fn add_number(&mut self) {
            self.number += 1;
        }

        fn get_number(&self) -> i32 {
            self.number
        }
    }

    trait Thingy {
        fn do_thingy(&self);
    }

    impl Thingy for Example {
        fn do_thingy(&self) {
            println!("doing a thing! also, number is {}!", self.number);
        }
    }

    // Test it
    let mut dummy = Example{number: 2};
    Example::boo(); // boo! Example::boo() was called!
    println!("A number of the instance dummy is {:?}",dummy.get_number()); // A number of the instance dummy is 2
    dummy.do_thingy(); // doing a thing! also, number is 2!
    //dummy.boo(); //error!
}

Traits provide us total abstraction and loose coupling.

10.3 Validating References with Lifetimes

Every reference in Rust has a lifetime, which is the scope for which that reference is valid.

Dangling reference: a reference to an object that no longer exists.

The simplest example: when println!("r: {}", r); runs, r would be a dangling reference, so the Rust compiler rejects the code:

fn main() {
    {
        let r;                // ---------+-- 'a
                              //          |
        {                     //          |
            let x = 5;        // -+-- 'b  |
            r = &x;           //  |       |
        }                     // -+       |
                              //          |
        println!("r: {}", r); //          |
    }                         // ---------+
}

'a and 'b mean the lifetimes of r and x, respectively. The code is rejected because r (lifetime 'a) refers to x, whose lifetime 'b is shorter: the referenced value doesn’t live as long as the reference.

The following function returns a compile error:

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

The longest function could return either x or y. For a call like let result = longest(string1, string2);, the compiler can’t tell whether result borrows from string1 or from string2, so it can’t determine how long result stays valid.

The reason is, the Rust compiler has a borrow checker that compares scopes to determine whether all borrows are valid. The borrow checker doesn’t know how the lifetimes of x and y relate to the lifetime of the return value of the function longest.

How can we fix it?

Lifetime Annotation Syntax

The names of lifetime parameters must start with an apostrophe (') and are usually all lowercase and very short. Most people use the name 'a. We place lifetime parameter annotations after the & of a reference,

&i32        // a reference
&'a i32     // a reference with an explicit lifetime
&'a mut i32 // a mutable reference with an explicit lifetime

The annotations are meant to tell Rust how generic lifetime parameters of multiple references relate to each other. (My note: the point is multiple references!) With this notation, we can declare that x, y, and the return value share the same lifetime 'a as follows.

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

The change means “all the references in the parameters and the return value must have the same lifetime.” In practice, it means that the lifetime of the reference returned by the longest function is the same as the smaller of the lifetimes of the references passed in. Remember, when we specify the lifetime parameters in this function signature, we’re not changing the lifetimes of any values passed in or returned. Rather, we’re specifying that the borrow checker should reject any values that don’t adhere to these constraints.

Ultimately, lifetime syntax is about connecting the lifetimes of various parameters and return values of functions. Once they’re connected, Rust has enough information to allow memory-safe operations and disallow operations that would create dangling pointers or otherwise violate memory safety.
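A usage sketch of longest, following the book's example: the result is only valid while both arguments are alive.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("long string is long");
    {
        let string2 = String::from("xyz");
        // result may borrow from either argument, so it must not
        // outlive the shorter-lived one (string2).
        let result = longest(string1.as_str(), string2.as_str());
        println!("The longest string is {}", result);
    }
    // Declaring result outside the inner block and printing it here,
    // after string2 is dropped, is rejected with E0597:
    // `string2` does not live long enough.
}
```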

You need to specify lifetime parameters for functions or structs that use references, unless the compiler can infer them through the elision rules.

Lifetime Elision

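In early versions of Rust, every reference needed an explicit lifetime annotation. The Rust team found that programmers kept writing the same annotations in a few predictable patterns, so the developers programmed these patterns into the compiler’s code so the borrow checker could infer the lifetimes in these situations and wouldn’t need explicit annotations. The patterns programmed into Rust’s analysis of references are called the lifetime elision rules.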

Lifetimes on function or method parameters are called input lifetimes, and lifetimes on return values are called output lifetimes.

The 3 rules of elision:

  1. Each parameter that is a reference gets its own lifetime parameter. A function with one parameter gets one lifetime parameter, and a function with two parameters gets two separate lifetime parameters.
  2. If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters.
  3. If there are multiple input lifetime parameters, but one of them is &self or &mut self because this is a method, the lifetime of self is assigned to all output lifetime parameters.

Example of rule 1 and rule 2:

fn first_word(s: &str) -> &str {
// Apply rule 1. Same as
fn first_word<'a>(s: &'a str) -> &str {
// Apply rule 2. Same as
fn first_word<'a>(s: &'a str) -> &'a str {

When we implement methods on a struct with lifetimes, we use the same syntax as that of generic type parameters.

My example

src/main.rs:

struct ImportantExcerpt<'a> {
    part: &'a str,
}

impl<'a> ImportantExcerpt<'a> {
    fn level(&self) -> i32 {
        3
    }
}

impl<'a> ImportantExcerpt<'a> {
    fn announce_and_return_part(&self, announcement: &str) -> &str {
        println!("Attention please: {}", announcement);
        self.part
    }
}

fn main () {
    let s1 = String::from("test1");
    let mut s2 = String::from("test2");
    let a = ImportantExcerpt{
            part: s1.as_str()
        };

    println!("{}",a.part);                    // test1
    a.announce_and_return_part(s2.as_str());  // Attention please: test2
    s2 = String::from("new test2");
    println!("{}",s2);                        // new test2
    a.announce_and_return_part(s2.as_str());  //Attention please: new test2
    println!("{}",a.level());                 // 3
}

And result:

➜ cargo run
   Compiling te v0.1.0 (/home/atlex00/rust-project/test)
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/test`
test1
Attention please: test2
new test2
Attention please: new test2
3

The Static Lifetime

One special lifetime we need to discuss is 'static, which means that this reference can live for the entire duration of the program. All string literals have the 'static lifetime,

let s: &'static str = "I have a static lifetime.";
// Same as
let s = "I have a static lifetime.";

The text of this string is stored directly in the program’s binary, which is always available. Therefore, the lifetime of all string literals is 'static.

While learning the tokio framework, I realized that my understanding matched one of the cases in the article “Common Rust Lifetime Misconceptions.”

I need to be able to tell the difference between static variables and the 'static lifetime.

Well yes, but a type with a 'static lifetime is different from a type bounded by a 'static lifetime. … T: 'static includes all &'static T however it also includes all owned types, like String, Vec, etc. The owner of some data is guaranteed that data will never get invalidated as long as the owner holds onto it, therefore the owner can safely hold onto the data indefinitely long, including up until the end of the program. …

Key Takeaways

  • T: 'static should be read as “T is bounded by a 'static lifetime”
  • if T: 'static then T can be a borrowed type with a 'static lifetime or an owned type
  • since T: 'static includes owned types that means T
    • can be dynamically allocated at run-time
    • does not have to be valid for the entire program
    • can be safely and freely mutated
    • can be dynamically dropped at run-time
    • can have lifetimes of different durations

'static as a trait bound is described in the official Rust by Example.
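My sketch of the T: 'static bound accepting an owned value. describe is a hypothetical function; the bound only says that T contains no borrows shorter than 'static.

```rust
use std::fmt::Display;

/// Accepts any value whose type contains no non-'static borrows.
fn describe<T: Display + 'static>(value: T) -> String {
    format!("got {}", value)
}

fn main() {
    // A string literal is a &'static str.
    println!("{}", describe("literal"));

    // An owned String also satisfies T: 'static, although it is
    // created at run time and dropped inside describe.
    let owned = String::from("owned");
    println!("{}", describe(owned));

    // let local = String::from("local");
    // describe(local.as_str()); // rejected: borrows a local value
}
```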

Generic Type Parameters, Trait Bounds, and Lifetimes Together

Just an example:

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";

    let result = longest_with_an_announcement(
        string1.as_str(),
        string2,
        "Today is someone's birthday!",
    );
    println!("The longest string is {}", result);
}

use std::fmt::Display;

fn longest_with_an_announcement<'a, T>(
    x: &'a str,
    y: &'a str,
    ann: T,
) -> &'a str
where
    T: Display,
{
    println!("Announcement! {}", ann);
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

➜ cargo run
   Compiling te v0.1.0 (/home/atlex00/rust-project/test)
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/test`
Announcement! Today is someone's birthday!
The longest string is abcd

11. Writing Automated Tests

11.1 How to Write Tests

A test typically performs these three actions:

  1. Set up any needed data or state.
  2. Run the code you want to test.
  3. Assert the results are what you expect.

Attribute

Attributes are metadata about pieces of Rust code. For example, derive is one of the attributes.

#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

To change a function into a test function, add #[test] on the line before fn. To test, run cargo test. When we make a new library project with Cargo, a test module with a test function in it is automatically generated for us.

#[test] annotation

This is the test module that Cargo generates by default.

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() {
        assert_eq!(2 + 2, 4);
    }
}

The #[test] attribute indicates that fn it_works is a test function.

Run the test:

$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.00s
     Running target/debug/deps/adder-6f6d09e2972de52b

running 1 test
test tests::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests adder

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

tests::it_works is the name of the generated test function. Note that measured counts benchmark tests.

Side note: Benchmarks in Rust

Because the benchmark feature isn’t available in the stable channel, you should use the nightly channel if you want to use it.

https://doc.rust-lang.org/unstable-book/library-features/test.html

rustup install nightly

You need to install the nightly channel; otherwise you’ll get an error like,

$ cargo bench
   Compiling adder v0.1.0 (/home/atlex00/rust-projects/adder)
error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:1:1
  |
1 | #![feature(test)]
  | ^^^^^^^^^^^^^^^^^

error: aborting due to previous error

src/lib.rs

#![feature(test)]

extern crate test;

pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[test]
    fn it_works() {
        assert_eq!(4, add_two(2));
    }

    #[bench]
    fn bench_add_two(b: &mut Bencher) {
        b.iter(|| add_two(2));
    }
}

Run a benchmark:

$ cargo +nightly bench
   Compiling adder v0.1.0 (/home/atlex/rust-projects/adder)
    Finished bench [optimized] target(s) in 0.60s
     Running unittests (target/release/deps/adder-8d2056bd46123ee2)

running 2 tests
test tests::it_works ... ignored
test tests::bench_add_two ... bench:           0 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured; 0 filtered out; finished in 1.08s

Rust runs our benchmark a number of times, and then takes the average.

About Doc-tests

We’ll learn about it in Chapter 14, but in a nutshell,

  • Triple slash /// starts a special comment, called a documentation comment.
  • /// supports Markdown notation.
  • Code examples in documentation comments are compiled and run automatically by cargo test.

assert! macro

We give the assert! macro an argument that evaluates to a Boolean. If the value is true, assert! does nothing and the test passes. If the value is false, the assert! macro calls the panic! macro, which causes the test to fail. You can pass a custom failure message as the second argument, followed by format arguments as with format!.
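An example of assert! with a custom message, along the lines of the greeting example in the book:

```rust
pub fn greeting(name: &str) -> String {
    format!("Hello {}!", name)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn greeting_contains_name() {
        let result = greeting("Carol");
        // The message and its arguments are printed only on failure.
        assert!(
            result.contains("Carol"),
            "Greeting did not contain name, value was `{}`",
            result
        );
    }
}

fn main() {
    println!("{}", greeting("Carol"));
}
```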

assert_eq! and assert_ne!

Under the surface, the assert_eq! and assert_ne! macros use the operators == and !=, respectively. The values being compared must implement the PartialEq and Debug traits.
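My sketch of comparing a custom type in a test. Rectangle derives both required traits; square is a hypothetical function under test.

```rust
// PartialEq enables == and !=; Debug lets the macros print the values
// when an assertion fails.
#[derive(Debug, PartialEq)]
struct Rectangle {
    width: u32,
    height: u32,
}

fn square(side: u32) -> Rectangle {
    Rectangle { width: side, height: side }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn square_has_equal_sides() {
        assert_eq!(square(3), Rectangle { width: 3, height: 3 });
        assert_ne!(square(3), Rectangle { width: 3, height: 4 });
    }
}

fn main() {
    println!("{:?}", square(3));
}
```

Without the derive line, assert_eq! on Rectangle would not compile.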

Derivable Traits

https://doc.rust-lang.org/book/appendix-03-derivable-traits.html

The derive attribute generates code that will implement a trait with its own default implementation on the type you’ve annotated with the derive syntax.

should_panic attribute

This attribute makes a test pass if the code inside the function panics.

Example:

pub struct Guess {
    value: i32,
}

impl Guess {
    pub fn new(value: i32) -> Guess {
        if value < 1 || value > 100 {
            panic!("Guess value must be between 1 and 100, got {}.", value);
        }

        Guess { value }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    #[should_panic]
    fn greater_than_100() {
        Guess::new(200);
    }
}

Tests that use should_panic can be imprecise because they only indicate that the code has caused some panic. Adding an expected parameter to the should_panic attribute makes the test more precise. The expected parameter is a substring that the panic message must contain.

...
        } else if value > 100 {
            panic!(
                "Guess value must be less than or equal to 100, got {}.",
                value
            );
        }
...
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    #[should_panic(expected = "Guess value must be less than or equal to 100")]
    fn greater_than_100() {
        Guess::new(200);
    }
}

Using Result<T, E> in Tests

A test function can also return a Result<T, E>. The test below returns Ok(()) when it passes and an Err with a String inside when it fails.

  • Writing tests so they return a Result<T, E> enables you to use the question mark operator in the body of tests.
  • You can’t use the #[should_panic] annotation on tests that use Result<T, E>. Instead, you should return an Err value directly when the test should fail.

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() -> Result<(), String> {
        if 2 + 2 == 4 {
            Ok(())
        } else {
            Err(String::from("two plus two does not equal four"))
        }
    }
}
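Here is a rough sketch of the question mark operator inside a test. The function parse_and_double is a hypothetical function under test, invented for this example.

```rust
use std::num::ParseIntError;

// Hypothetical function under test.
fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

#[test]
fn doubles_a_valid_number() -> Result<(), ParseIntError> {
    // `?` returns early with the Err value (failing the test) if parsing fails.
    assert_eq!(parse_and_double("21")?, 42);
    Ok(())
}

#[test]
fn rejects_non_numbers() {
    // To assert that an operation fails, check is_err() instead of using `?`.
    assert!(parse_and_double("abc").is_err());
}

fn main() {
    println!("{:?}", parse_and_double("21"));
}
```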

11.2 Controlling How Tests Are Run

The default behavior of the binary produced by cargo test is to run all the tests in parallel and capture output generated during test runs, preventing the output from being displayed and making it easier to read the output related to the test results.

Because the tests are running at the same time, make sure your tests don’t depend on each other or on any shared state, including a shared environment, such as the current working directory or environment variables.

If you don’t want to run the tests in parallel, use the --test-threads option, as in cargo test -- --test-threads=1. The -- here is a separator: arguments before it go to cargo test, and arguments after it go to the resulting test binary.

If we want to see printed values for passing tests as well, we can tell Rust to also show the output of successful tests at the end with --show-output.

We can pass the name of any test function to cargo test to run only that test: cargo test {{ the name of function }}, but we can’t specify the names of multiple tests in this way. We can specify part of a test name, and any test whose name matches that value will be run.

Sometimes a few specific tests can be very time-consuming to execute, so you might want to exclude them during most runs of cargo test. Use the ignore attribute to do so.

src/lib.rs

#[test]
fn it_works() {
    assert_eq!(2 + 2, 4);
}

#[test]
#[ignore]
fn expensive_test() {
    // code that takes an hour to run
}

If we want to run only the ignored tests, we can use cargo test -- --ignored.

11.3 Test Organization (I should read again when I need it in my project)

The Rust community thinks about tests in terms of two main categories: unit tests and integration tests.

Unit Tests

The convention is to create a module named tests in each file to contain the test functions and to annotate the module with cfg(test). You’ll use #[cfg(test)] to specify that they shouldn’t be included in the compiled result.

Integration Tests

To create integration tests, you first need a tests directory at the top level of our project directory, next to src. Cargo knows to look for integration test files in this directory. We don’t need to annotate any code in tests/integration_test.rs with #[cfg(test)].

Each file in the tests directory is a separate crate, so we need to bring our library into each test crate’s scope.

tests/integration_test.rs in a project adder.

use adder;

#[test]
fn it_adds_two() {
    assert_eq!(4, adder::add_two(2));
}

12. An I/O Project: Building a Command Line Program

In this tutorial, we write a clone of the grep command.

12.1 Accepting Command Line Arguments

  • The function std::env::args() returns an iterator of the command line arguments.
  • We can call the collect method on an iterator to turn it into a collection (such as a vector).
  • Note: std::env::args() will panic if any argument contains invalid Unicode. To accept arguments with invalid Unicode, use std::env::args_os instead.
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    println!("{:?}", args);
}

Result:

$ cargo run 1starg 2ndarg
   Compiling iptables_viewer v0.1.0 (/path/to/your/project)
    Finished dev [unoptimized + debuginfo] target(s) in 0.25s
     Running `target/debug/project-name 1starg 2ndarg`
["target/debug/project-name", "1starg", "2ndarg"]
  • The first value in the vector is target/debug/project-name, which is the name of our binary.
  • The first argument is referred to as &args[1] in the program.
  • Each element of args is a String, so &args[1] is a &String, which Rust coerces to &str where one is expected.
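A small sketch of reading the first argument without risking an out-of-bounds panic. The helper first_arg is an assumption for illustration; the book indexes args directly.

```rust
use std::env;

// Returns the first user-supplied argument, skipping the program name.
// `get` returns None instead of panicking when the argument is missing.
fn first_arg(args: &[String]) -> Option<&str> {
    args.get(1).map(String::as_str)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    match first_arg(&args) {
        Some(arg) => println!("first argument: {}", arg),
        None => println!("no arguments given"),
    }
}
```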

12.2 Reading a File

The following snippet would be refactored in the next section 12.3.

use std::fs;
let contents = fs::read_to_string(filename)
        .expect("Something went wrong reading the file");
println!("With text:\n{}", contents);
  • fs::read_to_string takes the filename, opens that file, and returns a std::io::Result<String> containing the file’s contents.
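Because the return value is a Result, we can also pass the error to the caller instead of panicking with expect. A minimal sketch, where count_lines and the file name poem.txt are placeholders of my own:

```rust
use std::fs;
use std::io;

// Reads a file and reports how many lines it has, passing any I/O error
// back to the caller instead of panicking.
fn count_lines(path: &str) -> io::Result<usize> {
    let contents = fs::read_to_string(path)?;
    Ok(contents.lines().count())
}

fn main() {
    // "poem.txt" is a placeholder file name.
    match count_lines("poem.txt") {
        Ok(n) => println!("{} lines", n),
        Err(e) => eprintln!("could not read file: {}", e),
    }
}
```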

12.3 Refactoring to Improve Modularity and Error Handling

I’ve learned general programming concepts in this chapter.

In a nutshell: main.rs handles running the program, and lib.rs handles all the logic of the task at hand.

Here are the reasons:

  1. If we continue to grow our program inside main, the number of separate tasks the main function handles will increase.
  2. The more variables we have in scope, the harder it will be to keep track of the purpose of each. It’s best to group the configuration variables into one structure to make their purpose clear.
  3. The error message Something went wrong reading the file is not clear.
  4. It would be best if all the error-handling code were in one place so future maintainers had only one place to consult in the code if the error-handling logic needed to change.
  • The Rust community has developed a process to use as a guideline for splitting the separate concerns of a binary program when main starts getting large.
    • Split your program into a main.rs and a lib.rs and move your program’s logic to lib.rs.
    • As long as your command line parsing logic is small, it can remain in main.rs.
    • When the command line parsing logic starts getting complicated, extract it from main.rs and move it to lib.rs.
  • The responsibilities that remain in the main function after this process should be limited to the following:
    • Calling the command line parsing logic with the argument values
    • Setting up any other configuration
    • Calling a run function in lib.rs
    • Handling the error if run returns an error

Based on these best practices, we can start by:

  • Extracting the argument parser (parse_config function)
  • Grouping configuration values (Config struct)

Note: Using primitive values when a complex type would be more appropriate is an anti-pattern known as primitive obsession.

This is the refactored version:

use std::env;
use std::fs;

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = parse_config(&args);

    println!("Searching for {}", config.query);
    println!("In file {}", config.filename);

    let contents = fs::read_to_string(config.filename)
        .expect("Something went wrong reading the file");

    println!("With text:\n{}", contents);
}

struct Config {
    query: String,
    filename: String,
}

fn parse_config(args: &[String]) -> Config {
    let query = args[1].clone();
    let filename = args[2].clone();

    Config { query, filename }
}

If you create a file foo.txt:

➜ cargo run ar1 foo.txt
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/minigrep ar1 foo.txt`
Searching for ar1
In file foo.txt
With text:
I'm in foo.txt.

Many Rustaceans tend to avoid using clone to fix ownership problems because of its runtime cost. We will learn a more efficient way in Chapter 13.

The next improvements are:

  • Turning parse_config into a constructor, Config::new. This change makes the code more idiomatic.
  • Improving the error handling.
  • Returning a Result from the constructor instead of calling panic!, so that the main function can exit the process more cleanly in the error case.
impl Config {
    fn new(args: &[String]) -> Result<Config, &str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let filename = args[2].clone();

        Ok(Config { query, filename })
    }
}

// --snip--

    let config = Config::new(&args).unwrap_or_else(|err| {
        println!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

unwrap_or_else is a method on Result<T, E>: if the value is Ok, it returns the inner value, and if the value is an Err, it calls the code in the closure, which is an anonymous function we define and pass as an argument to unwrap_or_else.
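To see unwrap_or_else in isolation, here is a small sketch that falls back to a default value instead of exiting. The function port_or_default and the default 8080 are assumptions for the example.

```rust
// Parses a port number, falling back to a default when parsing fails.
// The closure runs only in the Err case and receives the error value.
fn port_or_default(input: &str) -> u16 {
    input.parse::<u16>().unwrap_or_else(|err| {
        eprintln!("invalid port {:?} ({}), using 8080", input, err);
        8080
    })
}

fn main() {
    println!("{}", port_or_default("9000"));
    println!("{}", port_or_default("not-a-port"));
}
```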

Next, following the remaining best practice, we’ll create a run function.

Calling a run function in lib.rs

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.filename)?;

    println!("With text:\n{}", contents);

    Ok(())
}

Box<dyn Error> is called a trait object, and we will review it in Chapter 17. For now, we can understand that Box<dyn Error> means the function will return a type that implements the Error trait, but we don’t have to specify what particular type the return value will be.

Recall that ? returns Err from the whole function so the error value gets propagated.
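A sketch of why Box<dyn Error> is convenient: two different error types can be propagated with ? from the same function. The function read_number and the file name number.txt are hypothetical.

```rust
use std::error::Error;
use std::fs;

// io::Error (from read_to_string) and ParseIntError (from parse) are
// different types, but `?` converts both into Box<dyn Error>.
fn read_number(path: &str) -> Result<i64, Box<dyn Error>> {
    let text = fs::read_to_string(path)?;
    let n = text.trim().parse::<i64>()?;
    Ok(n)
}

fn main() {
    // "number.txt" is a placeholder file name.
    match read_number("number.txt") {
        Ok(n) => println!("read {}", n),
        Err(e) => eprintln!("error: {}", e),
    }
}
```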

This Ok(()) syntax might look a bit strange at first, but using () like this is the idiomatic way to indicate that we’re calling run for its side effects only; it doesn’t return a value we need.

If a function returns () in the success case (wrapped as Ok(())) and we only care about detecting an error, we can use if let rather than unwrap_or_else.

The last refactoring is splitting code into a library crate. And here is the final result of the section.

src/lib.rs:

use std::error::Error;
use std::fs;

pub struct Config {
    pub query: String,
    pub filename: String,
}

impl Config {
    pub fn new(args: &[String]) -> Result<Config, &str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let filename = args[2].clone();

        Ok(Config { query, filename })
    }
}

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.filename)?;

    println!("With text:\n{}", contents);

    Ok(())
}

src/main.rs:

use std::env;
use std::process;

use minigrep::Config;

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::new(&args).unwrap_or_else(|err| {
        println!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

    println!("Searching for {}", config.query);
    println!("In file {}", config.filename);

    if let Err(e) = minigrep::run(config) {
        println!("Application error: {}", e);

        process::exit(1);
    }
}

12.4 Developing the Library’s Functionality with Test-Driven Development

In this chapter, the TDD process is

  1. Write a test that fails and run it to make sure it fails for the reason you expect.
  2. Write or modify just enough code to make the new test pass.
  3. Refactor the code you just added or changed and make sure the tests continue to pass.
  4. Repeat from step 1!

Before starting the TDD process, delete the println! lines that are no longer needed.

Writing a Failing Test

In src/lib.rs:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn one_result() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }
}

Here the test calls the function search, which is not defined yet, so the test won’t even compile. That is fine at this step (this is TDD).

Writing Code to Pass the Test

In src/lib.rs:

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}

This might be a good time to review the lifetime chapter. Then use search from run() (this part will be refactored in Chapter 13):

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.filename)?;

    for line in search(&config.query, &contents) {
        println!("{}", line);
    }

    Ok(())
}

12.5 Working with Environment Variables

Add a new test case for a search_case_insensitive function. We want to use this function when a particular environment variable is set. The way to make the search case insensitive is to convert both the query and each line to lowercase before comparing them (for now, we don’t worry about handling every Unicode character correctly).

In mod tests of src/lib.rs:

#[test]
fn case_insensitive() {
    let query = "rUsT";
    let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";

    assert_eq!(
        vec!["Rust:", "Trust me."],
        search_case_insensitive(query, contents)
    );
}

Implement the function:

pub fn search_case_insensitive<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    let query = query.to_lowercase();
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.to_lowercase().contains(&query) {
            results.push(line);
        }
    }

    results
}
  • We shadowed query, and the type of query is String (because of to_lowercase method).

Now, add an environment variable part.

Change Config struct:

pub struct Config {
    pub query: String,
    pub filename: String,
    pub case_sensitive: bool,
}

Change the run function (control flow):

let results = if config.case_sensitive {
    search(&config.query, &contents)
} else {
    search_case_insensitive(&config.query, &contents)
};

for line in results {
    println!("{}", line);
}

Read an environment variable in the constructor:

let query = args[1].clone();
let filename = args[2].clone();
let case_sensitive = env::var("CASE_INSENSITIVE").is_err();

Ok(Config {
    query,
    filename,
    case_sensitive,
})
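  • The CASE_INSENSITIVE environment variable can be set to any value; we only check whether it is set.
  • is_err does not unwrap the Result. It returns true if the Result is an Err (here, when the variable is not set), so case_sensitive is true by default.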
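The same check can be pulled into a tiny function to make the logic easier to see. This is only a sketch; the function name case_sensitive_unless_set is my own.

```rust
use std::env;

// Mirrors the logic in Config::new: env::var returns Err when the
// variable is not set (or is not valid Unicode), so the search stays
// case sensitive unless the variable is present.
fn case_sensitive_unless_set(var_name: &str) -> bool {
    env::var(var_name).is_err()
}

fn main() {
    let case_sensitive = case_sensitive_unless_set("CASE_INSENSITIVE");
    println!("case sensitive: {}", case_sensitive);
}
```

Running the program as CASE_INSENSITIVE=1 cargo run makes the function return false.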

The next section is the last section of the chapter, so I’ll paste the final result at the end of the chapter.

12.6 Writing Error Messages to Standard Error Instead of Standard Output

One thing to learn: eprintln! prints a message to standard error (stderr), whereas println! prints to standard output (stdout).

Here is the final result:

src/main.rs:

use std::env;
use std::process;

use minigrep::Config;

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::new(&args).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {}", err);
        process::exit(1);
    });


    if let Err(e) = minigrep::run(config) {
        eprintln!("Application error: {}", e);

        process::exit(1);
    }
}

src/lib.rs:

use std::error::Error;
use std::fs;
use std::env;

pub struct Config {
    pub query: String,
    pub filename: String,
    pub case_sensitive: bool,
}

impl Config {
    pub fn new(args: &[String]) -> Result<Config, &str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let filename = args[2].clone();
        let case_sensitive = env::var("CASE_INSENSITIVE").is_err();

        Ok(Config {
            query,
            filename,
            case_sensitive,
        })
    }
}

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.filename)?;

    let results = if config.case_sensitive {
        search(&config.query, &contents)
    } else {
        search_case_insensitive(&config.query, &contents)
    };

    for line in results {
        println!("{}", line);
    }

    Ok(())
}

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}

pub fn search_case_insensitive<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    let query = query.to_lowercase();
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.to_lowercase().contains(&query) {
            results.push(line);
        }
    }

    results
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn one_result() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }

    #[test]
    fn case_insensitive() {
        let query = "rUsT";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";

        assert_eq!(
            vec!["Rust:", "Trust me."],
            search_case_insensitive(query, contents)
        );
    }
}

If you want to do logging properly, use the log crate.

13. Functional Language Features: Iterators and Closures

Programming in a functional style often includes using functions as values by passing them in arguments, returning them from other functions, assigning them to variables for later execution, and so forth.

13.1 Closures: Anonymous Functions that Can Capture Their Environment

An example of a closure.

let expensive_closure = |num| {
    println!("calculating slowly...");
    thread::sleep(Duration::from_secs(2));
    num
};
  • To define a closure, we start with a pair of vertical pipes (|), inside which we specify the parameters to the closure.
  • Unlike functions, closures can capture values from the scope in which they’re defined.

We can use the closure like this:

let i: i32 = 5;
println!("{}", expensive_closure(i));

We don’t need to annotate the types of a closure. The Rust compiler infers the types of its parameters and its return value. However, a closure definition has exactly one concrete type inferred for each parameter and for the return value, so calling the same closure with two different argument types is an error.

But we can also define types explicitly:

let expensive_closure = |num: u32| -> u32 {
    println!("calculating slowly...");
    thread::sleep(Duration::from_secs(2));
    num
};

Memoization, lazy evaluation

  • We can create a struct that will hold the closure and the resulting value of calling the closure (not to calculate expensive code multiple times).

  • We need to specify the type of the closure, because a struct definition needs to know the types of each of its fields.

  • Example:

    struct Cacher<T>
    where
        T: Fn(u32) -> u32,
    {
        calculation: T,
        value: Option<u32>,
    }
    
    • The Cacher struct has a calculation field of the generic type T.
    • The trait bounds on T specify that it’s a closure by using the Fn trait.
    • Any closure we want to store in the calculation field must have one u32 parameter (specified within the parentheses after Fn), and must return a u32 (specified after the ->).

Fn Traits

All closures implement at least one of the traits: Fn, FnMut, or FnOnce.

  • FnOnce consumes the variables it captures from its enclosing scope, known as the closure’s environment. To consume the captured variables, the closure must take ownership of these variables and move them into the closure when it is defined. The Once part of the name represents the fact that the closure can’t take ownership of the same variables more than once, so it can be called only once.
  • FnMut can change the environment because it mutably borrows values.
  • Fn borrows values from the environment immutably.
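To make the three traits more concrete, here is a sketch with one helper per trait. The helper names call_fn, call_fn_mut, and call_fn_once are made up for this example.

```rust
// Fn: the closure only reads its environment, so it can be called repeatedly.
fn call_fn<F: Fn() -> usize>(f: F) -> usize {
    f() + f()
}

// FnMut: the closure may change its environment on each call.
fn call_fn_mut<F: FnMut()>(mut f: F) {
    f();
    f();
}

// FnOnce: the closure may consume what it captured, so it is called once.
fn call_fn_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

fn main() {
    let names = vec!["a", "b", "c"];
    // Borrows `names` immutably.
    println!("{}", call_fn(|| names.len()));

    let mut counter = 0;
    // Borrows `counter` mutably.
    call_fn_mut(|| counter += 1);
    println!("{}", counter);

    let greeting = String::from("hello");
    // Moves `greeting` into the closure and then out of it when called.
    println!("{}", call_fn_once(move || greeting));
}
```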

Implement the example:

impl<T> Cacher<T>
where
    T: Fn(u32) -> u32,
{
    fn new(calculation: T) -> Cacher<T> {
        Cacher {
            calculation,
            value: None,
        }
    }

    fn value(&mut self, arg: u32) -> u32 {
        match self.value {
            Some(v) => v,
            None => {
                let v = (self.calculation)(arg);
                self.value = Some(v);
                v
            }
        }
    }
}

And use it:

fn generate_workout(intensity: u32, random_number: u32) {
    let mut expensive_result = Cacher::new(|num| {
        println!("calculating slowly...");
        thread::sleep(Duration::from_secs(2));
        num
    });

    if intensity < 25 {
        println!("Today, do {} pushups!", expensive_result.value(intensity));
        println!("Next, do {} situps!", expensive_result.value(intensity));
    } else {
        if random_number == 3 {
            println!("Take a break today! Remember to stay hydrated!");
        } else {
            println!(
                "Today, run for {} minutes!",
                expensive_result.value(intensity)
            );
        }
    }
}
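One limitation of the Cacher above is that it returns the first computed value for every later argument. A possible variation, sketched here as my own assumption rather than code from the book, keeps one cached result per argument in a HashMap.

```rust
use std::collections::HashMap;

struct Cacher<T>
where
    T: Fn(u32) -> u32,
{
    calculation: T,
    // One cached result per argument instead of a single Option<u32>.
    values: HashMap<u32, u32>,
}

impl<T> Cacher<T>
where
    T: Fn(u32) -> u32,
{
    fn new(calculation: T) -> Cacher<T> {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value(&mut self, arg: u32) -> u32 {
        // Return the cached result if this argument was seen before.
        if let Some(&cached) = self.values.get(&arg) {
            return cached;
        }
        let computed = (self.calculation)(arg);
        self.values.insert(arg, computed);
        computed
    }
}

fn main() {
    let mut doubler = Cacher::new(|n| n * 2);
    println!("{}", doubler.value(1));
    println!("{}", doubler.value(2));
    println!("{}", doubler.value(1)); // served from the cache
}
```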

Closures have an additional capability that functions don’t have: they can capture their environment and access variables from the scope in which they’re defined.

Capturing the Environment with Closures

The following snippet fails to compile because equal_to_x is a function, not a closure, and functions can’t capture variables from their environment.

fn main() {
    let x = 4;

    fn equal_to_x(z: i32) -> bool {
        z == x
    }

    let y = 4;

    assert!(equal_to_x(y));
}
error[E0434]: can't capture dynamic environment in a fn item
 --> src/main.rs:5:14
  |
5 |         z == x
  |              ^
  |
  = help: use the `|| { ... }` closure form instead

Here is the closure version, which compiles because the closure captures x:

fn main() {
    let x = 4;

    let equal_to_x = |z| z == x;

    let y = 4;

    assert!(equal_to_x(y));
}

If you want to force the closure to take ownership of the values it uses in the environment, you can use the move keyword before the parameter list.

Here is the move example (it fails to compile because x has been moved into the closure and can’t be used afterwards):

fn main() {
    let x = vec![1, 2, 3];

    let equal_to_x = move |z| z == x;

    println!("can't use x here: {:?}", x);

    let y = vec![1, 2, 3];

    assert!(equal_to_x(y));
}
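A typical place where move is needed is when passing a closure to a new thread. The following is a minimal sketch; sum_in_thread is a hypothetical function for illustration.

```rust
use std::thread;

// thread::spawn requires a closure that owns its data, because the new
// thread may outlive the function that created it.
fn sum_in_thread(numbers: Vec<i32>) -> i32 {
    let handle = thread::spawn(move || numbers.iter().sum::<i32>());
    handle.join().expect("worker thread panicked")
}

fn main() {
    println!("{}", sum_in_thread(vec![1, 2, 3]));
}
```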

13.2 Processing a Series of Items with Iterators

In Rust, iterators are lazy, meaning they have no effect until you call methods that consume the iterator to use it up.

We can create an iterator from a Vec<T> explicitly:

let v1 = vec![1, 2, 3];
let v1_iter = v1.iter();

The Iterator Trait and the next Method

The definition of the Iterator trait in the standard library looks like this:

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;

    // methods with default implementations elided
}
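Implementing next is all a type needs in order to become an iterator. Here is a sketch along the lines of the Counter example in the book, which counts from 1 to 5:

```rust
struct Counter {
    count: u32,
}

impl Counter {
    fn new() -> Counter {
        Counter { count: 0 }
    }
}

impl Iterator for Counter {
    type Item = u32;

    // Yields 1, 2, 3, 4, 5 and then None forever.
    fn next(&mut self) -> Option<Self::Item> {
        if self.count < 5 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

fn main() {
    // The default methods (collect, sum, map, ...) come for free.
    let values: Vec<u32> = Counter::new().collect();
    println!("{:?}", values);
}
```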

Methods that Consume the Iterator

Methods that call next are called consuming adaptors, because calling them uses up the iterator. One example of a consuming adaptor is the sum() method. Because sum() takes ownership of the iterator, you can’t use the iterator after calling it.

Methods that Produce Other Iterators

Other methods, known as iterator adaptors, allow you to change iterators into different kinds of iterators. The map method, which takes a closure to call on each item to produce a new iterator, is an example. But because all iterators are lazy, you have to call one of the consuming adaptor methods to get results from calls to iterator adaptors. The collect() method consumes the iterator and collects the resulting values into a collection data type.

Here is a good snippet showing how to use iter, map, and collect:

let v1: Vec<i32> = vec![1, 2, 3];
let v2: Vec<_> = v1.iter().map(|x| x + 1).collect();
assert_eq!(v2, vec![2, 3, 4]);
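To convince myself that iterators really are lazy, here is a small experiment that counts how many times the closure passed to map is called. The function count_map_calls is my own sketch.

```rust
// Counts how often the closure given to `map` actually runs.
fn count_map_calls(consume: bool) -> usize {
    let mut calls = 0;
    let v = vec![1, 2, 3];
    {
        // Creating the adaptor does not call the closure yet.
        let mapped = v.iter().map(|x| {
            calls += 1;
            x + 1
        });
        if consume {
            // collect drives the iterator, so the closure runs per item.
            let _collected: Vec<i32> = mapped.collect();
        }
    }
    calls
}

fn main() {
    println!("without collect: {}", count_map_calls(false));
    println!("with collect: {}", count_map_calls(true));
}
```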

13.3 Improving Our I/O Project

Refactor the following components using iterators:

  1. struct Config
  2. pub fn search
  3. main function accordingly

Config before:

impl Config {
    pub fn new(args: &[String]) -> Result<Config, &str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let filename = args[2].clone();
        let case_sensitive = env::var("CASE_INSENSITIVE").is_err();

        Ok(Config {
            query,
            filename,
            case_sensitive,
        })
    }
}

Config after:

impl Config {
    pub fn new(mut args: env::Args) -> Result<Config, &'static str> {
        args.next();

        let query = match args.next() {
            Some(arg) => arg,
            None => return Err("Didn't get a query string"),
        };

        let filename = match args.next() {
            Some(arg) => arg,
            None => return Err("Didn't get a file name"),
        };

        let case_sensitive = env::var("CASE_INSENSITIVE").is_err();

        Ok(Config {
            query,
            filename,
            case_sensitive,
        })
    }
}
  • We eliminated the clones from the constructor.
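  • After refactoring, we no longer index into a slice; we use the iterator instead.
  • Note that calling next mutates the iterator, which is why args is declared with mut.
  • The return type of the constructor now needs an explicit 'static lifetime. Previously, the lifetime of &str was inferred from the &[String] parameter, but env::Args is an owned value with no lifetime to borrow from. If you omit the 'static, the compiler reports the error below: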
    error[E0106]: missing lifetime specifier
      --> src/lib.rs:12:55
       |
    12 |     pub fn new(mut args: env::Args) -> Result<Config, &str> {
       |                                                       ^ expected named lifetime parameter
       |
       = help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
    

pub fn search before:

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}

pub fn search after:

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines()
        .filter(|line| line.contains(query))
        .collect()
}

filter creates an iterator which uses a closure to determine if an element should be yielded. Given an element the closure must return true or false.
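The same idea should carry over to search_case_insensitive. The notes don’t show that refactoring, so treat the following as a sketch of one way to write it:

```rust
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    // Lowercase the query once, outside the closure.
    let query = query.to_lowercase();

    contents
        .lines()
        .filter(|line| line.to_lowercase().contains(&query))
        .collect()
}

fn main() {
    let contents = "Rust:\nsafe, fast, productive.\nPick three.\nTrust me.";
    for line in search_case_insensitive("rUsT", contents) {
        println!("{}", line);
    }
}
```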

And a minor change to main:

fn main() {
    let config = Config::new(env::args()).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {}", err);
        process::exit(1);
    });
    // -- snip --

13.4 Comparing Performance: Loops vs. Iterators

Answer to the title: Iterators, although a high-level abstraction, get compiled down to roughly the same code as if you’d written the lower-level code yourself.

TL;DR: The implementations of closures and iterators are such that runtime performance is not affected. This is part of Rust’s goal to strive to provide zero-cost abstractions.

Unrolling is an optimization that removes the overhead of the loop controlling code and instead generates repetitive code for each iteration of the loop. The Rust compiler unrolls some iteration code when it optimizes.

14. More About Cargo and Crates.io

14.1 Customizing Builds with Release Profiles

Cargo has two main profiles, dev and release. You can define the profile-specific configurations in the Cargo.toml file. Here is an example of how to set the optimization level in the file (the values shown are the defaults):

[profile.dev]
opt-level = 0

[profile.release]
opt-level = 3

You can find more about profiles in the Cargo book.

14.2 Publishing a Crate to Crates.io

Before publishing, we should write documentation. Documentation is written in triple-slash comments /// (documentation comments), and code examples inside them are run as doc-tests.

cargo doc generates the documentation, and cargo doc --open generates it and opens it in a browser locally.

Commonly Used Sections

  • Examples
  • Panics
  • Errors
  • Safety

Commenting Contained Items

//! comments document the item that contains them, rather than the item that follows them. We often use these comments in src/lib.rs, which is the crate root, to describe the entire crate.
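A sketch of how //! and /// can be combined, including two of the commonly used sections. The module helpers and its functions are hypothetical; in a real crate the //! lines would usually sit at the top of src/lib.rs.

```rust
pub mod helpers {
    //! Small helper functions.
    //!
    //! This inner doc comment documents the `helpers` module itself,
    //! because the module is the item that contains it.

    /// Splits `total` into `parts` equal shares, rounding down.
    ///
    /// # Panics
    ///
    /// Panics if `parts` is zero.
    pub fn split_evenly(total: u32, parts: u32) -> u32 {
        assert!(parts != 0, "parts must not be zero");
        total / parts
    }

    /// Parses a count from text, ignoring surrounding whitespace.
    ///
    /// # Errors
    ///
    /// Returns an error if `input` is not a valid unsigned integer.
    pub fn parse_count(input: &str) -> Result<u32, std::num::ParseIntError> {
        input.trim().parse()
    }
}

fn main() {
    println!("{}", helpers::split_evenly(10, 2));
    println!("{:?}", helpers::parse_count(" 7 "));
}
```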

Exporting a Convenient Public API with pub use

If you re-export items with pub use self::{{ your_custom_module }}::{{ item }}, they are listed in the “Re-exports” section of the generated documentation, and users can import them from the crate root instead of from the nested module.

This section isn’t so critical, so I don’t leave a note. If I need to publish an API, I’ll refer to the documentation directly.

Publish

  1. Create an account on crates.io. I’m using my GitHub account.
  2. Go to https://crates.io/settings/tokens and get a token.
  3. Run cargo login {{ your_token }}. The command stores your token in $HOME/.cargo/credentials.
  4. Describe metadata (package.{name, version, license, description, etc.}) in Cargo.toml.
  5. Run cargo publish. (Done!)

14.3 Cargo Workspaces

The workspaces feature enables us to split a project into multiple packages that are developed together.

A workspace is a set of packages that share the same Cargo.lock and output directory.

Here is a sample workspace structure:

$ tree -I target
.
├── adder
│   ├── Cargo.toml
│   └── src
│       └── main.rs
├── add-one
│   ├── Cargo.toml
│   └── src
│       └── lib.rs
├── Cargo.lock
└── Cargo.toml

4 directories, 6 files

Cargo.toml:

[workspace]

members = [
    "adder",
    "add-one",
]

adder/Cargo.toml:

[package]
name = "adder"
version = "0.1.0"
edition = "2018"

[dependencies]
add-one = { path = "../add-one" }

add-one/Cargo.toml:

[package]
name = "add-one"
version = "0.1.0"
edition = "2018"

[dependencies]
rand = "0.8.3"

add-one/src/lib.rs:

pub fn add_one(x: i32) -> i32 {
    x + 1
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        assert_eq!(3, add_one(2));
    }
}

adder/src/main.rs:

use add_one;

fn main() {
    let num = 10;
    println!(
        "Hello, world! {} plus one is {}!",
        num,
        add_one::add_one(num)
    );
}

Let’s run cargo run:

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/adder`
Hello, world! 10 plus one is 11!
  • The entry point of cargo run is fn main() in adder/src/main.rs, because adder is the only binary crate in the workspace.
  • We declared the rand crate as a dependency in add-one/Cargo.toml. If you want to use rand in the adder package as well, you have to add it explicitly to adder/Cargo.toml.
  • The workspace can define another scope.

14.4 Installing Binaries from Crates.io with cargo install

cargo install {{ name_of_binary_on_crate.io }}

The default location of the cargo binaries is $HOME/.cargo/bin.

14.5 Extending Cargo with Custom Commands

If a binary named cargo-something is in your $PATH, you can run it as cargo something, as if it were a Cargo subcommand.

15. Smart Pointers

A pointer is a general concept for a variable that contains an address in memory. The most common kind of pointer in Rust is a reference.

Smart pointers, on the other hand, are data structures that not only act like a pointer but also have additional metadata and capabilities.

One example that we’ll explore in this chapter is the reference counting smart pointer type (in 15.4). This pointer enables you to have multiple owners of data by keeping track of the number of owners and, when no owners remain, cleaning up the data.

In many cases, smart pointers own the data they point to.

Actually, we’ve already encountered a few smart pointers in this book, such as String and Vec<T>.

Smart pointers are usually implemented using structs. The characteristic that distinguishes a smart pointer from an ordinary struct is that smart pointers implement the Deref and Drop traits.
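To get a feeling for the two traits, here is a sketch of a tiny wrapper type in the spirit of the MyBox example from the book. Note that this MyBox stores its value inline rather than on the heap; it only imitates the pointer-like behavior. The greet function is an assumption for the demo.

```rust
use std::ops::Deref;

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

// Deref lets `*my_box` and deref coercion reach the inner value.
impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

// Drop runs custom cleanup code when the value goes out of scope.
impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        println!("dropping MyBox");
    }
}

fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let boxed = MyBox::new(String::from("Rust"));
    // Deref coercion: &MyBox<String> -> &String -> &str
    println!("{}", greet(&boxed));
}
```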

We’ll cover the most common smart pointers in the standard library:

  • Box<T> for allocating values on the heap
  • Rc<T>, a reference counting type that enables multiple ownership
  • Ref<T> and RefMut<T>, accessed through RefCell<T>, a type that enforces the borrowing rules at runtime instead of compile time
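As a quick preview of how Rc<T> and RefCell<T> work together, here is a sketch in which two owners share one mutable vector. The function shared_push is made up for the example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Two Rc handles own the same RefCell; either one can mutate the data,
// and the borrowing rules are checked at runtime.
fn shared_push() -> (usize, Vec<i32>) {
    let shared = Rc::new(RefCell::new(vec![1, 2]));
    let other = Rc::clone(&shared); // bumps the reference count, no deep copy

    other.borrow_mut().push(3); // RefMut<Vec<i32>>

    let owners = Rc::strong_count(&shared);
    let contents = shared.borrow().to_vec(); // Ref<Vec<i32>>
    (owners, contents)
}

fn main() {
    let (owners, contents) = shared_push();
    println!("{} owners see {:?}", owners, contents);
}
```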

My note: why the smart pointers are important to learn?

Rust is designed with memory safety in mind. Suppose your company needs to create its own database system for some reason (say, the company doesn’t want to use third-party database services). If you implemented a relational database in Rust, these pointers would probably be used frequently.

15.1 Using Box<T> to Point to Data on the Heap

Box<T> allows you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.

You’ll use them most often in these situations:

  • When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
  • When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
  • When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type

Sideway: Memory allocation about Vec

My note: at this point, I wondered how Rust allocates memory when I manipulate a Vec. I found a good post on this topic.

https://markusjais.com/unterstanding-rusts-vec-and-its-capacity-for-fast-and-efficient-programs/ <- the page was removed somehow…🤔 I found a good criticism on the post.

Cite from the official document of std::vec::Vec:

The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector. This is not to be confused with the length of a vector, which specifies the number of actual elements within the vector. If a vector’s length exceeds its capacity, its capacity will automatically be increased, but its elements will have to be reallocated. For example, a vector with capacity 10 and length 0 would be an empty vector with space for 10 more elements. Pushing 10 or fewer elements onto the vector will not change its capacity or cause reallocation to occur. However, if the vector’s length is increased to 11, it will have to reallocate, which can be slow. For this reason, it is recommended to use Vec::with_capacity whenever possible to specify how big the vector is expected to get.

fn main() {
    let v: Vec<i32> = Vec::new();
    println!("{:?}",v.capacity());  // 0
    println!("{:?}",v.len());       // 0

    let v2: Vec<i32> = Vec::with_capacity(5);
    println!("{:?}",v2.capacity()); // 5
    println!("{:?}",v2.len());      // 0
}

After some googling, I also found a good explanation here. (Thank you u/matthieum!)
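My note: a minimal sketch of the reallocation behavior quoted above. The function name capacity_trace is my own, and the exact capacity after growing is an implementation detail, so I only check that it became large enough.

```rust
// Returns the capacity (1) right after with_capacity(10),
// (2) after pushing 10 elements, (3) after pushing an 11th element.
fn capacity_trace() -> (usize, usize, usize) {
    let mut v: Vec<i32> = Vec::with_capacity(10);
    let initial = v.capacity();

    for i in 0..10 {
        v.push(i); // fits in the space reserved up front, no reallocation
    }
    let after_ten = v.capacity();

    v.push(10); // length would exceed the capacity, so the vector has to grow
    let after_eleven = v.capacity();

    (initial, after_ten, after_eleven)
}

fn main() {
    let (initial, after_ten, after_eleven) = capacity_trace();
    println!("initial: {}", initial);
    println!("after 10 pushes: {}", after_ten);
    println!("after 11 pushes: {}", after_eleven);
}
```

The first two numbers are equal; the third one is at least 11.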

Using a Box<T> to Store Data on the Heap

A box is not used this way very often, but it serves an educational purpose.

fn main() {
    let b = Box::new(5);
    println!("b = {}", b);
}

When a box goes out of scope, as b does at the end of main, it will be deallocated. The deallocation happens both for the box (stored on the stack) and the data it points to (stored on the heap).

Example: construct function (cons list)

A construction function constructs a new pair from its two arguments, which usually are a single value and another pair. “To cons x onto y” informally means to construct a new container instance by putting the element x at the start of this new container, followed by the container y.

Each item in a cons list contains two elements: the value of the current item and the next item. The last item in the list contains only a value called Nil without a next item. A cons list is produced by recursively calling the cons function.

A cons list is a kind of linked list.

Let’s try to implement a list of i32 with Cons. The following code returns a compile error.

enum List {
    Cons(i32, List),
    Nil,
}

The Rust compiler rejects this code because it can’t know how much space it needs to store a List value (List is defined recursively, so its size would be infinite).

Figure: Recursive List

To solve this issue, use a Box<T> (a pointer), because the size of a pointer is known.

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}

Figure: Recursive List with Box
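My note: once the list compiles, it can be walked recursively. The following is only a sketch; the helper sum is my own addition and not part of the book.

```rust
enum List {
    Cons(i32, Box<List>),
    Nil,
}

use self::List::{Cons, Nil};

// Hypothetical helper: follow the boxes until Nil and add up the values.
fn sum(list: &List) -> i32 {
    match list {
        // `rest` is a &Box<List>; deref coercion turns it into &List.
        Cons(value, rest) => *value + sum(rest),
        Nil => 0,
    }
}

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    println!("sum = {}", sum(&list));
}
```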

15.2 Treating Smart Pointers Like Regular References with the Deref Trait

The following code fails to compile:

fn main() {
    let x = 5;
    let y = &x;

    assert_eq!(5, x);
    assert_eq!(5, y);
}

The error:

error[E0277]: can't compare `{integer}` with `&{integer}`
 --> src/main.rs:6:5
  |
6 |     assert_eq!(5, y);
  |     ^^^^^^^^^^^^^^^^^ no implementation for `{integer} == &{integer}`
  |

To avoid this error, we should change assert_eq!(5, y); to assert_eq!(5, *y);. This * is called dereference, which means “follow the reference to the value it’s pointing to.”

One more dereference example (one mutable borrow is allowed!). The assertion fails on purpose, showing that x was changed through y:

fn main() {
    let mut x = 5;
    let y = &mut x;

    *y = 4;

    assert_eq!(5, *y);
    // thread 'main' panicked at 'assertion failed: `(left == right)`
    //   left: `5`,
    //  right: `4`', src/main.rs:8:5
}

As in C or C++, you can print the address:

fn main() {
    let x = &42;
    let address = format!("{:p}", x);
    print!("{:?}", address) // like "0x560b046ea000"
}

Instead of let y = &mut x;, write with Box:

fn main() {
    let x = 5;
    let y = Box::new(x);

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

Note that y is an instance of a box pointing to a copied value of x rather than a reference pointing to the value of x.

Defining Our Own Smart Pointer

Box<T> in the standard library already implements the Deref trait, so we can use the * operator on it. If you want to use the dereference operator on your own type (struct), you have to implement Deref for it yourself.

Let’s define a sample type MyBox<T> (tuple struct with one element):

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

We didn’t implement Deref trait for this struct, so the following code returns a compile error:

let x = 5;
let y = MyBox::new(x);

assert_eq!(5, x);
assert_eq!(5, *y);

// error[E0614]: type `MyBox<{integer}>` cannot be dereferenced
//   --> src/main.rs:14:19
//    |
// 14 |     assert_eq!(5, *y);
//    |                   ^^
//

Let’s implement the Deref trait. According to the official trait documentation, the required method is deref and the associated type is Target (for associated types, see Chapter 19):

use std::ops::Deref;

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

*y: behind the scenes Rust actually ran this code:

*(y.deref())

Rust substitutes the * operator with a call to the deref method and then a plain dereference so we don’t have to think about whether or not we need to call the deref method.

Why the signature of deref is fn deref(&self) -> &Self::Target? The answer is “Rust’s ownership system”. If the deref method returned the value directly instead of a reference to the value, the value would be moved out of self.
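My note: a small runnable sketch that puts the pieces together and checks that *y and *(y.deref()) give the same result. It reuses the MyBox<T> definition from above.

```rust
use std::ops::Deref;

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    // Returns a reference, so the inner value is borrowed and not moved out.
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let y = MyBox::new(5);

    assert_eq!(5, *y);           // what we write
    assert_eq!(5, *(y.deref())); // the explicit form of the same operation
    assert_eq!(5, *y);           // y is still usable: nothing was moved
    println!("both forms read the same value");
}
```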

Implicit Deref Coercions with Functions and Methods

Advanced review on String and str:

  • https://stackoverflow.com/a/24159933/9923806
  • https://github.com/BrooksPatton/learning-rust/issues/2#issuecomment-382178427 <- I guess str doesn’t store data on stack… partially wrong.
  • str is a dynamically sized type: its size is not known at compile time. String has a known size.
  • Because str is unsized, it can’t be placed on the stack directly. We handle it through a pointer such as &str, which is a “fat pointer” holding the address of the string data and its length.
  • A string literal is a &'static str that points to the data segment of the binary. A &str is immutable because it is a shared reference (a &str can also point into the heap buffer of a String).
  • String stores
    1. the length of its string,
    2. the pointer to the actual string data, and
    3. the capacity
    on the stack, and the actual string data is stored on the heap.
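My note: the sizes can be checked with std::mem::size_of. This is a sketch under the assumption of a current compiler, where &str is two machine words (pointer and length) and String is three (pointer, length, capacity); the language does not formally promise the layout of String. The function name sizes_in_words is my own.

```rust
use std::mem::size_of;

// Sizes expressed in machine words (usize), so the result does not depend
// on whether the target is 32-bit or 64-bit.
fn sizes_in_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (size_of::<&str>() / word, size_of::<String>() / word)
}

fn main() {
    let (str_ref, string) = sizes_in_words();
    println!("&str   = {} words (pointer + length)", str_ref);
    println!("String = {} words (pointer + length + capacity)", string);

    let literal: &'static str = "hello";       // bytes live in the binary
    let owned: String = String::from("hello"); // bytes live on the heap
    let slice: &str = &owned[1..3];            // borrows part of the heap buffer
    println!("{} {} {}", literal, owned, slice);
}
```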

I checked the data segment data in this post.

How is Deref implemented for String in standard library:

#[stable(feature = "rust1", since = "1.0.0")]
impl ops::Deref for String {
    type Target = str;

    #[inline]
    fn deref(&self) -> &str {
        unsafe { str::from_utf8_unchecked(&self.vec) }
    }
}

When we pass a reference to a particular type’s value as an argument to a function or method, Rust dereferences as many times as necessary to get a reference that matches the parameter’s type. This is called “implicit deref coercion”.

The following code shows a deref coercion chain (&MyBox<String> → &String → &str):

use std::ops::Deref;

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

fn hello(name: &str) {
    println!("Hello, {}!", name);
}

fn main() {
    let m = MyBox::new(String::from("Rust"));
    hello(&m);
}

I checked the memory allocation of MyBox in this post.

How Deref Coercion Interacts with Mutability

Rust does deref coercion when it finds types and trait implementations in three cases:

  • From &T to &U when T: Deref<Target=U>
  • From &mut T to &mut U when T: DerefMut<Target=U>
  • From &mut T to &U when T: Deref<Target=U>

Rust will also coerce a mutable reference to an immutable one. But the reverse is not possible: immutable references will never coerce to mutable references.
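My note: a sketch of the second and third cases, assuming we also implement DerefMut for MyBox<T> (the note above only implements Deref). The functions shout, len_of, and demo are hypothetical names for illustration.

```rust
use std::ops::{Deref, DerefMut};

struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> DerefMut for MyBox<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.0
    }
}

fn shout(s: &mut String) {
    s.push('!');
}

fn len_of(s: &str) -> usize {
    s.len()
}

fn demo() -> (String, usize) {
    let mut m = MyBox(String::from("Rust"));

    // Case 2: &mut MyBox<String> -> &mut String (through DerefMut).
    shout(&mut m);

    // Case 3: a mutable reference is accepted where an immutable one is expected.
    let n = len_of(&mut m);

    (m.0, n)
}

fn main() {
    let (text, len) = demo();
    println!("{} has {} bytes", text, len);
}
```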

15.4 Rc<T>, the Reference Counted Smart Pointer

We use the Rc<T> type when we want to allocate some data on the heap for multiple parts of our program to read and we can’t determine at compile time which part will finish using the data last.

Note that Rc<T> is only for use in single-threaded scenarios. If you want a shared reference counter across multiple threads, you need Arc and Mutex, like Arc::new(Mutex::new(0));.

Let’s see the sample code:

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};
use std::rc::Rc;

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    let b = Cons(3, Rc::clone(&a));
    let c = Cons(4, Rc::clone(&a));
}

This code would be interpreted as follows:

Figure: Reference counter.

When we create b, instead of taking ownership of a, we’ll clone the Rc<List> that a is holding, thereby increasing the number of references from one to two and letting a and b share ownership of the data in that Rc<List>. clone() makes a clone of the Rc pointer. This creates another pointer to the same allocation, increasing the strong reference count. When b goes out of scope, the count decreases automatically.

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};
use std::rc::Rc;

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    println!("{}", Rc::strong_count(&a)); // 1
    let b = Cons(3, Rc::clone(&a));
    println!("{}", Rc::strong_count(&a)); // 2
    {
        let c = Cons(4, Rc::clone(&a));
        println!("{}", Rc::strong_count(&a)); // 3
    }
    println!("{}", Rc::strong_count(&a)); // 2
}

We’ll see cyclic references later; they are the reason the method is named strong_count (there is also a weak_count).
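My note: a minimal sketch of the two counters, using Rc::downgrade to create a Weak<T>. A weak reference does not keep the value alive, so upgrade() returns None after the last strong owner is gone. The function name counts is my own.

```rust
use std::rc::{Rc, Weak};

// Returns (strong count, weak count, whether the weak pointer is dead after the drop).
fn counts() -> (usize, usize, bool) {
    let a = Rc::new(5);
    let weak: Weak<i32> = Rc::downgrade(&a); // does not increase strong_count

    let strong = Rc::strong_count(&a);
    let weak_refs = Rc::weak_count(&a);

    drop(a); // the last strong owner goes away, so the value is cleaned up
    let dead = weak.upgrade().is_none();

    (strong, weak_refs, dead)
}

fn main() {
    let (strong, weak_refs, dead) = counts();
    println!("strong = {}, weak = {}, dead after drop = {}", strong, weak_refs, dead);
}
```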

15.5 RefCell<T> and the Interior Mutability Pattern

My summary

Suppose the use case such that:

  1. You want to use a trait from a third-party crate and implement it for your struct MyStruct, which has a field my_field: &str.
  2. The signature of the trait’s method is (&self, foo: &str). &self is an immutable reference.
  3. But in your use case (e.g., MyStruct is a mock type for tests), your implementation of the trait has to set MyStruct.my_field to foo.
  4. You can’t change the signature from (&self, foo: &str) to (&mut self, foo: &str) because the trait belongs to the third-party crate. (You could fork the crate, but that is another story.)

In this case, you can use RefCell like my_field: RefCell<&str>.

The following methods are basic usages of RefCell:

  • RefCell::new()
  • my_refcell.borrow()
  • my_refcell.borrow_mut()
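My note: the use case above can be sketched as follows. The trait Rename stands in for the third-party trait (it is a made-up name), and I store a String instead of &str so that the example doesn’t need lifetime annotations.

```rust
use std::cell::RefCell;

// Stand-in for a trait defined in a third-party crate: we cannot change &self.
trait Rename {
    fn rename(&self, foo: &str);
}

struct MyStruct {
    my_field: RefCell<String>,
}

impl Rename for MyStruct {
    fn rename(&self, foo: &str) {
        // borrow_mut() hands out a mutable borrow even though self is &self.
        *self.my_field.borrow_mut() = foo.to_string();
    }
}

fn main() {
    let s = MyStruct {
        my_field: RefCell::new(String::from("before")),
    };
    s.rename("after");
    println!("my_field = {}", s.my_field.borrow());
}
```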

Enforcing Borrowing Rules at Runtime with RefCell<T>

With references and Box<T>, the borrowing rules’ invariants are enforced at compile time. With RefCell<T>, these invariants are enforced at runtime.

With references, if you break these rules, you’ll get a compiler error. With RefCell<T>, if you break these rules, your program will panic and exit. (Of course, you now have the question “why would we need to get around the compiler’s rules?” Be patient.)
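My note: a sketch of the runtime check. I use try_borrow_mut, which returns an Err where borrow_mut would panic, so the example can keep running. The function name demo is my own.

```rust
use std::cell::RefCell;

// Returns (whether the conflicting borrow was refused, final length of the vector).
fn demo() -> (bool, usize) {
    let cell = RefCell::new(vec![1, 2, 3]);

    let reader = cell.borrow(); // an immutable borrow is active
    // A mutable borrow at the same time breaks the borrowing rules.
    // borrow_mut() would panic here; try_borrow_mut() reports the error instead.
    let refused = cell.try_borrow_mut().is_err();
    drop(reader); // the immutable borrow ends

    cell.borrow_mut().push(4); // now the mutable borrow is fine
    let len = cell.borrow().len();

    (refused, len)
}

fn main() {
    let (refused, len) = demo();
    println!("conflicting borrow refused: {}, final length: {}", refused, len);
}
```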

The advantage of checking the borrowing rules at runtime instead is that certain memory-safe scenarios are then allowed, where they would’ve been disallowed by the compile-time checks. Static analysis, like the Rust compiler, is inherently conservative. Some properties of code are impossible to detect by analyzing the code: the most famous example is the Halting Problem.

Because some static analysis is impossible, if the Rust compiler can’t be sure the code complies with the ownership rules, it might reject a correct program; in this way, it’s conservative.

Similar to Rc<T>, RefCell<T> is only for use in single-threaded scenarios and will give you a compile-time error if you try using it in a multithreaded context.

  • Rc<T> enables multiple owners of the same data; Box<T> and RefCell<T> have single owners.
  • Box<T> allows immutable or mutable borrows checked at compile time; Rc<T> allows only immutable borrows checked at compile time; RefCell<T> allows immutable or mutable borrows checked at runtime.
  • Because RefCell<T> allows mutable borrows checked at runtime, you can mutate the value inside the RefCell<T> even when the RefCell<T> is immutable.

Interior Mutability: A Mutable Borrow to an Immutable Value

Interior mutability is a design pattern in Rust that allows you to mutate data even when there are immutable references to that data.

There are situations in which it would be useful for a value to mutate itself in its methods but appear immutable to other code. Code outside the value’s methods would not be able to mutate the value. Using RefCell<T> is one way to get the ability to have interior mutability.

A Use Case for Interior Mutability: Mock Objects (to be reviewed)

  1. Suppose a trait is defined in a third-party library, and its method takes an immutable (the default) &self reference.
  2. When you implement the method, you want to mutate your value through that reference without touching the library. The compiler rejects this attempt.
  3. In that case, you can put the data in a RefCell<T>, like RefCell<Vec<String>>, so that the .borrow_mut() method gives you a mutable reference.

Having Multiple Owners of Mutable Data by Combining Rc<T> and RefCell<T> (to be reviewed)

need to be reviewed.

15.6 Reference Cycles Can Leak Memory (to be reviewed)

https://doc.rust-lang.org/book/ch15-05-interior-mutability.html#having-multiple-owners-of-mutable-data-by-combining-rct-and-refcell

  • Recall that Rc<T> lets you have multiple owners of some data,
  • but it only gives immutable access to that data.
  • If you have an Rc<T> that holds a RefCell<T>, you can get a value that can have multiple owners and that you can mutate!

#[derive(Debug)]
enum List {
    Cons(Rc<RefCell<i32>>, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let value = Rc::new(RefCell::new(5));

    let a = Rc::new(Cons(Rc::clone(&value), Rc::new(Nil)));

    let b = Cons(Rc::new(RefCell::new(3)), Rc::clone(&a));
    let c = Cons(Rc::new(RefCell::new(4)), Rc::clone(&a));

    *value.borrow_mut() += 10;

    println!("a after = {:?}", a);
    println!("b after = {:?}", b);
    println!("c after = {:?}", c);
}

Result:

a after = Cons(RefCell { value: 15 }, Nil)
b after = Cons(RefCell { value: 3 }, Cons(RefCell { value: 15 }, Nil))
c after = Cons(RefCell { value: 4 }, Cons(RefCell { value: 15 }, Nil))

Mutex is the thread-safe version of RefCell


https://doc.rust-lang.org/book/ch15-06-reference-cycles.html

Memory leaks are considered memory safe in Rust. We can see that Rust allows memory leaks by creating a reference cycle with Rc and RefCell.
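My note: a minimal sketch of such a cycle. The Node type is my own simplification (the book uses a List enum). Two nodes point at each other, so each strong count is 2; when the local variables go away, the counts only drop to 1 and the memory is never freed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical node type: an optional link to another node that can be changed later.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Builds a -> b -> a and returns the strong counts of a and b.
fn make_cycle() -> (usize, usize) {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });

    // Close the cycle: a now points to b as well.
    *a.next.borrow_mut() = Some(Rc::clone(&b));

    let count_a = Rc::strong_count(&a);
    let count_b = Rc::strong_count(&b);
    // a and b are dropped here, but each count only falls to 1: a leak.
    (count_a, count_b)
}

fn main() {
    let (count_a, count_b) = make_cycle();
    println!("strong count of a = {}, of b = {}", count_a, count_b);
}
```

Replacing one of the links with a Weak<Node> is the usual way to break the cycle.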


Should be reviewed from here

My note: Reference counted smart pointer for Vector

https://stackoverflow.com/questions/67655381/is-my-understanding-of-a-rust-vector-that-supports-rc-or-box-wrapped-types-corre

16. Fearless Concurrency

The Rust team discovered that the ownership and type systems are a powerful set of tools to help manage memory safety and concurrency problems!

Caution: In this book, authors refer to many of the problems as concurrent rather than being more precise by saying concurrent and/or parallel.

16.1 Using Threads to Run Code Simultaneously

Many operating systems provide an API for creating new threads. This model where a language calls the operating system APIs to create threads is sometimes called 1:1, meaning one operating system thread per one language thread.

Programming language-provided threads are known as green threads, and languages that use these green threads will execute them in the context of a different number of operating system threads. For this reason, the green-threaded model is called the M:N model: there are M green threads per N operating system threads, where M and N are not necessarily the same number.

The Rust standard library only provides an implementation of 1:1 threading.

Creating a New Thread with spawn

To create a new thread, we call the thread::spawn function and pass it a closure containing the code we want to run in the new thread. (The new thread is a new OS thread, because Rust standard library provides only 1:1.) The new thread will be stopped when the main thread ends, whether or not it has finished running.

use std::thread;
use std::time::Duration;

fn main() {
    thread::spawn(|| {
        for i in 1..10 {
            println!("hi number {} from the spawned thread!", i);
            thread::sleep(Duration::from_millis(1));
        }
    });

    for i in 1..5 {
        println!("hi number {} from the main thread!", i);
        thread::sleep(Duration::from_millis(1));
    }
}

Run (as you can see, there is no 6 to 9 from the spawned thread):

$ cargo run
hi number 1 from the main thread!
hi number 1 from the spawned thread!
hi number 2 from the spawned thread!
hi number 2 from the main thread!
hi number 3 from the spawned thread!
hi number 3 from the main thread!
hi number 4 from the spawned thread!
hi number 4 from the main thread!
hi number 5 from the spawned thread!

The calls to thread::sleep force a thread to stop its execution for a short duration, allowing a different thread to run. How the spawned-thread lines interleave with the main-thread lines depends on your CPU and OS scheduler. When I commented out the thread::sleep(Duration::from_millis(1)); lines, the main thread finished before the spawned thread printed anything:

$ cargo run
hi number 1 from the main thread!
hi number 2 from the main thread!
hi number 3 from the main thread!
hi number 4 from the main thread!

Waiting for All Threads to Finish Using join Handles

The return type of thread::spawn is JoinHandle. A JoinHandle is an owned value that, when we call the join method on it, will wait for its thread to finish.

use std::thread;
use std::time::Duration;

fn main() {
    let handle = thread::spawn(|| {
        for i in 1..10 {
            println!("hi number {} from the spawned thread!", i);
            thread::sleep(Duration::from_millis(1));
        }
    });

    for i in 1..5 {
        println!("hi number {} from the main thread!", i);
        thread::sleep(Duration::from_millis(1));
    }

    handle.join().unwrap();
}

Result:

hi number 1 from the main thread!
hi number 1 from the spawned thread!
hi number 2 from the main thread!
hi number 2 from the spawned thread!
hi number 3 from the main thread!
hi number 3 from the spawned thread!
hi number 4 from the main thread!
hi number 4 from the spawned thread!
hi number 5 from the spawned thread!
hi number 6 from the spawned thread!
hi number 7 from the spawned thread!
hi number 8 from the spawned thread!
hi number 9 from the spawned thread!

If we put the line handle.join().unwrap(); between the two for loops, the result looks like the following, because the main thread waits for the spawned thread to finish before starting its own loop.

hi number 1 from the spawned thread!
hi number 2 from the spawned thread!
hi number 3 from the spawned thread!
hi number 4 from the spawned thread!
hi number 5 from the spawned thread!
hi number 6 from the spawned thread!
hi number 7 from the spawned thread!
hi number 8 from the spawned thread!
hi number 9 from the spawned thread!
hi number 1 from the main thread!
hi number 2 from the main thread!
hi number 3 from the main thread!
hi number 4 from the main thread!

Using move Closures with Threads

If you want to access variables from the closure passed to thread::spawn, Rust can’t tell how long the spawned thread will run, so it can’t know whether a borrowed variable stays valid. By adding the move keyword before the closure, we force the closure to take ownership of the values it’s using rather than allowing Rust to infer that it should borrow the values.

use std::thread;

fn main() {
    let v = vec![1, 2, 3];

    let handle = thread::spawn(move || {
        println!("Here's a vector: {:?}", v);
    });

    handle.join().unwrap();
}

In this example, the move keyword moved the ownership of v to the spawned thread, so the main thread no longer owns v. If you try to drop(v) in the main thread before handle.join(), the compiler doesn’t allow it. We deal with this issue in a later section.

To understand move correctly, please review lifetime in Rust.
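My note: a sketch showing that a moved value can still produce a result for the main thread, because join() returns whatever the closure returns. The function name sum_in_thread is my own.

```rust
use std::thread;

fn sum_in_thread(v: Vec<i32>) -> i32 {
    // `move` transfers ownership of v into the closure.
    let handle = thread::spawn(move || v.iter().sum::<i32>());

    // v can no longer be used here; uncommenting the next line is a compile error.
    // println!("{:?}", v);

    // join() waits for the thread and hands back the closure's return value.
    handle.join().unwrap()
}

fn main() {
    println!("sum = {}", sum_in_thread(vec![1, 2, 3]));
}
```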

16.2 Using Message Passing to Transfer Data Between Threads

In the standard library, Rust provides message-sending concurrency through channels (mpsc::channel). mpsc stands for multiple producer, single consumer (multiple TX, single RX). A channel is said to be closed if either the transmitter or receiver half is dropped.

You can create a channel like this:

use std::sync::mpsc;
// -- snip
let (tx, rx) = mpsc::channel();

Let’s see the sample code:

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let val = String::from("hi");
        tx.send(val).unwrap();
    });

    let received = rx.recv().unwrap();
    println!("Got: {}", received);
}

The receiver (RX) has two useful methods: recv and try_recv. recv blocks until a value arrives and returns a Result<T, E>, while try_recv doesn’t block and returns Ok or Err immediately. So try_recv is typically called in a loop.
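My note: a sketch of polling with try_recv. TryRecvError::Empty means “nothing yet”, and TryRecvError::Disconnected means the transmitter was dropped and no messages are left. The function name and the sleep durations are arbitrary choices of mine.

```rust
use std::sync::mpsc::{self, TryRecvError};
use std::thread;
use std::time::Duration;

fn collect_with_try_recv() -> Vec<i32> {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        for i in 1..=3 {
            tx.send(i).unwrap();
            thread::sleep(Duration::from_millis(5));
        }
        // tx is dropped here, which closes the channel.
    });

    let mut received = Vec::new();
    loop {
        match rx.try_recv() {
            Ok(value) => received.push(value),
            // Nothing available right now: the thread is free to do other work.
            Err(TryRecvError::Empty) => thread::sleep(Duration::from_millis(1)),
            // The sender is gone and the queue is empty: stop polling.
            Err(TryRecvError::Disconnected) => break,
        }
    }
    received
}

fn main() {
    println!("Got: {:?}", collect_with_try_recv());
}
```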

In this context, we can think of threads as actors:

Actor model - Wikipedia

Channels and Ownership Transference

Sending a value through a channel transfers ownership of it to the receiver (like a real-world delivery). If you try to use the value after sending it, you get a compile error (thank you, Rust).

Sending Multiple Values and Seeing the Receiver Waiting

The receiver can be used as an iterator. When the transmitter is dropped, the channel closes and the iteration ends:

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let vals = vec![
            String::from("hi"),
            String::from("from"),
            String::from("the"),
            String::from("thread"),
        ];

        for val in vals {
            tx.send(val).unwrap();
            thread::sleep(Duration::from_secs(1));
        }
    });

    for received in rx {
        println!("Got: {}", received);
    }
}

Creating Multiple Producers by Cloning the Transmitter

You can imagine a channel as a queue, so when multiple senders send messages, the order in which messages from different senders arrive at the receiver is nondeterministic.
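My note: a sketch with several producers created by cloning the transmitter. Since the arrival order is not fixed, I sort the received values before returning them. The function name gather is my own.

```rust
use std::sync::mpsc;
use std::thread;

fn gather(producers: usize) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();

    for id in 0..producers {
        let tx = tx.clone(); // every producer gets its own transmitter
        thread::spawn(move || {
            tx.send(id).unwrap();
        });
    }
    // Drop the original transmitter; otherwise the channel never closes
    // and the iteration below would wait forever.
    drop(tx);

    let mut got: Vec<usize> = rx.iter().collect();
    got.sort(); // the arrival order depends on thread scheduling
    got
}

fn main() {
    println!("Got: {:?}", gather(4));
}
```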

16.3 Shared-State Concurrency

Using Mutexes to Allow Access to Data from One Thread at a Time

Cf. My note about mutex

To access the data in a mutex, a thread must first signal that it wants access by asking to acquire the mutex’s lock (lock() method in Rust).

The API of Mutex<T>

Here is 101 of Mutex<T>:

use std::sync::Mutex;

fn main() {
    let m = Mutex::new(5);

    {
        let mut num = m.lock().unwrap();
        *num = 6;
    }

    println!("m = {:?}", m);
}

Result:

m = Mutex { data: 6, poisoned: false, .. }

The call to lock would fail if another thread holding the lock panicked. In that case, no one would ever be able to get the lock, so we’ve chosen to unwrap and have this thread panic if we’re in that situation.

Mutex<T> is a smart pointer. The call to lock returns a smart pointer called MutexGuard, wrapped in a LockResult that we handled with the call to unwrap. The MutexGuard smart pointer implements Deref to point at our inner data; the smart pointer also has a Drop implementation that releases the lock automatically when a MutexGuard goes out of scope.
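My note: a sketch that makes the automatic unlock visible. try_lock does not block; it returns an Err while the guard is alive and succeeds after the guard is dropped. The function name demo is my own.

```rust
use std::sync::Mutex;

// Returns (lock refused while the guard is alive, lock free after drop, final value).
fn demo() -> (bool, bool, i32) {
    let m = Mutex::new(5);

    let mut guard = m.lock().unwrap();
    *guard += 1; // Deref and DerefMut give access to the inner data

    let refused_while_held = m.try_lock().is_err();
    drop(guard); // the Drop implementation of MutexGuard releases the lock

    let free_after_drop = m.try_lock().is_ok();
    let value = *m.lock().unwrap();

    (refused_while_held, free_after_drop, value)
}

fn main() {
    let (refused, free, value) = demo();
    println!("refused = {}, free = {}, value = {}", refused, free, value);
}
```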

Multiple Ownership with Multiple Threads

Note that Rc<T> points to the data on heap.

Atomic Reference Counting with Arc<T>

Rc<T> is not safe to share across threads. Instead, we can use the atomically reference counted type Arc<T> from std::sync (it uses the atomic types in std::sync::atomic for its counters). Thread safety comes with a performance penalty that you only want to pay when you really need to.

Here is the proper way to share ownership across multiple threads:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            let mut num = counter.lock().unwrap();

            *num += 1;
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap()); // Result: 10
}

counter is immutable, but Mutex<T> provides interior mutability.

Similarities Between RefCell<T>/Rc<T> and Mutex<T>/Arc<T>


Should be reviewed from here

17. Object Oriented Programming Features of Rust

Objects came from Simula in the 1960s. Those objects influenced Alan Kay’s programming architecture in which objects pass messages to each other. He coined the term object-oriented programming in 1967 to describe this architecture.

Hmm…

17.1 Characteristics of Object-Oriented Languages

Objects Contain Data and Behavior

The book “Design Patterns: Elements of Reusable Object-Oriented Software” by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (Addison-Wesley Professional, 1994) colloquially referred to as “The Gang of Four book”, is a catalog of object-oriented design patterns. It defines OOP this way:

Object-oriented programs are made up of objects. An object packages both data and the procedures that operate on that data. The procedures are typically called methods or operations.

Using this definition, Rust is object oriented: structs and enums have data, and impl blocks provide methods on structs and enums.

Encapsulation that Hides Implementation Details

Encapsulation means that the implementation details of an object aren’t accessible to code using that object. Therefore, the only way to interact with an object is through its public API. In Rust, we can use the pub keyword to decide which modules, types, functions, and methods in our code should be public, and by default everything else is private.

Inheritance as a Type System and as Code Sharing

There is no way to define a struct that inherits the parent struct’s fields and method implementations.

You choose inheritance for two main reasons.

  1. One is for reuse of code
  2. The other reason is polymorphism, which means that you can substitute multiple objects for each other at runtime if they share certain characteristics.

Rust uses generics to abstract over different possible types and trait bounds to impose constraints on what those types must provide. This is sometimes called bounded parametric polymorphism.
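My note: a sketch of a generic function with trait bounds. T can be any type, as long as it can be compared (PartialOrd) and printed (Display). The function name describe_largest is my own.

```rust
use std::fmt::Display;

// The bounds restrict T to types that can be compared and formatted.
fn describe_largest<T: PartialOrd + Display>(a: T, b: T) -> String {
    if a >= b {
        format!("{} is the largest", a)
    } else {
        format!("{} is the largest", b)
    }
}

fn main() {
    // The same source code works for integers and for string slices.
    println!("{}", describe_largest(3, 7));
    println!("{}", describe_largest("apple", "banana"));
}
```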

polymorphism in practice

Cf. ad hoc polymorphism: suppose a function Add(x, y) whose behavior depends on the type of its inputs (concatenation for strings, addition for ints, etc.). This is an example of ad hoc polymorphism.
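My note: in Rust, ad hoc polymorphism can be expressed with a trait that is implemented separately for each type. The trait Combine and the function add below are made-up names for illustration.

```rust
// Each type supplies its own meaning of "combine".
trait Combine {
    fn combine(self, other: Self) -> Self;
}

impl Combine for i32 {
    fn combine(self, other: i32) -> i32 {
        self + other // numbers are added
    }
}

impl Combine for String {
    fn combine(mut self, other: String) -> String {
        self.push_str(&other); // strings are concatenated
        self
    }
}

fn add<T: Combine>(x: T, y: T) -> T {
    x.combine(y)
}

fn main() {
    println!("{}", add(1, 2));
    println!("{}", add(String::from("ab"), String::from("cd")));
}
```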

Rust takes a different approach, using trait objects instead of inheritance.

17.2 Using Trait Objects That Allow for Values of Different Types

src/lib.rs:

pub trait Draw {
    fn draw(&self);
}

pub struct Screen {
    pub components: Vec<Box<dyn Draw>>,
}

impl Screen {
    pub fn run(&self) {
        for component in self.components.iter() {
            component.draw();
        }
    }
}

pub struct Button {
    pub width: u32,
    pub height: u32,
    pub label: String,
}

impl Draw for Button {
    fn draw(&self) {
        // code to actually draw a button
    }
}

The type Box<dyn Draw> is a trait object; it’s a stand-in for any type inside a Box that implements the Draw trait. dyn is short for “dynamic”, as in dynamic dispatch (in computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time). The official documentation describes dynamic dispatch in more detail later. When we use trait objects, Rust must use dynamic dispatch.

We can’t use a generic type parameter <T> with a trait bound here, because a generic type parameter can only be substituted with one concrete type at a time, whereas trait objects allow for multiple concrete types to fill in for the trait object at runtime.
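My note: a sketch with two different concrete types in the same Vec<Box<dyn Draw>>. Unlike the code above, my draw returns a String so that the result can be inspected; SelectBox and sample_screen are names I made up for this example.

```rust
trait Draw {
    fn draw(&self) -> String;
}

struct Button {
    label: String,
}

struct SelectBox {
    options: Vec<String>,
}

impl Draw for Button {
    fn draw(&self) -> String {
        format!("[{}]", self.label)
    }
}

impl Draw for SelectBox {
    fn draw(&self) -> String {
        format!("<{}>", self.options.join("|"))
    }
}

struct Screen {
    components: Vec<Box<dyn Draw>>,
}

impl Screen {
    fn run(&self) -> Vec<String> {
        // Which draw() runs is decided at runtime for each element.
        self.components.iter().map(|c| c.draw()).collect()
    }
}

fn sample_screen() -> Screen {
    Screen {
        components: vec![
            Box::new(Button { label: String::from("OK") }),
            Box::new(SelectBox {
                options: vec![String::from("Yes"), String::from("No")],
            }),
        ],
    }
}

fn main() {
    for line in sample_screen().run() {
        println!("{}", line);
    }
}
```

With a generic Screen<T: Draw>, the vector could hold only Buttons or only SelectBoxes, not both.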

We say that the trait occurs as a trait object at Box<dyn Draw>.

The dyn keyword is used to highlight that calls to methods on the associated Trait are dynamically dispatched. To use the trait this way, it must be “object safe”. A trait is object safe if all the methods defined in the trait have the following properties:

  • The return type isn’t Self.
  • There are no generic type parameters.

Trait objects must be object safe because once you’ve used a trait object, Rust no longer knows the concrete type that’s implementing that trait.

The code that results from monomorphization is doing static dispatch, which is when the compiler knows what method you’re calling at compile time. This is opposed to dynamic dispatch, which is when the compiler can’t tell at compile time which method you’re calling.

An example of a trait whose methods are not object safe is the standard library’s Clone trait.

pub trait Clone {
    fn clone(&self) -> Self;
}

Dynamic dispatch in practice

When you use a generic function, you could encounter &*my_variable.

https://stackoverflow.com/a/41273406

My note: vtable - Should be written in my own words

Dynamic dispatch costs a bit, so consider using enum_dispatch crate.

https://docs.rs/enum_dispatch/latest/enum_dispatch/
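My note: the idea behind enum-based dispatch, written by hand without the crate. This sketch does not use or reproduce the enum_dispatch macros; it only shows that when the set of types is closed, an enum plus a match can replace Box<dyn Trait>. All names are made up.

```rust
trait Draw {
    fn draw(&self) -> String;
}

struct Button;
struct Slider;

impl Draw for Button {
    fn draw(&self) -> String {
        String::from("button")
    }
}

impl Draw for Slider {
    fn draw(&self) -> String {
        String::from("slider")
    }
}

// A closed set of implementors, stored inline instead of behind a Box.
enum Component {
    Button(Button),
    Slider(Slider),
}

impl Draw for Component {
    fn draw(&self) -> String {
        // A plain match instead of a lookup through a vtable.
        match self {
            Component::Button(inner) => inner.draw(),
            Component::Slider(inner) => inner.draw(),
        }
    }
}

fn draw_all(components: &[Component]) -> Vec<String> {
    components.iter().map(|c| c.draw()).collect()
}

fn main() {
    let components = [Component::Button(Button), Component::Slider(Slider)];
    println!("{:?}", draw_all(&components));
}
```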

18. Patterns and Matching

Pattern matching is mandatory when you want to write your own macro.

18.1 All the Places Patterns Can Be Used

Reviews: match Arms and Conditional if let Expression

You can express more complex conditions with if let than with match, but the downside of if let expressions is that the compiler doesn’t check exhaustiveness. (Some cases could be missed.)
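My note: a sketch of mixing if let, else if let, and else, loosely following the book’s favorite color example. The function name background and the simplified error type (String) are my own choices.

```rust
fn background(favorite: Option<&str>, age: Result<u8, String>) -> String {
    if let Some(color) = favorite {
        format!("Using your favorite color, {}", color)
    } else if let Ok(age) = age {
        // The conditions do not have to relate to each other.
        String::from(if age > 30 { "purple" } else { "orange" })
    } else {
        // If this branch were missing where no value is needed,
        // the compiler would not warn about the unhandled cases.
        String::from("blue")
    }
}

fn main() {
    println!("{}", background(Some("red"), Ok(40)));
    println!("{}", background(None, Ok(40)));
    println!("{}", background(None, Err(String::from("not a number"))));
}
```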

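while let and for loops

A while let loop runs as long as the pattern continues to match. In a for loop, the value that directly follows the keyword for is a pattern: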

let v = vec!['a', 'b', 'c'];

for (index, value) in v.iter().enumerate() {
    println!("{} is at index {}", value, index);
}
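My note: for comparison, a sketch of a while let loop. It keeps popping from a stack for as long as pop() returns Some; the function name drain is my own.

```rust
fn drain(mut stack: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();

    // The loop ends as soon as pop() returns None, i.e. the pattern stops matching.
    while let Some(top) = stack.pop() {
        popped.push(top);
    }
    popped
}

fn main() {
    println!("{:?}", drain(vec![1, 2, 3]));
}
```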

let statement as pattern

let PATTERN = EXPRESSION;

// example
let (x, y, z) = (1, 2, 3);

Function Parameters

fn print_coordinates(&(x, y): &(i32, i32)) {
    println!("Current location: ({}, {})", x, y);
}

fn main() {
    let point = (3, 5);
    print_coordinates(&point);
}

18.2 Refutability: Whether a Pattern Might Fail to Match

irrefutable | refutable
matches for any possible value passed | can fail to match for some possible value
let x = 5; | Some(x) = a_value;

In general, you shouldn’t have to worry about the distinction between refutable and irrefutable patterns; however, you do need to be familiar with the concept of refutability so you can respond when you see it in an error message.

18.3 Pattern Syntax

This section just contains examples of useful pattern matches.

Value match for a variable:

let x = 1;

match x {
    1 => println!("one"),
    2 => println!("two"),
    3 => println!("three"),
    _ => println!("anything"),
}

Variable scope (shadowed):

    let x = Some(5);
    let y = 10;

    match x {
        Some(50) => println!("Got 50"),
        Some(y) => println!("Matched, y = {:?}", y), // Matched, y = 5
        _ => println!("Default case, x = {:?}", x),
    }

    println!("at the end: x = {:?}, y = {:?}", x, y);
    // at the end: x = Some(5), y = 10

Multiple patterns

let x = 1;

match x {
    1 | 2 => println!("one or two"), // match
    3 => println!("three"),
    _ => println!("anything"),
}

Matching Ranges of Values with ..=

let x = 5;

match x {
    1..=5 => println!("one through five"),
    _ => println!("something else"),
}

Ranges are only allowed with numeric or char values. Recall that Rust’s char type is four bytes in size and represents a Unicode Scalar Value.

let x = 'c';

match x {
    'a'..='j' => println!("early ASCII letter"),
    'k'..='z' => println!("late ASCII letter"),
    _ => println!("something else"),
}

Destructuring to Break Apart Values

struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 0, y: 7 };

    let Point { x: a, y: b } = p;
    assert_eq!(0, a);
    assert_eq!(7, b);

    // Or
    let Point { x, y } = p;
    assert_eq!(0, x);
    assert_eq!(7, y);
}

You can achieve a partial match:

let p = Point { x: 0, y: 7 };

match p {
    Point { x, y: 0 } => println!("On the x axis at {}", x),
    Point { x: 0, y } => println!("On the y axis at {}", y), // match
    Point { x, y } => println!("On neither axis: ({}, {})", x, y),
}

Destructuring Enums

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}

fn main() {
    let msg = Message::ChangeColor(0, 160, 255);

    match msg {
        Message::Quit => {
            println!("The Quit variant has no data to destructure.")
        }
        Message::Move { x, y } => {
            println!(
                "Move in the x direction {} and in the y direction {}",
                x, y
            );
        }
        Message::Write(text) => println!("Text message: {}", text),
        Message::ChangeColor(r, g, b) => println!(
            "Change the color to red {}, green {}, and blue {}",
            r, g, b
        ),
    }
}

Destructuring Nested Structs and Enums

enum Color {
    Rgb(i32, i32, i32),
    Hsv(i32, i32, i32),
}

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(Color),
}

fn main() {
    let msg = Message::ChangeColor(Color::Hsv(0, 160, 255));

    match msg {
        Message::ChangeColor(Color::Rgb(r, g, b)) => println!(
            "Change the color to red {}, green {}, and blue {}",
            r, g, b
        ),
        Message::ChangeColor(Color::Hsv(h, s, v)) => println!(
            "Change the color to hue {}, saturation {}, and value {}",
            h, s, v
        ),
        _ => (),
    }
}

Ignoring Values in a Pattern

Just use the underscore _ as a placeholder.
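A small sketch of the underscore placeholder, both inside a tuple pattern and inside Some. The function names are hypothetical.

```rust
fn first_and_last(numbers: (i32, i32, i32, i32, i32)) -> (i32, i32) {
    // Each `_` ignores exactly one position without binding it.
    let (first, _, _, _, last) = numbers;
    (first, last)
}

fn is_set(setting: Option<i32>) -> bool {
    // `Some(_)` matches any inner value without naming it.
    matches!(setting, Some(_))
}

fn main() {
    println!("{:?}", first_and_last((2, 4, 8, 16, 32)));
    println!("{}", is_set(Some(5)));
}
```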

Ignoring Remaining Parts of a Value with ..

struct Point {
    x: i32,
    y: i32,
    z: i32,
}

let origin = Point { x: 0, y: 0, z: 0 };

match origin {
    Point { x, .. } => println!("x is {}", x),
}

Don’t use .. in an ambiguous way (the following code doesn’t compile):

let numbers = (2, 4, 8, 16, 32);

match numbers {
    (.., second, ..) => { // Ambiguous
        println!("Some numbers: {}", second)
    },
}

Extra Conditionals with Match Guards

let x = Some(5);
let y = 10;

match x {
    Some(50) => println!("Got 50"),
    Some(n) if n == y => println!("Matched, n = {}", n),
    _ => println!("Default case, x = {:?}", x),
}

println!("at the end: x = {:?}, y = {}", x, y);

@ Bindings

The @ operator lets you bind a value to a variable while also testing it against a pattern:

enum Message {
    Hello { id: i32 },
}

let msg = Message::Hello { id: 5 };

match msg {
    Message::Hello {
        id: id_variable @ 3..=7,
    } => println!("Found an id in range: {}", id_variable),
    Message::Hello { id: 10..=12 } => {
        println!("Found an id in another range")
    }
    Message::Hello { id } => println!("Found some other id: {}", id),
}

19. Advanced Features

19.1 Unsafe Rust

“Unsafe” means “doesn’t enforce memory safety guarantees”.

Although the code might be okay, if the Rust compiler doesn’t have enough information to be confident, it will reject the code. In these cases, you can use unsafe code to tell the compiler, “Trust me, I know what I’m doing.”

Another reason Rust has an unsafe alter ego is that the underlying computer hardware is inherently unsafe.

??

Unsafe Superpowers

  1. Dereference a raw pointer
  2. Call an unsafe function or method
  3. Access or modify a mutable static variable
  4. Implement an unsafe trait
  5. Access fields of unions

Notes:

  • unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks.
  • unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems.

Parts of the standard library are implemented as safe abstractions over unsafe code that has been audited.

The rest of the section contains examples of when to use unsafe.

Define unsafe function, and use it

unsafe fn dangerous() {}

unsafe {
    dangerous();
}

Dereferencing a Raw Pointer

Raw pointers can be immutable or mutable and are written as *const T and *mut T, respectively. The asterisk isn’t the dereference operator; it’s part of the type name. In the context of raw pointers, immutable means that the pointer can’t be directly assigned to after being dereferenced.

let mut num = 5;

let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;

unsafe {
    println!("r1 is: {}", *r1);
    println!("r2 is: {}", *r2);
}

cf) The println! macro takes its arguments by reference under the hood.

With raw pointers, we can create a mutable pointer and an immutable pointer to the same location and change data through the mutable pointer, potentially creating a data race. Be careful!

Sometimes, Rust isn’t smart enough to know safe code. When we know code is okay, but Rust doesn’t, it’s time to reach for unsafe code.
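A typical case is splitting one mutable slice into two non-overlapping halves: the borrow checker only sees two mutable borrows of the same slice. The sketch below follows the idea of the standard library's split_at_mut; the name my_split_at_mut is hypothetical, and the real implementation may differ.

```rust
use std::slice;

// Safe wrapper around unsafe code: the two returned slices never overlap.
fn my_split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // Without this check the raw-pointer arithmetic below could go out of bounds.
    assert!(mid <= len);

    unsafe {
        (
            // First half: `mid` elements starting at `ptr`.
            slice::from_raw_parts_mut(ptr, mid),
            // Second half: the remaining `len - mid` elements.
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];
    let (a, b) = my_split_at_mut(&mut v, 3);
    a[0] = 10;
    b[0] = 40;
    println!("{:?} {:?}", a, b);
}
```

Callers never write unsafe themselves; the function body is the only place that has to be audited.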

Using extern Functions to Call External Code

Rust has a keyword, extern, that facilitates the creation and use of a Foreign Function Interface (FFI). Functions declared within extern blocks are always unsafe to call from Rust code.

extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}

Calling Rust Functions from Other Languages

In the following example, we make the call_from_c function accessible from C code, after it’s compiled to a shared library and linked from C:

#[no_mangle]
pub extern "C" fn call_from_c() {
    println!("Just called a Rust function from C!");
}

Accessing or Modifying a Mutable Static Variable

In Rust, global variables are called static variables. Rust does support global variables, but they can be problematic with Rust’s ownership rules.

Accessing or mutating a static mut variable is unsafe:

static mut COUNTER: u32 = 0;

fn add_to_count(inc: u32) {
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    add_to_count(3);

    unsafe {
        println!("COUNTER: {}", COUNTER);
    }
}

But why is it unsafe? With mutable data that is globally accessible, it’s difficult to ensure there are no data races, which is why Rust considers mutable static variables to be unsafe. Where possible, it’s preferable to use the concurrency techniques and thread-safe smart pointers we discussed in Chapter 16 so the compiler checks that data accessed from different threads is done safely.
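As one possible safe alternative for a simple counter, an atomic integer can replace static mut. This is a sketch of my own, not an example from the book; it reuses the names COUNTER and add_to_count from the snippet above.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// A global counter that can be shared across threads without `unsafe`.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    // `fetch_add` is a single atomic read-modify-write, so no data race is possible.
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| add_to_count(3)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("COUNTER: {}", COUNTER.load(Ordering::SeqCst));
}
```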

Other examples the book mentioned

  • Implementing an Unsafe Trait
  • Accessing Fields of a Union

19.2 Advanced Traits (should be reviewed)

Using Supertraits to Require One Trait’s Functionality Within Another Trait

If you know inheritance in OOP, you can understand the meaning of “super” in supertraits. We can define a trait that can be implemented only for types that already implement a certain other trait.

In the following snippet, OutlinePrint can be implemented for a struct only when the struct also implements the trait fmt::Display:

use std::fmt;

struct Point {
    x: i32,
    y: i32,
}

// Point implements Display, so it is allowed to implement OutlinePrint.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

trait OutlinePrint: fmt::Display {
    fn outline_print(&self) {
        let output = self.to_string();
        let len = output.len();
        println!("{}", "*".repeat(len + 4));
        println!("*{}*", " ".repeat(len + 2));
        println!("* {} *", output);
        println!("*{}*", " ".repeat(len + 2));
        println!("{}", "*".repeat(len + 4));
    }
}
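To see the supertrait bound in action, here is a self-contained sketch. It uses a hypothetical variant of the trait, named Outline, that returns the text instead of printing it (and draws a simpler box), so the result is easy to check.

```rust
use std::fmt;

// Hypothetical variant of OutlinePrint that returns the outlined text.
trait Outline: fmt::Display {
    fn outline(&self) -> String {
        // `to_string` is available because every implementor is also `Display`.
        let output = self.to_string();
        let border = "*".repeat(output.len() + 4);
        format!("{}\n* {} *\n{}", border, output, border)
    }
}

struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// An empty impl is enough: the default method does the work.
// Without the `Display` impl above, this line fails with error E0277.
impl Outline for Point {}

fn main() {
    println!("{}", Point { x: 1, y: 3 }.outline());
}
```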

Off topic: fmt::Display

In Rust by Example:

https://doc.rust-lang.org/rust-by-example/trait/supertraits.html#supertraits

Note that despite the “super” in the name, a supertrait is the more basic trait, which is “inherited” by its subtraits.

For example, when we think about the set of all persons and the set of all students, Person ⊃ Student holds, so Person is the supertrait of Student.

trait Person {
    // foo
}

trait Student: Person {
    // bar
}

19.3 Advanced Types (should be reviewed)

Using the Newtype Pattern for Type Safety and Abstraction

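The newtype pattern creates a new type that wraps an existing type in a single field. The two hold the same data, but the compiler treats them as different types, so you can’t mix them up by accident. This pattern can reduce the number of bugs, and it provides abstraction because the wrapper can hide the inner type’s API. You create the new type with a tuple struct that has one field.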

use std::ops::Add;

struct Millimeters(u32);
struct Meters(u32);

impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        Millimeters(self.0 + (other.0 * 1000))
    }
}

You can find a plainer explanation in Rust Design Patterns or in my note.
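A short usage sketch of the Millimeters and Meters newtypes from above. The derive attributes are my addition so that the values can be printed and compared.

```rust
use std::ops::Add;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Millimeters(u32);

#[derive(Debug, PartialEq, Clone, Copy)]
struct Meters(u32);

impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        // `.0` reaches the wrapped u32 inside each newtype.
        Millimeters(self.0 + (other.0 * 1000))
    }
}

fn main() {
    let total = Millimeters(500) + Meters(2);
    println!("{:?}", total);

    // `Millimeters(500) + 2` does not compile: a bare u32 is not a Meters value.
    // `Meters(2) + Millimeters(500)` does not compile either, because only
    // `Add<Meters> for Millimeters` is implemented.
}
```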

Dynamically Sized Types and the Sized Trait

By default, generic functions will work only on types that have a known size at compile time. However, you can use the following special syntax to relax this restriction:

fn generic<T: ?Sized>(t: &T) {
    // --snip--
}

A trait bound on ?Sized means “T may or may not be Sized” and this notation overrides the default that generic types must have a known size at compile time. The ?Trait syntax with this meaning is only available for Sized, not any other traits.

Also note that we switched the type of the t parameter from T to &T. Because the type might not be Sized, we need to use it behind some kind of pointer. In this case, we’ve chosen a reference.
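A minimal sketch of a ?Sized bound at work. The helper byte_size is a hypothetical name; std::mem::size_of_val is the standard library function it wraps.

```rust
// `T: ?Sized` lets callers pass references to unsized types such as `str` and `[u8]`.
fn byte_size<T: ?Sized>(t: &T) -> usize {
    std::mem::size_of_val(t)
}

fn main() {
    // T = str, a dynamically sized type.
    println!("{}", byte_size("hello"));
    // T = [u8], also dynamically sized.
    println!("{}", byte_size(&[1u8, 2, 3][..]));
    // T = u32, an ordinary sized type still works.
    println!("{}", byte_size(&7u32));
}
```

Without the ?Sized bound, the first two calls would be rejected because str and [u8] have no size known at compile time.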

19.4 Advanced Functions and Closures

Function Pointers

The fn type is called a function pointer. With a function pointer, you can pass a function as a parameter:

fn add_one(x: i32) -> i32 {
    x + 1
}

fn do_twice(f: fn(i32) -> i32, arg: i32) -> i32 {
    f(arg) + f(arg)
}

fn main() {
    let answer = do_twice(add_one, 5);

    println!("The answer is: {}", answer);
}

Function pointers implement all three of the closure traits (Fn, FnMut, and FnOnce), so you can always pass a function pointer as an argument for a function that expects a closure.
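A small sketch of both directions: a named function passed where a closure is expected, and a non-capturing closure passed where a function pointer is expected. The helper names increment_all and apply_all are hypothetical.

```rust
fn add_one(x: i32) -> i32 {
    x + 1
}

fn increment_all(values: &[i32]) -> Vec<i32> {
    // `map` expects a closure, but the function `add_one` is accepted too.
    values.iter().copied().map(add_one).collect()
}

fn apply_all(values: &[i32], f: fn(i32) -> i32) -> Vec<i32> {
    values.iter().map(|&v| f(v)).collect()
}

fn main() {
    println!("{:?}", increment_all(&[1, 2, 3]));
    // A closure that captures nothing coerces to a function pointer.
    println!("{:?}", apply_all(&[1, 2, 3], |x| x * 2));
}
```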

Returning Closures

fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
    Box::new(|x| x + 1)
}

Note that if you change the return type to dyn Fn(i32) -> i32, the compiler returns an error because Rust doesn’t know the size of a closure at compile time.
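As a sketch of the alternatives: when a function returns exactly one closure type, impl Fn avoids the heap allocation, while Box<dyn Fn> is still needed when different branches return different closures. The function names below are hypothetical.

```rust
// One closure type only: `impl Fn` works and needs no Box.
fn returns_closure() -> impl Fn(i32) -> i32 {
    |x| x + 1
}

// Two different closure types: each one is boxed into the same trait object type.
fn make_op(double: bool) -> Box<dyn Fn(i32) -> i32> {
    if double {
        Box::new(|x| x * 2)
    } else {
        Box::new(|x| x + 1)
    }
}

fn main() {
    println!("{}", returns_closure()(1));
    println!("{}", make_op(true)(5));
    println!("{}", make_op(false)(5));
}
```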

19.5 Macros

macro_rules! defines custom macros, which are called declarative macros. In addition, there are three kinds of procedural macros in Rust:

| Custom #[derive] macro | Attribute-like macro | Function-like macro |
| --- | --- | --- |
| code added with the derive attribute used on structs and enums | define custom attributes usable on any item | look like function calls but operate on the tokens specified as their argument |
| #[derive(Debug)] | #[tokio::main] | vec![1, 2, 3] (call syntax only; vec! itself is a declarative macro) |

The Difference Between Macros and Functions

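Macros can:

  1. take a variable number of parameters: we can call println!("hello") with one argument or println!("hello {}", name) with two.
  2. implement a trait on a given type, because macros are expanded before the compiler interprets the meaning of the code. A function can’t, because it gets called at runtime and a trait needs to be implemented at compile time.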
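A minimal sketch of the second point: one macro invocation that implements a trait for several types. The Describe trait and the impl_describe! macro are hypothetical names for illustration.

```rust
// Hypothetical trait, for illustration only.
trait Describe {
    fn describe(&self) -> String;
}

// Generates one `impl Describe for T` block per type in the list.
macro_rules! impl_describe {
    ($($t:ty),*) => {
        $(
            impl Describe for $t {
                fn describe(&self) -> String {
                    // `stringify!` turns the type tokens into a string literal.
                    format!("{} = {}", stringify!($t), self)
                }
            }
        )*
    };
}

// Expanded at compile time into three separate impl blocks.
impl_describe!(i32, f64, bool);

fn main() {
    println!("{}", 5i32.describe());
    println!("{}", true.describe());
}
```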

Excerpt from “Rust By Example”:

So why are macros useful?

  1. Don’t repeat yourself. …
  2. Domain-specific languages.
  3. Variadic interfaces.

The downside is, macro definitions are generally more difficult to read, understand, and maintain than function definitions.

You must define macros or bring them into scope before you call them in a file, as opposed to functions you can define anywhere and call anywhere.

Declarative Macros with macro_rules! for General Metaprogramming

Before checking how vec! macro should work, here is a simple macro definition from “Rust by Example”:

// This is a simple macro named `say_hello`.
macro_rules! say_hello {
    // `()` indicates that the macro takes no argument.
    () => {
        // The macro will expand into the contents of this block.
        println!("Hello!");
    };
}

fn main() {
    // This call will expand into `println!("Hello!");`
    say_hello!()
}

One argument macro:

fn main() {
    // compiles OK
    macro_rules! foo {
        ($l:tt) => {
            bar!($l);
        };
    }

    macro_rules! bar {
        (3) => {};
    }

    foo!(3);
}

  • tt is an abbreviation of “Token Tree”: a single token, or tokens in matching delimiters (), [], or {}.
  • ($l:tt): the parentheses enclose the pattern that the macro tries to match. In this case, the macro captures its argument as $l, and $l must be a single token tree (a TokenTree metavariable).

To achieve variadic interfaces, a macro in Rust takes a pattern (a matcher) inside the first pair of parentheses. To write one, we need to know about fragment specifiers and metavariables.

Let’s quickly look how we can implement the simple version of the familiar macro vec!:

#[macro_export]
macro_rules! vec {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

  • #[macro_export] makes the macro available whenever the crate in which it is defined is brought into scope, and macro_rules! starts the definition.
  • ( $( $x:expr ),* ) is the pattern that the macro input is matched against:
    • The form is $ ( MacroMatch+ ) MacroRepSep? MacroRepOp, where $x:expr is the MacroMatch, the comma , is the MacroRepSep, and * is the MacroRepOp, which indicates how many times the match repeats (in this case zero or more times).
    • $x:expr matches any Rust expression and gives it the name $x.
    • The comma , is a literal separator that may appear between the matched expressions.
    • * means the pattern before it matches zero or more times.

From the macro, vec![1,2,3] will generate the code as follows:

{
    let mut temp_vec = Vec::new();
    temp_vec.push(1);
    temp_vec.push(2);
    temp_vec.push(3);
    temp_vec
}
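The simplified pattern rejects a trailing comma such as [1, 2, 3,]. As a sketch, an optional `$(,)?` at the end of the matcher accepts it. The macro name my_vec is hypothetical (chosen to avoid shadowing the real vec!), and the standard library's actual definition is different.

```rust
// Hypothetical macro: the simplified vec! plus an optional trailing comma.
macro_rules! my_vec {
    ( $( $x:expr ),* $(,)? ) => {
        {
            let mut temp_vec = Vec::new();
            // This statement is repeated once per matched expression.
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

fn main() {
    let a = my_vec![1, 2, 3];
    let b = my_vec![1, 2, 3,];
    println!("{:?} {:?}", a, b);
}
```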

This book is a getting-started book, so we don’t learn how to write macros any further.

Procedural Macros for Generating Code from Attributes

proc_macro crate is required.

OK. The getting-started book doesn’t provide a full tutorial on how to write my own procedural macros. I’ll refer to other learning material when I need it.

Reference table

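| Fragment specifier | Name | Example |
| --- | --- | --- |
| item | Item | |
| block | BlockExpression | { let foo = 2; } |
| stmt | Statement | let foo = 2; |
| pat_param | PatternNoTopAlt | Refer to the section “Patterns and Matching” |
| pat | Pattern (equivalent to pat_param before the 2021 edition) | Refer to the section “Patterns and Matching” |
| expr | Expression | |
| ty | Type | f64, MyStruct, MyEnum |
| ident | IDENTIFIER_OR_KEYWORD | foo in let foo: i32; |
| path | TypePath | ::std::fmt |
| tt | TokenTree | |
| meta | Attr (the contents of an attribute) | allow(unused_variables) in #![allow(unused_variables)] |
| lifetime | LIFETIME_TOKEN | 'a in &'a i32 |
| vis | Visibility qualifier (possibly empty) | pub in pub bar |
| literal | LiteralExpression | "hello", r#"hi"#, 12 |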

20. Final Project: Building a Multithreaded Web Server

But before we get started, we should mention one detail: the method we’ll use won’t be the best way to build a web server with Rust. A number of production-ready crates are available on crates.io that provide more complete web server and thread pool implementations than we’ll build.

20.1 Building a Single-Threaded Web Server

  • HTTP over TCP
  • Using standard library std::net
cargo new hello
cd hello

src/main.rs:

use std::net::TcpListener;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();

    for stream in listener.incoming() {
        let stream = stream.unwrap();

        println!("Connection established!");
    }
}
  • The bind function returns a Result<T, E>, which indicates that binding might fail.
  • We use unwrap to stop the program if errors happen.
  • The incoming method on TcpListener returns an iterator that gives us a sequence of streams (more specifically, streams of type TcpStream).
    • We’re iterating over connection attempts with incoming method.
    • A single stream represents an open connection between the client and the server.
    • A connection is the name for the full request and response process in which a client connects to the server, the server generates a response, and the server closes the connection.
    • As such, TcpStream will read from itself to see what the client sent and then allow us to write our response to the stream.
    • Overall, this for loop will process each connection in turn and produce a series of streams for us to handle.
    • The handling of the stream consists of calling unwrap to terminate our program if the stream has any errors.

Test

$ cargo run

# In another terminal
$ curl localhost:7878 -vvv
*   Trying 127.0.0.1:7878...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 7878 (#0)
> GET / HTTP/1.1
> Host: localhost:7878
> User-Agent: curl/7.68.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer


# cargo run terminal
$ cargo run
   Compiling hello v0.1.0 (/home/atlex00/rust-projects/hello)
warning: unused variable: `stream`
 --> src/main.rs:7:13
  |
7 |         let stream = stream.unwrap();
  |             ^^^^^^ help: if this is intentional, prefix it with an underscore: `_stream`
  |
  = note: `#[warn(unused_variables)]` on by default

warning: 1 warning emitted

    Finished dev [unoptimized + debuginfo] target(s) in 0.98s
     Running `target/debug/hello`
Connection established!
Connection established!
^C
$
  • The connections are reset because the server isn’t currently sending back any data.

Reading the Request

use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();

    for stream in listener.incoming() {
        let stream = stream.unwrap();

        handle_connection(stream);
    }
}

fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 1024];

    stream.read(&mut buffer).unwrap();

    println!("Request: {}", String::from_utf8_lossy(&buffer[..]));
}
  • In the handle_connection function, we’ve made the stream parameter mutable. The reason is that the TcpStream instance keeps track of what data it returns to us internally. It might read more data than we asked for and save that data for the next time we ask for data. It therefore needs to be mut because its internal state might change; usually, we think of “reading” as not needing mutation, but in this case we need the mut keyword.

  • How to read from stream. -> 3 steps.

    1. Declare a buffer on the stack to hold the data. It’s 1024 bytes in the example.
    2. Pass the buffer to stream.read, which will read bytes from the TcpStream and put them in the buffer (stream.read(&mut buffer).unwrap();).
    3. Convert the bytes in the buffer to a string and print that string (String::from_utf8_lossy).
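The three steps above can be sketched as a function that is generic over std::io::Read, so it can be tried without a real socket. The names read_request and request_line are hypothetical. Unlike the code above, this version converts only the bytes that read reported, so the unused zero bytes of the buffer are not printed.

```rust
use std::io::Read;

fn read_request<R: Read>(stream: &mut R) -> String {
    // Step 1: a fixed-size buffer on the stack.
    let mut buffer = [0; 1024];
    // Step 2: `read` returns how many bytes were actually placed in the buffer.
    let n = stream.read(&mut buffer).unwrap();
    // Step 3: convert only those `n` bytes, replacing invalid UTF-8 if any.
    String::from_utf8_lossy(&buffer[..n]).into_owned()
}

// First line of the request, e.g. "GET / HTTP/1.1".
fn request_line(request: &str) -> &str {
    request.lines().next().unwrap_or("")
}

fn main() {
    // A byte slice implements `Read`, so it can stand in for a TcpStream here.
    let mut fake_stream: &[u8] = b"GET / HTTP/1.1\r\nHost: myserver.com\r\n\r\n";
    let request = read_request(&mut fake_stream);
    println!("Request line: {}", request_line(&request));
}
```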

Test:

$ cargo run

# Other terminal
curl localhost:7878 -vvv -H "Host: myserver.com"
*   Trying 127.0.0.1:7878...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 7878 (#0)
> GET / HTTP/1.1
> Host: myserver.com
> User-Agent: curl/7.68.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server

# Cargo run terminal
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/hello`
Request: GET / HTTP/1.1
Host: myserver.com
User-Agent: curl/7.68.0
Accept: */*
^C

Writing a Response

First, return no HTTP body, just the status line. Change the handle_connection function as follows.

fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 1024];

    stream.read(&mut buffer).unwrap();

    let response = "HTTP/1.1 200 OK\r\n\r\n";

    stream.write(response.as_bytes()).unwrap();
    stream.flush().unwrap();
}

Returning Real HTML

hello.html (in the project root, next to the src directory)

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Hello!</title>
  </head>
  <body>
    <h1>Hello!</h1>
    <p>Hi from Rust</p>
  </body>
</html>

Change the handle_connection function as follows.

use std::fs;
fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();

    let contents = fs::read_to_string("hello.html").unwrap();

    let response = format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
        contents.len(),
        contents
    );

    stream.write(response.as_bytes()).unwrap();
    stream.flush().unwrap();
}

Test:

$ curl localhost:7878
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Hello!</title>
  </head>
  <body>
    <h1>Hello!</h1>
    <p>Hi from Rust</p>
  </body>
</html>

Validating the Request and Selectively Responding

Return the HTML page only for a GET / request. Change the handle_connection function as follows.

fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();

    let get = b"GET / HTTP/1.1\r\n";

    if buffer.starts_with(get) {
        let contents = fs::read_to_string("hello.html").unwrap();

        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
            contents.len(),
            contents
        );

        stream.write(response.as_bytes()).unwrap();
        stream.flush().unwrap();
    } else {
        let contents = String::from("Panic!!");

        let response = format!(
            "HTTP/1.1 404 NOT FOUND\r\nContent-Length: {}\r\n\r\n{}",
            contents.len(),
            contents
        );

        stream.write(response.as_bytes()).unwrap();
        stream.flush().unwrap();

    }
}

Return error page

Change the else part as follows:

else {
    let status_line = "HTTP/1.1 404 NOT FOUND\r\n\r\n";
    let contents = fs::read_to_string("404.html").unwrap();

    let response = format!("{}{}", status_line, contents);

    stream.write(response.as_bytes()).unwrap();
    stream.flush().unwrap();
}

404.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Hello!</title>
  </head>
  <body>
    <h1>Oops!</h1>
    <p>Sorry, I don't know what you're asking for.</p>
  </body>
</html>

A Touch of Refactoring

Here is the refactored handle_connection function.

fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();

    let get = b"GET / HTTP/1.1\r\n";

    let (status_line, filename) = if buffer.starts_with(get) {
        ("HTTP/1.1 200 OK\r\n\r\n", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND\r\n\r\n", "404.html")
    };

    let contents = fs::read_to_string(filename).unwrap();

    let response = format!("{}{}", status_line, contents);

    stream.write(response.as_bytes()).unwrap();
    stream.flush().unwrap();
}
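The routing and formatting logic can also be pulled out into pure functions, which makes them easy to check without opening a socket. This is a sketch of my own rather than the book's code: route and build_response are hypothetical names, and the Content-Length header from the earlier step is kept.

```rust
// Pick the status line and the file to serve from the raw request bytes.
fn route(request: &[u8]) -> (&'static str, &'static str) {
    if request.starts_with(b"GET / HTTP/1.1\r\n") {
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    }
}

// Assemble status line, Content-Length header, blank line, and body.
fn build_response(status_line: &str, contents: &str) -> String {
    format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        contents.len(), // length in bytes, which is what the header expects
        contents
    )
}

fn main() {
    let (status_line, filename) = route(b"GET / HTTP/1.1\r\n\r\n");
    println!("would serve {}", filename);
    println!("{}", build_response(status_line, "<h1>Hello!</h1>"));
}
```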

21. Appendix

21.3 C: Derivable Traits

Default for Default Values

The Default trait allows you to create a default value for a type. Deriving Default implements the default function. The derived implementation of the default function calls the default function on each part of the type, meaning all fields or values in the type must also implement Default to derive Default.
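A short sketch of deriving Default and combining it with struct update syntax. The ServerConfig struct is a hypothetical example type.

```rust
// Every field type (String, u16, bool) implements Default, so the derive works.
#[derive(Debug, Default, PartialEq)]
struct ServerConfig {
    host: String,
    port: u16,
    verbose: bool,
}

fn main() {
    // All fields take their default: "", 0, and false.
    let plain = ServerConfig::default();
    println!("{:?}", plain);

    // Override one field and fill the rest from the default value.
    let custom = ServerConfig {
        port: 7878,
        ..Default::default()
    };
    println!("{:?}", custom);
}
```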


Appendixes

A. Naming rule

https://rust-lang.github.io/api-guidelines/naming.html

Note about prelude

https://doc.rust-lang.org/std/prelude/index.html

The prelude is the list of things that Rust automatically imports into every Rust program. It’s kept as small as possible, and is focused on things, particularly traits, which are used in almost every single Rust program.

Note about Rust memory management

The following link is a very good introductory post: https://deepu.tech/memory-management-in-rust/

A great reference on how Rust allocates memory: https://www.youtube.com/watch?v=rDoqT-a6UFg