Rust - My notes on Getting Started (19/21)
What’s this page?
As of Dec. 2021, I work as a DevOps engineer, but I like to solve problems with code (front end, back end, whatever; it depends on the purpose). These days, my motivation to learn Rust has surged. This page is my memo from learning with the official Rust documentation, so that I can easily recall and refer to the key features. Most of this post consists of quotes from the documentation, but I also add my own opinions (which could be wrong).
I have experience with
- Python,
- C++,
- Java,
- Go,
- and Fortran (!!)
1. Getting started
1.1. Installation
- The official installation process goes smoothly.
- I describe my installation process on another page.
1.2 Hello, World!
- Rust files always end with the .rs extension.
- The main function is special: it is always the first code that runs in every executable Rust program.
- Rust style is to indent with four spaces, not a tab.
- Using a ! (as in println!) means that you’re calling a macro instead of a normal function.
1.3 Hello, Cargo!
- In Rust, packages of code are referred to as crates.
- Create a new project with:
cargo new hello_cargo
- Cargo expects your source files to live inside the src directory.
- The cargo build command creates an executable file in target/debug/hello_cargo.
- Cargo.lock: this file keeps track of the exact versions of dependencies in your project.
- cargo check: this command quickly checks your code to make sure it compiles but doesn’t produce an executable.
- When your project is finally ready for release, you can use cargo build --release to compile it with optimizations (the executable then goes to target/release instead of target/debug).
2. Programming a Guessing Game
We can start a comment line with //.
Create variables:
let foo = 5; // immutable
let mut foo = 5; // mutable
let mut guess = String::new();
The :: syntax indicates that new is an associated function of the String type. An associated function is implemented on a type, in this case String, rather than on a particular instance of a String.
User input.
use std::io;
let mut guess = String::new();
io::stdin()
    .read_line(&mut guess)
    .expect("Failed to read line");
The code stores a line from standard input in the variable guess as a String.
The std::io::stdin function returns an instance of std::io::Stdin, which is a type that represents a handle to the standard input for your terminal.
The job of read_line is to take whatever the user types into standard input and place that into a string, so it takes that string as an argument.
The & indicates that this argument is a reference, which gives you a way to let multiple parts of your code access one piece of data without needing to copy that data into memory multiple times.
References are immutable by default. Hence, you need to write &mut guess rather than &guess to make it mutable.
.expect() handles a potential failure.
It’s often wise to introduce a newline and other whitespace to help break up long lines.
read_line returns io::Result.
Rust has a number of types named Result in its standard library.
The Result types are enumerations (enum), which is a type that can have a fixed set of values. For Result, the variants are Ok or Err.
An instance of io::Result has an expect method. If the Result is an Err, expect crashes the program and displays the message you passed to it; if it is an Ok, expect returns the value held inside the Ok.
If you don’t call expect, the program will compile, but you’ll get a warning:
$ cargo build
   Compiling guessing_game v0.1.0 (file:///projects/guessing_game)
warning: unused `std::result::Result` that must be used
  --> src/main.rs:10:5
   |
10 |     io::stdin().read_line(&mut guess);
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_must_use)]` on by default
   = note: this `Result` may be an `Err` variant, which should be handled

    Finished dev [unoptimized + debuginfo] target(s) in 0.59s
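You can try the Ok/Err behaviour without typing anything at a terminal. The minimal sketch below uses str::parse as a stand-in for read_line, since both return a Result. It handles both variants with match instead of expect. The helper names to_number and describe are hypothetical and are not part of the book’s code.

```rust
use std::num::ParseIntError;

// parse returns Result<u32, ParseIntError>: Ok(number) or Err(reason).
fn to_number(input: &str) -> Result<u32, ParseIntError> {
    input.parse::<u32>()
}

// Handle both variants explicitly instead of calling expect.
fn describe(input: &str) -> String {
    match to_number(input) {
        Ok(n) => format!("Ok: {}", n),
        Err(e) => format!("Err: {}", e),
    }
}

fn main() {
    println!("{}", describe("42"));
    println!("{}", describe("A👍%"));
    // expect would instead panic with the given message on Err:
    // to_number("A👍%").expect("Failed to parse");
}
```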
The set of curly brackets, {}, is a placeholder.
let x = 5;
let y = 10;
println!("x = {} and y = {}", x, y);
Since Rust 1.58, format strings can capture variables by name directly inside the braces!
let x = 5;
let y = 10;
println!("x = {x} and y = {y}");
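Both placeholder styles produce the same text. The small sketch below uses format!, which works like println! but returns a String instead of printing. The function names positional and captured are made up for illustration.

```rust
// Positional placeholders: arguments are filled in order.
fn positional(x: i32, y: i32) -> String {
    format!("x = {} and y = {}", x, y)
}

// Captured identifiers (Rust 1.58+): variable names go inside the braces.
fn captured(x: i32, y: i32) -> String {
    format!("x = {x} and y = {y}")
}

fn main() {
    let (x, y) = (5, 10);
    println!("{}", positional(x, y));
    println!("{}", captured(x, y));
}
```

Only plain variable names can be captured this way. An expression such as {x + 1} is not allowed inside the braces and still needs a positional argument.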
Rust doesn’t yet include random number functionality in its standard library. However, the Rust team does provide a rand crate.
To use the crate, add it to the [dependencies] section of Cargo.toml:
[dependencies]
rand = "0.5.5"
- The number 0.5.5 is actually shorthand for ^0.5.5, which means “any version that has a public API compatible with version 0.5.5.”
- (My note): With this dependency added, cargo build downloads the crate (and its own dependencies) and compiles them. See the compiled results under target/debug/deps/.
- When you build a project for the first time, Cargo figures out all the versions of the dependencies that fit the criteria and then writes them to the Cargo.lock file. When you build your project in the future, Cargo will see that the Cargo.lock file exists and use the versions specified there rather than doing all the work of figuring out versions again.
- When you do want to update a crate, Cargo provides another command, cargo update, which will ignore the Cargo.lock file and figure out all the latest versions that fit your specifications in Cargo.toml. If that works, Cargo will write those versions to the Cargo.lock file.
- By default, Cargo will only look for versions greater than 0.5.5 and less than 0.6.0 (for 0.x versions, a change in the middle number is treated as incompatible).
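(My note, reminder) In rand::Rng, Rng is a trait defined in the rand crate. The :: here is a path separator, not an associated function call. The Rng trait must be in scope for us to use its methods, such as gen_range.
use rand::Rng;
...
let secret_number = rand::thread_rng().gen_range(1, 101); // rand 0.5 API; rand 0.8 takes a range instead: gen_range(1..=100)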
Note: A trait is a collection of methods defined for an unknown type: Self. They can access other methods declared in the same trait. https://doc.rust-lang.org/rust-by-example/trait.html
Simple match example:
use std::cmp::Ordering;
match guess.cmp(&secret_number) {
    Ordering::Less => println!("Too small!"),
    Ordering::Greater => println!("Too big!"),
    Ordering::Equal => println!("You win!"),
}
std::cmp::Ordering is another enum, and its variants are Less, Greater, and Equal.
- The cmp method compares two values and can be called on anything that can be compared.
- A match expression is made up of arms. An arm (written pattern => code) consists of a pattern and the code that should be run if the value given to the beginning of the match expression fits that arm’s pattern.
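The minimal sketch below puts cmp and match together by wrapping the match in a function that returns the message. The secret number is fixed instead of coming from rand, so the result is reproducible. The name judge is hypothetical.

```rust
use std::cmp::Ordering;

// Compare a guess against the secret number and return the game's message.
fn judge(guess: u32, secret_number: u32) -> &'static str {
    match guess.cmp(&secret_number) {
        Ordering::Less => "Too small!",
        Ordering::Greater => "Too big!",
        Ordering::Equal => "You win!",
    }
}

fn main() {
    let secret_number = 42; // fixed here instead of using rand
    for guess in [10, 50, 42] {
        println!("{}: {}", guess, judge(guess, secret_number));
    }
}
```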
Rust has a strong, static type system. However, it also has type inference.
Integer type examples: i32, u32, i64.
Read the following code, paying attention to the type of guess.
let mut guess = String::new();
io::stdin()
    .read_line(&mut guess)
    .expect("Failed to read line");
let guess: u32 = guess.trim().parse().expect("Please type a number!");
- Rust allows us to shadow the previous value of guess with a new one. This feature is often used in situations in which you want to convert a value from one type to another type.
- The trim method on a String instance will eliminate any whitespace at the beginning and end. This is needed because when the user presses Enter, a newline character is added to the string.
- The parse method on strings parses a string into some kind of number, and it could easily cause an error (if the string contained A👍%, there would be no way to convert that to a number).
- The colon : after guess tells Rust we’ll annotate the variable’s type; the annotation u32 also tells parse which kind of number to produce.
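The small sketch below shows why trim is needed. It simulates what read_line would store when the user types 42 and presses Enter. The helper parse_guess is hypothetical and reuses the shadowing pattern from the guessing-game code.

```rust
// `guess` starts as a String and is then shadowed by a u32 with the same name.
fn parse_guess(input: &str) -> u32 {
    let guess = String::from(input); // e.g. "42\n", as read_line would store it
    let guess: u32 = guess.trim().parse().expect("Please type a number!");
    guess
}

fn main() {
    let raw = "42\n"; // simulated input, including the newline from Enter
    // Without trim, '\n' is not a digit, so parse returns an Err.
    println!("without trim: {:?}", raw.parse::<u32>());
    println!("with trim: {}", parse_guess(raw));
}
```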
Making the guess-parsing line more Rust-like (this code runs inside the game’s loop, which is why continue can be used):
// from
// let guess: u32 = guess.trim().parse().expect("Please type a number!");
//
// to
let guess: u32 = match guess.trim().parse() {
    Ok(num) => num,
    Err(_) => continue,
};
- Switching from an expect call to a match expression is how you generally move from crashing on an error to handling the error.
- The underscore, _, is a catchall value; in this example, we’re saying we want to match all Err values, no matter what information they have inside them.
loop repeats its body forever until a break; is executed.
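The sketch below shows the whole game loop with the match-based error handling. To keep it runnable without a terminal, it reads from a hypothetical list of scripted input lines instead of stdin. continue skips invalid input, and break leaves the loop on a correct guess. The play function and its return value are assumptions for illustration: it returns the number of lines read, or None if the input runs out.

```rust
use std::cmp::Ordering;

// Returns Some(number of lines read) when the secret is guessed,
// or None if the scripted input runs out first.
fn play(inputs: &[&str], secret_number: u32) -> Option<usize> {
    let mut lines = inputs.iter();
    let mut attempts = 0;
    loop {
        // Stand-in for read_line: take the next scripted line.
        let line = match lines.next() {
            Some(line) => line,
            None => return None,
        };
        attempts += 1;

        // Invalid input: skip the rest of this iteration and ask again.
        let guess: u32 = match line.trim().parse() {
            Ok(num) => num,
            Err(_) => continue,
        };

        match guess.cmp(&secret_number) {
            Ordering::Less => println!("Too small!"),
            Ordering::Greater => println!("Too big!"),
            Ordering::Equal => {
                println!("You win!");
                break;
            }
        }
    }
    Some(attempts)
}

fn main() {
    let inputs = ["50\n", "abc\n", "10\n", "42\n"];
    println!("{:?}", play(&inputs, 42));
}
```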
3. Common Programming Concepts
3.1 Variables and Mutability
By default, variables are immutable. This lets you take advantage of the safety and easy concurrency that Rust offers.
Why does Rust encourage you to favor immutability?
- It’s important that we get compile-time errors when we attempt to change a value that we previously designated as immutable because this very situation can lead to bugs.
- But mutability can be very useful.
Like immutable variables, constants are values that are bound to a name and are not allowed to change, but there are a few differences between constants and variables.
- First, you aren’t allowed to use mut with constants: they are always immutable. They are declared with const instead of let, and their type must always be annotated.
- Constants can be declared in any scope, including the global scope, which makes them useful for values that many parts of code need to know about.
- The last difference is that constants may be set only to a constant expression, not the result of a function call or any other value that could only be computed at runtime.
An example of constants.
const MAX_POINTS: u32 = 100_000;
- Use all uppercase with underscores between words.
- Underscores can be inserted in numeric literals to improve readability.
Shadowing
fn main() {
    let x = 5;
    let x = x + 1;
    let x = x * 2;
    println!("The value of x is: {}", x);
}
// The value of x is: 12
Shadowing is different from marking a variable as mut, because we’ll get a compile-time error if we accidentally try to reassign to this variable without using the let keyword.
The other difference between mut and shadowing is that, because we’re effectively creating a new variable when we use the let keyword again, we can change the type of the value but reuse the same name.
Shadowing thus spares us from having to come up with different names, such as spaces_str and spaces_num; instead, we can reuse the simpler spaces name.
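Here is the spaces example as a runnable sketch. number_of_spaces is a hypothetical wrapper so the result can be checked. The mut version that would fail is left as a comment.

```rust
// Shadowing: the second `spaces` is a new variable, so it may have a new type.
fn number_of_spaces() -> usize {
    let spaces = "   ";         // type: &str
    let spaces = spaces.len(); // type: usize, shadows the &str above
    spaces
}

fn main() {
    println!("{}", number_of_spaces());

    // With `mut` the type cannot change, so this would not compile:
    // let mut spaces = "   ";
    // spaces = spaces.len(); // error[E0308]: mismatched types
}
```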
My summary of shadowing
This is a good StackOverflow answer: https://stackoverflow.com/a/48696415/9923806
- When you shadow a variable, you create a new variable with the same name.
- The value of the original, shadowed variable still exists in memory. When you assign a new value to a mutable variable, the old value is dropped (freed). Shadowing doesn’t drop the original value; it lives until the end of its scope.
- So, shadowing JUST creates a new variable with the same name! (Let me repeat.)
- cf. std::mem::drop
fn main() {
let x = 1;
println!("Value: {x} Address: {:p}", &x);
// Save address before shadowing
let addr_first = &x;
let x = 2;
println!("Value: {x} Address: {:p}", &x);
println!("Old value: {}", *addr_first);
}
// Result:
//
// Value: 1 Address: 0x7ffeb2c0bf54
// Value: 2 Address: 0x7ffeb2c0bfc4
// Old value: 1
Unless I put the shadowing lines inside an inner scope ({}), I couldn’t find a way to access the original value through the name x again.
The official book also has an example of shadowing inside an inner scope.
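The small sketch below shadows a variable inside an inner scope. The outer value becomes visible again after the closing bracket. shadow_in_scope is a hypothetical name.

```rust
// Shadowing inside an inner block ends with the block,
// so the outer `x` is visible again afterwards.
fn shadow_in_scope() -> (i32, i32) {
    let x = 5;
    let inner = {
        let x = x * 2; // shadows the outer x only inside these braces
        x
    };
    (inner, x) // here x is the outer value again
}

fn main() {
    let (inner, outer) = shadow_in_scope();
    println!("inner x: {}, outer x: {}", inner, outer);
}
```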
3.2 Data Types
A scalar type represents a single value.
- Examples: integers, floating-point numbers, booleans, and characters.
Integer types:
Length    Signed   Unsigned
8-bit     i8       u8
16-bit    i16      u16
32-bit    i32      u32
64-bit    i64      u64
128-bit   i128     u128
arch      isize    usize
(arch: 64 bits on a 64-bit architecture, 32 bits on a 32-bit architecture.)
Signed numbers are stored using two’s complement representation.
Integer literals in Rust:
Number literals   Example
Decimal           98_222
Hex               0xff
Octal             0o77
Binary            0b1111_0000
Byte (u8 only)    b'A'
- We can use underscores as visual separators in number literals, e.g. 98_222 or 0b1111_0000.
Integer types default to i32: this type is generally the fastest, even on 64-bit systems.
When you’re compiling in debug mode, Rust includes checks for integer overflow that cause your program to panic at runtime if this behavior occurs.
Rust uses the term panicking when a program exits with an error.
When you’re compiling in release mode with the --release flag, Rust does not include checks for integer overflow that cause panics. Instead, if overflow occurs, Rust performs two’s complement wrapping: for a u8, the value 256 becomes 0, 257 becomes 1, and so on.
Rust’s floating-point types are f32 and f64.
The default type is f64 because on modern CPUs it’s roughly the same speed as f32 but is capable of more precision.
Floating-point numbers are represented according to the IEEE-754 standard.
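Integer overflow can also be handled explicitly, so the result does not depend on debug or release mode. The standard library provides checked_*, wrapping_*, saturating_* and overflowing_* methods on integer types. Below is a small sketch for u8; the function name is hypothetical.

```rust
// Explicit overflow handling for u8 (range 0..=255). Unlike a plain `+`,
// these methods behave the same in debug and release builds.
fn add_one_four_ways(x: u8) -> (Option<u8>, u8, u8, (u8, bool)) {
    (
        x.checked_add(1),     // None if the result would overflow
        x.wrapping_add(1),    // wraps around (two's complement)
        x.saturating_add(1),  // clamps at u8::MAX
        x.overflowing_add(1), // wrapped value plus a flag telling whether it overflowed
    )
}

fn main() {
    println!("{:?}", add_one_four_ways(1));
    println!("{:?}", add_one_four_ways(255));
}
```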
Booleans are one byte in size.
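These sizes can be checked with std::mem::size_of. The small sketch below does that; scalar_sizes is a hypothetical name. It also includes char, which is described next.

```rust
use std::mem::size_of;

// Returns the sizes, in bytes, of a few scalar types.
fn scalar_sizes() -> [usize; 4] {
    [
        size_of::<bool>(), // 1
        size_of::<char>(), // 4
        size_of::<i32>(),  // 4
        size_of::<f64>(),  // 8
    ]
}

fn main() {
    println!("bool, char, i32, f64: {:?}", scalar_sizes());
    // A char holds any Unicode scalar value, not just ASCII.
    let heart_eyed_cat = '😻';
    println!("{}", heart_eyed_cat);
}
```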
Rust’s char type is four bytes in size and represents a Unicode Scalar Value, which means it can represent a lot more than just ASCII. … your human intuition for what a “character” is may not match up with what a char is in Rust.
Compound types: tuple and array.
Tuple
- Tuples have a fixed length: once declared, they cannot grow or shrink in size.
- Example:
let tup: (i32, f64, u8) = (500, 6.4, 1);
fn main() {
let tup = (500, 6.4, 1);
let (x, y, z) = tup;
println!("The value of y is: {}", y);
}
// The value of y is: 6.4
- We can access a tuple element directly by using a period (.) followed by the index of the value we want to access.
let x: (i32, f64, u8) = (500, 6.4, 1);
let five_hundred = x.0;
The tuple without any values, (), is a special type that has only one value, also written ().
The type is called the unit type and the value is called the unit value.
Unit-like structs (structs without any fields, covered in chapter 5) behave similarly to ().
Another common use of the unit value is Ok(()): a successful Result that carries no meaningful value.
Expressions implicitly return the unit value if they don’t return any other value.
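The sketch below shows both uses of the unit value. A function with no declared return type returns (). A function that can fail but has nothing to return on success returns Ok(()). check_positive and say_hello are hypothetical names.

```rust
// No declared return type: the function implicitly returns ().
fn say_hello() {
    println!("hello");
}

// Result<(), String>: nothing meaningful to return on success, so Ok(()).
fn check_positive(n: i32) -> Result<(), String> {
    if n > 0 {
        Ok(())
    } else {
        Err(format!("{} is not positive", n))
    }
}

fn main() {
    let unit: () = say_hello(); // the call evaluates to the unit value
    println!("{:?}", unit);
    println!("{:?}", check_positive(3));
    println!("{:?}", check_positive(-1));
}
```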
Array
- Every element of an array must have the same type.
- We can define one like this: let a: [i32; 5] = [1, 2, 3, 4, 5]; where [i32; 5] means five elements of type i32.
- If you want to create an array that contains the same value for each element, you can specify the initial value, followed by a semicolon, and then the length of the array in square brackets: let a = [3; 5];
- You can access elements of an array using indexing: let first = a[0];
- What happens if you try to access an element of an array that is past the end of the array? … The compilation didn’t produce any errors, but the program resulted in a runtime error and didn’t exit successfully (panic).
- In many low-level languages, this kind of check is not done, and when you provide an incorrect index, invalid memory can be accessed.
- An array is allocated on the stack.
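Besides plain indexing, arrays offer get (through slice methods). It returns an Option instead of panicking when the index is out of bounds. Below is a small sketch; element_at is a hypothetical helper.

```rust
// Returns the element at `index`, or None if the index is past the end.
// `get` never panics, unlike `a[index]` with an out-of-bounds index.
fn element_at(a: &[i32; 5], index: usize) -> Option<i32> {
    a.get(index).copied()
}

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    let b = [3; 5]; // same as [3, 3, 3, 3, 3]
    println!("{:?} {:?}", a, b);
    println!("{:?}", element_at(&a, 0));  // Some(1)
    println!("{:?}", element_at(&a, 10)); // None
}
```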
3.3 Functions
- Function definitions in Rust start with fn and have a set of parentheses after the function name. The curly brackets tell the compiler where the function body begins and ends.
- Rust code uses snake case as the conventional style for function and variable names.
- Rust doesn’t care where you define your functions, only that they’re defined somewhere.
- Rust is an expression-based language; this is an important distinction to understand.
- Creating a variable and assigning a value to it with the let keyword is a statement.
- Function definitions are also statements;
- Statements do not return values.
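- Expressions evaluate to a resulting value. Consider a math operation, such as 5 + 6, which is an expression that evaluates to the value 11.
- Calling a function is an expression. Calling a macro is an expression. The block that we use to create new scopes, {}, is an expression, for example:
{ let x = 3; x + 1 }
- Note the x + 1 line without a semicolon at the end, which is unlike most of the lines you’ve seen so far. Expressions do not include ending semicolons. If you add a semicolon to the end of an expression, you turn it into a statement, which will then not return a value.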
fn main() {
let y = {
let x = 3;
x + 1
};
println!("The value of y is: {}", y);
// The value of y is: 4
}
Functions with Return Values
fn main() {
let x = plus_one(5);
println!("The value of x is: {}", x);
}
fn plus_one(x: i32) -> i32 {
x + 1
}
- We don’t name return values, but we do declare their type after an arrow (->).
- The return value of the function is synonymous with the value of the final expression in the block of the body of a function. You can return early from a function by using the return keyword and specifying a value, but most functions return the last expression implicitly.
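The sketch below contrasts the implicit return of the final expression with an early return via return. abs_value is a hypothetical example, and plus_one repeats the function above.

```rust
// Implicit return: the final expression (no semicolon) is the return value.
fn plus_one(x: i32) -> i32 {
    x + 1
}

// Early return with the `return` keyword; otherwise the last expression is returned.
fn abs_value(x: i32) -> i32 {
    if x < 0 {
        return -x;
    }
    x
}

fn main() {
    println!("{}", plus_one(5));
    println!("{}", abs_value(-7));
    // Writing `x + 1;` in plus_one would make it a statement, so the body
    // would evaluate to () and fail to compile with "mismatched types".
}
```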
3.4 Comments
Pass ;)
3.5 Control Flow
if number < 5 {
println!("condition was true");
} else {
println!("condition was false");
}
- Blocks of code associated with the conditions in if expressions are sometimes called arms, just like the arms in match expressions.
- It’s also worth noting that the condition in this code must be a bool. If the condition isn’t a bool, we’ll get an error; Rust will not automatically convert non-Boolean types to a Boolean.
- You can have multiple conditions by combining if and else in an else if expression.
- Because if is an expression, we can use it on the right side of a let statement:
let condition = true;
let number = if condition { 5 } else { 6 };
- The values that have the potential to be results from each arm of the if must be the same type, because the type of a variable must be decided at compile time.
- The compiler would be more complex and would make fewer guarantees about the code if it had to keep track of multiple hypothetical types for any variable.
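The small sketch below uses if as an expression with an else if chain. Only the first arm whose condition is true runs, and every arm produces a &str. classify is a hypothetical name.

```rust
// `if` is an expression: each arm yields a value, and all arms have the same type.
fn classify(number: i32) -> &'static str {
    if number % 4 == 0 {
        "divisible by 4"
    } else if number % 3 == 0 {
        "divisible by 3"
    } else {
        "not divisible by 4 or 3"
    }
}

fn main() {
    let condition = true;
    let number = if condition { 5 } else { 6 };
    println!("{}: {}", number, classify(number));
    // let bad = if condition { 5 } else { "six" }; // error: `if` and `else` have incompatible types
}
```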
loop and break;
- You can add the value you want returned after the break expression you use to stop the loop; that value will be returned out of the loop so you can use it.
let mut counter = 0;
let result = loop {
counter += 1;
if counter == 10 {
break counter * 2;
}
};
println!("The result is {}", result);
//The result is 20
while: the loop keeps running while the condition is true; when the condition becomes false, the program exits the loop.
fn main() {
let mut number = 3;
while number != 0 {
println!("{}!", number);
number -= 1;
}
println!("LIFTOFF!!!");
}
//3!
//2!
//1!
//LIFTOFF!!!
- You could use the while construct to loop over the elements of a collection, such as an array.
fn main() {
let a = [10, 20, 30, 40, 50];
let mut index = 0;
while index < 5 {
println!("the value is: {}", a[index]);
index += 1;
}
}
the value is: 10
the value is: 20
the value is: 30
the value is: 40
the value is: 50
- And there is also a for loop.
fn main() {
let a = [10, 20, 30, 40, 50];
for element in a.iter() {
println!("the value is: {}", element);
}
}
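- To loop over an array, prefer for. A while loop with a manual index is error-prone (a wrong index or condition makes the program panic). It is also slower, because the compiler adds a bounds check on every iteration.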
fn main() {
for number in (1..4).rev() {
println!("{}!", number);
}
println!("LIFTOFF!!!");
}
rev reverses the range, so this counts down 3, 2, 1 (the range 1..4 excludes 4).
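Below are two small sketches of for. The first sums an array without any index bookkeeping. The second collects the reversed range: 1..4 covers 1, 2 and 3, because the end is excluded. The function names are hypothetical, and countdown uses a Vec, which the book introduces later.

```rust
// Adds up all elements; `for` visits each one, so no index can go out of bounds.
fn sum(a: &[i32; 5]) -> i32 {
    let mut total = 0;
    for element in a.iter() {
        total += *element; // element is a &i32, so dereference it
    }
    total
}

// Collects 3, 2, 1 using a reversed range.
fn countdown() -> Vec<i32> {
    let mut numbers = Vec::new();
    for number in (1..4).rev() {
        numbers.push(number);
    }
    numbers
}

fn main() {
    println!("{}", sum(&[10, 20, 30, 40, 50]));
    println!("{:?}", countdown());
}
```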
4. Understanding Ownership
- Rust has no garbage collector (GC); it manages memory through ownership instead.
4.1 What Is Ownership?
General programming knowledge: stack and heap
- The stack stores values in the order it gets them and removes the values in the opposite order. LIFO = FILO.
- All data stored on the stack must have a known, fixed size. Data with an unknown size at compile time or a size that might change must be stored on the heap instead.
- cf. In the context of data structures, a heap is a tree with a special property: the value of each node must be >= (or, for the other kind of heap, <=) the values of its children. But in the context of programming languages, you can think of the heap as a free memory area that is assigned to a program (process) at execution time.
- The heap is less organized: when you put data on the heap, you request a certain amount of space. The memory allocator finds an empty spot in the heap that is big enough, marks it as being in use, and returns a pointer, which is the address of that location. This process is called allocating on the heap and is sometimes abbreviated as just allocating. Pushing values onto the stack is not considered allocating. Because the pointer is a known, fixed size, you can store the pointer on the stack, but when you want the actual data, you must follow the pointer.
- Pushing to the stack is faster than allocating on the heap because the allocator never has to search for a place to store new data; that location is always at the top of the stack.
- When your code calls a function, the values passed into the function (including, potentially, pointers to data on the heap) and the function’s local variables get pushed onto the stack. When the function is over, those values get popped off the stack.
Ownership addresses the following problems:
- Keeping track of what parts of code are using what data on the heap,
- minimizing the amount of duplicate data on the heap,
- and cleaning up unused data on the heap so you don’t run out of space
Once you understand ownership, you won’t need to think about the stack and the heap very often, but knowing that managing heap data is why ownership exists can help explain why it works the way it does.
In Rust, memory is managed through a system of ownership with a set of rules that the compiler checks at compile time. None of the ownership features slow down your program while it’s running.
This blog post is a good reference about GC in Rust:
Sideway: Heap fragmentation in Rust
https://internals.rust-lang.org/t/jemalloc-was-just-removed-from-the-standard-library/8759
… the std::alloc::System type to represent the system’s default allocator.
Stack or heap in Rust
We will learn about Box laaaater (chapter 15).
Ownership Rules
- Each value in Rust has a variable that’s called its owner.
- There can only be one owner at a time.
- When the owner goes out of scope, the value will be dropped.
- The types covered previously are all stored on the stack and popped off the stack when their scope is over, but we want to look at data that is stored on the heap and explore how Rust knows when to clean up that data.
let s = String::from("hello");
- The String type stores its data on the heap and as such is able to store an amount of text that is unknown to us at compile time.
- In the case of a string literal (like let literal = "I'm a string literal"), we know the contents at compile time, so the text is hardcoded directly into the final executable. This is why string literals are fast and efficient. But these properties only come from the string literal’s immutability.
- With the String type, in order to support a mutable, growable piece of text, we need to allocate an amount of memory on the heap, unknown at compile time, to hold the contents. This means:
  - The memory must be requested from the memory allocator at runtime.
  - We need a way of returning this memory to the allocator when we’re done with our String.
- That first part is done by us: when we call String::from, its implementation requests the memory it needs. However, the second part is different. In languages with a garbage collector (GC), the GC keeps track of and cleans up memory that isn’t being used anymore.
- Rust takes a different path: the memory is automatically returned (~freed) once the variable that owns it goes out of scope.
- When a variable goes out of scope, Rust calls a special function for us. This function is called drop, and it’s where the author of String can put the code to return the memory. Rust calls drop automatically at the closing curly bracket.
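You can observe drop by implementing the Drop trait on your own type; the book covers Drop in chapter 15. The minimal sketch below assumes a hypothetical Guard type. It also uses a static counter that exists only to record how many times drop has run.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many Guard values have been dropped (for illustration only).
static DROP_COUNT: AtomicUsize = AtomicUsize::new(0);

// A hypothetical type whose cleanup we want to observe.
struct Guard;

impl Drop for Guard {
    // Called automatically when a Guard goes out of scope.
    fn drop(&mut self) {
        DROP_COUNT.fetch_add(1, Ordering::SeqCst);
        println!("Guard dropped");
    }
}

fn drops_so_far() -> usize {
    DROP_COUNT.load(Ordering::SeqCst)
}

fn main() {
    {
        let _guard = Guard; // _guard comes into scope
        println!("drops inside the block: {}", drops_so_far());
    } // _guard goes out of scope here, so drop runs
    println!("drops after the block: {}", drops_so_far());
}
```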
Example.1: Stack
let x = 5;
let y = x;
- The value 5 will be stored on the stack.
- Make a copy of the value in x and bind it to y.
- Integers are simple values with a known, fixed size, and these two 5 values are pushed onto the stack.
- My note: variable names like x and y have no meaning in assembly (i.e., compiled code). Only the 5 values are stored on the actual memory stack, and the Rust compiler keeps track of the location of each of the variables x and y.
Example.2: Heap
let s1 = String::from("hello");
let s2 = s1;
- A String is made up of three parts:
  - A pointer to the memory that holds the contents of the string,
  - The length, which is how much memory, in bytes, the contents of the String are currently using, and
  - The capacity, which is the total amount of memory, in bytes, that the String has received from the allocator.
- When we assign s1 to s2, the String data is copied, meaning we copy the pointer, the length, and the capacity that are on the stack. We do not copy the data on the heap that the pointer refers to.
ptr, len and capacity are stored on the stack; the contents ("hello") are stored on the heap.
The following code fails to compile with error E0382 (“borrow of moved value: s1”), because s1 is no longer valid after the move.
let s1 = String::from("hello");
let s2 = s1;
println!("{}, world!", s1);
- Note: shallow copy and deep copy: from the Python documentation.
- A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
- A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
- … OK: a deep copy makes its own copy of the objects in memory, while a shallow copy just refers to the original ones.
- The concept of copying the pointer, length, and capacity without copying the data probably sounds like making a shallow copy. But because Rust also invalidates the first variable, instead of being called a shallow copy, it’s known as a move.
- With only s2 valid, when it goes out of scope, it alone will free the memory.
- Rust will never automatically create “deep” copies of your data. Therefore, any automatic copying can be assumed to be inexpensive in terms of runtime performance.
- If we do want to deeply copy the heap data of the String, not just the stack data, we can use a common method called clone.
fn main() {
let s1 = String::from("hello");
let s2 = s1.clone();
println!("s1 = {}, s2 = {}", s1, s2);
}
// s1 = hello, s2 = hello
let x = 5;
let y = x;
println!("x = {}, y = {}", x, y);
- The code above compiles without error because types such as integers that have a known size at compile time are stored entirely on the stack, so copies of the actual values are quick to make.
- Rust has a special annotation called the Copy trait that we can place on types like integers that are stored on the stack. If a type implements Copy, a variable of that type is still valid after assignment to another variable.
- As a general rule, any group of simple scalar values can be Copy, and nothing that requires allocation or is some form of resource is Copy. Examples: u32, bool, f64, char, and tuples (if they only contain types that are also Copy).
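Your own types can be Copy as well, as long as all their fields are Copy. The sketch below uses a hypothetical Point struct; structs are introduced in chapter 5. Clone has to be derived too, because Copy requires it.

```rust
// Point contains only Copy fields (i32), so it can derive Copy.
// Clone is required by Copy; Debug and PartialEq are for printing/comparing.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // copied, not moved
    println!("{:?} {:?}", p1, p2); // p1 is still valid

    let t1 = (5, 'a', true); // a tuple of Copy types is Copy as well
    let t2 = t1;
    println!("{:?} {:?}", t1, t2);

    // A struct with a String field could not derive Copy,
    // because String owns heap data.
}
```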
Ownership and Functions
- The following code fails to compile at the line println!("{}", s), because s has been moved into takes_ownership.
fn main() {
let s = String::from("hello"); // s comes into scope
takes_ownership(s); // s's value moves into the function...
// ... and so is no longer valid here
println!("{}", s)
}
fn takes_ownership(some_string: String) { // some_string comes into scope
println!("{}", some_string);
} // Here, some_string goes out of scope and `drop` is called. The backing
// memory is freed.
Return Values and Scope
- Returning values can also transfer ownership.
- When a variable that includes data on the heap goes out of scope, the value will be cleaned up by drop unless the data has been moved to be owned by another variable.
Example:
fn main() {
let s1 = gives_ownership(); // gives_ownership moves its return
// value into s1
let s2 = String::from("hello"); // s2 comes into scope
let s3 = takes_and_gives_back(s2); // s2 is moved into
// takes_and_gives_back, which also
// moves its return value into s3
} // Here, s3 goes out of scope and is dropped. s2 goes out of scope but was
// moved, so nothing happens. s1 goes out of scope and is dropped.
fn gives_ownership() -> String { // gives_ownership will move its
// return value into the function
// that calls it
let some_string = String::from("hello"); // some_string comes into scope
some_string // some_string is returned and
// moves out to the calling
// function
}
// takes_and_gives_back will take a String and return one
fn takes_and_gives_back(a_string: String) -> String { // a_string comes into
// scope
a_string // a_string is returned and moves out to the calling function
}
- What if we want to let a function use a value but not take ownership? It’s quite annoying that anything we pass in also needs to be passed back if we want to use it again -> The solution is references.
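Without references, the only way to give a value back is to return it. The sketch below shows that workaround: it returns the String together with its length as a tuple. This calculate_length takes the String by value, unlike the reference-based version in the next section.

```rust
// Takes ownership of `s`, then hands it back along with the computed length.
fn calculate_length(s: String) -> (String, usize) {
    let length = s.len(); // len() returns the length in bytes
    (s, length)
}

fn main() {
    let s1 = String::from("hello");
    let (s2, len) = calculate_length(s1); // s1 moves in; ownership comes back as s2
    println!("The length of '{}' is {}.", s2, len);
}
```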
4.2 References
let s1 = String::from("hello");
let len = calculate_length(&s1);
fn calculate_length(s: &String) -> usize {
s.len()
}
&s1 is a reference: it refers to the value of s1 but does not own it, so the value is not dropped when the reference stops being used.
- Having references as function parameters is called borrowing.
- So what happens if we try to modify something we’re borrowing?
  - We can change it through a mutable reference (&mut), which requires the variable itself to be mut,
  - but there is a restriction: you can have only one mutable reference to a particular piece of data in a particular scope.
Mutable reference (compiles):
fn main() {
let mut s = String::from("hello");
change(&mut s);
}
fn change(some_string: &mut String) {
some_string.push_str(", world");
}
Mutable reference, but double borrowing (compile error):
fn main() {
let mut s = String::from("hello");
let r1 = &mut s;
let r2 = &mut s;
println!("{}, {}", r1, r2);
}
- You also can’t have a mutable reference while immutable references to the same value are still in use.
- Multiple immutable references are okay.
- Note that a reference’s scope starts from where it is introduced and continues through the last time that reference is used.
fn main() {
let mut s = String::from("hello");
let r1 = &s; // no problem
let r2 = &s; // no problem (multiple immutable references.)
println!("{} and {}", r1, r2);
// r1 and r2 are no longer used after this point
let r3 = &mut s; // no problem
println!("{}", r3);
}
- In Rust, the compiler guarantees that references will never be dangling references.
- The two rules of references
- At any given time, you can have either one mutable reference or any number of immutable references.
- References must always be valid.
4.3 The Slice Type
- The slice is another data type that does not have ownership.
- Slices let you reference (so they don’t take ownership) a contiguous sequence of elements in a collection rather than the whole collection.
- For example, with let bytes = s.as_bytes(); where s is a String, bytes is a &[u8]: a slice of the string’s bytes.
fn first_word(s: &String) -> usize {
let bytes = s.as_bytes();
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' { //search for the byte that represents the space by using the byte literal syntax.
return i; //If we find a space, we return the position
}
}
s.len() // Otherwise, we return the length of the string by using s.len()
}
fn main() {
let mut s = String::from("hello world");
let word = first_word(&s); // word will get the value 5
s.clear(); // this empties the String, making it equal to ""
// word still has the value 5 here, but there's no more string that
// we could meaningfully use the value 5 with. word is now totally "invalid"!
}
- Because we get a reference to the element from .iter().enumerate(), we use & in the pattern.
- Because word isn’t connected to the state of s at all, word still contains the value 5.
String Slice
- A string slice is a reference to part of a String.
- The type that signifies “string slice” is written as &str.
- Difference between String and str: https://stackoverflow.com/a/24159933/9923806
- I summarized the links here.
fn main() {
let s = String::from("hello world");
let hello = &s[0..5];
let world = &s[6..11];
}
world contains a pointer to the byte at index 6 of s and a length of 5 (a slice is a reference).
- Rust’s range syntax is .. (for example, 0..5 covers indices 0 through 4).
- String slice range indices must occur at valid UTF-8 character boundaries. If you attempt to create a string slice in the middle of a multibyte character, your program will exit with an error. For the purposes of introducing string slices, we are assuming ASCII only in this section.
fn first_word(s: &String) -> &str {
let bytes = s.as_bytes();
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return &s[0..i];
}
}
&s[..]
}
fn main() {
let mut s = String::from("hello world");
let word = first_word(&s); // immutable reference
s.clear(); // error! Because clear needs to truncate the String, it needs to get a mutable reference.
println!("the first word is: {}", word);
}
- My note: word is an immutable borrow of s that is still used later (in println!). So s can’t be cleared (made "") at that point, because clear needs a mutable borrow.
Example: frequently used slice
fn main() {
let a = [1, 2, 3, 4, 5];
let slice = &a[1..3];
}
This slice has the type &[i32].
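A common refinement, also recommended in the book, is to take &str instead of &String, so that first_word accepts both String values and string literals. Here is a sketch; the array slice at the end mirrors the array example in this section.

```rust
// Taking &str lets first_word accept a &String (via deref coercion)
// as well as a string literal, which is already a &str.
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[..i]; // slice up to (not including) the first space
        }
    }
    &s[..] // no space: the whole string is one word
}

fn main() {
    let my_string = String::from("hello world");
    println!("{}", first_word(&my_string)); // &String coerces to &str

    let my_literal = "hello world";
    println!("{}", first_word(my_literal));

    let a = [1, 2, 3, 4, 5];
    let slice: &[i32] = &a[1..3]; // elements at index 1 and 2
    println!("{:?}", slice);
}
```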
5. Using Structs to Structure Related Data
A struct, or structure, is a custom data type that lets you name and package together multiple related values that make up a meaningful group.
5.1 Defining and Instantiating Structs
- The pieces of a struct can be different types.
- Unlike with tuples, you’ll name each piece of data so it’s clear what the values mean.
- You don’t have to rely on the order of the data to specify or access the values of an instance.
- To use a struct after we’ve defined it, we create an instance of that struct by specifying concrete values for each of the fields, as key: value pairs.
struct User {
username: String,
email: String,
sign_in_count: u64,
active: bool,
}
fn main() {
let user1 = User {
email: String::from("someone@example.com"),
username: String::from("someusername123"),
active: true,
sign_in_count: 1,
};
}
- To get a specific value from a struct, we can use dot notation.
- If the instance is mutable, we can change a value by using the dot notation and assigning into a particular field.
user1.email = String::from("anotheremail@example.com");
- The entire instance must be mutable; Rust doesn’t allow us to mark only certain fields as mutable.
- Sample: creating an instance with a function:
fn build_user(email: String, username: String) -> User {
User {
email: email,
username: username,
active: true,
sign_in_count: 1,
}
}
- Because the parameter names and the struct field names are exactly the same, we can use the field init shorthand syntax to rewrite build_user so that it behaves exactly the same but doesn’t have the repetition of email and username.
fn build_user(email: String, username: String) -> User {
User {
email,
username,
active: true,
sign_in_count: 1,
}
}
- The syntax .. specifies that the remaining fields not explicitly set should have the same value as the fields in the given instance.
let user2 = User {
email: String::from("another@example.com"),
username: String::from("anotherusername567"),
..user1
};
user2 has different values for email and username but has the same values for the active and sign_in_count fields from user1.
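A small sketch of what happens to user1 after the struct update syntax, assuming the User struct above. Because user2 gets new Strings for email and username, only active and sign_in_count (both Copy types) are taken from user1, so user1 is still fully usable. The function name update_demo is hypothetical.

```rust
struct User {
    username: String,
    email: String,
    sign_in_count: u64,
    active: bool,
}

// Hypothetical helper: builds user1, derives user2 from it, returns both.
fn update_demo() -> (User, User) {
    let user1 = User {
        email: String::from("someone@example.com"),
        username: String::from("someusername123"),
        active: true,
        sign_in_count: 1,
    };
    let user2 = User {
        email: String::from("another@example.com"),
        username: String::from("anotherusername567"),
        ..user1 // copies active and sign_in_count; no String is moved
    };
    // user1 is still valid here because only Copy fields were used from it.
    (user1, user2)
}

fn main() {
    let (user1, user2) = update_demo();
    println!("{} / {}", user1.email, user2.email);
    println!("{} {}", user2.active, user2.sign_in_count);
}
```

If user2 had reused user1’s username (a String), that field would be moved and user1 could no longer be used as a whole.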
- You can also define structs that look similar to tuples, called tuple structs. Tuple structs have the added meaning the struct name provides but don’t have names associated with their fields; they just have the types of the fields.
struct Color(i32, i32, i32);
let black = Color(0, 0, 0);
Unit-Like Structs Without Any Fields
You can define a struct without any fields:
struct AlwaysEqual;
let subject = AlwaysEqual;
This is called a unit-like struct, because it behaves similarly to (), the unit type.
Ownership of Struct Data
- We can’t simply use &str instead of String (created with String::from()) as a field type in a struct. The compiler rejects it with a “missing lifetime specifier” error. &str is a “string slice”, so it is a reference. A struct can hold references, but then we have to deal with lifetimes.
- (From chapter 10): Every reference in Rust has a lifetime, which is the scope for which that reference is valid.
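For contrast, a sketch of a struct that does hold &str fields. It compiles once a lifetime parameter 'a is declared, which tells the compiler that the struct cannot outlive the strings it borrows. UserRef and make_ref are hypothetical names; lifetimes themselves are explained in chapter 10.

```rust
// Hypothetical struct: borrows its string data instead of owning it.
// 'a says a UserRef must not outlive the &str values it points to.
struct UserRef<'a> {
    username: &'a str,
    email: &'a str,
}

fn make_ref<'a>(username: &'a str, email: &'a str) -> UserRef<'a> {
    UserRef { username, email }
}

fn main() {
    let name = String::from("someusername123");
    // &name (a &String) coerces to &str; `name` must live as long as `user`.
    let user = make_ref(&name, "someone@example.com");
    println!("{} <{}>", user.username, user.email);
}
```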
5.2 An Example Program Using Structs
- Practical tips
- We use structs to add meaning by labeling the data.
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let rect1 = Rectangle {
width: 30,
height: 50,
};
println!(
"The area of the rectangle is {} square pixels.",
area(&rect1)
);
}
fn area(rectangle: &Rectangle) -> u32 {
rectangle.width * rectangle.height
}
We want to borrow the struct rather than take ownership of it. This way, main retains its ownership and can continue using rect1, which is the reason we use the & in the function signature and where we call the function.
By default, the curly brackets {} tell println! to use formatting known as Display: output intended for direct end user consumption. Primitive types implement Display, but for a struct there are many possible ways to show it (with or without commas and braces, all fields or only some). Due to this ambiguity, Rust doesn’t try to guess what we want, and structs don’t have a provided implementation of Display.
Instead, use {:?} for debug output or {:#?} for pretty-printed debug output. Both require #[derive(Debug)] just before the struct definition, as shown below.
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let rect1 = Rectangle {
width: 30,
height: 50,
};
println!("rect1 is {:?}", rect1);
// rect1 is Rectangle { width: 30, height: 50 }
}
I added the annotation to derive the Debug trait and printed the Rectangle instance using debug formatting.
Rust has provided a number of traits for us to use with the derive annotation that can add useful behavior to our custom types.
#[derive(Debug)] is called an attribute.
https://doc.rust-lang.org/rust-by-example/attribute.html
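A short sketch comparing the two debug formats on the same Rectangle: {:?} prints everything on one line, while {:#?} puts each field on its own line.

```rust
#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

fn main() {
    let rect1 = Rectangle { width: 30, height: 50 };
    println!("{:?}", rect1); // Rectangle { width: 30, height: 50 }
    // Pretty-printed form:
    // Rectangle {
    //     width: 30,
    //     height: 50,
    // }
    println!("{:#?}", rect1);
}
```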
5.3 Method syntax
- Methods are different from functions in that they’re defined within the context of a struct (or an enum or a trait object).
- The first parameter of a method is always self, which represents the instance of the struct the method is being called on.
- How to add a method to a struct? -> impl
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
impl Rectangle {
fn area(&self) -> u32 {
self.width * self.height
}
}
- How to call a method? -> dot notation, e.g. rect1.area()
- Methods can take ownership of self, borrow self immutably as we’ve done above, or borrow self mutably, just as they can any other parameter.
- C, C++: if object is a pointer, object->something() is similar to (*object).something().
- Rust doesn’t have an equivalent to the -> operator; instead, Rust has a feature called automatic referencing and dereferencing.
- When you call a method with object.something(), Rust automatically adds in &, &mut, or * so object matches the signature of the method.
impl Rectangle {
fn area(&self) -> u32 {
self.width * self.height
}
fn can_hold(&self, other: &Rectangle) -> bool {
self.width > other.width && self.height > other.height
}
}
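A runnable sketch that puts area and can_hold to use. The call rect1.area() is sugar for Rectangle::area(&rect1): automatic referencing adds the & for us. The rectangle sizes are arbitrary example values.

```rust
#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    fn area(&self) -> u32 {
        self.width * self.height
    }

    fn can_hold(&self, other: &Rectangle) -> bool {
        self.width > other.width && self.height > other.height
    }
}

fn main() {
    let rect1 = Rectangle { width: 30, height: 50 };
    let rect2 = Rectangle { width: 10, height: 40 };
    let rect3 = Rectangle { width: 60, height: 45 };

    println!("area: {}", rect1.area()); // area: 1500
    println!("rect1 can hold rect2? {}", rect1.can_hold(&rect2)); // true
    println!("rect1 can hold rect3? {}", rect1.can_hold(&rect3)); // false
}
```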
- We’re allowed to define functions within impl blocks that don’t take self as a parameter. These are called associated functions because they’re associated with the struct. It’s a similar concept to a static method in C++ or Java.
- Associated functions are often used for constructors that will return a new instance of the struct.
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
impl Rectangle {
fn square(size: u32) -> Rectangle {
Rectangle {
width: size,
height: size,
}
}
}
fn main() {
let sq = Rectangle::square(3);
}
- To call this associated function, we use the :: syntax with the struct name; let sq = Rectangle::square(3); is an example. This function is namespaced by the struct: the :: syntax is used for both associated functions and namespaces created by modules.
- Each struct is allowed to have multiple impl blocks. (My memo) I can add new functions later.
- My note: Why do we need associated functions? -> My answer: First, the main benefit of methods and associated functions over free functions is code organization: everything related to the struct lives in one place. If we wrote free functions instead, we would have to search the whole code base for functions that work with the struct. Second, some functions work on a real instance of the struct, but others relate to the struct type itself and don’t need an instance to work with, like String::from.
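A sketch of one struct with two impl blocks: the first holds the associated function square (called with ::), the second adds the method area (called with .). Splitting them like this is only for illustration; a single impl block would behave the same.

```rust
#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

// First impl block: an associated function (no self), used like a constructor.
impl Rectangle {
    fn square(size: u32) -> Rectangle {
        Rectangle { width: size, height: size }
    }
}

// Second impl block for the same struct: adds a method later.
impl Rectangle {
    fn area(&self) -> u32 {
        self.width * self.height
    }
}

fn main() {
    let sq = Rectangle::square(3); // associated function: Type::name
    println!("{:?} has area {}", sq, sq.area()); // method: value.name()
}
```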
6 Enums and Pattern Matching
6.1 Defining an Enum
An enum definition is a kind of custom data type.
This YouTube video explains what an enumeration type is in C (the video is very understandable).
Introduction to Enumerations in C
In this simplest case, I can regard an enum as a kind of lookup table.
Example 1: IP (the enum in this example is similar to an enum in C).
enum IpAddrKind {
V4,
V6,
}
The custom data type is IpAddrKind. A value of type IpAddrKind can be either the V4 or the V6 variant.
The variants of the enum are namespaced under its identifier, and we use a double colon to separate the two:
let four = IpAddrKind::V4;
let six = IpAddrKind::V6;
Like C, you can map each variant to an integer (let x = IpAddrKind::V4 as i32;).
This is useful because both values IpAddrKind::V4 and IpAddrKind::V6 are of the same type: IpAddrKind. We can then, for instance, define a function that takes any IpAddrKind:
fn route(ip_kind: IpAddrKind) {}
And we can call this function with either variant:
route(IpAddrKind::V4);
route(IpAddrKind::V6);
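A sketch of both points: casting a fieldless variant with as, and one function accepting either variant. Here route returns a label instead of having an empty body; that return value is an assumption for illustration, and it uses match, which is covered in 6.2.

```rust
enum IpAddrKind {
    V4,
    V6,
}

// Accepts either variant, because both have the same type: IpAddrKind.
fn route(ip_kind: IpAddrKind) -> &'static str {
    match ip_kind {
        IpAddrKind::V4 => "v4",
        IpAddrKind::V6 => "v6",
    }
}

fn main() {
    // For fieldless enums, `as` yields the discriminant, counting from 0 by default.
    println!("{} {}", IpAddrKind::V4 as i32, IpAddrKind::V6 as i32); // 0 1
    println!("{} {}", route(IpAddrKind::V4), route(IpAddrKind::V6)); // v4 v6
}
```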
Example 2: IP with the address.
We can associate values with each enum variant:
enum IpAddr {
V4(String),
V6(String),
}
let home = IpAddr::V4(String::from("127.0.0.1"));
let loopback = IpAddr::V6(String::from("::1"));
- There’s another advantage to using an enum rather than a struct: each variant can have different types and amounts of associated data:
enum IpAddr {
V4(u8, u8, u8, u8),
V6(String),
}
let home = IpAddr::V4(127, 0, 0, 1);
let loopback = IpAddr::V6(String::from("::1"));
- The standard library actually defines IpAddr this way, because wanting to store IP addresses and encode which kind they are is so common:
struct Ipv4Addr {
// --snip--
}
struct Ipv6Addr {
// --snip--
}
enum IpAddr {
V4(Ipv4Addr),
V6(Ipv6Addr),
}
Example 3: Message.
enum Message {
Quit,
Move { x: i32, y: i32 },
Write(String),
ChangeColor(i32, i32, i32),
}
- If we used different structs, which each have their own type, we couldn’t as easily define a function to take any of these kinds of messages as we could with the Message enum defined above, which is a single type.
- An enum can also have methods, defined with impl:
impl Message {
fn call(&self) {
// method body would be defined here
}
}
let m = Message::Write(String::from("hello"));
m.call();
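A sketch of what a body for call might look like. Returning a description String is an assumption made so the behavior is visible; the original leaves the body empty. It uses match (section 6.2) to handle each variant, including the struct-like Move variant.

```rust
enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}

impl Message {
    // Hypothetical body: describe the message instead of acting on it.
    fn call(&self) -> String {
        match self {
            Message::Quit => String::from("quit"),
            Message::Move { x, y } => format!("move to ({}, {})", x, y),
            Message::Write(text) => format!("write {}", text),
            Message::ChangeColor(r, g, b) => format!("color {},{},{}", r, g, b),
        }
    }
}

fn main() {
    let m = Message::Write(String::from("hello"));
    println!("{}", m.call()); // write hello
}
```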
My homework: How does the Rust compiler compile an enum into machine code…?
The Option Enum and Its Advantages Over Null Values
Please read this part through in one go, even if you don’t understand Option the first time.
YouTube video: Rust Programming Tutorial #37 - Option (Enum)
- Option is another enum defined by the standard library.
- The Option type is used in many places because it encodes the very common scenario in which a value could be something or it could be nothing. Expressing this concept in terms of the type system means the compiler can check whether you’ve handled all the cases you should be handling; this functionality can prevent bugs that are extremely common in other programming languages.
- Rust doesn’t have the null feature. … In languages with null, variables can always be in one of two states: null or not-null.
- The problem with null values is that if you try to use a null value as a not-null value, you’ll get an error of some kind.
- However, the concept that null is trying to express is still a useful one: a null is a value that is currently invalid or absent for some reason.
- The problem isn’t really with the concept but with the particular implementation.
- Rust does not have nulls, but it does have an enum that can encode the concept of a value being present or absent. This enum is Option<T>, and it is defined by the standard library as follows:
enum Option<T> {
Some(T),
None,
}
- You can use Some and None directly without the Option:: prefix.
- For now, all you need to know is that <T> means the Some variant of the Option enum can hold one piece of data of any type.
let some_number = Some(5);
let some_string = Some("a string");
let absent_number: Option<i32> = None;
- If we use None rather than Some, we need to tell Rust what type of Option<T> we have, because the compiler can’t infer the type from None alone.
Why is having Option<T> any better than having null?
Because Option<T> and T (where T can be any type) are different types, the compiler won’t let us use an Option<T> value as if it were definitely a valid value.
In the following code, the line computing sum fails to compile because Rust doesn’t understand how to add an i8 and an Option<i8>.
fn main() {
let x: i8 = 5;
let y: Option<i8> = Some(5);
let sum = x + y;
}
This means that when we have a value of a type like i8 in Rust, the compiler will ensure that we always have a valid value.
In other words, you have to convert an Option<T> to a T before you can perform T operations with it (usually done with match, covered in the next section).
Generally, this helps catch one of the most common issues with null.
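A sketch of the conversion step that makes the failing example compile. Treating None as 0 is an assumption chosen for illustration; the point is that match forces us to decide what None means before we add.

```rust
// Hypothetical helper: add an i8 and an Option<i8>, treating None as 0.
fn add_option(x: i8, y: Option<i8>) -> i8 {
    // Convert Option<i8> into i8 first; the None arm cannot be forgotten.
    let y = match y {
        Some(value) => value,
        None => 0,
    };
    x + y
}

fn main() {
    println!("{}", add_option(5, Some(5))); // 10
    println!("{}", add_option(5, None)); // 5
}
```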
6.2 The match Control Flow Operator
Here is an example. (Background: from 1999 through 2008, the United States minted quarters with different designs for each of the 50 states on one side.)
#[derive(Debug)] // so we can inspect the state in a minute
enum UsState {
Alabama,
Alaska,
// --snip--
}
enum Coin {
Penny,
Nickel,
Dime,
Quarter(UsState),
}
fn value_in_cents(coin: Coin) -> u8 {
match coin {
Coin::Penny => 1,
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter(state) => {
println!("State quarter from {:?}!", state);
25
}
}
}
Matching with Option<T>
- Matches in Rust are exhaustive: the arms must cover every possibility. Especially in the case of Option<T>, when Rust prevents us from forgetting to explicitly handle the None case, it protects us from assuming that we have a value when we might have null, thus making the billion-dollar mistake (Tony Hoare’s name for his invention of null references) impossible.
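A sketch adapted from the Rust book’s plus_one example: match on an Option<i32> and add 1 only when a value is present. Deleting the None arm would be a compile error (non-exhaustive patterns).

```rust
fn plus_one(x: Option<i32>) -> Option<i32> {
    match x {
        // Required: without this arm the match is not exhaustive.
        None => None,
        Some(i) => Some(i + 1),
    }
}

fn main() {
    let six = plus_one(Some(5));
    let none = plus_one(None);
    println!("{:?} {:?}", six, none); // Some(6) None
}
```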
The _ Placeholder
The _ pattern will match all the possible values that aren’t specified before it.
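A sketch of _ on a u8: listing all 256 values would be impractical, so a few are named and _ covers the rest. The helper name describe is hypothetical.

```rust
// Hypothetical helper: name a few u8 values; everything else falls into `_`.
fn describe(value: u8) -> &'static str {
    match value {
        1 => "one",
        3 => "three",
        5 => "five",
        7 => "seven",
        // Without this arm the match would not compile.
        _ => "other",
    }
}

fn main() {
    println!("{} {}", describe(3), describe(200)); // three other
}
```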
6.3 Concise Control Flow with if let
The if let syntax lets you combine if and let into a less verbose way to handle values that match one pattern while ignoring the rest.
let some_u8_value = Some(0u8);
// no if let syntax
match some_u8_value {
Some(3) => println!("three"),
_ => (),
}
// same as above (with if let syntax)
// note: the pattern is its first arm.
if let Some(3) = some_u8_value {
println!("three");
}
- We can include an else with an if let.
match coin {
Coin::Quarter(state) => println!("State quarter from {:?}!", state),
_ => count += 1,
}
// same as
if let Coin::Quarter(state) = coin {
println!("State quarter from {:?}!", state);
} else {
count += 1;
}
When should we use if let?
Using if let means less typing, less indentation, and less boilerplate code. However, you lose the exhaustive checking that match enforces. Choosing between match and if let depends on what you’re doing in your particular situation and whether gaining conciseness is an appropriate trade-off for losing exhaustive checking.
With if let, we don’t need to write the _ arm that match requires.
The difference from a plain if is that, in place of a condition expression, if let expects the keyword let followed by a pattern, an =, and a scrutinee expression.
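A runnable sketch of the if let ... else coin counter from above, wrapped in a hypothetical helper count_non_quarters so the count can be checked. UsState is trimmed to two states.

```rust
#[derive(Debug)]
enum UsState {
    Alabama,
    Alaska,
}

enum Coin {
    Penny,
    Nickel,
    Dime,
    Quarter(UsState),
}

// Hypothetical helper: print each quarter's state and count everything else.
fn count_non_quarters(coins: Vec<Coin>) -> u32 {
    let mut count = 0;
    for coin in coins {
        if let Coin::Quarter(state) = coin {
            println!("State quarter from {:?}!", state);
        } else {
            count += 1; // plays the role of the `_` arm in the match version
        }
    }
    count
}

fn main() {
    let coins = vec![
        Coin::Penny,
        Coin::Quarter(UsState::Alaska),
        Coin::Dime,
        Coin::Nickel,
        Coin::Quarter(UsState::Alabama),
    ];
    println!("non-quarters: {}", count_non_quarters(coins)); // non-quarters: 3
}
```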
7. Managing Growing Projects with Packages, Crates, and Modules
- Packages: A Cargo feature that lets you build, test, and share crates
- Crates: A tree of modules that produces a library or executable
- Modules and use: Let you control the organization, scope, and privacy of paths
- Paths: A way of naming an item, such as a struct, function, or module
A package can contain multiple binary crates and optionally one library crate.
7.1 Packages and Crates
Packages
- A package is one or more crates that provide a set of functionality.
- A package contains a Cargo.toml file that describes how to build those crates.
- A package can contain at most one library crate. It can contain as many binary crates as you like, but it must contain at least one crate (either library or binary).
- When you enter cargo new hello_cargo, it creates the package hello_cargo, and this is described in the Cargo.toml file.
- We have a package that only contains src/main.rs, meaning it only contains a binary crate named hello_cargo.
Crates
- A crate is a binary or library.
- The crate root is a source file that the Rust compiler starts from and makes up the root module of your crate.
Sample: Cargo.toml
[package]
name = "hello_cargo"
version = "0.1.0"
authors = ["atlex <itsme@myemail.com>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
Conventions
- src/main.rs is the crate root of a binary crate with the same name as the package.
- If the package directory contains src/lib.rs, the package contains a library crate with the same name as the package, and src/lib.rs is its crate root.
- If a package contains src/main.rs and src/lib.rs, it has two crates: a library and a binary, both with the same name as the package.
- A package can have multiple binary crates by placing files in the src/bin directory: each file will be a separate binary crate.
- A crate will group related functionality together in a scope so the functionality is easy to share between multiple projects.
My note: in terms of crates, the library and the binary can be regarded as separate elements.
My summary of chapter 7
The section was too verbose for me when it comes to practice, so I summarized some practical notes.
- If you want to create a binary, you need src/main.rs. Typically, src/main.rs handles running the program, and src/lib.rs handles all the logic of the task at hand. We can see how this works in section 12.3.
- You can create a module by creating src/{{ name_of_module }}.rs or src/{{ name_of_module }}/mod.rs.
- Having both files for the same module is not allowed.
- mod {{ name_of_module }}; declares the module and tells Rust to load it from one of those files.
For details:
7.2 Defining Modules to Control Scope and Privacy
- By using modules, we can group related definitions together and name why they’re related.
- The use keyword brings a path into scope. It is written in the code that wants to use an item (e.g. the parent).
- The pub keyword makes items public. It is written on the item being exposed (e.g. in the child).
- Privacy of an item is whether the item can be used by outside code (public) or is an internal implementation detail and not available for outside use (private).
Sample
cargo new --lib restaurant
restaurant
├── Cargo.toml
└── src
└── lib.rs
Write src/lib.rs as follows.
mod front_of_house {
mod hosting {
fn add_to_waitlist() {}
fn seat_at_table() {}
}
mod serving {
fn take_order() {}
fn serve_order() {}
fn take_payment() {}
}
}
Module tree in this example
crate
└── front_of_house
├── hosting
│ ├── add_to_waitlist
│ └── seat_at_table
└── serving
├── take_order
├── serve_order
└── take_payment
- If module A is contained inside module B, we say that module A is the child of module B and that module B is the parent of module A.
- The module tree might remind you of the filesystem’s directory tree on your computer; this is a very apt comparison! Just like directories in a filesystem, you use modules to organize your code. And just like files in a directory, we need a way to find our modules.
7.3 Paths for Referring to an Item in the Module Tree
- If we want to call a function, we need to know its path.
- A path can take two forms:
- An absolute path starts from a crate root (like src/lib.rs) by using a crate name (for code from an external crate) or a literal crate (for code from the current crate).
- A relative path starts from the current module and uses self, super, or an identifier in the current module.
- Both absolute and relative paths are followed by one or more identifiers separated by double colons (::).
- Our preference is to specify absolute paths, because it’s more likely that we’ll want to move code definitions and item calls independently of each other.
- Rust’s privacy boundary: the line that encapsulates the implementation details external code isn’t allowed to know about, call, or rely on. So, if you want to make an item like a function or struct private, you put it in a module.
- The way privacy works in Rust is that all items (functions, methods, structs, enums, modules, and constants) are private by default.
- Items in a parent module can’t use the private items inside child modules, but items in child modules can use the items in their ancestor modules.
- Making the module public doesn’t make its contents public.
Sample
src/lib.rs
mod front_of_house {
pub mod hosting {
pub fn add_to_waitlist() {}
}
}
pub fn eat_at_restaurant() {
// Absolute path
crate::front_of_house::hosting::add_to_waitlist();
// Relative path
front_of_house::hosting::add_to_waitlist();
}
- We can also construct relative paths that begin in the parent module by using super at the start of the path. This is like starting a filesystem path with the .. syntax.
fn serve_order() {}
mod back_of_house {
fn fix_incorrect_order() {
cook_order();
super::serve_order();
}
fn cook_order() {}
}
We think the back_of_house module and the serve_order function are likely to stay in the same relationship to each other and get moved together should we decide to reorganize the crate’s module tree. Therefore, we used super so we’ll have fewer places to update code in the future if this code gets moved to a different module.
If we use pub before a struct definition, we make the struct public, but the struct’s fields will still be private. We can make each field public or not on a case-by-case basis.
mod back_of_house {
pub struct Breakfast {
pub toast: String,
seasonal_fruit: String,
}
impl Breakfast {
pub fn summer(toast: &str) -> Breakfast {
Breakfast {
toast: String::from(toast),
seasonal_fruit: String::from("peaches"),
}
}
}
}
pub fn eat_at_restaurant() {
let mut meal = back_of_house::Breakfast::summer("Rye");
meal.toast = String::from("Wheat");
println!("I'd like {} toast please", meal.toast);
}
We’ve defined a public back_of_house::Breakfast struct with a public toast field but a private seasonal_fruit field. This models the case in a restaurant where the customer can pick the type of bread that comes with a meal, but the chef decides which fruit accompanies the meal based on what’s in season and in stock. The available fruit changes quickly, so customers can’t choose the fruit or even see which fruit they’ll get. Because Breakfast has a private field, code outside back_of_house can’t build it directly; it needs a public associated function such as summer.
- In contrast, if we make an enum public, all of its variants are then public. We only need the pub before the enum keyword.
mod back_of_house {
pub enum Appetizer {
Soup,
Salad,
}
}
pub fn eat_at_restaurant() {
let order1 = back_of_house::Appetizer::Soup;
let order2 = back_of_house::Appetizer::Salad;
}
7.4 Bringing Paths into Scope with the use Keyword
- With the use keyword, we can bring a path into a scope once and then call the items in that path as if they’re local items.
mod front_of_house {
pub mod hosting {
pub fn add_to_waitlist() {}
}
}
use crate::front_of_house::hosting;
//or
//use self::front_of_house::hosting;
pub fn eat_at_restaurant() {
hosting::add_to_waitlist();
hosting::add_to_waitlist();
hosting::add_to_waitlist();
}
Creating Idiomatic use Paths
The following use is bad (unidiomatic):
use crate::front_of_house::hosting::add_to_waitlist;
pub fn eat_at_restaurant() {
add_to_waitlist();
add_to_waitlist();
add_to_waitlist();
}
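Here it’s unclear where add_to_waitlist is defined. Bringing the parent module into scope and calling hosting::add_to_waitlist() makes it clear that the function isn’t defined locally.
Another snippet I consider bad: it compiles, but two different Result types are in scope and one has to be renamed with as. Bringing in the parent modules and writing fmt::Result and io::Result is an alternative.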
use std::fmt::Result;
use std::io::Result as IoResult;
When we bring a name into scope with the use keyword, the name available in the new scope is private. -> pub use is called re-exporting; with this syntax, external code can also use the name.
Note that the standard library (std) is also a crate that’s external to our package. Because the standard library is shipped with the Rust language, we don’t need to change Cargo.toml to include std. But we do need to refer to it with use to bring items from there into our package’s scope.
Here are smarter ways to write use (nested paths):
// old
//use std::cmp::Ordering;
//use std::io;
// New!
use std::{cmp::Ordering, io};
// How about this?
//use std::io;
//use std::io::Write;
// Here!
use std::io::{self, Write};
- If we want to bring all public items defined in a path into scope, we can specify that path followed by *, the glob operator:
use std::collections::*;
The glob operator is often used when testing to bring everything under test into the tests module.
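A single-file sketch of re-exporting with pub use. The module kitchen keeps its inner module recipes private, yet re-exports soup so outside code can reach it as kitchen::soup. All module and function names here are hypothetical.

```rust
mod kitchen {
    // Private module: code outside `kitchen` cannot name `kitchen::recipes`.
    mod recipes {
        pub fn soup() -> String {
            String::from("tomato soup")
        }
    }

    // Re-export: `soup` becomes reachable as `kitchen::soup`.
    pub use self::recipes::soup;
}

fn main() {
    // kitchen::recipes::soup(); // error: module `recipes` is private
    println!("{}", kitchen::soup()); // tomato soup
}
```

This lets the public path differ from the internal file and module layout, which is the main reason to re-export.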
7.5 Separating Modules into Different Files
src/lib.rs
mod front_of_house;
pub use crate::front_of_house::hosting;
// --snip--
src/front_of_house.rs
pub mod hosting {
pub fn add_to_waitlist() {
// --snip--
}
}
Using a semicolon after mod front_of_house rather than using a block tells Rust to load the contents of the module from another file with the same name as the module.
My note: sample of a deeper file structure
- src/lib.rs: mod front_of_house; pub use crate::front_of_house::hosting;
- src/front_of_house.rs: pub mod hosting;
- src/front_of_house/hosting.rs: pub fn add_to_waitlist() {}
8. Common Collections
- Collections: a number of very useful data structures included in Rust’s standard library.
- The data these collections point to is stored on the heap, which means the amount of data does not need to be known at compile time and can grow or shrink as the program runs.
- Three main collections: vector, string, hashmap
8.1 Storing Lists of Values with Vectors
Vector Vec<T>
- Vectors can only store values of the same type.
- How to create a new empty vector:
let v: Vec<i32> = Vec::new();
- Rust can usually infer the type from the values we add, so the annotation is often unnecessary.
- Rust provides the vec! macro for convenience. The macro will create a new vector that holds the values you give it.
let v = vec![1, 2, 3];
- Updating a vector (adding a value to it) -> push
let mut v = Vec::new();
v.push(5);
- A vector is freed when it goes out of scope. When the vector gets dropped, all of its contents are also dropped, meaning those integers it holds will be cleaned up.
- There are two ways to read an element: &v[2] and v.get(2).
- &v[2] gives a reference (&T) to the element directly, and v.get(2) returns Option<&T>.
- &v[100] will cause the program to panic when it references a nonexistent element (i.e. v has no element at index 100). When the get method is passed an index that is outside the vector, it returns None without panicking.
- You would use the get method if accessing an element beyond the range of the vector may happen occasionally under normal circumstances.
Sample code of v.get():
let v = vec![1, 2, 3, 4, 5];
let third: &i32 = &v[2];
println!("The third element is {}", third);
match v.get(2) {
Some(third) => println!("The third element is {}", third),
None => println!("There is no third element."),
}
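A sketch of the out-of-range case: get(100) on a five-element vector returns None instead of panicking. The helper describe_index is hypothetical.

```rust
// Hypothetical helper: describe the element at `index` without panicking.
fn describe_index(v: &[i32], index: usize) -> String {
    match v.get(index) {
        Some(value) => format!("element {} is {}", index, value),
        None => format!("there is no element at index {}", index),
    }
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    println!("{}", describe_index(&v, 2)); // element 2 is 3
    println!("{}", describe_index(&v, 100)); // there is no element at index 100
    // let x = &v[100]; // would panic at runtime: index out of bounds
}
```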
- Mutability of elements: the following code fails to compile at the line v.push(6);.
let mut v = vec![1, 2, 3, 4, 5];
let first = &v[0]; // immutable borrow
v.push(6); // mutable borrow
println!("The first element is: {}", first); // immutable borrow still in use
- Details about the error: adding a new element onto the end of the vector might require allocating new memory and copying the old elements to the new space, if there isn’t enough room to put all the elements next to each other where the vector currently is. In that case, the reference to the first element would be pointing to deallocated memory. The borrowing rules prevent programs from ending up in that situation.
- Note: the push and pop methods operate on the last element of the vector.
Iterating over the Values in a Vector
// Just referencing
let v = vec![100, 32, 57];
for i in &v {
println!("{}", i);
}
// Change elements
let mut v = vec![100, 32, 57];
for i in &mut v {
*i += 50;
}
To change the value that the mutable reference i refers to, we use the dereference operator * before i (details are in Chapter 15).
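A sketch of the mutating loop as a function, so the result can be checked. v.iter_mut() yields the same &mut i32 items as for i in &mut v; the helper name add_to_all is hypothetical.

```rust
// Hypothetical helper: add `amount` to every element in place.
fn add_to_all(v: &mut [i32], amount: i32) {
    for i in v.iter_mut() {
        // `i` is a &mut i32, so * is needed to change the value it points to.
        *i += amount;
    }
}

fn main() {
    let mut v = vec![100, 32, 57];
    add_to_all(&mut v, 50);
    println!("{:?}", v); // [150, 82, 107]
}
```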
There are definitely use cases for needing to store a list of items of different types. -> enum!!
enum SpreadsheetCell {
Int(i32),
Float(f64),
Text(String),
}
let row = vec![
SpreadsheetCell::Int(3),
SpreadsheetCell::Text(String::from("blue")),
SpreadsheetCell::Float(10.12),
];
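A sketch of reading such a row back: every element has the single type SpreadsheetCell, and a match recovers the specific variant. The helper render is hypothetical.

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

// Hypothetical helper: render any cell as a String, whatever variant it holds.
fn render(cell: &SpreadsheetCell) -> String {
    match cell {
        SpreadsheetCell::Int(i) => i.to_string(),
        SpreadsheetCell::Float(f) => f.to_string(),
        SpreadsheetCell::Text(s) => s.clone(),
    }
}

fn main() {
    let row = vec![
        SpreadsheetCell::Int(3),
        SpreadsheetCell::Text(String::from("blue")),
        SpreadsheetCell::Float(10.12),
    ];
    let rendered: Vec<String> = row.iter().map(render).collect();
    println!("{:?}", rendered); // ["3", "blue", "10.12"]
}
```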
8.2 Storing UTF-8 Encoded Text with Strings
- Rust has only one string type in the core language, which is the string slice str that is usually seen in its borrowed form &str.
- When Rustaceans refer to “strings” in Rust, they usually mean the String and the string slice &str types, not just one of those types.
- Both String and a string slice &str are UTF-8 encoded.
- We can use the to_string method, which is available on any type that implements the Display trait, as string literals do.
- Using the to_string method to create a String from a string literal:
// the method works on a literal directly:
let s = "initial contents".to_string();
// same as
let s = String::from("initial contents");
- We can grow a String by using the push_str method to append a string slice.
let mut s = String::from("foo");
s.push_str("bar");
// s is now "foobar"
- The push_str method takes a string slice because we don’t necessarily want to take ownership of the parameter. Therefore, the following code prints s2 is bar instead of failing to compile.
let mut s1 = String::from("foo");
let s2 = "bar";
s1.push_str(s2); // push_str() doesn't take ownership of s2
println!("s2 is {}", s2);
Concatenation with the + Operator or the format! Macro
The following short code involves several concepts at once.
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2;
Before discussing the code above, we should know that the + operator uses the add method, whose signature looks something like this (it isn’t exact):
fn add(self, s: &str) -> String {
Two points about the line let s3 = s1 + &s2;
- s1 becomes self of the add function, so add takes ownership of s1 and returns it with s2 appended. s3 ends up owning that String, and s1 can no longer be used.
- The + operator uses the add method, whose parameter is &str, not &String. The reason we’re able to use &s2 in the call to add is that the compiler can coerce the &String argument into a &str. When we call the add method, Rust uses a deref coercion, which here turns &s2 into &s2[..].
- Tip: to join multiple Strings, use the format! macro.
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
let s = format!("{}-{}-{}", s1, s2, s3);
The version of the code using format! is much easier to read and doesn’t take ownership of any of its parameters. The format! macro works in the same way as println!, but instead of printing the output to the screen, it returns a String with the contents.
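A sketch comparing the two approaches side by side. Both produce the same String; the difference is ownership. The helper names join_with_plus and join_with_format are hypothetical.

```rust
// `+` consumes the left-hand String and borrows each right-hand &str.
fn join_with_plus(s1: String, s2: &str, s3: &str) -> String {
    s1 + "-" + s2 + "-" + s3
}

// format! only borrows its arguments.
fn join_with_format(s1: &str, s2: &str, s3: &str) -> String {
    format!("{}-{}-{}", s1, s2, s3)
}

fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    let a = join_with_format(&s1, &s2, &s3); // s1, s2, s3 are still usable
    let b = join_with_plus(s1, &s2, &s3); // s1 is moved here
    println!("{} {}", a, b); // tic-tac-toe tic-tac-toe
}
```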
Indexing into Strings
Rust doesn’t allow us to get the n-th character by index. The following code fails to compile.
let s1 = String::from("hello");
let h = s1[0];
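The reason is…?
- A String is a wrapper over a Vec<u8>.
- Both String and a string slice &str are UTF-8 encoded.
- In UTF-8, one character can take more than one byte, so an index like s1[0] would give a byte, not a character. To avoid returning unexpected values, Rust doesn’t compile this code.
- In some scripts, what a reader sees as one letter is made of several Unicode scalar values, each taking several bytes in UTF-8. For example, the Hindi word “नमस्ते”:
// as bytes (u8 values):
// [224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]
// as Unicode scalar values (char):
// ['न', 'म', 'स', '्', 'त', 'े']
// as grapheme clusters (what we would call letters):
// ["न", "म", "स्", "ते"]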
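A sketch that measures the same word both ways: len() counts bytes, while chars().count() counts Unicode scalar values. Asking for the first char explicitly is the alternative to s1[0]. The helper byte_and_char_len is hypothetical.

```rust
// Hypothetical helper: (number of bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let (bytes, chars) = byte_and_char_len("नमस्ते");
    // 6 scalar values, each 3 bytes long in UTF-8.
    println!("{} bytes, {} chars", bytes, chars); // 18 bytes, 6 chars

    let s1 = String::from("hello");
    // Instead of s1[0], ask for the first char; the result is an Option.
    println!("{:?}", s1.chars().next()); // Some('h')
}
```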
Slicing Strings
Example 1. Specifying a range of bytes
let hello = "Здравствуйте";
let s = &hello[0..4];
// s will be Зд
// &hello[0..1] would panic at runtime:
// thread 'main' panicked at 'byte index 1 is not a char boundary;
Example 2. Iterating over characters (char)
for c in "नमस्ते".chars() {
println!("{}", c);
}
// न
// म
// स
// ्
// त
// े
Example 3. Iterating over raw bytes
for b in "नमस्ते".bytes() {
println!("{}", b);
}
//224
//164
//// --snip--
//165
//135
Be sure to remember that valid Unicode scalar values may be made up of more than 1 byte.
8.3 Storing Keys with Associated Values in Hash Maps
- Terminology: hash ~ map ~ hash table ~ dictionary ~ associative array
- Hashmap ~ key-value
Example:
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
- The type HashMap<K, V> stores a mapping of keys of type K to values of type V.
- Just like vectors, hash maps store their data on the heap.
- Like vectors, hash maps are homogeneous: all of the keys must have the same type, and all of the values must have the same type.
- insert takes ownership of owned values such as String: after insertion, the map owns them. Values of types that implement Copy, like i32, are copied into the map.
Example: Combining two Vecs into a HashMap.
use std::collections::HashMap;
let teams = vec![String::from("Blue"), String::from("Yellow")];
let initial_scores = vec![10, 50];
let mut scores: HashMap<_, _> =
teams.into_iter().zip(initial_scores.into_iter()).collect();
Accessing Values in a Hash Map
This is done with the get method.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
let team_name = String::from("Blue");
let score = scores.get(&team_name);
Note that the result of scores.get(&team_name) is Some(&10) because get returns an Option<&V>; if there’s no value for that key in the hash map, get will return None.
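A sketch of turning that Option<&V> into a plain value: copied() turns Option<&i32> into Option<i32>, and unwrap_or(0) supplies a default. Defaulting a missing team to 0 is an assumption for illustration; score_of is a hypothetical name.

```rust
use std::collections::HashMap;

// Hypothetical helper: a team's score, or 0 if the team is not in the map.
fn score_of(scores: &HashMap<String, i32>, team: &str) -> i32 {
    // A String key can be looked up with a &str.
    scores.get(team).copied().unwrap_or(0)
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    println!("{}", score_of(&scores, "Blue")); // 10
    println!("{}", score_of(&scores, "Red")); // 0
}
```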
Iteration
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
for (key, value) in &scores {
println!("{}: {}", key, value);
}
Update a value (3 types)
Case 1. Overwriting a value: just call insert again. A HashMap holds only one value per key, so the new value replaces the old one.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Blue"), 25);
println!("{:?}", scores); // {"Blue": 25}
Case 2. Only inserting a value if the key has no value: use the entry method together with or_insert.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(50);
println!("{:?}", scores); // e.g. {"Yellow": 50, "Blue": 10} (HashMap iteration order is not guaranteed)
The entry method returns an enum called Entry that represents a value that might or might not exist.
Case 3. Updating a value based on the old value. Use dereference (until you read Chap. 15, just note the dereference operator *).
use std::collections::HashMap;
fn main(){
let text = "hello world wonderful world";
let mut map = HashMap::new();
for word in text.split_whitespace() {
let count = map.entry(word).or_insert(0);
*count += 1;
}
println!("{:?}", map);
}
The or_insert method actually returns a mutable reference (&mut V) to the value for this key. Here we store that mutable reference in the count variable, so in order to assign to that value, we must first dereference count using the asterisk (*).
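The same word counter wrapped in a hypothetical function count_words, so the resulting map can be inspected. The keys borrow from the input text, so the map cannot outlive it.

```rust
use std::collections::HashMap;

// Same logic as above; returns the map instead of printing it.
fn count_words(text: &str) -> HashMap<&str, u32> {
    let mut map = HashMap::new();
    for word in text.split_whitespace() {
        // or_insert returns &mut u32; * lets us modify the value in the map.
        let count = map.entry(word).or_insert(0);
        *count += 1;
    }
    map
}

fn main() {
    let map = count_words("hello world wonderful world");
    println!("world appears {} times", map["world"]); // world appears 2 times
}
```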
Hashing Functions
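By default, HashMap uses a hashing function called SipHash (as of Apr. 2021), which provides resistance to Denial of Service (DoS) attacks involving hash tables. It isn’t the fastest hashing algorithm available, but the trade-off for better security is considered worth it.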
My note: a slide about SipHash.
9. Error Handling
Rust groups errors into two major categories: recoverable and unrecoverable errors.
- For a recoverable error, such as a file not found error, it’s reasonable to report the problem to the user and retry the operation.
- Unrecoverable errors are always symptoms of bugs, like trying to access a location beyond the end of an array.
Rust doesn’t have exceptions.
Instead, it has the type Result<T, E> for recoverable errors and the panic! macro that stops execution when the program encounters an unrecoverable error.
9.1 Unrecoverable Errors with panic!
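- When the panic! macro executes, your program will print a failure message, unwind and clean up the stack, and then quit.
There are two ways to respond to a panic: unwinding (the default) and aborting.
- Unwinding: Rust walks back up the stack and cleans up the data from each function it encounters.
- Abort: the program ends immediately without cleaning up. Memory that the program was using will then need to be cleaned up by the operating system.
Generally, the walking back and cleanup in unwinding is a lot of work, so aborting is an alternative (for example, to make the binary as small as possible). To switch, add panic = 'abort' to the appropriate [profile] section in Cargo.toml, e.g. [profile.release].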
Panic example: Buffer overread
fn main() {
let v = vec![1, 2, 3];
v[99];
}
To see a backtrace, set the RUST_BACKTRACE environment variable:
RUST_BACKTRACE=1 cargo run
The key to reading the backtrace is to start from the top and read until you see files you wrote.
9.2 Recoverable Errors with Result
Recall the `Result` enum:
enum Result<T, E> {
Ok(T),
Err(E),
}
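`<T, E>` means “`T` and `E` are generic type parameters”: `T` is the type of the value returned in the `Ok` variant on success, and `E` is the type of the error returned in the `Err` variant on failure.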
A basic error handling example: opening a file.
use std::fs::File;
fn main() {
let f = File::open("hello.txt");
let f = match f {
Ok(file) => file,
Err(error) => panic!("Problem opening the file: {:?}", error),
};
}
Run it without the file `hello.txt`:
$ cargo run
... (warning about _f)
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/panic`
thread 'main' panicked at 'Problem opening the file: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/main.rs:8:23
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Example: handling different kinds of errors differently.
use std::fs::File;
use std::io::ErrorKind;
fn main() {
let f = File::open("hello.txt");
let f = match f {
Ok(file) => file,
Err(error) => match error.kind() {
ErrorKind::NotFound => match File::create("hello.txt") {
Ok(fc) => fc,
Err(e) => panic!("Problem creating the file: {:?}", e),
},
other_error => {
panic!("Problem opening the file: {:?}", other_error)
}
},
};
}
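The next version is more concise because it has no nested `match` expressions; it uses closures instead.
`unwrap_or_else`: a method on `Result<T, E>` (`Option<T>` has one too). It returns the contained `Ok` value; if the value is an `Err`, it calls the closure with the error and returns whatever the closure produces.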
use std::fs::File;
use std::io::ErrorKind;
fn main() {
let f = File::open("hello.txt").unwrap_or_else(|error| {
if error.kind() == ErrorKind::NotFound {
File::create("hello.txt").unwrap_or_else(|error| {
panic!("Problem creating the file: {:?}", error);
})
} else {
panic!("Problem opening the file: {:?}", error);
}
});
}
`unwrap` (used frequently, IMO)
The `Result<T, E>` type has many helper methods defined on it to do various tasks. One of those methods, called `unwrap`, is a shortcut method that is implemented just like the `match` expression above.
If the `Result` value is the `Ok` variant, `unwrap` will return the value inside the `Ok`. If the `Result` is the `Err` variant, `unwrap` will call the `panic!` macro for us.
use std::fs::File;
fn main() {
let f = File::open("hello.txt").unwrap();
}
`expect`
Similar to `unwrap`, but it also lets us choose the `panic!` error message.
use std::fs::File;
fn main() {
let f = File::open("hello.txt").expect("Failed to open hello.txt");
}
`?` operator
use std::fs::File;
use std::io;
use std::io::Read;
fn read_username_from_file() -> Result<String, io::Error> {
let mut f = File::open("hello.txt")?;
let mut s = String::new();
f.read_to_string(&mut s)?;
Ok(s)
}
The `?` placed after a `Result` value is defined to work almost the same way as a `match` expression on that `Result`:
- If the value of the `Result` is an `Ok`, the value inside the `Ok` will get returned from this expression, and the program will continue.
- If the value is an `Err`, the `Err` will be returned from the whole function so the error value gets propagated to the calling code.
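To see what `?` saves us, here is a sketch that writes the same early-return logic twice, once with `?` and once with explicit `match`. It parses strings instead of opening a file so that it runs without `hello.txt`; the function names are hypothetical. The `.into()` in the `match` version mirrors the `From` conversion that `?` applies to the error.

```rust
use std::num::ParseIntError;

// With `?`: an Err from either parse is returned early to the caller.
fn sum_with_question_mark(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = a.trim().parse::<i32>()?;
    let y = b.trim().parse::<i32>()?;
    Ok(x + y)
}

// Roughly the same logic written out with `match`.
fn sum_with_match(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = match a.trim().parse::<i32>() {
        Ok(value) => value,
        Err(e) => return Err(e.into()), // early return, like `?`
    };
    let y = match b.trim().parse::<i32>() {
        Ok(value) => value,
        Err(e) => return Err(e.into()),
    };
    Ok(x + y)
}

fn main() {
    println!("{:?}", sum_with_question_mark("2", "40")); // Ok(42)
    println!("{:?}", sum_with_match("2", "forty")); // an Err(ParseIntError { .. })
}
```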
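Error values that have the `?` operator called on them go through the `from` function, defined in the `From` trait in the standard library, which is used to convert errors from one type into another. The error is converted into the error type declared in the return type of the current function.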
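A sketch of the `From` conversion in action, assuming a hypothetical application error type `AppError`: because `From<ParseIntError>` is implemented for it, `?` turns a `ParseIntError` into an `AppError` automatically.

```rust
use std::fmt;
use std::num::ParseIntError;

// Hypothetical application-level error that wraps lower-level errors.
#[derive(Debug)]
enum AppError {
    Parse(ParseIntError),
    Negative(i64),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Parse(e) => write!(f, "not a number: {}", e),
            AppError::Negative(n) => write!(f, "negative value: {}", n),
        }
    }
}

// This impl is what lets `?` convert a ParseIntError into an AppError.
impl From<ParseIntError> for AppError {
    fn from(err: ParseIntError) -> Self {
        AppError::Parse(err)
    }
}

fn parse_non_negative(s: &str) -> Result<i64, AppError> {
    let n: i64 = s.parse()?; // ParseIntError -> AppError via From::from
    if n < 0 {
        return Err(AppError::Negative(n));
    }
    Ok(n)
}

fn main() {
    for input in ["42", "-7", "abc"] {
        match parse_non_negative(input) {
            Ok(n) => println!("{} -> ok({})", input, n),
            Err(e) => println!("{} -> error: {}", input, e),
        }
    }
}
```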
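We’re only allowed to use the `?` operator in a function whose return type is compatible with the value it is used on: a function that returns `Result`, `Option`, or another type that implements `std::ops::Try` (that trait is still unstable; newer editions of the book describe this in terms of `FromResidual`). You can use `?` on a `Result` in a function that returns `Result`, and on an `Option` in a function that returns `Option`, but you can’t mix and match.
When you’re writing code in a function that doesn’t return one of these types, and you want to use `?` when you call other functions that return `Result<T, E>`, one technique is to change the return type of your function to `Result<T, E>` if you have no restrictions preventing that.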
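`?` also works on `Option` inside a function that returns `Option`: a `None` is returned early. A small sketch, similar to the book's example:

```rust
// `?` on an Option: if the value is None, the function returns None early.
fn last_char_of_first_line(text: &str) -> Option<char> {
    // lines().next() is None for empty text; chars().last() is None for an empty line.
    text.lines().next()?.chars().last()
}

fn main() {
    println!("{:?}", last_char_of_first_line("Hello, world\nHow are you")); // Some('d')
    println!("{:?}", last_char_of_first_line("")); // None
}
```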
The `main` function is special, and there are restrictions on what its return type must be. One valid return type for `main` is `()`, and conveniently, another valid return type is `Result<T, E>`:
use std::error::Error;
use std::fs::File;
fn main() -> Result<(), Box<dyn Error>> {
let f = File::open("hello.txt")?;
Ok(())
}
For now, you can read Box<dyn Error>
to mean “any kind of error.”
Tip: Reading a file into a string
Rust provides the convenient `fs::read_to_string` function that opens the file, creates a new `String`, reads the contents of the file, puts the contents into that `String`, and returns it.
use std::fs;
use std::io;
fn read_username_from_file() -> Result<String, io::Error> {
fs::read_to_string("hello.txt")
}
9.3 To panic! or Not to panic!
Returning `Result` is a good default choice when you’re defining a function that might fail.
(My note: with a `Result`, the caller can decide how to handle the error; `panic!` stops the program!)
The `unwrap` and `expect` methods are very handy when prototyping, before you’re ready to decide how to handle errors.
In tests, a panic is how a test is marked as a failure. (My note: a single panic fails the whole test function.)
`panic!` is often appropriate if you’re calling external code that is out of your control and it returns an invalid state that you have no way of fixing.
However, when failure is expected, it’s more appropriate to return a `Result` than to make a `panic!` call.
Functions often have contracts: their behavior is only guaranteed if the inputs meet particular requirements. Panicking when the contract is violated makes sense because a contract violation always indicates a caller-side bug and it’s not a kind of error you want the calling code to have to explicitly handle. … Contracts for a function, especially when a violation will cause a panic, should be explained in the API documentation for the function.
My note: for validation, use Rust’s type system.
Creating Custom Types for Validation
We can make a new type and put the validations in a function to create an instance of the type rather than repeating the validations everywhere. That way, it’s safe for functions to use the new type in their signatures and confidently use the values they receive.
Example:
pub struct Guess {
value: i32,
}
impl Guess {
pub fn new(value: i32) -> Guess {
if value < 1 || value > 100 {
panic!("Guess value must be between 1 and 100, got {}.", value);
}
Guess { value }
}
pub fn value(&self) -> i32 {
self.value
}
}
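`pub fn value(&self) -> i32` is called a getter. This public method is necessary because the `value` field of the `Guess` struct is private, so code outside the module can’t set `value` directly and has to go through `Guess::new`, which performs the validation.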
10. Generic Types, Traits, and Lifetimes
Generics are abstract stand-ins for concrete types or other properties.
Similar to the way a function takes parameters with unknown values to run the same code on multiple concrete values, functions can take parameters of some generic type instead of a concrete type, like `i32` or `String`.
The core concept is “removing duplication by extracting a function.”
In case of a function:
- Identify duplicate code.
- Extract the duplicate code into the body of the function and specify the inputs and return values of that code in the function signature.
- Update the two instances of duplicated code to call the function instead.
10.1 Generic Data Types
Tips: By convention, type parameter names in Rust are short, often just a letter, and Rust’s type-naming convention is CamelCase. Short for “type,” `T` is the default choice of most Rust programmers.
Motivation
Practice: We combine the two functions below.
fn largest_i32(list: &[i32]) -> &i32 {
let mut largest = &list[0];
for item in list {
if item > largest {
largest = item;
}
}
largest
}
fn largest_char(list: &[char]) -> &char {
let mut largest = &list[0];
for item in list {
if item > largest {
largest = item;
}
}
largest
}
First, define a generic function.
fn largest<T>(list: &[T]) -> &T {
- To define a generic function, place type name declarations inside angle brackets, `<>`.
- This function has one parameter named `list`.
- The `list` is a slice of values of type `T`.
Example
fn largest<T>(list: &[T]) -> &T {
let mut largest = &list[0];
for item in list {
if item > largest {
largest = item;
}
}
largest
}
fn main() {
let number_list = vec![34, 50, 25, 100, 65];
let result = largest(&number_list);
println!("The largest number is {}", result);
let char_list = vec!['y', 'm', 'a', 'q'];
let result = largest(&char_list);
println!("The largest char is {}", result);
}
It looks fine, but unfortunately, it fails to compile:
error[E0369]: binary operation `>` cannot be applied to type `&T`
--> src/main.rs:5:17
|
5 | if item > largest {
| ---- ^ ------- &T
| |
| &T
|
help: consider restricting type parameter `T`
|
1 | fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> &T {
| ^^^^^^^^^^^^^^^^^^^^^^
error: aborting due to previous error
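The root cause is that `T` can be any type, but not every type can be compared with `>`. Because the body compares values of type `T`, `T` must be restricted to types that implement the `std::cmp::PartialOrd` trait, as the compiler’s help message suggests.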
The final answer would be as follows, which is covered in the next section.
fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
let mut largest = list[0];
for &item in list {
if item > largest {
largest = item;
}
}
largest
}
In Struct Definitions
We can define structs to use a generic type parameter in one or more fields using the `<>` syntax.
struct Point<T> {
x: T,
y: T,
}
fn main() {
let integer = Point { x: 5, y: 10 };
let float = Point { x: 1.0, y: 4.0 };
}
To define a `Point` struct where `x` and `y` are both generics but could have different types, use two type parameters:
struct Point<T, U> {
x: T,
y: U,
}
In Enum Definitions
Recall `Option` from Chapter 6.
enum Option<T> {
Some(T),
None,
}
Recall `Result` from Chapter 9.
enum Result<T, E> {
Ok(T),
Err(E),
}
When to use generic types
When you recognize situations in your code with multiple struct or enum definitions that differ only in the types of the values they hold, you can avoid duplication by using generic types instead.
Implementation (In Method Definitions)
`impl<T>`: by declaring `T` as a generic type right after `impl`, Rust can identify that the type in the angle brackets in `Point` is a generic type rather than a concrete type.
struct Point<T> {
x: T,
y: T,
}
impl<T> Point<T> {
fn x(&self) -> &T {
&self.x
}
}
fn main() {
let p = Point { x: 5, y: 10 };
println!("p.x = {}", p.x());
}
This defines a method named `x` on `Point<T>` that returns a reference to the data in the field `x`.
When we write `impl Point<f32>` instead, the methods in that block are implemented only for `Point<f32>`; other `Point<T>` instances don’t have them.
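A sketch of the `impl Point<f32>` case, assuming `f32` coordinates: `distance_from_origin` exists only on `Point<f32>`, while `x` exists on every `Point<T>`.

```rust
struct Point<T> {
    x: T,
    y: T,
}

// Available on every Point<T>.
impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

// Available only on Point<f32>.
impl Point<f32> {
    fn distance_from_origin(&self) -> f32 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main() {
    let p = Point { x: 3.0_f32, y: 4.0_f32 };
    println!("p.x = {}, distance = {}", p.x(), p.distance_from_origin()); // p.x = 3, distance = 5

    let q = Point { x: 5, y: 10 };
    println!("q.x = {}", q.x()); // q.x = 5
    // q.distance_from_origin(); // would not compile: q is a Point<i32>
}
```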
Performance of Code Using Generics
The good news is that Rust implements generics in such a way that your code doesn’t run any slower using generic types than it would with concrete types.
Monomorphization
Monomorphization is the process of turning generic code into specific code by filling in the concrete types that are used when compiled.
For example, when Rust compiles the following code, it performs monomorphization.
let integer = Some(5);
let float = Some(5.0);
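To make monomorphization concrete, this sketch writes by hand roughly what the compiler generates for `Some(5)` and `Some(5.0)`. The names `Option_i32` and `Option_f64` are illustrative (the compiler’s real internal names differ), and the `*_or_zero` helpers are hypothetical, added only so the specialized types get used.

```rust
// Hand-written stand-ins for the specialized enums produced from Option<T>.
#[allow(non_camel_case_types, dead_code)]
enum Option_i32 {
    Some(i32),
    None,
}

#[allow(non_camel_case_types, dead_code)]
enum Option_f64 {
    Some(f64),
    None,
}

// Hypothetical helpers: each works on exactly one concrete type, no generics left.
fn i32_or_zero(o: Option_i32) -> i32 {
    match o {
        Option_i32::Some(v) => v,
        Option_i32::None => 0,
    }
}

fn f64_or_zero(o: Option_f64) -> f64 {
    match o {
        Option_f64::Some(v) => v,
        Option_f64::None => 0.0,
    }
}

fn main() {
    let integer = Option_i32::Some(5);
    let float = Option_f64::Some(5.0);
    println!("{} {}", i32_or_zero(integer), f64_or_zero(float)); // 5 5
}
```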
10.2 Traits: Defining Shared Behavior
A trait tells the Rust compiler about functionality a particular type has and can share with other types.
pub trait Summary {
fn summarize(&self) -> String;
}
Interpret as “any type that has the `Summary` trait will have the method `summarize`.”
Implementing the trait on a type
pub struct NewsArticle {
pub headline: String,
pub location: String,
pub author: String,
pub content: String,
}
impl Summary for NewsArticle {
fn summarize(&self) -> String {
format!("{}, by {} ({})", self.headline, self.author, self.location)
}
}
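The `Tweet` type used in `returns_summarizable` further down isn’t defined in these notes. A minimal sketch of it, assuming the fields used there, shows a second type implementing `Summary` with its own format:

```rust
pub trait Summary {
    fn summarize(&self) -> String;
}

pub struct Tweet {
    pub username: String,
    pub content: String,
    pub reply: bool,
    pub retweet: bool,
}

// Same trait, different type, different summary format.
impl Summary for Tweet {
    fn summarize(&self) -> String {
        format!("{}: {}", self.username, self.content)
    }
}

fn main() {
    let tweet = Tweet {
        username: String::from("horse_ebooks"),
        content: String::from("of course, as you probably already know, people"),
        reply: false,
        retweet: false,
    };
    // 1 new tweet: horse_ebooks: of course, as you probably already know, people
    println!("1 new tweet: {}", tweet.summarize());
}
```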
How to use traits to define functions that accept many different types.
pub fn notify(item: &impl Summary) {
println!("Breaking news! {}", item.summarize());
}
Instead of a concrete type for the `item` parameter, we specify the `impl` keyword and the trait name. This parameter accepts any type that implements the specified trait.
Trait Bound Syntax
The above is actually syntax sugar for a longer form,
pub fn notify<T: Summary>(item: &T) {
println!("Breaking news! {}", item.summarize());
}
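With multiple parameters:
// item1 and item2 may be different types (each only has to implement Summary)
pub fn notify(item1: &impl Summary, item2: &impl Summary)
// item1 and item2 must be the same type T
pub fn notify<T: Summary>(item1: &T, item2: &T)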
Specifying Multiple Trait Bounds with the + Syntax
We specify in the `notify` definition that `item` must implement both `Display` and `Summary`. We can do so using the `+` syntax:
pub fn notify(item: &(impl Summary + Display)) {...
//or
pub fn notify<T: Summary + Display>(item: &T) {...
`where` clause
More readable, less cluttered.
fn some_function<T, U>(t: &T, u: &U) -> i32
where
    T: Display + Clone,
    U: Clone + Debug,
{
    // --snip--
}
// is equivalent to
fn some_function<T: Display + Clone, U: Clone + Debug>(t: &T, u: &U) -> i32 {
    // --snip--
}
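A runnable sketch of a `where` clause with two type parameters that have different bounds; the function `describe` is hypothetical:

```rust
use std::fmt::{Debug, Display};

// The bounds are listed in the `where` clause instead of inside the angle brackets.
fn describe<T, U>(label: &T, value: &U) -> String
where
    T: Display,
    U: Debug,
{
    format!("{} = {:?}", label, value)
}

fn main() {
    println!("{}", describe(&"numbers", &vec![1, 2, 3])); // numbers = [1, 2, 3]
}
```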
Returning Types that Implement Traits
fn returns_summarizable() -> impl Summary {
Tweet {
username: String::from("horse_ebooks"),
content: String::from(
"of course, as you probably already know, people",
),
reply: false,
retweet: false,
}
}
By using `impl Summary` for the return type, we specify that the `returns_summarizable` function returns some type that implements the `Summary` trait without naming the concrete type.
However, you can only use `impl Trait` if you’re returning a single type: a function that returns either a `NewsArticle` or a `Tweet` depending on a condition won’t compile.
A simple example of trait bounds
Here is the answer to the problem that arose in section 10.1.
fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
let mut largest = list[0];
for &item in list {
if item > largest {
largest = item;
}
}
largest
}
fn main() {
let number_list = vec![34, 50, 25, 100, 65];
let result = largest(&number_list);
println!("The largest number is {}", result);
let char_list = vec!['y', 'm', 'a', 'q'];
let result = largest(&char_list);
println!("The largest char is {}", result);
}
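A variant worth knowing (a sketch; recent editions of the book use a similar form): if `largest` returns a reference instead of a copy, the `Copy` bound is unnecessary, so it also works for non-`Copy` types such as `String`. Like the version above, it panics on an empty slice.

```rust
// Returns a reference to the largest element, so T only needs PartialOrd.
// Panics if `list` is empty (indexing list[0]).
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for item in list {
        if item > largest {
            largest = item;
        }
    }
    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];
    println!("The largest number is {}", largest(&number_list)); // 100

    // String is not Copy, but this version still works.
    let words = vec![String::from("apple"), String::from("pear"), String::from("fig")];
    println!("The largest word is {}", largest(&words)); // pear
}
```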
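Implementations of a trait on any type (blanket implementations)
We can implement a trait for every type that satisfies a trait bound, whether it is a `struct`, an `enum`, or any other type. For example, the standard library implements `ToString` for any type that implements `Display`, which is why we can call `to_string` on such values: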
impl<T: Display> ToString for T {
// --snip--
}
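A sketch of a blanket implementation with a hypothetical trait `Shout`: every type that implements `Display` gets `shout` for free, the same way every `Display` type gets `to_string`.

```rust
use std::fmt::Display;

// Hypothetical trait, only for this illustration.
trait Shout {
    fn shout(&self) -> String;
}

// Blanket implementation: applies to every T that implements Display.
impl<T: Display> Shout for T {
    fn shout(&self) -> String {
        format!("{}!", self.to_string().to_uppercase())
    }
}

fn main() {
    println!("{}", "hello".shout()); // HELLO!
    println!("{}", 42_i32.shout()); // 42!
    println!("{}", 3.to_string()); // 3 (ToString via the std blanket impl)
}
```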
My note: Trait, associated function, method
In “Rust by Example”, there are good examples of associated functions & methods.
Associated functions whose first parameter is named `self` are called methods and may be invoked using the method call operator, for example, `x.foo()`, as well as the usual function call notation.
cf. Instance methods are also stored in
https://stackoverflow.com/questions/8376953/how-are-instance-methods-stored
https://stackoverflow.com/questions/34149386/are-static-methods-always-held-in-memory
#![allow(unused)]
fn main() {
    struct Example {
        number: i32,
    }
    impl Example {
        // No `self` parameter: an associated function, called as Example::boo().
        fn boo() {
            println!("boo! Example::boo() was called!");
        }
        // Methods: the first parameter is `self`.
        fn add_number(&mut self) {
            self.number += 1;
        }
        fn get_number(&self) -> i32 {
            self.number
        }
    }
    trait Thingy {
        fn do_thingy(&self);
    }
    impl Thingy for Example {
        fn do_thingy(&self) {
            println!("doing a thing! also, number is {}!", self.number);
        }
    }
    // Test it
    let mut dummy = Example { number: 2 };
    Example::boo(); // boo! Example::boo() was called!
    println!("A number of the instance dummy is {:?}", dummy.get_number()); // A number of the instance dummy is 2
    dummy.do_thingy(); // doing a thing! also, number is 2!
    dummy.add_number();
    // A method can also be called with the usual function call notation.
    println!("{}", Example::get_number(&dummy)); // 3
    //dummy.boo(); // error! boo has no self parameter, so it is not a method
}
My note: traits give us abstraction and loose coupling: code can depend on a behavior (a trait) instead of on a concrete type.
10.3 Validating References with Lifetimes
Every reference in Rust has a lifetime, which is the scope for which that reference is valid.
Dangling reference: a reference to an object that no longer exists.
The simplest example: in the code below, `r` in `println!("r: {}", r);` would be a dangling reference (it refers to `x`, which has already gone out of scope), so the Rust compiler returns a compile error:
fn main() {
{
let r; // ---------+-- 'a
// |
{ // |
let x = 5; // -+-- 'b |
r = &x; // | |
} // -+ |
// |
println!("r: {}", r); // |
} // ---------+
}
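`'a` and `'b` mean the lifetimes of `r` and `x`, respectively. Because its scope is larger, we say that “`r` lives longer than `x`.” The program is rejected because `r` (lifetime `'a`) refers to `x`, whose lifetime `'b` is shorter.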
The following function fails to compile:
fn longest(x: &str, y: &str) -> &str {
if x.len() > y.len() {
x
} else {
y
}
}
The `longest` function could return either `x` or `y`.
If you use it like `let result = longest(string1, string2);`, the compiler can’t tell whether the returned reference refers to `string1` or `string2`, so it can’t determine how long `result` is valid.
The reason is, the Rust compiler has a borrow checker that compares scopes to determine whether all borrows are valid. The borrow checker doesn’t know how the lifetimes of `x` and `y` relate to the lifetime of the return value of the function `longest`.
How can we fix it?
Lifetime Annotation Syntax
The names of lifetime parameters must start with an apostrophe (`'`) and are usually all lowercase and very short. Most people use the name `'a`.
We place lifetime parameter annotations after the `&` of a reference:
&i32        // a reference
&'a i32     // a reference with an explicit lifetime
&'a mut i32 // a mutable reference with an explicit lifetime
The annotations are meant to tell Rust how generic lifetime parameters of multiple references relate to each other. Multiple references!!
With this notation, we can specify that `x` and `y` have the same lifetime as follows.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
The change means “all the references in the parameters and the return value must have the same lifetime.”
In practice, it means that the lifetime of the reference returned by the `longest` function is the same as the smaller of the lifetimes of the references passed in.
Remember, when we specify the lifetime parameters in this function signature, we’re not changing the lifetimes of any values passed in or returned. Rather, we’re specifying that the borrow checker should reject any values that don’t adhere to these constraints.
Ultimately, lifetime syntax is about connecting the lifetimes of various parameters and return values of functions. Once they’re connected, Rust has enough information to allow memory-safe operations and disallow operations that would create dangling pointers or otherwise violate memory safety.
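A sketch of calling `longest` with references of different lifetimes (similar to the book’s example): `result` is only valid while both borrowed strings are alive, which here means inside the inner block.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("long string is long");
    {
        let string2 = String::from("xyz");
        // 'a becomes the shorter of the two lifetimes: string2's inner scope.
        let result = longest(string1.as_str(), string2.as_str());
        println!("The longest string is {}", result); // The longest string is long string is long
    }
    // Declaring `result` outside the block and printing it here would be
    // rejected, because string2 is dropped at the end of the block above.
}
```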
You need to specify lifetime parameters for functions or structs that use references.
Lifetime Elision
The developers programmed these patterns into the compiler’s code so the borrow checker could infer the lifetimes in these situations and wouldn’t need explicit annotations. The patterns programmed into Rust’s analysis of references are called the lifetime elision rules.
Lifetimes on function or method parameters are called input lifetimes, and lifetimes on return values are called output lifetimes.
The 3 rules of the elision:
- Rule 1: Each parameter that is a reference gets its own lifetime parameter. A function with one parameter gets one lifetime parameter, and a function with two parameters gets two separate lifetime parameters.
- Rule 2: If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters.
- Rule 3: If there are multiple input lifetime parameters, but one of them is `&self` or `&mut self` because this is a method, the lifetime of `self` is assigned to all output lifetime parameters.
If the compiler still can’t figure out the lifetimes of the output references after applying these rules, it stops with an error and we have to add the annotations ourselves.
Example of the rule 1 and rule 2:
fn first_word(s: &str) -> &str {
// Apply rule 1. Same with
fn first_word<'a>(s: &'a str) -> &str {
// Apply rule 2. Same with
fn first_word<'a>(s: &'a str) -> &'a str {
When we implement methods on a struct with lifetimes, we use the same syntax as that of generic type parameters.
My example
src/main.rs:
struct ImportantExcerpt<'a> {
part: &'a str,
}
impl<'a> ImportantExcerpt<'a> {
fn level(&self) -> i32 {
3
}
}
impl<'a> ImportantExcerpt<'a> {
fn announce_and_return_part(&self, announcement: &str) -> &str {
println!("Attention please: {}", announcement);
self.part
}
}
fn main () {
let s1 = String::from("test1");
let mut s2 = String::from("test2");
let a = ImportantExcerpt{
part: s1.as_str()
};
println!("{}",a.part); // test1
a.announce_and_return_part(s2.as_str()); // Attention please: test2
s2 = String::from("new test2");
println!("{}",s2); // new test2
a.announce_and_return_part(s2.as_str()); //Attention please: new test2
println!("{}",a.level()); // 3
}
And result:
➜ cargo run
Compiling te v0.1.0 (/home/atlex00/rust-project/test)
Finished dev [unoptimized + debuginfo] target(s) in 0.15s
Running `target/debug/test`
test1
Attention please: test2
new test2
Attention please: new test2
3
The Static Lifetime
One special lifetime we need to discuss is `'static`, which means that this reference can live for the entire duration of the program.
All string literals have the `'static` lifetime:
let s: &'static str = "I have a static lifetime.";
// Same as
let s = "I have a static lifetime.";
The text of this string is stored directly in the program’s binary, which is always available. Therefore, the lifetime of all string literals is `'static`.
While learning the tokio framework, I ran into a point from the article “Common Rust Lifetime Misconceptions”: I need to tell the difference between static variables and the `'static` lifetime.
Well yes, but a type with a `'static` lifetime is different from a type bounded by a `'static` lifetime. … `T: 'static` includes all `&'static T` however it also includes all owned types, like `String`, `Vec`, etc. The owner of some data is guaranteed that data will never get invalidated as long as the owner holds onto it, therefore the owner can safely hold onto the data indefinitely long, including up until the end of the program. …
Key Takeaways
- `T: 'static` should be read as “`T` is bounded by a `'static` lifetime”
- if `T: 'static` then `T` can be a borrowed type with a `'static` lifetime or an owned type
- since `T: 'static` includes owned types that means `T`
  - can be dynamically allocated at run-time
  - does not have to be valid for the entire program
  - can be safely and freely mutated
  - can be dynamically dropped at run-time
  - can have lifetimes of different durations
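A sketch of a `'static` trait bound, assuming we want to format a value on another thread (`std::thread::spawn` requires its closure to be `'static`). The helper `debug_in_thread` is hypothetical. An owned `String` created at run time satisfies `T: 'static` even though it does not live for the whole program, while a reference to a local variable would not.

```rust
use std::fmt::Debug;
use std::thread;

// T: 'static means T holds no borrows shorter than 'static.
// Owned types such as String and Vec<i32> satisfy it.
fn debug_in_thread<T: Debug + Send + 'static>(value: T) -> String {
    // thread::spawn requires the closure, and therefore `value`, to be 'static.
    let handle = thread::spawn(move || format!("{:?}", value));
    handle.join().unwrap()
}

fn main() {
    let owned = String::from("created at run time"); // owned: satisfies T: 'static
    println!("{}", debug_in_thread(owned));
    println!("{}", debug_in_thread("a string literal")); // &'static str also qualifies

    // let local = String::from("local");
    // debug_in_thread(&local); // would not compile: &local is not 'static
}
```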
`'static` as a trait bound is described in the official Rust by Example.
Generic Type Parameters, Trait Bounds, and Lifetimes Together
Just an example:
fn main() {
let string1 = String::from("abcd");
let string2 = "xyz";
let result = longest_with_an_announcement(
string1.as_str(),
string2,
"Today is someone's birthday!",
);
println!("The longest string is {}", result);
}
use std::fmt::Display;
fn longest_with_an_announcement<'a, T>(
x: &'a str,
y: &'a str,
ann: T,
) -> &'a str
where
T: Display,
{
println!("Announcement! {}", ann);
if x.len() > y.len() {
x
} else {
y
}
}
➜ cargo run
Compiling te v0.1.0 (/home/atlex00/rust-project/test)
Finished dev [unoptimized + debuginfo] target(s) in 0.16s
Running `target/debug/test`
Announcement! Today is someone's birthday!
The longest string is abcd
11. Writing Automated Tests
11.1 How to Write Tests
The body of a test function typically does the following:
- Set up any needed data or state.
- Run the code you want to test.
- Assert the results are what you expect.
Attribute
Attributes are metadata about pieces of Rust code.
For example, `derive` is one of the attributes.
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
To change a function into a test function, add `#[test]` on the line before `fn`.
To run the tests, use `cargo test`.
When we make a new library project with Cargo, a test module with a test function in it is automatically generated for us.
`#[test]` annotation
This is the default test module that Cargo generates in `src/lib.rs` for a new library project:
#[cfg(test)]
mod tests {
#[test]
fn it_works() {
assert_eq!(2 + 2, 4);
}
}
The `#[test]` attribute indicates that `fn it_works` is a test function.
Run the test:
$ cargo test
Finished test [unoptimized + debuginfo] target(s) in 0.00s
Running target/debug/deps/adder-6f6d09e2972de52b
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
`tests::it_works` is the name of the generated test function.
Note that `measured` is the count of benchmark tests.
Side note: Benchmark in Rust
Because the benchmark feature isn’t available in the stable channel, you need the nightly channel if you want to use it.
https://doc.rust-lang.org/unstable-book/library-features/test.html
rustup install nightly
If you run it on the stable channel (plain `cargo bench`), you’ll get an error like:
$ cargo bench
Compiling adder v0.1.0 (/home/atlex00/rust-projects/adder)
error[E0554]: `#![feature]` may not be used on the stable release channel
--> src/lib.rs:1:1
|
1 | #![feature(test)]
| ^^^^^^^^^^^^^^^^^
error: aborting due to previous error
src/lib.rs
#![feature(test)]
extern crate test;
pub fn add_two(a: i32) -> i32 {
a + 2
}
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
#[bench]
fn bench_add_two(b: &mut Bencher) {
b.iter(|| add_two(2));
}
}
Run a benchmark:
$ cargo +nightly bench
Compiling adder v0.1.0 (/home/atlex/rust-projects/adder)
Finished bench [optimized] target(s) in 0.60s
Running unittests (target/release/deps/adder-8d2056bd46123ee2)
running 2 tests
test tests::it_works ... ignored
test tests::bench_add_two ... bench: 0 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured; 0 filtered out; finished in 1.08s
Rust runs our benchmark a number of times, and then takes the average.
About Doc-tests
We’ll learn about them in Chapter 14, but in a nutshell:
- Triple slash `///` starts a special comment called a documentation comment. `///` supports Markdown notation.
- Code examples in documentation comments are run as tests automatically by `cargo test`.
`assert!` macro
We give the `assert!` macro an argument that evaluates to a Boolean. If the value is `true`, `assert!` does nothing and the test passes. If the value is `false`, the `assert!` macro calls the `panic!` macro, which causes the test to fail.
You can pass additional arguments after the condition to get a custom failure message; they are passed along to the `format!` macro.
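A sketch of `assert!` with a custom message; the `Rectangle::can_hold` method is an assumption for illustration. The test function sits at the top level here for brevity; by convention it would go in a `#[cfg(test)] mod tests` module.

```rust
#[derive(Debug)]
struct Rectangle {
    width: u32,
    height: u32,
}

impl Rectangle {
    // True if `other` fits strictly inside `self`.
    fn can_hold(&self, other: &Rectangle) -> bool {
        self.width > other.width && self.height > other.height
    }
}

#[test]
fn larger_can_hold_smaller() {
    let larger = Rectangle { width: 8, height: 7 };
    let smaller = Rectangle { width: 5, height: 1 };
    // Arguments after the condition are passed to format! for the failure message.
    assert!(
        larger.can_hold(&smaller),
        "{:?} should be able to hold {:?}",
        larger,
        smaller
    );
}

fn main() {
    let larger = Rectangle { width: 8, height: 7 };
    let smaller = Rectangle { width: 5, height: 1 };
    println!("larger can hold smaller: {}", larger.can_hold(&smaller)); // true
}
```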
`assert_eq!` and `assert_ne!`
Under the surface, the `assert_eq!` and `assert_ne!` macros use the operators `==` and `!=`, respectively. When an assertion fails, they print both values using debug formatting, so the values being compared must implement the `PartialEq` and `Debug` traits.
Derivable Traits
https://doc.rust-lang.org/book/appendix-03-derivable-traits.html
The `derive` attribute generates code that will implement a trait with its own default implementation on the type you’ve annotated with the `derive` syntax.
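For example, deriving `PartialEq` and `Debug` on a struct is enough to compare it with `assert_eq!` and `assert_ne!`. A sketch with a hypothetical `Point` and `translate` function:

```rust
// PartialEq provides ==, Debug lets the macros print the values on failure.
// Clone and Copy are derived only so `p` can be reused after passing it by value.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

fn translate(p: Point, dx: i32, dy: i32) -> Point {
    Point { x: p.x + dx, y: p.y + dy }
}

#[test]
fn translate_moves_point() {
    let p = Point { x: 1, y: 2 };
    assert_eq!(translate(p, 3, -1), Point { x: 4, y: 1 });
    assert_ne!(translate(p, 3, -1), p);
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{:?}", translate(p, 3, -1)); // Point { x: 4, y: 1 }
}
```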
`should_panic` attribute
This attribute makes a test pass if the code inside the function panics; the test fails if the code doesn’t panic.
Example:
pub struct Guess {
value: i32,
}
impl Guess {
pub fn new(value: i32) -> Guess {
if value < 1 || value > 100 {
panic!("Guess value must be between 1 and 100, got {}.", value);
}
Guess { value }
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
#[should_panic]
fn greater_than_100() {
Guess::new(200);
}
}
Tests that use `should_panic` can be imprecise because they only indicate that the code has caused some panic. Adding the `expected` parameter to the `should_panic` attribute makes the test more precise: `expected` is a substring that the panic message must contain.
...
} else if value > 100 {
panic!(
"Guess value must be less than or equal to 100, got {}.",
value
);
}
...
#[cfg(test)]
mod tests {
use super::*;
#[test]
#[should_panic(expected = "Guess value must be less than or equal to 100")]
fn greater_than_100() {
Guess::new(200);
}
}
Using Result<T, E> in Tests
#[cfg(test)]
mod tests {
    #[test]
    fn it_works() -> Result<(), String> {
        if 2 + 2 == 4 {
            Ok(())
        } else {
            Err(String::from("two plus two does not equal four"))
        }
    }
}
This test returns `Ok(())` when it passes and an `Err` with a `String` inside when it fails.
- Writing tests so they return a `Result<T, E>` enables you to use the question mark operator in the body of tests.
- You can’t use the `#[should_panic]` annotation on tests that use `Result<T, E>`. Instead, you should return an `Err` value directly when the test should fail.
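A sketch of a `Result`-returning test that uses `?`; the function `parse_and_double` is hypothetical. If parsing fails, `?` returns the `Err` and the test fails.

```rust
use std::num::ParseIntError;

fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.trim().parse()?;
    Ok(n * 2)
}

// Returning Result lets the test body use `?`; an Err fails the test.
#[test]
fn doubles_valid_input() -> Result<(), ParseIntError> {
    assert_eq!(parse_and_double("21")?, 42);
    Ok(())
}

fn main() {
    println!("{:?}", parse_and_double("21")); // Ok(42)
}
```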
11.2 Controlling How Tests Are Run
The default behavior of the binary produced by `cargo test` is to run all the tests in parallel and capture output generated during test runs, preventing the output from being displayed and making it easier to read the output related to the test results.
Because the tests are running at the same time, make sure your tests don’t depend on each other or on any shared state, including a shared environment, such as the current working directory or environment variables.
If you don’t want to run the tests in parallel, use the `--test-threads` option, like `cargo test -- --test-threads=1`. The `--` here is a separator: arguments before it go to `cargo test`, and arguments after it go to the resulting test binary.
If we want to see printed values for passing tests as well, we can tell Rust to also show the output of successful tests at the end with `cargo test -- --show-output`.
We can pass the name of any test function to `cargo test` to run only that test: `cargo test {{ the name of function }}`, but we can’t specify the names of multiple tests in this way.
We can specify part of a test name, and any test whose name matches that value will be run.
Sometimes a few specific tests can be very time-consuming to execute, so you might want to exclude them during most runs of `cargo test`. Use the `ignore` attribute.
src/lib.rs
#[test]
fn it_works() {
assert_eq!(2 + 2, 4);
}
#[test]
#[ignore]
fn expensive_test() {
// code that takes an hour to run
}
If we want to run only the ignored tests, we can use `cargo test -- --ignored`.
11.3 Test Organization (I should read again when I need it in my project)
The Rust community thinks about tests in terms of two main categories: unit tests and integration tests.
Unit Tests
The convention is to create a module named `tests` in each file to contain the test functions and to annotate the module with `cfg(test)`.
You’ll use `#[cfg(test)]` to specify that they shouldn’t be included in the compiled result: the tests are compiled and run only with `cargo test`, not with `cargo build`.
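A sketch of a unit test exercising a private function, as in the book’s `adder` example: it works because `tests` is a child module, and `use super::*` brings the parent’s items, private ones included, into scope. The `main` function exists only so the sketch runs as a binary.

```rust
pub fn add_two(a: i32) -> i32 {
    internal_adder(a, 2)
}

// Not `pub`: private to this module, yet still testable from the child module below.
fn internal_adder(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::*; // brings add_two and the private internal_adder into scope

    #[test]
    fn internal() {
        assert_eq!(4, internal_adder(2, 2));
    }
}

// Only here so the sketch runs as a binary; in `adder` this code lives in src/lib.rs.
fn main() {
    println!("{}", add_two(2)); // 4
}
```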
Integration Tests
To create integration tests, you first need a `tests` directory at the top level of your project directory, next to `src`. Cargo knows to look for integration test files in this directory.
We don’t need to annotate any code in `tests/integration_test.rs` with `#[cfg(test)]`.
Each file in the `tests` directory is a separate crate, so we need to bring our library into each test crate’s scope.
`tests/integration_test.rs` in a project `adder`:
use adder;
#[test]
fn it_adds_two() {
assert_eq!(4, adder::add_two(2));
}
12. An I/O Project: Building a Command Line Program
In this tutorial, we write a clone of the `grep` command.
12.1 Accepting Command Line Arguments
- The function `std::env::args()` returns an iterator of the command line arguments.
- We can call the `collect` method on an iterator to turn it into a collection (such as a vector).
- Note: `std::env::args()` will panic if any argument contains invalid Unicode. If your program needs to accept such arguments, use `std::env::args_os` instead.
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
println!("{:?}", args);
}
Result:
$ cargo run 1starg 2ndarg
Compiling iptables_viewer v0.1.0 (/path/to/your/project)
Finished dev [unoptimized + debuginfo] target(s) in 0.25s
Running `target/debug/project-name 1starg 2ndarg`
["target/debug/project-name", "1starg", "2ndarg"]
- The first value in the vector is `target/debug/project-name`, which is the name of our binary.
- The first argument is referred to as `&args[1]` in the program.
- The type of each element is `String` (the vector is a `Vec<String>`), so `&args[1]` is a `&String`.
12.2 Reading a File
The following snippet is refactored in the next section, 12.3.
use std::fs;
let contents = fs::read_to_string(filename)
.expect("Something went wrong reading the file");
println!("With text:\n{}", contents);
`fs::read_to_string` takes the filename, opens that file, and returns a `std::io::Result<String>` with the file’s contents.
12.3 Refactoring to Improve Modularity and Error Handling
I’ve learned general programming concepts in this chapter.
In a nutshell: `main.rs` handles running the program, and `lib.rs` handles all the logic of the task at hand.
Here are the reasons:
- If we continue to grow our program inside `main`, the number of separate tasks the `main` function handles will increase.
- The more variables we have in scope, the harder it will be to keep track of the purpose of each. It’s best to group the configuration variables into one structure to make their purpose clear.
- The error message `Something went wrong reading the file` is not clear.
- It would be best if all the error-handling code were in one place so future maintainers had only one place to consult in the code if the error-handling logic needed to change.
The Rust community has developed a process to use as a guideline for splitting the separate concerns of a binary program when `main` starts getting large:
- Split your program into a `main.rs` and a `lib.rs` and move your program’s logic to `lib.rs`.
- As long as your command line parsing logic is small, it can remain in `main.rs`.
- When the command line parsing logic starts getting complicated, extract it from `main.rs` and move it to `lib.rs`.
The responsibilities that remain in the `main` function after this process should be limited to the following:
- Calling the command line parsing logic with the argument values
- Setting up any other configuration
- Calling a `run` function in `lib.rs`
- Handling the error if `run` returns an error
Based on these best practices, we can do the following:
- Extract the argument parser (the `parse_config` function)
- Group the configuration values (the `Config` struct)
Note: Using primitive values when a complex type would be more appropriate is an anti-pattern known as primitive obsession.
This is the refactored version:
use std::env;
use std::fs;
fn main() {
let args: Vec<String> = env::args().collect();
let config = parse_config(&args);
println!("Searching for {}", config.query);
println!("In file {}", config.filename);
let contents = fs::read_to_string(config.filename)
.expect("Something went wrong reading the file");
println!("With text:\n{}", contents);
}
struct Config {
query: String,
filename: String,
}
fn parse_config(args: &[String]) -> Config {
let query = args[1].clone();
let filename = args[2].clone();
Config { query, filename }
}
If you create a file foo.txt containing the line “I’m in foo.txt.” and run the program:
➜ cargo run ar1 foo.txt
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/minigrep ar1 foo.txt`
Searching for ar1
In file foo.txt
With text:
I'm in foo.txt.
There’s a tendency among many Rustaceans to avoid using clone to fix ownership problems because of its runtime cost. We will learn a more efficient way in Chapter 13.
The next improvements are:
- Turning parse_config into a constructor (Config::new). Making this change will make the code more idiomatic.
- Improving the error handling: returning a Result from the constructor instead of panicking (indexing past the end of args, or calling panic!), so that the main function can exit the process more cleanly in the error case.
impl Config {
fn new(args: &[String]) -> Result<Config, &str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let filename = args[2].clone();
Ok(Config { query, filename })
}
}
use std::process;
// --snip--
let config = Config::new(&args).unwrap_or_else(|err| {
println!("Problem parsing arguments: {}", err);
process::exit(1);
});
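unwrap_or_else is a method defined on Result: if the value is Ok, it returns the inner value that Ok is wrapping; if the value is an Err value, it calls the code in the closure, which is an anonymous function we define and pass as an argument to unwrap_or_else. The closure receives the error value (err here).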
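A minimal sketch of unwrap_or_else in both cases. The helper parse_query is hypothetical (shaped like Config::new), and the closure returns a fallback value instead of calling process::exit so the effect is easy to observe.

```rust
// Hypothetical helper shaped like Config::new: it fails when the
// argument list is too short.
fn parse_query(args: &[String]) -> Result<String, &'static str> {
    if args.len() < 2 {
        return Err("not enough arguments");
    }
    Ok(args[1].clone())
}

// Falls back to a default instead of calling process::exit,
// so the behavior is easy to observe.
fn query_or_default(args: &[String]) -> String {
    parse_query(args).unwrap_or_else(|err| {
        // This closure runs only in the Err case.
        eprintln!("Problem parsing arguments: {}", err);
        String::from("default")
    })
}

fn main() {
    let good = vec![String::from("minigrep"), String::from("needle")];
    let bad = vec![String::from("minigrep")];
    println!("{}", query_or_default(&good)); // needle
    println!("{}", query_or_default(&bad)); // default (the message goes to stderr)
}
```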
Next, following the next best practice (“Calling a run function in lib.rs”), we’ll create a run function (for now, still in main.rs):
use std::error::Error;
fn run(config: Config) -> Result<(), Box<dyn Error>> {
let contents = fs::read_to_string(config.filename)?;
println!("With text:\n{}", contents);
Ok(())
}
Box<dyn Error> is called a trait object, and we will review it in Chapter 17.
For now, we can understand that Box<dyn Error> means the function will return a type that implements the Error trait, but we don’t have to specify what particular type the return value will be.
Recall that ?
returns Err
from the whole function so the error value gets propagated.
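To see why Box<dyn Error> is convenient, here is a sketch (the function parse_positive is hypothetical) where two different error types end up in the same return type: ? converts a ParseIntError, and .into() converts a plain &str message.

```rust
use std::error::Error;

// Hypothetical function with two kinds of failure, both returned as Box<dyn Error>.
fn parse_positive(input: &str) -> Result<u32, Box<dyn Error>> {
    // `?` converts ParseIntError into Box<dyn Error> and returns early on failure.
    let n: u32 = input.trim().parse()?;
    if n == 0 {
        // A &str message also converts into Box<dyn Error> via `into()`.
        return Err("zero is not positive".into());
    }
    Ok(n)
}

fn main() {
    for input in ["42", "abc", "0"] {
        match parse_positive(input) {
            Ok(n) => println!("{} -> {}", input, n),
            Err(e) => println!("{} -> error: {}", input, e),
        }
    }
}
```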
This Ok(()) syntax might look a bit strange at first, but using () like this is the idiomatic way to indicate that we’re calling run for its side effects only; it doesn’t return a value we need.
Because run returns () (wrapped as Ok(())) in the success case, there is no value to unwrap; we only care about detecting an error, so we can use if let rather than unwrap_or_else.
The last refactoring step is splitting the code into a library crate. Here is the final result of the section.
src/lib.rs
:
use std::error::Error;
use std::fs;
pub struct Config {
pub query: String,
pub filename: String,
}
impl Config {
pub fn new(args: &[String]) -> Result<Config, &str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let filename = args[2].clone();
Ok(Config { query, filename })
}
}
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
let contents = fs::read_to_string(config.filename)?;
println!("With text:\n{}", contents);
Ok(())
}
src/main.rs
:
use std::env;
use std::process;
use minigrep::Config;
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::new(&args).unwrap_or_else(|err| {
println!("Problem parsing arguments: {}", err);
process::exit(1);
});
println!("Searching for {}", config.query);
println!("In file {}", config.filename);
if let Err(e) = minigrep::run(config) {
println!("Application error: {}", e);
process::exit(1);
}
}
12.4 Developing the Library’s Functionality with Test-Driven Development
In this chapter, the TDD process is as follows:
- Write a test that fails and run it to make sure it fails for the reason you expect.
- Write or modify just enough code to make the new test pass.
- Refactor the code you just added or changed and make sure the tests continue to pass.
- Repeat from step 1!
Before starting the TDD process, delete the println! lines we no longer need (the ones we used to check the program’s behavior).
Writing a Failing Test
In src/lib.rs
:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn one_result() {
let query = "duct";
let contents = "\
Rust:
safe, fast, productive.
Pick three.";
assert_eq!(vec!["safe, fast, productive."], search(query, contents));
}
}
Here, the test calls the function search, which is not defined yet, so the test doesn’t even compile.
That’s expected at this step (this is TDD).
Writing Code to Pass the Test
In src/lib.rs
:
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
if line.contains(query) {
results.push(line);
}
}
results
}
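This may be a good time to review the lifetimes chapter (10.3): the returned slices borrow from contents, not from query, so only contents and the return value share the lifetime 'a.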
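The lifetime annotation ties the result to contents only. A small sketch, assuming the same search as above, shows what that buys: the results stay valid even after query has been dropped.

```rust
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();
    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }
    results
}

fn main() {
    let contents = String::from("Rust:\nsafe, fast, productive.\nPick three.");
    let results;
    {
        // `query` is dropped at the end of this block...
        let query = String::from("duct");
        results = search(&query, &contents);
    }
    // ...but `results` is still usable: its slices borrow only from `contents`.
    assert_eq!(results, vec!["safe, fast, productive."]);
}
```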
And use it from run() (this part will be refactored in Chapter 13):
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
let contents = fs::read_to_string(config.filename)?;
for line in search(&config.query, &contents) {
println!("{}", line);
}
Ok(())
}
12.5 Working with Environment Variables
Add a new test case for a search_case_insensitive function.
We want to use this function when a specific environment variable is set.
The way to make the search case insensitive is to convert all related strings (the query and each line) to lowercase (for now, we don’t worry about full Unicode correctness).
In mod tests
of src/lib.rs
:
#[test]
fn case_insensitive() {
let query = "rUsT";
let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";
assert_eq!(
vec!["Rust:", "Trust me."],
search_case_insensitive(query, contents)
);
}
Implement the function:
pub fn search_case_insensitive<'a>(
query: &str,
contents: &'a str,
) -> Vec<&'a str> {
let query = query.to_lowercase();
let mut results = Vec::new();
for line in contents.lines() {
if line.to_lowercase().contains(&query) {
results.push(line);
}
}
results
}
- We shadowed query, and the new query is a String rather than a &str (because the to_lowercase method creates new data). That’s why we pass &query to contains.
Now, add the environment variable part.
Change the Config struct:
pub struct Config {
pub query: String,
pub filename: String,
pub case_sensitive: bool,
}
Change the run function (control flow):
let results = if config.case_sensitive {
search(&config.query, &contents)
} else {
search_case_insensitive(&config.query, &contents)
};
for line in results {
println!("{}", line);
}
Read an environment variable in the constructor:
let query = args[1].clone();
let filename = args[2].clone();
let case_sensitive = env::var("CASE_INSENSITIVE").is_err();
Ok(Config {
query,
filename,
case_sensitive,
})
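The CASE_INSENSITIVE environment variable can be set to anything; we only check whether it is set. env::var returns a Result: Ok with the value if the variable is set, and Err if it isn’t. is_err doesn’t unwrap anything; it just returns true if the Result is an Err value, so case_sensitive is true when CASE_INSENSITIVE is not set.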
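A sketch of the same check, pulled into a hypothetical helper (case_sensitive_from) so that both outcomes can be exercised without modifying the real process environment:

```rust
use std::env;

// Same expression as in Config::new: `is_err()` is true only when the
// variable is absent (or its value is not valid Unicode).
fn case_sensitive_from(var: Result<String, env::VarError>) -> bool {
    var.is_err()
}

fn main() {
    let case_sensitive = case_sensitive_from(env::var("CASE_INSENSITIVE"));
    println!("case_sensitive = {}", case_sensitive);

    assert!(case_sensitive_from(Err(env::VarError::NotPresent)));
    // Any value counts as "set", even an empty string.
    assert!(!case_sensitive_from(Ok(String::new())));
}
```

To try it with minigrep itself, set the variable for a single run, e.g. CASE_INSENSITIVE=1 cargo run rust foo.txt in a POSIX shell.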
The next section is the last section of the chapter, so I’ll paste the final result at the end of the chapter.
12.6 Writing Error Messages to Standard Error Instead of Standard Output
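One thing to learn: eprintln! writes a message to standard error (stderr) instead of standard output (stdout). So if you redirect the output with cargo run > output.txt, error messages still appear on the screen instead of in the file.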
Here is the final result:
src/main.rs
:
use std::env;
use std::process;
use minigrep::Config;
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::new(&args).unwrap_or_else(|err| {
eprintln!("Problem parsing arguments: {}", err);
process::exit(1);
});
if let Err(e) = minigrep::run(config) {
eprintln!("Application error: {}", e);
process::exit(1);
}
}
src/lib.rs
:
use std::error::Error;
use std::fs;
use std::env;
pub struct Config {
pub query: String,
pub filename: String,
pub case_sensitive: bool,
}
impl Config {
pub fn new(args: &[String]) -> Result<Config, &str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let filename = args[2].clone();
let case_sensitive = env::var("CASE_INSENSITIVE").is_err();
Ok(Config {
query,
filename,
case_sensitive,
})
}
}
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
let contents = fs::read_to_string(config.filename)?;
let results = if config.case_sensitive {
search(&config.query, &contents)
} else {
search_case_insensitive(&config.query, &contents)
};
for line in results {
println!("{}", line);
}
Ok(())
}
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
if line.contains(query) {
results.push(line);
}
}
results
}
pub fn search_case_insensitive<'a>(
query: &str,
contents: &'a str,
) -> Vec<&'a str> {
let query = query.to_lowercase();
let mut results = Vec::new();
for line in contents.lines() {
if line.to_lowercase().contains(&query) {
results.push(line);
}
}
results
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn one_result() {
let query = "duct";
let contents = "\
Rust:
safe, fast, productive.
Pick three.";
assert_eq!(vec!["safe, fast, productive."], search(query, contents));
}
#[test]
fn case_insensitive() {
let query = "rUsT";
let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";
assert_eq!(
vec!["Rust:", "Trust me."],
search_case_insensitive(query, contents)
);
}
}
If you want to do logging properly, use the log crate.
13. Functional Language Features: Iterators and Closures
Programming in a functional style often includes using functions as values by passing them in arguments, returning them from other functions, assigning them to variables for later execution, and so forth.
13.1 Closures: Anonymous Functions that Can Capture Their Environment
An example of a closure.
let expensive_closure = |num| {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(2));
num
};
- To define a closure, we start with a pair of vertical pipes (|), inside which we specify the parameters to the closure.
- Unlike functions, closures can capture values from the scope in which they’re defined.
We can use the closure like this:
let i: i32 = 5;
println!("{}", expensive_closure(i));
We don’t need to annotate the types of a closure’s parameters or its return value; the Rust compiler infers them. However, a closure definition gets one concrete type inferred for each of its parameters and for its return value, so calling the same closure with arguments of different types won’t compile.
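A small sketch of the “one concrete type per parameter” rule (inferred_identity is a hypothetical helper name): the first call fixes the parameter type, so a second call with another type is rejected.

```rust
// The closure has no annotations; its types are inferred from the first call.
fn inferred_identity() -> String {
    let example_closure = |x| x;
    let s = example_closure(String::from("hello")); // x is now fixed to String
    // let n = example_closure(5); // would not compile: expected String, found integer
    s
}

fn main() {
    println!("{}", inferred_identity());
}
```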
But we can also define types explicitly:
let expensive_closure = |num: u32| -> u32 {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(2));
num
};
Memoization, lazy evaluation
We can create a struct that holds the closure and the resulting value of calling it, so that the expensive code runs at most once.
We need to specify the type of the closure, because a struct definition needs to know the types of each of its fields.
Example:
struct Cacher<T>
where
    T: Fn(u32) -> u32,
{
    calculation: T,
    value: Option<u32>,
}
- The Cacher struct has a calculation field of the generic type T.
- The trait bounds on T specify that it’s a closure by using the Fn trait.
- Any closure we want to store in the calculation field must have one u32 parameter (specified within the parentheses after Fn) and must return a u32 (specified after the ->).
- The value field is an Option<u32>: it is None before the closure has been executed, and Some(result) afterwards.
The Fn Traits
All closures implement at least one of the traits: Fn, FnMut, or FnOnce.
- FnOnce consumes the variables it captures from its enclosing scope, known as the closure’s environment. To consume the captured variables, the closure must take ownership of these variables and move them into the closure when it is defined. The Once part of the name represents the fact that the closure can’t take ownership of the same variables more than once, so it can be called only once.
- FnMut can change the environment because it mutably borrows values.
- Fn borrows values from the environment immutably.
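A sketch of the three traits used as bounds (the helper names are hypothetical). Every closure implements FnOnce, and a closure that implements Fn also implements FnMut, so the bound describes what the caller needs rather than a single trait the closure has.

```rust
// Fn: may be called many times through a shared reference.
fn call_twice_fn<F: Fn() -> usize>(f: F) -> usize {
    f() + f()
}

// FnMut: may be called many times and may mutate captured state.
fn call_twice_fn_mut<F: FnMut()>(mut f: F) {
    f();
    f();
}

// FnOnce: may be called only once.
fn call_once<F: FnOnce() -> Vec<i32>>(f: F) -> Vec<i32> {
    f()
}

fn main() {
    let v = vec![1, 2, 3];

    // Fn: only reads `v` through an immutable borrow.
    assert_eq!(call_twice_fn(|| v.len()), 6);

    // FnMut: mutably borrows `count`.
    let mut count = 0;
    call_twice_fn_mut(|| count += 1);
    assert_eq!(count, 2);

    // FnOnce only: returning `v` moves it out of the closure,
    // so the closure could not be called a second time.
    let give_away = move || v;
    assert_eq!(call_once(give_away), vec![1, 2, 3]);
}
```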
Implement the example:
impl<T> Cacher<T>
where
T: Fn(u32) -> u32,
{
fn new(calculation: T) -> Cacher<T> {
Cacher {
calculation,
value: None,
}
}
fn value(&mut self, arg: u32) -> u32 {
match self.value {
Some(v) => v,
None => {
let v = (self.calculation)(arg);
self.value = Some(v);
v
}
}
}
}
And use it:
fn generate_workout(intensity: u32, random_number: u32) {
let mut expensive_result = Cacher::new(|num| {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(2));
num
});
if intensity < 25 {
println!("Today, do {} pushups!", expensive_result.value(intensity));
println!("Next, do {} situps!", expensive_result.value(intensity));
} else {
if random_number == 3 {
println!("Take a break today! Remember to stay hydrated!");
} else {
println!(
"Today, run for {} minutes!",
expensive_result.value(intensity)
);
}
}
}
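One limitation of this Cacher: it stores a single value, so after c.value(1), a call to c.value(2) still returns the result computed for 1. The book suggests holding a hash map instead; the following is my own sketch of that idea (keyed by the argument), not code from the book.

```rust
use std::collections::HashMap;

// Sketch: one cached result per argument instead of a single value.
struct Cacher<T>
where
    T: Fn(u32) -> u32,
{
    calculation: T,
    values: HashMap<u32, u32>,
}

impl<T> Cacher<T>
where
    T: Fn(u32) -> u32,
{
    fn new(calculation: T) -> Cacher<T> {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value(&mut self, arg: u32) -> u32 {
        // Return the cached result for this argument, if there is one.
        if let Some(&v) = self.values.get(&arg) {
            return v;
        }
        // Otherwise run the (expensive) closure once and remember the result.
        let v = (self.calculation)(arg);
        self.values.insert(arg, v);
        v
    }
}

fn main() {
    let mut c = Cacher::new(|a| a * 2);
    assert_eq!(c.value(1), 2);
    // The single-value Cacher would return 2 again here.
    assert_eq!(c.value(2), 4);
}
```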
Closures have an additional capability that functions don’t have: they can capture their environment and access variables from the scope in which they’re defined.
Capturing the Environment with Closures
The following snippet fails to compile because equal_to_x is a function, not a closure, and functions can’t capture their environment.
fn main() {
let x = 4;
fn equal_to_x(z: i32) -> bool {
z == x
}
let y = 4;
assert!(equal_to_x(y));
}
error[E0434]: can't capture dynamic environment in a fn item
--> src/main.rs:5:14
|
5 | z == x
| ^
|
= help: use the `|| { ... }` closure form instead
Here is the closure version:
fn main() {
let x = 4;
let equal_to_x = |z| z == x;
let y = 4;
assert!(equal_to_x(y));
}
If you want to force the closure to take ownership of the values it uses in the environment, you can use the move
keyword before the parameter list.
Here is a move example (it fails to compile: x has been moved into the closure, so println! can no longer use it):
fn main() {
let x = vec![1, 2, 3];
let equal_to_x = move |z| z == x;
println!("can't use x here: {:?}", x);
let y = vec![1, 2, 3];
assert!(equal_to_x(y));
}
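move matters most when a closure outlives the scope that created it. A sketch with a hypothetical make_equal_to_x function that returns the closure, plus the usual workaround of moving a clone when the original is still needed afterwards:

```rust
// Returns a closure that outlives the function that created it.
fn make_equal_to_x() -> impl Fn(Vec<i32>) -> bool {
    let x = vec![1, 2, 3];
    // Without `move`, the closure would only borrow `x`, which is dropped
    // when this function returns, so the code would not compile (E0373).
    move |z| z == x
}

fn main() {
    let equal_to_x = make_equal_to_x();
    assert!(equal_to_x(vec![1, 2, 3]));
    assert!(!equal_to_x(vec![4]));

    // If the original is still needed, move a clone into the closure instead.
    let x = vec![1, 2, 3];
    let x_for_closure = x.clone();
    let equal = move |z: Vec<i32>| z == x_for_closure;
    println!("can use x here: {:?}", x);
    assert!(equal(vec![1, 2, 3]));
}
```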
13.2 Processing a Series of Items with Iterators
In Rust, iterators are lazy, meaning they have no effect until you call methods that consume the iterator to use it up.
We can create an iterator from a Vec<T> explicitly:
let v1 = vec![1, 2, 3];
let v1_iter = v1.iter();
The Iterator Trait and the next Method
The definition of the Iterator
trait in the standard library looks like this:
pub trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
// methods with default implementations elided
}
- Implementing the Iterator trait requires that you also define an Item type.
  - The type in the line type Item; declares an associated type. Associated types connect a type placeholder with a trait such that the trait method definitions can use these placeholder types in their signatures. We will learn about this in the later chapter “19.2. Advanced Traits”.
  - Outside of traits, the type keyword is also used to create a type alias, e.g. to increase readability.
- This Item type is used in the return type of the next method. In other words, the Item type will be the type returned from the iterator.
- We can call the next method on iterators directly. Because next changes the iterator’s internal state, the iterator must then be declared mut.
- We don’t need to make an iterator mutable when we use a for loop, because the loop takes ownership of the iterator and makes it mutable behind the scenes.
- The values we get from the calls to next on v1.iter() are immutable references to the values in the vector.
- If we want to create an iterator that takes ownership of the vector and returns owned values, we can call into_iter instead of iter. Similarly, if we want to iterate over mutable references, we can call iter_mut instead of iter.
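A sketch tying these points together: calling next directly (which needs a mut binding), iter_mut and into_iter, and a hypothetical Counter type that implements Iterator by defining type Item and next (similar to the counter in the book’s “Creating Our Own Iterators” example).

```rust
// Hypothetical iterator that yields 1 through 5.
struct Counter {
    count: u32,
}

impl Counter {
    fn new() -> Counter {
        Counter { count: 0 }
    }
}

impl Iterator for Counter {
    // The associated type: what `next` yields.
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.count < 5 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

fn main() {
    // `next` takes `&mut self`, so the iterator binding must be `mut`.
    let v1 = vec![1, 2, 3];
    let mut v1_iter = v1.iter();
    assert_eq!(v1_iter.next(), Some(&1)); // immutable references
    assert_eq!(v1_iter.next(), Some(&2));
    assert_eq!(v1_iter.next(), Some(&3));
    assert_eq!(v1_iter.next(), None);

    // iter_mut yields `&mut i32`; into_iter takes ownership and yields `i32`.
    let mut v2 = vec![1, 2, 3];
    for x in v2.iter_mut() {
        *x *= 10;
    }
    let owned: Vec<i32> = v2.into_iter().collect();
    assert_eq!(owned, vec![10, 20, 30]);

    // Our own iterator gets default methods such as `sum` for free.
    let total: u32 = Counter::new().sum();
    assert_eq!(total, 15);
}
```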
Methods that Consume the Iterator
Methods that call next are called consuming adaptors, because calling them uses up the iterator.
An example of a consuming adaptor is the sum() method.
After calling sum(), you can’t use the iterator anymore, because sum takes ownership of it.
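A minimal sketch of sum as a consuming adaptor (total is a hypothetical helper): once sum has taken the iterator, it can’t be used again.

```rust
// `sum` calls `next` until the iterator is exhausted and takes ownership of it.
fn total(v: &[i32]) -> i32 {
    let v_iter = v.iter();
    let sum: i32 = v_iter.sum();
    // v_iter.next(); // would not compile: `v_iter` was moved into `sum`
    sum
}

fn main() {
    println!("{}", total(&[1, 2, 3]));
}
```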
Methods that Produce Other Iterators
Methods called iterator adaptors allow you to change iterators into different kinds of iterators.
The method map, which takes a closure to call on each item to produce a new iterator, is an example.
But because all iterators are lazy, you have to call one of the consuming adaptor methods to get results from calls to iterator adaptors.
The collect() method consumes the iterator and collects the resulting values into a collection data type.
Here is a good snippet showing how to use iter, map, and collect together:
let v1: Vec<i32> = vec![1, 2, 3];
let v2: Vec<_> = v1.iter().map(|x| x + 1).collect();
assert_eq!(v2, vec![2, 3, 4]);
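Adaptors usually take closures, and those closures can capture their environment. A sketch using filter, with a hypothetical Shoe type. Note that v1.iter().map(|x| x + 1); on its own would do nothing; the compiler warns that iterators are lazy and do nothing unless consumed.

```rust
// Hypothetical type used only for this example.
#[derive(PartialEq, Debug)]
struct Shoe {
    size: u32,
    style: String,
}

// `into_iter` takes ownership of the vector, `filter` keeps the items for which
// the closure returns true (the closure captures `shoe_size` from the
// environment), and `collect` consumes the adapted iterator.
fn shoes_in_size(shoes: Vec<Shoe>, shoe_size: u32) -> Vec<Shoe> {
    shoes.into_iter().filter(|s| s.size == shoe_size).collect()
}

fn main() {
    let shoes = vec![
        Shoe { size: 10, style: String::from("sneaker") },
        Shoe { size: 13, style: String::from("sandal") },
        Shoe { size: 10, style: String::from("boot") },
    ];
    let in_my_size = shoes_in_size(shoes, 10);
    assert_eq!(in_my_size.len(), 2);
    println!("{:?}", in_my_size);
}
```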
13.3 Improving Our I/O Project
Refactor two components using iterators:
- the Config constructor
- pub fn search
Then change the main function accordingly.
Config
before:
impl Config {
pub fn new(args: &[String]) -> Result<Config, &str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let filename = args[2].clone();
let case_sensitive = env::var("CASE_INSENSITIVE").is_err();
Ok(Config {
query,
filename,
case_sensitive,
})
}
}
Config
after:
impl Config {
pub fn new(mut args: env::Args) -> Result<Config, &'static str> {
args.next();
let query = match args.next() {
Some(arg) => arg,
None => return Err("Didn't get a query string"),
};
let filename = match args.next() {
Some(arg) => arg,
None => return Err("Didn't get a file name"),
};
let case_sensitive = env::var("CASE_INSENSITIVE").is_err();
Ok(Config {
query,
filename,
case_sensitive,
})
}
}
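- We eliminated the clone calls from the constructor: the iterator yields owned Strings, so we can move them into Config.
- After refactoring, we don’t index into a slice of arguments (args[1], args[2]); instead, we use the iterator. The first args.next() skips the program name.
- Note that args must be mut, because iterating over it (calling next) mutates the iterator.
- The error type in the constructor’s signature now needs an explicit 'static lifetime. Since env::Args is owned, there is no input reference whose lifetime the returned &str could borrow, so lifetime elision can’t apply (the error messages are string literals, which are 'static). If you omit the 'static, the compiler returns the error below:
error[E0106]: missing lifetime specifier
  --> src/lib.rs:12:55
   |
12 |     pub fn new(mut args: env::Args) -> Result<Config, &str> {
   |                                                       ^ expected named lifetime parameter
   |
   = help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments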
pub fn search
before:
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
if line.contains(query) {
results.push(line);
}
}
results
}
pub fn search
after:
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
contents
.lines()
.filter(|line| line.contains(query))
.collect()
}
and a minor change in main:
fn main() {
let config = Config::new(env::args()).unwrap_or_else(|err| {
eprintln!("Problem parsing arguments: {}", err);
process::exit(1);
});
// -- snip --
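The same filter pattern also fits search_case_insensitive. This is my own sketch, not shown in this section of the book:

```rust
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    // Lowercase the query once, outside the closure.
    let query = query.to_lowercase();
    contents
        .lines()
        .filter(|line| line.to_lowercase().contains(&query))
        .collect()
}

fn main() {
    let contents = "Rust:\nsafe, fast, productive.\nPick three.\nTrust me.";
    assert_eq!(
        search_case_insensitive("rUsT", contents),
        vec!["Rust:", "Trust me."]
    );
}
```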
13.4 Comparing Performance: Loops vs. Iterators
Answer to the title: Iterators, although a high-level abstraction, get compiled down to roughly the same code as if you’d written the lower-level code yourself.
TL;DR: The implementations of closures and iterators are such that runtime performance is not affected. This is part of Rust’s goal to strive to provide zero-cost abstractions.
Unrolling is an optimization that removes the overhead of the loop-controlling code and instead generates repetitive code for each iteration of the loop. The Rust compiler unrolls some loops during optimization, for example when it knows the exact number of iterations at compile time.
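To check the “same result” half of the claim, here is a sketch with two hypothetical functions that compute a sum of squares with a loop and with an iterator chain. Comparing the generated machine code is a separate step; for example, cargo rustc --release -- --emit asm writes the assembly so you can inspect it yourself.

```rust
// Index-based loop.
fn sum_of_squares_loop(v: &[i64]) -> i64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i] * v[i];
    }
    total
}

// Iterator chain: map each element to its square, then sum.
fn sum_of_squares_iter(v: &[i64]) -> i64 {
    v.iter().map(|x| x * x).sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    assert_eq!(sum_of_squares_loop(&data), sum_of_squares_iter(&data));
}
```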
14. More About Cargo and Crates.io
14.1 Customizing Builds with Release Profiles
There are two release profiles by default, dev
and release
.
You can define the profile-specific configurations in Cargo.toml
file.
Here is an example of how to set the optimization level in that file (these are the default values):
[profile.dev]
opt-level = 0
[profile.release]
opt-level = 3
You can find the other profile settings in the Cargo Book.
14.2 Publishing a Crate to Crates.io
Before publishing, we should write documentation.
Documentation comments start with three slashes, ///, and support Markdown. Code examples inside them are run as tests by cargo test (doc-tests).
cargo doc generates the HTML documentation, and cargo doc --open builds it and opens it in a browser.
Commonly Used Sections
- Examples
- Panics
- Errors
- Safety
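A sketch of a documented function with the Examples and Panics sections (my_crate and add_one are hypothetical names). rustdoc accepts both backtick and tilde (~~~) fences; tildes are used here. If this function lived in a library crate named my_crate, cargo test would run the example as a doc-test.

```rust
/// Adds one to the number given.
///
/// # Examples
///
/// ~~~
/// let answer = my_crate::add_one(5);
/// assert_eq!(6, answer);
/// ~~~
///
/// # Panics
///
/// In debug builds, panics on overflow when `x` is `i32::MAX`.
pub fn add_one(x: i32) -> i32 {
    x + 1
}

fn main() {
    println!("{}", add_one(5));
}
```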
Commenting Contained Items
//! comments add documentation to the item that contains them (such as the crate or a module), rather than to the item that follows them.
We often use these comments in src/lib.rs, which is the crate root, to describe the entire crate.
Exporting a Convenient Public API with pub use
If you re-export items with pub use self::{{ your_custom_module }}, they are listed in the “Re-exports” section of the generated documentation, and users can access them directly from the crate root without knowing the internal module structure.
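A sketch of a re-export (kinds and PrimaryColor are hypothetical names): the type lives in a nested module, and pub use makes it reachable from the crate root as well.

```rust
// The type lives in a nested module...
pub mod kinds {
    #[derive(Debug, PartialEq)]
    pub enum PrimaryColor {
        Red,
        Yellow,
        Blue,
    }
}

// ...and is re-exported at the crate root, so users can write
// `my_crate::PrimaryColor` instead of `my_crate::kinds::PrimaryColor`.
pub use self::kinds::PrimaryColor;

fn main() {
    let c = PrimaryColor::Red;
    assert_eq!(c, kinds::PrimaryColor::Red);
}
```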
This section isn’t that critical, so I don’t take detailed notes. If I need to publish an API, I’ll refer to the documentation directly.
Publish
- Create an account on crates.io. I’m using my GitHub account.
- Go to https://crates.io/settings/tokens and get an API token.
- Run cargo login {{ your_token }}. The command stores your token in $HOME/.cargo/credentials.
- Describe the metadata (package.{name, version, license, description, etc.}) in Cargo.toml.
- Run cargo publish. (Done! Note that a publish is permanent: a published version can’t be overwritten or deleted, only yanked.)
14.3 Cargo Workspaces
Cargo’s workspaces feature lets us split a project into multiple related packages that are developed in tandem.
A workspace is a set of packages that share the same Cargo.lock and output directory.
Here is a sample workspace structure:
$ tree -I target
.
├── adder
│ ├── Cargo.toml
│ └── src
│ └── main.rs
├── add-one
│ ├── Cargo.toml
│ └── src
│ └── lib.rs
├── Cargo.lock
└── Cargo.toml
4 directories, 6 files
Cargo.toml
:
[workspace]
members = [
"adder",
"add-one",
]
adder/Cargo.toml
:
[package]
name = "adder"
version = "0.1.0"
edition = "2018"
[dependencies]
add-one = { path = "../add-one" }
add-one/Cargo.toml
:
[package]
name = "add-one"
version = "0.1.0"
edition = "2018"
[dependencies]
rand = "0.8.3"
add-one/src/lib.rs
:
pub fn add_one(x: i32) -> i32 {
x + 1
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
assert_eq!(3, add_one(2));
}
}
adder/src/main.rs
:
use add_one;
fn main() {
let num = 10;
println!(
"Hello, world! {} plus one is {}!",
num,
add_one::add_one(num)
);
}
Let’s run cargo run
:
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/adder`
Hello, world! 10 plus one is 11!
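- The entry point of cargo run is fn main() in adder/src/main.rs, because adder is the only package with a main.rs (a binary). To choose the package explicitly, run cargo run -p adder.
- We added the rand crate only to add-one/Cargo.toml. If you want to use the rand crate in the adder package, you have to add it explicitly to adder/Cargo.toml as well.
- In other words, each package in the workspace has its own dependency scope, even though the whole workspace shares one Cargo.lock (so both packages would resolve to the same version of rand).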
14.4 Installing Binaries from Crates.io with cargo install
cargo install {{ name_of_binary_on_crates.io }}
The default location of the cargo binaries is $HOME/.cargo/bin
.
14.5 Extending Cargo with Custom Commands
If a binary named cargo-something is in your $PATH, you can run it as cargo something, as if it were a Cargo subcommand.
15. Smart Pointers
A pointer is a general concept for a variable that contains an address in memory. The most common kind of pointer in Rust is a reference.
Smart pointers, on the other hand, are data structures that not only act like a pointer but also have additional metadata and capabilities.
One example that we’ll explore in this chapter is the reference counting smart pointer type (in 15.4). This pointer enables you to have multiple owners of data by keeping track of the number of owners and, when no owners remain, cleaning up the data.
In many cases, smart pointers own the data they point to.
Actually, we’ve already encountered a few smart pointers in this book, such as String
and Vec<T>
.
Smart pointers are usually implemented using structs. The characteristic that distinguishes a smart pointer from an ordinary struct is that smart pointers implement the Deref
and Drop
traits.
We’ll cover the most common smart pointers in the standard library:
- Box<T>, for allocating values on the heap
- Rc<T>, a reference counting type that enables multiple ownership
- Ref<T> and RefMut<T>, accessed through RefCell<T>, a type that enforces the borrowing rules at runtime instead of compile time
My note: why are smart pointers important to learn?
Rust is designed with memory safety in mind. Suppose your company needs to build its own database system for some reason (e.g., it doesn’t want to use 3rd party database services). If you implement a relational database in Rust, these pointers would likely be used frequently.
15.1 Using Box<T>
to Point to Data on the Heap
Box<T> allows you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.
You’ll use them most often in these situations:
- When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
- When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
- When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type
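The third situation is a trait object. A sketch with a hypothetical Shape trait: a Vec can hold boxes of different concrete types because every Box<dyn Shape> has the same size, whatever it points to.

```rust
// Hypothetical trait and types, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

struct Rect {
    w: f64,
    h: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

// Only the trait matters here, not the concrete type behind each Box.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> =
        vec![Box::new(Square(2.0)), Box::new(Rect { w: 1.0, h: 3.0 })];
    println!("total area = {}", total_area(&shapes)); // total area = 7
}
```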
Side note: memory allocation of Vec
My note: at this point, I wondered how Rust allocates memory when I manipulate a Vec.
I found a good post about this theme.
https://markusjais.com/unterstanding-rusts-vec-and-its-capacity-for-fast-and-efficient-programs/ <- the page was removed somehow…🤔
I also found a good critique of that post.
Cite from the official document of std::vec::Vec
:
The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector. This is not to be confused with the length of a vector, which specifies the number of actual elements within the vector. If a vector’s length exceeds its capacity, its capacity will automatically be increased, but its elements will have to be reallocated. For example, a vector with capacity 10 and length 0 would be an empty vector with space for 10 more elements. Pushing 10 or fewer elements onto the vector will not change its capacity or cause reallocation to occur. However, if the vector’s length is increased to 11, it will have to reallocate, which can be slow. For this reason, it is recommended to use
Vec::with_capacity
whenever possible to specify how big the vector is expected to get.
fn main() {
let v: Vec<i32> = Vec::new();
println!("{:?}",v.capacity()); // 0
println!("{:?}",v.len()); // 0
let v2: Vec<i32> = Vec::with_capacity(5);
println!("{:?}",v2.capacity()); // 5 (guaranteed to be at least 5)
println!("{:?}",v2.len()); // 0
}
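A small sketch to observe the quoted behavior (fill_without_realloc is a hypothetical helper). Filling a vector up to the capacity it was created with keeps both the capacity and the buffer address unchanged; the growth pattern printed at the end is an implementation detail and may differ between Rust versions.

```rust
// Pushes `n` elements into a Vec created with `with_capacity(n)` and
// reports whether the buffer stayed in place (no reallocation).
fn fill_without_realloc(n: usize) -> bool {
    let mut v: Vec<usize> = Vec::with_capacity(n);
    let cap = v.capacity(); // at least n
    let ptr = v.as_ptr();
    for i in 0..n {
        v.push(i);
    }
    v.capacity() == cap && v.as_ptr() == ptr
}

fn main() {
    assert!(fill_without_realloc(10));

    // Starting from Vec::new(), the capacity grows in steps as elements
    // are pushed; the exact steps are not guaranteed.
    let mut v: Vec<i32> = Vec::new();
    for i in 0..20 {
        v.push(i);
        println!("len = {:2}, capacity = {}", v.len(), v.capacity());
    }
}
```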
After some more googling, I found another good explanation. (Thank you, u/matthieum!)
Using a Box<T>
to Store Data on the Heap
Boxes are not often used this way (a single integer on the heap), but it’s useful for educational purposes:
fn main() {
let b = Box::new(5);
println!("b = {}", b);
}
When a box goes out of scope, as b does at the end of main, it will be deallocated. The deallocation happens for both the box (stored on the stack) and the data it points to (stored on the heap).
Example: construct function (cons list)
The cons function (short for “construct function”) constructs a new pair from its two arguments, which usually are a single value and another pair.
“To cons x
onto y
” informally means to construct a new container instance by putting the element x
at the start of this new container, followed by the container y
.
Each item in a cons list contains two elements: the value of the current item and the next item. The last item in the list contains only a value called Nil
without a next item. A cons list is produced by recursively calling the cons
function.
A cons list is a kind of linked list.
Let’s try to implement a list of i32
with Cons
.
The following code fails to compile.
enum List {
Cons(i32, List),
Nil,
}
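Rust can’t compile this because it can’t figure out how much space it needs to store a List value: List is defined recursively (a List contains a List, which contains a List, and so on), so its size would be infinite (error E0072, “recursive type `List` has infinite size”).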
To solve this issue, use a Box<T> (a pointer), because the size of a pointer is known at compile time.
enum List {
Cons(i32, Box<List>),
Nil,
}
use crate::List::{Cons, Nil};
fn main() {
let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}
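With the Box version in place, the list can be traversed recursively. A sketch (the function sum is a hypothetical addition) that adds up the values:

```rust
// Same List as above.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

fn sum(list: &List) -> i32 {
    match list {
        // `next` is a &Box<List>; it deref-coerces to &List for the recursive call.
        List::Cons(value, next) => value + sum(next),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))));
    assert_eq!(sum(&list), 6);
}
```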
15.2 Treating Smart Pointers Like Regular References with the Deref
Trait
The following code fails to compile:
fn main() {
let x = 5;
let y = &x;
assert_eq!(5, x);
assert_eq!(5, y);
}
The error:
error[E0277]: can't compare `{integer}` with `&{integer}`
--> src/main.rs:6:5
|
6 | assert_eq!(5, y);
| ^^^^^^^^^^^^^^^^^ no implementation for `{integer} == &{integer}`
|
To avoid this error, we should change assert_eq!(5, y);
to assert_eq!(5, *y);
.
This * is called the dereference operator; it means “follow the reference to the value it’s pointing to.”
One more dereference example (one mutable borrow is allowed!). The assertion fails on purpose, because *y = 4 changed x through y:
fn main() {
let mut x = 5;
let y = &mut x;
*y = 4;
assert_eq!(5, *y);
// thread 'main' panicked at 'assertion failed: `(left == right)`
// left: `5`,
// right: `4`', src/main.rs:8:5
}
As in C or C++, we can print a memory address, using the {:p} format specifier:
fn main() {
let x = &42;
let address = format!("{:p}", x);
println!("{}", address); // e.g. 0x560b046ea000 (varies between runs)
}
Instead of using a reference (let y = &x;), we can write it with Box:
fn main() {
let x = 5;
let y = Box::new(x);
assert_eq!(5, x);
assert_eq!(5, *y);
}
Note that y
is an instance of a box pointing to a copied value of x
rather than a reference pointing to the value of x
.
Defining Our Own Smart Pointer
The Box<T> type in the standard library already implements the Deref trait, so we can use the * operator on it.
To use the dereference operator on our own type (struct), we need to implement Deref ourselves.
Let’s define a sample type MyBox<T> (a tuple struct with one element):
struct MyBox<T>(T);
impl<T> MyBox<T> {
fn new(x: T) -> MyBox<T> {
MyBox(x)
}
}
We didn’t implement Deref
trait for this struct, so the following code returns a compile error:
let x = 5;
let y = MyBox::new(x);
assert_eq!(5, x);
assert_eq!(5, *y);
// error[E0614]: type `MyBox<{integer}>` cannot be dereferenced
// --> src/main.rs:14:19
// |
// 14 | assert_eq!(5, *y);
// | ^^
//
Let’s implement Deref
trait.
According to the official trait documentation, the required method is deref and the associated type (for associated types, see Chapter 19) is Target:
use std::ops::Deref;
impl<T> Deref for MyBox<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.0
}
}
When we write *y, behind the scenes Rust actually runs this code:
*(y.deref())
Rust substitutes the *
operator with a call to the deref
method and then a plain dereference so we don’t have to think about whether or not we need to call the deref
method.
Why the signature of deref
is fn deref(&self) -> &Self::Target
?
The answer is “Rust’s ownership system”.
If the deref
method returned the value directly instead of a reference to the value, the value would be moved out of self
.
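A runnable sketch of the two points above, reusing MyBox: *y and *(y.deref()) give the same value, and because deref returns a reference, nothing is moved out of the box, so it remains usable afterwards.

```rust
use std::ops::Deref;

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let y = MyBox::new(5);
    assert_eq!(5, *y); // what we write
    assert_eq!(5, *(y.deref())); // what Rust runs behind the scenes

    // `deref` returns a reference, so the String is not moved out of `b`.
    let b = MyBox::new(String::from("hello"));
    let s: &String = b.deref();
    assert_eq!(s.as_str(), "hello");
    // `b` is still usable; method lookup auto-derefs to String::len.
    assert_eq!(b.len(), 5);
}
```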
Implicit Deref Coercions with Functions and Methods
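Advanced review on String and str:
- https://stackoverflow.com/a/24159933/9923806
- https://github.com/BrooksPatton/learning-rust/issues/2#issuecomment-382178427
My first guess (“str doesn’t store data on stack”) was partially wrong. Corrected summary:
- str is a dynamically sized type: its size is not known at compile time, so we almost always use it behind a pointer, typically &str. String, on the other hand, has a known size.
- &str is a “fat pointer”: it stores the address of the first byte and the length in bytes (there is no length prefix stored in the data itself). The bytes can live anywhere: in the read-only data segment of the binary (string literals, which are &'static str), in a String’s heap buffer, or even on the stack.
- &str is immutable because it is a shared reference, not because it points to the data segment.
- String stores
  - the length of its string,
  - the pointer to the actual string data, and
  - the capacity
  on the stack (for a local variable), and the actual string data is stored on the heap.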
I checked the data segment data in this post.
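A sketch to check the corrected summary (middle is a hypothetical helper). The String size check reflects current compilers; the exact layout of String is not a documented guarantee.

```rust
use std::mem::size_of;

// Returns bytes 1..4 of `s` as a slice that borrows from `s` (no copy).
fn middle(s: &str) -> &str {
    &s[1..4]
}

fn main() {
    // &str is a fat pointer: address + length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // String: pointer + capacity + length (on current compilers).
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());

    // A string literal: the bytes live in the compiled binary.
    let literal: &'static str = "hello";

    // A &str can also point into a String's heap buffer.
    let owned = String::from(literal);
    let slice = middle(&owned);
    assert_eq!(slice, "ell");
    assert_eq!(slice.as_ptr(), owned.as_ptr().wrapping_add(1));
}
```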
Here is how Deref is implemented for String in the standard library:
#[stable(feature = "rust1", since = "1.0.0")]
impl ops::Deref for String {
type Target = str;
#[inline]
fn deref(&self) -> &str {
unsafe { str::from_utf8_unchecked(&self.vec) }
}
}
When we pass a reference to a particular type’s value as an argument to a function or method that expects a different reference type, Rust calls deref as many times as necessary to get a reference that matches the parameter’s type. This is called implicit deref coercion. The number of deref calls is resolved at compile time, so there is no runtime penalty.
The following code shows a deref coercions chains (&MyBox<String>
→ &String
→ &str
):
use std::ops::Deref;
impl<T> Deref for MyBox<T> {
type Target = T;
fn deref(&self) -> &T {
&self.0
}
}
struct MyBox<T>(T);
impl<T> MyBox<T> {
fn new(x: T) -> MyBox<T> {
MyBox(x)
}
}
fn hello(name: &str) {
println!("Hello, {}!", name);
}
fn main() {
let m = MyBox::new(String::from("Rust"));
hello(&m);
}
I checked the memory allocation of MyBox
in this post.
How Deref Coercion Interacts with Mutability
Rust does deref coercion when it finds types and trait implementations in three cases:
- From &T to &U when T: Deref<Target=U>
- From &mut T to &mut U when T: DerefMut<Target=U>
- From &mut T to &U when T: Deref<Target=U>
Rust will also coerce a mutable reference to an immutable one. But the reverse is not possible: immutable references will never coerce to mutable references.
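A sketch of the second and third cases using String, which implements both Deref<Target = str> and DerefMut (shout and byte_len are hypothetical helpers):

```rust
// Needs a mutable string slice.
fn shout(s: &mut str) {
    s.make_ascii_uppercase();
}

// Needs only a shared string slice.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    let mut name = String::from("rust");

    // Case 2: &mut String -> &mut str (String: DerefMut<Target = str>).
    shout(&mut name);
    assert_eq!(name, "RUST");

    // Case 3: &mut String -> &str (String: Deref<Target = str>).
    let r: &mut String = &mut name;
    assert_eq!(byte_len(r), 4);

    // The reverse is never done implicitly:
    // shout(&name); // would not compile: &String can't become &mut str
}
```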
15.4 `Rc<T>`, the Reference Counted Smart Pointer
We use the `Rc<T>` type when we want to allocate some data on the heap for multiple parts of our program to read and we can't determine at compile time which part will finish using the data last.
Note that `Rc<T>` is only for use in single-threaded scenarios.
If you want a shared reference counter across multiple threads, use `Arc<T>` instead, combined with `Mutex<T>` when the shared data must be mutated, e.g., `Arc::new(Mutex::new(0))`.
Let’s see the sample code:
enum List {
Cons(i32, Rc<List>),
Nil,
}
use crate::List::{Cons, Nil};
use std::rc::Rc;
fn main() {
let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
let b = Cons(3, Rc::clone(&a));
let c = Cons(4, Rc::clone(&a));
}
This code would be interpreted as follows:
When we create `b`, instead of taking ownership of `a`, we clone the `Rc<List>` that `a` is holding, thereby increasing the number of references from one to two and letting `a` and `b` share ownership of the data in that `Rc<List>`.
`Rc::clone()` makes a clone of the `Rc` pointer. This creates another pointer to the same allocation, increasing the strong reference count; it doesn't deep-copy the data.
When a clone goes out of scope, the count is decreased automatically (by `Rc`'s `Drop` implementation), as `c` shows in the following code:
enum List {
Cons(i32, Rc<List>),
Nil,
}
use crate::List::{Cons, Nil};
use std::rc::Rc;
fn main() {
let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
println!("{}", Rc::strong_count(&a)); // 1
let b = Cons(3, Rc::clone(&a));
println!("{}", Rc::strong_count(&a)); // 2
{
let c = Cons(4, Rc::clone(&a));
println!("{}", Rc::strong_count(&a)); // 3
}
println!("{}", Rc::strong_count(&a)); // 2
}
We'll see reference cycles later, and that's why the method is named `strong_count` (there is also a `weak_count`).
15.5 `RefCell<T>` and the Interior Mutability Pattern
My summary
Suppose the following use case:
- You want to use a trait from a third-party crate and implement it for your struct `MyStruct`, which has a field `my_field: String`.
- The signature of the trait's method is `(&self, foo: &str)`, where `&self` is an immutable reference.
- But in your use case (e.g., `MyStruct` is a mock type for tests), your implementation of the trait should overwrite `MyStruct.my_field` with `foo`.
- You can't change the signature from `(&self, foo: &str)` to `(&mut self, foo: &str)` because the trait belongs to a third-party crate. (You could fork the crate, but that is another story.)
In this case, you can use `RefCell`, like `my_field: RefCell<String>`.
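The following methods are the basic usage of `RefCell`:
- `RefCell::new(value)`: wraps `value` in a `RefCell`.
- `my_refcell.borrow()`: returns a `Ref<T>`, an immutable borrow of the inner value.
- `my_refcell.borrow_mut()`: returns a `RefMut<T>`, a mutable borrow of the inner value.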
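A minimal sketch of these three methods; `push_value` and `sum` are hypothetical helpers. Note that `push_value` only receives a shared reference, yet it can still push into the vector through the `RefCell`.

```rust
use std::cell::RefCell;

// Takes only a shared reference, yet mutates the contents through the RefCell.
fn push_value(cell: &RefCell<Vec<i32>>, value: i32) {
    cell.borrow_mut().push(value); // RefMut<Vec<i32>>, released at the end of the statement
}

fn sum(cell: &RefCell<Vec<i32>>) -> i32 {
    cell.borrow().iter().sum() // Ref<Vec<i32>>
}

fn main() {
    let cell = RefCell::new(vec![1, 2]); // note: not declared `mut`
    push_value(&cell, 3);
    println!("{:?}, sum = {}", cell.borrow(), sum(&cell)); // [1, 2, 3], sum = 6
}
```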
Enforcing Borrowing Rules at Runtime with RefCell<T>
With references and `Box<T>`, the borrowing rules' invariants are enforced at compile time.
With `RefCell<T>`, these invariants are enforced at runtime.
With references, if you break these rules, you'll get a compiler error.
With `RefCell<T>`, if you break these rules, your program will panic and exit.
(Of course, you may now be asking, "Why would we want to move these checks from compile time to runtime?" Be patient.)
The advantage of checking the borrowing rules at runtime instead is that certain memory-safe scenarios are then allowed, where they would've been disallowed by the compile-time checks. Static analysis, like the Rust compiler, is inherently conservative. Some properties of code are impossible to detect by analyzing the code: the most famous example is the Halting Problem.
Because some static analysis is impossible, if the Rust compiler can’t be sure the code complies with the ownership rules, it might reject a correct program; in this way, it’s conservative.
Similar to `Rc<T>`, `RefCell<T>` is only for use in single-threaded scenarios and will give you a compile-time error if you try using it in a multithreaded context.
Here is a recap of the reasons to choose `Box<T>`, `Rc<T>`, or `RefCell<T>`:
- `Rc<T>` enables multiple owners of the same data; `Box<T>` and `RefCell<T>` have single owners.
- `Box<T>` allows immutable or mutable borrows checked at compile time; `Rc<T>` allows only immutable borrows checked at compile time; `RefCell<T>` allows immutable or mutable borrows checked at runtime.
- Because `RefCell<T>` allows mutable borrows checked at runtime, you can mutate the value inside the `RefCell<T>` even when the `RefCell<T>` is immutable.
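The runtime check can be observed without crashing by using `try_borrow_mut`, which returns a `Result` instead of panicking the way `borrow_mut` does. A minimal sketch; `can_borrow_mut_while_reading` is a hypothetical helper.

```rust
use std::cell::RefCell;

// Returns whether a mutable borrow can be taken while a shared borrow is alive.
fn can_borrow_mut_while_reading(cell: &RefCell<i32>) -> bool {
    let _reader = cell.borrow(); // shared borrow, alive until the end of the function
    // `cell.borrow_mut()` here would panic ("already borrowed");
    // `try_borrow_mut()` reports the same runtime check as a Result instead.
    let ok = cell.try_borrow_mut().is_ok();
    ok
}

fn main() {
    let cell = RefCell::new(5);
    println!("{}", can_borrow_mut_while_reading(&cell)); // false

    // Once the shared borrow has been dropped, a mutable borrow succeeds.
    *cell.borrow_mut() += 1;
    println!("{}", cell.borrow()); // 6
}
```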
Interior Mutability: A Mutable Borrow to an Immutable Value
Interior mutability is a design pattern in Rust that allows you to mutate data even when there are immutable references to that data.
There are situations in which it would be useful for a value to mutate itself in its methods but appear immutable to other code.
Code outside the value’s methods would not be able to mutate the value.
Using `RefCell<T>` is one way to get the ability to have interior mutability.
A Use Case for Interior Mutability: Mock Objects (to be reviewed)
- Suppose a trait in a third-party library defines a method that takes an immutable `&self` reference (the default).
- When you implement that method, you want to mutate `self` without touching the library, but the compiler rejects this because `self` is only borrowed immutably.
- In that case, you can put the data in a `RefCell<T>`, like `RefCell<Vec<String>>`, so that the `.borrow_mut()` method gives you a mutable reference to the inner value.
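A minimal sketch of this mock-object pattern. `Messenger` stands in for the third-party trait, and `MockMessenger` with its `sent_messages` field is a hypothetical test double:

```rust
use std::cell::RefCell;

// Stand-in for a trait from a third-party crate: we can't change `&self` to `&mut self`.
pub trait Messenger {
    fn send(&self, msg: &str);
}

// Test double that records every message it is asked to send.
struct MockMessenger {
    sent_messages: RefCell<Vec<String>>,
}

impl MockMessenger {
    fn new() -> MockMessenger {
        MockMessenger { sent_messages: RefCell::new(vec![]) }
    }
}

impl Messenger for MockMessenger {
    fn send(&self, msg: &str) {
        // `self` is immutable, but the RefCell lets us mutate the Vec inside it.
        self.sent_messages.borrow_mut().push(String::from(msg));
    }
}

fn main() {
    let mock = MockMessenger::new();
    mock.send("hello");
    mock.send("world");
    println!("{:?}", mock.sent_messages.borrow()); // ["hello", "world"]
}
```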
Having Multiple Owners of Mutable Data by Combining `Rc<T>` and `RefCell<T>` (to be reviewed)
Needs to be reviewed.
15.6 Reference Cycles Can Leak Memory (to be reviewed)
- Recall that `Rc<T>` lets you have multiple owners of some data,
- but it only gives immutable access to that data.
- If you have an `Rc<T>` that holds a `RefCell<T>`, you can get a value that can have multiple owners and that you can mutate!
#[derive(Debug)]
enum List {
Cons(Rc<RefCell<i32>>, Rc<List>),
Nil,
}
use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;
fn main() {
let value = Rc::new(RefCell::new(5));
let a = Rc::new(Cons(Rc::clone(&value), Rc::new(Nil)));
let b = Cons(Rc::new(RefCell::new(3)), Rc::clone(&a));
let c = Cons(Rc::new(RefCell::new(4)), Rc::clone(&a));
*value.borrow_mut() += 10;
println!("a after = {:?}", a);
println!("b after = {:?}", b);
println!("c after = {:?}", c);
}
a after = Cons(RefCell { value: 15 }, Nil)
b after = Cons(RefCell { value: 3 }, Cons(RefCell { value: 15 }, Nil))
c after = Cons(RefCell { value: 4 }, Cons(RefCell { value: 15 }, Nil))
Mutex is the thread-safe version of RefCell
https://doc.rust-lang.org/book/ch15-06-reference-cycles.html
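Memory leaks are memory safe in Rust. We can see that Rust allows memory leaks: by using `Rc<T>` and `RefCell<T>`, we can create references where items refer to each other in a cycle, so the reference count of each item never reaches 0 and the values are never dropped.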
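A minimal sketch of such a cycle, followed by `Weak<T>`, which doesn't keep a value alive. `Node` and `make_cycle` are hypothetical names for illustration:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node type: `next` can be re-pointed after creation.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Builds a -> b -> a. Each node keeps the other's strong count above zero.
fn make_cycle() -> (Rc<Node>, Rc<Node>) {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    (a, b)
}

fn main() {
    let (a, b) = make_cycle();
    println!("a = {}, b = {}", Rc::strong_count(&a), Rc::strong_count(&b)); // a = 2, b = 2
    // When `a` and `b` go out of scope, both counts only drop to 1: the memory leaks.

    // A Weak<T> doesn't count toward ownership, so it can't keep a value alive.
    let c = Rc::new(Node { next: RefCell::new(None) });
    let weak_c: Weak<Node> = Rc::downgrade(&c);
    println!("strong = {}, weak = {}", Rc::strong_count(&c), Rc::weak_count(&c)); // strong = 1, weak = 1
    drop(c);
    println!("{}", weak_c.upgrade().is_none()); // true
}
```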
Should be reviewed from here
My note: Reference counted smart pointer for Vector
16. Fearless Concurrency
The Rust team discovered that the ownership and type systems are a powerful set of tools to help manage memory safety and concurrency problems!
Caution: In this book, authors refer to many of the problems as concurrent rather than being more precise by saying concurrent and/or parallel.
16.1 Using Threads to Run Code Simultaneously
Many operating systems provide an API for creating new threads. This model where a language calls the operating system APIs to create threads is sometimes called 1:1, meaning one operating system thread per one language thread.
Programming language-provided threads are known as green threads, and languages that use these green threads will execute them in the context of a different number of operating system threads. For this reason, the green-threaded model is called the M:N model: there are `M` green threads per `N` operating system threads, where `M` and `N` are not necessarily the same number.
The Rust standard library only provides an implementation of 1:1 threading.
Creating a New Thread with spawn
To create a new thread, we call the `thread::spawn` function and pass it a closure containing the code we want to run in the new thread.
(The new thread is a new OS thread, because the Rust standard library provides only 1:1 threading.)
The new thread will be stopped when the main thread ends, whether or not it has finished running.
use std::thread;
use std::time::Duration;
fn main() {
thread::spawn(|| {
for i in 1..10 {
println!("hi number {} from the spawned thread!", i);
thread::sleep(Duration::from_millis(1));
}
});
for i in 1..5 {
println!("hi number {} from the main thread!", i);
thread::sleep(Duration::from_millis(1));
}
}
Run (you can see that the spawned thread never prints 6 to 9):
$ cargo run
hi number 1 from the main thread!
hi number 1 from the spawned thread!
hi number 2 from the spawned thread!
hi number 2 from the main thread!
hi number 3 from the spawned thread!
hi number 3 from the main thread!
hi number 4 from the spawned thread!
hi number 4 from the main thread!
hi number 5 from the spawned thread!
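The calls to `thread::sleep` force a thread to stop its execution for a short duration, allowing a different thread to run.
How the spawned-thread lines interleave with the main-thread lines depends on how the operating system schedules the threads, so it can differ between runs and machines.
If I comment out the `thread::sleep(Duration::from_millis(1));` lines, the spawned thread doesn't get to print anything before the main thread finishes (in my run):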
$ cargo run
hi number 1 from the main thread!
hi number 2 from the main thread!
hi number 3 from the main thread!
hi number 4 from the main thread!
Waiting for All Threads to Finish Using `join` Handles
The return type of `thread::spawn` is `JoinHandle`. A `JoinHandle` is an owned value that, when we call the `join` method on it, will wait for its thread to finish.
use std::thread;
use std::time::Duration;
fn main() {
let handle = thread::spawn(|| {
for i in 1..10 {
println!("hi number {} from the spawned thread!", i);
thread::sleep(Duration::from_millis(1));
}
});
for i in 1..5 {
println!("hi number {} from the main thread!", i);
thread::sleep(Duration::from_millis(1));
}
handle.join().unwrap();
}
hi number 1 from the main thread!
hi number 1 from the spawned thread!
hi number 2 from the main thread!
hi number 2 from the spawned thread!
hi number 3 from the main thread!
hi number 3 from the spawned thread!
hi number 4 from the main thread!
hi number 4 from the spawned thread!
hi number 5 from the spawned thread!
hi number 6 from the spawned thread!
hi number 7 from the spawned thread!
hi number 8 from the spawned thread!
hi number 9 from the spawned thread!
If we move the line `handle.join().unwrap();` to before the main thread's `for` loop, the result looks like the following, because the main thread waits for the spawned thread to finish before starting its own loop:
hi number 1 from the spawned thread!
hi number 2 from the spawned thread!
hi number 3 from the spawned thread!
hi number 4 from the spawned thread!
hi number 5 from the spawned thread!
hi number 6 from the spawned thread!
hi number 7 from the spawned thread!
hi number 8 from the spawned thread!
hi number 9 from the spawned thread!
hi number 1 from the main thread!
hi number 2 from the main thread!
hi number 3 from the main thread!
hi number 4 from the main thread!
Using `move` Closures with Threads
If the closure passed to `thread::spawn` uses variables from the surrounding scope, Rust can't tell how long the spawned thread will run, so it can't know whether references to those variables would still be valid.
By adding the `move` keyword before the closure, we force the closure to take ownership of the values it's using rather than allowing Rust to infer that it should borrow the values.
use std::thread;
fn main() {
let v = vec![1, 2, 3];
let handle = thread::spawn(move || {
println!("Here's a vector: {:?}", v);
});
handle.join().unwrap();
}
In this example, the `move` keyword moves the ownership of `v` to the spawned thread, so the main thread no longer owns `v`.
If you try to call `drop(v)` in the main thread before `handle.join()`, the compiler rejects it, because `v` has already been moved into the closure.
We deal with this issue in a later section.
To understand `move` correctly, please review lifetimes in Rust.
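Ownership can also come back out of the thread: `join` returns the closure's return value wrapped in a `Result`. A minimal sketch; `sum_in_thread` is a hypothetical helper.

```rust
use std::thread;

// Moves `v` into the spawned thread and gets a result back through `join`.
fn sum_in_thread(v: Vec<i32>) -> i32 {
    let handle = thread::spawn(move || {
        // The thread owns `v` now, so nothing borrowed can outlive the caller.
        v.iter().sum::<i32>()
    });
    // `join` returns Result<T, _>, where T is the closure's return value.
    handle.join().unwrap()
}

fn main() {
    let v = vec![1, 2, 3];
    let total = sum_in_thread(v);
    // `v` was moved into `sum_in_thread` (and then into the thread); using it here would not compile.
    println!("total = {}", total); // total = 6
}
```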
16.2 Using Message Passing to Transfer Data Between Threads
In the standard library, Rust provides message-sending concurrency through channels (`mpsc::channel`).
`mpsc` stands for multiple producer, single consumer (multiple transmitters (TX), a single receiver (RX)).
A channel is said to be closed if either the transmitter or receiver half is dropped.
You can create a channel like this:
use std::sync::mpsc;
// -- snip
let (tx, rx) = mpsc::channel();
Let’s see the sample code:
use std::sync::mpsc;
use std::thread;
fn main() {
let (tx, rx) = mpsc::channel();
thread::spawn(move || {
let val = String::from("hi");
tx.send(val).unwrap();
});
let received = rx.recv().unwrap();
println!("Got: {}", received);
}
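The receiver has two useful methods: `recv` and `try_recv`.
`recv` blocks the current thread until a value is sent and returns `Result<T, E>` (an `Err` once the channel is closed), while `try_recv` doesn't block and returns `Ok` or `Err` immediately.
So `try_recv` is typically called in a loop that does other work between checks.
In this context, we can think of the threads as actors.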
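A minimal polling sketch with `try_recv`, distinguishing "nothing yet" (`TryRecvError::Empty`) from "channel closed" (`TryRecvError::Disconnected`). `poll_until_closed` is a hypothetical helper, and the sleeps only simulate work.

```rust
use std::sync::mpsc::{self, TryRecvError};
use std::thread;
use std::time::Duration;

// Polls the channel with `try_recv` instead of blocking with `recv`.
fn poll_until_closed() -> Vec<i32> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for i in 1..=3 {
            tx.send(i).unwrap();
            thread::sleep(Duration::from_millis(5));
        }
        // `tx` is dropped here, which closes the channel.
    });

    let mut received = Vec::new();
    loop {
        match rx.try_recv() {
            Ok(value) => received.push(value),
            // Nothing yet: a real program would do other work here.
            Err(TryRecvError::Empty) => thread::sleep(Duration::from_millis(1)),
            // All senders are gone and every buffered value has been received.
            Err(TryRecvError::Disconnected) => break,
        }
    }
    received
}

fn main() {
    println!("{:?}", poll_until_closed()); // [1, 2, 3]
}
```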
Channels and Ownership Transference
Sending a value through a channel transfers ownership of it to the receiving side (like handing over a real-world object). If you try to use a value after sending it through the channel, you get a compile error (thank you, Rust).
Sending Multiple Values and Seeing the Receiver Waiting
The receiver can be used as an iterator. When the transmitter is dropped, the channel is closed and the iteration ends:
use std::sync::mpsc;
use std::thread;
use std::time::Duration;
fn main() {
let (tx, rx) = mpsc::channel();
thread::spawn(move || {
let vals = vec![
String::from("hi"),
String::from("from"),
String::from("the"),
String::from("thread"),
];
for val in vals {
tx.send(val).unwrap();
thread::sleep(Duration::from_secs(1));
}
});
for received in rx {
println!("Got: {}", received);
}
}
Creating Multiple Producers by Cloning the Transmitter
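You can imagine a channel as a queue. Each additional producer gets its own transmitter by calling `tx.clone()`. Messages from one sender arrive in the order they were sent, but when multiple senders send messages concurrently, the order in which their messages are interleaved at the receiver is not deterministic.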
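A minimal sketch with two producers sharing one receiver; `collect_from_two_producers` is a hypothetical helper. The receiving loop ends only after both transmitters have been dropped.

```rust
use std::sync::mpsc;
use std::thread;

// Two producers share one channel by cloning the transmitter.
fn collect_from_two_producers() -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let tx1 = tx.clone(); // second transmitter for the same channel

    thread::spawn(move || {
        for i in 0..3 {
            tx1.send(format!("a{}", i)).unwrap();
        }
    });
    thread::spawn(move || {
        for i in 0..3 {
            tx.send(format!("b{}", i)).unwrap();
        }
    });

    // Iteration ends once *both* transmitters have been dropped.
    rx.into_iter().collect()
}

fn main() {
    // The interleaving of "a*" and "b*" can differ from run to run,
    // but each producer's own messages stay in the order they were sent.
    println!("{:?}", collect_from_two_producers());
}
```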
16.3 Shared-State Concurrency
Using Mutexes to Allow Access to Data from One Thread at a Time
To access the data in a mutex, a thread must first signal that it wants access by asking to acquire the mutex's lock (the `lock()` method in Rust).
The API of `Mutex<T>`
Here are the basics of `Mutex<T>`:
use std::sync::Mutex;
fn main() {
let m = Mutex::new(5);
{
let mut num = m.lock().unwrap();
*num = 6;
}
println!("m = {:?}", m);
}
Result:
m = Mutex { data: 6, poisoned: false, .. }
The call to `lock` would fail if another thread holding the lock panicked.
In that case, no one would ever be able to get the lock, so we've chosen to `unwrap` and have this thread panic if we're in that situation.
`Mutex<T>` is a smart pointer.
The call to `lock` returns a smart pointer called `MutexGuard`, wrapped in a `LockResult` that we handled with the call to `unwrap`.
The `MutexGuard` smart pointer implements `Deref` to point at our inner data; the smart pointer also has a `Drop` implementation that releases the lock automatically when a `MutexGuard` goes out of scope.
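The automatic release can be observed with `try_lock`, which fails with `TryLockError::WouldBlock` instead of waiting while the lock is held. A minimal sketch; `is_unlocked` is a hypothetical helper.

```rust
use std::sync::{Mutex, TryLockError};

// Reports whether the mutex is free, without blocking.
fn is_unlocked(m: &Mutex<i32>) -> bool {
    match m.try_lock() {
        Ok(_guard) => true, // the guard is dropped at the end of this arm, releasing the lock
        Err(TryLockError::WouldBlock) => false,
        Err(TryLockError::Poisoned(_)) => false,
    }
}

fn main() {
    let m = Mutex::new(5);
    {
        let mut num = m.lock().unwrap(); // MutexGuard<i32>
        *num = 6;
        println!("while held: {}", is_unlocked(&m)); // while held: false
    } // `num` goes out of scope here: Drop releases the lock
    println!("after scope: {}", is_unlocked(&m)); // after scope: true
}
```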
Multiple Ownership with Multiple Threads
Note that `Rc<T>` points to the data on the heap.
Atomic Reference Counting with `Arc<T>`
`Rc<T>` is not safe to share across threads.
Instead, we can use the atomically reference-counted pointer `Arc<T>` (`std::sync::Arc`), whose count is updated with the atomic types from `std::sync::atomic`.
Thread safety comes with a performance penalty that you only want to pay when you really need to.
Here is the proper way to share ownership across multiple threads:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let counter = Arc::new(Mutex::new(0));
let mut handles = vec![];
for _ in 0..10 {
let counter = Arc::clone(&counter);
let handle = thread::spawn(move || {
let mut num = counter.lock().unwrap();
*num += 1;
});
handles.push(handle);
}
for handle in handles {
handle.join().unwrap();
}
println!("Result: {}", *counter.lock().unwrap()); // Result: 10
}
`counter` is immutable, but `Mutex<T>` provides interior mutability.
Similarities Between `RefCell<T>`/`Rc<T>` and `Mutex<T>`/`Arc<T>`
Should be reviewed from here
17. Object Oriented Programming Features of Rust
Objects came from Simula in the 1960s. Those objects influenced Alan Kay’s programming architecture in which objects pass messages to each other. He coined the term object-oriented programming in 1967 to describe this architecture.
Hmm…
17.1 Characteristics of Object-Oriented Languages
Objects Contain Data and Behavior
The book “Design Patterns: Elements of Reusable Object-Oriented Software” by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (Addison-Wesley Professional, 1994) colloquially referred to as “The Gang of Four book”, is a catalog of object-oriented design patterns. It defines OOP this way:
Object-oriented programs are made up of objects. An object packages both data and the procedures that operate on that data. The procedures are typically called methods or operations.
Using this definition, Rust is object oriented: structs and enums have data, and `impl` blocks provide methods on structs and enums.
Encapsulation that Hides Implementation Details
Encapsulation means that the implementation details of an object aren’t accessible to code using that object. Therefore, the only way to interact with an object is through its public API.
In Rust, we can use the `pub` keyword to decide which modules, types, functions, and methods in our code should be public, and by default everything else is private.
Inheritance as a Type System and as Code Sharing
There is no way to define a struct that inherits the parent struct’s fields and method implementations.
You choose inheritance for two main reasons.
- One is for reuse of code
- The other reason is polymorphism, which means that you can substitute multiple objects for each other at runtime if they share certain characteristics.
Rust uses generics to abstract over different possible types and trait bounds to impose constraints on what those types must provide. This is sometimes called bounded parametric polymorphism.
Polymorphism in practice
Cf. ad hoc polymorphism: suppose a function `Add(x, y)` whose behavior depends on the type of its inputs (appending in the case of strings, arithmetic addition in the case of ints, etc.).
This is an example of ad hoc polymorphism.
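In Rust, this kind of ad hoc polymorphism is usually expressed with a trait that each type implements in its own way. A minimal sketch; `Combine` and `combine` are hypothetical names for illustration.

```rust
// Hypothetical trait: one operation name, a different implementation per type.
trait Combine {
    fn combine(&self, other: &Self) -> Self;
}

impl Combine for i32 {
    fn combine(&self, other: &i32) -> i32 {
        self + other // arithmetic addition
    }
}

impl Combine for String {
    fn combine(&self, other: &String) -> String {
        format!("{}{}", self, other) // concatenation
    }
}

fn main() {
    println!("{}", 2i32.combine(&3)); // 5
    println!("{}", String::from("foo").combine(&String::from("bar"))); // foobar
}
```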
Rust takes a different approach, using trait objects instead of inheritance.
17.2 Using Trait Objects That Allow for Values of Different Types
`src/lib.rs`:
pub trait Draw {
fn draw(&self);
}
pub struct Screen {
pub components: Vec<Box<dyn Draw>>,
}
impl Screen {
pub fn run(&self) {
for component in self.components.iter() {
component.draw();
}
}
}
pub struct Button {
pub width: u32,
pub height: u32,
pub label: String,
}
impl Draw for Button {
fn draw(&self) {
// code to actually draw a button
}
}
Type `Box<dyn Draw>` is a trait object; it's a stand-in for any type inside a `Box` that implements the `Draw` trait.
`dyn` is short for "dynamic": method calls on a trait object use dynamic dispatch (in computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time).
The official documentation describes the meaning of dynamic dispatch later.
When we use trait objects, Rust must use dynamic dispatch.
We can't use a generic type parameter `<T>` with a trait bound here, because a generic type parameter can only be substituted with one concrete type at a time, whereas trait objects allow for multiple concrete types to fill in for the trait object at runtime.
We say that the trait `Draw` is used as a trait object in `Box<dyn Draw>`.
The `dyn` keyword is used to highlight that calls to methods on the associated trait are dynamically dispatched. To use the trait this way, it must be "object safe".
A trait is object safe if all the methods defined in the trait have the following properties:
- The return type isn't `Self`.
- There are no generic type parameters.
Trait objects must be object safe because once you’ve used a trait object, Rust no longer knows the concrete type that’s implementing that trait.
The code that results from monomorphization is doing static dispatch, which is when the compiler knows what method you’re calling at compile time. This is opposed to dynamic dispatch, which is when the compiler can’t tell at compile time which method you’re calling.
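The contrast can be sketched side by side. Here `draw` returns a `String` (unlike the `Draw` trait above) so the results are easy to check, and `draw_static`, `draw_all`, and `SelectBox` are hypothetical names.

```rust
trait Draw {
    fn draw(&self) -> String;
}

struct Button;
struct SelectBox;

impl Draw for Button {
    fn draw(&self) -> String {
        String::from("button")
    }
}

impl Draw for SelectBox {
    fn draw(&self) -> String {
        String::from("select box")
    }
}

// Static dispatch: monomorphized into one copy per concrete `T`.
fn draw_static<T: Draw>(item: &T) -> String {
    item.draw()
}

// Dynamic dispatch: one copy; the method is looked up through the vtable at runtime.
fn draw_all(items: &[Box<dyn Draw>]) -> Vec<String> {
    items.iter().map(|item| item.draw()).collect()
}

fn main() {
    println!("{}", draw_static(&Button)); // button
    // A Vec<T> with a generic T could hold only one concrete type; trait objects can mix types.
    let items: Vec<Box<dyn Draw>> = vec![Box::new(Button), Box::new(SelectBox)];
    println!("{:?}", draw_all(&items)); // ["button", "select box"]
}
```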
An example of a trait whose methods are not object safe is the standard library’s Clone trait.
pub trait Clone {
fn clone(&self) -> Self;
}
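Dynamic dispatch in practice
When you call a generic function, you may encounter `&*my_variable`: it dereferences a smart pointer such as a `Box<dyn Trait>` and borrows the result again, giving a `&dyn Trait`.
https://stackoverflow.com/a/41273406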
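A minimal sketch of why `&*` shows up with generic functions. `Shape`, `Square`, and `area_of` are hypothetical names. Passing `&boxed` would make the compiler infer `T = Box<dyn Shape>`, which doesn't implement `Shape`, so it doesn't compile. `&*boxed` gives `T = dyn Shape` instead.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Generic function; `?Sized` allows `T` to be the unsized trait object `dyn Shape`.
fn area_of<T: Shape + ?Sized>(shape: &T) -> f64 {
    shape.area()
}

fn main() {
    let boxed: Box<dyn Shape> = Box::new(Square(3.0));
    // `area_of(&boxed)` would not compile (T = Box<dyn Shape> doesn't implement Shape).
    // `&*boxed` dereferences the Box, then borrows: `&dyn Shape`.
    println!("{}", area_of(&*boxed)); // 9
}
```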
My note: vtable - Should be written in my own words
- Example in the Wikipedia is comprehensive: https://en.wikipedia.org/wiki/Virtual_method_table
- intro with real code: https://www.youtube.com/watch?v=oIV2KchSyGQ
- Clue: C++ Inheritance with polymorphism - under the hood
- In which memory location vtables allocated? -> it depends on the compiler.
- vtable in Rust and dynamic dispatch (one limit of Rust)
- The picture in this page is comprehensive: https://www.learncpp.com/cpp-tutorial/the-virtual-table/
Dynamic dispatch has a small runtime cost (an indirect call through the vtable), so consider the `enum_dispatch` crate when that matters.
https://docs.rs/enum_dispatch/latest/enum_dispatch/
18. Patterns and Matching
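Understanding pattern matching is essential when you want to write your own declarative macros, which also match their input against patterns (of code tokens rather than values).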
18.1 All the Places Patterns Can Be Used
Review: `match` Arms and Conditional `if let` Expressions
You can express more complex conditions by mixing `if let`, `else if`, and `else if let`, but the downside of `if let` expressions is that the compiler doesn't check exhaustiveness.
(Some cases could be missed.)
`while let` Conditional Loops and `for` Loops
let v = vec!['a', 'b', 'c'];
for (index, value) in v.iter().enumerate() {
println!("{} is at index {}", value, index);
}
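The `for` loop above destructures each `(index, value)` tuple with a pattern. A `while let` loop instead keeps running as long as a pattern keeps matching. A minimal sketch; `drain_stack` is a hypothetical helper.

```rust
// Pops values off a stack until `pop()` returns `None`.
fn drain_stack(mut stack: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();
    // The loop keeps running as long as the pattern `Some(top)` matches.
    while let Some(top) = stack.pop() {
        popped.push(top);
    }
    popped
}

fn main() {
    println!("{:?}", drain_stack(vec![1, 2, 3])); // [3, 2, 1]
}
```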
`let` Statements as Patterns
let PATTERN = EXPRESSION;
// example
let (x, y, z) = (1, 2, 3);
Function Parameters
fn print_coordinates(&(x, y): &(i32, i32)) {
println!("Current location: ({}, {})", x, y);
}
fn main() {
let point = (3, 5);
print_coordinates(&point);
}
18.2 Refutability: Whether a Pattern Might Fail to Match
irrefutable | refutable |
---|---|
matches any possible value passed | can fail to match for some possible value |
`let x = 5;` | `if let Some(x) = a_value {}` |
In general, you shouldn’t have to worry about the distinction between refutable and irrefutable patterns; however, you do need to be familiar with the concept of refutability so you can respond when you see it in an error message.
18.3 Pattern Syntax
This section just contains examples of useful pattern matches.
Value match for a variable:
let x = 1;
match x {
1 => println!("one"),
2 => println!("two"),
3 => println!("three"),
_ => println!("anything"),
}
Variable scope (shadowed):
let x = Some(5);
let y = 10;
match x {
Some(50) => println!("Got 50"),
Some(y) => println!("Matched, y = {:?}", y), // Matched, y = 5
_ => println!("Default case, x = {:?}", x),
}
println!("at the end: x = {:?}, y = {:?}", x, y);
// at the end: x = Some(5), y = 10
Multiple patterns
let x = 1;
match x {
1 | 2 => println!("one or two"), // match
3 => println!("three"),
_ => println!("anything"),
}
Matching Ranges of Values with ..=
let x = 5;
match x {
1..=5 => println!("one through five"),
_ => println!("something else"),
}
Recall that Rust’s char type is four bytes in size and represents a Unicode Scalar Value.
let x = 'c';
match x {
'a'..='j' => println!("early ASCII letter"),
'k'..='z' => println!("late ASCII letter"),
_ => println!("something else"),
}
Destructuring to Break Apart Values
struct Point {
x: i32,
y: i32,
}
fn main() {
let p = Point { x: 0, y: 7 };
let Point { x: a, y: b } = p;
assert_eq!(0, a);
assert_eq!(7, b);
// Or
let Point { x, y } = p;
assert_eq!(0, x);
assert_eq!(7, y);
}
You can achieve a partial match:
let p = Point { x: 0, y: 7 };
match p {
Point { x, y: 0 } => println!("On the x axis at {}", x),
Point { x: 0, y } => println!("On the y axis at {}", y), // match
Point { x, y } => println!("On neither axis: ({}, {})", x, y),
}
Destructuring Enums
enum Message {
Quit,
Move { x: i32, y: i32 },
Write(String),
ChangeColor(i32, i32, i32),
}
fn main() {
let msg = Message::ChangeColor(0, 160, 255);
match msg {
Message::Quit => {
println!("The Quit variant has no data to destructure.")
}
Message::Move { x, y } => {
println!(
"Move in the x direction {} and in the y direction {}",
x, y
);
}
Message::Write(text) => println!("Text message: {}", text),
Message::ChangeColor(r, g, b) => println!(
"Change the color to red {}, green {}, and blue {}",
r, g, b
),
}
}
Destructuring Nested Structs and Enums
enum Color {
Rgb(i32, i32, i32),
Hsv(i32, i32, i32),
}
enum Message {
Quit,
Move { x: i32, y: i32 },
Write(String),
ChangeColor(Color),
}
fn main() {
let msg = Message::ChangeColor(Color::Hsv(0, 160, 255));
match msg {
Message::ChangeColor(Color::Rgb(r, g, b)) => println!(
"Change the color to red {}, green {}, and blue {}",
r, g, b
),
Message::ChangeColor(Color::Hsv(h, s, v)) => println!(
"Change the color to hue {}, saturation {}, and value {}",
h, s, v
),
_ => (),
}
}
Ignoring Values in a Pattern
Just use an underscore `_` as a placeholder.
Ignoring Remaining Parts of a Value with ..
struct Point {
x: i32,
y: i32,
z: i32,
}
let origin = Point { x: 0, y: 0, z: 0 };
match origin {
Point { x, .. } => println!("x is {}", x),
}
Using `..` must be unambiguous. The following code doesn't compile, because Rust can't tell which value `second` should bind to:
let numbers = (2, 4, 8, 16, 32);
match numbers {
(.., second, ..) => { // Ambiguous
println!("Some numbers: {}", second)
},
}
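Used only once, `..` is unambiguous, for example to grab the first and last elements of a tuple. A minimal sketch; `first_and_last` is a hypothetical helper.

```rust
// `..` may appear at most once in a tuple pattern, so its meaning is unambiguous.
fn first_and_last(numbers: (i32, i32, i32, i32, i32)) -> (i32, i32) {
    match numbers {
        (first, .., last) => (first, last),
    }
}

fn main() {
    let (first, last) = first_and_last((2, 4, 8, 16, 32));
    println!("Some numbers: {}, {}", first, last); // Some numbers: 2, 32
}
```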
Extra Conditionals with Match Guards
let x = Some(5);
let y = 10;
match x {
Some(50) => println!("Got 50"),
Some(n) if n == y => println!("Matched, n = {}", n),
_ => println!("Default case, x = {:?}", x),
}
println!("at the end: x = {:?}, y = {}", x, y);
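`@` Bindings
The at operator `@` lets you create a variable that holds a value at the same time as you're testing that value against a pattern: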
enum Message {
Hello { id: i32 },
}
let msg = Message::Hello { id: 5 };
match msg {
Message::Hello {
id: id_variable @ 3..=7,
} => println!("Found an id in range: {}", id_variable),
Message::Hello { id: 10..=12 } => {
println!("Found an id in another range")
}
Message::Hello { id } => println!("Found some other id: {}", id),
}
19. Advanced Features
19.1 Unsafe Rust
“Unsafe” means “doesn’t enforce memory safety guarantees”.
Although the code might be okay, if the Rust compiler doesn’t have enough information to be confident, it will reject the code. In these cases, you can use unsafe code to tell the compiler, “Trust me, I know what I’m doing.”
Another reason Rust has an unsafe alter ego is that the underlying computer hardware is inherently unsafe.
??
Unsafe Superpowers
- Dereference a raw pointer
- Call an unsafe function or method
- Access or modify a mutable static variable
- Implement an unsafe trait
- Access fields of `union`s
Notes:
- `unsafe` doesn't turn off the borrow checker or disable any other of Rust's safety checks.
- `unsafe` does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems.
Parts of the standard library are implemented as safe abstractions over unsafe code that has been audited.
The rest of the section contains examples of when to use `unsafe`.
Defining an unsafe function and calling it
unsafe fn dangerous() {}
unsafe {
dangerous();
}
Dereferencing a Raw Pointer
Raw pointers can be immutable or mutable and are written as `*const T` and `*mut T`, respectively.
The asterisk isn’t the dereference operator; it’s part of the type name.
In the context of raw pointers, immutable means that the pointer can’t be directly assigned to after being dereferenced.
let mut num = 5;
let r1 = &num as *const i32;
let r2 = &mut num as *mut i32;
unsafe {
println!("r1 is: {}", *r1);
println!("r2 is: {}", *r2);
}
cf) The `println!` macro takes its arguments by reference under the hood, so it doesn't take ownership of them.
With raw pointers, we can create a mutable pointer and an immutable pointer to the same location and change data through the mutable pointer, potentially creating a data race. Be careful!
Sometimes, Rust isn't smart enough to know that code is safe.
When we know code is okay, but Rust doesn't, it's time to reach for `unsafe` code.
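A classic case is splitting one mutable slice into two non-overlapping mutable halves: the borrow checker only sees two mutable borrows of the same slice and rejects it, even though the halves never overlap. A minimal sketch of a safe wrapper around unsafe code, modeled on the standard library's `split_at_mut` (named `my_split_at_mut` here to avoid confusion with the real method):

```rust
use std::slice;

// Safe function with an unsafe body: the `assert!` is what makes the raw-pointer code sound.
fn my_split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();
    assert!(mid <= len);
    unsafe {
        (
            // [0, mid) and [mid, len) never overlap, so two &mut slices are fine.
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];
    let (a, b) = my_split_at_mut(&mut v, 3);
    a[0] = 10;
    b[0] = 40;
    println!("{:?}", v); // [10, 2, 3, 40, 5, 6]
}
```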
Using extern Functions to Call External Code
Rust has a keyword, `extern`, that facilitates the creation and use of a Foreign Function Interface (FFI).
Functions declared within `extern` blocks are always `unsafe` to call from Rust code.
extern "C" {
fn abs(input: i32) -> i32;
}
fn main() {
unsafe {
println!("Absolute value of -3 according to C: {}", abs(-3));
}
}
Calling Rust Functions from Other Languages
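In the following code, we make the `call_from_c` function accessible from C code, after it's compiled to a shared library and linked from C. The `#[no_mangle]` attribute tells the compiler not to mangle the function's name:
#[no_mangle]
pub extern "C" fn call_from_c() {
    println!("Just called a Rust function from C!");
}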
Accessing or Modifying a Mutable Static Variable
In Rust, global variables are called static variables. Rust does support global variables, but they can be problematic with Rust's ownership rules.
Mutating a `static mut` variable is `unsafe`:
static mut COUNTER: u32 = 0;
fn add_to_count(inc: u32) {
unsafe {
COUNTER += inc;
}
}
fn main() {
add_to_count(3);
unsafe {
println!("COUNTER: {}", COUNTER);
}
}
But why is it `unsafe`?
With mutable data that is globally accessible, it’s difficult to ensure there are no data races, which is why Rust considers mutable static variables to be unsafe.
Where possible, it’s preferable to use the concurrency techniques and thread-safe smart pointers we discussed in Chapter 16 so the compiler checks that data accessed from different threads is done safely.
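One such safe alternative for a global counter is an atomic type from `std::sync::atomic`, which provides thread-safe interior mutability so no `unsafe` is needed. A minimal sketch, reusing the `COUNTER`/`add_to_count` names from the example above; `Ordering::SeqCst` is chosen here for simplicity, not because the counter requires it.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// A global counter that can be mutated without `unsafe`.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    // Atomic read-modify-write: no data race even when called from many threads.
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

fn main() {
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(|| add_to_count(3))).collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("COUNTER: {}", COUNTER.load(Ordering::SeqCst)); // COUNTER: 12
}
```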
Other examples the book mentioned
- Implementing an Unsafe Trait
- Accessing Fields of a Union
19.2 Advanced Traits (should be reviewed)
Using Supertraits to Require One Trait’s Functionality Within Another Trait
If you know inheritance in OOP, you can understand the meaning of "super" in supertraits. We can define a trait that can be implemented only for types that also implement a certain other trait.
In the following snippet, `OutlinePrint` can be implemented for a struct only when the struct implements the trait `fmt::Display`:
use std::fmt;
struct Point {
    x: i32,
    y: i32,
}
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}
// `fmt::Display` is the supertrait: only `Display` types can implement `OutlinePrint`.
trait OutlinePrint: fmt::Display {
    fn outline_print(&self) {
        let output = self.to_string();
        let len = output.len();
        println!("{}", "*".repeat(len + 4));
        println!("*{}*", " ".repeat(len + 2));
        println!("* {} *", output);
        println!("*{}*", " ".repeat(len + 2));
        println!("{}", "*".repeat(len + 4));
    }
}
// The default method is enough, so the impl block can be empty.
impl OutlinePrint for Point {}
fn main() {
    let p = Point { x: 1, y: 3 };
    p.outline_print();
}
Off topic: `fmt::Display`
- The `write!` macro (`std::write`) writes formatted data into a buffer: any "writer" with a `write_fmt` method, such as a `Formatter`, a `String` (via `fmt::Write`), or a `Vec<u8>` (via `io::Write`).
- A `Formatter` (`std::fmt::Formatter`) represents various options related to formatting. Users do not construct `Formatter`s directly; a mutable reference to one is passed to the `fmt` method of all formatting traits, like `Debug` and `Display`.
In Rust by Example:
https://doc.rust-lang.org/rust-by-example/trait/supertraits.html#supertraits
Note that even though it is called a "super" trait, a supertrait is the more basic trait, which is "inherited" by its subtraits.
For example, when we think about the set of all persons and the set of all students, `Person ⊃ Student` holds, so `Person` is the supertrait of `Student`.
trait Person {
    // foo
}
trait Student: Person {
    // bar
}
19.3 Advanced Types (should be reviewed)
Using the Newtype Pattern for Type Safety and Abstraction
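The newtype pattern wraps an existing type in a tuple struct with a single field. The new type has the same representation as the wrapped type, but it is a distinct type with its own name, so the compiler won't let you mix the two up. This pattern can reduce the number of bugs and provides abstraction. Note that the wrapper doesn't automatically get the methods of the wrapped type.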
use std::ops::Add;
struct Millimeters(u32);
struct Meters(u32);
impl Add<Meters> for Millimeters {
type Output = Millimeters;
fn add(self, other: Meters) -> Millimeters {
Millimeters(self.0 + (other.0 * 1000))
}
}
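A usage sketch of the code above. The `#[derive(Debug, PartialEq)]` on `Millimeters` is an addition so the result can be printed and compared:

```rust
use std::ops::Add;

#[derive(Debug, PartialEq)]
struct Millimeters(u32);
struct Meters(u32);

impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        Millimeters(self.0 + (other.0 * 1000))
    }
}

fn main() {
    let total = Millimeters(500) + Meters(2);
    println!("{:?}", total); // Millimeters(2500)
    // `Millimeters(500) + Millimeters(2)` would not compile:
    // only `Add<Meters>` is implemented, so units can't be mixed up silently.
}
```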
You can find a more straightforward explanation in Rust Design Patterns or in my note.
Dynamically Sized Types and the Sized Trait
By default, generic functions will work only on types that have a known size at compile time. However, you can use the following special syntax to relax this restriction:
fn generic<T: ?Sized>(t: &T) {
// --snip--
}
A trait bound on `?Sized` means "`T` may or may not be `Sized`", and this notation overrides the default that generic types must have a known size at compile time. The `?Trait` syntax with this meaning is only available for `Sized`, not any other traits.
Also note that we switched the type of the `t` parameter from `T` to `&T`. Because the type might not be `Sized`, we need to use it behind some kind of pointer.
In this case, we've chosen a reference.
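A minimal sketch of a `?Sized` generic that accepts unsized types such as `str` and `[i32]`; `describe` is a hypothetical helper, and the `Debug` bound is only there so the value can be formatted.

```rust
use std::fmt::Debug;

// `T: ?Sized` lets `T` be an unsized type such as `str` or `[i32]`,
// which is why `t` is taken behind a reference.
fn describe<T: Debug + ?Sized>(t: &T) -> String {
    format!("{:?}", t)
}

fn main() {
    let s: &str = "hi"; // T = str
    let slice: &[i32] = &[1, 2]; // T = [i32]
    println!("{} {}", describe(s), describe(slice)); // "hi" [1, 2]
}
```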
19.4 Advanced Functions and Closures
Function Pointers
The `fn` type is called a function pointer.
With function pointers, you can pass a function as a parameter:
fn add_one(x: i32) -> i32 {
x + 1
}
fn do_twice(f: fn(i32) -> i32, arg: i32) -> i32 {
f(arg) + f(arg)
}
fn main() {
let answer = do_twice(add_one, 5);
println!("The answer is: {}", answer);
}
Function pointers implement all three of the closure traits (Fn, FnMut, and FnOnce), so you can always pass a function pointer as an argument for a function that expects a closure.
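A short sketch of this: apply is a hypothetical generic function bounded by Fn(i32) -> i32. It accepts both the named function add_one and a closure. The last line passes the method ToString::to_string to map in place of a closure, which also works because the method is usable as a function.

```rust
fn add_one(x: i32) -> i32 {
    x + 1
}

// Hypothetical helper: accepts anything implementing Fn(i32) -> i32.
fn apply<F: Fn(i32) -> i32>(f: F, arg: i32) -> i32 {
    f(arg)
}

fn main() {
    // A named function works where a closure is expected...
    println!("{}", apply(add_one, 5)); // 6
    // ...and so does a closure.
    println!("{}", apply(|x| x * 10, 5)); // 50

    // A method path passed to map instead of a closure like |n| n.to_string().
    let strings: Vec<String> = vec![1, 2, 3].iter().map(ToString::to_string).collect();
    println!("{:?}", strings); // ["1", "2", "3"]
}
```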
Returning Closures
fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
Box::new(|x| x + 1)
}
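Note that if you change the return type to dyn Fn(i32) -> i32, the compiler returns an error because Rust doesn’t know how much space it needs to store the closure: dyn Fn(i32) -> i32 is a trait object whose size isn’t known at compile time. Putting it behind a pointer such as Box solves this.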
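A sketch comparing two ways to return a closure (the function names are hypothetical). Returning impl Fn(i32) -> i32 also compiles and avoids the heap allocation, but only when the function always returns the same closure type. When different branches return different closures, as in adder_or_multiplier, each closure has its own type, so boxing them as Box<dyn Fn> is needed.

```rust
// Boxed trait object: the closure lives on the heap, so the return type has a known size.
fn returns_closure() -> Box<dyn Fn(i32) -> i32> {
    Box::new(|x| x + 1)
}

// impl Trait: returns one concrete (unnamed) closure type without boxing.
fn returns_impl_closure() -> impl Fn(i32) -> i32 {
    |x| x + 1
}

// Two different closures from two branches: only the boxed form works here.
fn adder_or_multiplier(add: bool) -> Box<dyn Fn(i32) -> i32> {
    if add {
        Box::new(|x| x + 10)
    } else {
        Box::new(|x| x * 10)
    }
}

fn main() {
    let f = returns_closure();
    let g = returns_impl_closure();
    println!("{} {}", f(1), g(1)); // 2 2
    println!("{} {}", adder_or_multiplier(true)(2), adder_or_multiplier(false)(2)); // 12 20
}
```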
19.5 Macros
macro_rules! lets you define your own macros; macros defined this way are called declarative macros.
Besides declarative macros, Rust has three kinds of procedural macros:
Custom #[derive] macro | Attribute-like macro | Function-like macro |
---|---|---|
code added with the derive attribute used on structs and enums | define custom attributes usable on any item | look like function calls but operate on the tokens specified as their argument |
#[derive(Debug)] | #[tokio::main] | sql!(SELECT * FROM posts WHERE id=1) (hypothetical example from the book) |
The Difference Between Macros and Functions
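Macros can:
- take a variable number of parameters: we can call println!("hello") with one argument or println!("hello {}", name) with two arguments.
- implement a trait on a given type, because macros are expanded before the compiler interprets the meaning of the code. A function can’t do this, since it is called at runtime while a trait must be implemented at compile time.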
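The sketch below shows both abilities in one place. impl_describe! is a hypothetical macro that takes any number of types and expands into one impl Describe block per type. Describe is also a hypothetical trait. stringify! turns the type tokens into a string literal.

```rust
// Hypothetical trait to implement for several types at once.
trait Describe {
    fn describe(&self) -> String;
}

// Expands into one `impl Describe for <type>` block per type passed in.
macro_rules! impl_describe {
    ($($t:ty),*) => {
        $(
            impl Describe for $t {
                fn describe(&self) -> String {
                    // stringify! turns the type tokens into a &'static str.
                    format!("{} is a {}", self, stringify!($t))
                }
            }
        )*
    };
}

// Variadic: any number of types can be passed.
impl_describe!(i32, f64, bool);

fn main() {
    println!("{}", 5i32.describe()); // 5 is a i32
    println!("{}", 2.5f64.describe()); // 2.5 is a f64
    println!("{}", true.describe()); // true is a bool
}
```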
Excerpt from “Rust By Example”:
So why are macros useful?
- Don’t repeat yourself. …
- Domain-specific languages.
- Variadic interfaces.
The downside is, macro definitions are generally more difficult to read, understand, and maintain than function definitions.
You must define macros or bring them into scope before you call them in a file, as opposed to functions you can define anywhere and call anywhere.
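A minimal sketch of this ordering rule, using a hypothetical square! macro. macro_rules! macros are scoped textually, so calling square! anywhere above its definition fails with "cannot find macro". The #[macro_use] attribute on the module keeps the macro in scope after the module ends, so main can still use it.

```rust
// Calling square! above this point in the file would fail: the macro isn't defined yet.

#[macro_use] // keep macros defined in this module in scope after the module ends
mod macros {
    macro_rules! square {
        ($x:expr) => {
            $x * $x
        };
    }
}

fn main() {
    // OK: the definition appears earlier in the file.
    println!("{}", square!(4)); // 16
    // $x:expr is one expression node, so this is (1 + 2) * (1 + 2), not 1 + 2 * 1 + 2.
    println!("{}", square!(1 + 2)); // 9
}
```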
Declarative Macros with macro_rules! for General Metaprogramming
Before looking at how the vec! macro works, here is a simple macro definition from “Rust by Example”:
// This is a simple macro named `say_hello`.
macro_rules! say_hello {
// `()` indicates that the macro takes no argument.
() => {
// The macro will expand into the contents of this block.
println!("Hello!");
};
}
fn main() {
// This call will expand into `println!("Hello!");`
say_hello!();
}
One argument macro:
fn main() {
// compiles OK
macro_rules! foo {
($l:tt) => {
bar!($l);
};
}
macro_rules! bar {
(3) => {};
}
foo!(3);
}
tt is an abbreviation of “Token Tree”: a single token, or a sequence of tokens inside matching delimiters (), [], or {}.
($l:tt): these parentheses hold the pattern the macro tries to match. In this case, the macro captures its argument as $l, and $l must be a token tree (tt) metavariable. Because a tt is forwarded to bar! unchanged, bar! can still match it against the literal 3.
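Here is a variant of the example above that produces a visible result. The macros is_three! and forward! are hypothetical. forward! captures one token tree and passes it on unchanged, so is_three! can still match it against the literal 3. According to the Rust Reference, if forward! captured $l:expr instead, the forwarded fragment would become opaque and the (3) rule would no longer match it.

```rust
// Matches only the literal token `3`; anything else falls through to the second rule.
macro_rules! is_three {
    (3) => {
        true
    };
    ($other:tt) => {
        false
    };
}

// Captures one token tree and forwards it unchanged.
macro_rules! forward {
    ($l:tt) => {
        is_three!($l)
    };
}

fn main() {
    println!("{}", forward!(3)); // true
    println!("{}", forward!(4)); // false
}
```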
To achieve variadic interfaces, a macro rule can match a repeated pattern inside its matcher (the first pair of parentheses). To read such patterns, we need to know about metavariables and repetitions.
Let’s quickly look at how we can implement a simplified version of the familiar vec! macro:
#[macro_export]
macro_rules! vec {
( $( $x:expr ),* ) => {
{
let mut temp_vec = Vec::new();
$(
temp_vec.push($x);
)*
temp_vec
}
};
}
#[macro_export] and macro_rules! declare that we are defining a macro named vec that is exported, i.e. available whenever the crate that defines it is brought into scope.
( $( $x:expr ),* ): the pattern the macro input must match.
- The form is $ ( MacroMatch+ ) MacroRepSep? MacroRepOp, where MacroMatch+ is the repeated pattern ($x:expr here), the comma , is the MacroRepSep, and the MacroRepOp * means the pattern repeats zero or more times.
- $x:expr matches any Rust expression and binds it to the metavariable named $x.
- The comma , is a literal separator character between the repeated expressions; it is not a wildcard.
- In the body, $( temp_vec.push($x); )* generates one temp_vec.push call for each expression that was matched.
Calling vec![1, 2, 3] with this macro generates the following code:
{
let mut temp_vec = Vec::new();
temp_vec.push(1);
temp_vec.push(2);
temp_vec.push(3);
temp_vec
}
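One detail: with ( $( $x:expr ),* ), the simplified macro rejects a trailing comma such as vec![1, 2, 3,]. The sketch below adds an optional $(,)? after the repetition to allow it. The macro is renamed to a hypothetical my_vec! so it doesn't shadow the standard vec!.

```rust
// Hypothetical my_vec!, named differently to avoid clashing with std's vec!.
macro_rules! my_vec {
    // $(,)? optionally accepts one trailing comma after the last element.
    ( $( $x:expr ),* $(,)? ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

fn main() {
    let a: Vec<i32> = my_vec![1, 2, 3];
    let b: Vec<i32> = my_vec![1, 2, 3,]; // trailing comma accepted
    let empty: Vec<i32> = my_vec![];
    println!("{:?} {:?} {:?}", a, b, empty); // [1, 2, 3] [1, 2, 3] []
}
```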
The book is a “getting started” book, so it doesn’t go further into how to write macros.
Procedural Macros for Generating Code from Attributes
The proc_macro crate is required.
…
OK. The getting-started book doesn’t provide a thorough tutorial on writing my own macros. I’ll refer to other learning material when I need it.
Reference table
Fragment specifier | Matches (grammar rule) | Example |
---|---|---|
item | Item | fn foo() {} , struct Foo; |
block | BlockExpression | { let foo = 2; } |
stmt | Statement (without the trailing semicolon) | let foo = 2 |
pat_param | PatternNoTopAlt | Refer to the section “Patterns and Matching” |
pat | Pattern (edition 2021; equivalent to pat_param in earlier editions) | Refer to the section “Patterns and Matching” |
expr | Expression | 1 + 2 |
ty | Type | f64 , MyStruct , MyEnum |
ident | IDENTIFIER_OR_KEYWORD | foo in let foo: i32; |
path | TypePath | ::std::fmt |
tt | TokenTree | foo , (1 + 2) |
meta | Attr (the contents of an attribute) | allow(unused_variables) in #![allow(unused_variables)] |
lifetime | LIFETIME_TOKEN | 'a in &'a i32 |
vis | Visibility qualifier (possibly empty) | pub in pub bar |
literal | LiteralExpression (optionally preceded by -) | "hello" , r#"hi"# , 12 |
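A small sketch that uses several of these fragment specifiers together. Both macros are hypothetical. make_fn! builds a function from an ident (the name), a ty (the return type) and a block (the body), and show_literal! accepts only a single literal.

```rust
// $name:ident is the function name, $t:ty the return type, $body:block the body.
macro_rules! make_fn {
    ($name:ident, $t:ty, $body:block) => {
        fn $name() -> $t $body
    };
}

// $lit:literal accepts only a literal such as 42, "hi" or 1.5.
macro_rules! show_literal {
    ($lit:literal) => {
        format!("literal: {}", $lit)
    };
}

make_fn!(answer, i32, { 40 + 2 });
make_fn!(greeting, &'static str, { "hello" });

fn main() {
    println!("{}", answer()); // 42
    println!("{}", greeting()); // hello
    println!("{}", show_literal!("hi")); // literal: hi
    // show_literal!(1 + 1); // compile error: `1 + 1` is not a single literal
}
```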
20. Final Project: Building a Multithreaded Web Server
But before we get started, we should mention one detail: the method we’ll use won’t be the best way to build a web server with Rust. A number of production-ready crates are available on crates.io that provide more complete web server and thread pool implementations than we’ll build.
20.1 Building a Single-Threaded Web Server
- HTTP over TCP
- Using the standard library module std::net
cargo new hello
cd hello
src/main.rs:
use std::net::TcpListener;
fn main() {
let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
for stream in listener.incoming() {
let stream = stream.unwrap();
println!("Connection established!");
}
}
- The bind function returns a Result<T, E>, which indicates that binding might fail.
- We use unwrap to stop the program if errors happen.
- The incoming method on TcpListener returns an iterator that gives us a sequence of streams (more specifically, streams of type TcpStream).
  - We’re iterating over connection attempts with the incoming method.
  - A single stream represents an open connection between the client and the server.
  - A connection is the name for the full request and response process in which a client connects to the server, the server generates a response, and the server closes the connection.
  - As such, TcpStream will read from itself to see what the client sent and then allow us to write our response to the stream.
- Overall, this for loop will process each connection in turn and produce a series of streams for us to handle.
- The handling of the stream consists of calling unwrap to terminate our program if the stream has any errors.
Test
$ cargo run
# In another terminal
$ curl localhost:7878 -vvv
* Trying 127.0.0.1:7878...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 7878 (#0)
> GET / HTTP/1.1
> Host: localhost:7878
> User-Agent: curl/7.68.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
# cargo run terminal
$ cargo run
Compiling hello v0.1.0 (/home/atlex00/rust-projects/hello)
warning: unused variable: `stream`
--> src/main.rs:7:13
|
7 | let stream = stream.unwrap();
| ^^^^^^ help: if this is intentional, prefix it with an underscore: `_stream`
|
= note: `#[warn(unused_variables)]` on by default
warning: 1 warning emitted
Finished dev [unoptimized + debuginfo] target(s) in 0.98s
Running `target/debug/hello`
Connection established!
Connection established!
^C
$
- The connections are reset because the server isn’t currently sending back any data.
Reading the Request
use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;
fn main() {
let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
for stream in listener.incoming() {
let stream = stream.unwrap();
handle_connection(stream);
}
}
fn handle_connection(mut stream: TcpStream) {
let mut buffer = [0; 1024];
stream.read(&mut buffer).unwrap();
println!("Request: {}", String::from_utf8_lossy(&buffer[..]));
}
In the handle_connection function, we’ve made the stream parameter mutable. The reason is that the TcpStream instance keeps track of what data it returns to us internally. It might read more data than we asked for and save that data for the next time we ask for data. It therefore needs to be mut because its internal state might change; usually, we think of “reading” as not needing mutation, but in this case we need the mut keyword.
How to read from the stream, in 3 steps:
- Declare a buffer on the stack to hold the data. It’s 1024 bytes in the example.
- Pass the buffer to stream.read, which will read bytes from the TcpStream and put them in the buffer (stream.read(&mut buffer).unwrap();).
- Convert the bytes in the buffer to a string and print that string (String::from_utf8_lossy).
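One refinement to the three steps: stream.read returns the number of bytes it actually read. Converting &buffer[..] prints the whole 1024-byte buffer, including trailing zero bytes. The sketch below converts only &buffer[..n]. read_request is a hypothetical helper. To make it testable without a server, it is generic over std::io::Read, and a byte slice (&[u8] implements Read) stands in for the TcpStream.

```rust
use std::io::{self, Read};

// Hypothetical helper: works with a TcpStream or any other reader.
fn read_request<R: Read>(stream: &mut R) -> io::Result<String> {
    let mut buffer = [0; 1024];
    // read returns how many bytes were actually written into buffer.
    let n = stream.read(&mut buffer)?;
    // Convert only the filled part, not the trailing zero bytes.
    Ok(String::from_utf8_lossy(&buffer[..n]).into_owned())
}

fn main() -> io::Result<()> {
    // A byte slice standing in for a TcpStream.
    let mut fake_stream: &[u8] = b"GET / HTTP/1.1\r\nHost: myserver.com\r\n\r\n";
    let request = read_request(&mut fake_stream)?;
    print!("Request: {}", request);
    Ok(())
}
```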
Test:
$ cargo run
# Other terminal
curl localhost:7878 -vvv -H "Host: myserver.com"
* Trying 127.0.0.1:7878...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 7878 (#0)
> GET / HTTP/1.1
> Host: myserver.com
> User-Agent: curl/7.68.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
# Cargo run terminal
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/hello`
Request: GET / HTTP/1.1
Host: myserver.com
User-Agent: curl/7.68.0
Accept: */*
^C
Writing a Response
First, respond with only a status line: no headers and no body.
Change the handle_connection function as follows.
fn handle_connection(mut stream: TcpStream) {
let mut buffer = [0; 1024];
stream.read(&mut buffer).unwrap();
let response = "HTTP/1.1 200 OK\r\n\r\n";
stream.write_all(response.as_bytes()).unwrap();
stream.flush().unwrap();
}
Returning Real HTML
hello.html (in the project root, next to the src directory rather than inside it):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Hello!</title>
</head>
<body>
<h1>Hello!</h1>
<p>Hi from Rust</p>
</body>
</html>
Change the handle_connection function as follows.
use std::fs;
fn handle_connection(mut stream: TcpStream) {
let mut buffer = [0; 1024];
stream.read(&mut buffer).unwrap();
let contents = fs::read_to_string("hello.html").unwrap();
let response = format!(
"HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
contents.len(),
contents
);
stream.write_all(response.as_bytes()).unwrap();
stream.flush().unwrap();
}
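The response format above (status line, Content-Length header, blank line, body) can be pulled out into a helper. In this sketch, write_response is hypothetical. It is generic over std::io::Write, so a Vec<u8> can stand in for the TcpStream in tests. It uses write_all, which keeps writing until every byte is sent, whereas a single write call may send only part of the buffer. contents.len() is the length in bytes, which is what Content-Length counts.

```rust
use std::io::{self, Write};

// Hypothetical helper: status line + Content-Length header + blank line + body.
fn write_response<W: Write>(stream: &mut W, status_line: &str, contents: &str) -> io::Result<()> {
    let response = format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        contents.len(), // byte length of the body
        contents
    );
    // write_all loops until the whole buffer is written (or an error occurs).
    stream.write_all(response.as_bytes())?;
    stream.flush()
}

fn main() -> io::Result<()> {
    // A Vec<u8> standing in for a TcpStream.
    let mut fake_stream: Vec<u8> = Vec::new();
    write_response(&mut fake_stream, "HTTP/1.1 200 OK", "<h1>Hi</h1>")?;
    print!("{}", String::from_utf8_lossy(&fake_stream));
    Ok(())
}
```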
Test:
$ curl localhost:7878
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Hello!</title>
</head>
<body>
<h1>Hello!</h1>
<p>Hi from Rust</p>
</body>
</html>
Validating the Request and Selectively Responding
Respond with hello.html only to a GET request for /; anything else gets an error response.
Change the handle_connection function as follows.
fn handle_connection(mut stream: TcpStream) {
let mut buffer = [0; 1024];
stream.read(&mut buffer).unwrap();
let get = b"GET / HTTP/1.1\r\n";
if buffer.starts_with(get) {
let contents = fs::read_to_string("hello.html").unwrap();
let response = format!(
"HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
contents.len(),
contents
);
stream.write_all(response.as_bytes()).unwrap();
stream.flush().unwrap();
} else {
let contents = String::from("Panic!!");
let response = format!(
"HTTP/1.1 404 NOT FOUND\r\nContent-Length: {}\r\n\r\n{}",
contents.len(),
contents
);
stream.write_all(response.as_bytes()).unwrap();
stream.flush().unwrap();
}
}
Return error page
Change the else part in handle_connection as follows:
else {
let status_line = "HTTP/1.1 404 NOT FOUND\r\n\r\n";
let contents = fs::read_to_string("404.html").unwrap();
let response = format!("{}{}", status_line, contents);
stream.write_all(response.as_bytes()).unwrap();
stream.flush().unwrap();
}
404.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Hello!</title>
</head>
<body>
<h1>Oops!</h1>
<p>Sorry, I don't know what you're asking for.</p>
</body>
</html>
A Touch of Refactoring
Here is the refactored handle_connection function.
fn handle_connection(mut stream: TcpStream) {
let mut buffer = [0; 1024];
stream.read(&mut buffer).unwrap();
let get = b"GET / HTTP/1.1\r\n";
let (status_line, filename) = if buffer.starts_with(get) {
("HTTP/1.1 200 OK\r\n\r\n", "hello.html")
} else {
("HTTP/1.1 404 NOT FOUND\r\n\r\n", "404.html")
};
let contents = fs::read_to_string(filename).unwrap();
let response = format!("{}{}", status_line, contents);
stream.write_all(response.as_bytes()).unwrap();
stream.flush().unwrap();
}
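Taking the refactoring one step further, the choice of status line and file can be moved into a function that does no I/O at all, which makes it easy to check without starting the server. route is a hypothetical helper. It returns the status line without the trailing \r\n\r\n, so the caller can still add headers such as Content-Length.

```rust
// Hypothetical pure helper: picks the status line and file from the raw request bytes.
fn route(buffer: &[u8]) -> (&'static str, &'static str) {
    let get = b"GET / HTTP/1.1\r\n";
    if buffer.starts_with(get) {
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    }
}

fn main() {
    println!("{:?}", route(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")); // ("HTTP/1.1 200 OK", "hello.html")
    println!("{:?}", route(b"GET /missing HTTP/1.1\r\n\r\n")); // ("HTTP/1.1 404 NOT FOUND", "404.html")
}
```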
21. Appendix
21.3 C: Derivable Traits
Default for Default Values
The Default trait allows you to create a default value for a type.
Deriving Default implements the default function.
The derived implementation of the default function calls the default function on each part of the type, meaning all fields or values in the type must also implement Default to derive Default.
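A short sketch of deriving Default. Config is a hypothetical struct. Each field gets its own type's default (0 for numbers, false for bool, an empty String, None for Option). Struct update syntax lets you set some fields and take the rest from Default::default().

```rust
// Every field type implements Default, so the struct can derive it.
#[derive(Debug, Default, PartialEq)]
struct Config {
    port: u16,
    verbose: bool,
    name: String,
    timeout: Option<u32>,
}

fn main() {
    let config = Config::default();
    println!("{:?}", config); // Config { port: 0, verbose: false, name: "", timeout: None }

    // Set port explicitly and fill the remaining fields with their defaults.
    let custom = Config {
        port: 7878,
        ..Default::default()
    };
    println!("{:?}", custom);
}
```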
Appendixes
A. Naming rule
https://rust-lang.github.io/api-guidelines/naming.html
Note about prelude
https://doc.rust-lang.org/std/prelude/index.html
The prelude is the list of things that Rust automatically imports into every Rust program. It’s kept as small as possible, and is focused on things, particularly traits, which are used in almost every single Rust program.
Note about Rust memory management
The following link is a very good introductory post: https://deepu.tech/memory-management-in-rust/
A great reference on how Rust allocates memory: https://www.youtube.com/watch?v=rDoqT-a6UFg