L/R/U References and Move Semantics


Move semantics were introduced in C++11 to provide a standard way to manage memory using moves instead of copies. Benefits:

  1. Avoid unnecessary copies in memory. Another technique that serves a similar purpose is RVO (return value optimization);
  2. Resource (memory) management. The most notable example is std::unique_ptr.

This post gathers my study notes on this topic.

Move Semantics

At first glance, std::move(object) is the most visible part of this change. It is important to know exactly what it does and, more importantly, what it does not do. A good understanding of std::move(object) makes it easier to write example tests, so we’ll start with that.

Before jumping to the conclusion, let’s take a quick look at the gcc source code:

// bits/move.h
/** 
 *  @brief  Convert a value to an rvalue.
 *  @param  __t  A thing of arbitrary type.
 *  @return The parameter cast to an rvalue-reference to allow moving it.
*/
template<typename _Tp>
constexpr typename std::remove_reference<_Tp>::type&&
move(_Tp&& __t) noexcept
{ return static_cast<typename std::remove_reference<_Tp>::type&&>(__t); }

// traits
template<typename _Tp> 
struct remove_reference
{ typedef _Tp   type; };

template<typename _Tp> 
struct remove_reference<_Tp&>
{ typedef _Tp   type; };

template<typename _Tp> 
struct remove_reference<_Tp&&>
{ typedef _Tp   type; };

std::move(object) does exactly one thing: it produces an rvalue from its argument. It does so in two steps: first, std::remove_reference strips any reference from the deduced type _Tp; then && is added to form the rvalue reference return type. The remove_reference step is required because of reference collapsing: when std::move is called with an lvalue, _Tp is deduced as an lvalue reference (T&), and without the stripping the return type T& && would collapse back to T&.

IT DOES NOT move anything. The actual move happens in the functions that explicitly take rvalue references. For example, std::vector’s move constructor and move assignment operator ‘steal’ the data of their source and leave the source in a ‘valid but unspecified state’.

Why is it named move() then, if it does not ‘move’? Scott Meyers, in Item 23 of Effective Modern C++, notes that std::move(object) would be better described as something like rvalue_cast. My guess is that the name serves as a reminder that the object is being moved from. After all, from the user’s point of view, there is no other indicator of moving:

std::unique_ptr<T> p1 = std::make_unique<T>();
std::unique_ptr<T> p2(std::move(p1)); // hinting 'move'
// hypothetical spelling, no std::rvalue_cast exists:
// std::unique_ptr<T> p3(std::rvalue_cast(p1)); // no signal of p1's content 'moved' at first glance

Value Types in Terms of ‘Move-ability’

In the first section, we discussed what std::move(object) does: it is essentially an rvalue cast. Being an rvalue is what determines the ‘move-ability’ of an expression (the concept applies to expressions rather than objects, since it also covers the results of expressions on objects, which can be seen as temporary objects).

Definitions

A few informal definitions that help in understanding the concepts:

  1. lvalue and rvalue originally referred to the side of the assignment operator = on which an expression can appear. This definition predates C++11 (and possibly C++ itself). In this definition, expressions that can appear on the left side of an assignment are lvalues, while all other expressions are rvalues;
  2. Another definition is based on the address-of operator &: lvalues are the expressions whose address can be taken with &;
  3. A third definition is that rvalues are the expressions that are ‘movable’. This matches how C++11 actually classifies rvalues.

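The first two definitions portray the main story: lvalues are the expressions that have an address, and therefore can appear on the left side of an assignment to receive a value; all other expressions are rvalues. The full story includes three more categories: glvalue (‘generalized lvalue’, either an lvalue or an xvalue), prvalue (‘pure rvalue’, such as a literal or the result of a function returning by value) and xvalue (‘expiring value’, such as the result of std::move(object)). An rvalue is either a prvalue or an xvalue.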

A final note on value categories: whether an expression is an rvalue or an lvalue is independent of its constness, since the two describe different characteristics of the expression. Constness does, however, play a role in reference binding, covered in a later section.

Compilation vs. Language Abstraction

It is important to note that these concepts are language abstractions, not compilation rules. For example, if we compile the statement below with -O0 (the output shown is AArch64 assembly)

int i = 3;

we have

mov     w0, 3           // set w0 to 3
str     w0, [sp, 12]    // store w0 into i's stack slot

In this output, the literal 3 never gets a memory location of its own: it only appears as an immediate operand, and the stack slot at [sp, 12] belongs to i. In fact, if we bind the literal to a reference instead,

int &&i = 3;

the compiler generates

mov     w0, 3
str     w0, [sp, 4]     // store 3 into a temporary on the stack
add     x0, sp, 4       // compute the temporary's address
str     x0, [sp, 8]     // store that address (the reference i) on the stack

So the compiler generates only what the code requires: the value 3 gets a memory location only when a reference needs an object to refer to. We can still use compiler output for analysis, but bear in mind that the concepts in this post are discussed in terms of the C++ language abstraction.

References

C++11 extends the notion of references.

Types

  • Lvalue Reference: the original reference. It practically means creating an alias to an existing object. It is denoted by &, and it binds only to lvalues (a const lvalue reference is the exception, as the Reference Binding section shows);
  • Rvalue Reference: the new reference. It is a special type of alias, usually indicating that the object it aliases is ‘movable’. It is denoted by &&, and it binds only to rvalues. I see it as a ‘deeper’ form of reference through which we can not only update the original object, but also ‘steal’ its content;
  • Universal Reference: binds to anything. Like an rvalue reference, it is denoted by &&. But for a reference to be universal, its type must be deduced and it must have exactly the form T&& (or auto&&); adding const, as in const T&&, makes it a plain rvalue reference. Two forms of universal reference:
    // case 1
    template<typename T>
    void foo(T&& t) {}	// T is deduced from the argument passed to 't'
    // case 2
    auto && v1 = v2;	// v1's type is deduced from v2
    

    A universal reference acts as either an lvalue reference or an rvalue reference, depending on whether its initializer is an lvalue or an rvalue.

Reference Binding

Universal references bind to anything, but what about the other two? What role does constness play here? Let’s look at an example:

using V = std::vector<int>;
void value (V v) { ... }
void lref (V &v) { ... }
void rref (V &&v) { ... }
void const_value (const V v) { ... }
void const_lref (const V &v) { ... }
void const_rref (const V &&v) { ... }

int main() {
    V var; const V const_var;
    ...

In the above setup, we create functions using combinations of pass by value, pass by lvalue reference and pass by rvalue reference. We initialize two variables: the non-const variable var and the const variable const_var (both are lvalues). To obtain their corresponding rvalues, we use std::move(object).

value(var);                        // copy ctor called to create parameter
value(const_var);                  // copy ctor called to create parameter
value(std::move(var));             // move ctor called to create parameter
value(std::move(const_var));       // copy ctor called to create parameter
const_value(var);                  // copy ctor called to create parameter
const_value(const_var);            // copy ctor called to create parameter
const_value(std::move(var));       // move ctor called to create parameter
const_value(std::move(const_var)); // copy ctor called to create parameter

Not a whole lot to see up there:

  • Pass-by-value accepts lvalues and rvalues, const or not;
  • The move constructor is called for non-const rvalues, and the copy constructor for everything else (a const rvalue cannot bind to the move constructor’s V&& parameter).

lref(var);
//lref(const_var);             // error: binding would discard the const qualifier
//lref(std::move(var));        // error: cannot bind non-const lvalue reference to an rvalue
//lref(std::move(const_var));  // error: cannot bind lvalue reference to an rvalue
const_lref(var);
const_lref(const_var);
const_lref(std::move(var));
const_lref(std::move(const_var));

Non-const lvalue references bind only to non-const lvalues. Const lvalue references bind to everything.

//rref(var);                   // error: cannot bind rvalue reference to an lvalue
//rref(const_var);             // error: cannot bind rvalue reference to an lvalue
rref(std::move(var));
//rref(std::move(const_var));  // error: binding would discard the const qualifier
//const_rref(var);             // error: cannot bind rvalue reference to an lvalue
//const_rref(const_var);       // error: cannot bind rvalue reference to an lvalue
const_rref(std::move(var));
const_rref(std::move(const_var));

Non-const rvalue references bind only to non-const rvalues. Const rvalue references bind to all rvalues, const or not.

When a call matches a by-value overload and a by-reference overload equally well (for example, void f(V) and void f(V&) called with an lvalue), compilation fails with an ambiguity error:

error: call of overloaded func_name(value_type) is ambiguous

However, when several reference overloads match, there is an implicit binding preference (a lower number means a higher preference; - means the binding is not allowed):

value type \ reference type   lref   const-lref   rref   const-rref
lvalue                        1      2            -      -
const lvalue                  -      1            -      -
rvalue                        -      3            1      2
const rvalue                  -      2            -      1

In short:

  • rvalues prefer rvalue references;
  • non-const is preferred over const.

Reference Collapse and Universal Reference

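An important part of the design of rvalue references is reference collapsing. Declaring a reference to a reference directly is illegal in user code, but such types do arise in certain contexts (template instantiation, auto type deduction, typedef and alias declarations, and decltype), and reference collapsing resolves them.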

The reference collapsing rule: combining two references yields an lvalue reference if either of them is an lvalue reference, and an rvalue reference only if both are rvalue references. This is why std::move(object) applies std::remove_reference before adding &&: for an lvalue argument, _Tp is deduced as T&, and a plain cast to _Tp&& would collapse to T&, so the rvalue cast alone could not turn an lvalue into an rvalue.

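A universal reference is an outcome of reference collapsing, combined with the special deduction rule for T&& parameters: an lvalue argument deduces T as an lvalue reference.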

Perfect Forwarding

Let us again start with the gcc source code, this time for std::forward(object):

// bits/move.h
/* @brief  Forward an lvalue. */
template<typename _Tp>
constexpr _Tp&&
forward(typename std::remove_reference<_Tp>::type& __t) noexcept
{ return static_cast<_Tp&&>(__t); }
/* @brief  Forward an rvalue. */
template<typename _Tp>
constexpr _Tp&&
forward(typename std::remove_reference<_Tp>::type&& __t) noexcept
{ static_assert(...); return static_cast<_Tp&&>(__t); }  
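// Both overloads have the same body: the cast to _Tp&& relies on reference
// collapsing (T& && -> T&, while T&& stays T&&). They cannot simply be merged
// into one overload taking remove_reference<_Tp>::type&&, because that
// parameter is a plain rvalue reference (_Tp is not deduced) and would not
// bind to lvalues, while forwarding a named parameter, which is always an
// lvalue, is the main use case.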

Given these facts, to forward an rvalue, _Tp needs to be the non-reference type. Suppose T is a non-reference type; then we have

std::forward<T>(rvalue); // can't be 'T&' because the return type 'T& &&' collapses into 'T&'

To forward an lvalue, _Tp needs to be the lvalue reference type, i.e. T&:

std::forward<T&>(lvalue); // can't be 'T' because the return type would be 'T&&'

std::forward(object) is designed to be used with universal references. The template deduction rule for universal references deduces an lvalue reference type when the argument is an lvalue, and a non-reference type when the argument is an rvalue, which matches the two cases of std::forward(object) exactly.

std::forward(object) does what its name suggests: it forwards an lvalue or an rvalue depending on the ARGUMENT’s value category (not to be confused with the parameter; in fact, it is designed precisely to deal with the reality that, regardless of the argument, the parameter itself is always an lvalue, because it has a name).

Best Practices

Conclusion time!

  1. Use std::forward(object) for universal references, as it is designed for this exact purpose; use std::move(object) to pass on an rvalue from an rvalue reference, or when explicitly moving from an object. Although, with some tweaks, the two functions can be used interchangeably, std::move(object) and std::forward(object) serve very different purposes. Following this rule also improves consistency and readability;
  2. Don’t make objects const if you plan to move from them. It doesn’t make sense to declare objects we plan to modify along the way as const in the first place. More dangerously, such move requests are silently turned into copies if a copy constructor is available (item 23, page 160);
  3. Delete copy constructors if they are not required. For user-defined types, this turns the silent copy described in #2 into a compile error, instead of the copy constructor being called in place of the move constructor;
  4. Use a universal reference instead of overloading on lvalue reference and rvalue reference for the same purpose. The benefits are 1. code readability and maintainability (less code); 2. scalability (the number of overloads grows exponentially with the number of parameters); 3. performance (item 25, page 171).
    // prefer universal reference
    template <typename T>
    void foo(T&& t) {
     bar(std::forward<T>(t));   // bar: any function consuming the argument
    }
    // over one overload per reference type
    void foo(const std::string& s) {
     bar(s);
    }
    void foo(std::string&& s) {
     bar(std::move(s));
    }
    
  5. Call std::move(object) or std::forward(object) only at the last use of the object, since a moved-from object is left in a valid but unspecified state;
  6. When returning by value an object bound to an rvalue reference or a universal reference, apply std::move(object) or std::forward(object), respectively, in the return statement. This moves the object into the return value instead of copying it (for std::forward(object), only when the original argument was an rvalue);
  7. Follow RVO guidelines when returning local variables by value, i.e. don’t apply std::move(object) to them. The C++ standard dictates that when copy elision is not performed, the returned local must be treated as an rvalue, so it is moved if possible (item 23, page 160).
  8. Avoid overloading on universal references; in particular, avoid declaring constructors that take universal references. Overload resolution can pick the universal-reference overload for arguments meant for another overload, and a perfect-forwarding constructor can hijack calls meant for the compiler-generated copy and move constructors (item 26, item 27).
  9. auto&& is a universal reference too. Reference collapsing occurs in four contexts: template instantiation, auto type deduction, typedef and alias declarations, and decltype;
  10. Move operations are most effective on objects that keep their data on the heap. E.g., moving a std::array is linear in its size (its elements are moved one by one, which only beats copying when the elements themselves are cheap to move), and moving a small std::string stored with the small string optimization (SSO) is no faster than copying it (item 29).
  11. As of C++17, the . (dot) operator cannot be overloaded, although proposals are being discussed. A member access expression has the same value category (l/r-ness) as the object it is applied to:
    using V = std::vector<int>;
    struct T { V v; };	// hypothetical host type with a V member
    void func (V &v) { std::cout << "by lref" << std::endl;}
    void func (V &&v) { std::cout << "by rref" << std::endl;}
    T t;
    func(t.v);		// "by lref"
    func(std::move(t).v);	// "by rref"
    

Credits and Sources

Library source code snippets are from gcc 7.5. Most of the topics in this post are drawn from Scott Meyers’s Effective Modern C++; this post is my reading log of its Chapter 5.