STL Algorithms - Binary Search

20 Feb 2018

Requirements
- Ordering Requirements
- Complexity Guarantee
Distinctions between Algorithms
In Practice
STL Implementations
Quick Implementation
Related

To do a simple binary search, there are four STL Algorithms at our disposal: std::binary_search(...), std::lower_bound(...), std::upper_bound(...), std::equal_range(...). When should we use them? When should we not use them? Which one should we use?

Requirements

Before diving into the differences between the functions, it should be stressed that STL binary search algorithms are not panaceas to find an element in O(lg n) time. It has its requirements for it to

find the correct element and
perform the operation in O(lg n) time.

Ordering Requirements

In order to for the algorithm to find the correct result/position(s), some requirements on the ordering of the elements must be met. Algorithms may have different requirements:

std::lower_bound(...): all elements less than the target must precede all elements greater or equal than the target. More strictly speaking, all elements that comp(element,target) == true must precede all elements that comp(element,target) == false. In formal terms, we say the elements in the range of interest[first,last)must be partitioned with respect tocomp(element,target).
std::upper_bound(...): the elements in the range[first,last)must be partitioned with!comp(target,element). Notice that the comp(...) reverses the order of the parameters in comparison to std::lower_bound(...)’s comp(...). In practical terms, when using the default std::less comparator, all elements that are less than or equal to the target must precede all elements greater than the target.
std::equal_range(...): both partition rules above mentioned must be met. In addition to the two requirements, if comp(element,target)is true, then!comp(target,element)must be true. In practical terms, it says if a<b then a<=b.
std::binary_search(...): same requirements as std::equal_range(...).

A sequence with a total order comparator that is fully sorted meets all the requirements listed above, e.g. when we refer a sequence as ‘sorted’ in general terms, but it’s an overly strict requirement. In fact, even a strict weak ordering requirement is too strict. N1313 explains why strict weak ordering is overly strict for binary search algorithms and Defect Report 270 was accpeted in CD1. Boost.org and Bendersky cover strict weak ordering in a comprehensible manner.

Complexity Guarantee

To guarantee O(log n) time performance, std::RandomAccessIterator needs to be supported, i.e. the container’s iterator should have std::iterator_category of std::random_access_iterator_tag This requirement is self-explanatory, but let’s see which standard containers meet this requirement:

Of sequence containers, std::vector, std::array and std::deque meets the requirement;
The remaining two sequence containers std::list and std::forward_list have std::BidirectionalIterator and std::ForwardIterator, respectively and does not meet this requirement. It’s intuitive since we can’t find an element in O(log n) time for these containers;
The associative containers (std::[multi_]set, std::[multi_]map) are sorted by nature, but they fail to meet this requirement as they have iterators of type std::BidirectionalIterator. However, the binary search algorithms are provided as member functions of the containers, as the containers (having tree data structure) stores extra information of its elements inside the container;
The unordered associative containers (std::unordered_[multi_]set, std::unordered_[multi_]map) are unordered. The have iterators of type std::ForwardIterator and does not meet this requirement.
C-arrays who have ‘iterators of type’ raw pointer meets this requirement, as its iterator are categorized as std::RandomAccessIterator.

Distinctions between Algorithms

Why do we need four distinct algorithms for binary search? Where do their name come?

std::binary_search(…)

All four functions have identical parameter list, but std::binary_search(...) differs from others on its return type being a boolean. std::binary_search(...) returns whether the target is found in the provided range.

std::lower_bound(…) and std::upper_bound(…)

With default comparator <, std::lower_bound(...) finds the first element where element >= target; std::upper_bound(...) finds the first element where element > target.

As mentioned in the previous section, std::lower_bound(...) and std::upper_bound(...) have looser requirements compared to the other other two function.

One of the thing that confused for me a good while is their naming. ‘lower bound’ and ‘upper bound’ refer to finding the range of ‘target’. Although asymmetric (inclusive ‘lower bound’ vs exclusive ‘upper bound’), they are consistent with C++ iterator range: begin() is inclusive while end() is exclusive.

std::equal_range(…)

std::equal_range(...) returns a std::pair consisting results of std::lower_bound(...) and std::upper_bound(...). It finds a range of elements that are equal to ‘target’.

It has requirements stricter than std::lower_bound(...) and std::lower_bound(...) combined (same requirements as std::binary_search()) to allow extra optimization.

In Practice

Finding elements

Previous section describes the differences and usages of each of the four functions and we won’t further elaborate on that. Instead, let us have an example on finding different indices:

//                  0 1 2 3 4 5 6 7 8
std::vector<int> v {1,1,1,2,2,2,3,3,3};
// find first occurance of a value >= 2
auto it = std::lower_bound(v.begin(), v.end(), 2);  // return iterator at index 3
// find last occurance of a value < 2
it != v.begin() ? --it : it=v.end();                // return iterator at index 2
// find first occurance of a value > 2
auto it2 = std::upper_bound(v.begin(), v.end(), 2); // return iterator at index 6
// find last occurance of a value <= 2
it2 != v.begin() ? --it2 : it2=v.end();             // return iterator at index 5
/* Note: the examples for 'last occurances' are examples to illustrate that we
 * need to consider boundary. Actual implementation is up to the user. */

Defining Comparator

It is trivial when the target is of the same type as the elements in the container. Same comparator can be used for all four functions. Below is an example using lambda function:

struct MyInt
{
  MyInt(int i) : value(i) {}
  int value;
};
// same setup using a custom type 
std::vector<MyInt> v2 {1,1,1,2,2,2,3,3,3};
MyInt myint(2);
// same comparator can be used for the other three functions
auto p = std::equal_range(v2.begin(), v2.end(), myint, 
    [](const MyInt &lhs, const MyInt &rhs){ 
      return lhs.value < rhs.value;
    });
// equal_range returns a pair result of lower_bound and upper_bound
std::cout << "equal_range:" <<
    p.first - v2.begin() << "," <<
    p.second - v2.begin() << std::endl;

Now, let’s consider when target is not of the same type as the elements in the container.

// make MyInt explicit to avoid implicit conversion from int to MyInt in the comparator
struct MyInt{
    explicit MyInt(int i) : value(i) {}
    int value;
};
// construct the same vector v3 == v2
std::vector<MyInt> v3;
for (int i : v) v3.push_back(MyInt(i))

For std::lower_bound(...) and std::upper_bound(...) we have:

auto lowerBoundRes = std::lower_bound(v3.begin(), v3.end(), 2, 
    [](const MyInt &element, int target){ 
      return element.value < target;
    });
auto upperBoundRes = std::upper_bound(v3.begin(), v3.end(), 2, 
    [](int target, const MyInt &element){ 
      return target < element.value;
    });

I purposely named the parameters of the lambda function as target and element in this example. The fact that std::lower_bound(...) and std::upper_bound(...)require different comparators is easily ignored when the two parameters are of the same type (and therefore the ordering doesn’t matter).

Using the same example, for std::binary_search(...) and std::equal_range(...), we need both comparators implemented:

struct MyCompare 
{
  bool operator()( const MyInt& v, int time ) const { return v.value < time; }
  bool operator()( int time, const MyInt& v ) const { return time < v.value; }
};
auto equalRangeRes = std::equal_range(v3.begin(), v3.end(), 2, MyCompare());
auto binarySearchRes = std::binary_search(v3.begin(), v3.end(), 2, MyCompare());

Reverse Ordered

When the elements in the container is in a reverse order, we can simply reverse the comparator to yield the same result:

//                         0 1 2 3 4 5 6 7 8
std::vector<int> reverseV {3,3,3,2,2,2,1,1,1};
auto p = std::equal_range(reverseV.begin(), reverseV.end(), 2,
    [](int lhs, int rhs) {return lhs > rhs;});

STL Implementations

Before going to the algorithm implementations, a quick peek of std::advance(iterator, distance):

// bits/stl_iterator_base_funcs.h
/**
 *  @brief A generalization of pointer arithmetic.
 *  @param  __i  An input iterator.
 *  @param  __n  The @a delta by which to change @p __i.
 *  @return  Nothing.
 *
 *  This increments @p i by @p n.  For bidirectional and random access
 *  iterators, @p __n may be negative, in which case @p __i is decremented.
 *
 *  For random access iterators, this uses their @c + and @c - operations
 *  and are constant time.  For other %iterator classes they are linear time.
*/
template<typename _InputIterator, typename _Distance>
inline _GLIBCXX14_CONSTEXPR void
__advance(_InputIterator& __i, _Distance __n, input_iterator_tag) {
  /* ... */
  while (__n--) ++__i;
}
template<typename _BidirectionalIterator, typename _Distance>
inline _GLIBCXX14_CONSTEXPR void
__advance(_BidirectionalIterator& __i, _Distance __n, bidirectional_iterator_tag) {
  /* ... */
  if (__n > 0)
    while (__n--)
      ++__i;
  else
    while (__n++)
      --__i;
}
template<typename _RandomAccessIterator, typename _Distance>
inline _GLIBCXX14_CONSTEXPR void
__advance(_RandomAccessIterator& __i, _Distance __n, random_access_iterator_tag) {
  /* ... */
  __i += __n;
}

Notice the tag dispatcher is used in this implementation (Iitem 23). We can see that for std::advance(iterator,distance) to be O(1), the iterator needs to meet the requirements of RandomAccessIterator .

std::lower_bound(start,end,value,comp)

std::lower_bound(...) is a classic binary search implementation that finds first the false in a [true, true, ..., false, false]sequence:

// bits/stl_algobase.h
/**
 *  @brief Finds the first position in which @a val could be inserted
 *         without changing the ordering.
 *  @param  __first   An iterator.
 *  @param  __last    Another iterator.
 *  @param  __val     The search term.
 *  @param  __comp    The comparator. 
 *  @return An iterator pointing to the first element <em>not less
 *          than</em> @a val, or end() if every element is less than 
 *          @a val.
 *  @ingroup binary_search_algorithms
*/
template<typename _ForwardIterator, typename _Tp, typename _Compare> 
_ForwardIterator
__lower_bound(_ForwardIterator __first, _ForwardIterator __last,
    const _Tp& __val, _Compare __comp) {
  /* ... */
  _DistanceType __len = std::distance(__first, __last);
  while (__len > 0) {
    _DistanceType __half = __len >> 1;
    _ForwardIterator __middle = __first;
    std::advance(__middle, __half);
    if (__comp(__middle, __val)) {
      __first = __middle;
      ++__first;
      __len = __len - __half - 1;
    } else {
      __len = __half;
    }
  }
  return __first;
}

std::uper_bound(start,end,value,comp)

Likewise, std::upper_bound(...) is the same classic binary search implementation, with the exception that it finds the first true in a [false, false, ..., true, true]sequence. Its implementation is close to that of std::lower_bound(...), except:

comparator is different - __comp(__val, __middle);
if, else are the opposite to that of std::lower_bound(...).

std::equal_range(start,end,value,comp)

std::equal_range(...) doesn’t stupidly call std::lower_bound(...) and std::upper_bound(...). It halves the range similar to them until it finds the target and split the remaining range and call them on the two halves respectively:

// bits/stl_algo.h
/**
 *  @brief Finds the largest subrange in which @p __val could be inserted
 *         at any place in it without changing the ordering.
 *  @param  __first   An iterator.
 *  @param  __last    Another iterator.
 *  @param  __val     The search term.
 *  @param  __comp    A functor to use for comparisons.
 *  @return  An pair of iterators defining the subrange.
 *  @ingroup binary_search_algorithms
 *
 *  This is equivalent to
 *  @code
 *    std::make_pair(lower_bound(__first, __last, __val, __comp),
 *                   upper_bound(__first, __last, __val, __comp))
 *  @endcode
 *  but does not actually call those functions.
*/
template<typename _ForwardIterator,typename _Tp,typename _CompareItTp,typename _CompareTpIt>
pair<_ForwardIterator, _ForwardIterator>
__equal_range(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __val,
    _CompareItTp __comp_it_val, _CompareTpIt __comp_val_it) {
  /* ... */
  _DistanceType __len = std::distance(__first, __last);
  while (__len > 0) {
    _DistanceType __half = __len >> 1;
    _ForwardIterator __middle = __first;
    std::advance(__middle, __half);
    if (__comp_it_val(__middle, __val)) {
      __first = __middle;
      ++__first;
      __len = __len - __half - 1;
    } else if (__comp_val_it(__val, __middle)) {
      __len = __half;
    } else {
      _ForwardIterator __left = std::__lower_bound(__first,__middle,__val,__comp_it_val);
      std::advance(__first, __len);
      _ForwardIterator __right = std::__upper_bound(++__middle,__first,__val,__comp_val_it);
      return pair<_ForwardIterator, _ForwardIterator>(__left, __right);
    }
  }
  return pair<_ForwardIterator, _ForwardIterator>(__first, __first);
}

Quick Implementation

When dealing with code challenges, I personally prefer to use

int lo = 0, hi = v.size();
while (lo < hi) {
  int mid = (hi-lo)/2 + lo; // avoid lo+hi overflow
  if (v[mid] == target) {
    return true;
  } else if(v[mid] > target) {
    hi = mid;
  } else {
    lo = mid+1;
  }
}

As always, every C++ related topic I decided to study, there is a summarized book/paper/post from Scott Meyers. A couple interesting reads from the standard: proposal and defect report of binary search requirements. Bendersky’s post nicely covers the different types of ordering.

Ranting Jacaré