STL Algorithms - Binary Search
20 Feb 2018- Requirements
- Distinctions between Algorithms
- In Practice
- STL Implementations
- Quick Implementation
- Related
To do a simple binary search, there are four STL Algorithms at our disposal: std::binary_search(...)
, std::lower_bound(...)
, std::upper_bound(...)
, std::equal_range(...)
.
When should we use them?
When should we not use them?
Which one should we use?
Requirements
Before diving into the differences between the functions, it should be stressed that STL binary search algorithms are not panaceas to find an element in O(lg n)
time.
It has its requirements for it to
- find the correct element and
- perform the operation in
O(lg n)
time.
Ordering Requirements
In order to for the algorithm to find the correct result/position(s), some requirements on the ordering of the elements must be met. Algorithms may have different requirements:
-
std::lower_bound(...)
: all elements less than the target must precede all elements greater or equal than the target. More strictly speaking, all elements thatcomp(element,target) == true
must precede all elements thatcomp(element,target) == false
. In formal terms, we say the elements in the range of interest[first,last)
must be partitioned with respect tocomp(element,target)
. -
std::upper_bound(...)
: the elements in the range[first,last)
must be partitioned with!comp(target,element)
. Notice that thecomp(...)
reverses the order of the parameters in comparison tostd::lower_bound(...)
’scomp(...)
. In practical terms, when using the defaultstd::less
comparator, all elements that are less than or equal to the target must precede all elements greater than the target. -
std::equal_range(...)
: both partition rules above mentioned must be met. In addition to the two requirements, ifcomp(element,target)
is true, then!comp(target,element)
must be true. In practical terms, it says ifa<b
thena<=b
. -
std::binary_search(...)
: same requirements asstd::equal_range(...)
.
A sequence with a total order comparator that is fully sorted meets all the requirements listed above, e.g. when we refer a sequence as ‘sorted’ in general terms, but it’s an overly strict requirement. In fact, even a strict weak ordering requirement is too strict. N1313 explains why strict weak ordering is overly strict for binary search algorithms and Defect Report 270 was accpeted in CD1. Boost.org and Bendersky cover strict weak ordering in a comprehensible manner.
Complexity Guarantee
To guarantee O(log n)
time performance, std::RandomAccessIterator
needs to be supported, i.e. the container’s iterator should have std::iterator_category
of std::random_access_iterator_tag
This requirement is self-explanatory, but let’s see which standard containers meet this requirement:
- Of sequence containers,
std::vector
,std::array
andstd::deque
meets the requirement; - The remaining two sequence containers
std::list
andstd::forward_list
havestd::BidirectionalIterator
andstd::ForwardIterator
, respectively and does not meet this requirement. It’s intuitive since we can’t find an element inO(log n)
time for these containers; - The associative containers (
std::[multi_]set
,std::[multi_]map
) are sorted by nature, but they fail to meet this requirement as they have iterators of typestd::BidirectionalIterator
. However, the binary search algorithms are provided as member functions of the containers, as the containers (having tree data structure) stores extra information of its elements inside the container; - The unordered associative containers (
std::unordered_[multi_]set
,std::unordered_[multi_]map
) are unordered. The have iterators of typestd::ForwardIterator
and does not meet this requirement. -
C-arrays who have ‘iterators of type’ raw pointer meets this requirement, as its iterator are categorized as
std::RandomAccessIterator
.
Distinctions between Algorithms
Why do we need four distinct algorithms for binary search? Where do their name come?
std::binary_search(…)
All four functions have identical parameter list, but std::binary_search(...)
differs from others on its return type being a boolean.
std::binary_search(...)
returns whether the target is found in the provided range.
std::lower_bound(…) and std::upper_bound(…)
With default comparator <
, std::lower_bound(...)
finds the first element where element >= target
;
std::upper_bound(...)
finds the first element where element > target
.
As mentioned in the previous section, std::lower_bound(...)
and std::upper_bound(...)
have looser requirements compared to the other other two function.
One of the thing that confused for me a good while is their naming.
‘lower bound’ and ‘upper bound’ refer to finding the range of ‘target’.
Although asymmetric (inclusive ‘lower bound’ vs exclusive ‘upper bound’), they are consistent with C++ iterator range: begin()
is inclusive while end()
is exclusive.
std::equal_range(…)
std::equal_range(...)
returns a std::pair
consisting results of std::lower_bound(...)
and std::upper_bound(...)
.
It finds a range of elements that are equal to ‘target’.
It has requirements stricter than std::lower_bound(...)
and std::lower_bound(...)
combined (same requirements as std::binary_search()
) to allow extra optimization.
In Practice
Finding elements
Previous section describes the differences and usages of each of the four functions and we won’t further elaborate on that. Instead, let us have an example on finding different indices:
// 0 1 2 3 4 5 6 7 8
std::vector<int> v {1,1,1,2,2,2,3,3,3};
// find first occurance of a value >= 2
auto it = std::lower_bound(v.begin(), v.end(), 2); // return iterator at index 3
// find last occurance of a value < 2
it != v.begin() ? --it : it=v.end(); // return iterator at index 2
// find first occurance of a value > 2
auto it2 = std::upper_bound(v.begin(), v.end(), 2); // return iterator at index 6
// find last occurance of a value <= 2
it2 != v.begin() ? --it2 : it2=v.end(); // return iterator at index 5
/* Note: the examples for 'last occurances' are examples to illustrate that we
* need to consider boundary. Actual implementation is up to the user. */
Defining Comparator
It is trivial when the target is of the same type as the elements in the container. Same comparator can be used for all four functions. Below is an example using lambda function:
struct MyInt
{
MyInt(int i) : value(i) {}
int value;
};
// same setup using a custom type
std::vector<MyInt> v2 {1,1,1,2,2,2,3,3,3};
MyInt myint(2);
// same comparator can be used for the other three functions
auto p = std::equal_range(v2.begin(), v2.end(), myint,
[](const MyInt &lhs, const MyInt &rhs){
return lhs.value < rhs.value;
});
// equal_range returns a pair result of lower_bound and upper_bound
std::cout << "equal_range:" <<
p.first - v2.begin() << "," <<
p.second - v2.begin() << std::endl;
Now, let’s consider when target is not of the same type as the elements in the container.
// make MyInt explicit to avoid implicit conversion from int to MyInt in the comparator
struct MyInt{
explicit MyInt(int i) : value(i) {}
int value;
};
// construct the same vector v3 == v2
std::vector<MyInt> v3;
for (int i : v) v3.push_back(MyInt(i))
For std::lower_bound(...)
and std::upper_bound(...)
we have:
auto lowerBoundRes = std::lower_bound(v3.begin(), v3.end(), 2,
[](const MyInt &element, int target){
return element.value < target;
});
auto upperBoundRes = std::upper_bound(v3.begin(), v3.end(), 2,
[](int target, const MyInt &element){
return target < element.value;
});
I purposely named the parameters of the lambda function as target
and element
in this example.
The fact that std::lower_bound(...)
and std::upper_bound(...)
require different comparators is easily ignored when the two parameters are of the same type (and therefore the ordering doesn’t matter).
Using the same example, for std::binary_search(...)
and std::equal_range(...)
, we need both comparators implemented:
struct MyCompare
{
bool operator()( const MyInt& v, int time ) const { return v.value < time; }
bool operator()( int time, const MyInt& v ) const { return time < v.value; }
};
auto equalRangeRes = std::equal_range(v3.begin(), v3.end(), 2, MyCompare());
auto binarySearchRes = std::binary_search(v3.begin(), v3.end(), 2, MyCompare());
Reverse Ordered
When the elements in the container is in a reverse order, we can simply reverse the comparator to yield the same result:
// 0 1 2 3 4 5 6 7 8
std::vector<int> reverseV {3,3,3,2,2,2,1,1,1};
auto p = std::equal_range(reverseV.begin(), reverseV.end(), 2,
[](int lhs, int rhs) {return lhs > rhs;});
STL Implementations
Before going to the algorithm implementations, a quick peek of std::advance(iterator, distance)
:
// bits/stl_iterator_base_funcs.h
/**
* @brief A generalization of pointer arithmetic.
* @param __i An input iterator.
* @param __n The @a delta by which to change @p __i.
* @return Nothing.
*
* This increments @p i by @p n. For bidirectional and random access
* iterators, @p __n may be negative, in which case @p __i is decremented.
*
* For random access iterators, this uses their @c + and @c - operations
* and are constant time. For other %iterator classes they are linear time.
*/
template<typename _InputIterator, typename _Distance>
inline _GLIBCXX14_CONSTEXPR void
__advance(_InputIterator& __i, _Distance __n, input_iterator_tag) {
/* ... */
while (__n--) ++__i;
}
template<typename _BidirectionalIterator, typename _Distance>
inline _GLIBCXX14_CONSTEXPR void
__advance(_BidirectionalIterator& __i, _Distance __n, bidirectional_iterator_tag) {
/* ... */
if (__n > 0)
while (__n--)
++__i;
else
while (__n++)
--__i;
}
template<typename _RandomAccessIterator, typename _Distance>
inline _GLIBCXX14_CONSTEXPR void
__advance(_RandomAccessIterator& __i, _Distance __n, random_access_iterator_tag) {
/* ... */
__i += __n;
}
Notice the tag dispatcher is used in this implementation (Iitem 23).
We can see that for std::advance(iterator,distance)
to be O(1)
, the iterator needs to meet the requirements of RandomAccessIterator
.
std::lower_bound(start,end,value,comp)
std::lower_bound(...)
is a classic binary search implementation that finds first the false
in a [true, true, ..., false, false]
sequence:
// bits/stl_algobase.h
/**
* @brief Finds the first position in which @a val could be inserted
* without changing the ordering.
* @param __first An iterator.
* @param __last Another iterator.
* @param __val The search term.
* @param __comp The comparator.
* @return An iterator pointing to the first element <em>not less
* than</em> @a val, or end() if every element is less than
* @a val.
* @ingroup binary_search_algorithms
*/
template<typename _ForwardIterator, typename _Tp, typename _Compare>
_ForwardIterator
__lower_bound(_ForwardIterator __first, _ForwardIterator __last,
const _Tp& __val, _Compare __comp) {
/* ... */
_DistanceType __len = std::distance(__first, __last);
while (__len > 0) {
_DistanceType __half = __len >> 1;
_ForwardIterator __middle = __first;
std::advance(__middle, __half);
if (__comp(__middle, __val)) {
__first = __middle;
++__first;
__len = __len - __half - 1;
} else {
__len = __half;
}
}
return __first;
}
std::uper_bound(start,end,value,comp)
Likewise, std::upper_bound(...)
is the same classic binary search implementation, with the exception that it finds the first true
in a [false, false, ..., true, true]
sequence.
Its implementation is close to that of std::lower_bound(...)
, except:
- comparator is different -
__comp(__val, __middle)
; -
if
,else
are the opposite to that ofstd::lower_bound(...)
.
std::equal_range(start,end,value,comp)
std::equal_range(...)
doesn’t stupidly call std::lower_bound(...)
and std::upper_bound(...)
.
It halves the range similar to them until it finds the target and split the remaining range and call them on the two halves respectively:
// bits/stl_algo.h
/**
* @brief Finds the largest subrange in which @p __val could be inserted
* at any place in it without changing the ordering.
* @param __first An iterator.
* @param __last Another iterator.
* @param __val The search term.
* @param __comp A functor to use for comparisons.
* @return An pair of iterators defining the subrange.
* @ingroup binary_search_algorithms
*
* This is equivalent to
* @code
* std::make_pair(lower_bound(__first, __last, __val, __comp),
* upper_bound(__first, __last, __val, __comp))
* @endcode
* but does not actually call those functions.
*/
template<typename _ForwardIterator,typename _Tp,typename _CompareItTp,typename _CompareTpIt>
pair<_ForwardIterator, _ForwardIterator>
__equal_range(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __val,
_CompareItTp __comp_it_val, _CompareTpIt __comp_val_it) {
/* ... */
_DistanceType __len = std::distance(__first, __last);
while (__len > 0) {
_DistanceType __half = __len >> 1;
_ForwardIterator __middle = __first;
std::advance(__middle, __half);
if (__comp_it_val(__middle, __val)) {
__first = __middle;
++__first;
__len = __len - __half - 1;
} else if (__comp_val_it(__val, __middle)) {
__len = __half;
} else {
_ForwardIterator __left = std::__lower_bound(__first,__middle,__val,__comp_it_val);
std::advance(__first, __len);
_ForwardIterator __right = std::__upper_bound(++__middle,__first,__val,__comp_val_it);
return pair<_ForwardIterator, _ForwardIterator>(__left, __right);
}
}
return pair<_ForwardIterator, _ForwardIterator>(__first, __first);
}
Quick Implementation
When dealing with code challenges, I personally prefer to use
int lo = 0, hi = v.size();
while (lo < hi) {
int mid = (hi-lo)/2 + lo; // avoid lo+hi overflow
if (v[mid] == target) {
return true;
} else if(v[mid] > target) {
hi = mid;
} else {
lo = mid+1;
}
}
Related
As always, every C++ related topic I decided to study, there is a summarized book/paper/post from Scott Meyers. A couple interesting reads from the standard: proposal and defect report of binary search requirements. Bendersky’s post nicely covers the different types of ordering.