Last time, I introduced a new concept, Iterable, and showed how it solved many of the problems with pair-of-iterator-style ranges. This time around, I’m going to extend Iterable in small ways to make programming with infinite ranges safer and more efficient. Disclaimer: The ideas in this post are more speculative than in the previous three. I’m looking forward to the discussion.

Quick Recap

Previously I described the problems that crop up when representing infinite and delimited ranges with pairs of iterators, the first three of which are:

1. Iteration is slow
2. The ranges are forced to model a weaker concept than they could otherwise
3. Their implementations are awkward

My solution to those problems is the Iterable concept; that is, to allow the end of the range to have a different type than the beginning of the range. Once you allow that:

1. Performance is improved because sentinel-ness is encoded in the C++ type system and so doesn’t need to be checked at runtime.
2. The concept a range can model is no longer limited by the concept that can be modeled by the sentinel, which by its very definition cannot be decremented or dereferenced.
3. Since sentinel-ness is now a compile-time property and does not need to be explicitly checked, the logic of iterator comparisons is simpler.

There were two more issues that crop up with infinite ranges in particular. They are:

4. Some STL algorithms just don’t work with infinite ranges
5. Infinite or possibly-infinite ranges will overflow their difference_type

These are the issues I’ll focus on in this post.

Infinite Iterables

iota_range is an infinite range of integers, starting at some value and counting up, forever. (Suppose that iota_range uses an infinite-precision integer type, so it really never ends.) It is a sorted forward range. Binary search algorithms work with sorted forward ranges, so they should work with iota_range, right? Wrong! You can’t conquer infinity by dividing it. (You can quote me on that.)

Can we make the standard algorithms safer, so the algorithms that don’t work with infinite ranges fail to compile if you pass them one? In the current formulation of the STL the answer is no. There’s no way, given two iterators of the same type, to tell at compile time if they denote an infinite range. Think about it for a minute: the following is perfectly ok, and guaranteed to finish:

    // OK, this finishes quickly
    iota_range<bigint> rng;
    auto i = std::lower_bound(rng.begin(), std::next(rng.begin(), 10), 5);

But the following will run forever:

    // Oops! this runs forever. :'-(
    iota_range<bigint> rng;
    auto i = std::lower_bound(rng.begin(), rng.end(), 5);

If rng.begin() is the same type as rng.end(), those two calls resolve to the same instantiation of lower_bound. There’s no way for lower_bound to tell whether it’s going to run forever or not. But if we allow the sentinel type to be different, it opens the door to greater compile-time checking. How? Suppose we had a type function (aka a metafunction) called DenotesInfiniteSequence that takes a type pair (BeginType, EndType) and says whether the sequence is infinite or not. We’ve already established that if BeginType and EndType are the same, DenotesInfiniteSequence has to always return false since it can’t know. But if they’re different (say, if EndType is a special type called unreachable_sentinel or something), then we can know at compile time that the sequence is infinite.
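To make the hypothesized type function concrete, here is a minimal sketch. The names denotes_infinite_sequence and unreachable_sentinel are illustrative spellings of the ideas in the paragraph above, not an existing library API, and the sentinel's comparison operators are left out.

```cpp
#include <type_traits>

// Hypothetical tag type: an "end" that no iterator can ever reach.
struct unreachable_sentinel {};

// Binary type function over (BeginType, EndType). In general we cannot
// know whether the sequence is infinite, so the default answer is "false".
template <typename BeginType, typename EndType>
struct denotes_infinite_sequence : std::false_type {};

// When the end is the unreachable sentinel, the sequence is known at
// compile time to be infinite, whatever the iterator type is.
template <typename BeginType>
struct denotes_infinite_sequence<BeginType, unreachable_sentinel>
  : std::true_type {};
```

With this, a pair such as (int*, int*) reports false, while (int*, unreachable_sentinel) reports true.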

So the Iterable concept naturally gives us a way to test for infinite ranges, right? Well…

Infinite Ranges

Some ranges might genuinely be infinite even though their begin and end iterators have the same type. We want to catch those, too. Consider:

    // An infinite range of zeros
    class zeros : public range_facade<zeros>
    {
        friend range_core_access;
        struct impl
        {
            bool sentinel;
            int current() const { return 0; }
            void next() {}
            bool equal(impl that) const
            { return sentinel == that.sentinel; }
        };
        // begin() and end() are implemented by range_facade
        // in terms of begin_impl and end_impl. They will
        // have the same type.
        impl begin_impl() const { return {false}; }
        impl end_impl() const { return {true}; }
    };
    // zeros models the Range concept
    CONCEPT_ASSERT(Range<zeros>());

    int main()
    {
        // Oops! This will run forever.
        for_each(zeros(), [](int i) {/*...*/});
    }

We’d like to be able to catch mistakes like this if it’s possible, but clearly, the binary DenotesInfiniteSequence type function we hypothesized above isn’t up to the task. For zeros, the types BeginType and EndType are the same, so DenotesInfiniteSequence would return false. And yet zeros is infinite.

So instead of a DenotesInfiniteSequence type function that takes a (BeginType,EndType) pair, let’s have a unary IsInfinite type function that takes a range type. What could be simpler? In code, it would be a type trait:

    // Report whether an Iterable is infinite or not
    template<typename Iterable>
    struct is_infinite
      : std::integral_constant<bool, true-or-false>
    {};

This type trait can be used to define a concept FiniteIterable as follows:

    // Current proposed Concept Lite syntax
    template<typename T>
    concept bool FiniteIterable =
        Iterable<T> && !is_infinite<T>::value;

(Why FiniteIterable instead of InfiniteIterable? I’ll say why in a minute.) Every FiniteIterable is an Iterable. In fact, there’s a parallel refinement hierarchy here, just as there is with Ranges:

And as with Range, we don’t actually need to define all these concepts in code. “Finite-ness” is orthogonal to the Iterable concept hierarchy and can be queried separately.

So why FiniteIterable instead of InfiniteIterable? It comes down to the algorithms and their requirements. There are no algorithms that require that their range arguments be infinite. So being able to say requires InfiniteIterable<T> is useless. But an algorithm like lower_bound would very much like to require that the range it’s operating on has a definite end; hence FiniteIterable.

Now, all iterable things model FiniteIterable by default, and a type has to opt in to being infinite. How? One way is to specialize is_infinite. As a convenience, the utilities for building iterables and ranges take an optional IsInfinite template parameter, so opting in is easy. Here’s how zeros looks now:

    // An infinite range of zeros
    class zeros : public range_facade<zeros, true>
    {   // ... IsInfinite ...................^^^^
        // ... as before ...
    };
    // zeros is a Range but it's not Finite
    CONCEPT_ASSERT(Range<zeros>());
    CONCEPT_ASSERT(!FiniteIterable<zeros>());

With the addition of the FiniteIterable concept, the algorithms that require finite-ness have an easy way to check for it at compile-time. This is only possible with a range-based interface, so we can add that to the long list of advantages ranges have over iterators.
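As a rough illustration of both halves of this (opting in by specializing is_infinite, and an algorithm rejecting infinite ranges at compile time), here is a sketch that uses std::enable_if in place of real concept syntax. The endless_zeros type, the sketch namespace, and the can_lower_bound detection helper are all made up for the example, and the Iterable half of the FiniteIterable check is omitted.

```cpp
#include <algorithm>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Primary template: iterables are assumed finite unless they opt in.
template <typename Iterable>
struct is_infinite : std::false_type {};

// Hypothetical infinite range type; its iteration interface is omitted.
struct endless_zeros {};

// Opting in to "infinite" by specializing the trait.
template <>
struct is_infinite<endless_zeros> : std::true_type {};

namespace sketch {
// Range-based lower_bound that is only viable for ranges not marked
// infinite (a stand-in for "requires FiniteIterable<Rng>").
template <typename Rng, typename T,
          typename = typename std::enable_if<
              !is_infinite<typename std::remove_cv<Rng>::type>::value>::type>
auto lower_bound(Rng& rng, T const& value) -> decltype(std::begin(rng)) {
    return std::lower_bound(std::begin(rng), std::end(rng), value);
}
} // namespace sketch

// Detection helper, used only to show the rejection without a hard error.
template <typename Rng, typename = void>
struct can_lower_bound : std::false_type {};

template <typename Rng>
struct can_lower_bound<
    Rng, decltype(void(sketch::lower_bound(std::declval<Rng&>(), 0)))>
  : std::true_type {};
```

Under these assumptions, can_lower_bound<std::vector<int>>::value is true while can_lower_bound<endless_zeros>::value is false, so a direct call with an infinite range would simply fail to compile.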

Possibly Infinite Ranges

Once we have a way to separate the finite ranges from the infinite, we now have to categorize the ranges. This should be simple; either a range is finite or it’s not, right? It’s actually trickier than that. Take an istream range, for instance. It might be infinite, or it might not. You don’t know. Most of the time, the stream runs dry eventually and the iteration stops. In fact, almost all the time. But sometimes…

This is a sticky situation. Should we be prevented from passing an istream range to an algorithm just because it might go on forever? I think the answer is yes, but I confess I haven’t made up my mind about that yet. I think we need more real-world usage.

Counting the Uncountable

With infinite ranges, we run into an inherent difficulty: all iterators (and by extension, all iterables) have an associated difference_type. Alex Stepanov has this to say about an iterator’s difference_type:

DistanceType returns an integer type large enough to measure any sequence of applications of successor allowable for the type. 1

Since an iterator over an infinite sequence permits an infinite number of applications of successor, we need an integer type large enough … well, infinitely large. Does this problem have a solution? In the words of Tevye from Fiddler on the Roof, “I’ll tell you…. I don’t know.”

No flash of insight has been forthcoming. Instead, here’s a core dump of my brain on the issue:

1. C++ needs bigint, an infinite-precision integral type. Other languages have it. C++ is a great language for building libraries, and this is crying out for a library solution. If such a type existed, an infinite range might choose that as its difference_type. That would come with a not-insignificant performance hit.
2. Infinite ranges could use safe_int as their difference_type. safe_int behaves like an int, but it can represent infinity. Instead of overflowing and going into undefined-behavior-land, a safe_int clips to infinity and stays there. The two biggest problems with letting an iterator’s difference_type overflow are undefined behavior and the inability to tell after the fact if anything went wrong. With safe_int, you can avoid the UB and have a way to tell at runtime if something bad happened. That might be enough in some situations. If this feels like a Big Hack to you, that’s because it is.
3. An alternate design of safe_int might be to throw an exception on overflow rather than clip to infinity. That might be appropriate in some situations.
4. Another approach is to look at where the library uses difference_type and give users a way to specify that a different type be used. For instance, the API of a range-based distance algorithm might take a range and optionally a starting count. It would default to difference_type{0}, but if you passed in, say, a bigint then you’re opting in to safer, slower code.
5. You can ignore the problem. Users that worry about overflow can use a counted range adaptor to make sure iteration stops before the difference_type overflows.
6. Something else I haven’t thought of.
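The clip-to-infinity safe_int idea from the list above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it is built on plain int, reserves the largest int value to mean infinity, supports only increment, and needs C++14 or later for the constexpr members. The count_up helper is hypothetical and exists only to exercise the type.

```cpp
#include <limits>

// Hypothetical saturating counter, loosely following the safe_int idea:
// the largest int value is reserved to mean "infinity".
class safe_int {
    int value_;

public:
    // The reserved value that represents infinity.
    static constexpr int limit() { return std::numeric_limits<int>::max(); }

    constexpr explicit safe_int(int v = 0) : value_(v) {}

    constexpr int get() const { return value_; }
    constexpr bool is_infinite() const { return value_ == limit(); }

    // Saturating increment: once at infinity, stay there instead of
    // overflowing into undefined behavior.
    constexpr safe_int& operator++() {
        if (!is_infinite()) ++value_;
        return *this;
    }
};

// Apply ++ 'steps' times, as a counting algorithm would.
constexpr safe_int count_up(safe_int start, int steps) {
    for (int i = 0; i < steps; ++i) ++start;
    return start;
}
```

After counting, a caller can ask is_infinite() to learn whether the count saturated, which is the "tell at runtime if something bad happened" property described above.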

Here’s my opinion: I don’t like anything that introduces unnecessary runtime overhead, so std::ptrdiff_t is an acceptable default for difference_type. In addition, we should design range-based interfaces in such a way as to give users a way to specify a different difference_type when overflow is a concern. So basically, I’m going with options (4) and (5). The other library types (bigint and maybe a policy-based safe_int) would be nice-to-haves that users could pass to these algorithms to get the safety/speed tradeoff that makes sense for them.
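One possible shape for such an interface is a range-based distance that accepts an optional starting count, whose type then becomes the counting type. The sketch below is an assumption about how that could look, not a proposed signature; the sketch namespace and difference_type_t alias are invented for the example, and it needs C++14 or later.

```cpp
#include <cstddef>
#include <iterator>
#include <utility>

namespace sketch {

// The difference_type associated with a range's iterators.
template <typename Rng>
using difference_type_t = typename std::iterator_traits<
    decltype(std::begin(std::declval<Rng const&>()))>::difference_type;

// Count the elements of rng, accumulating into a caller-supplied counter.
// The counter's type decides the safety/speed tradeoff.
template <typename Rng, typename Count>
constexpr Count distance(Rng const& rng, Count count) {
    for (auto const& element : rng) {
        (void)element;  // only the number of elements matters
        ++count;
    }
    return count;
}

// Default: start from difference_type{0}, as cheap as today's behavior.
template <typename Rng>
constexpr difference_type_t<Rng> distance(Rng const& rng) {
    return sketch::distance(rng, difference_type_t<Rng>{0});
}

} // namespace sketch
```

Calling sketch::distance(rng) keeps the default counting type, while sketch::distance(rng, 0LL) or a call with a big-integer or saturating counter opts in to a different one.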

That’s the best I’ve got.

Summary, and Next Steps

Maybe after the first 3 posts about range concepts you were feeling good, like it’s all falling into place, and now you’re kind of confused. But I think we’re in a good place, much better than we were. I described 5 problems with pair-of-iterator ranges. A new concept — Iterable — addresses 3 of them very well (slow iteration, modeling weaker-than-necessary concepts, awkward implementations). The 4th problem (infinite ranges) we can address with a further refinement of Iterable. And we have some options for dealing with the 5th (overflow), which is helped by being able to tell the infinite ranges apart from the finite ones. So the new concepts help there too. I think this is a promising start.

Some of you have asked if I’m planning to take these ideas to the C++ standardization committee. Indeed, I am. When we get language support for concepts (not if, when), there is very likely to be a push for a new, concept-ified version of the STL, probably in a different namespace. This wholesale rewrite is a prime opportunity for getting something like Iterable baked into the STL from day one.

My next step is to start a discussion on the SG9 (Ranges) mailing list. It’s likely to be contentious, and I expect these ideas will evolve. Consider subscribing to the list and joining the discussion.

Addendum

Sean Parent has commented on my blog and made an interesting point about the importance of counted algorithms (e.g. copy_n). He’s challenged me to find a more efficient way of supporting counted ranges than my proposed solution. I have some initial thoughts on the issue that I’ll write up and publish here provided they pan out. At any rate, it’s clear that it’s time for brains other than mine to be working on this problem. C++17 is closer than you might think, and time’s a’wasting!


1. Stepanov, A.; McJones, P. Elements of Programming. Addison-Wesley, 2009.