Not A Grammar: a critique of shared_ptr

std::shared_ptr made it into the C++ standard library, is popular, and is now in widespread use. Before that, everyone used boost::shared_ptr. So, what's the problem? In a nutshell: weak_ptr support and mandatory thread-safe refcounting. For many common use cases, the resulting performance is far from optimal.

If you need a refresher, this Stack Overflow post explains how shared_ptr works.

The major off-the-shelf alternatives are std::unique_ptr and boost::intrusive_ptr.

The weak_ptr requirement always adds storage cost to shared_ptr ...

... even if you're not using weak_ptr in your own code. The common implementation keeps separate refcounts for strong references and weak references. Both MSVC and libstdc++ do this.
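To make the storage cost concrete, here is a simplified, hypothetical control-block layout next to what a strong-only scheme needs. The names ControlBlockSketch and StrongOnlyCount are illustrative, not any library's internals; real implementations differ in the details, but the shape (type-erased disposal plus two counts) is the point.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified control block in the spirit of shared_ptr's:
// a vtable pointer for type-erased disposal plus separate strong and weak
// counts. Every shared_ptr-managed object pays for this, weak_ptr or not.
struct ControlBlockSketch {
    std::atomic<long> strong_count{1};
    std::atomic<long> weak_count{1};
    virtual void dispose() noexcept = 0;  // would destroy the managed object
    virtual ~ControlBlockSketch() = default;
};

// A strong-only scheme needs a single count, which could even live inside
// the object itself, as with intrusive_ptr.
struct StrongOnlyCount {
    std::atomic<long> count{1};
};

static_assert(sizeof(ControlBlockSketch) > sizeof(StrongOnlyCount),
              "two counts plus a vptr cost more than one count");
```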

Refcounting is always thread-safe ...

... even when you don't need it to be. And as we all know, atomics are 1-2 orders of magnitude slower than their single-threaded counterparts.

If you are trying to build a DAG that is only accessed by one thread at a time, then shared_ptr is the wrong solution.
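For the single-threaded DAG case, a plain (non-atomic) count is enough. Below is a minimal sketch under that assumption; LocalRc and Node are hypothetical names, and the count lives inside each node (intrusive style), so every copy or release is an ordinary increment or decrement.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical single-threaded refcounted handle. T must expose a plain
// `long refs` member that starts at 1 for a freshly new'ed object.
// Only safe if each object graph is touched by one thread at a time.
template <class T>
class LocalRc {
public:
    LocalRc() = default;
    explicit LocalRc(T* p) : p_(p) {}  // adopts the initial reference
    LocalRc(const LocalRc& o) : p_(o.p_) { if (p_) ++p_->refs; }
    LocalRc(LocalRc&& o) noexcept : p_(std::exchange(o.p_, nullptr)) {}
    LocalRc& operator=(LocalRc o) noexcept { std::swap(p_, o.p_); return *this; }
    ~LocalRc() { if (p_ && --p_->refs == 0) delete p_; }  // no atomics anywhere
    T* operator->() const { return p_; }
    T* get() const { return p_; }
private:
    T* p_ = nullptr;
};

// A DAG node: a child may be shared by several parents.
struct Node {
    long refs = 1;  // intrusive, non-atomic strong count
    int value;
    std::vector<LocalRc<Node>> children;
    static inline int live = 0;  // instrumentation: nodes currently alive
    explicit Node(int v) : value(v) { ++live; }
    ~Node() { --live; }
};
```

The trade-off is explicit: nothing here is safe if two threads touch the same graph concurrently, and there are no weak references, so a cycle would leak (a DAG has none by definition).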

The weak_ptr requirement always adds extra performance overhead.

Minimizing the number of refcount operations involves a trick: all strong references collectively hold one weak reference. This lets every strong release except the last avoid touching weak_count. Both MSVC and libstdc++ do this. Here are all the places atomics occur:

shared_ptr creation: set strong_count = 1, weak_count = 1 (the weak reference held collectively by the strong references); no atomics
weak acquire: atomically increment weak_count
weak release: atomically decrement weak_count
strong acquire: atomically increment strong_count
strong release: atomically decrement strong_count, and if it was the last one, also atomically decrement weak_count.
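The protocol above can be sketched as a small control block. CountedBlock is a hypothetical name, and the flags object_destroyed and block_freed stand in for running the object's destructor and freeing the block; atomic_rmw_ops is instrumentation added only to count atomic read-modify-writes in a single-threaded test.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical control block following the scheme above: all strong references
// collectively own one weak reference, so only the last strong release touches
// weak_count.
struct CountedBlock {
    std::atomic<long> strong_count{1};  // creation: plain initialization, no atomic RMW
    std::atomic<long> weak_count{1};    // the weak ref held collectively by strong refs
    int atomic_rmw_ops = 0;             // instrumentation only (not itself thread-safe)
    bool object_destroyed = false;      // stands in for running ~T()
    bool block_freed = false;           // stands in for freeing the control block

    void weak_acquire() {
        ++atomic_rmw_ops;
        weak_count.fetch_add(1, std::memory_order_relaxed);
    }
    void weak_release() {
        ++atomic_rmw_ops;
        if (weak_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
            block_freed = true;
    }
    void strong_acquire() {
        ++atomic_rmw_ops;
        strong_count.fetch_add(1, std::memory_order_relaxed);
    }
    void strong_release() {
        ++atomic_rmw_ops;
        if (strong_count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            object_destroyed = true;
            weak_release();  // drop the collective weak reference: the second atomic
        }
    }
};
```

Creating a block and dropping its only strong reference costs two atomic RMWs, one fetch_sub on each count; a strong-only count would stop after the first.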

Sadly, this means that every object created through shared_ptr incurs at least two atomic operations, because the final strong release decrements both counts. In contrast, a strong-only pointer system would need at least one.

Let's also remember that at least one new/delete pair is required; counting one atomic each for new and delete, we're up to four atomics per shared_ptr-mediated object. By the performance-analysis method presented here, single-threaded shared_ptr object management is limited to ~(150MHz / 4) ≈ 37MHz.

If you're using C++, you probably expect performance. 37MHz object creation is a far cry from peak performance.

the make_shared optimization undermines weak_ptr

The make_shared optimization allocates both the object and the control block in a single allocation (a single call to "new"). Herb Sutter describes it well in GotW #89. This effectively makes every weak_ptr hold a strong reference to the object's raw memory: a shallow strong reference. The irony! Especially considering that the only reason you'd use shared_ptr now is if you also needed weak_ptr.

Admittedly, the extended lifetime of the object's memory isn't a big deal for small objects like vector or string. Just be wary of combining make_shared and weak_ptr with large-footprint objects.
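To observe this, the sketch below uses std::allocate_shared (the allocator-aware sibling of make_shared) with a hypothetical CountingAllocator that counts deallocations, and a hypothetical 1 MiB Big type. The standard only recommends, rather than requires, a single combined allocation, so treat the exact counts as the behavior of mainstream implementations, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Illustrative allocator that counts deallocations, so we can see when the
// combined object + control-block allocation is actually returned.
inline int g_deallocations = 0;

template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}
    T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) noexcept {
        ++g_deallocations;
        std::allocator<T>{}.deallocate(p, n);
    }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// A large-footprint object: 1 MiB of payload stored inline.
struct Big {
    char payload[1 << 20];
    static inline bool destroyed = false;
    ~Big() { destroyed = true; }
};
```

Once the last shared_ptr is gone, ~Big() runs, but the full megabyte stays allocated until the last weak_ptr is reset.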

shared_ptr implementation requires a virtual release() ...

... even if you're only using a concrete type with no inheritance. shared_ptr must account for all possible usage scenarios, including multiple inheritance. It's analogous to how all COM objects inherit from IUnknown and have a virtual Release() method.

The optimal alternative is to use intrusive_ptr. There you are free to define an inlinable intrusive_ptr_release() on your concrete type.

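This may sound like a minor micro-optimization, but the effect of a non-inlinable call on surrounding code generation can be profound: the compiler must assume the opaque call may read or modify any escaped memory, so it cannot keep such values in registers across the call or optimize through it.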
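boost::intrusive_ptr finds intrusive_ptr_add_ref() and intrusive_ptr_release() by argument-dependent lookup. To keep this sketch dependency-free, IntrusivePtr below is a hypothetical minimal stand-in that calls the same hooks; Widget and the app namespace are likewise illustrative, and the count is a plain int on the assumption of single-threaded use (a std::atomic<int> would be needed otherwise).

```cpp
#include <cassert>
#include <utility>

// Minimal stand-in for boost::intrusive_ptr: like boost's, it calls the free
// functions intrusive_ptr_add_ref() / intrusive_ptr_release() found via ADL.
template <class T>
class IntrusivePtr {
public:
    explicit IntrusivePtr(T* p = nullptr) : p_(p) { if (p_) intrusive_ptr_add_ref(p_); }
    IntrusivePtr(const IntrusivePtr& o) : p_(o.p_) { if (p_) intrusive_ptr_add_ref(p_); }
    IntrusivePtr& operator=(IntrusivePtr o) noexcept { std::swap(p_, o.p_); return *this; }
    ~IntrusivePtr() { if (p_) intrusive_ptr_release(p_); }
    T* operator->() const { return p_; }
private:
    T* p_;
};

namespace app {
// Concrete type: no base class, no virtual functions, no vtable.
struct Widget {
    int refs = 0;  // plain int: assumes single-threaded use
    int value = 0;
    static inline int destroyed = 0;  // instrumentation for the example
    ~Widget() { ++destroyed; }
};

// Non-virtual and defined inline: the compiler sees `delete w` for the
// concrete type and can inline the whole release path at each call site.
inline void intrusive_ptr_add_ref(Widget* w) { ++w->refs; }
inline void intrusive_ptr_release(Widget* w) { if (--w->refs == 0) delete w; }
}  // namespace app
```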

Concluding Remarks

shared_ptr is at best a convenient low-performance class, to be used sparingly and in code that is called at low frequency. Prefer unique_ptr and intrusive_ptr, in that order.

Sunday, January 17, 2016