If you need a refresher, this Stack Overflow post explains how shared_ptr works.
The major off-the-shelf alternatives are std::unique_ptr and boost::intrusive_ptr.
The weak_ptr requirement always adds storage cost to shared_ptr ...
Refcounting is always thread-safe ...
... even when you don't need it to be. And as we all know, atomics are 1-2 orders of magnitude slower than their single-threaded counterparts.
If you are trying to build a DAG that is only accessed by one thread at a time, then shared_ptr is the wrong solution.
The weak_ptr requirement always adds extra performance overhead.
Keeping the number of refcount updates per operation minimal involves a trick: all strong references collectively hold one weak reference. This lets every strong reference except the last avoid touching weak_count. Both MSVC and libstdc++ do this. Here are all the places atomics occur:
- shared_ptr creation: set strong_count = 1, weak_count = 1; no atomics
- weak acquire: atomically increment weak_count
- weak release: atomically decrement weak_count
- strong acquire: atomically increment strong_count
- strong release: atomically decrement strong_count, and if it was the last one, also atomically decrement weak_count.
Sadly, this means that every object created through shared_ptr incurs a minimum cost of two atomics. In contrast, a strong-only pointer system would incur at minimum one atomic.
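To make the counting concrete, here is a minimal sketch of the scheme in the list above. It is not the MSVC or libstdc++ code: ControlBlock, rmw_ops, and the two bool flags are hypothetical names invented for illustration, the flags stand in for destroying the object and freeing the block, and promoting a weak reference to a strong one is left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical control block; the names mirror the list above, not any real library.
struct ControlBlock {
    std::atomic<long> strong_count{1};  // creation: plain initialization, no atomic RMW
    std::atomic<long> weak_count{1};    // the one weak reference shared by all strong refs
    int  rmw_ops = 0;                   // demo bookkeeping only (itself not thread-safe)
    bool object_destroyed = false;      // stands in for running the object's destructor
    bool block_freed = false;           // stands in for deleting the control block

    void weak_acquire()   { ++rmw_ops; weak_count.fetch_add(1, std::memory_order_relaxed); }
    void strong_acquire() { ++rmw_ops; strong_count.fetch_add(1, std::memory_order_relaxed); }

    void weak_release() {
        ++rmw_ops;
        if (weak_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
            block_freed = true;
    }

    void strong_release() {
        ++rmw_ops;
        if (strong_count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            object_destroyed = true;
            weak_release();  // last strong reference drops the collective weak reference
        }
    }
};

// Shortest possible lifetime: create, then drop the only strong reference.
inline int atomics_for_minimal_lifetime() {
    ControlBlock cb;
    cb.strong_release();
    assert(cb.object_destroyed && cb.block_freed);
    return cb.rmw_ops;  // one strong decrement plus one weak decrement
}
```

Calling atomics_for_minimal_lifetime() returns 2 in this sketch: even an object that never sees a weak_ptr pays for the weak decrement on its way out.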
Let's also remember that at least one new/delete is required as well, so we're up to four atomics per shared_ptr-mediated object. By the performance-analysis method presented here, single-threaded shared_ptr object-management is limited to ~(150MHz / 4) = 37MHz.
If you're using C++, you probably expect performance. 37MHz object creation is a far cry from peak performance.
The make_shared optimization undermines weak_ptr
The make_shared optimization allocates both the object and the control block in a single allocation (a single call to "new"). Herb Sutter describes it well in GotW#89. The effect is that every weak_ptr now holds a strong reference to the object's raw memory: a shallow strong reference. The irony! Especially considering that the only reason left to use shared_ptr is that you need weak_ptr as well.
Admittedly the extended object-memory lifetime isn't a big deal for small objects like vector or string. Just be wary of using shared_ptr+weak_ptr with large-footprint objects.
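If you want to see the extended memory lifetime for yourself, the sketch below is one way to observe it. It swaps in std::allocate_shared, the allocator-taking sibling of make_shared, so that a byte-counting allocator can watch the allocation. CountingAllocator, g_live_bytes, BigObject, Observation, and observe_weak_ptr_memory are hypothetical names made up for this example, and the single combined allocation is typical of the mainstream standard libraries rather than a hard guarantee.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Bytes currently allocated through CountingAllocator (hypothetical demo counter).
static std::size_t g_live_bytes = 0;

// Minimal allocator that only tracks how many bytes are outstanding.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        g_live_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

struct BigObject { char payload[1 << 20]; };  // 1 MiB footprint

struct Observation {
    bool expired;                        // did the object itself die?
    std::size_t bytes_while_weak_alive;  // memory still pinned by the weak_ptr
    std::size_t bytes_after_weak_reset;  // memory left once the weak_ptr is gone
};

inline Observation observe_weak_ptr_memory() {
    auto strong = std::allocate_shared<BigObject>(CountingAllocator<BigObject>{});
    std::weak_ptr<BigObject> weak = strong;
    strong.reset();  // destroys the BigObject; its storage lives in the control block

    Observation o;
    o.expired = weak.expired();
    o.bytes_while_weak_alive = g_live_bytes;
    weak.reset();    // last weak reference: the combined allocation is finally freed
    o.bytes_after_weak_reset = g_live_bytes;
    return o;
}
```

On an implementation that uses the combined layout, expired comes back true while bytes_while_weak_alive is still at least sizeof(BigObject). The object is gone, but its megabyte is not returned until the last weak_ptr lets go.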
shared_ptr implementation requires a virtual release() ...
... even if you're only using a concrete type with no inheritance. shared_ptr must account for all possible usage scenarios, including multiple inheritance. It's analogous to how all COM objects inherit from IUnknown and have a virtual Release() method.
The optimal alternative is to use intrusive_ptr. There you have the freedom of defining an inlinable intrusive_ptr_release() on your concrete type.
This may sound like a minor micro-optimization, but the effect of a non-inlinable call on surrounding code generation can be profound.
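As a rough illustration of that freedom, here is a sketch of an intrusively counted type. To stay self-contained it does not include Boost: IntrusivePtr is a hypothetical, stripped-down stand-in that follows the same intrusive_ptr_add_ref() / intrusive_ptr_release() free-function convention as boost::intrusive_ptr. Node and g_live_nodes are made up for the example, and the plain (non-atomic) counter assumes the objects are only ever touched by one thread at a time.

```cpp
#include <cassert>
#include <utility>

static int g_live_nodes = 0;  // demo bookkeeping: how many Nodes currently exist

struct Node {
    int  value;
    long refs = 0;  // plain integer: assumes single-threaded access, no atomics
    explicit Node(int v) : value(v) { ++g_live_nodes; }
    ~Node() { --g_live_nodes; }
};

// Non-virtual and visible to the compiler, so both hooks can be inlined.
inline void intrusive_ptr_add_ref(Node* n) noexcept { ++n->refs; }
inline void intrusive_ptr_release(Node* n) noexcept {
    if (--n->refs == 0) delete n;
}

// Hypothetical minimal stand-in for boost::intrusive_ptr (not the Boost class).
template <class T>
class IntrusivePtr {
    T* p_ = nullptr;
public:
    IntrusivePtr() = default;
    explicit IntrusivePtr(T* p) : p_(p) { if (p_) intrusive_ptr_add_ref(p_); }
    IntrusivePtr(const IntrusivePtr& o) : p_(o.p_) { if (p_) intrusive_ptr_add_ref(p_); }
    IntrusivePtr(IntrusivePtr&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    // Copy-and-swap: the by-value parameter releases the old pointee on exit.
    IntrusivePtr& operator=(IntrusivePtr o) noexcept { std::swap(p_, o.p_); return *this; }
    ~IntrusivePtr() { if (p_) intrusive_ptr_release(p_); }

    T* get() const noexcept { return p_; }
    T* operator->() const noexcept { return p_; }
    T& operator*() const noexcept { return *p_; }
};
```

Usage looks like IntrusivePtr<Node> a(new Node(7)); followed by ordinary copies. There is no separate control block, no weak count, and the release path is a decrement and a branch that the optimizer is free to fold into the caller.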
Concluding Remarks
shared_ptr is at best a convenient low-performance class, to be used sparingly and in code that is called at low frequency. Prefer unique_ptr and intrusive_ptr, in that order.