![]()
A segmented_array is a hybrid container which is implemented as a linked list of vecarray arrays (the segments). Being a segmented array (as opposed to a normal flat array) its elements are not stored contiguously in memory but are contiguous within a segment. Compared to a std::vector or std::deque it is relatively efficient at inserting/erasing elements at any point in the controlled sequence. Even though strictly speaking it is O(n) complexity random access is, when compared to a std::list, relatively efficient (how efficient is dependant on number of segments/segment size). push_back() and pop_back() are O(1) complexity. Single element insert() and erase(), push_front() and pop_front() are O(1) complexity: you might think they are worst case O(s) complexity where s is the segment size however whilst this is technically correct s is a constant and s will, for the majority of cases, be small compared to the n elements in the container so in terms of n complexity is O(1) (an example which illustrates this). The performance of multiple element insert() and erase() is linear in the number of items inserted/erased.
You specify the segment size via a template parameter (or rely on the default of 64) and varying this can affect performance: the efficiency of random element access for example depends on segment size: efficiency is worse with a smaller segment size and better with a larger segment size.
Random access iterators are provided so the container is compatible with algorithms such as std::sort (see example below) however the iterators do not meet the requirements of a true random access iterator as they do not provide constant time iteration: they may have to traverse the segment linked list if the start and end positions are in different segments so actual linear time performance depends on segment size (affects [], +=, +, -= and - iterator operators), increment (++) and decrement (--) are constant time however (modulo the additional constant time needed if the current segment needs changing). The iterators contain an integer position ensuring that std::distance and the iterator comparison operators run in constant time.
Iterators become invalid when they are past the point of element insertion/removal. Element references may become invalid when they are past the point of element insertion/removal but clients should not rely on this (i.e. assume always invalidated).
If a segment becomes empty after an element is erased then that segment is automatically deleted. Depending on the number and type of operations performed on a segmented_array object many of its segments may over time become partially full resulting in wasted memory and slower random access; if this is a problem then the following trick can be used to reclaim that unused memory and "compress" the segmented array:
{
// the segmented_array sa
contains partially full segments
lib::segmented_array<int> tmp(sa);
tmp.swap(sa);
// sa is now compressed, only its
last segment may be partially full
}
N.B. The swap() method only runs in constant time if the two segmented_array objects are instantiations of the same template instantiation i.e. have the same template parameters (same element type and segment size).
If for some reason you need access to the segments themselves see here however this is not recommended for normal use.
A possible example application for segmented_array might be to use a segmented_array<char, 1024> as the data structure to represent the text in a text editor allowing the insertion/deletion of text anywhere in the text document quickly. A std::list<char> would be inefficient for this due to the overheads of storing two list node pointers per character and allocating one list node per character. A std::vector<char> or std::string would be inefficient for this if the document text was of any length, how noticeable this inefficiency would be would depend on CPU speed of course: probably not noticeable at all on a modern PC unless the text document length is in the admittedly very large region of 10-100MB but perhaps more noticeable on a handheld device with a smaller document size. A quick test revealed that a call to std::vector<char>::insert(begin(), 'A') takes 456ms on a 1.3Ghz Pentium M laptop if the vector contains 40MB of chars; "The Reginald Perrin Omnibus" by David Nobbs must contain about 1.6 million characters so a ≈18ms key press delay would not be noticeable during normal typing. By comparison segmented_array<char>::push_front() takes <1ms if the segmented_array contains 40MB of chars.
You can download the implementation here (v1.1.8) (Changelog), you also need vecarray.h.
Here is an example of usage which outputs some comparative timings:
struct timer
{
timer(const char* message)
{
std::cout << message;
QueryPerformanceCounter(&iStartTime);
}
~timer()
{
LARGE_INTEGER endTime;
QueryPerformanceCounter(&endTime);
LARGE_INTEGER freq;
QueryPerformanceFrequency(&freq);
std::cout.precision(4);
std::cout << std::fixed << (float)(endTime.QuadPart
- iStartTime.QuadPart)/(float)freq.QuadPart << " seconds" << std::endl;
}
LARGE_INTEGER iStartTime;
};
template<typename T, typename A>
typename std::list<T, A>::iterator index_list(std::list<T, A>& list, typename
std::list<T, A>::size_type index)
{
if (index < list.size() / 2)
{
typename std::list<T, A>::iterator it
= list.begin();
for (typename std::list<T, A>::size_type
i = 0; i < index; ++i)
++it;
return it;
}
else
{
typename std::list<T, A>::iterator it
= list.end();
for (typename std::list<T, A>::size_type
i = list.size(); i > index; --i)
--it;
return it;
}
}
int main()
{
const int iterations = 30000;
const int segment_size_1 = 64;
const int segment_size_2 = 256;
std::vector<int> v;
std::deque<int> d;
std::list<int> l;
lib::segmented_array<int, segment_size_1> sa1;
lib::segmented_array<int, segment_size_2> sa2;
for (int i = 0; i < iterations; ++i)
{
v.push_back(1);
d.push_back(1);
l.push_back(1);
sa1.push_back(1);
sa2.push_back(1);
}
/* WARNING: "rand() % N" is normally bad practice as the
low-order bits returned by
by rand() cannot in all implementations be
relied upon to be "random" enough. You
should instead use all the bits returned by
rand() and perform a division. I am
being lazy by doing it here.
It is probably a good idea to dump rand()
and use something like a
Mersenne Twister
instead. */
{
timer time("std::vector random erase: ");
for (int i = 0; i < iterations; ++i)
v.erase(v.begin() + rand() % v.size());
}
{
timer time("std::deque random erase: ");
for (int i = 0; i < iterations; ++i)
d.erase(d.begin() + rand() % d.size());
}
{
timer time("std::list random erase: ");
for (int i = 0; i < iterations; ++i)
l.erase(index_list(l, rand() % l.size()));
}
{
timer time("lib::segmented_array (1) random erase: ");
for (int i = 0; i < iterations; ++i)
sa1.erase(sa1.begin() + rand() % sa1.size());
}
{
timer time("lib::segmented_array (2) random erase: ");
for (int i = 0; i < iterations; ++i)
sa2.erase(sa2.begin() + rand() % sa2.size());
}
v.push_back(1);
d.push_back(1);
l.push_back(1);
sa1.push_back(1);
sa2.push_back(1);
{
timer time("std::vector random insert: ");
for (int i = 0; i < iterations; ++i)
v.insert(v.begin() + rand() % v.size(), rand());
}
{
timer time("std::deque random insert: ");
for (int i = 0; i < iterations; ++i)
d.insert(d.begin() + rand() % d.size(), rand());
}
{
timer time("std::list random insert: ");
for (int i = 0; i < iterations; ++i)
l.insert(index_list(l, rand() % l.size()), rand());
}
{
timer time("lib::segmented_array (1) random insert: ");
for (int i = 0; i < iterations; ++i)
sa1.insert(sa1.begin() + rand() % sa1.size(), rand());
}
{
timer time("lib::segmented_array (2) random insert: ");
for (int i = 0; i < iterations; ++i)
sa2.insert(sa2.begin() + rand() % sa2.size(), rand());
}
{
timer time("std::vector random access: ");
for (int i = 0; i < iterations; ++i)
int e = v[rand() % v.size()];
}
{
timer time("std::deque random access: ");
for (int i = 0; i < iterations; ++i)
int e = d[rand() % d.size()];
}
{
timer time("std::list random access: ");
for (int i = 0; i < iterations; ++i)
int e = *index_list(l,
rand() % l.size());
}
{
timer time("lib::segmented_array (1) random access: ");
for (int i = 0; i < iterations; ++i)
int e = sa1[rand() % sa1.size()];
}
{
timer time("lib::segmented_array (2) random access: ");
for (int i = 0; i < iterations; ++i)
int e = sa2[rand() % sa2.size()];
}
{
timer time("std::vector sequential access: ");
for (std::vector<int>::iterator i = v.begin(); i != v.end(); ++i)
int e = *i;
}
{
timer time("std::deque sequential access: ");
for (std::deque<int>::iterator i = d.begin(); i != d.end(); ++i)
int e = *i;
}
{
timer time("std::list sequential access: ");
for (std::list<int>::iterator i = l.begin(); i != l.end(); ++i)
int e = *i;
}
{
timer time("lib::segmented_array (1) sequential access: ");
for (lib::segmented_array<int,segment_size_1>::iterator i = sa1.begin(); i !=
sa1.end(); ++i)
int e = *i;
}
{
timer time("lib::segmented_array (2) sequential access: ");
for (lib::segmented_array<int,segment_size_2>::iterator i = sa2.begin(); i !=
sa2.end(); ++i)
int e = *i;
}
{
timer time("std::vector sort: ");
std::sort(v.begin(), v.end());
}
{
timer time("std::deque sort: ");
std::sort(d.begin(), d.end());
}
{
timer time("std::list sort: ");
l.sort();
}
{
timer time("lib::segmented_array (1) sort: ");
std::sort(sa1.begin(), sa1.end());
}
{
timer time("lib::segmented_array (2) sort: ");
std::sort(sa2.begin(), sa2.end());
}
return 0;
}
The output of which is (compiled with VC++6.0, running on a 1.3Ghz Pentium M):
std::vector random erase: 2.1558 seconds
std::deque random erase: 2.9700 seconds
std::list random erase: 1.0635 seconds
lib::segmented_array (1) random erase: 0.1366 seconds
lib::segmented_array (2) random erase: 0.0454 seconds
std::vector random insert: 2.6321 seconds
std::deque random insert: 1.9668 seconds
std::list random insert: 1.2766 seconds
lib::segmented_array (1) random insert: 0.0986 seconds
lib::segmented_array (2) random insert: 0.0640 seconds
std::vector random access: 0.0010 seconds
std::deque random access: 0.0025 seconds
std::list random access: 2.6379 seconds
lib::segmented_array (1) random access: 0.1659 seconds
lib::segmented_array (2) random access: 0.0421 seconds
std::vector sequential access: 0.0002 seconds
std::deque sequential access: 0.0004 seconds
std::list sequential access: 0.0019 seconds
lib::segmented_array (1) sequential access: 0.0036 seconds
lib::segmented_array (2) sequential access: 0.0039 seconds
std::vector sort: 0.0072 seconds
std::deque sort: 0.0161 seconds
std::list sort: 0.0315 seconds
lib::segmented_array (1) sort: 0.0258 seconds
lib::segmented_array (2) sort: 0.0269 seconds