c++ stl-compliant iterator of iterators_问答_开发者

What I'm trying to do

I have a method that partitions things. This method does not sort the array entirely; it simply partitions the array so that all elements on one side (of some pre-determined "center", or "midpoint value" - it doesn't have to result in an even split, though) are less than the "center" and all elements on the other side are larger than the center. Point: It isn't a "sort", in the traditional sense; it's a partition.

When I partition things, I need to keep a key; so that as things are swapped, the key is swapped; and if at some point in the future I want to undo the partition, I can rearrange the things based on the key.

Obviously, to rearrange things based on the key value, I could do something like

std::vector< std::pair< std::size_t , my::thingie > > vp;
std::vector< std::size_t >::iterator itKey( key.begin() );
// itThingie_begin and itThingie_end exist; I don't have direct access to the container 
my::thingie::iterator itThingie( itThingie_begin );
for (; itKey != key.end(); ++itKey; ++itThingie ) vp.push_back( *itKey, *itThingie );
std::sort( vp.begin() , vp.end() , &comp_pair_first );
itThingie = itThingie_begin;
for ( std::vector< std::pair< std::size_t , my::thingie > >::const_iterator p=vp.begin(); p!=vp.end(); ++p, ++itThingie ) *itThingie = p->second;

Meaning, I could copy all the key and data into a pair; and sort the pair by it's first value (the key), using a custom comparison predicate (or with boost::bind); and then copy all the data back again. I understand that. It's not ideal because I probably have a couple hundred megabytes of thingies, and the above approach involves copying that to a temporary, sorting the temporary, and then copying it back.

It is also not ideal because my partition method, as it currently exists, needs the begin and end iterators of the key AND of the thingie (because it has to swap both everytime it swaps). Moreover, -and here's the kicker- I have to rewrite my partition method if there are two sets of thingies; I have a key, a thingie that decides the side of the partition, and another baggage thingie that has other information that I want to use for other algorithms.

Now, clearly, I don't want to rewrite the partition method everytime I want to i开发者_高级运维nclude some other iterator to swap "in tandom" with the partition. So, like before, I could copy all this stuff into a temporary std::pair (or nested pairs if I need to swap more things in tandem), and then partition it by looking at std::pair::first, and then copy the temporary data back... But this is extraordinarily wasteful because each additional "baggage" thingie I add is also probably hundreds of megabytes.

I know I could do it that way. I don't want to do it that way because it's too memory intensive.

The way I want to do it

The problem that I have described above is merely one of operating on iterators in tandem. Therefore, I would like to have an iterator collection that abstracts away the contents of that collection.

I want to have a collection of iterators. I call that collection a piter (it is a pair of iterators). When one swaps two piter's, one is really doing std::iter_swap's on their first iterators (and also their second iterators).

I want to have a piter iterator (called a piterator), that has all the characteristics of an iterator, but when it increments and decrements, it is incrementing and decrementing the first and second iterators of the piter. When the piterator dereferences, it should return a reference to a piter, which is the collection of iterators. All distances can be measured by the first component of the piter. Or more generally, if there is any question that needs to be answered and it is unclear what iterator should answer it, then the first iterator of the piter should answer it.

If I want to create a piterator that can in-tandom iterator over more iterators, I can create a piterator whose piter contains an iterator (first) and another piterator (second).

So, here's what I tried (I also tried using boost::iterator_facade but I end up having the same problem - as described below.)

#include <vector>
#include <iostream>
#include <algorithm>
#include <utility>
#include <iterator>


//
// pair of iterators
//
template <typename T,typename U>
struct piter : public std::pair<T,U>
{
  piter() : std::pair<T,U>() {};
  piter( T const & l , U const & r ) : std::pair<T,U>(l,r) {};
  piter( std::pair<T,U> const & p ) { this->first = p.first; this->second = p.second;     };
   //piter( std::pair<T,U> const p ) { this->first = p.first; this->second = p.second;      };

  template <typename OT, typename OU>
  piter( piter<OT,OU> const & p ) : std::pair<T,U>::first(p.first), std::pair<T,U>::second(p.second) {}

  piter<T,U> & operator=( piter<T,U> const & rhs )
  {
    if( &rhs != this ) { *this->first  = *rhs.first; *this->second = *rhs.second; }
    return *this;
  };


  friend void swap( piter<T,U> & lhs, piter<T,U> & rhs )
  {
    using std::swap;

    std::cout << "piter::swap; WAS: " << *lhs.first << " <-> " << *rhs.first << std::endl;

    std::iter_swap(lhs.first,rhs.first);
    std::iter_swap(lhs.second,rhs.second);

    std::cout << "piter::swap; NOW: " << *lhs.first << " <-> " << *rhs.first << std::endl;
  };

};




//
// iterator of pair of iterators
//
template <typename T, typename U>
class piterator : public std::iterator< std::random_access_iterator_tag,
                        piter<T,U>, 
                        std::ptrdiff_t,
                        piter<T,U> *,
                        piter<T,U> & >
{
  typedef piterator<T,U> iter;

public: // Traits typedefs, which make this class usable with algorithms which need a random access iterator.
  typedef std::random_access_iterator_tag iterator_category;
  typedef piter<T,U>     value_type;
  typedef std::ptrdiff_t difference_type;
  typedef piter<T,U> *   pointer;
  typedef piter<T,U> &   reference;


public:

  piterator() {};
  piterator( iter const &    rhs ) { this->mp.first = rhs.mp.first;  this->mp.second = rhs.mp.second;};
  piterator( pointer         rhs ) { this->mp.first = rhs->first;    this->mp.second = rhs->second; };
  //piterator( reference const rhs ) { this->mp.first = rhs.first;     this->mp.second = rhs.second; };
  piterator( value_type const rhs ) { this->mp.first = rhs.first;     this->mp.second = rhs.second; };


  iter & operator=( iter const & rhs )
  {
    if ( &rhs != this ){ this->mp.first = rhs.mp.first; this->mp.second = rhs.mp.second; };
    return *this;
  }

  friend void swap( iter & lhs , iter & rhs )
  {
    using std::swap;

    std::cout << "piterator::swap; WAS:  lhs " << *lhs->first << " rhs " << *rhs->first << std::endl;

    swap(lhs.mp,rhs.mp);

    std::cout << "piterator::swap; NOW:  lhs " << *lhs->first << " rhs " << *rhs->first << std::endl;
  }



public: // Comparison
  // Note: it's an error to compare iterators over different files.
  bool operator< ( iter const & rhs ) const { return mp.first < rhs.mp.first; }
  bool operator> ( iter const & rhs ) const { return mp.first > rhs.mp.first; }
  bool operator==( iter const & rhs ) const { return mp.first == rhs.mp.first; }
  bool operator!=( iter const & rhs ) const { return mp.first != rhs.mp.first; }

public: // Iteration
  iter & operator++()  { ++mp.first; ++mp.second; return *this; }
  iter & operator--()  { --mp.first; --mp.second; return *this; }
  iter operator++(int) { iter tmp(*this); ++(*this); return tmp; }
  iter operator--(int) { iter tmp(*this); --(*this); return tmp; }

public: // Step
  iter & operator+=( difference_type n ) { mp.first += n; mp.second += n; return *this; }
  iter & operator-=( difference_type n ) { mp.first -= n; mp.second -= n; return *this; }
  iter operator+   ( difference_type n ) { iter result(*this); return result += n; }
  iter operator-   ( difference_type n ) { iter result(*this); return result -= n; }

public: // Distance
  difference_type operator-( iter & rhs ) { return mp.first - rhs.mp.first; }

public: // Access
  reference operator*() { return mp; }
  reference operator[]( difference_type n ) { return *(*this+n); }
  pointer operator->() { return &mp; };

private: // State

  value_type mp;

};








template<class T,class U>
bool proxy_comp( piter<T,U> left, piter<T,U> right )
{ 
  std::cout << "proxy_comp: " << *(left.first) << " > " << *(right.first) << " ?=? " <<  ( *(left.first) > *(right.first) ) << std::endl;  
  return *left.first > *right.first; 
}




int main()
{

  std::vector<double> dv(3);
  std::vector<int> iv(3);
  dv[0]  = -0.5; dv[1]  = -1.5; dv[2]  = -2.5;
  iv[0]  =  10;  iv[1]  = 20;   iv[2]  =  3;


  typedef piterator< std::vector<int>::iterator , std::vector<double>::iterator > PAIR_ITER;
  typedef PAIR_ITER::value_type PAIR_REF;

  PAIR_ITER pair_begin( PAIR_REF( iv.begin() , dv.begin() ) );
  PAIR_ITER pair_end(   PAIR_REF( iv.end()   , dv.end()   ) );


  std::cout << "paired arrays now:" << std::endl;
  for ( PAIR_ITER p = pair_begin; p != pair_end; ++p )
    std::cout << *p->first << " " << *p->second << std::endl;

  std::cout << "swap 1st and 3rd elements..." << std::endl;
  swap(*pair_begin,*(pair_begin+2));

  std::cout << "paired arrays now:" << std::endl;
  for ( PAIR_ITER p = pair_begin; p != pair_end; ++p )
    std::cout << *p->first << " " << *p->second << std::endl;

  std::cout << "calling sort..." << std::endl;
  std::sort( pair_begin , pair_end , &proxy_comp<std::vector<int>::iterator , std::vector<double>::iterator> );

  std::cout << "paired arrays now:" << std::endl;
  for ( PAIR_ITER p = pair_begin; p != pair_end; ++p )
    std::cout << *p->first << " " << *p->second << std::endl;


  return 0;
}

THE PROBLEM The piter and piterator seem to work when I try to use it like how I use all other iterators, but it doesn't work correctly with STL algorithms.

The above code shows that piter swaps correctly, but it doesn't sort correctly.

The output of the above code is:

    paired arrays now:
    10 -0.5
    20 -1.5
    3 -2.5
    swap 1st and 3rd elements...
    piter::swap; WAS: 10 <-> 3
    piter::swap; NOW: 3 <-> 10
    paired arrays now:
    3 -2.5
    20 -1.5
    10 -0.5
    calling sort...
    proxy_comp: 20 > 3 ?=? 1
    proxy_comp: 10 > 3 ?=? 1
    paired arrays now:
    3 -2.5
    3 -2.5
    3 -2.5

QUESTION:

What do I have to change so that std::sort (or, ideally, the stl in general) works correctly with piterators?

OK. The problem has to do with how the stl moves memory around. It it used swap() all the time, then things would be fine, but sometimes it does (from gnu's stl_algo.h __insertion_sort)

if (__comp(*__i, *__first))
{
  // COPY VALUE INTO TEMPORARY MEMORY 
    typename iterator_traits<_RandomAccessIterator>::value_type __val = _GLIBCXX_MOVE(*__i);
  // MOVE MEMORY AROUND
   _GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);
  // COPY TEMPORARY VALUE BACK
    *__first = _GLIBCXX_MOVE(__val);
}

So, we see that an iterator's ::value_type HAS TO STORE THE VALUE. It can't, itself, be a pointer. If it was a pointer, then it complete invalidates the pseudo-swap approach shown above.

Therefore, we need to create a helper-class that is a collection of VALUES, as opposed to a collection of ITERATORS. We can return this helper-class when the piterator dereference operator is constant, e.g.,

value_type operator*() const { return helper_class_value_collection_ctor( _args_ ); };

In this way, we can store values in new memory.

In addition, we need to create another helper-class that is a collection of ITERATORS, as opposed to VALUES, so that expressions like

*piterator_a = *piterator_b

are valid. If *piterator_a returned by-value, then setting those values is meaningless because the returned value is a temporary. So, in this case, we need the dereference operator to return the reference type (collection of iterators), which would be stored as a private member variable of the piterator.

reference operator*() { return private_reftype_variable; };

So, putting this altogether leads to the following.

#include <vector>
#include <iostream>
#include <utility>
#include <iterator>
#include <algorithm>


// forward decl
template <typename T,typename U> struct piterator_iterators;

template <typename T,typename U>
struct piterator_values
{
  // This class holds memory; it is a value_type
  // It only serves the purpose of 
  // allowing the stl to hold temporary values when moving memory around.
  // If the stl called sort(), then this class wouldn't be necessary.
  //
  // Note that the memory may be set by a piterator_iterators class,
  // which is a pseudo-value_type that points at memory, instead of holding memory.
  //
  // YOU NEED THIS SO THAT
  // typename piterator<T,U>::value_type Tmp = *piterator_a
  // PLACES THE VALUES INTO SOME (ACTUAL) TEMPORARY MEMORY, AS OPPOSED
  // TO CREATING A NEW POINTER TO EXISTING MEMORY.

  typedef typename T::value_type first_value;
  typedef typename U::value_type second_value;

  first_value  first;
  second_value second;

  piterator_values() {};
  piterator_values( first_value const & first , second_value const & second ) : first(first), second(second) {};
  piterator_values( piterator_values<T,U> const & rhs ) : first(rhs.first), second(rhs.second) { };
  piterator_values( piterator_iterators<T,U> const & rhs ) : first(*rhs.first), second(*rhs.second) { };


  piterator_values<T,U> & operator=( piterator_values<T,U> const & rhs )
  {
    if( &rhs != this ) 
      {
    first  = rhs.first; 
    second = rhs.second; 
      }
    return *this;
  };

  piterator_values<T,U> & operator=( piterator_iterators<T,U> const & rhs )
  {
    if( &rhs != this ) 
      {
    first  = *rhs.first; 
    second = *rhs.second; 
      }
    return *this;
  };


  friend void swap( piterator_values<T,U> & lhs, piterator_values<T,U> & rhs )
  {
    using std::swap;
    swap(lhs.first,rhs.first);
    swap(lhs.second,rhs.second);
  };


};





template <typename T,typename U>
struct piterator_iterators
{

  T first;
  U second;

  // This class does not hold memory; it points at existing memory.
  // It is a pseudo-value_type.  When the piterator dereferences, it
  // will return a piterator_iterators object IF it is a nonconst reference.
  // This class is used as a "reference" for an actual iterator,
  // so assignment operators change the value of the thing pointed at,
  // as opposed to reseting the address of what is being pointed at.
  //
  // YOU NEED THIS SO THAT
  // *piterator_a = *piterator_b
  // MAKES SENSE.
  // IF THE DEREFERENCE PASSED A piterator_values, 
  // THEN IT WOULD ONLY MODIFY A TEMPORARY, NOT THE ACTUAL THING
  //
  piterator_iterators() {};
  piterator_iterators( T const & first , U const & second ) : first(first), second(second) {};
  piterator_iterators( piterator_iterators<T,U> const & rhs ) : first(rhs.first), second(rhs.second) {};


  piterator_iterators<T,U> & operator=( piterator_iterators<T,U> const & rhs )
  {
    if( &rhs != this ) 
      {
    *first  = *rhs.first; 
    *second = *rhs.second; 
      }
    return *this;
  };

  piterator_iterators<T,U> & operator=( piterator_values<T,U> const & rhs )
  {
    *first  = rhs.first; 
    *second = rhs.second; 
    return *this;
  };


  friend void swap( piterator_iterators<T,U> & lhs, piterator_iterators<T,U> & rhs )
  {
    using std::swap;
    std::iter_swap(lhs.first,rhs.first);
    std::iter_swap(lhs.second,rhs.second);
  };


};




//
// iterator of pair of iterators
//
template <typename T, typename U>
class piterator : public std::iterator< std::random_access_iterator_tag, piterator_values<T,U>, std::ptrdiff_t, piterator_iterators<T,U> *, piterator_iterators<T,U> & >
{
  typedef piterator<T,U> iter;

public: 

  typedef std::random_access_iterator_tag iterator_catagory;
  typedef typename piterator<T,U>::value_type      value_type;
  typedef typename piterator<T,U>::difference_type difference_type;
  typedef typename piterator<T,U>::pointer         pointer;
  typedef typename piterator<T,U>::reference       reference;
  typedef piterator_iterators<T,U>                 value_of_reference;
  //typedef typename piterator_iterators<T,U> & reference;

public:

  piterator() {};
  piterator( iter const & rhs ) { mp.first = rhs.mp.first;  mp.second = rhs.mp.second; };
  piterator( value_of_reference const rhs ) { mp.first = rhs.first; mp.second = rhs.second; };
  piterator( T const first, U const second ) { mp.first = first; mp.second = second; };


  iter & operator=( iter const & rhs )
  {
    if ( &rhs != this )
      {
    mp.first = rhs.mp.first; 
    mp.second = rhs.mp.second; 
      };
    return *this;
  }

  friend void swap( iter & lhs , iter & rhs )
  {
    using std::swap;
    swap(lhs.mp,rhs.mp);
  }



public: // Comparison
  bool operator< ( iter const & rhs ) const { return mp.first < rhs.mp.first; }
  bool operator> ( iter const & rhs ) const { return mp.first > rhs.mp.first; }
  bool operator==( iter const & rhs ) const { return mp.first == rhs.mp.first; }
  bool operator!=( iter const & rhs ) const { return mp.first != rhs.mp.first; }

public: // Iteration
  iter & operator++()  { ++(mp.first); ++(mp.second); return *this; }
  iter & operator--()  { --(mp.first); --(mp.second); return *this; }
  iter operator++(int) { iter tmp(*this); ++(*this); return tmp; }
  iter operator--(int) { iter tmp(*this); --(*this); return tmp; }

public: // Step
  iter & operator+=( difference_type n ) { mp.first += n; mp.second += n; return *this; }
  iter & operator-=( difference_type n ) { mp.first -= n; mp.second -= n; return *this; }
  iter operator+   ( difference_type n ) { iter result(*this); return result += n; }
  iter operator-   ( difference_type n ) { iter result(*this); return result -= n; }
  difference_type operator+   ( iter const & rhs ) { return mp.first + rhs.mp.first; }
  difference_type operator-   ( iter const & rhs ) { return mp.first - rhs.mp.first; }

public: // Distance
  difference_type operator-( iter & rhs ) { return mp.first - rhs.mp.first; }

public: // Access
  // reference if on the lhs of the eq.
  reference operator*() { return mp; }
  // value if on the rhs of the eq.
  value_type operator*() const { return value_type(*mp.first,*mp.second); }

  reference operator[]( difference_type n ) { return *( (*this) + n ); }
  pointer operator->() { return &mp; };

private: // State

  value_of_reference mp;

};

Here's the main program, separated from above, to see how the above is used....

////////////////////////////////////////////////////////////////
template<class T,class U>
bool proxy_comp( piterator_values<T,U> left, piterator_values<T,U> right )
{ 
  return left.first < right.first; 
}
///////////////////////////////////////////////////////////////
int main()
{

  std::vector<double> dv1(3);
  std::vector<double> dv2(3);
  std::vector<int> iv(3);
  dv1[0]  = -0.5; dv1[1]  = -1.5; dv1[2]  = -2.5;
  dv2[0]  = 10.5; dv2[1]  = 11.5; dv2[2]  = 12.5;
  iv[0]   = 10;    iv[1]  = 20;   iv[2]   =  3;

  //
  // EXAMPLE 1: PAIR OF ITERATORS
  //

  typedef piterator< std::vector<int>::iterator , std::vector<double>::iterator > PAIR_ITER;

  PAIR_ITER pair_begin( iv.begin() , dv1.begin() );
  PAIR_ITER pair_end( iv.end() , dv1.end() );

  std::cout << "paired arrays now:" << std::endl;
  for ( PAIR_ITER p = pair_begin; p != pair_end; ++p )
    std::cout << *p->first << " " << *p->second << std::endl;

  std::cout << "swap 1st and 3rd elements..." << std::endl;
  swap(*pair_begin,*(pair_begin+2));

  std::cout << "paired arrays now:" << std::endl;
  for ( PAIR_ITER p = pair_begin; p != pair_end; ++p )
    std::cout << *p->first << " " << *p->second << std::endl;

  std::cout << "calling sort..." << std::endl;
  std::sort( pair_begin , pair_end , &proxy_comp<std::vector<int>::iterator , std::vector<double>::iterator> );

  std::cout << "paired arrays now:" << std::endl;
  for ( PAIR_ITER p = pair_begin; p != pair_end; ++p )
    std::cout << *p->first << " " << *p->second << std::endl;




  //   
  // EXAMPLE 2: TRIPLET (PAIR OF PAIR)
  //

  typedef piterator< std::vector<double>::iterator , std::vector<double>::iterator > DOUBLET_ITER;
  typedef piterator< std::vector<int>::iterator , DOUBLET_ITER > TRIPLET_ITER;


  TRIPLET_ITER triplet_begin( iv.begin(), DOUBLET_ITER( dv1.begin() , dv2.begin() ) );
  TRIPLET_ITER triplet_end(   iv.end(),   DOUBLET_ITER( dv1.end() , dv2.end() ) );


  std::cout << "tripleted arrays now:" << std::endl;
  for ( TRIPLET_ITER p = triplet_begin; p != triplet_end; ++p )
    std::cout << *p->first << " " 
          << *p->second->first << " " 
          << *p->second->second << std::endl;


  std::cout << "iter_swap 1st and second elements..." << std::endl;
  std::iter_swap( triplet_begin , triplet_begin+1 );


  std::cout << "tripleted arrays now:" << std::endl;
  for ( TRIPLET_ITER p = triplet_begin; p != triplet_end; ++p )
    std::cout << *p->first << " " 
          << *p->second->first << " " 
          << *p->second->second << std::endl;


  std::cout << "calling sort..." << std::endl;
  std::sort( triplet_begin, triplet_end, &proxy_comp< std::vector<int>::iterator , piterator< std::vector<double>::iterator , std::vector<double>::iterator > > );


  std::cout << "tripleted arrays now:" << std::endl;
  for ( TRIPLET_ITER p = triplet_begin; p != triplet_end; ++p )
    std::cout << *p->first << " " 
          << *p->second->first << " " 
          << *p->second->second << std::endl;


  return 0;
}

Here's the output of the above program

paired arrays now:
10 -0.5
20 -1.5
3 -2.5
swap 1st and 3rd elements...
paired arrays now:
3 -2.5
20 -1.5
10 -0.5
calling sort...
paired arrays now:
3 -2.5
10 -0.5
20 -1.5
tripleted arrays now:
3 -2.5 10.5
10 -0.5 11.5
20 -1.5 12.5
iter_swap 1st and second elements...
tripleted arrays now:
10 -0.5 11.5
3 -2.5 10.5
20 -1.5 12.5
calling sort...
tripleted arrays now:
3 -2.5 10.5
10 -0.5 11.5
20 -1.5 12.5

First of all, you should realize that std::nth_element already does the partitioning you describe. It finds the N^th element, exactly as you'd expect, but it also partitions the data into two pieces -- all the elements smaller than the element you've found will be at the lower locations in the collection, and all the larger elements to its right.

Personally, I think I'd do things a bit differently: if you still need the original order of the data, but also need it sorted in some other order, create a sorted index, and leave the original data alone. Given that your original data is (apparently) in an std::vector, I'd personally just put subscripts into the index (adding new items to the end of the vector won't invalidate them like it will iterators).

std::vector<int> index(data.size());

template <class T>
struct cmp { 
   T const &data;
public:
   cmp(T const &array) : data(array) {}

   bool operator()(int a, int b) { return data[a] < data[b]; }
};

for (int i=0; i<index.size(); i++)
    index[i] = i;

std::sort(index.begin(), index.end(), cmp(your_data));

Then, to use the data in original order, you just index the original array/vector, like your_data[i]. To use the data in sorted order, you use something like your_data[index[i]] instead.

You could, of course, build all of that into an index class, so you just use the index class' operator[] to actually index into the original data in the sorted order. The cmp class above already shows how to do most of that.