Python: how to slice/store data from an iterator in a fixed-size buffer class?

Developer https://www.devze.com 2023-01-20 03:02 Source: web

All,

As you know, with a Python iterator we can call its next() method to get the next item of data. Take a list iterator for example:

l =  [x for x in range(100)]
itl = iter(l)
itl.next()            # 0
itl.next()            # 1

Now I want a buffer that stores the data behind any iterator in fixed-size slices. I'll use the list iterator above to demonstrate my question.

class IterPage(object):
    def __init__(self, iterator, size):
        pass  # class code here

itp = IterPage(itl, 5)

What I want is:

print itp.first()   # [0,1,2,3,4]
print itp.next()    # [5,6,7,8,9]
print itp.prev()    # [0,1,2,3,4]
len(itp)            # 20   # 100 items / 5 per slice = 20
print itp.last()    # [95,96,97,98,99]


for y in itp:           # an iterator may not support "for" or len(), so similar code is needed in the class
    print y
[0,1,2,3,4]
[5,6,7,8,9]
...
[95,96,97,98,99]

This is not homework. As a Python beginner I know little about designing an iterator class. Could someone show me how to write the "IterPage" class?

Also, from the answers below I realized that if the raw data I want to slice is very big, for example an 8 GB text file or a database table with 10^100 records, I may not be able to read all of it into a list, because I don't have that much physical memory. Take the snippet from the Python documentation for example:

http://docs.python.org/library/sqlite3.html#

>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
...    print row
...
(u'2006-01-05', u'BUY', u'RHAT', 100, 35.14)
(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
(u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
(u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)

If we had about 10^100 records here, would it be possible to store only the lines/records I want with this class, using itp = IterPage(c, 5)? When I invoke itp.next(), could itp fetch just the next 5 records from the database?

Thanks!

PS: I found an approach at the link below: http://code.activestate.com/recipes/577196-windowing-an-iterable-with-itertools/

I also found that someone proposed an itertools.iwindow() function, but it was rejected. http://mail.python.org/pipermail/python-dev/2006-May/065304.html


Since you asked about design, I'll write a bit about what you want: it's not an iterator.

The defining property of an iterator is that it only supports iteration, not random access. But methods like .first and .last do random access, so what you ask for is not an iterator.

There are of course containers that allow this. They are called sequences, and the simplest of them is the list. Its .first method is written as [0] and its .last is [-1].

So here is an object that slices a given sequence. It stores a list of slice objects, which is what Python uses to slice out parts of a list. The methods that a class must implement to be a sequence are given by the abstract base class Sequence. It's nice to inherit from it because it raises an error if you forget to implement a required method.

from collections import Sequence  # on Python 3: from collections.abc import Sequence

class SlicedList(Sequence):
    def __init__(self, iterable, size):
        self.seq = list(iterable)
        self.slices = [slice(i,i+size) for i in range(0,len(self.seq), size)]

    def __contains__(self, item):
        # checks if a item is in this sequence
        return item in self.seq

    def __iter__(self):
        """ iterates over all slices """
        # "s" avoids shadowing the built-in name "slice"
        return (self.seq[s] for s in self.slices)

    def __len__(self):
        """ implements len( .. ) """
        return len(self.slices)

    def __getitem__(self, n):
        # two forms of getitem ..
        if isinstance(n, slice):
            # implements sliced[a:b]
            return [self.seq[x] for x in self.slices[n]]
        else:
            # implements sliced[a]
            return self.seq[self.slices[n]]

s = SlicedList(range(100), 5)

# length
print len(s) # 20

#iteration
print list(s) # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], ... , [95, 96, 97, 98, 99]]
# explicit iteration:
it = iter(s)
print next(it) # [0, 1, 2, 3, 4]

# we can slice it too
print s[0], s[-1] # [0, 1, 2, 3, 4] [95, 96, 97, 98, 99]
# get the first two
print s[0:2] # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
# every other item
print s[::2] # [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24], ... ]

Now if you really want methods like .start (what for, anyway? It's just a verbose way of writing [0]), you can write a class like this:

class Navigator(object):    
    def __init__(self, seq):
        self.c = 0
        self.seq = seq

    def next(self):
        self.c +=1
        return self.seq[self.c]

    def prev(self):
        self.c -=1
        return self.seq[self.c]

    def start(self):
        self.c = 0
        return self.seq[self.c]

    def end(self):
        self.c = len(self.seq)-1
        return self.seq[self.c]

n = Navigator(SlicedList(range(100), 5))

print n.start(), n.next(), n.prev(), n.end()


The raw data that I want to slice is very big, for example an 8 GB text file... I may not be able to read all of it into a list - I do not have that much physical memory. In that case, is it possible to get only the lines/records I want with this class?

No. As it stands, the class originally proposed below converts the iterator into a list, which makes it 100% useless for your situation.

Just use the grouper idiom (also mentioned below). You'll have to be smart about remembering previous groups. To save on memory, only store those previous groups that you need. For example, if you only need the most recent previous group, you could store that in a single variable, previous_group.

If you need the 5 most recent previous groups, you could use a collections.deque with a maximum size of 5.

Or, you could use the window idiom to get a sliding window of n groups of groups...
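The idea of remembering only the last few groups might be sketched roughly as follows. This is a minimal illustration written for Python 3 (the rest of this thread uses Python 2 syntax); the function name groups_with_history and its keep parameter are made up for the example and do not come from any library.

```python
from collections import deque
from itertools import islice


def groups_with_history(iterable, size, keep):
    """Yield (group, history) pairs; history holds at most `keep` earlier groups."""
    it = iter(iterable)
    history = deque(maxlen=keep)  # the oldest group is dropped automatically
    while True:
        group = list(islice(it, size))  # pulls only `size` items from the source
        if not group:
            return
        yield group, list(history)
        history.append(group)


pages = groups_with_history(range(100), 5, keep=2)
first, hist0 = next(pages)    # no earlier groups yet
second, hist1 = next(pages)   # one earlier group
third, hist2 = next(pages)    # two earlier groups
fourth, hist3 = next(pages)   # still two: the first group has been dropped
```

Because the deque is bounded, memory use stays proportional to size * keep no matter how long the underlying iterator is. The price is that only the most recent groups can be revisited.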

Given what you've told us so far, I would not define a class for this, because I don't see many reusable elements to the solution.
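For the database case in the question, a plain generator function may be enough. The sketch below uses the sqlite3 cursor's fetchmany() method to ask for one page of rows at a time. It is written for Python 3; the two-column stocks table and the fetch_pages name are simplified, hypothetical stand-ins for this example, not the schema from the documentation.

```python
import sqlite3


def fetch_pages(cursor, size):
    """Yield lists of up to `size` rows, requesting one page at a time."""
    while True:
        rows = cursor.fetchmany(size)  # returns an empty list when exhausted
        if not rows:
            return
        yield rows


conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo
conn.execute("create table stocks (symbol text, price real)")
conn.executemany(
    "insert into stocks values (?, ?)",
    [("S%d" % i, float(i)) for i in range(12)],
)
cur = conn.execute("select symbol, price from stocks order by price")

pages = fetch_pages(cur, 5)
first_page = next(pages)
remaining_sizes = [len(p) for p in pages]
```

Going backwards would still require either storing earlier pages or re-running the query, since a cursor only moves forward.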


Mainly, what you want can be done with the grouper idiom:

In [22]: l =  xrange(100)    
In [23]: itl=iter(l)    
In [24]: import itertools    
In [25]: for y in itertools.izip(*[itl]*5):
   ....:     print(y)
(0, 1, 2, 3, 4)
(5, 6, 7, 8, 9)
(10, 11, 12, 13, 14)
...
(95, 96, 97, 98, 99)

Calling next is no problem:

In [28]: l =  xrange(100)

In [29]: itl=itertools.izip(*[iter(l)]*5)

In [30]: next(itl)
Out[30]: (0, 1, 2, 3, 4)

In [31]: next(itl)
Out[31]: (5, 6, 7, 8, 9)
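One caveat: the zip-based grouper silently drops a trailing group that is shorter than the group size. With 100 items and groups of 5 this does not matter, but for other lengths it may. A possible variant that keeps the short tail, sketched here for Python 3 (where the built-in zip behaves like itertools.izip) with an illustrative helper name chunks:

```python
from itertools import islice


def chunks(iterable, size):
    """Yield tuples of up to `size` items; the last one may be shorter."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, size))
        if not chunk:
            return
        yield chunk


# zip-based grouper: the incomplete tail (item 6) is lost
zipped = list(zip(*[iter(range(7))] * 3))

# islice-based variant: the tail is kept as a shorter tuple
chunked = list(chunks(range(7), 3))
```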

But making a previous method is a big problem, because iterators don't work this way. Iterators are meant to produce values without remembering past values. If you need all past values, then you need a list, not an iterator:

In [32]: l =  xrange(100)
In [33]: ll=list(itertools.izip(*[iter(l)]*5))

In [34]: ll[0]
Out[34]: (0, 1, 2, 3, 4)

In [35]: ll[1]
Out[35]: (5, 6, 7, 8, 9)

# Get the last group
In [36]: ll[-1]
Out[36]: (95, 96, 97, 98, 99)

Now getting the previous group is just a matter of keeping track of the list index.
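Tracking the index could look roughly like this. It is a small Python 3 sketch; the step helper is hypothetical, and clamping at both ends is just one possible choice (raising an error at the boundaries would be another).

```python
def step(index, delta, length):
    """Return index + delta, clamped to the valid range [0, length - 1]."""
    return max(0, min(length - 1, index + delta))


groups = list(zip(*[iter(range(100))] * 5))
index = 0

index = step(index, +1, len(groups))
after_next = groups[index]     # second group

index = step(index, -1, len(groups))
after_prev = groups[index]     # back to the first group

index = step(index, -1, len(groups))
still_first = groups[index]    # clamped: does not wrap around to the last group
```

Without the clamp, decrementing past 0 would give index -1, which Python treats as the last element rather than as an error.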
