
Ordered Data with Cassandra RandomPartitioner

Developer https://www.devze.com 2023-03-22 10:08 Source: web

I have about a billion pieces of data that I would like to store in Cassandra. The data items are ordered by time, and one of the main queries I'll be doing is to find the items between two points in time, in order. I'd really prefer to use the RandomPartitioner, if at all possible. Is there a way to do this in Cassandra?

At first, since I'm coming from SQL, I assumed I should create each event as a row, but then it occurred to me that I was thinking about it the wrong way and I should really use columns. Columns in Cassandra seem to be ordered, but I'm confused as to just how ordered they are. If I use a time as the column name, is there a way for me to get all of the columns from one time to another in order?

Another thing I looked at was the 0.7 feature of secondary indices, but I've had trouble finding documentation for whether I can use these to view a range of things in order.

All I want is the Cassandra equivalent of this SQL: "Select * from Stuff where date > X and date < Y order by date asc". How can I do this?


The partitioner only affects the distribution of keys around the ring, not the order of columns within a key. Columns are always ordered according to the Column Comparator defined for the column family.

You can call get_slice with a SlicePredicate that specifies a SliceRange to get all the columns of a key within a range.

To model your data, you can create one row for each day (or another suitable time shard) and have one column for each piece of data. Something like:

"yyyy-mm-dd" : {  #key, one for each day
    timeStampMillis1:dataid1 : "value1" # one column for each piece of data
    timeStampMillis2:dataid2 : "value2" 
    timeStampMillis3:dataid3 : "value3" 
}

The column names should be binary, compared with the binary comparator (BytesType). The first 8 bytes are the timestamp, and the remaining bytes are the id of the data. Encode the timestamp big-endian so that byte order matches chronological order.
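As a rough illustration of that column-name layout, here is a minimal Python sketch. The helper name column_name and the sample ids are made up for this example, and it assumes non-negative millisecond timestamps packed as unsigned 64-bit big-endian integers. Python compares bytes lexicographically as unsigned values, which stands in here for a byte-wise comparator.

```python
import struct

def column_name(ts_millis: int, data_id: bytes) -> bytes:
    """Build a column name: 8-byte big-endian timestamp followed by the id.

    Hypothetical helper for illustration; assumes ts_millis >= 0.
    """
    return struct.pack(">Q", ts_millis) + data_id

# Names created out of order...
names = [
    column_name(1300000000256, b"id-c"),
    column_name(1300000000001, b"id-a"),
    column_name(1300000000002, b"id-b"),
]

# ...sort into chronological order under plain byte comparison.
ordered = sorted(names)
timestamps = [struct.unpack(">Q", n[:8])[0] for n in ordered]
print(timestamps)  # [1300000000001, 1300000000002, 1300000000256]
```

A little-endian encoding would not have this property, since the least significant byte would be compared first.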

Assuming X and Y fall on the same day, find all items between them by calling get_slice on that day's key with a SlicePredicate whose SliceRange has a start of X and a finish of Y+1. Both start and finish are 8-byte arrays. The finish is Y+1 rather than Y because a column stamped exactly Y carries an id suffix, so it sorts after the bare 8-byte value Y.
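The effect of those bounds can be checked without a cluster. The sketch below is only a simulation: it models one row as a sorted Python list of column names and treats both ends of the range as inclusive. The functions ts_bytes, slice_bounds and get_slice_sim are hypothetical names for this example and are not part of any Cassandra client.

```python
import bisect
import struct

def ts_bytes(ts_millis: int) -> bytes:
    """8-byte big-endian encoding of a non-negative millisecond timestamp."""
    return struct.pack(">Q", ts_millis)

def slice_bounds(x_millis: int, y_millis: int):
    """Start and finish byte arrays for the range X..Y (finish is Y+1)."""
    return ts_bytes(x_millis), ts_bytes(y_millis + 1)

def get_slice_sim(sorted_names, start: bytes, finish: bytes):
    """Simulate a range slice over sorted column names, both ends inclusive."""
    lo = bisect.bisect_left(sorted_names, start)
    hi = bisect.bisect_right(sorted_names, finish)
    return sorted_names[lo:hi]

# One simulated row: column names are timestamp bytes plus an id suffix.
row = sorted(ts_bytes(t) + i for t, i in
             [(100, b"a"), (200, b"b"), (300, b"c"), (301, b"d")])

start, finish = slice_bounds(200, 300)
hits = get_slice_sim(row, start, finish)
hit_times = [struct.unpack(">Q", n[:8])[0] for n in hits]
print(hit_times)  # [200, 300]
```

In this simulation the columns stamped exactly X and exactly Y are both returned, while the column stamped Y+1 is not, because its id suffix makes it sort after the bare 8-byte finish value. If strictly exclusive bounds are needed, as in the SQL above, the caller can start at X+1 and finish at Y.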

To find data over multiple days, read from multiple keys.
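For a range that spans several days, the client needs the list of day keys to read. A minimal sketch follows, assuming the row keys are UTC calendar days in "yyyy-mm-dd" form as in the layout above; the function name day_keys is made up for this example.

```python
from datetime import datetime, timedelta, timezone

def day_keys(x_millis: int, y_millis: int):
    """Return the "yyyy-mm-dd" row keys (UTC days) covering X..Y, in order."""
    day = datetime.fromtimestamp(x_millis // 1000, tz=timezone.utc).date()
    last = datetime.fromtimestamp(y_millis // 1000, tz=timezone.utc).date()
    keys = []
    while day <= last:
        keys.append(day.isoformat())
        day += timedelta(days=1)
    return keys

# Example range: 2011-01-30 23:00 UTC through 2011-02-01 01:00 UTC.
x = int(datetime(2011, 1, 30, 23, tzinfo=timezone.utc).timestamp() * 1000)
y = int(datetime(2011, 2, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
print(day_keys(x, y))  # ['2011-01-30', '2011-01-31', '2011-02-01']
```

Slicing each of these rows with the same start and finish bounds and concatenating the results in key order keeps the overall output in time order, since every column in an earlier day's row precedes every column in a later one.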
