Greetings Stackers.
We're working on a project which stores second-to-second tracking data for participants in psych experiments. Our current design has a Flash client collect 60 seconds' worth of timestamp/activity pairings and then POST the data as strings, along with a little participant metadata, to our Rails (3.0.3) / MySQL (5.1) application. (Edit: we're using vanilla Passenger/Nginx on the front end.) Rails splits the timestamp/activity strings into parallel arrays, generates a single raw SQL INSERT statement, and then shoves everything into a massive table, i.e. (simplified code):
@feedback_data = params[:feedbackValues].split(",")
@feedback_times = params[:feedbackTimes].split(",")
inserts = []
base = "(" + @userid + "," + @studyid + ","
@feedback_data.each_with_index do |value, i|
  # timestamps are strings, so they must be single-quoted for MySQL
  record = base + "'" + @feedback_times[i].to_s + "',"
  record += "'" + value.to_s + "')"
  inserts.push(record)
end
sql = "INSERT INTO excitement_datas (participantId, studyId, timestamp, activityLevel) VALUES #{inserts.join(", ")}"
ActiveRecord::Base.connection.execute sql
Yields:
INSERT INTO excitement_datas (participantId, studyId, timestamp, activityLevel)
VALUES (3,5,'2011-01-27 05:02:21','47'),(3,5,'2011-01-27 05:02:22','56'), etc.
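(Side note: since those values come straight from params, we've also sketched a variant that escapes everything through ActiveRecord's connection.quote, so stray quotes in the input can't break the SQL. Same table and columns as above; the quoting call is standard ActiveRecord:)

conn = ActiveRecord::Base.connection
inserts = (0...@feedback_data.size).map do |i|
  # quote() escapes and wraps each value as a safe SQL literal
  "(#{conn.quote(@userid)},#{conn.quote(@studyid)}," +
  "#{conn.quote(@feedback_times[i])},#{conn.quote(@feedback_data[i])})"
end
conn.execute("INSERT INTO excitement_datas " +
             "(participantId, studyId, timestamp, activityLevel) " +
             "VALUES #{inserts.join(', ')}")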
The design has generated a lot of debate on the team. Studies will have tens or hundreds of concurrent participants. I've staggered the 60-second POST interval for each client so that incoming data is distributed more evenly, but I'm still getting lots of doom-and-gloom predictions.
What else can or should we do to improve the scalability of this design in Rails?
What tools or techniques can I use to accurately predict how it performs under load?
Many thanks.
This is more of an architecture issue than a code issue. Your code looks sane, and generating a single SQL query per batch is a good approach. What's your application server, though?
If you're using, say, a single Thin server, then requests will block while the database performs the INSERT, leading to an unresponsive app.
With Passenger or Unicorn you'd get more concurrency, but each request would still be held up by a fairly slow SQL query.
If you're really worried about that query, you could add an intermediate Memcache or RabbitMQ layer that stores a job for each received request, then have one or more background tasks pick the jobs up and do the slow inserts. Memcache and Rabbit are more responsive than MySQL, and you'd be dealing with the raw request.
This means the request would complete very quickly and hand the heavy lifting off to your worker tasks. Delayed Job could be something to look at, or Workling, or Bunny/EventMachine for Rabbit.
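To make that concrete, here's a rough sketch of the Rabbit hand-off using the Bunny gem (the queue name "feedback_inserts" and the payload shape are made up; it assumes Bunny's modern API and RabbitMQ on localhost):

require "bunny"
require "json"

# Controller side: enqueue the raw payload and return immediately.
conn = Bunny.new
conn.start
queue = conn.create_channel.queue("feedback_inserts", :durable => true)
queue.publish({
  :userid  => @userid,
  :studyid => @studyid,
  :times   => params[:feedbackTimes],
  :values  => params[:feedbackValues]
}.to_json, :persistent => true)

# Worker side: drain the queue and run the same bulk INSERT as before.
queue.subscribe(:block => true) do |_delivery_info, _properties, body|
  job = JSON.parse(body)
  # split job["times"] / job["values"] and execute the multi-row INSERT here
end

In a real app you'd open the Bunny connection once per process rather than per request, and the subscribe loop would live in a separate worker script.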
Memcache persistence might be an issue for you, so I'd recommend Rabbit if you fancy the queue-based approach.
On top of that, you could look at Apache Bench to see how you're actually doing already:
http://httpd.apache.org/docs/2.0/programs/ab.html
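For example, something like this (the payload file and URL are placeholders) would replay 1,000 POSTs at a concurrency of 50:

ab -n 1000 -c 50 -p payload.txt -T 'application/x-www-form-urlencoded' http://localhost/feedback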