What is the best way to determine the language of Twitter posts?
The streaming API provides a language parameter, but it doesn't seem to be very accurate. Even many Japanese posts are labelled as English.
What have others done to sort out the languages?
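For the specific case described (Japanese posts tagged as English), one cheap sanity check that needs no library is to look at the characters themselves. Below is a minimal Python sketch. It assumes that Japanese tweets almost always contain at least some Hiragana or Katakana. The helper name `looks_japanese`, the 10% threshold, and the `"en"`/`"ja"` label values are illustrative assumptions, not part of the Twitter API. Chinese text, which is written without kana, will not be flagged by this check.

```python
def is_kana(ch: str) -> bool:
    """True for characters in the Hiragana (U+3040-U+309F) or Katakana (U+30A0-U+30FF) blocks."""
    return "\u3040" <= ch <= "\u30ff"


def looks_japanese(text: str, min_ratio: float = 0.1) -> bool:
    """Heuristic (hypothetical helper): Japanese if at least min_ratio of the letters are kana."""
    letters = [ch for ch in text if ch.isalpha()]  # ignore digits, spaces, punctuation, @, #
    if not letters:
        return False
    kana = sum(1 for ch in letters if is_kana(ch))
    return kana / len(letters) >= min_ratio


# Example: override a suspicious "en" label coming from the stream.
# tweet_text and reported_lang are hypothetical stand-ins for the tweet's fields.
tweet_text = "今日はいい天気ですね"
reported_lang = "en"
if reported_lang == "en" and looks_japanese(tweet_text):
    reported_lang = "ja"
print(reported_lang)  # ja
```

A ratio rather than a single-character test is used so that an English tweet quoting one Japanese word is not relabelled, while a Japanese tweet with an English hashtag still is.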
I've had very good results with this PHP package: http://pear.php.net/package/Text_LanguageDetect/
It is fast and open source. We use it to select English only posts for a site we run at http://2012twit.com.
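For anyone not on PHP, a common technique for this kind of statistical detection is comparing character-trigram frequency profiles. The following is a toy Python sketch of that general technique, not the PEAR package's code. The `SAMPLES` sentences and the names `trigrams`, `cosine`, and `detect` are made up for illustration. Real profiles need far larger training corpora, and very short tweets stay hard to classify regardless.

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count overlapping 3-character sequences; space padding captures word starts and ends."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def detect(text: str, profiles: dict) -> str:
    """Return the language whose profile is most similar to the text's trigrams."""
    grams = trigrams(text)
    return max(profiles, key=lambda lang: cosine(grams, profiles[lang]))


# Hypothetical, tiny training texts; a real system would use large corpora per language.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then it is going to the park with the other dogs",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y luego va al parque con los otros perros",
    "german": "der schnelle braune fuchs springt über den faulen hund und dann geht er mit den anderen hunden in den park",
}
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}

print(detect("the dog is in the park", PROFILES))  # english
```

To keep only English posts, as described above, one would keep a tweet when `detect(...)` returns `"english"`, ideally also requiring the similarity score to clear some threshold so that ambiguous short posts are dropped.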
If using evil external services is a go-er, Google has language detection within its Translate API:
http://code.google.com/apis/language/translate/v1/reference.html#detectResult