
Best practices for sanitizing Unicode input

Developer https://www.devze.com 2023-02-14 04:39 Source: the web

I'm working on a web application at the moment (using Ruby) that I would ultimately like to be usable by people from anywhere in the world. With that in mind, support for non-ASCII characters is essential. However, I don't want the database to be full of "noise" characters in fields such as username etc.

Are there any accepted best practices for dealing with Unicode input under these circumstances without alienating users? Any thoughts on dealing with homographs in usernames to make impersonation harder?

Some of my thoughts so far:

  • normalizing text before storing or using it in queries
  • filtering non-printable characters
  • limiting the number of sequential combining diacritics allowed in input
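The three ideas above could be combined roughly as follows. This is only a minimal sketch, assuming UTF-8 input and Ruby 2.4 or later; `sanitize_text` and the limit of two combining marks per run are hypothetical choices made for illustration, not an accepted standard.

```ruby
# Hypothetical limit: at most this many combining marks in a row.
MAX_COMBINING_RUN = 2

# Hypothetical helper that sketches the three steps listed above.
# Assumes `input` is a UTF-8 encoded String.
def sanitize_text(input, max_combining: MAX_COMBINING_RUN)
  # Drop invalid byte sequences so normalization does not raise.
  text = input.scrub("")

  # 1. Normalize to NFC so equivalent sequences are stored identically.
  text = text.unicode_normalize(:nfc)

  # 2. Filter non-printable characters: control (Cc) and format (Cf).
  #    Caveat: Cf includes ZWJ/ZWNJ, which some scripts and emoji
  #    sequences rely on, so this may be too strict for free text.
  text = text.gsub(/[\p{Cc}\p{Cf}]/, "")

  # 3. Cap runs of combining marks (general category M).
  text = text.gsub(/(\p{M}{#{max_combining}})\p{M}+/, '\1')

  # Trim leading and trailing ASCII whitespace.
  text.strip
end

puts sanitize_text("  Jose\u0301  ")           # e + U+0301 is composed to é
puts sanitize_text("q" + "\u0301" * 10).length  # base letter plus capped marks
```

Normalizing before filtering means the combining-mark cap only sees marks that have no precomposed form, so ordinary accented letters such as "é" are unaffected by it.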

Any further thoughts, or am I making unnecessary work for myself?

Thanks.


RFC 3454 (stringprep), at http://www.ietf.org/rfc/rfc3454.txt, will tell you what you should be doing, which is to say worrying about normalization and security issues.
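In the spirit of that RFC (which maps and case-folds the input, then applies NFKC normalization), one option is to keep the name as the user typed it for display, and store a separate canonical key for uniqueness checks. The following is a rough sketch under that assumption, not an implementation of stringprep itself: `username_key` and `mixed_script?` are hypothetical helpers, and the script check covers only Latin, Cyrillic and Greek.

```ruby
# Hypothetical helper: canonical key used only for comparing usernames.
# NFKC collapses compatibility forms (fullwidth letters, ligatures);
# downcase(:fold) applies Unicode case folding (Ruby 2.4+).
def username_key(name)
  name.unicode_normalize(:nfkc).downcase(:fold).unicode_normalize(:nfkc)
end

# Hypothetical helper: flags names that mix scripts commonly used for
# homograph attacks. It will not catch lookalikes within one script.
def mixed_script?(name)
  scripts = [/\p{Latin}/, /\p{Cyrillic}/, /\p{Greek}/]
  scripts.count { |pattern| name.match?(pattern) } > 1
end

puts username_key("ＡＤＭＩＮ")            # fullwidth letters fold to "admin"
puts mixed_script?("p\u0430ypal")        # Cyrillic "а" among Latin letters
```

Rejecting a new registration when its key collides with an existing one, or when `mixed_script?` is true, would make the simplest impersonation attempts harder without restricting users to ASCII.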
