The best regex to parse Twitter #hashtags and @users

开发者 https://www.devze.com 2023-04-08 14:11 Source: web
Here is what I quickly came up with. It works with regexKitLite on the iPhone:

#define kUserRegex @"((?:@){1}[0-9a-zA-Z_]{1,15})";

Twitter only allows letters/numbers, underscores _, and a max of 15 chars (without @). My regex seems fine but reports false positives on e-mail addresses.
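The false positive is easy to reproduce. The sketch below uses Python's `re` module instead of regexKitLite, purely to show how the pattern behaves; the function name `find_users` and the sample text are made up for illustration.

```python
import re

# The username pattern from the question, unchanged apart from Python string syntax.
USER_RE = re.compile(r"((?:@){1}[0-9a-zA-Z_]{1,15})")


def find_users(text):
    """Return every substring the pattern treats as a @username."""
    return USER_RE.findall(text)


print(find_users("thanks @jack, mail me at bob@example.com"))
# ['@jack', '@example']
```

Nothing constrains what comes before the `@`, so the domain part of the address is picked up as if it were a username.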

#define kHashtagRegex @"((?:#){1}[0-9a-zA-Z_àáâãäåçèéêëìíîïðòóôõöùúûüýÿ]{1,140})";

kHashtagRegex works with accented words, but a hand-written list of accented letters is not enough for arbitrary Unicode text (non-Latin scripts, for example). What is the 'tech spec' of a hashtag?

Is there a reference somewhere on what to use for parsing these? Or do you have advice on how to enhance this regex?


I'm not sure if this is complete, but this is what I would do:


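For the username, add a check for start of string or whitespace before the @ to eliminate e-mail addresses: (?:^|\s). Inside an Objective-C string literal the backslash has to be doubled, otherwise the compiler treats \s as an unknown escape sequence:

#define kUserRegex @"((?:^|\\s)(?:@){1}[0-9a-zA-Z_]{1,15})";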
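As a quick check of the start-of-string-or-whitespace idea, here is a minimal Python sketch (not regexKitLite). It differs from the macro above in one assumed detail: only the name is captured, so the leading whitespace is not part of the result. `find_users` is a hypothetical helper name.

```python
import re

# The @ must be preceded by start-of-string or whitespace.
# Only the username is in a capturing group, so the leading space is not returned.
USER_RE = re.compile(r"(?:^|\s)(@[0-9a-zA-Z_]{1,15})")


def find_users(text):
    """Return @usernames that start the string or follow whitespace."""
    return USER_RE.findall(text)


print(find_users("thanks @jack, mail me at bob@example.com"))
# ['@jack']
```

The trade-off is that mentions preceded by punctuation, such as `(@jack)`, are no longer found; widening the prefix to other non-word characters would be one way to handle that.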

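For the hashtags, I would just use \w, which already includes the digits matched by \d as well as the underscore (again with the backslash doubled in the string literal):

#define kHashtagRegex @"((?:#){1}\\w{1,140})";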
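To illustrate what a Unicode-aware `\w` buys you, here is a small Python sketch. It assumes the target engine's `\w` behaves like Python 3's, which matches letters and digits from any script plus the underscore; whether regexKitLite behaves the same way should be checked against its own documentation. `find_hashtags` is a hypothetical name.

```python
import re

# In Python 3, \w on str patterns is Unicode-aware by default.
HASHTAG_RE = re.compile(r"#\w{1,140}")


def find_hashtags(text):
    """Return every #tag made of 1 to 140 word characters."""
    return HASHTAG_RE.findall(text)


print(find_hashtags("#café #日本語 #snake_case #2023"))
# ['#café', '#日本語', '#snake_case', '#2023']
```

This pattern also accepts purely numeric tags such as `#2023` and a `#` in the middle of a word, which may or may not be what you want.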


REGEX_HASHTAG = '/(^|[^0-9A-Z&\/\?]+)([##]+)([0-9A-Z_]*[A-Z_]+[a-z0-9_üÀ-ÖØ-öø-ÿ]*)/iu';
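The pattern above is written PCRE-style, with `/` delimiters and `iu` flags. As a rough way to see what it accepts, here is a Python translation; it assumes `re.IGNORECASE` is an adequate stand-in for the `i` flag and that no `u` flag is needed because Python 3 string patterns are already Unicode-aware. `find_hashtags` is a hypothetical helper.

```python
import re

# Pattern body copied from REGEX_HASHTAG; / and ? need no escaping inside a Python class.
HASHTAG_RE = re.compile(
    r"(^|[^0-9A-Z&/?]+)([##]+)([0-9A-Z_]*[A-Z_]+[a-z0-9_üÀ-ÖØ-öø-ÿ]*)",
    re.IGNORECASE,
)


def find_hashtags(text):
    """Return the tag text (group 3) of every match, without the leading #."""
    return [m.group(3) for m in HASHTAG_RE.finditer(text)]


print(find_hashtags("see #1st and #2023"))
# ['1st']
```

Two design choices stand out: the tag must contain at least one letter or underscore, so `#2023` is skipped, and the `#` must not directly follow a letter, digit, `&`, `/` or `?`, so `foo#bar` and URL fragments are ignored.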