开发者

How to collect and clean database entries with bad emails

开发者 https://www.devze.com 2023-04-13 02:16 出处:网络
So I\'ve been given the task of going through our mailing list and \"cleaning\" the emails which are slightly incorrect (trailing symbols, 开发者_如何学运维random symbols accidentally in the middle of

So I've been given the task of going through our mailing list and "cleaning" the emails which are slightly incorrect (trailing symbols, 开发者_如何学运维random symbols accidentally in the middle of the email, syntactically invalid, etc.) and to delete the entries which are totally bogus. The problem is I'm finding some of the emails in the database have already been syntactically corrected but the original entries still exist. There are over 2000 emails which my query is returning as incorrect (it seems past cleanings encountered upwards of 10,000 bad emails!) and going through them one at a time is no good. Does anyone have any suggestions for how to go about this?

Here's the query I used to return the bad emails:

 SELECT id,email
 FROM table
 WHERE NOT
 (
 CHARINDEX(' ',LTRIM(RTRIM([email]))) = 0 
 AND   LEFT(LTRIM([email]),1) <> '@' 
 AND   RIGHT(RTRIM([email]),1) <> '.' 
 AND   CHARINDEX('.',[Email],CHARINDEX('@',[email])) - CHARINDEX('@',[email]) > 1 
 AND   LEN(LTRIM(RTRIM([email]))) - LEN(REPLACE(LTRIM(RTRIM([email])),'@','')) = 1 
 AND   CHARINDEX('.',REVERSE(LTRIM(RTRIM([email])))) >= 3 
 AND   (CHARINDEX('.@',[email]) = 0 AND CHARINDEX('..',[email]) = 0)
 )
 or id in (select id from table where email like '%[+;(,!]%')       


This question has already been asked (and somewhat answered) on Stack Overflow - T-SQL: checking for email format

I suggest using https://www.rfc-editor.org/rfc/rfc3696 to come up with a solution.

Also, don't assume the second level domain is always a max of four characters - In New Zealand, our longest second level domain is ".parliament.nz". I use a ".geek.nz" second level domain and are constantly having websites tell me it's an invalid email address.

Wikipedias article, http://en.wikipedia.org/wiki/Email_address#Valid_email_addresses is also informative - The following are all valid email addresses

  • niceandsimple@example.com
  • a.little.unusual@example.com
  • much."more\ unusual"@example.com
  • very.unusual."@".unusual.com@example.com
  • very."(),:;<>[]".VERY."very\\ @"very".unusual@cool.example.com
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号