LINQ Where Ignore Accentuation and Case

Developer (https://www.devze.com), 2023-04-04 16:40, source: web
What is the easiest way to filter elements with LINQ through the Where method ignoring accentuation and case?

So far, I've been able to ignore casing by calling methods on the properties, which I don't think is a good idea because it calls the same method for every element (right?).

So here's what I got so far:

var result = from p in People
             where p.Name.ToUpper().Contains(filter.ToUpper())
             select p;
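In LINQ to Objects, both ToUpper() calls in the query above do run once per element, and each one allocates a new string. A minimal sketch of the usual in-memory alternative, assuming .NET Core 2.1 or later (where the string.Contains(string, StringComparison) overload exists). Person, CaseInsensitiveFilter and FilterByName are hypothetical names, and this only ignores case, not accents:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity standing in for the elements of People in the question.
public sealed class Person
{
    public Person(string name) => Name = name;
    public string Name { get; }
}

public static class CaseInsensitiveFilter
{
    // Case-insensitive substring match that never creates upper-cased copies of the names.
    // OrdinalIgnoreCase compares characters without depending on the current culture.
    public static List<Person> FilterByName(IEnumerable<Person> people, string filter) =>
        people
            .Where(p => p.Name.Contains(filter, StringComparison.OrdinalIgnoreCase))
            .ToList();
}
```

This overload is meant for in-memory collections; LINQ to SQL and Entity Framework may not be able to translate it into SQL.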

Please tell me if this is good practice, and what the easiest way to ignore accentuation is.


To ignore case and accents (diacritics) you can first define an extension method like this:

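    using System.Globalization;
    using System.Text;

    public static class StringExtensions
    {
        // Decomposes the string (Unicode Form D) and drops the combining accent marks.
        public static string RemoveDiacritics(this string s)
        {
            string normalizedString = s.Normalize(NormalizationForm.FormD);
            StringBuilder stringBuilder = new StringBuilder();

            foreach (char c in normalizedString)
            {
                // Keep everything except non-spacing marks, e.g. the acute accent in "é".
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                    stringBuilder.Append(c);
            }

            // Recompose what is left into the canonical Form C.
            return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
        }
    }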

(Modified from Ignoring accented letters in string comparison)

Now you can run your query:

string queryText = filter.ToUpper().RemoveDiacritics();

var result = from p in People
             where p.Name.ToUpper().RemoveDiacritics() == queryText
             select p;
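The query above still folds every p.Name each time it runs. If the same in-memory list is searched repeatedly, a sketch that folds each name once and reuses the result might look like this. AccentInsensitiveIndex, Fold, Build and Search are hypothetical names; Fold repeats the RemoveDiacritics logic so the block is self-contained, and Search uses Contains rather than == to match the substring search in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class AccentInsensitiveIndex
{
    // Same idea as RemoveDiacritics: decompose to Form D, drop combining marks,
    // recompose, then upper-case so the comparison also ignores case.
    public static string Fold(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC).ToUpperInvariant();
    }

    // Folds every name once, up front.
    public static List<(string Folded, string Original)> Build(IEnumerable<string> names) =>
        names.Select(n => (Folded: Fold(n), Original: n)).ToList();

    // Each search only folds the filter; the stored keys are reused.
    public static IEnumerable<string> Search(
        IReadOnlyList<(string Folded, string Original)> index, string filter)
    {
        string key = Fold(filter);
        return index.Where(e => e.Folded.Contains(key)).Select(e => e.Original);
    }
}
```

Letters such as ø, ß or đ have no canonical Unicode decomposition, so this approach (like RemoveDiacritics) leaves them unchanged.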

This is fine if you are just iterating over a collection in memory, but if you are using LINQ to SQL, avoid non-standard methods (including extension methods) in your LINQ query. LINQ to SQL cannot translate them into SQL (it throws a NotSupportedException), so the query cannot run on SQL Server and benefit from all its lovely performance optimizations.

Since there doesn't seem to be a standard way of ignoring accents within LINQ to SQL, in this case I would suggest changing the collation of the column you want to search to one that is case- and accent-insensitive (CI_AI).

With your example:

ALTER TABLE People ALTER COLUMN Name [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AI

Your query should now ignore accentuation and case.

Note that you will need to temporarily drop any unique constraints (and any indexes) on the column before running the statement above, and recreate them afterwards, e.g.

ALTER TABLE People DROP CONSTRAINT UQ_People_Name

Now your LINQ query would simply be:

var result = from p in People
             where p.Name == filter
             select p;

See related question here.


For accents, if you can neither update your DB schema nor load the entire list into memory, you can enumerate all the accented characters (here for the French language):

var result = from p in People
             where p.Name.ToLower()
                .Replace("à", "a")
                .Replace("â", "a")
                .Replace("ä", "a")
                .Replace("ç", "c")
                .Replace("é", "e")
                .Replace("è", "e")
                .Replace("ê", "e")
                .Replace("ë", "e")
                .Replace("î", "i")
                .Replace("ï", "i")
                .Replace("ô", "o")
                .Replace("ù", "u")
                .Replace("û", "u")
                .Replace("ü", "u").Contains(RemoveDiacritics(filter.ToLower()))
             select p;
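Writing the Replace chain by hand is error-prone, and the filter side above is cleaned with RemoveDiacritics rather than with the same mapping. A sketch that builds the identical chain from one dictionary with System.Linq.Expressions and applies the same mapping to the search term; AccentReplaceBuilder, FrenchAccents and NameContains are hypothetical names, and whether a given provider translates the resulting tree (string ToLower, Replace and Contains) is an assumption you should verify for LINQ to SQL or EF:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class AccentReplaceBuilder
{
    // Hypothetical mapping table with the same French pairs as the query above.
    public static readonly IReadOnlyDictionary<string, string> FrenchAccents =
        new Dictionary<string, string>
        {
            ["à"] = "a", ["â"] = "a", ["ä"] = "a", ["ç"] = "c",
            ["é"] = "e", ["è"] = "e", ["ê"] = "e", ["ë"] = "e",
            ["î"] = "i", ["ï"] = "i", ["ô"] = "o",
            ["ù"] = "u", ["û"] = "u", ["ü"] = "u",
        };

    static readonly MethodInfo ToLowerMethod =
        typeof(string).GetMethod(nameof(string.ToLower), Type.EmptyTypes);
    static readonly MethodInfo ReplaceMethod =
        typeof(string).GetMethod(nameof(string.Replace), new[] { typeof(string), typeof(string) });
    static readonly MethodInfo ContainsMethod =
        typeof(string).GetMethod(nameof(string.Contains), new[] { typeof(string) });

    // Builds x => selector(x).ToLower().Replace("à", "a")...Contains(normalizedTerm).
    public static Expression<Func<T, bool>> NameContains<T>(
        Expression<Func<T, string>> selector, string term)
    {
        // The term is lower-cased and mapped on the client with the same table.
        string normalizedTerm = FrenchAccents.Aggregate(
            term.ToLower(), (s, pair) => s.Replace(pair.Key, pair.Value));

        Expression body = Expression.Call(selector.Body, ToLowerMethod);
        foreach (var pair in FrenchAccents)
        {
            body = Expression.Call(body, ReplaceMethod,
                Expression.Constant(pair.Key), Expression.Constant(pair.Value));
        }
        body = Expression.Call(body, ContainsMethod, Expression.Constant(normalizedTerm));

        return Expression.Lambda<Func<T, bool>>(body, selector.Parameters);
    }
}
```

In a query it would be used as People.Where(AccentReplaceBuilder.NameContains<Person>(p => p.Name, filter)), where Person stands for whatever entity type People contains.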
                


Change the column's collation to one that is case- and accent-insensitive (CI_AI), for example:

ALTER TABLE dbo.MyTable 
ALTER COLUMN CharCol varchar(10) COLLATE Latin1_General_CI_AI NOT NULL;


Here is some code that allows comparison ignoring accentuation:

Ignoring accented letters in string comparison

I will have the decency of not copying the code, so that the author can get rep for his answer. Now, answering your question:

You'd get that piece of code and use it like this:

var result = from p in People
             where RemoveDiacritics(p.Name.ToUpper()).Contains(RemoveDiacritics(filter.ToUpper()))
             select p;

You can even turn that code into an extension method. I have :)


Building on Dunc's solution of changing the collation, here is a full tutorial on changing the collation of a whole database, which deals with indexes, keys, etc.:

https://www.codeproject.com/Articles/302405/The-Easy-way-of-changing-Collation-of-all-Database

(Just make sure to read all the comments first.)


If you use Linq-to-Entities, you could:

1. Create a SQL function that removes the diacritics by applying the collation SQL_Latin1_General_CP1253_CI_AI to the input string, for example:

CREATE FUNCTION [dbo].[RemoveDiacritics] (
@input varchar(max)
)   RETURNS varchar(max)

AS BEGIN
DECLARE @result VARCHAR(max);

select @result = @input collate SQL_Latin1_General_CP1253_CI_AI

return @result
END

2. Add it to the DB context (in this case ApplicationDbContext) by mapping it with the DbFunction attribute, for example:

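using System;
using Microsoft.AspNetCore.Identity.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore;

public class ApplicationDbContext : IdentityDbContext<CustomIdentityUser>
{
    // Maps this static method to the SQL function dbo.RemoveDiacritics.
    [DbFunction("RemoveDiacritics", "dbo")]
    public static string RemoveDiacritics(string input)
    {
        // EF translates calls inside LINQ queries to SQL; the C# body never runs there.
        throw new NotImplementedException("This method can only be used with LINQ.");
    }

    public ApplicationDbContext(DbContextOptions<ApplicationDbContext> options)
        : base(options)
    {
    }
}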

3. Use it in LINQ query, for example:

var query = await db.Users.Where(a => ApplicationDbContext.RemoveDiacritics(a.Name).Contains(ApplicationDbContext.RemoveDiacritics(filter))).ToListAsync();

where filter is the string you want to search for, in this case in the Name column of the Users table.


As of Entity Framework Core 5.0, you can specify the collation for part of a query on the fly with EF.Functions.Collate, instead of changing the column's collation.

So for your example, if I wanted to ignore both case and accents I would do something like:

(Note that we cannot use Contains here, but we can use the SQL LIKE operator through EF.Functions.Like.)

var result = from p in People
             where EF.Functions.Like(EF.Functions.Collate(p.Name, "Latin1_General_CI_AI"), $"%{filter}%")
             select p;

Latin1_General_CI_AI is case-insensitive (CI) and accent-insensitive (AI).
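One caveat with this query: interpolating filter directly into the pattern means any %, _ or [ typed by the user is treated as a LIKE wildcard. A minimal sketch of a hypothetical LikePattern helper that escapes them, assuming SQL Server LIKE semantics and a backslash escape character; it is meant to be combined with the EF.Functions.Like overload that takes an escape character:

```csharp
using System.Text;

public static class LikePattern
{
    // Escapes the characters SQL Server's LIKE treats specially (%, _, [)
    // plus the escape character itself, so the filter is matched literally.
    public static string Escape(string input, char escapeChar = '\\')
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c == '%' || c == '_' || c == '[' || c == escapeChar)
                sb.Append(escapeChar);
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Wraps the escaped filter in % wildcards for a substring ("contains") search.
    public static string Contains(string input, char escapeChar = '\\') =>
        "%" + Escape(input, escapeChar) + "%";
}
```

The where clause would then become EF.Functions.Like(EF.Functions.Collate(p.Name, "Latin1_General_CI_AI"), LikePattern.Contains(filter), "\\").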

More information on collations and case sensitivity in EF Core:

https://learn.microsoft.com/en-us/ef/core/miscellaneous/collations-and-case-sensitivity#explicit-collation-in-a-query
