I have a string in C# containing some data I need to extract based on certain conditions.
The string contains many tenders in the following form:
<TENDER> some words, don't know how many, may contain numbers and things like slashes (/) or whatever <DESCRIPTION> some more words and possibly other things like numbers or whatever describing the tender here </DESCRIPTION> some more words and possibly numbers and weird things </TENDER>
This string doesn't contain any nested <TENDER> tags; it's flat. The <DESCRIPTION> tags occur only once within each pair of <TENDER> tags.
I'm using <TENDER>(.+?)</TENDER> as the regex to split up the tenders, and it works fine. If this is wrong or stupid and you know a better way to write this, please let me know, as I have discovered I suck at regex.
My problem is that I now need to select a tender only if its description contains any word in a list of keywords (let's say for now I want to select a tender only if it contains either "concrete" or "brick" in the description).
So far the regex I have come up with looks like this, but I don't know what to put in the middle. Also I have a vague suspicion that this might return me some false positives.
<TENDER>(.+?)<DESCRIPTION>have no idea what to do here</DESCRIPTION>(.+?)</TENDER>
If any of you regex gurus could point me in the right direction I would be most appreciative.
Use:
<TENDER>([^<>]+?)<DESCRIPTION>[^<>]*?(brick|concrete)[^<>]*?</DESCRIPTION>([^<>]+?)</TENDER>
I am using [^<>] instead of . so that no part of the match can run past a tag boundary. This assumes the text itself never contains a literal < or >.
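In case it helps, here is a minimal sketch of how that pattern could be built from a keyword list and run in C#. The class and method names (TenderFilter, BuildRegex, SelectTenders) are made up for illustration. The \b word boundaries and RegexOptions.IgnoreCase are my own additions, not part of the pattern above: \b stops "brick" from matching inside "brickwork", and it assumes every keyword starts and ends with a letter or digit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class TenderFilter
{
    // Builds the tender pattern from a keyword list. Regex.Escape guards
    // against keywords that contain regex metacharacters.
    public static Regex BuildRegex(IEnumerable<string> keywords)
    {
        string alternation = string.Join("|", keywords.Select(Regex.Escape));
        string pattern =
            @"<TENDER>([^<>]+?)<DESCRIPTION>[^<>]*?\b(?:" + alternation +
            @")\b[^<>]*?</DESCRIPTION>([^<>]+?)</TENDER>";
        // IgnoreCase is an assumption: it also accepts "Concrete", "BRICK", etc.
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // Returns the full text of every tender whose description has a keyword.
    public static List<string> SelectTenders(string input, IEnumerable<string> keywords)
    {
        return BuildRegex(keywords)
            .Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Calling TenderFilter.SelectTenders(text, new[] { "concrete", "brick" }) should return only the tenders whose description mentions one of the keywords. A keyword that appears elsewhere in the tender but not in the description does not count.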
Use RegexOptions.IgnorePatternWhitespace because I have commented the pattern. It does not affect how the data is matched; it only lets you break the pattern across lines and comment it.
string pattern = @"
(?<=<TENDER>)                       # Look behind for TENDER
(?<TenderBefore>[^<>]*)             # Text before the description; [^<>] cannot cross a tag
<DESCRIPTION>
(?=[^<>]*\b(?:brick|concrete)\b)    # Look ahead for a keyword, inside this description only
(?<Description>[^<>]*)              # Put the data into the Description named capture group
</DESCRIPTION>
(?<TenderAfter>[^<>]*)              # Put text into the TenderAfter named capture group
(?=</TENDER>)                       # Look ahead for the closing TENDER tag";
After processing the matches, extract the data out of each match, such as:
string Tender = string.Format("{0}<DESCRIPTION>{1}</DESCRIPTION>{2}",
 myMatch.Groups["TenderBefore"].Value,
 myMatch.Groups["Description"].Value,
 myMatch.Groups["TenderAfter"].Value);
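For completeness, here is a rough end-to-end sketch of the loop around that string.Format call. The class and method names (TenderExtractor, Extract) are hypothetical. The pattern is a tightened variant in which every piece is limited to [^<>]*, so a match cannot spill into a neighbouring tender and the keyword has to sit inside the description. It assumes the text never contains a literal < or >. RegexOptions.IgnoreCase is my assumption about what is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TenderExtractor
{
    // Commented pattern; requires RegexOptions.IgnorePatternWhitespace.
    const string Pattern = @"
(?<=<TENDER>)                       # must start right after an opening TENDER tag
(?<TenderBefore>[^<>]*)             # text before the description
<DESCRIPTION>
(?=[^<>]*\b(?:brick|concrete)\b)    # a keyword must occur inside this description
(?<Description>[^<>]*)              # the description text
</DESCRIPTION>
(?<TenderAfter>[^<>]*)              # text after the description
(?=</TENDER>)                       # must end right before the closing TENDER tag";

    // Returns the inner text of each tender whose description has a keyword.
    public static List<string> Extract(string input)
    {
        var options = RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase;
        var results = new List<string>();
        foreach (Match myMatch in Regex.Matches(input, Pattern, options))
        {
            results.Add(string.Format("{0}<DESCRIPTION>{1}</DESCRIPTION>{2}",
                myMatch.Groups["TenderBefore"].Value,
                myMatch.Groups["Description"].Value,
                myMatch.Groups["TenderAfter"].Value));
        }
        return results;
    }
}
```

Because [^<>] is a negated character class, it also matches line breaks, so tenders that span several lines should work without RegexOptions.Singleline.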
HTH
Instead of regex, try using a proper DOM parsing library, such as the Html Agility Pack. It should work with any tags, even custom ones.