I want to search a string for occurrences of substrings that match a specific pattern, and then write out the unique matches as a comma-separated list.
The pattern to look for is "$FOR_something", as long as it does not fall inside "#LOOKING( )" or "/* */" and the _something part does not contain any other special characters.
For example, if I have this string,
"Not #LOOKING( $FOR_one $FOR_two) /* $FOR_three */ not $$$FOR_four or $FOR_four_b, but $FOR_five; and $FOR_six and not $FOR-seven or $FOR_five again"
The resulting list of found patterns I'm looking for from the above quoted string would be:
$FOR_five, $FOR_six
I started with this example:
import java.lang.StringBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class testIt {
    public static void main(String args[]) {
        String myWords = "Not #LOOKING( $FOR_one $FOR_two) /* $FOR_three */ not $$$FOR_four or $FOR_four_b, but $FOR_five; and $FOR_six and not $FOR-seven or $FOR_five again";
        StringBuffer sb = new StringBuffer(0);
        if ( myWords.toUpperCase().contains("$FOR") )
        {
            Pattern p = Pattern.compile("\\$FOR[\\_][a-zA-Z_0-9]+[\\s]*", Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(myWords);
            String myFors = "";
            while (m.find())
            {
                myFors = myWords.substring( m.start() , m.end() ).trim();
                if ( sb.length() == 0 ) sb = sb.append(myFors);
                else
                {
                    if ( !(sb.toString().contains(myFors))) sb = sb.append(", " + myFors );
                }
            }
        }
        System.out.println(sb);
    }
}
But it is not giving me what I want. What I want is:
$FOR_five, $FOR_six
Instead, I get all of the $FOR_somethings. I don't know how to ignore the occurrences inside /* */ or #LOOKING( ).
Any suggestions?
This problem goes beyond what a single regex handles comfortably, I would say. The $$$ cases can be handled with a negative lookbehind; the others are not as easy.
What I would recommend is to first use tokenizing or manual string parsing to discard unwanted data, such as /* ... */ or #LOOKING( ... ). However, these could also be removed with another regex, such as:
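// Strings are immutable, so the result has to be assigned back.
myWords = myWords.replaceAll("(?s)/\\*.*?\\*/", ""); // removes /* ... */, even if the comment contains '*' or '/'
myWords = myWords.replaceAll("#LOOKING\\([^)]*\\)", ""); // removes #LOOKING( ... ), assuming no nested parentheses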
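To make the stripping step concrete, here is a minimal sketch of it as a standalone program. The class name StripContext and the helper strip are hypothetical names chosen for illustration. It assumes that #LOOKING( ... ) groups never contain nested parentheses and that comments are not nested; if either can happen, manual parsing is the safer route. Note that replaceAll returns a new string, so its result has to be kept.

```java
class StripContext {
    // Hypothetical helper: removes /* ... */ comments and #LOOKING( ... ) groups.
    static String strip(String input) {
        // (?s) lets '.' cross line breaks; '.*?' stops at the first closing "*/".
        String withoutComments = input.replaceAll("(?s)/\\*.*?\\*/", "");
        // Assumption: no nested parentheses inside #LOOKING( ... ).
        return withoutComments.replaceAll("#LOOKING\\([^)]*\\)", "");
    }

    public static void main(String[] args) {
        String myWords = "Not #LOOKING( $FOR_one $FOR_two) /* $FOR_three */ not $$$FOR_four";
        // The $FOR_one, $FOR_two and $FOR_three occurrences are gone after stripping.
        System.out.println(strip(myWords));
    }
}
```

The whitespace around the removed parts is left in place, which is harmless here because the follow-up regex only cares about what directly surrounds each $FOR_ token.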
Once the context-based content has been stripped, you can use, e.g., the following regex:
(?<!\\$)\\$FOR_\\p{Alnum}+(?=[\\s;])
Explanation:
(?<!\\$) // Match iff not prefixed with $
\\$FOR_ // Matches $FOR_
\\p{Alnum}+ // Matches one or more alphanumericals [a-zA-Z0-9]
(?=[\\s;]) // Match iff followed by space or ';'
Note that the (?<!...) and (?=...) constructs are lookbehind and lookahead expressions. They are zero-width assertions: they are not included in the matched text and act only as prefix/suffix conditions in the sample above.
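Putting the two steps together, the following is a rough sketch of the whole approach rather than a definitive implementation. The class name FindFors and the method findFors are hypothetical. Two things differ from the regex shown above and are my own assumptions: the lookahead also accepts end of input (|$), so a token at the very end of the string is still found, and a LinkedHashSet is used for de-duplication instead of a substring check on the accumulated output, which would wrongly treat $FOR_five as already present once $FOR_fives has been seen. Matching here is case-sensitive.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFors {
    // Not preceded by '$'; followed by whitespace, ';' or (assumption) end of input.
    private static final Pattern FOR_PATTERN =
            Pattern.compile("(?<!\\$)\\$FOR_\\p{Alnum}+(?=[\\s;]|$)");

    static String findFors(String input) {
        // Step 1: discard /* ... */ comments and #LOOKING( ... ) groups
        // (assumes neither is nested).
        String stripped = input
                .replaceAll("(?s)/\\*.*?\\*/", "")
                .replaceAll("#LOOKING\\([^)]*\\)", "");

        // Step 2: collect matches; LinkedHashSet keeps first-seen order and drops repeats.
        Set<String> found = new LinkedHashSet<>();
        Matcher m = FOR_PATTERN.matcher(stripped);
        while (m.find()) {
            found.add(m.group());
        }
        return String.join(", ", found);
    }

    public static void main(String[] args) {
        String myWords = "Not #LOOKING( $FOR_one $FOR_two) /* $FOR_three */ not $$$FOR_four "
                + "or $FOR_four_b, but $FOR_five; and $FOR_six and not $FOR-seven or $FOR_five again";
        System.out.println(findFors(myWords)); // prints: $FOR_five, $FOR_six
    }
}
```

$FOR_four_b is rejected because \p{Alnum} does not include the underscore, so the lookahead sees "_" instead of whitespace or ';' and no shorter match can satisfy it either.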