Iterate over an array and test a regular expression against each value (Java)

Developer https://www.devze.com 2023-04-12 03:18 Source: Internet
I'm quite new to Java and I'm facing a situation I can't solve. I have some HTML code and I'm trying to run a regular expression against it and store all matches in an array. Here's my code:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    public class RegexMatch{
      boolean foundMatch = false;
      public String[] arrayResults;

      public String[] TestRegularExpression(String sourceCode, String pattern){
        try{
          Pattern regex = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
          Matcher regexMatcher = regex.matcher(sourceCode);
          while (regexMatcher.find()) {
            arrayResults[matches] = regexMatcher.group();
            matches ++;
          }
        } catch (PatternSyntaxException ex) {
          // Exception occurred
        }
        return arrayResults;
      }
    }

I'm passing a string containing HTML code and the regular expression pattern to extract all meta tags and store them in the array. Here's how I call the method:

RegexMatch regex = new RegexMatch();
regex.TestRegularExpression(sourceCode, "<meta.*?>");
String[] META_TAGS = regex.arrayResults;

Any hint? Thanks!


Firstly, parsing HTML with regular expressions is a bad idea. There are libraries that will parse the HTML into a DOM for you, and you should look into those.
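To illustrate one way this can go wrong, here is a minimal sketch. The input string is made up, and `firstMetaMatch` is a hypothetical helper written only for this example. It runs the pattern from the question against a tag whose attribute value contains a `>` character, and the lazy `.*?` stops at that character instead of at the end of the tag:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHtmlPitfall {
    // Hypothetical helper: returns the first match of the question's
    // pattern in the given text, or null if there is no match.
    static String firstMetaMatch(String html) {
        Matcher m = Pattern.compile("<meta.*?>", Pattern.CASE_INSENSITIVE).matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up input: the attribute value legitimately contains '>'.
        String html = "<meta name=\"description\" content=\"a > b\">";
        // Prints: <meta name="description" content="a >
        System.out.println(firstMetaMatch(html));
    }
}
```

The match is cut short inside the `content` attribute. A real HTML parser knows about quoting and would not be confused by this.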

Assuming you still want the "match multiple results" idea though, it seems to me that a List<E> of some form would be more useful, so you don't need to know the size up-front. You can also build that in the method itself, rather than having state. For example:

import java.util.*;
import java.util.regex.*;

public class Test
{
    public static void main(String[] args)
        throws PatternSyntaxException
    {
        // Want to get x10 and x5 from this
        String text = "x10 y5 x5 xyz";
        String pattern = "x\\d+";
        List<String> matches = getAllMatches(text, pattern);
        for (String match : matches) {
            System.out.println(match);
        }
    }

    public static List<String> getAllMatches(String text, String pattern)
        throws PatternSyntaxException
    {
        Pattern regex = Pattern.compile(pattern);
        List<String> results = new ArrayList<String>();
        Matcher regexMatcher = regex.matcher(text);
        while (regexMatcher.find()) {
            results.add(regexMatcher.group());
        }
        return results;
    }
}

It's possible that there's something similar to this within the Matcher class itself, but I can't immediately see it...
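For what it's worth, on Java 9 and later the `Matcher` class does have a `results()` method that returns a stream of `MatchResult` objects, one per match. The following is a sketch under that assumption (it will not compile on older JDKs). The class name `AllMatches` is just for illustration:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AllMatches {
    // Requires Java 9 or later for Matcher.results().
    static List<String> getAllMatches(String text, String pattern) {
        return Pattern.compile(pattern)
                .matcher(text)
                .results()                 // Stream<MatchResult>, one element per match
                .map(MatchResult::group)   // the matched text of each result
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> matches = getAllMatches("x10 y5 x5 xyz", "x\\d+");
        System.out.println(matches); // prints [x10, x5]

        // If an array is really needed, as in the question, convert at the end.
        String[] asArray = matches.toArray(new String[0]);
        System.out.println(asArray.length); // prints 2
    }
}
```

Either way, collecting into a list first and converting to an array only at the end avoids having to know the number of matches up front.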


With Jsoup, you could do something as simple as...

import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GetMeta {
   private static final String META_QUERY = "meta";

   public static List<String> parseForMeta(String htmlText) {
      Document jsDocument = Jsoup.parse(htmlText);
      Elements metaElements = jsDocument.select(META_QUERY);
      List<String> metaList = new ArrayList<String>();

      for (Element element : metaElements) {
         metaList.add(element.toString());
      }

      return metaList;
   }
}

For example, here is a complete program that downloads a page and prints its meta tags:

import java.io.IOException;
import java.net.*;
import java.util.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GetMeta {
   private static final String META_QUERY = "meta";
   private static final String MAIN_URL = "http://www.yahoo.com";

   public static void main(String[] args) {
      try {
         Scanner scan = new Scanner(new URL(MAIN_URL).openStream());
         StringBuilder sb = new StringBuilder();
         while (scan.hasNextLine()) {
            sb.append(scan.nextLine() + "\n");
         }

         List<String> metaList = parseForMeta(sb.toString());
         for (String metaStr : metaList) {
            System.out.println(metaStr);
         }

      } catch (MalformedURLException e) {
         e.printStackTrace();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   public static List<String> parseForMeta(String htmlText) {
      Document jsDocument = Jsoup.parse(htmlText);
      Elements metaElements = jsDocument.select(META_QUERY);
      List<String> metaList = new ArrayList<String>();

      for (Element element : metaElements) {
         metaList.add(element.toString());
      }

      return metaList;
   }
}