
Jsoup Element.attr() returning unexpected value (Android)

Developer https://www.devze.com 2023-04-10 13:38 Source: Internet

I'm using Jsoup to parse a webpage like this and turn it into two String arrays: one with each item's text value (to be displayed in a ListActivity) and one with the links. Some of these text values contain special characters that Jsoup has trouble parsing. At first I used:

Document doc = Jsoup.connect(URL).get();
maintable = doc.select(".kader").first();

to get the element for the table with the content. In another thread here, someone said it would work using Jsoup.parse(html), so I changed it to this:

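Document doc = Jsoup.connect(URL).get();
// Jsoup.parse(String) gives DOC an empty base URI, so "abs:" URLs cannot be resolved in it
Document DOC = Jsoup.parse(doc.html());
// select() never returns null; an empty result means nothing matched
if (!DOC.select(".kader").isEmpty()) {
    maintable = DOC.select(".kader").first();
}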

However, this did not seem to work either, so I left it as something to solve later (here, perhaps); it is not my main problem anyway. To get a String array of all the links displayed in the main content, I use this method:

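public String[] getTranslationLinks(){
    // alllinks: presumably the Elements (<a> tags) selected from maintable elsewhere
    String[] items = new String[alllinks.size()];
    Element tempelement;
    for (int i = 0; i < items.length; i++) {
        tempelement = alllinks.get(i);
        // "abs:" resolves href against the element's base URI; "" if it cannot
        items[i] = tempelement.attr("abs:href");
    }
    return items;
}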
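The goal described at the start, one String array of display texts for the ListActivity plus a parallel array of URLs, can be sketched without any library. This is a minimal JDK-only sketch, not the asker's code: Anchor, ListData and toParallelArrays are hypothetical names, and Anchor only stands in for a Jsoup Element. With Jsoup, the two values would come from element.text() and element.attr("abs:href"); the URL in main is the assumed resolved form of the href quoted in the question.

```java
import java.util.Arrays;
import java.util.List;

public class ParallelArraysSketch {

    // Hypothetical stand-in for a Jsoup <a> Element: just the two values we need.
    static final class Anchor {
        final String text;
        final String absHref;

        Anchor(String text, String absHref) {
            this.text = text;
            this.absHref = absHref;
        }
    }

    // Holder for the two parallel arrays: texts[i] is the label for links[i].
    static final class ListData {
        final String[] texts;
        final String[] links;

        ListData(String[] texts, String[] links) {
            this.texts = texts;
            this.links = links;
        }
    }

    // One pass over the anchors fills both arrays, so the indices always line up.
    static ListData toParallelArrays(List<Anchor> anchors) {
        String[] texts = new String[anchors.size()];
        String[] links = new String[anchors.size()];
        for (int i = 0; i < anchors.size(); i++) {
            Anchor a = anchors.get(i);
            texts[i] = a.text;     // with Jsoup: element.text()
            links[i] = a.absHref;  // with Jsoup: element.attr("abs:href")
        }
        return new ListData(texts, links);
    }

    public static void main(String[] args) {
        List<Anchor> anchors = Arrays.asList(
            new Anchor("Hoofdstuk 3, tekst A: Herakles de slaaf",
                       "http://www.latijnengrieks.com/vertaling.php?id=6518"));
        ListData data = toParallelArrays(anchors);
        System.out.println(data.texts[0] + " -> " + data.links[0]);
    }
}
```

Filling both arrays in a single loop keeps texts[i] and links[i] aligned, so a tap on list row i can simply open links[i].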

The debugger says that tempelement contains the proper element, but for some reason .attr("abs:href") doesn't return the link as requested. For instance, tempelement would contain:

<a href="./vertaling.php?id=6518" target="_top" title="">Hoofdstuk 3, tekst A: Herakles de slaaf</a>

But .attr("abs:href") returns an empty string ("").

Does anyone know a way to solve these problems?
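The question does not say which characters break or how the page is encoded, so the following rests on an assumption: a frequent cause of garbled special characters is bytes being decoded with a different charset than the one the page was written in. This JDK-only sketch shows that effect on a hypothetical string; CharsetMismatchSketch and decodeAs are illustrative names. If a charset mismatch is the cause, Jsoup lets you name the charset explicitly through Jsoup.parse(InputStream in, String charsetName, String baseUri).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// JDK-only illustration of a charset mismatch ("mojibake"); not Jsoup code.
public class CharsetMismatchSketch {

    // Decode raw bytes with the given charset.
    static String decodeAs(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Hypothetical text with two accented letters (e acute, e grave).
        String original = "H\u00e9rakl\u00e8s";
        // The bytes a UTF-8 encoded page would send for that text.
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String right = decodeAs(raw, StandardCharsets.UTF_8);      // matches the original
        String wrong = decodeAs(raw, StandardCharsets.ISO_8859_1); // each accented letter becomes two characters

        System.out.println(right.equals(original));              // prints true
        System.out.println(wrong.length() - original.length());  // prints 2
    }
}
```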


Your best bet is to create a small compilable and runnable bit of code that demonstrates your problem, an SSCCE. For instance, when I created my SSCCE based on my interpretation of your problem, it seemed to work. This was the code:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Kader {
   private static final String MAIN_URL = "http://www.latijnengrieks.com/categorie.php?id=120";
   private static final String ALL_LINKS = "a[href]";
   private static Element maintable;

   public static void main(String[] args) {
      Document jsDoc = null;

      try {
         jsDoc = Jsoup.connect(MAIN_URL).get();
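         maintable = jsDoc.select(".kader").first();
         if (maintable == null) {
            // first() returns null when no element has class "kader"
            System.err.println("No .kader element found at " + MAIN_URL);
            return;
         }

         Elements alllinks = maintable.select(ALL_LINKS);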

         String[] translationLinks = getTranslationLinks(alllinks);

         for (String tLink : translationLinks) {
            System.out.println(tLink);
         }

      } catch (IOException e) {
         e.printStackTrace();
      }
   }

   public static String[] getTranslationLinks(Elements alllinks){
      String[] items = new String[alllinks.size()];
      Element tempelement;
      for (int i = 0; i < items.length; i++) {
         tempelement = alllinks.get(i);

         items[i] = tempelement.attr("abs:href");
      }
      return items;
   }
}

And this was the output:

http://www.latijnengrieks.com/vertaling.php?id=5586
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=6342
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=6159
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5368
http://www.latijnengrieks.com/profiel.php?id=11
http://www.latijnengrieks.com/vertaling.php?id=5371
http://www.latijnengrieks.com/profiel.php?id=11
http://www.latijnengrieks.com/vertaling.php?id=5797
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=6310
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5799
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5776
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5861
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5521
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5622
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5692
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=6367
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5910
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=6011
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5940
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=6009
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5573
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5572
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5778
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5993
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5623
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5642
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=6000
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5798
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=5578
http://www.latijnengrieks.com/profiel.php?id=1
http://www.latijnengrieks.com/vertaling.php?id=6415
http://www.latijnengrieks.com/profiel.php?id=14
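A likely reason the SSCCE works while the question's code returns "" (an assumption, since the full Android code is not shown): the SSCCE selects links from the document returned by Jsoup.connect(...).get(), whose base URI is the page URL. In the question's second attempt, maintable comes from DOC = Jsoup.parse(doc.html()), and Jsoup.parse(String) creates a document with an empty base URI. A relative href such as ./vertaling.php?id=6518 then has nothing to be resolved against, and abs:href yields an empty string. This JDK-only sketch imitates that rule with java.net.URI; absoluteUrl is a hypothetical helper, not Jsoup's actual implementation. In Jsoup itself, the two-argument form Jsoup.parse(html, baseUri), for example Jsoup.parse(doc.html(), URL), keeps the base URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// JDK-only imitation of Jsoup's "abs:" prefix: resolve an href against a base URI,
// returning "" when no absolute URL can be produced. Not Jsoup's own code.
public class AbsHrefSketch {

    // Hypothetical helper mirroring attr("abs:href") semantics.
    static String absoluteUrl(String baseUri, String href) {
        try {
            URI ref = new URI(href);
            if (ref.isAbsolute()) {
                return ref.toString();            // already absolute: use as-is
            }
            if (baseUri == null || baseUri.isEmpty()) {
                return "";                        // no base URI: nothing to resolve against
            }
            URI resolved = new URI(baseUri).resolve(ref);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (URISyntaxException e) {
            return "";                            // base or href is not a valid URI
        }
    }

    public static void main(String[] args) {
        String href = "./vertaling.php?id=6518";  // the href quoted in the question

        // Base URI present, as for a document from Jsoup.connect(url).get()
        System.out.println(absoluteUrl("http://www.latijnengrieks.com/categorie.php?id=120", href));
        // prints http://www.latijnengrieks.com/vertaling.php?id=6518

        // Base URI empty, as for a document from Jsoup.parse(html)
        System.out.println("[" + absoluteUrl("", href) + "]");
        // prints []
    }
}
```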