
Sorting XML by Attribute and Modifying [closed]

Developer https://www.devze.com 2023-03-19 04:15 Source: web
Closed. This question needs to be more focused. It is not currently accepting answers.


Closed 7 years ago.


First, a little context: I use a collection manager, GCStar, to manage my digital library (comics/manga/films, you name it; it's pretty awesome except for books). The problem is that it doesn't let me sort the shelf by multiple keys, say by Series AND Episode number. Episodes added later always show up lower on the shelf, grouped by Series.

I puttered around the configuration and found that the .gcs file it uses is nothing but XML (which I am only cursorily familiar with). It goes like this:

<?xml version="1.0" encoding="UTF-8"?>
<collection type="GCTVepisodes" items="101" version="1.6.1">
 <information>
  <maxId>101</maxId>
 </information>

 <item
  id="1"
  name="The Vice President Doesn't Say Anything about the Possibility of 
        Him Being the Main Character"
  series="Baccano"
  season="1"
  episode="1"
  ...
 >
  <synopsis>It's 1931 and...</synopsis>
 ...
 </item>
 <item ...

The program, as far as I understand, always orders descending by ID (which increases whenever I add an episode). So I need a transform that will:

  1. Sort the XML by series, then season, then episode
  2. Change the id attributes accordingly, starting from 1 to end (also reset maxId based on that)
  3. Write it all out, in an identical format, to another XML file.

How do I do this (I'm not asking for cut-and-paste code, obviously)? Can XSLT do all of this? Should I look into a tree-based parser in Perl? It's the weekend and I'm on a Linux machine, so open-source solutions running on UNIX would be nice; something in Perl would probably be best. What should I read up on?

If I can't do this at home, well, I can always build a small DataStage job at the office, but I'd seriously like a simpler solution.

Thanks! :)


The maxId value (and the items attribute of collection) should not change, because you are only renumbering ids, not adding or removing items.
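That reasoning holds as long as the existing ids already run from 1 to N with no gaps. If items were ever deleted (an assumption about how a .gcs file can drift, not something shown in the question), the old maxId may be larger than the number of items and would no longer match the renumbered ids. Since the question asks to reset maxId, here is a minimal sketch using Python's standard library; `reset_counters` is a hypothetical helper name, and it assumes that `items` holds the number of `<item>` elements and `<maxId>` the largest id in use:

```python
import xml.etree.ElementTree as ET


def reset_counters(root: ET.Element) -> None:
    """Recompute the collection's bookkeeping after the items were renumbered.

    Assumption (not confirmed by GCStar documentation): the `items` attribute
    is the item count and <information>/<maxId> is the largest id in use.
    """
    items = root.findall("item")
    root.set("items", str(len(items)))
    max_id = root.find("information/maxId")
    if max_id is not None:
        # default=0 covers a collection with no items at all
        max_id.text = str(max((int(it.get("id", "0")) for it in items), default=0))
```

Run it on the root element after renumbering; for a collection whose ids are already 1..N it leaves maxId unchanged, which matches the answer above.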

If you want an easy command-line, open-source XSLT processor, use xsltproc from libxslt (the XSLT library built on libxml2). It is available on nearly every standard Linux distribution: http://xmlsoft.org/XSLT/xsltproc2.html

Use this command:

xsltproc transform.xsl input.xml > output.xml

And here is a solution, an XSLT stylesheet that should work ;-) (I had enough free time to code it):

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" encoding="UTF-8" indent="yes"/>

<xsl:strip-space elements="*"/>

<!-- Default: copy everything -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- remove items, they will be sorted and inserted later -->
<xsl:template match="/collection/item"/>

<!-- remove id -->
<xsl:template match="/collection/item/@id"/>

<xsl:template match="/collection">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
        <!-- copy and sort item by series, then season, then episode -->
        <xsl:for-each select="item">
            <xsl:sort select="@series" data-type="text"/>
            <xsl:sort select="@season" data-type="number"/>
            <xsl:sort select="@episode" data-type="number"/>
            <xsl:copy>
                <xsl:attribute name="id">
                    <xsl:value-of select="position()"/>
                </xsl:attribute>
                <!-- copy the rest of item -->
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:for-each>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

I used this simplified data to test it:

<?xml version="1.0" encoding="UTF-8"?>
<collection type="GCTVepisodes" items="5" version="1.6.1">
 <information>
  <maxId>5</maxId>
 </information>

 <item
  id="1"
  name="The Vice President Doesn't Say Anything about the Possibility of 
        Him Being the Main Character"
  series="Baccano"
  season="1"
  episode="1"/>

 <item
  id="2"
  name="blabla"
  series="c"
  season="1"
  episode="2"/>

 <item
  id="3"
  name="abc"
  series="Baccano"
  season="2"
  episode="1"/>  

 <item
  id="4"
  name="blabla2"
  series="Baccano"
  season="1"
  episode="2"/>

 <item
  id="5"
  name="first of c"
  series="c"
  season="1"
  episode="1"/>

</collection>

And this is the result (note how the order and the ids changed):

<?xml version="1.0" encoding="UTF-8"?>
<collection type="GCTVepisodes" items="5" version="1.6.1">
  <information>
    <maxId>5</maxId>
  </information>
  <item id="1" name="The Vice President Doesn't Say Anything about the Possibility of    Him Being the Main Character" series="Baccano" season="1" episode="1"/>
  <item id="2" name="blabla2" series="Baccano" season="1" episode="2"/>
  <item id="3" name="abc" series="Baccano" season="2" episode="1"/>
  <item id="4" name="first of c" series="c" season="1" episode="1"/>
  <item id="5" name="blabla" series="c" season="1" episode="2"/>
</collection>
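The question also asked whether a tree-based parser would work instead of XSLT. The asker mentioned Perl; the sketch below swaps in Python's standard-library xml.etree.ElementTree instead, and `sort_and_renumber` is a hypothetical helper, not part of this answer. It performs the same steps as the stylesheet and, on the simplified test data above, should give the same order and ids. It assumes every item has series, season and episode attributes and that season and episode hold integers. Python compares series names by code point, which may not match an XSLT processor's language-dependent text sort for mixed-case names.

```python
import xml.etree.ElementTree as ET


def sort_and_renumber(root: ET.Element) -> None:
    """Reorder the <item> children of <collection> and renumber their ids, in place."""
    items = root.findall("item")
    # Series as text, then season and episode as numbers, mirroring the
    # three xsl:sort elements in the stylesheet.
    items.sort(key=lambda it: (it.get("series", ""),
                               int(it.get("season", "0")),
                               int(it.get("episode", "0"))))
    # Detach the items, then re-append them in sorted order after the
    # remaining children (e.g. <information>).
    for it in items:
        root.remove(it)
    for new_id, it in enumerate(items, start=1):
        it.set("id", str(new_id))  # plays the role of position() in the XSLT
        root.append(it)
```

For a real file, something like `tree = ET.parse("input.xml")`, `sort_and_renumber(tree.getroot())`, then `tree.write("output.xml", encoding="UTF-8", xml_declaration=True)` should do; ElementTree does not keep the original line breaks inside start tags, so the layout will differ even though the content matches.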


You can get the same result using two simple templates:

  • In the first template (the identity template), we slightly "steer" the apply-templates mechanism so that item elements are processed in sorted order.
  • In the second template, we override each item element and use the position() function to recompute the id attribute. Every other attribute and descendant node is copied as is, except the item's original id.

XSLT 1.0 transform, tested with Saxon 6.5.5:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()[not(self::item)]"/>
            <xsl:apply-templates select="item">
                <xsl:sort select="@series"/>
                <xsl:sort select="@season" data-type="number"/>
                <xsl:sort select="@episode" data-type="number"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="item">
        <item id="{position()}">
            <xsl:apply-templates select="@*[name()!='id']|node()"/>
        </item>
    </xsl:template>

</xsl:stylesheet>

When the above transform is applied to the following input (therealmarv's test data, slightly modified to include child elements):

<collection type="GCTVepisodes" items="5" version="1.6.1">
    <information>
        <maxId>5</maxId>
    </information>
    <item
        id="1"
        name="The Vice President Doesn't Say Anything about the Possibility of 
        Him Being the Main Character"
        series="Baccano"
        season="1"
        episode="1">
        <synopsis>It's 1931 and...</synopsis>
    </item>
    <item
        id="2"
        name="blabla"
        series="c"
        season="1"
        episode="2">
        <synopsis>It's 1931 and...</synopsis>
    </item>
    <item
        id="3"
        name="abc"
        series="Baccano"
        season="2"
        episode="1">
        <synopsis>It's 1931 and...</synopsis>
    </item>
    <item
        id="4"
        name="blabla2"
        series="Baccano"
        season="1"
        episode="2">
        <synopsis>It's 1931 and...</synopsis>
    </item>
    <item
        id="5"
        name="first of c"
        series="c"
        season="1"
        episode="1">
        <synopsis>It's 1931 and...</synopsis>
    </item>
</collection>

The following output is produced:

<collection type="GCTVepisodes" items="5" version="1.6.1">
   <information>
      <maxId>5</maxId>
   </information>
   <item id="1" name="The Vice President Doesn't Say Anything about the Possibility of    Him Being the Main Character" series="Baccano" season="1" episode="1">
      <synopsis>It's 1931 and...</synopsis>
   </item>
   <item id="2" name="blabla2" series="Baccano" season="1" episode="2">
      <synopsis>It's 1931 and...</synopsis>
   </item>
   <item id="3" name="abc" series="Baccano" season="2" episode="1">
      <synopsis>It's 1931 and...</synopsis>
   </item>
   <item id="4" name="first of c" series="c" season="1" episode="1">
      <synopsis>It's 1931 and...</synopsis>
   </item>
   <item id="5" name="blabla" series="c" season="1" episode="2">
      <synopsis>It's 1931 and...</synopsis>
   </item>
</collection>
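The stylesheets here declare data-type="number" on the season and episode keys. That matters once a season has ten or more episodes: compared as text, "10" sorts before "2". A small Python illustration of the difference (plain string comparison stands in for a text sort here; an XSLT processor's text collation is language-dependent and may differ in other respects, such as letter case):

```python
# Attribute values are strings, just as they are in the XML.
episodes = ["1", "2", "10", "11", "3"]

# Text comparison goes character by character, so "10" sorts before "2".
as_text = sorted(episodes)

# Converting to int first gives the order that data-type="number" asks for.
as_number = sorted(episodes, key=int)

print(as_text)    # ['1', '10', '11', '2', '3']
print(as_number)  # ['1', '2', '3', '10', '11']
```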


Can XSLT do all this stuff?

Yes. See the sub-answers below.

  • Sort the XML by series, then season, then episode

Yes, you can use XSLT to sort XML.

http://www.w3schools.com/xsl/xsl_sort.asp

  • Change the id attributes accordingly, starting from 1 to end (also reset maxId based on that)

You can also use it to output any text you want, which means you can replace data (such as attribute values) in your transform.

It can also assign variables, branch (xsl:if, xsl:choose), loop (xsl:for-each), run XPath queries, and call a built-in function library, so it is more than powerful enough for what you want to do.

  • Write it all out into identical format to another XML

...which also means you can use it to write XML.

What should I read up on?

XSLT :)

The w3schools links (all the links above) were plenty for me, but I was already familiar with XML structure in general (attributes, elements, the root element, inner text, etc.). If you are familiar with that, just read up on XSLT.

You could also look into XmlStarlet, which is a tool designed to query and transform XML from the command line or shell scripts/batch files (though for transformations, it might use XSLT anyhow).


I would also do this with XSLT. However, my stylesheet is a little different from therealmarv's stylesheet.

This XML input:

<collection type="GCTVepisodes" items="101" version="1.6.1">
  <information>
    <maxId>101</maxId>
  </information>

  <item
    id="1"
    name="The Vice President Doesn't Say Anything about the Possibility of 
    Him Being the Main Character"
    series="Baccano"
    season="1"
    episode="2"
    >
    <synopsis>Blah blah blah...</synopsis>
    ...
  </item>

  <item
    id="2"
    name="some name"
    series="Alpha"
    season="2"
    episode="1"
    >
    <synopsis>Blah blah blah...</synopsis>
    ...
  </item>


  <item
    id="3"
    name="The Vice President Doesn't Say Anything about the Possibility of 
    Him Being the Main Character"
    series="Baccano"
    season="1"
    episode="1"
    >
    <synopsis>It's 1931 and...</synopsis>
    ...
  </item>

  <item
    id="4"
    name="some name"
    series="Alpha"
    season="1"
    episode="1"
    >
    <synopsis>Blah blah blah...</synopsis>
    ...
  </item>

</collection>

with this stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="collection">
    <collection>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="information"/>
      <xsl:apply-templates select="item">
        <xsl:sort select="@series" data-type="text"/>
        <xsl:sort select="@season" data-type="number"/>
        <xsl:sort select="@episode" data-type="number"/>
      </xsl:apply-templates>      
    </collection>
  </xsl:template>

  <xsl:template match="item">
    <item id="{position()}">
      <xsl:apply-templates select="@*[not(name()='id')]|node()"/>
    </item>
  </xsl:template>

</xsl:stylesheet>

produces this output:

<collection type="GCTVepisodes" items="101" version="1.6.1">
   <information>
      <maxId>101</maxId>
   </information>
   <item id="1" name="some name" series="Alpha" season="1" episode="1">
      <synopsis>Blah blah blah...</synopsis>
    ...
  </item>
   <item id="2" name="some name" series="Alpha" season="2" episode="1">
      <synopsis>Blah blah blah...</synopsis>
    ...
  </item>
   <item id="3" name="The Vice President Doesn't Say Anything about the Possibility of      Him Being the Main Character" series="Baccano" season="1" episode="1">
      <synopsis>It's 1931 and...</synopsis>
    ...
  </item>
   <item id="4" name="The Vice President Doesn't Say Anything about the Possibility of      Him Being the Main Character" series="Baccano" season="1" episode="2">
      <synopsis>Blah blah blah...</synopsis>
    ...
  </item>
</collection>

With the input from therealmarv's answer, it produces:

<collection type="GCTVepisodes" items="5" version="1.6.1">
   <information>
      <maxId>5</maxId>
   </information>
   <item id="1" name="The Vice President Doesn't Say Anything about the Possibility of      Him Being the Main Character" series="Baccano" season="1" episode="1"/>
   <item id="2" name="blabla2" series="Baccano" season="1" episode="2"/>
   <item id="3" name="abc" series="Baccano" season="2" episode="1"/>
   <item id="4" name="first of c" series="c" season="1" episode="1"/>
   <item id="5" name="blabla" series="c" season="1" episode="2"/>
</collection>
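Whichever stylesheet you pick, it may be worth checking the result before pointing GCStar at it. A minimal Python sketch (standard library only; `check_output` is a hypothetical helper, and season and episode are assumed to be integers) that reports whether the items are in series, season, episode order and whether the ids run 1..N:

```python
import xml.etree.ElementTree as ET


def check_output(xml_text):
    """Return a list of problems found in a transformed collection (empty if none)."""
    root = ET.fromstring(xml_text)
    items = root.findall("item")
    problems = []

    # The same composite key the stylesheets sort on.
    keys = [(it.get("series", ""), int(it.get("season", "0")), int(it.get("episode", "0")))
            for it in items]
    if keys != sorted(keys):
        problems.append("items are not in series/season/episode order")

    # After renumbering, ids should be exactly "1", "2", ..., "N" in document order.
    ids = [it.get("id") for it in items]
    expected = [str(n) for n in range(1, len(items) + 1)]
    if ids != expected:
        problems.append("ids are %s, expected %s" % (ids, expected))

    return problems
```

For example, `check_output(open("output.xml", "rb").read())` should return an empty list for a correctly transformed file; ET.fromstring accepts bytes as well as str.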
