First a little context: I use collection management software, GCStar, to manage my digital library (comics/manga/films, you name it - it's pretty awesome except for books). Problem is, it doesn't let me sort the shelf by multiple keys, say by Series AND Episode number. Episodes added later will always show up lower in the shelf, grouped by Series.
I poked around the configuration and found that the .gcs file it uses is nothing but XML (which I am only cursorily familiar with). It goes like this:
<?xml version="1.0" encoding="UTF-8"?>
<collection type="GCTVepisodes" items="101" version="1.6.1">
<information>
<maxId>101</maxId>
</information>
<item
id="1"
name="The Vice President Doesn't Say Anything about the Possibility of
Him Being the Main Character"
series="Baccano"
season="1"
episode="1"
...
>
<synopsis>It's 1931 and...</synopsis>
...
</item>
<item ...
The program, as far as I understand, will always order descending by ID (which increases whenever I add an episode). So I need a transform on this which will:
- Sort the XML by series, then season, then episode
- Change the id attributes accordingly, starting from 1 to end (also reset maxId based on that)
- Write it all out into identical format to another XML.
How do I do this (not talking about cut-pasting code here, obviously)? Can XSLT do all this stuff? Should I look into a tree-based parser in Perl? This is the weekend and I'm on a Linux machine, so open-source solutions running on UNIX would be nice - something in Perl would probably be best. What should I read up on?
If I can't do this at home, well, I can always design a small datastage job at the office, but I'd seriously like a simpler solution.
Thanks! :)
The maxId (and items in collection) value should not change, because you are not removing or adding ids.
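That claim holds when the existing ids already run from 1 to N without gaps, so renumbering is only a reshuffle. A minimal Python sketch to check that assumption on a file before transforming it (the function name `ids_are_contiguous` and the three-item sample are made up for illustration):

```python
import xml.etree.ElementTree as ET

def ids_are_contiguous(xml_text):
    """Return True if the item ids are exactly 1..N and maxId equals N."""
    root = ET.fromstring(xml_text)
    ids = sorted(int(item.get("id")) for item in root.findall("item"))
    max_id = int(root.findtext("information/maxId"))
    return ids == list(range(1, len(ids) + 1)) and max_id == len(ids)

# Hypothetical miniature collection in the same shape as the .gcs file.
sample = """<collection type="GCTVepisodes" items="3" version="1.6.1">
  <information><maxId>3</maxId></information>
  <item id="1" series="Baccano" season="1" episode="1"/>
  <item id="2" series="c" season="1" episode="2"/>
  <item id="3" series="Baccano" season="2" episode="1"/>
</collection>"""

print(ids_are_contiguous(sample))
```

If this returns False (for example because items were deleted at some point), maxId would presumably need to be set to the item count after renumbering.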
If you want an easy command-line, open-source XSLT processor, use xsltproc from libxml2/libxslt. It is available on nearly every standard Linux distribution. http://xmlsoft.org/XSLT/xsltproc2.html
Use this command: xsltproc transform.xsl input.xml > output.xml
And here is a solution, the XSLT transform stylesheet, that should work ;-) (I had enough free time to code it)
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- Default: copy everything -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- remove items, they will be sorted and inserted later -->
<xsl:template match="/collection/item"/>
<!-- remove id -->
<xsl:template match="/collection/item/@id"/>
<xsl:template match="/collection">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<!-- copy and sort item by series, then season, then episode -->
<xsl:for-each select="item">
<xsl:sort select="@series" data-type="text"/>
<xsl:sort select="@season" data-type="number"/>
<xsl:sort select="@episode" data-type="number"/>
<xsl:copy>
<xsl:attribute name="id">
<xsl:value-of select="position()"/>
</xsl:attribute>
<!-- copy the rest of item -->
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I used this simplified data to test it:
<?xml version="1.0" encoding="UTF-8"?>
<collection type="GCTVepisodes" items="5" version="1.6.1">
<information>
<maxId>5</maxId>
</information>
<item
id="1"
name="The Vice President Doesn't Say Anything about the Possibility of
Him Being the Main Character"
series="Baccano"
season="1"
episode="1"/>
<item
id="2"
name="blabla"
series="c"
season="1"
episode="2"/>
<item
id="3"
name="abc"
series="Baccano"
season="2"
episode="1"/>
<item
id="4"
name="blabla2"
series="Baccano"
season="1"
episode="2"/>
<item
id="5"
name="first of c"
series="c"
season="1"
episode="1"/>
</collection>
And this is the result (look at how the position and id changed):
<?xml version="1.0" encoding="UTF-8"?>
<collection type="GCTVepisodes" items="5" version="1.6.1">
<information>
<maxId>5</maxId>
</information>
<item id="1" name="The Vice President Doesn't Say Anything about the Possibility of Him Being the Main Character" series="Baccano" season="1" episode="1"/>
<item id="2" name="blabla2" series="Baccano" season="1" episode="2"/>
<item id="3" name="abc" series="Baccano" season="2" episode="1"/>
<item id="4" name="first of c" series="c" season="1" episode="1"/>
<item id="5" name="blabla" series="c" season="1" episode="2"/>
</collection>
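One detail of the stylesheet above: data-type="number" on the season and episode keys matters as soon as a series has ten or more episodes, because the default sort in XSLT 1.0 compares the values as text. A small Python illustration of the difference (Python string comparison stands in for a text sort here; the episode list is invented):

```python
# Episode numbers as they appear in the attributes: always strings.
episodes = ["1", "2", "10", "11", "3"]

as_text = sorted(episodes)              # character-by-character comparison
as_numbers = sorted(episodes, key=int)  # comparison by numeric value

print(as_text)     # ['1', '10', '11', '2', '3']
print(as_numbers)  # ['1', '2', '3', '10', '11']
```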
You can get the same result using two simple templates:
- In the first template (the identity) we can just slightly "orient" the apply-templates mechanism in order to sort item elements.
- In the second template we can override each item element and use the position() function to recompute the id attribute. We leave every other descendant node as is, but exclude the original id of the item.
XSLT 1.0 transform tested with Saxon 6.5.5
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()[not(self::item)]"/>
<xsl:apply-templates select="item">
<xsl:sort select="@series"/>
<xsl:sort select="@season" data-type="number"/>
<xsl:sort select="@episode" data-type="number"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="item">
<item id="{position()}">
<xsl:apply-templates select="@*[name()!='id']|node()"/>
</item>
</xsl:template>
</xsl:stylesheet>
When the above transform is applied to the following input (@therealmarv's sample, slightly modified to include child elements):
<collection type="GCTVepisodes" items="5" version="1.6.1">
<information>
<maxId>5</maxId>
</information>
<item
id="1"
name="The Vice President Doesn't Say Anything about the Possibility of
Him Being the Main Character"
series="Baccano"
season="1"
episode="1">
<synopsis>It's 1931 and...</synopsis>
</item>
<item
id="2"
name="blabla"
series="c"
season="1"
episode="2">
<synopsis>It's 1931 and...</synopsis>
</item>
<item
id="3"
name="abc"
series="Baccano"
season="2"
episode="1">
<synopsis>It's 1931 and...</synopsis>
</item>
<item
id="4"
name="blabla2"
series="Baccano"
season="1"
episode="2">
<synopsis>It's 1931 and...</synopsis>
</item>
<item
id="5"
name="first of c"
series="c"
season="1"
episode="1">
<synopsis>It's 1931 and...</synopsis>
</item>
</collection>
The following output is produced:
<collection type="GCTVepisodes" items="5" version="1.6.1">
<information>
<maxId>5</maxId>
</information>
<item id="1" name="The Vice President Doesn't Say Anything about the Possibility of Him Being the Main Character" series="Baccano" season="1" episode="1">
<synopsis>It's 1931 and...</synopsis>
</item>
<item id="2" name="blabla2" series="Baccano" season="1" episode="2">
<synopsis>It's 1931 and...</synopsis>
</item>
<item id="3" name="abc" series="Baccano" season="2" episode="1">
<synopsis>It's 1931 and...</synopsis>
</item>
<item id="4" name="first of c" series="c" season="1" episode="1">
<synopsis>It's 1931 and...</synopsis>
</item>
<item id="5" name="blabla" series="c" season="1" episode="2">
<synopsis>It's 1931 and...</synopsis>
</item>
</collection>
Can XSLT do all this stuff?
Yes. See the sub-answers below.
- Sort the XML by series, then season, then episode
Yes, you can use XSLT to sort XML.
http://www.w3schools.com/xsl/xsl_sort.asp
- Change the id attributes accordingly, starting from 1 to end (also reset maxId based on that)
You can also use it to write any text you want, which means you can replace data in your transform.
It can also assign variables, evaluate conditionals, loop, run XPath queries, and call a built-in function library, so it is more than powerful enough for what you want to do.
- Write it all out into identical format to another XML
...which also means you can use it to write XML.
What should I read up on?
XSLT :)
The w3schools links (all the links above) were plenty for me, but I was already familiar with the XML structure, in general (attributes, elements, root element, inner text, etc). If you are familiar with that, just read up on XSLT.
You could also look into XmlStarlet, which is a tool designed to query and transform XML from the command line or shell scripts/batch files (though for transformations, it might use XSLT anyhow).
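For the tree-based-parser route the question mentions, here is a minimal sketch in Python rather than Perl, using only the standard library. It is an illustration under assumptions, not how GCStar itself works: the function name `sort_and_renumber` and the sample data are invented, and Python's plain string comparison may order series names differently from an XSLT processor's text sort (for instance with mixed case).

```python
import xml.etree.ElementTree as ET

def sort_and_renumber(root):
    """Sort <item> children by series, season, episode and renumber ids from 1."""
    items = root.findall("item")
    # Season and episode are compared as integers so that 10 sorts after 2.
    items.sort(key=lambda it: (it.get("series"),
                               int(it.get("season")),
                               int(it.get("episode"))))
    # Detach every item, then re-attach them in sorted order with new ids.
    for item in root.findall("item"):
        root.remove(item)
    for new_id, item in enumerate(items, start=1):
        item.set("id", str(new_id))
        root.append(item)
    return root

# Hypothetical sample in the same shape as the .gcs file.
sample = """<collection type="GCTVepisodes" items="5" version="1.6.1">
  <information><maxId>5</maxId></information>
  <item id="1" name="first" series="Baccano" season="1" episode="1"/>
  <item id="2" name="blabla" series="c" season="1" episode="2"/>
  <item id="3" name="abc" series="Baccano" season="2" episode="1"/>
  <item id="4" name="blabla2" series="Baccano" season="1" episode="2"/>
  <item id="5" name="first of c" series="c" season="1" episode="1"/>
</collection>"""

root = sort_and_renumber(ET.fromstring(sample))
for item in root.findall("item"):
    print(item.get("id"), item.get("series"), item.get("season"), item.get("episode"))
```

To work on a real file, ET.parse("input.gcs") followed by tree.write("output.gcs", encoding="UTF-8", xml_declaration=True) should do, with both file names being placeholders. Note that ElementTree will not reproduce the original one-attribute-per-line layout.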
I would also do this with XSLT. However, my stylesheet is a little different from therealmarv's stylesheet.
This XML input:
<collection type="GCTVepisodes" items="101" version="1.6.1">
<information>
<maxId>101</maxId>
</information>
<item
id="1"
name="The Vice President Doesn't Say Anything about the Possibility of
Him Being the Main Character"
series="Baccano"
season="1"
episode="2"
>
<synopsis>Blah blah blah...</synopsis>
...
</item>
<item
id="2"
name="some name"
series="Alpha"
season="2"
episode="1"
>
<synopsis>Blah blah blah...</synopsis>
...
</item>
<item
id="3"
name="The Vice President Doesn't Say Anything about the Possibility of
Him Being the Main Character"
series="Baccano"
season="1"
episode="1"
>
<synopsis>It's 1931 and...</synopsis>
...
</item>
<item
id="4"
name="some name"
series="Alpha"
season="1"
episode="1"
>
<synopsis>Blah blah blah...</synopsis>
...
</item>
</collection>
with this stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="collection">
<collection>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="information"/>
<xsl:apply-templates select="item">
<xsl:sort select="@series" data-type="text"/>
<xsl:sort select="@season" data-type="number"/>
<xsl:sort select="@episode" data-type="number"/>
</xsl:apply-templates>
</collection>
</xsl:template>
<xsl:template match="item">
<item id="{position()}">
<xsl:apply-templates select="@*[not(name()='id')]|node()"/>
</item>
</xsl:template>
</xsl:stylesheet>
produces this output:
<collection type="GCTVepisodes" items="101" version="1.6.1">
<information>
<maxId>101</maxId>
</information>
<item id="1" name="some name" series="Alpha" season="1" episode="1">
<synopsis>Blah blah blah...</synopsis>
...
</item>
<item id="2" name="some name" series="Alpha" season="2" episode="1">
<synopsis>Blah blah blah...</synopsis>
...
</item>
<item id="3" name="The Vice President Doesn't Say Anything about the Possibility of Him Being the Main Character" series="Baccano" season="1" episode="1">
<synopsis>It's 1931 and...</synopsis>
...
</item>
<item id="4" name="The Vice President Doesn't Say Anything about the Possibility of Him Being the Main Character" series="Baccano" season="1" episode="2">
<synopsis>Blah blah blah...</synopsis>
...
</item>
</collection>
With the input from therealmarv's answer, it produces:
<collection type="GCTVepisodes" items="5" version="1.6.1">
<information>
<maxId>5</maxId>
</information>
<item id="1" name="The Vice President Doesn't Say Anything about the Possibility of Him Being the Main Character" series="Baccano" season="1" episode="1"/>
<item id="2" name="blabla2" series="Baccano" season="1" episode="2"/>
<item id="3" name="abc" series="Baccano" season="2" episode="1"/>
<item id="4" name="first of c" series="c" season="1" episode="1"/>
<item id="5" name="blabla" series="c" season="1" episode="2"/>
</collection>