开发者

Extracting data from Postgres database to XML via Perl - encoding issue

开发者 https://www.devze.com 2023-01-29 12:14 出处:网络
I have a Postgres database encoded in UTF-8. I am using Perl with the following modules: use DBI use XML::Generator::DBI

I have a Postgres database encoded in UTF-8. I am using Perl with the following modules:

use DBI
use XML::Generator::DBI
use XML::SAX::Writer

to extract data in the Postgres database to an XML file using a query, i.e.:

use DBI;
use XML::Generator::DBI;
use XML::SAX::Writer;

my $dbh = DBI->connect("dbi:Pg:dbname=postgres;host=MYHOST;port=2278",
                      username,
                      password,
                      {RaiseError => 1},
                     );

my $handler = XML::SAX::Writer->new( Output => 'foo.xml' );

my $generator = XML::Generator::DBI->new(
 Handler => $handler,
 dbh     => $dbh,
 Indent  => 1,
);

$select = qq(

!!!!SQL QUERY!!!!!

);

$generator->execute(
                     $select, 
                     undef,
                     RootElement => 'root',
                        );

This works well and I get a valid XML document as a resu开发者_Go百科lt. My problem is that some of the data in the database is binary - i.e. there are non UTF-8 characters in there. When this occurs, XML::Generator::DBI detects this and outputs the data as follows in the XML file:

<foo dbi:encoding='base64'>VGhpcyByZXBvcnQgbGlzdHMgYWxsIGZpbGVzIGhhdmluZyBhY2Nlc3NlcyB0byBkYXRhYmFzZSB0
YWJsZXMuDQpJdCBwcm92aWRlcyB0aGUgZm9sbG93aW5nIGluZm9ybWF0aW9uOiAgRmlsZSBmdWxs
IG5hbWUsIGFjY2Vzc2VkIHRhYmxl
</foo>

The namespace is also bound to the URL http://axkit.org/NS/xml-generator-dbi. This is correct behaviour according to the module documentation, but what I want to know is: is it possible to somehow transform this base64 encoded string into UTF-8 so that i can actually use it?

I'm no Perl expert at all, nor Postgres for that matter, so go easy! many thanks.


I imagine you can decode the Base64 using MIME::Base64... But it depends what the data is. If there are non-UTF-8 characters in, what are they? You need to know what to transform from.


If by "binary", you mean that the column type in PostgreSQL is bytea, then you could either:

  1. Set bytea_output=escape on $dbh, like so:

    $dbh->do('SET bytea_output=escape');

  2. Cast the column with type bytea to text in your query:

    SELECT bytea_column::text FROM ...

I doubt either of those will be exactly what you want to end up with. Hopefully that will get you going in the right direction.

0

精彩评论

暂无评论...
验证码 换一张
取 消