How to set the value of a composite variable field using dynamic SQL

Given this type:

-- Just for testing purposes:
CREATE TYPE testType as (name text)

I can get the value of a field dynamically with this function:

CREATE OR REPLACE FUNCTION get_field(object anyelement, field text) RETURNS text as
$BODY$
DECLARE
    value text;
BEGIN
    EXECUTE 'SELECT $1."' || field || '"'
      USING object
       INTO value;

    return value;
END;
$BODY$
LANGUAGE plpgsql

Calling get_field('(david)'::testType, 'name') works as expected, returning "david".

But how can I set a value of a field in a composite type? I've tried these functions:

CREATE OR REPLACE FUNCTION set_field_try1(object anyelement, field text, value text)
RETURNS anyelement
as
$BODY$
DECLARE
    value text;
BEGIN
    EXECUTE '$1."' || field || '" := $2'
      USING object, value;

    return object;
END;
$BODY$
LANGUAGE plpgsql

CREATE OR REPLACE FUNCTION set_field_try2(object anyelement, field text, value text)
RETURNS anyelement
as
$BODY$
DECLARE
    value text;
BEGIN
    EXECUTE 'SELECT $1 INTO $2."' || field || '"'
      USING value, object;

    return object;
END;
$BODY$
LANGUAGE plpgsql

CREATE OR REPLACE FUNCTION set_field_try3(object anyelement, field text, value text)
RETURNS anyelement
as
$BODY$
DECLARE
    value text;
BEGIN
    EXECUTE 'BEGIN $1."' || field || '" := $2; SELECT $1; END;'
       INTO object
      USING value, object;

    return object;
END;
$BODY$
LANGUAGE plpgsql

and some variations. Calling set_field_tryX doesn't work. I always get "ERROR: syntax error at or near...". How can I accomplish this?

Notes:

  • The parameter is anyelement and the field can be any field in the composite type. I can't just use object.name.
  • I'm concerned about SQL injection. Any advice on this would be appreciated, but it is not my main question (see the sketch below).
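
Regarding the injection concern: a minimal sketch, assuming PostgreSQL 9.1 or later where format() with %I is available; it quotes the field name as an identifier instead of concatenating it into the query string:

EXECUTE format('SELECT ($1).%I', field)
  USING object
   INTO value;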


Faster with hstore

Since Postgres 9.0, with the additional module hstore installed in your database there is a very simple and fast solution with the #= operator that ...

replace[s] fields in record with matching values from hstore.

To install the module:

CREATE EXTENSION hstore;

Examples:

SELECT my_record #= '"field"=>"value"'::hstore;  -- with string literal
SELECT my_record #= hstore(field, value);        -- with values

Values have to be cast to text and back, obviously.
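
For instance, a complete round trip with the testType from the question (a minimal sketch, assuming the hstore extension is installed):

SELECT '(david)'::testType #= hstore('name', 'peter');  -- returns (peter)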

Example plpgsql functions with more details:

  • Endless loop in trigger function
  • Assign to NEW by key in a Postgres trigger

Now works with json / jsonb, too!

There are similar solutions with json (pg 9.3+) or jsonb (pg 9.4+):

SELECT json_populate_record(my_record, json_build_object('key', 'new-value'));

The functionality was undocumented, but it's official since Postgres 13. The manual:

However, if base isn't NULL then the values it contains will be used for unmatched columns.

So you can take any existing row and fill arbitrary fields (overwriting what's in it).
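
For example, overwriting one field of an existing row value (a sketch reusing the testType from the question; json_populate_record exists since 9.3):

SELECT json_populate_record('(david)'::testType, '{"name": "peter"}'::json);  -- returns (peter)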

Major advantages of json vs hstore:

  • works with stock Postgres so you don't need an additional module.
  • also works for nested array and composite types.

Minor disadvantage: a bit slower.

See @Geir's added answer for details.

Without hstore and json

If you are on an older version or cannot install the additional module hstore or cannot assume it's installed, here is an improved version of what I posted previously. Still slower than the hstore operator, though:

CREATE OR REPLACE FUNCTION f_setfield(INOUT _comp_val anyelement
                                          , _field text, _val text)
  RETURNS anyelement
  LANGUAGE plpgsql STABLE AS
$func$
BEGIN

EXECUTE 'SELECT ' || array_to_string(ARRAY(
      SELECT CASE WHEN attname = _field
                THEN '$2'
                ELSE '($1).' || quote_ident(attname)
             END AS fld
      FROM   pg_catalog.pg_attribute
      WHERE  attrelid = pg_typeof(_comp_val)::text::regclass
      AND    attnum > 0
      AND    attisdropped = FALSE
      ORDER  BY attnum
      ), ',')
USING  _comp_val, _val
INTO   _comp_val;

END
$func$;

Call:

CREATE TEMP TABLE t( a int, b text);  -- Composite type for testing
SELECT f_setfield(NULL::t, 'a', '1');
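
Or, with an actual row value instead of NULL:

SELECT f_setfield(ROW(1, 'old')::t, 'b', 'new');  -- returns (1,new)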

Notes

  • An explicit cast of the value _val to the target data type is not necessary; a string literal in the dynamic query would be coerced automatically, obviating the subquery on pg_type. But I took it one step further:

  • Replace quote_literal(_val) with direct value insertion via the USING clause. Saves one function call and two casts, and is safer anyway. text is coerced to the target type automatically in modern PostgreSQL. (Did not test with versions before 9.1.)

  • array_to_string(ARRAY()) is faster than string_agg().

  • No variables needed, no DECLARE. Fewer assignments.

  • No subquery in the dynamic SQL. ($1).field is faster.

  • pg_typeof(_comp_val)::text::regclass
    does the same as
    (SELECT typrelid FROM pg_catalog.pg_type WHERE oid = pg_typeof($1)::oid)
    for valid composite types, just faster.
    This last modification is built on the assumption that pg_type.typname is always identical to the associated pg_class.relname for registered composite types, and the double cast can replace the subquery. I ran this test in a big database to verify, and it came up empty as expected:

    SELECT *
    FROM   pg_catalog.pg_type t
    JOIN   pg_namespace  n ON n.oid = t.typnamespace
    WHERE  t.typrelid > 0  -- exclude non-composite types
    AND    t.typrelid IS DISTINCT FROM
          (quote_ident(n.nspname ) || '.' || quote_ident(typname))::regclass
  • The use of an INOUT parameter obviates the need for an explicit RETURN. This is just a notational shortcut. Pavel won't like it; he prefers an explicit RETURN statement ...

Everything put together, this is twice as fast as the previous version.


Original (outdated) answer:

The result is a version that's ~ 2.25 times faster. But I probably couldn't have done it without building on Pavel's second version.

In addition, this version avoids most of the casting to text and back by doing everything within a single query, so it should be much less error prone.
Tested with PostgreSQL 9.0 and 9.1.

CREATE FUNCTION f_setfield(_comp_val anyelement, _field text, _val text)
  RETURNS anyelement
  LANGUAGE plpgsql STABLE AS
$func$
DECLARE
   _list text;
BEGIN
_list := (
   SELECT string_agg(x.fld, ',')
   FROM  (
      SELECT CASE WHEN a.attname = $2
              THEN quote_literal($3) || '::'|| (SELECT quote_ident(typname)
                                                FROM   pg_catalog.pg_type
                                                WHERE  oid = a.atttypid)
              ELSE quote_ident(a.attname)
             END AS fld
      FROM   pg_catalog.pg_attribute a 
      WHERE  a.attrelid = (SELECT typrelid
                           FROM   pg_catalog.pg_type
                           WHERE  oid = pg_typeof($1)::oid)
      AND    a.attnum > 0
      AND    a.attisdropped = false
      ORDER  BY a.attnum
      ) x
   );

EXECUTE 'SELECT ' || _list || ' FROM  (SELECT $1.*) x'
USING  $1
INTO   $1;

RETURN $1;
END
$func$;


I wrote a second version of the setfield function. It works on Postgres 9.1; I didn't test it on older versions. It's no miracle performance-wise, but it is more robust and about 8 times faster than the previous one.

CREATE OR REPLACE FUNCTION public.setfield2(anyelement, text, text)
 RETURNS anyelement
 LANGUAGE plpgsql
AS $function$
DECLARE 
  _name text;
  _values text[];
  _value text;
  _attnum int;
BEGIN
  FOR _name, _attnum
     IN SELECT a.attname, a.attnum
          FROM pg_catalog.pg_attribute a 
         WHERE a.attrelid = (SELECT typrelid
                               FROM pg_type
                              WHERE oid = pg_typeof($1)::oid)
           AND a.attnum > 0 
  LOOP
    IF _name = $2 THEN
      _value := $3;
    ELSE
      EXECUTE 'SELECT (($1).' || quote_ident(_name) || ')::text' INTO _value USING $1;
    END IF;
    _values[_attnum] :=  COALESCE('"' || replace(replace(_value, '"', '""'), '''', '''''') || '"', ''); 
  END LOOP;
  EXECUTE 'SELECT (' || quote_ident(pg_typeof($1)::text) || ' ''(' || array_to_string(_values,',') || ')'').*' INTO $1; 
  RETURN $1;
END;
$function$;
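
A hypothetical call, reusing the testType from the question:

SELECT setfield2('(david)'::testType, 'name', 'peter');  -- expected: (peter)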


UPDATE/caution: Erwin points out that this is currently undocumented, and the manual indicates it should not be possible to alter records this way.

Use hstore or Pavel's solution instead.

This simple json-based solution is almost as fast as hstore and requires only Postgres 9.3 or newer. It should be a good option if you can't use the hstore extension, and the performance difference should be negligible. Benchmarks: https://stackoverflow.com/a/28673542/1914376

a) We can do it inline with a cast/concatenation; the json cast requires Postgres 9.3:

SELECT json_populate_record( 
     record
    , ('{"'||'key'||'":"'||'new-value'||'"}')::json
);

b) Or inline, using functions available from Postgres 9.4:

SELECT json_populate_record (
      record
     ,json_object(ARRAY['key', 'new-value'])
);

Note: I chose json_object(ARRAY[key, value]) since it was a bit faster than json_build_object(key, value).
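
For reference, both calls build the same one-pair JSON object:

SELECT json_object(ARRAY['key', 'new-value']);   -- {"key" : "new-value"}
SELECT json_build_object('key', 'new-value');    -- {"key" : "new-value"}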

To hide the casting details, you can wrap a) in a function with little overhead.

CREATE FUNCTION x.setfield_json(in_element anyelement, key text, value text)
    RETURNS anyelement AS
$BODY$
    SELECT json_populate_record( in_element, ('{"'||key||'":"'||value||'"}')::json);
$BODY$ LANGUAGE sql;


"SELECT INTO" outside plpgsql (in dynamic SQL context) has different sense than you expect - it store a result of query to table.

Modification of any field is possible, but it is not simple:

CREATE OR REPLACE FUNCTION public.setfield(a anyelement, text, text)
RETURNS anyelement
LANGUAGE plpgsql
AS $function$
begin
  create temp table aux as select $1.*;
  execute 'update aux set ' || quote_ident($2) || ' = ' || quote_literal($3);
  select into $1 * from aux;
  drop table aux;
  return $1;
end;
$function$

But this code is not very efficient - it is not possible to write this well in plpgsql. You could look for a C library that does this.
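
For example, a call against the testType from the question would look like this (a sketch; the temp table aux must not already exist in the session):

SELECT setfield('(david)'::testType, 'name', 'peter');  -- expected: (peter)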


Test setup and benchmarks v2

Erwin encouraged others to reproduce his benchmark in this thread (https://stackoverflow.com/a/7782839/1914376), so I modified his code to use synthetic test data and added both the hstore solution and the json solution from my answer (plus a json solution by Pavel found in another thread). The benchmark is now run as one query, making it easier to capture the results.

DROP SCHEMA IF EXISTS x CASCADE;
CREATE SCHEMA x;


-- Pavel 1:
--------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION x.setfield(anyelement, text, text)
RETURNS anyelement
LANGUAGE plpgsql
AS $function$
begin
  create temp table aux as select $1.*;
  execute 'update aux set ' || quote_ident($2) || ' = ' || quote_literal($3);
  select into $1 * from aux;
  drop table aux;
  return $1;
end;
$function$;


-- Pavel 2 (with patches)
--------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION x.setfield2(anyelement, text, text)
 RETURNS anyelement
 LANGUAGE plpgsql
AS $function$
DECLARE
  _name text;
  _values text[];
  _value text;
  _attnum int;
BEGIN
  FOR _name, _attnum
     IN SELECT a.attname, a.attnum
           FROM pg_catalog.pg_attribute a
          WHERE a.attrelid = (SELECT typrelid
                                 FROM pg_type
                                WHERE oid = pg_typeof($1)::oid)
  LOOP
    IF _name = $2 THEN
      _value := $3;
    ELSE
      EXECUTE 'SELECT (($1).' || quote_ident(_name) || ')::text' INTO _value USING $1;
    END IF;
    _values[_attnum] :=  COALESCE('"' || replace(replace(_value, '"', '""'), '''', '''''') || '"', '');
  END LOOP;
  EXECUTE 'SELECT (' || pg_typeof($1)::text || '''(' || array_to_string(_values,',') || ')'').*' INTO $1;
  RETURN $1;
END;
$function$;


-- Erwin 1
--------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION x.setfield3(anyelement, text, text)
RETURNS anyelement
AS $body$
DECLARE
 _list text;

BEGIN
_list := (
   SELECT string_agg(x.fld, ',')
   FROM   (
      SELECT CASE WHEN a.attname = $2
              THEN quote_literal($3)
              ELSE quote_ident(a.attname)
             END AS fld
      FROM   pg_catalog.pg_attribute a
      WHERE  a.attrelid = (SELECT typrelid
                           FROM   pg_type
                           WHERE  oid = pg_typeof($1)::oid)
      ORDER BY a.attnum
   ) x
);

EXECUTE '
SELECT ' || _list || '
FROM   (SELECT $1.*) x'
USING  $1
INTO   $1;

RETURN $1;
END;
$body$ LANGUAGE plpgsql;


-- Erwin 2
--------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION x.setfield4(INOUT _comp_val anyelement
                                       , _field text, _val text)
  RETURNS anyelement AS
$func$
BEGIN

EXECUTE 'SELECT ' || array_to_string(ARRAY(
      SELECT CASE WHEN attname = _field
                THEN '$2'
                ELSE '($1).' || quote_ident(attname)
             END AS fld
      FROM   pg_catalog.pg_attribute
      WHERE  attrelid = pg_typeof(_comp_val)::text::regclass
      AND    attnum > 0
      AND    attisdropped = FALSE
      ORDER  BY attnum
      ), ',')
USING  _comp_val, _val
INTO   _comp_val;

END
$func$ LANGUAGE plpgsql;


-- Pavel 3: json. (Postgres 9.4)
-- Found here: https://stackoverflow.com/a/28284491/1914376
--------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION x.setfield5(r anyelement, fn text, val text,OUT result anyelement)
 RETURNS anyelement
 LANGUAGE plpgsql
AS $function$
declare jo json;
begin
  jo := (select json_object(array_agg(key), 
                            array_agg(case key when fn then val
                                               else value end)) 
            from json_each_text(row_to_json(r)));
  result := json_populate_record(r, jo);
end;
$function$;


-- Json. Use built-in json functions (Postgres 9.3)
-- This is available from 9.3 since we create json by casting 
-- instead of using json_object/json_build_object only available from 9.4
--------------------------------------------------------------------------------------------------
CREATE FUNCTION x.setfield_json(in_element anyelement, key text, value text)
    RETURNS anyelement AS
$BODY$
    SELECT json_populate_record( in_element, ('{"'||key||'":"'||value||'"}')::json);
$BODY$ LANGUAGE sql;



--------------------------------------------------------------------------------------------------
-- Test setup
--------------------------------------------------------------------------------------------------

-- composite type for tests.
CREATE TYPE x.t_f as (
 id       int
,company  text
,sort     text
,log_up   timestamp
,log_upby smallint
);

-- Create temp table with synthetic test data
DROP TABLE IF EXISTS tmp_f;
CREATE TEMP table tmp_f AS
   SELECT ROW(i, 'company'||i, NULL, NULL, NULL)::x.t_f AS f
   FROM generate_series(1, 5000) S(i);



-- Run the benchmark
DO $$  DECLARE  start_time timestamptz; test_count integer; test_description TEXT; BEGIN

    test_count := 200;
    test_description := 'setfield, Pavel 1: temptable';
    start_time := clock_timestamp();    
    PERFORM x.setfield (f, 'company','new-value-'||md5(random()::text)) FROM tmp_f LIMIT test_count;
    RAISE NOTICE 'Test took: % ms (for % rows) Name: %', extract(MILLISECONDS FROM (clock_timestamp() - start_time))::INTEGER, test_count, test_description;

    test_count := 5000;
    test_description := 'setfield2, Pavel 2: reflection';
    start_time := clock_timestamp();
    PERFORM x.setfield2 (f, 'company','new-value-'||md5(random()::text)) FROM tmp_f LIMIT test_count;
    RAISE NOTICE 'Test took: % ms (for % rows) Name: %', extract(MILLISECONDS FROM (clock_timestamp() - start_time))::INTEGER, test_count, test_description;

    test_count := 5000;
    test_description := 'setfield3, Erwin 1: reflection';
    start_time := clock_timestamp();
    PERFORM x.setfield3 (f, 'company','new-value-'||md5(random()::text)) FROM tmp_f LIMIT test_count;
    RAISE NOTICE 'Test took: % ms (for % rows) Name: %', extract(MILLISECONDS FROM (clock_timestamp() - start_time))::INTEGER, test_count, test_description;

    test_count := 5000;
    test_description := 'setfield4, Erwin 2: reflection';
    start_time := clock_timestamp();
    PERFORM x.setfield4 (f, 'company','new-value-'||md5(random()::text)) FROM tmp_f LIMIT test_count;
    RAISE NOTICE 'Test took: % ms (for % rows) Name: %', extract(MILLISECONDS FROM (clock_timestamp() - start_time))::INTEGER, test_count, test_description;

    test_count := 5000;
    test_description := 'setfield5, Pavel 3: json (PG 9.4)';
    start_time := clock_timestamp();
    PERFORM x.setfield5 (f, 'company','new-value-'||md5(random()::text)) FROM tmp_f LIMIT test_count;
    RAISE NOTICE 'Test took: % ms (for % rows) Name: %', extract(MILLISECONDS FROM (clock_timestamp() - start_time))::INTEGER, test_count, test_description;
    
    test_count := 5000;
    test_description := 'setfield_json, Geir 1: casting (PG 9.3)';
    start_time := clock_timestamp();
    PERFORM x.setfield_json (f, 'company','new-value-'||md5(random()::text)) FROM tmp_f LIMIT test_count;
    RAISE NOTICE 'Test took: % ms (for % rows) Name: %', extract(MILLISECONDS FROM (clock_timestamp() - start_time))::INTEGER, test_count, test_description;

    --json_object(ARRAY[key, value]) is actually faster than json_build_object(key, value)
    test_count := 5000;
    test_description := 'no function/inlined: json_object (PG 9.4)';
    start_time := clock_timestamp();
    PERFORM json_populate_record( f, json_object(ARRAY['company', 'new-value'||md5(random()::text)]  )) FROM tmp_f LIMIT test_count;
    RAISE NOTICE 'Test took: % ms (for % rows) Name: %', extract(MILLISECONDS FROM (clock_timestamp() - start_time))::INTEGER, test_count, test_description;

    test_count := 5000;
    test_description := 'no function/inlined: hstore (PG 9.0)';
    start_time := clock_timestamp();
    PERFORM f #= hstore('company', 'new-value'||md5(random()::text))  FROM tmp_f LIMIT test_count;
    RAISE NOTICE 'Test took: % ms (for % rows) Name: %', extract(MILLISECONDS FROM (clock_timestamp() - start_time))::INTEGER, test_count, test_description;
    
END; $$;

Test results on 9.4.1, win32, i5-4300U

NOTICE:  Test took: 1138 ms (for 200 rows) Name: setfield, Pavel 1: temptable
NOTICE:  Test took: 652 ms (for 5000 rows) Name: setfield2, Pavel 2: reflection
NOTICE:  Test took: 364 ms (for 5000 rows) Name: setfield3, Erwin 1: reflection
NOTICE:  Test took: 275 ms (for 5000 rows) Name: setfield4, Erwin 2: reflection
NOTICE:  Test took: 192 ms (for 5000 rows) Name: setfield5, Pavel 3: json (PG 9.4)
NOTICE:  Test took: 23 ms (for 5000 rows) Name: setfield_json, Geir 1: casting (PG 9.3)
NOTICE:  Test took: 25 ms (for 5000 rows) Name: no function/inlined: json_object (PG 9.4)
NOTICE:  Test took: 14 ms (for 5000 rows) Name: no function/inlined: hstore (PG 9.0)


Update March 2015:
Largely outdated now. Consider the new benchmark by @Geir with faster variants.


Test setup and benchmarks

I took the three solutions presented (by Oct. 16th, 2011) and ran a test on PostgreSQL 9.0. You will find the complete setup below. Only the test data are not included, as I used a real-life database (not synthetic data). It's all encapsulated in its own schema for non-intrusive use.

I would like to encourage anybody to reproduce the test. Maybe with Postgres 9.1? And add your results here? :)

-- DROP SCHEMA x CASCADE;
CREATE SCHEMA x;

-- Pavel 1
CREATE OR REPLACE FUNCTION x.setfield(anyelement, text, text)
RETURNS anyelement
LANGUAGE plpgsql
AS $function$
begin
  create temp table aux as select $1.*;
  execute 'update aux set ' || quote_ident($2) || ' = ' || quote_literal($3);
  select into $1 * from aux;
  drop table aux;
  return $1;
end;
$function$;

-- Pavel 2 (with patches)
CREATE OR REPLACE FUNCTION x.setfield2(anyelement, text, text)
 RETURNS anyelement
 LANGUAGE plpgsql
AS $function$
DECLARE 
  _name text;
  _values text[];
  _value text;
  _attnum int;
BEGIN
  FOR _name, _attnum
     IN SELECT a.attname, a.attnum
           FROM pg_catalog.pg_attribute a 
          WHERE a.attrelid = (SELECT typrelid
                                 FROM pg_type
                                WHERE oid = pg_typeof($1)::oid) 
  LOOP
    IF _name = $2 THEN
      _value := $3;
    ELSE
      EXECUTE 'SELECT (($1).' || quote_ident(_name) || ')::text' INTO _value USING $1;
    END IF;
    _values[_attnum] :=  COALESCE('"' || replace(replace(_value, '"', '""'), '''', '''''') || '"', '');
  END LOOP;
  EXECUTE 'SELECT (' || pg_typeof($1)::text || '''(' || array_to_string(_values,',') || ')'').*' INTO $1; 
  RETURN $1;
END;
$function$;

-- Erwin 1
CREATE OR REPLACE FUNCTION x.setfield3(anyelement, text, text)
RETURNS anyelement
AS $body$
DECLARE
 _list text;

BEGIN
_list := (
   SELECT string_agg(x.fld, ',')
   FROM   (
      SELECT CASE WHEN a.attname = $2
              THEN quote_literal($3)
              ELSE quote_ident(a.attname)
             END AS fld
      FROM   pg_catalog.pg_attribute a 
      WHERE  a.attrelid = (SELECT typrelid
                           FROM   pg_type
                           WHERE  oid = pg_typeof($1)::oid) 
      ORDER BY a.attnum
   ) x
);

EXECUTE '
SELECT ' || _list || '
FROM   (SELECT $1.*) x'
USING  $1
INTO   $1;

RETURN $1;
END;
$body$ LANGUAGE plpgsql;

-- composite type for tests.
CREATE TYPE x.t_f as (
 id       int
,company  text
,sort     text
,log_up   timestamp 
,log_upby smallint
);

-- temp table with real life test data
DROP   TABLE IF EXISTS tmp_f;
CREATE TEMP table tmp_f AS 
   SELECT ROW(firma_id,firma,sort,log_up,log_upby)::x.t_f AS f
   FROM   ef.firma
   WHERE  firma !~~ '"%';

-- SELECT count(*) FROM tmp_f;  -- 5183

-- Quick test: results are identical?
SELECT *,
       x.setfield (f, 'company','test')
      ,x.setfield2(f, 'company','test')
      ,x.setfield3(f, 'company','test')
 FROM tmp_f
LIMIT 10;

Benchmarks

I ran the queries a couple of times to populate the cache. The presented results are the best of five total runtimes with EXPLAIN ANALYZE.

First round with 1000 rows

Pavel's first prototype maxes out shared memory with more rows.

Pavel 1: 2445.112 ms

SELECT x.setfield (f, 'company','test') FROM tmp_f limit 1000;

Pavel 2: 263.753 ms

SELECT x.setfield2(f, 'company','test') FROM tmp_f limit 1000;

Erwin 1: 120.671 ms

SELECT x.setfield3(f, 'company','test') FROM tmp_f limit 1000;

Another test with 5183 rows.

Pavel 2: 1327.429 ms

SELECT x.setfield2(f, 'company','test') FROM tmp_f;

Erwin 1: 588.691 ms

SELECT x.setfield3(f, 'company','test') FROM tmp_f;
