What are the conditions for encountering a serialization failure in PostgreSQL?_问答_开发者

The PostgreSQL manual page on the Serializable Isolation Level states:

[Like] the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures.

What are the conditions for encountering a serialization failure at the Repeatable Read or Serializable levels?

I tried to induce a serialization failure with two instances of psql running, but even though a transaction was committed by one instance, the other instance, inside a serializable-level transaction while the other committed, was successful in committing its changes. Both simply inserted records into a table, so perhaps I need to try something more complex.

Basically I am trying to understand what happens in the event of a开发者_StackOverflow serialization failure and how serialization failures arise.

There are many possible causes for serialization failures. Technically, a deadlock between two transactions is one form of serialization failure, and can potentially happen at any isolation level if there are concurrent modifications to the schema (database structure). Since you're asking about PostgreSQL, you should be aware that in PostgreSQL this type of serialization failure gets a separate SQLSTATE from the the others: '40P01'. All other serialization failures return '40001'. The rest of this answer will focus on these non-deadlock varieties in PostgreSQL.

Outside of a live replica ("hot standby") these can only occur at the two stricter isolation levels: REPEATABLE READ and SERIALIZABLE. At the REPEATABLE READ level these can only occur because of write conflicts -- two concurrent transactions try to update or delete the same (existing) row. The first transaction to make the attempt locks the row and proceeds. If it commits, the second transaction fails with a serialization failure. If the first transaction rolls back for any reason, the blocked transaction is freed to proceed and will acquire its own lock on the row. This behavior, in combination with a single "snapshot" for the duration of the transaction, is also known as SNAPSHOT ISOLATION.

Prior to PostgreSQL version 9.1, SERIALIZABLE transactions worked exactly the same way. Starting with 9.1 PostgreSQL uses a new technique called Serializable Snapshot Isolation to ensure that the behavior of any set of serializable transactions is fully consistent with some serial (one-at-a-time) execution of those transactions. When using SERIALIZABLE transactions in 9.1, your application should be prepared for serialization failures on any statement except for ROLLBACK -- even in read only transactions and even on COMMIT. For more information, see the PostgreSQL doc page at http://www.postgresql.org/docs/current/interactive/transaction-iso.html or the Wiki page giving examples of how serialization failures can occur in the new, stricter isolation level at http://wiki.postgresql.org/wiki/SSI

If you are using the Hot Standby feature, you can get a serialization failure on the read-only replica if there is a long-running query for which maintaining a stable view of the data would require the database to forestall replication for too long. There are configuration settings to allow you to balance "freshness" of the replicated data against tolerance for long-running queries. Some users may want to create more than one replica so that they can have up-to-date data (possibly even choosing synchronous replication) while allowing another to lag as needed to service long-running queries.

Edit to provide another link: The paper titled Serializable Snapshot Isolation in PostgreSQL, presented at the 38th International Conference on Very Large Databases provides more detail and perspective than the other links, along with references to the papers which laid the groundwork for this implementation.

For REPEATABLE READ this example will do:

Prepare stage:

psql-0> CREATE TABLE foo(key int primary key, val int);
CREATE TABLE
psql-0> INSERT INTO foo VALUES(1, 42);

Now keep an eye on the psql-X part indicating the interleaving of actions:

psql-1> BEGIN ISOLATION LEVEL REPEATABLE READ;
psql-1> UPDATE foo SET val=val+1;
UPDATE 1
psql-2> BEGIN ISOLATION LEVEL REPEATABLE READ;
psql-2> UPDATE foo SET val=val+1;
*** no output, transaction blocked ***

psql-1> COMMIT;

psql-2> *** unblocks ***
ERROR:  could not serialize access due to concurrent update

An example for SERIALIZABLE is in the documentation for PostgreSQL 9.1 and should be no problem from here.

In case this helps anyone, here is a transcript from #postgresql on Freenode:

[14:36] <dtrebbien> What are the conditions for encountering a serialization failure?

[14:36] <dtrebbien> ^ What are the conditions for encountering a serialization failure?

[14:37] <dtrebbien> Is there a PostgreSQL dev who can identify the conditions of a serialization failure?

[14:38] <peerce> http://www.postgresql.org/docs/current/static/transaction-iso.html#XACT-SERIALIZABLE

[14:43] <dtrebbien> "any set of concurrent serializable transactions will have the same effect as if they were run one at a time"

[14:44] <dtrebbien> What are the rules that the PostgreSQL engine follows?

[14:44] <dtrebbien> I.e. If a row is changed, does that trigger a failure?

[14:44] <johto> the serializable isolation mode in 9.1 is really complex

[14:45] <dtrebbien> I figured.

[14:45] <dtrebbien> I read, too, that the Serializable level was somehow "fixed"

[14:45] <RhodiumToad> dtrebbien: prior to 9.1, the basic rule is that if a transaction tries to change a row whose current value isn't visible to it, that's a failure

[14:46] <dtrebbien> RhodiumToad: That's interesting.

[14:46] <dtrebbien> Also, accessing a value, right?

[14:46] <selenamarie> dtrebbien: in addition to what others said, the basic premise behind it is detection of cycles of dependencies

[14:47] <dtrebbien> Oh.

[14:50] <dtrebbien> Is it fair to say that in 9.1, the rules for triggering an isolation level have been made more complex to basically cut down on "false positive" serialization anomalies?

[14:51] <johto> they were made complex because the more simple rulex didn't catch all serialization anomalies

[14:51] <dtrebbien> Ah! I see.

[14:51] <dtrebbien> So that's why the release notes said "fixed".

[14:52] <RhodiumToad> dtrebbien: accessing a non-visible value wasn't an error because it just got the value which was visible at the time of the snapshot.

[14:53] <RhodiumToad> dtrebbien: read-only serializable queries simply see a static state of the database as of their snapshot time.

[14:54] <RhodiumToad> dtrebbien: other than a small wrinkle with TRUNCATE, all the serialization issues involve read/write queries

[15:03] <dtrebbien> RhodiumToad, johto, selenamarie, and peerce: Would you mind if I posted a transcript of this conversation to Stack Overflow?

[15:07] <selenamarie> dtrebbien: sure :)

[15:07] <dtrebbien> I don't know if it will help anyone. It might.

[15:08] <selenamarie> dtrebbien: i posted my notes from Kevin Grittner's talk on this here: http://www.chesnok.com/daily/2011/03/24/raw-notes-from-kevin-grittners-talk-on-ssi/

Some more examples

Here are some further minimal examples that might be of interest in addition to the one provided in A.H.'s answer. I'm using this test setup to conveniently run them on PostgreSQL 13.5.

Common setup:

0: CREATE TABLE "MyInt"(i INTEGER);
0: INSERT INTO foo VALUES(0);

The first thing I'd like to point out is that even read-only statements like SELECT can matter.

For example, this example does not raise any errors:

0: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ
1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ
0: UPDATE "MyInt" SET i = 1
0: COMMIT
1: UPDATE "MyInt" SET i = 2
1: COMMIT

The only difference between this example and the one in the aforementioned answer is that the one in that answer is that in this one thread 1 does the UPDATE after the COMMIT of thread 0. TODO why does it matter?

However, if we simply add a SELECT from thread 1 before thread 0 commits as in:

0: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ
1: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ
0: UPDATE "MyInt" SET i = 1
1: SELECT * FROM "MyInt"
0: COMMIT
1: UPDATE "MyInt" SET i = 2
1: COMMIT

then the last UPDATE blows up with:

could not serialize access due to concurrent update

TODO understand exactly why the SELECT matters/how it is tracked by PostgreSQL. What seems to happen is that PostgreSQL makes considerations of type:

could such database serialization property possibly be violated if the client had such and such information?

which requires it to also track SELECT statements.

Example that blows up only on SERIALIZABLE

Here is another interesting example acting on two different rows. This example blows up only on SERIALIZABLE, but not on REPEATABLE READ (TODO why):

INSERT INTO "MyInt" VALUES (0)
INSERT INTO "MyInt" VALUES (10)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
UPDATE "MyInt" SET i = 1 WHERE i = 0
SELECT * FROM "MyInt"
COMMIT
UPDATE "MyInt" SET i = 11 WHERE i = 10
COMMIT

The error message in this case is:

could not serialize access due to read/write dependencies among transactions

Unfortunately, after hours staring at https://www.postgresql.org/docs/13/transaction-iso.html#XACT-REPEATABLE-READ I'm still unable to precisely explain why some of those blow up and others don't, but I felt that the examples were of enough interest to publish regardless, maybe someone can clarify the exact sequence of steps taken by PostgreSQL in future edits/comments.

Examples from the tests

Another source of many such examples is, unsurprisingly, the in-tree tests under src/test/isolation, and they are quite readable too. A grep for could not serialize

Files under src/test/isolation/specs/*.spec determine which steps are to be done, and corresponding files under src/test/isolation/expected/*.out contain the exact expected psql raw output. So we can just read the .out files to see many full examples of failures.

For example src/test/isolation/expected/insert-conflict-do-nothing-2.out contains:

starting permutation: beginrr1 beginrr2 donothing1 donothing2 c1 c2 show
step beginrr1: BEGIN ISOLATION LEVEL REPEATABLE READ;
step beginrr2: BEGIN ISOLATION LEVEL REPEATABLE READ;
step donothing1: INSERT INTO ints(key, val) VALUES(1, 'donothing1') ON CONFLICT DO NOTHING;
step donothing2: INSERT INTO ints(key, val) VALUES(1, 'donothing2'), (1, 'donothing3') ON CONFLICT DO NOTHING; <waiting ...>
step c1: COMMIT;
step donothing2: <... completed>
ERROR:  could not serialize access due to concurrent update
step c2: COMMIT;
step show: SELECT * FROM ints;
key|val
---+----------
  1|donothing1
(1 row)

The only thing not shown clearly is the table creation statement which we can see in the corresponding .spec file src/test/isolation/specs/insert-conflict-do-nothing-2.spec:

  CREATE TABLE ints (key int, val text, PRIMARY KEY (key) INCLUDE (val));

From which we learn that INSERT INTO ON CONFLICT DO NOTHING can also cause serialization failures.

Some of the tests also link to a paper: https://www.cs.umb.edu/~poneil/ROAnom.pdf A Read-Only Transaction Anomaly Under Snapshot Isolation. You know a software is serious when the bug reports are addressed by papers.

In PostgreSQL, serialization failures occur when there are: