Fault Injection Framework
=========================
A fault is defined as a point of interest in the source code with an
associated action to be taken when that point is hit during execution.
Fault points are defined using the SIMPLE_FAULT_INJECTOR() macro or by
directly invoking the FaultInjector_TriggerFaultIfSet() function. A
fault point is identified by a name. This module provides an interface
to inject a pre-defined fault point into a running PostgreSQL database
by associating an action with the fault point. The action can be error,
panic, sleep, skip, infinite_loop, etc.
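To make the idea of "a name with an associated action" concrete, here
is a minimal, self-contained sketch in C. It is not the framework's
implementation. The FaultAction enum, the injected table, and the
sketch_inject() and sketch_action_if_set() helpers are hypothetical
names invented for this illustration; only the concept of looking up a
fault point by name comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of actions; the real framework supports more. */
typedef enum {
    ACTION_NONE,
    ACTION_ERROR,
    ACTION_PANIC,
    ACTION_SLEEP,
    ACTION_SKIP
} FaultAction;

typedef struct {
    const char *name;    /* fault point name, e.g. "checkpoint" */
    FaultAction action;  /* action associated with that name */
} InjectedFault;

#define MAX_FAULTS 8
static InjectedFault injected[MAX_FAULTS];
static size_t num_injected = 0;

/* Linear search by name; returns NULL if nothing was injected. */
static InjectedFault *find_fault(const char *name)
{
    for (size_t i = 0; i < num_injected; i++)
        if (strcmp(injected[i].name, name) == 0)
            return &injected[i];
    return NULL;
}

/*
 * Associate an action with a fault point name. The name pointer is
 * stored as-is, so it must outlive the table (string literals do).
 * Returns 0 on success, -1 if the table is full.
 */
int sketch_inject(const char *name, FaultAction action)
{
    InjectedFault *f;

    assert(name != NULL);
    f = find_fault(name);
    if (f == NULL) {
        if (num_injected == MAX_FAULTS)
            return -1;
        f = &injected[num_injected++];
        f->name = name;
    }
    f->action = action;
    return 0;
}

/* Called at a fault point: the injected action, or ACTION_NONE. */
FaultAction sketch_action_if_set(const char *name)
{
    InjectedFault *f;

    assert(name != NULL);
    f = find_fault(name);
    return f ? f->action : ACTION_NONE;
}
```

In this sketch, a fault point that nobody injected costs one lookup and
returns ACTION_NONE, so the surrounding code runs unchanged.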
SQL-based tests can make use of the "inject_fault()" interface to
simulate complex scenarios that are otherwise cumbersome to automate.
For example,
select inject_fault('checkpoint', 'error');
The above command causes the next checkpoint to fail with elog(ERROR).
The 'checkpoint' fault is defined in CreateCheckPoint() function in
xlog.c. Note that the fault is set to trigger only once by default.
Subsequent checkpoints will not be affected by the above fault.
select inject_fault('checkpoint', 'status');
The above command checks the status of the fault. It reports the
number of times the fault has been triggered during execution and
whether it has completed. Faults that are completed will no longer
trigger.
select wait_until_triggered_fault('checkpoint', 1);
The above command blocks until the checkpoint fault is triggered once.
select inject_fault('checkpoint', 'reset');
The above command removes the fault, such that no action will be taken
when the fault point is reached during execution. A fault can be set
to trigger more than once. For example:
select inject_fault_infinite('checkpoint', 'error');
This command causes checkpoints to fail until the fault is removed.
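The difference between the default trigger-once behaviour, the infinite
variant, and a reset can be sketched in C as follows. This is an
illustration under stated assumptions, not the framework's code: the
Fault struct, the TRIGGER_FOREVER sentinel, and all function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TRIGGER_FOREVER (-1)  /* hypothetical sentinel: "until removed" */

typedef struct {
    bool is_set;         /* false once the fault has been reset */
    int  max_triggers;   /* 1 by default, TRIGGER_FOREVER if infinite */
    int  num_triggered;  /* times the action has been taken so far */
} Fault;

/* Models inject_fault(name, type): trigger only once. */
void fault_inject_once(Fault *f)
{
    assert(f != NULL);
    f->is_set = true;
    f->max_triggers = 1;
    f->num_triggered = 0;
}

/* Models inject_fault_infinite(name, type): trigger until removed. */
void fault_inject_infinite(Fault *f)
{
    assert(f != NULL);
    f->is_set = true;
    f->max_triggers = TRIGGER_FOREVER;
    f->num_triggered = 0;
}

/* Models the 'reset' action: the fault point becomes a no-op again. */
void fault_reset(Fault *f)
{
    assert(f != NULL);
    f->is_set = false;
}

/* Called at the fault point; true means "take the action now". */
bool fault_should_trigger(Fault *f)
{
    assert(f != NULL);
    if (!f->is_set)
        return false;
    if (f->max_triggers != TRIGGER_FOREVER &&
        f->num_triggered >= f->max_triggers)
        return false;   /* completed: no longer triggers */
    f->num_triggered++;
    return true;
}
```

With fault_inject_once(), only the first call to fault_should_trigger()
returns true; with fault_inject_infinite(), every call returns true
until fault_reset() is called.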
More detailed interface
-----------------------
A more detailed version of the fault injector interface accepts
several more parameters. Let us assume that a fault named
"heap_insert" has been defined in function heap_insert() in backend
code, like so:
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1875,6 +1875,13 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
     Buffer      vmbuffer = InvalidBuffer;
     bool        all_visible_cleared = false;
+#ifdef FAULT_INJECTOR
+    FaultInjector_TriggerFaultIfSet(
+        "heap_insert",
+        "" /* database name */,
+        RelationGetRelationName(relation));
+#endif
+
A SQL test may want to inject the "heap_insert" fault such that inserts
into a table named "my_table" fail for the first 10 tuples:
select inject_fault(
'heap_insert',
'error',
'' /* database name */,
'my_table' /* table name */,
1 /* start occurrence */,
10 /* end occurrence */,
0 /* */);
The above command injects the heap_insert fault such that the inserting
transaction aborts with elog(ERROR) when the code reaches the
fault point, but only if the relation being inserted into is named
'my_table'. Moreover, the fault stops triggering once the fault point
has been hit 10 times for my_table. The 11th attempt to insert
into my_table proceeds as usual.
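The combination of a table-name filter and an occurrence window can be
sketched in C as below. This is only an illustration: the WindowedFault
struct and windowed_fault_hit() are hypothetical names, and two details
are assumptions rather than documented behaviour, namely that an empty
table name matches any table and that visits for non-matching tables
are not counted as occurrences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *table_name;  /* "" matches any table (assumption) */
    int start_occurrence;    /* first matching visit that triggers, 1-based */
    int end_occurrence;      /* last matching visit that triggers */
    int num_visits;          /* matching visits seen so far */
} WindowedFault;

/*
 * Called at the fault point with the relation name being processed.
 * Returns true if the action should be taken on this visit.
 */
bool windowed_fault_hit(WindowedFault *f, const char *table_name)
{
    assert(f != NULL && table_name != NULL);

    /* A different table never triggers and is not counted (assumption). */
    if (f->table_name[0] != '\0' &&
        strcmp(f->table_name, table_name) != 0)
        return false;

    f->num_visits++;
    return f->num_visits >= f->start_occurrence &&
           f->num_visits <= f->end_occurrence;
}
```

Initialised as { "my_table", 1, 10, 0 }, this mirrors the example above:
the first 10 visits for my_table trigger, the 11th does not, and inserts
into other tables are never affected.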
Fault actions
-------------
A fault action is specified as the type parameter of the inject_fault()
interface. The following types are supported.
error
    elog(ERROR)
fatal
    elog(FATAL)
panic
    elog(PANIC)
sleep
    sleep for the specified amount of time; extraArg gives the duration
    in seconds
infinite_loop
    block until the query is canceled or terminated
suspend
    block until the fault is removed
resume
    resume backend processes that are blocked due to a suspend fault
skip
    do nothing (used to implement custom logic that is not supported by
    predefined actions)
reset
    remove a previously injected fault
segv
    crash the backend process with SIGSEGV
interrupt
    simulate cancel interrupt arrival, such that the next
    interrupt processing cycle will cancel the query
finish_pending
    similar to interrupt, sets the QueryFinishPending global flag
status
    return a text datum with details of how many times a fault has been
    triggered and the state it is currently in. Fault states are as
    follows:
    "set"       injected, but the fault point has not been reached
                during execution yet.
    "triggered" the fault point has been reached at least once during
                execution.
    "completed" the action associated with the fault point will no
                longer be taken because the fault point has been reached
                the maximum number of times during execution.
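The three fault states reported by the status action can be derived
from two counters, as in the C sketch below. This is an illustrative
model, not the framework's code: the FaultState enum and both functions
are hypothetical, and treating a negative maximum as "no limit" is an
assumption made here to cover faults that trigger until removed.

```c
#include <assert.h>

typedef enum { FAULT_SET, FAULT_TRIGGERED, FAULT_COMPLETED } FaultState;

/*
 * num_triggered: times the fault point has been reached so far.
 * max_triggers:  maximum number of times the action is taken;
 *                a negative value means "no limit" (assumption).
 */
FaultState fault_state(int num_triggered, int max_triggers)
{
    assert(num_triggered >= 0);

    if (num_triggered == 0)
        return FAULT_SET;        /* injected, not reached yet */
    if (max_triggers >= 0 && num_triggered >= max_triggers)
        return FAULT_COMPLETED;  /* limit reached, no longer triggers */
    return FAULT_TRIGGERED;      /* reached at least once */
}

/* State name as it would appear in a status report. */
const char *fault_state_name(FaultState state)
{
    switch (state) {
        case FAULT_SET:       return "set";
        case FAULT_TRIGGERED: return "triggered";
        case FAULT_COMPLETED: return "completed";
    }
    return "unknown";
}
```

Under this model a trigger-once fault goes straight from "set" to
"completed" on its first hit, while a fault with a larger limit passes
through "triggered" first.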