/*
* Copyright (c) 2015, 2018, Oracle and/or its affiliates. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2.0,
* as published by the Free Software Foundation.
*
* This program is also distributed with certain software (including
* but not limited to OpenSSL) that is licensed under separate terms,
* as designated in a particular file or component or in included license
* documentation. The authors of MySQL hereby grant you an additional
* permission to link the program and your derivative works with the
* separately licensed software that they have included with MySQL.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License, version 2.0, for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
/** @page mysqlx_protocol_use_cases Use Cases
Topics in this section:
- @ref use_cases_Prepared_Statements_with_Single_Round_Trip
- @ref use_cases_Streaming_Inserts
- @ref use_cases_SQL_with_Multiple_Resultsets
- @ref use_cases_Inserting_CRUD_Data_in_a_Batch
- @ref use_cases_Cross_Collection_Update_and_Delete
Prepared Statements with Single Round-Trip {#use_cases_Prepared_Statements_with_Single_Round_Trip}
==========================================
In the MySQL Client/Server %Protocol, a ``PREPARE``/``EXECUTE`` cycle
requires two round-trips as the ``COM_STMT_EXECUTE`` needs data (the
statement id) from the ``COM_STMT_PREPARE-ok`` packet.
@startuml "Single Round-Trip"
group tcp packet
Client -> Server: COM_STMT_PREPARE(...)
end
group tcp packet
Server --> Client: COM_STMT_PREPARE-ok(**stmt_id=1**)
end
group tcp packet
Client -> Server: COM_STMT_EXECUTE(**stmt_id=1**)
end
group tcp packet
Server --> Client: Resultset
end
@enduml
The X %Protocol is designed in a way that the ``EXECUTE`` stage
doesn't depend on the response of the ``PREPARE`` stage.
@startuml "Without PREPARE Stage Dependency"
group tcp packet
Client -> Server: Sql.PrepareStmt(**stmt_id=1**, ...)
end
group tcp packet
Server --> Client: Sql.PreparedStmtOk
end
group tcp packet
Client -> Server: Sql.PreparedStmtExecute(**stmt_id=1**, ...)
end
group tcp packet
Server --> Client: Sql.PreparedStmtExecuteOk
end
@enduml
This allows the client to send ``PREPARE`` and ``EXECUTE`` back to back,
without waiting for the server's response in between.
@startuml "Without Server Response"
group tcp packet
Client -> Server: Sql.PrepareStmt(**stmt_id=1**, ...)
Client -> Server: Sql.PreparedStmtExecute(**stmt_id=1**, ...)
end
group tcp packet
Server --> Client: Sql.PreparedStmtOk
Server --> Client: Sql.PreparedStmtExecuteOk
end
@enduml
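The following sketch shows this pipelining from the client side. The
``send()``/``recv()`` helpers and the message constructors are hypothetical
and only illustrate the write/read ordering; they are not part of any
concrete client API.
@code{py}
# Sketch only: send() and recv() are hypothetical helpers that frame and
# parse X Protocol messages on an already authenticated connection.
def prepare_and_execute(conn, stmt_id, sql, params):
    # write both messages back to back, no read in between
    send(conn, "Sql.PrepareStmt", stmt_id=stmt_id, stmt=sql)
    send(conn, "Sql.PreparedStmtExecute", stmt_id=stmt_id, values=params)
    # only now read the two responses, in the order the requests were sent
    prepare_ok = recv(conn)   # Sql.PreparedStmtOk (or an Error)
    execute_ok = recv(conn)   # Sql.PreparedStmtExecuteOk (or an Error)
    return prepare_ok, execute_ok
@endcode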
@note
See the @ref mysqlx_protocol_implementation
"Implementation Notes" about how to efficiently implement pipelining.
Streaming Inserts {#use_cases_Streaming_Inserts}
=================
When inserting a large set of data (for example, a data import), a trade-off
has to be made among:
- memory usage on the client and the server side
- network round-trips
- error reporting
For this example it is assumed that 1 million rows, each 1024 bytes in size,
have to be transferred to the server.
**Optimizing for Network Round-Trips**
(Assuming the MySQL Client/Server %Protocol in this case.) Network
round-trips can be minimized by creating one huge SQL statement of up to
1 GByte in chunks of 16 MByte (the protocol's maximum frame size), sending
it over the wire and letting the server execute it.
@code{sql}
INSERT INTO tbl VALUES
( 1, "foo", "bar" ),
( 2, "baz", "fuz" ),
...;
@endcode
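A minimal sketch of how a client could generate such a statement in chunks,
assuming the rows are available as tuples of already-escaped SQL literals
(the chunk size and the helper are illustrative only):
@code{py}
# Illustration only: yields one huge INSERT statement in pieces of at most
# ~16 MByte so the client never holds the full statement in memory.
CHUNK_SIZE = 16 * 1024 * 1024

def insert_chunks(table, rows):
    buf = "INSERT INTO " + table + " VALUES "
    first = True
    for row in rows:
        piece = ("" if first else ", ") + "(" + ", ".join(row) + ")"
        first = False
        if len(buf) + len(piece) > CHUNK_SIZE:
            yield buf
            buf = ""
        buf += piece
    yield buf + ";"

# usage: for chunk in insert_chunks("tbl", rows): write the chunk to the wire
@endcode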
In this case:
- the client can generate the SQL statement in chunks of 16 MByte and
write them to the network
- *(memory usage)* the server has to buffer the full 1 GByte before it can
start working on the statement
- *(execution delay)* parsing and execution only start once the complete
statement has been received
- *(error-reporting)* in case of an error (parse error, duplicate key
error, ...) the whole 1 GByte message is rejected without any good way
to know where in that big message the error happened
The *Execution Time* for inserting all rows in one batch is:
@code{unparsed}
1 * RTT +
(num_rows * Row Size / Network Bandwidth) +
num_rows * Row Parse Time +
num_rows * Row Execution Time
@endcode
**Optimizing for Memory Usage and Error-Reporting**
The other extreme is using single row ``INSERT`` statements:
@code{sql}
INSERT INTO tbl VALUES
( 1, "foo", "bar" );
INSERT INTO tbl VALUES
( 2, "baz", "fuz" );
...
@endcode
- the client can generate the statements as it receives the data and
stream them to the server
- *(execution delay)* the server starts executing statements as soon as it
receives the first row
- *(memory usage)* the server only has to buffer a single row
- *(error-reporting)* if inserting one row fails, the client knows
about it when it happens
- as each statement results in its own round-trip, the network latency
applies to each row instead of only once
- each statement has to be parsed and executed by the server
Using Prepared Statements solves the last bullet point:
@startuml "Optimization for Memory"
Client -> Server: prepare("INSERT INTO tbl VALUES (?, ?, ?)")
Server --> Client: ok(stmt=1)
Client -> Server: execute(1, [1, "foo", "bar"])
Server --> Client: ok
Client -> Server: execute(1, [2, "baz", "fuz"])
Server --> Client: ok
@enduml
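For comparison, a rough client-side equivalent over the classic protocol;
mysql-connector-python, the connection parameters and the three-column
table ``tbl`` are assumptions for illustration only:
@code{py}
# Assumes mysql-connector-python, a reachable server and a table
# tbl(a INT, b VARCHAR(32), c VARCHAR(32)); illustration only.
import mysql.connector

cnx = mysql.connector.connect(user="user", password="pass", database="test")
cur = cnx.cursor(prepared=True)      # statement is prepared on first execute
for row in [(1, "foo", "bar"), (2, "baz", "fuz")]:
    # one network round-trip per row
    cur.execute("INSERT INTO tbl VALUES (%s, %s, %s)", row)
cnx.commit()
cur.close()
cnx.close()
@endcode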
The *Execution Time* for inserting all rows using prepared statements
and the MySQL Client/Server %Protocol is:
@code{unparsed}
num_rows * RTT +
(num_rows * Row Size / Network Bandwidth) +
1 * Row Parse Time +
num_rows * Row Execution Time
@endcode
**Optimizing for Execution Time and Error-Reporting**
In the X %Protocol, a pipeline can be used to stream messages to the
server while the server is executing the previous message.
@startuml "Optimization for Execution"
group tcp packet
Client -> Server: Sql.PrepareStmt(stmt_id=1, ...)
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
end
note over Client, Server
data too large to be merged into one TCP packet
end note
group tcp packet
Server --> Client: Sql.PreparedStmtOk
Server --> Client: Sql.PreparedStmtExecuteOk
Server --> Client: Sql.PreparedStmtExecuteOk
end
group tcp packet
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
end
group tcp packet
Server --> Client: Sql.PreparedStmtExecuteOk
Server --> Client: Sql.PreparedStmtExecuteOk
Server --> Client: Sql.PreparedStmtExecuteOk
end
@enduml
The *Execution Time* for inserting all rows using prepared statements
and using pipelining is (assuming that the network is not saturated):
@code{unparsed}
1 * RTT +
(1 * Row Size / Network Bandwidth) +
1 * Row Parse Time +
num_rows * Row Execution Time
@endcode
- ``one`` *network latency* to get the initial ``prepare``/``execute``
across the wire
- ``one`` *network bandwidth* to get the initial ``prepare``/``execute``
across the wire; all further commands arrive at the server before the
executor needs them, thanks to pipelining
- ``one`` *row parse time* to parse the ``prepare``
- ``num_rows`` *row execution time*, as before
If *error reporting* isn't a major concern, one can combine multi-row
``INSERT`` statements with pipelining and reduce the per-row network
overhead. This is especially important when the network is saturated.
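To make the three models comparable, a small back-of-the-envelope
calculation with purely illustrative values for latency, bandwidth, parse
and execution time:
@code{py}
# Illustrative numbers only; substitute measured values for a real estimate.
num_rows  = 1_000_000
row_size  = 1024               # bytes per row
rtt       = 0.001              # round-trip time in seconds (1 ms)
bandwidth = 100e6 / 8          # 100 Mbit/s in bytes per second
parse     = 5e-6               # row parse time in seconds
execute   = 20e-6              # row execution time in seconds

one_batch = 1 * rtt + num_rows * row_size / bandwidth \
            + num_rows * parse + num_rows * execute
prepared  = num_rows * rtt + num_rows * row_size / bandwidth \
            + 1 * parse + num_rows * execute
pipelined = 1 * rtt + 1 * row_size / bandwidth \
            + 1 * parse + num_rows * execute

print("one big INSERT      : %6.1f s" % one_batch)
print("prepared, per row   : %6.1f s" % prepared)
print("prepared, pipelined : %6.1f s" % pipelined)
@endcode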
SQL with Multiple Resultsets {#use_cases_SQL_with_Multiple_Resultsets}
============================
@startuml "Multiple Resultsets"
group prepare
Client -> Server: Sql.PrepareStmt(stmt_id=1, "CALL multi_resultset_sp()")
Server --> Client: Sql.PreparedStmtOk()
end
group execute
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, cursor_id=1)
Server --> Client: Sql.PreparedStmtExecuteOk()
end
group fetch rows
Client -> Server: Cursor::FetchResultset(cursor_id=1)
Server --> Client: Resultset::ColumnMetaData
Server --> Client: Resultset::Row
Server --> Client: Resultset::Row
Server --> Client: Resultset::DoneMoreResultsets
end
group fetch last resultset
Client -> Server: Cursor::FetchResultset(cursor_id=1)
Server --> Client: Resultset::ColumnMetaData
Server --> Client: Resultset::Row
Server --> Client: Resultset::Row
Server --> Client: Resultset::Done
end
group close cursor
Client -> Server: Cursor::Close(cursor_id=1)
Server --> Client: Cursor::Ok
end
@enduml
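A client-side sketch of draining such a cursor; ``send()``/``recv()`` are
again hypothetical helpers, and the loop keeps fetching as long as the
server announces further resultsets:
@code{py}
# Sketch only: send()/recv() are hypothetical message helpers.
def fetch_all_resultsets(conn, cursor_id):
    resultsets = []
    while True:
        send(conn, "Cursor::FetchResultset", cursor_id=cursor_id)
        columns = recv(conn)                  # Resultset::ColumnMetaData
        rows = []
        msg = recv(conn)
        while msg.type == "Resultset::Row":
            rows.append(msg)
            msg = recv(conn)
        resultsets.append((columns, rows))
        if msg.type == "Resultset::Done":     # last resultset reached
            break
        # otherwise msg is Resultset::DoneMoreResultsets, fetch the next one
    send(conn, "Cursor::Close", cursor_id=cursor_id)
    recv(conn)                                # Cursor::Ok
    return resultsets
@endcode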
Inserting CRUD Data in a Batch {#use_cases_Inserting_CRUD_Data_in_a_Batch}
==============================
Inserting multiple documents into the collection ``col1`` is a two-step
process:
1. prepare the insert
2. pipeline the execute messages
@startuml "Batch
Client -> Server: Crud::PrepareInsert(stmt_id=1, Collection(name="col1"))
Server --> Client: PreparedStmt::PrepareOk
loop
Client -> Server: PreparedStmt::Execute(stmt_id=1, values=[ doc ])
Server --> Client: PreparedStmt::ExecuteOk
end loop
Client -> Server: PreparedStmt::Close(stmt_id=1)
Server --> Client: PreparedStmt::CloseOk
@enduml
By utilizing pipelining, the ``execute`` messages can be batched without
waiting for the corresponding ``executeOk`` messages to be returned.
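A sketch of that batching, again with hypothetical ``send()``/``recv()``
helpers: all ``execute`` messages are written first, the acknowledgements
are read afterwards.
@code{py}
# Sketch only: send()/recv() are hypothetical message helpers.
def insert_documents(conn, docs):
    send(conn, "Crud::PrepareInsert", stmt_id=1, collection="col1")
    for doc in docs:                 # pipeline: no reads inside this loop
        send(conn, "PreparedStmt::Execute", stmt_id=1, values=[doc])
    send(conn, "PreparedStmt::Close", stmt_id=1)
    recv(conn)                       # PreparedStmt::PrepareOk
    for _ in docs:
        recv(conn)                   # PreparedStmt::ExecuteOk
    recv(conn)                       # PreparedStmt::CloseOk
@endcode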
Cross-Collection Update and Delete {#use_cases_Cross_Collection_Update_and_Delete}
==================================
This example deletes documents from collection ``col2`` based on data from
collection ``col1``.
Instead of fetching all rows from ``col1`` first to construct one big
``delete`` message, the operation can also be run as a nested loop:
@code{unparsed}
Crud.PrepareDelete(stmt_id=2, Collection(name="col2"), filter={ id=? })
Crud.PrepareFind(stmt_id=1, Collection(name="col1"), filter={ ... })
Sql.PreparedStmtExecute(stmt_id=1, cursor_id=2)
while ((rows = Sql.CursorFetch(cursor_id=2))):
Sql.PreparedStmtExecute(stmt_id=2, values = [ rows.col2_id ])
Sql.PreparedStmtClose(stmt_id=2)
Sql.PreparedStmtClose(stmt_id=1)
@endcode
@startuml "Update and Delete"
Client -> Server: Crud::PrepareFind(stmt_id=1, filter=...)
Client -> Server: Crud::PrepareDelete(stmt_id=2, filter={ id=? })
Server --> Client: PreparedStmt::PrepareOk
Server --> Client: PreparedStmt::PrepareOk
Client -> Server: PreparedStmt::ExecuteIntoCursor(stmt_id=1, cursor_id=2)
Server --> Client: PreparedStmt::ExecuteOk
Client -> Server: Cursor::FetchResultset(cursor_id=2, limit=batch_size)
loop batch_size
Server --> Client: Resultset::Row
Client -> Server: PreparedStmt::Execute(stmt_id=2, values=[ ? ])
break
alt
Server --> Client: Resultset::Suspended
else
Server --> Client: Resultset::Done
end alt
end break
end loop
loop batch_size
Server --> Client: PreparedStmt::ExecuteOk
end loop
@enduml
*/