/*
* Copyright (c) 2015, 2018, Oracle and/or its affiliates. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2.0,
* as published by the Free Software Foundation.
*
* This program is also distributed with certain software (including
* but not limited to OpenSSL) that is licensed under separate terms,
* as designated in a particular file or component or in included license
* documentation. The authors of MySQL hereby grant you an additional
* permission to link the program and your derivative works with the
* separately licensed software that they have included with MySQL.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License, version 2.0, for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
/** @page mysqlx_protocol_use_cases Use Cases
Topics in this section:
- @ref use_cases_Prepared_Statements_with_Single_Round_Trip
- @ref use_cases_Streaming_Inserts
- @ref use_cases_SQL_with_Multiple_Resultsets
- @ref use_cases_Inserting_CRUD_Data_in_a_Batch
- @ref use_cases_Cross_Collection_Update_and_Delete
Prepared Statements with Single Round-Trip {#use_cases_Prepared_Statements_with_Single_Round_Trip}
==========================================
In the MySQL Client/Server %Protocol, a ``PREPARE``/``EXECUTE`` cycle
requires two round-trips as the ``COM_STMT_EXECUTE`` needs data (the
statement id) from the ``COM_STMT_PREPARE-ok`` packet.
@startuml "Single Round-Trip"
group tcp packet
Client -> Server: COM_STMT_PREPARE(...)
end
group tcp packet
Server --> Client: COM_STMT_PREPARE-ok(**stmt_id=1**)
end
group tcp packet
Client -> Server: COM_STMT_EXECUTE(**stmt_id=1**)
end
group tcp packet
Server --> Client: Resultset
end
@enduml
The X %Protocol is designed in a way that the ``EXECUTE`` stage
doesn't depend on the response of the ``PREPARE`` stage.
@startuml "Without PREPARE Stage Dependency"
group tcp packet
Client -> Server: Sql.PrepareStmt(**stmt_id=1**, ...)
end
group tcp packet
Server --> Client: Sql.PreparedStmtOk
end
group tcp packet
Client -> Server: Sql.PreparedStmtExecute(**stmt_id=1**, ...)
end
group tcp packet
Server --> Client: Sql.PreparedStmtExecuteOk
end
@enduml
This allows the client to send ``PREPARE`` and ``EXECUTE`` back to back,
without waiting for the server's response in between.
@startuml "Without Server Response"
group tcp packet
Client -> Server: Sql.PrepareStmt(**stmt_id=1**, ...)
Client -> Server: Sql.PreparedStmtExecute(**stmt_id=1**, ...)
end
group tcp packet
Server --> Client: Sql.PreparedStmtOk
Server --> Client: Sql.PreparedStmtExecuteOk
end
@enduml
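The following sketch shows this pipelining from the client side. The
``send()``/``recv()`` helpers and the message constructors are hypothetical
and only illustrate the write/read ordering; they are not part of any
concrete client API.
@code{py}
# Sketch only: send() and recv() are hypothetical helpers that frame and
# parse X Protocol messages on an already authenticated connection.
def prepare_and_execute(conn, stmt_id, sql, params):
    # write both messages back to back, no read in between
    send(conn, "Sql.PrepareStmt", stmt_id=stmt_id, stmt=sql)
    send(conn, "Sql.PreparedStmtExecute", stmt_id=stmt_id, values=params)
    # only now read the two responses, in the order the requests were sent
    prepare_ok = recv(conn)   # Sql.PreparedStmtOk (or an Error)
    execute_ok = recv(conn)   # Sql.PreparedStmtExecuteOk (or an Error)
    return prepare_ok, execute_ok
@endcode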
@note
See the @ref mysqlx_protocol_implementation
"Implementation Notes" about how to efficiently implement pipelining.
Streaming Inserts {#use_cases_Streaming_Inserts}
=================
When inserting a large set of data (for example, a data import), a trade-off
has to be made among:
- memory usage on the client and the server side
- network round-trips
- error reporting
For this example it is assumed that 1 million rows, each 1024 bytes in size,
have to be transferred to the server.
**Optimizing for Network Round-Trips**
(Assuming the MySQL Client/Server %Protocol in this case.) Network
round-trips can be minimized by creating one huge SQL statement of up to
1 GByte in chunks of 16 MByte (the protocol's maximum frame size), sending
it over the wire and letting the server execute it.
@code{sql}
INSERT INTO tbl VALUES
( 1, "foo", "bar" ),
( 2, "baz", "fuz" ),
...;
@endcode
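A minimal sketch of how a client could generate such a statement in chunks,
assuming the rows are available as tuples of already-escaped SQL literals
(the chunk size and the helper are illustrative only):
@code{py}
# Illustration only: yields one huge INSERT statement in pieces of at most
# ~16 MByte so the client never holds the full statement in memory.
CHUNK_SIZE = 16 * 1024 * 1024

def insert_chunks(table, rows):
    buf = "INSERT INTO " + table + " VALUES "
    first = True
    for row in rows:
        piece = ("" if first else ", ") + "(" + ", ".join(row) + ")"
        first = False
        if len(buf) + len(piece) > CHUNK_SIZE:
            yield buf
            buf = ""
        buf += piece
    yield buf + ";"

# usage: for chunk in insert_chunks("tbl", rows): write the chunk to the wire
@endcode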
In this case:
- the client can generate the SQL statement in chunks of 16 MByte and
write them to the network
- *(memory usage)* the server has to buffer the full 1 GByte before it can
start working on the statement
- *(execution delay)* parsing and execution only start once the complete
statement has been received
- *(error-reporting)* in case of an error (parse error, duplicate key
error, ...) the whole 1 GByte message is rejected without any good way
to know where in that big message the error happened
The *Execution Time* for inserting all rows in one batch is:
@code{unparsed}
1 * RTT +
(num_rows * Row Size / Network Bandwidth) +
num_rows * Row Parse Time +
num_rows * Row Execution Time
@endcode
**Optimizing for Memory Usage and Error-Reporting**
The other extreme is using single row ``INSERT`` statements:
@code{sql}
INSERT INTO tbl VALUES
( 1, "foo", "bar" );
INSERT INTO tbl VALUES
( 2, "baz", "fuz" );
...
@endcode
- the client can generate the statements as it receives the data and
stream them to the server
- *(execution delay)* the server starts executing statements as soon as it
receives the first row
- *(memory usage)* the server only has to buffer a single row
- *(error-reporting)* if inserting one row fails, the client knows
about it when it happens
- as each statement results in its own round-trip, the network latency
applies to each row instead of only once
- each statement has to be parsed and executed by the server
Using Prepared Statements solves the last bullet point:
@startuml "Optimization for Memory"
Client -> Server: prepare("INSERT INTO tbl VALUES (?, ?, ?)")
Server --> Client: ok(stmt=1)
Client -> Server: execute(1, [1, "foo", "bar"])
Server --> Client: ok
Client -> Server: execute(1, [2, "baz", "fuz"])
Server --> Client: ok
@enduml
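For comparison, a rough client-side equivalent over the classic protocol;
mysql-connector-python, the connection parameters and the three-column
table ``tbl`` are assumptions for illustration only:
@code{py}
# Assumes mysql-connector-python, a reachable server and a table
# tbl(a INT, b VARCHAR(32), c VARCHAR(32)); illustration only.
import mysql.connector

cnx = mysql.connector.connect(user="user", password="pass", database="test")
cur = cnx.cursor(prepared=True)      # statement is prepared on first execute
for row in [(1, "foo", "bar"), (2, "baz", "fuz")]:
    # one network round-trip per row
    cur.execute("INSERT INTO tbl VALUES (%s, %s, %s)", row)
cnx.commit()
cur.close()
cnx.close()
@endcode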
The *Execution Time* for inserting all rows using prepared statements
and the MySQL Client/Server %Protocol is:
@code{unparsed}
num_rows * RTT +
(num_rows * Row Size / Network Bandwidth) +
1 * Row Parse Time +
num_rows * Row Execution Time
@endcode
**Optimizing for Execution Time and Error-Reporting**
In the X %Protocol, a pipeline can be used to stream messages to the
server while the server is executing the previous message.
@startuml "Optimization for Execution"
group tcp packet
Client -> Server: Sql.PrepareStmt(stmt_id=1, ...)
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
end
note over Client, Server
data too large to be merged into one TCP packet
end note
group tcp packet
Server --> Client: Sql.PreparedStmtOk
Server --> Client: Sql.PreparedStmtExecuteOk
Server --> Client: Sql.PreparedStmtExecuteOk
end
group tcp packet
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, values= [ .. ])
end
group tcp packet
Server --> Client: Sql.PreparedStmtExecuteOk
Server --> Client: Sql.PreparedStmtExecuteOk
Server --> Client: Sql.PreparedStmtExecuteOk
end
@enduml
The *Execution Time* for inserting all rows using prepared statements
and using pipelining is (assuming that the network is not saturated):
@code{unparsed}
1 * RTT +
(1 * Row Size / Network Bandwidth) +
1 * Row Parse Time +
num_rows * Row Execution Time
@endcode
- ``one`` *network latency* to get the initial ``prepare``/``execute``
across the wire
- ``one`` *network bandwidth* to get the initial ``prepare``/``execute``
across the wire; all further commands arrive at the server before the
executor needs them, thanks to pipelining
- ``one`` *row parse time* to parse the ``prepare``
- ``num_rows`` *row execution time*, as before
If *error reporting* isn't a major concern, one can combine multi-row
``INSERT`` statements with pipelining and reduce the per-row network
overhead. This is especially important when the network is saturated.
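To make the three models comparable, a small back-of-the-envelope
calculation with purely illustrative values for latency, bandwidth, parse
and execution time:
@code{py}
# Illustrative numbers only; substitute measured values for a real estimate.
num_rows  = 1_000_000
row_size  = 1024               # bytes per row
rtt       = 0.001              # round-trip time in seconds (1 ms)
bandwidth = 100e6 / 8          # 100 Mbit/s in bytes per second
parse     = 5e-6               # row parse time in seconds
execute   = 20e-6              # row execution time in seconds

one_batch = 1 * rtt + num_rows * row_size / bandwidth \
            + num_rows * parse + num_rows * execute
prepared  = num_rows * rtt + num_rows * row_size / bandwidth \
            + 1 * parse + num_rows * execute
pipelined = 1 * rtt + 1 * row_size / bandwidth \
            + 1 * parse + num_rows * execute

print("one big INSERT      : %6.1f s" % one_batch)
print("prepared, per row   : %6.1f s" % prepared)
print("prepared, pipelined : %6.1f s" % pipelined)
@endcode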
SQL with Multiple Resultsets {#use_cases_SQL_with_Multiple_Resultsets}
============================
@startuml "Multiple Resultsets"
group prepare
Client -> Server: Sql.PrepareStmt(stmt_id=1, "CALL multi_resultset_sp()")
Server --> Client: Sql.PreparedStmtOk()
end
group execute
Client -> Server: Sql.PreparedStmtExecute(stmt_id=1, cursor_id=1)
Server --> Client: Sql.PreparedStmtExecuteOk()
end
group fetch rows
Client -> Server: Cursor::FetchResultset(cursor_id=1)
Server --> Client: Resultset::ColumnMetaData
Server --> Client: Resultset::Row
Server --> Client: Resultset::Row
Server --> Client: Resultset::DoneMoreResultsets
end
group fetch last resultset
Client -> Server: Cursor::FetchResultset(cursor_id=1)
Server --> Client: Resultset::ColumnMetaData
Server --> Client: Resultset::Row
Server --> Client: Resultset::Row
Server --> Client: Resultset::Done
end
group close cursor
Client -> Server: Cursor::Close(cursor_id=1)
Server --> Client: Cursor::Ok
end
@enduml
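A client-side sketch of draining such a cursor; ``send()``/``recv()`` are
again hypothetical helpers, and the loop keeps fetching as long as the
server announces further resultsets:
@code{py}
# Sketch only: send()/recv() are hypothetical message helpers.
def fetch_all_resultsets(conn, cursor_id):
    resultsets = []
    while True:
        send(conn, "Cursor::FetchResultset", cursor_id=cursor_id)
        columns = recv(conn)                  # Resultset::ColumnMetaData
        rows = []
        msg = recv(conn)
        while msg.type == "Resultset::Row":
            rows.append(msg)
            msg = recv(conn)
        resultsets.append((columns, rows))
        if msg.type == "Resultset::Done":     # last resultset reached
            break
        # otherwise msg is Resultset::DoneMoreResultsets, fetch the next one
    send(conn, "Cursor::Close", cursor_id=cursor_id)
    recv(conn)                                # Cursor::Ok
    return resultsets
@endcode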
Inserting CRUD Data in a Batch {#use_cases_Inserting_CRUD_Data_in_a_Batch}
==============================
Inserting multiple documents into the collection ``col1`` is a two-step
process:
1. prepare the insert
2. pipeline the execute messages
@startuml "Batch
Client -> Server: Crud::PrepareInsert(stmt_id=1, Collection(name="col1"))
Server --> Client: PreparedStmt::PrepareOk
loop
Client -> Server: PreparedStmt::Execute(stmt_id=1, values=[ doc ])
Server --> Client: PreparedStmt::ExecuteOk
end loop
Client -> Server: PreparedStmt::Close(stmt_id=1)
Server --> Client: PreparedStmt::CloseOk
@enduml
By utilizing pipelining, the ``execute`` messages can be batched without
waiting for the corresponding ``executeOk`` messages to be returned.
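A sketch of that batching, again with hypothetical ``send()``/``recv()``
helpers: all ``execute`` messages are written first, the acknowledgements
are read afterwards.
@code{py}
# Sketch only: send()/recv() are hypothetical message helpers.
def insert_documents(conn, docs):
    send(conn, "Crud::PrepareInsert", stmt_id=1, collection="col1")
    for doc in docs:                 # pipeline: no reads inside this loop
        send(conn, "PreparedStmt::Execute", stmt_id=1, values=[doc])
    send(conn, "PreparedStmt::Close", stmt_id=1)
    recv(conn)                       # PreparedStmt::PrepareOk
    for _ in docs:
        recv(conn)                   # PreparedStmt::ExecuteOk
    recv(conn)                       # PreparedStmt::CloseOk
@endcode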
Cross-Collection Update and Delete {#use_cases_Cross_Collection_Update_and_Delete}
==================================
This example deletes documents from collection ``col2`` based on data from
collection ``col1``.
Instead of fetching all rows from ``col1`` first to construct one big
``delete`` message, the operation can also be run as a nested loop:
@code{unparsed}
Crud.PrepareDelete(stmt_id=2, Collection(name="col2"), filter={ id=? })
Crud.PrepareFind(stmt_id=1, Collection(name="col1"), filter={ ... })
Sql.PreparedStmtExecute(stmt_id=1, cursor_id=2)
while ((rows = Sql.CursorFetch(cursor_id=2))):
Sql.PreparedStmtExecute(stmt_id=2, values = [ rows.col2_id ])
Sql.PreparedStmtClose(stmt_id=2)
Sql.PreparedStmtClose(stmt_id=1)
@endcode
@startuml "Update and Delete"
Client -> Server: Crud::PrepareFind(stmt_id=1, filter=...)
Client -> Server: Crud::PrepareDelete(stmt_id=2, filter={ id=? })
Server --> Client: PreparedStmt::PrepareOk
Server --> Client: PreparedStmt::PrepareOk
Client -> Server: PreparedStmt::ExecuteIntoCursor(stmt_id=1, cursor_id=2)
Server --> Client: PreparedStmt::ExecuteOk
Client -> Server: Cursor::FetchResultset(cursor_id=2, limit=batch_size)
loop batch_size
Server --> Client: Resultset::Row
Client -> Server: PreparedStmt::Execute(stmt_id=2, values=[ ? ])
break
alt
Server --> Client: Resultset::Suspended
else
Server --> Client: Resultset::Done
end alt
end break
end loop
loop batch_size
Server --> Client: PreparedStmt::ExecuteOk
end loop
@enduml
*/