/*
 * Copyright (c) 2015, 2018, Oracle and/or its affiliates. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License, version 2.0,
 * as published by the Free Software Foundation.
 *
 * This program is also distributed with certain software (including
 * but not limited to OpenSSL) that is licensed under separate terms,
 * as designated in a particular file or component or in included license
 * documentation. The authors of MySQL hereby grant you an additional
 * permission to link the program and your derivative works with the
 * separately licensed software that they have included with MySQL.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License, version 2.0, for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/** @page mysqlx_protocol_implementation Implementation Notes

Topics in this section:

- @ref implementation_Client
- @ref implementation_Server


@ref implementation_Client "Client" and
@ref implementation_Server "Server" implementations of the
protocol should make use of the following to reduce latency and CPU usage:

- vectored I/O

- pipelining

Client {#implementation_Client}
======

@par Out-of-Band Messages

The client should decode the messages it receives from the server in a
generic way and track the possible messages with a state machine.

@code{py}
def getMessage(self, message):
    ## handle out-of-band messages before returning the payload
    msg = messageFactory(message.type).fromString(message.payload)

    if message.type is Notification:
        notification_queue.add(msg)
        raise NoMessageError()

    if message.type is Notice:
        notice_queue.add(msg)
        raise NoMessageError()

    return msg
@endcode
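
A caller that only wants the next regular message can loop until all
out-of-band messages have been queued away. This is a minimal sketch;
``read_frame()`` is a hypothetical helper that reads one framed message
off the wire and is not defined in this document:

@code{py}
def readRegularMessage(self):
    ## drain out-of-band messages until a regular one arrives
    while True:
        try:
            return self.getMessage(self.read_frame())  ## read_frame(): hypothetical
        except NoMessageError:
            pass  ## a Notice/Notification was queued; read the next frame
@endcode
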
@par Pipelining

The client may send several messages to the server without waiting for a
response to each message.

Instead of waiting for the response to each message, as in:

@startuml "Client Pipeline"
Client -[#red]> Server: Sql::StmtExecute
...1 second later...
Server -[#red]-> Client: Sql::StmtExecuteOk
Client -[#blue]> Server: Sql::StmtExecute
...1 second later...
Server -[#blue]-> Client: Sql::StmtExecuteOk
Client -[#green]> Server: Sql::StmtExecute
...1 second later...
Server -[#green]-> Client: Sql::StmtExecuteOk
@enduml

the client can generate its messages and send them to the server without
waiting:

@startuml "Client Pipeline"
Client -[#red]> Server: Sql::StmtExecute
Client -[#blue]> Server: Sql::StmtExecute
Client -[#green]> Server: Sql::StmtExecute
...1 second later...
Server -[#red]-> Client: Sql::StmtExecuteOk
Server -[#blue]-> Client: Sql::StmtExecuteOk
Server -[#green]-> Client: Sql::StmtExecuteOk
@enduml

When pipelining messages, the client has to ensure that, in case of an
error, the following messages also error out correctly:

@startuml "Client Pipeline"
Client -[#red]> Server: Sql::StmtExecute
Client -[#blue]> Server: Sql::StmtExecute
Client -[#green]> Server: Sql::StmtExecute
...1 second later...
Server -[#red]-> Client: Sql::StmtExecuteOk
Server -[#blue]-> Client: Error
Server -[#green]-> Client: Error
@enduml
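
The following is a minimal sketch of such a pipeline, not a client-library
API: the framing mirrors the illustrative header layout from the Vectored
I/O example below, and the message-type constants are made up for this
example:

@code{py}
import struct
import socket

## made-up message-type constants, for illustration only
STMT_EXECUTE, ERROR = 12, 1

def send_message(s, msg_type, payload):
    ## 4-byte length + 1-byte type + payload, sent with vectored I/O
    s.sendmsg([struct.pack(">I", len(payload)),
               struct.pack("B", msg_type),
               payload])

def read_message(s):
    length, = struct.unpack(">I", s.recv(4, socket.MSG_WAITALL))
    msg_type, = struct.unpack("B", s.recv(1, socket.MSG_WAITALL))
    return msg_type, s.recv(length, socket.MSG_WAITALL)

s = socket.create_connection(("127.0.0.1", 33060))

## pipeline: send all statements without waiting for responses ...
stmts = [b"SELECT 1", b"SELECT 2", b"SELECT 3"]
for stmt in stmts:
    send_message(s, STMT_EXECUTE, stmt)

## ... then collect one response per statement; once an Error arrives,
## the remaining statements of the pipeline fail as well
for _ in stmts:
    msg_type, payload = read_message(s)
    if msg_type == ERROR:
        print("statement failed:", payload)
@endcode
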
@par Vectored I/O

In network programming it is pretty common to prefix the message payload
with a header:

- HTTP header + HTTP content

- a pipeline of messages

- message header + protobuf message

@code{py}
import struct
import socket

s = socket.create_connection(("127.0.0.1", 33060))

msg_type = 1
msg_payload = b"abc"
msg_header = (struct.pack(">I", len(msg_payload)) +
              struct.pack("B", msg_type))

## concat before send
s.send(msg_header + msg_payload)

## multiple syscalls
s.send(msg_header)
s.send(msg_payload)

## vectored I/O
s.sendmsg([msg_header, msg_payload])
@endcode

*concat before send* leads to pretty wasteful reallocations and copy
operations if the payload is huge.

*multiple syscalls* is pretty wasteful for small messages: to move only a
few bytes, the whole machinery of copying data between user land and
kernel land has to be started for each send.

*vectored I/O* combines the best of both approaches: it passes multiple
buffers to the OS in one syscall, and the OS can optimize by sending
multiple buffers in one TCP packet.

On Unix this is handled by ``writev(2)``; on Windows the equivalent is
``WSASend()``.

@note
Any good buffered iostream implementation should already make use of
vectored I/O.

Known good implementations:

- Boost::ASIO

- GIO's GBufferedIOStream

@par Corking

Further control over when data is actually sent to the other endpoint can
be achieved with "corking":

- linux: ``TCP_CORK`` http://linux.die.net/man/7/tcp

- freebsd/macosx: ``TCP_NOPUSH``
  https://www.freebsd.org/cgi/man.cgi?query=tcp&sektion=4&manpath=FreeBSD+9.0-RELEASE

They work in combination with ``TCP_NODELAY`` (which disables Nagle's
Algorithm):

- http://stackoverflow.com/questions/3761276/when-should-i-use-tcp-nodelay-and-when-tcp-cork?rq=1

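A minimal, Linux-only sketch of corking, reusing the illustrative framing
from the Vectored I/O example above; ``TCP_CORK`` is not portable, so real
code would have to fall back to ``TCP_NOPUSH`` or plain vectored I/O on
other platforms:

@code{py}
import struct
import socket

s = socket.create_connection(("127.0.0.1", 33060))

msg_type = 1
msg_payload = b"abc"
msg_header = struct.pack(">I", len(msg_payload)) + struct.pack("B", msg_type)

## cork: queue partial writes in the kernel instead of sending right away
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
s.send(msg_header)
s.send(msg_payload)
## uncork: flush the queued data, ideally as one TCP packet
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
@endcode
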
Server {#implementation_Server}
======

@par Pipelining

The protocol is structured in a way that messages can be decoded
completely without knowing the state of the message sequence.

If data is available on the network, the server has to:

- read the message

- decode the message

- execute the message

Instead of a synchronous read-execution cycle:

@startuml "Server Pipeline"
participant Network
participant Reader
participant Executor

[-> Reader: message ready

Reader -> Network: receive
activate Reader
activate Network
Network --> Reader: data
deactivate Network

Reader -> Reader: decode(data)

Reader -> Executor: start_execute(msg)
deactivate Reader
activate Executor

Executor -> Executor: execute(msg)
Executor -> Executor: encode(response_msg)

[-> Reader: message ready

Executor -> Network: send(data)
activate Network
Network --> Executor: ok
deactivate Network
deactivate Executor

Reader -> Network: receive
activate Reader
activate Network
Network --> Reader: data
deactivate Network

Reader -> Reader: decode(data)

Reader -> Executor: start_execute(msg)
deactivate Reader
activate Executor

Executor -> Executor: execute(msg)
Executor -> Executor: encode(response_msg)

Executor -> Network: send(data)
activate Network
Network --> Executor: ok
deactivate Network
deactivate Executor
@enduml

the Reader and the Executor can be decoupled into separate threads:

@startuml "Separate Threads"
participant Network
participant Reader
box "Executor Thread"
participant ExecQueue
participant Executor
end box

[-> Reader: message ready

Executor -> ExecQueue: wait_for_msg
activate Executor

Reader -> Network: receive
activate Reader
activate Network
Network --> Reader: data
deactivate Network

Reader -> Reader: decode(data)

Reader -> ExecQueue: start_execute(msg)
ExecQueue --> Reader: ok
deactivate Reader
ExecQueue --> Executor: msg

Executor -> Executor: execute(msg)
Executor -> Executor: encode(response_msg)

[-> Reader: message ready

Reader -> Network: receive
activate Reader
activate Network
Network --> Reader: data
deactivate Network

Reader -> Reader: decode(data)

Executor -> Network: send(data)
activate Network
Network --> Executor: ok
deactivate Network
deactivate Executor

Reader -> ExecQueue: start_execute(msg)
Executor -> ExecQueue: wait_for_msg
activate Executor
ExecQueue --> Reader: ok
deactivate Reader

ExecQueue --> Executor: msg

Executor -> Executor: execute(msg)
Executor -> Executor: encode(response_msg)

Executor -> Network: send(data)
activate Network
Network --> Executor: ok
deactivate Network
deactivate Executor
@enduml

which allows the cost of decoding a message to be hidden behind the
execution of the previous message.

The number of messages that are prefetched this way should be
configurable to allow a trade-off between:

- resource usage

- parallelism

A bounded queue between the Reader and the Executor gives exactly this
knob; see the sketch below.

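A minimal sketch of this decoupling, assuming hypothetical
``read_message()``, ``decode()``, ``execute()`` and ``encode()`` helpers
plus an already accepted ``client_sock``; none of these names come from
the actual server:

@code{py}
import queue
import threading

## bounded queue between Reader and Executor; maxsize is the number of
## prefetched messages and thus the trade-off knob described above
exec_queue = queue.Queue(maxsize=4)

def reader_thread(sock):
    while True:
        raw = read_message(sock)        ## hypothetical framing helper
        msg = decode(raw)               ## hypothetical protobuf decoding
        exec_queue.put(msg)             ## blocks while the queue is full

def executor_thread(sock):
    while True:
        msg = exec_queue.get()
        response = execute(msg)         ## hypothetical statement execution
        sock.sendall(encode(response))  ## hypothetical encoding

threading.Thread(target=reader_thread, args=(client_sock,), daemon=True).start()
threading.Thread(target=executor_thread, args=(client_sock,), daemon=True).start()
@endcode
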
*/ |