For a Go-only project, the following code pattern

    go func() {
        select {
        case <-ctx.Done():
            // call some cancel
        case <-done:
            // work finished ok
        }
    }()

    // do some work

    close(done)

works well and is fast, without high scheduling overhead, because the
scheduler usually puts the spawned goroutine into the run queue on the same
OS thread, so after done is closed control passes to the spawned goroutine
without an OS context switch.

However, in the presence of Cgo calls in "do some work" the situation can
be different: Cgo calls are treated by the Go runtime similarly to
system calls, with the effect that goroutines spawned on the original OS
thread tend to be migrated by the scheduler to another OS thread.

This in turn can bring high overhead for communicating on "done", which
can ultimately result in a full context switch if the spawned goroutine
already had a chance to run, found both done and ctx not ready, and went
to sleep waiting on a futex. In strace this shows up as something like the
following for one read query (note the futex calls):

27867 00:38:39.782146 stat(".../neo.sqlite-journal", 0x7f83809c4a20) = -1 ENOENT (No such file or directory)
27867 00:38:39.782165 pread64(3, "\0\0\0\33\0\0\10\235\0\0\10]\0\0\0\27", 16, 24) = 16
27871 00:38:39.782179 <... pselect6 resumed> ) = 0 (Timeout)
27868 00:38:39.782187 <... pselect6 resumed> ) = 0 (Timeout)
27871 00:38:39.782193 futex(0xc4200f8538, FUTEX_WAIT, 0, NULL <unfinished ...>
27868 00:38:39.782199 futex(0xc420013138, FUTEX_WAIT, 0, NULL <unfinished ...>
27867 00:38:39.782205 stat(".../neo.sqlite-wal", 0x7f83809c4a20) = -1 ENOENT (No such file or directory)
27867 00:38:39.782224 fstat(3, {st_mode=S_IFREG|0644, st_size=9031680, ...}) = 0
27867 00:38:39.782247 futex(0xc420013138, FUTEX_WAKE, 1 <unfinished ...>
27868 00:38:39.782259 <... futex resumed> ) = 0
27867 00:38:39.782265 <... futex resumed> ) = 1
27868 00:38:39.782270 pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
27867 00:38:39.782279 fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
27867 00:38:39.782315 fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
27868 00:38:39.782336 <... pselect6 resumed> ) = 0 (Timeout)
27867 00:38:39.782342 fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741826, l_len=510} <unfinished ...>
27868 00:38:39.782348 futex(0xc4200f8538, FUTEX_WAKE, 1 <unfinished ...>
27867 00:38:39.782355 <... fcntl resumed> ) = 0
27871 00:38:39.782360 <... futex resumed> ) = 0
27868 00:38:39.782367 <... futex resumed> ) = 1
27871 00:38:39.782372 futex(0xc4200f8138, FUTEX_WAKE, 1 <unfinished ...>
27868 00:38:39.782377 pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
27871 00:38:39.782384 <... futex resumed> ) = 1
27870 00:38:39.782389 <... futex resumed> ) = 0
27867 00:38:39.782394 fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1} <unfinished ...>
27870 00:38:39.782400 pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
27867 00:38:39.782408 <... fcntl resumed> ) = 0

The link below shows that the Go scheduler itself might be significantly
improved for cases when several Cgo calls are made per request in a server:

    https://github.com/golang/go/issues/21827#issuecomment-329092317

In particular, the CGo-4 case should be closely related to this sqlite3 Go
package, because many Cgo calls are made to SQLite for one query.

However, until there are proper scheduler fixes, let's do what we can
to improve query time:

If we know that the context under which a query is executed will never
be canceled, we can safely skip spawning the interrupt goroutine and
this way avoid ping-pong on done between different OS threads.

This brings the following speedup on my notebook with go1.10:

    name               old req/s    new req/s    delta
    Exec                 254k ± 1%    379k ± 1%  +48.89%  (p=0.000 n=10+10)
    Query               90.6k ± 2%   96.4k ± 1%   +6.37%  (p=0.000 n=10+10)
    Params              81.5k ± 1%   87.0k ± 1%   +6.83%  (p=0.000 n=10+10)
    Stmt                 122k ± 2%    129k ± 1%   +6.07%  (p=0.000 n=10+9)
    Rows                2.98k ± 1%   3.06k ± 1%   +2.77%  (p=0.000 n=9+10)
    StmtRows            3.10k ± 1%   3.13k ± 1%   +1.12%  (p=0.000 n=9+10)

    name               old time/op  new time/op  delta
    CustomFunctions-4  10.6µs ± 1%  10.1µs ± 1%   -5.01%  (p=0.000 n=10+10)

With current settings SQLite was using lseek/read syscalls to read data, e.g.:
20:43:17.640660 fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
20:43:17.640683 fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
20:43:17.640705 fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741826, l_len=510}) = 0
20:43:17.640725 fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
20:43:17.640744 stat(".../neo.sqlite-journal", 0x7ffef2c91080) = -1 ENOENT (No such file or directory)
20:43:17.640764 lseek(3, 24, SEEK_SET) = 24
20:43:17.640779 read(3, "\0\0\0\33\0\0\10\235\0\0\10]\0\0\0\27", 16) = 16
20:43:17.640795 stat(".../neo.sqlite-wal", 0x7ffef2c91080) = -1 ENOENT (No such file or directory)
but if we allow it to use pread, there will be only 1 system call instead
of 2, and reads done this way can also proceed in parallel because there is
no shared file offset to seek:
20:48:42.668466 fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
20:48:42.668501 fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
20:48:42.668522 fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741826, l_len=510}) = 0
20:48:42.668542 fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=1}) = 0
20:48:42.668561 stat(".../neo.sqlite-journal", 0x7ffdbc1f22c0) = -1 ENOENT (No such file or directory)
20:48:42.668580 pread64(3, "\0\0\0\33\0\0\10\235\0\0\10]\0\0\0\27", 16, 24) = 16
20:48:42.668597 stat(".../neo.sqlite-wal", 0x7ffdbc1f22c0) = -1 ENOENT (No such file or directory)
(If needed, this can be enabled per OS.)

The SQLiteConn.RegisterAggregator implementation was defined in the
sqlite3_trace.go file, which is guarded by a build constraint. This
change simply moves RegisterAggregator to the main sqlite3.go file,
and moves the accompanying unit tests.

The rationale for this move is that downstream packages could not use
RegisterAggregator without also specifying the 'trace' build tag (and
notifying their users about it).