The current KNN implementation has two areas that can be improved:
- The current behavior is somewhat incorrect. When performing a kNN
query, the current code fetches k items from the index, and then sorts
these items according to Haversine distance. The problem with this
approach is that since the items fetched from the index are ordered by
a Euclidean metric, there is no guarantee that item k + 1 is not closer
than item k in great circle distance, and hence incorrect results can be
returned when closer items beyond k exist.
- The secondary sort is a performance killer. This requires buffering
all k items (again...they were already run through a priority queue in)
the index, and then a sort. Since the items are mostly sorted, and
Go's sort implementation is a quickSort this is the worst case for the
sort algorithm.
Both of these can be fixed by applying a proper distance metric in
the index nearby operation. In addition, this cleans up the code
considerably, removing a number of special cases that applied only
to NEARBY operations.
This change implements a geodetic distance metric that ensures that
the order from the index is correct, eliminating the need for the
secondary sort and special filtering cases in the ScanWriter code.
This commit fixes a case where a roaming geofence will not fire
a "faraway" event when it's supposed to.
The fix required rewriting the nearby/faraway detection logic. It
is now much more accurate and takes overall less memory, but it's
also a little slower per operation because each object proximity
is checked twice per update. Once to compare the old object's
surrounding, and once to evaulated the new object. The two lists
are then used to generate accurate "nearby" and "faraway" results.
This commit addresses an issue that began on 1.19 where the
deprecated tile38 native line protocol was removed in favor of
the more robust resp protocol. In turn the tile38 cli required
that all args are quoteless or quote escaped.
The commit ensures that the server returns the correct error
message and also loosens the strictness of the need for quoted
arguments in the tile38-cli.
fixes#513
This commit includes updates that affects the build, testing, and
deployment of Tile38.
- The root level build.sh has been broken up into multiple scripts
and placed in the "scripts" directory.
- The vendor directory has been updated to follow the Go modules
rules, thus `make` should work on isolated environments. Also
some vendored packages may have been updated to a later
version, if needed.
- The Makefile has been updated to allow for making single
binaries such as `make tile38-server`. There is some scaffolding
during the build process, so from now on all binaries should be
made using make. For example, to run a development version of
the tile38-cli binary, do this:
make tile38-cli && ./tile38-cli
not this:
go run cmd/tile38-cli/main.go
- Travis.CI docker push script has been updated to address a
change to Docker's JSON repo meta output, which in turn fixes
a bug where new Tile38 versions were not being properly pushed
to Docker
This commit cleans up various Go code in the internal directory.
- Ensures comments on exported functions
- Changes all *Server receiver in all files to be "s", instead
of mixed "c", "s", "server", etc.
- Silenced Go warnings for if/else with returns.
- Cleaned up import ordering.
This commit fixes an issue where Tile38 was using lots of extra
memory to track objects that are marked to expire. This was
creating problems with applications that set big TTLs.
How it worked before:
Every collection had a unique hashmap that stores expiration
timestamps for every object in that collection. Along with
the hashmaps, there's also one big server-wide list that gets
appended every time a new SET+EX is performed.
From a background routine, this list is looped over at least
10 times per second and is randomly searched for potential
candidates that might need expiring. The routine then removes
those entries from the list and tests if the objects matching
the entries have actually expired. If so, these objects are
deleted them from the database. When at least 25% of
the 20 candidates are deleted the loop is immediately
continued, otherwise the loop backs off with a 100ms pause.
Why this was a problem.
The list grows one entry for every SET+EX. When TTLs are long,
like 24-hours or more, it would take at least that much time
before the entry is removed. So for databased that have objects
that use TTLs and are updated often this could lead to a very
large list.
How it was fixed.
The list was removed and the hashmap is now search randomly. This
required a new hashmap implementation, as the built-in Go map
does not provide an operation for randomly geting entries. The
chosen implementation is a robinhood-hash because it provides
open-addressing, which makes for simple random bucket selections.
Issue #502