The current KNN implementation has two areas that can be improved:
- The current behavior is somewhat incorrect. When performing a kNN
query, the current code fetches k items from the index, and then sorts
these items according to Haversine distance. The problem with this
approach is that since the items fetched from the index are ordered by
a Euclidean metric, there is no guarantee that item k + 1 is not closer
than item k in great circle distance, and hence incorrect results can be
returned when closer items beyond k exist.
- The secondary sort is a performance killer. This requires buffering
all k items (again...they were already run through a priority queue in)
the index, and then a sort. Since the items are mostly sorted, and
Go's sort implementation is a quickSort this is the worst case for the
sort algorithm.
Both of these can be fixed by applying a proper distance metric in
the index nearby operation. In addition, this cleans up the code
considerably, removing a number of special cases that applied only
to NEARBY operations.
This change implements a geodetic distance metric that ensures that
the order from the index is correct, eliminating the need for the
secondary sort and special filtering cases in the ScanWriter code.
This commit fixes a case where a roaming geofence will not fire
a "faraway" event when it's supposed to.
The fix required rewriting the nearby/faraway detection logic. It
is now much more accurate and takes overall less memory, but it's
also a little slower per operation because each object proximity
is checked twice per update. Once to compare the old object's
surrounding, and once to evaulated the new object. The two lists
are then used to generate accurate "nearby" and "faraway" results.
This commit addresses an issue that began on 1.19 where the
deprecated tile38 native line protocol was removed in favor of
the more robust resp protocol. In turn the tile38 cli required
that all args are quoteless or quote escaped.
The commit ensures that the server returns the correct error
message and also loosens the strictness of the need for quoted
arguments in the tile38-cli.
fixes#513
This commit cleans up various Go code in the internal directory.
- Ensures comments on exported functions
- Changes all *Server receiver in all files to be "s", instead
of mixed "c", "s", "server", etc.
- Silenced Go warnings for if/else with returns.
- Cleaned up import ordering.
Use the `tls=1` and the set the the `tlscert` and `tlskey` query
string params. The cert and key files must be on the tile38
server and the Nats server must be started using the same files.
nats://54.12.34.121:4222/fleet?tls=1&tlscert=cert.crt&tlskey=cert.key
This commit fixes an issue where Tile38 was using lots of extra
memory to track objects that are marked to expire. This was
creating problems with applications that set big TTLs.
How it worked before:
Every collection had a unique hashmap that stores expiration
timestamps for every object in that collection. Along with
the hashmaps, there's also one big server-wide list that gets
appended every time a new SET+EX is performed.
From a background routine, this list is looped over at least
10 times per second and is randomly searched for potential
candidates that might need expiring. The routine then removes
those entries from the list and tests if the objects matching
the entries have actually expired. If so, these objects are
deleted them from the database. When at least 25% of
the 20 candidates are deleted the loop is immediately
continued, otherwise the loop backs off with a 100ms pause.
Why this was a problem.
The list grows one entry for every SET+EX. When TTLs are long,
like 24-hours or more, it would take at least that much time
before the entry is removed. So for databased that have objects
that use TTLs and are updated often this could lead to a very
large list.
How it was fixed.
The list was removed and the hashmap is now search randomly. This
required a new hashmap implementation, as the built-in Go map
does not provide an operation for randomly geting entries. The
chosen implementation is a robinhood-hash because it provides
open-addressing, which makes for simple random bucket selections.
Issue #502
Each mqtt hook establishes separate connection to the MQTT broker. If
their clientIds are all equal the MQTT broker will disconnect the clients - the
protocol does not allow 2 connected clients with the same name
Unless the credpath query param is provided, the SQS credidentails
will be automatically chosen from one of the following:
- ~/.aws/credidentials
- Environment variables
- EC2 Role
The big change is that the GeoJSON package has been completely
rewritten to fix a few of geometry calculation bugs, increase
performance, and to better follow the GeoJSON spec RFC 7946.
GeoJSON updates
- A LineString now requires at least two points.
- All json members, even foreign, now persist with the object.
- The bbox member persists too but is no longer used for geometry
calculations. This is change in behavior. Previously Tile38 would
treat the bbox as the object's physical rectangle.
- Corrections to geometry intersects and within calculations.
Faster spatial queries
- The performance of Point-in-polygon and object intersect operations
are greatly improved for complex polygons and line strings. It went
from O(n) to roughly O(log n).
- The same for all collection types with many children, including
FeatureCollection, GeometryCollection, MultiPoint, MultiLineString,
and MultiPolygon.
Codebase changes
- The pkg directory has been renamed to internal
- The GeoJSON internal package has been moved to a seperate repo at
https://github.com/tidwall/geojson. It's now vendored.
Please look out for higher memory usage for datasets using complex
shapes. A complex shape is one that has 64 or more points. For these
shapes it's expected that there will be increase of least 54 bytes per
point.