This commit changes the logic for managing the expiration of
objects in the database.
Before: There was a server-wide hashmap that stored the
collection key, id, and expiration timestamp for all objects
that had a TTL. The hashmap was occasionally probed at 20
random positions, looking for objects that have expired. Those
expired objects were immediately deleted, and if there was 5
or more objects deleted, then the probe happened again, with
no delay. If the number of objects was less than 5 then the
there was a 1/10th of a second delay before the next probe.
Now: Rather than a server-wide hashmap, each collection has
its own ordered priority queue that stores objects with TTLs.
Rather than probing, there is a background routine that
executes every 1/10th of a second, which pops the expired
objects from the collection queues, and deletes them.
The collection/queue method is a more stable approach than
the hashmap/probing method. With probing, we can run into
major cache misses for some cases where there is wide
TTL duration, such as in the hours or days. This may cause
the system to occasionally fall behind, leaving should-be
expired objects in memory. Using a queue, there is no
cache misses, all objects that should be expired will be
right away, regardless of the TTL durations.
Fixes#616
This commit addresses an issue where the sarama kafka library
leaks memory when a connection closes unless the metrics
configuration that was passed to new connection is also closed.
Fixes#613
Returns 'ok' if the server is the leader or a follower with
a 'caught up' log.
This is mainly for HTTP connections that are using an
orchestration environment like kubernetes, but will work as a
general RESP command.
For HTTP a '200 OK' for 'caught up' and
'500 Internal Server Error' otherwise.
See #608
Prior to this commit roaming geofences only registered changes to
the SET command. Now it will work for SET/DEL/FSET, just like
non-roaming geofences.
To opt out of these events, explicitly choose which event you
would like to register when you create the geofences.
For example:
NEARBY fleet FENCE COMMANDS set,del ROAM fleet * 5000
Will only fire off events from SET and DEL command.
Closes#597
This commit fixes a performance issue with the algorithm that
determines with geofences are potential candidates for
notifications following a SET operation.
Details
Prior to commit b471873 (10 commits ago) there was a bug where
the "cross" detection was not firing in all cases. This happened
because when looking for candidates for "cross" due to a SET
operation, only the geofences that overlapped the previous
position of the object and the geofences that overlapped the new
position where searched. But, in fac, all of the geofences that
overlapped the union rectangle of the old and new position should
have been searched.
That commit fixed the problem by searching a union rect of the
old and new positions. While this is an accurate solution, it
caused a slowdown on systems that have big/wild position changes
that might cross a huge number of geofences, even when those
geofences did not need actually need "cross" detection.
The fix
With this commit the geofences that have a "cross" detection
are stored in a seperated tree from those that do not. This
allows for a hybrid of the functionality prior and post b471873.
Fixes#583
This commit addresses issue #230, where an AOF file will sometimes
not load due to the file being padded with trailing zeros. It's
uncertain what is causing this corruption, but it appears to be
coming from outside of the tile38-server process. I suspect it's
due to some block store layer in Kubernetes/Docker cloud
environments.
This fix allows for Tile38 to start up by discovering the trailing
zeros while loading the AOF and safely truncating the file as to
not include the zeros in the future.
This commit fixes an issue that happens when running SCAN on a
collection that has objects with fields, causing field values
to be mismatched with their respective keys.
This only occured with json output, and is a regression from #534.
Fixes#569
This commit fixes an issue where the OUTPUT command requires
authentication when a server password has been set with
CONFIG SET requirepass. This was causing problems with clients
that use json responses, like the tile38-cli.
Fixes#564
Prior to this commit roaming geofences only registered changes to
the SET command. Now it will work for SET/DEL/FSET, just like
non-roaming geofences.
To opt out of these events, explicitly choose which event you
would like to register when you create the geofences.
For example:
NEARBY fleet FENCE COMMANDS set,del ROAM fleet * 5000
Will only fire off events from SET and DEL command.
Closes#597
This commit fixes a performance issue with the algorithm that
determines with geofences are potential candidates for
notifications following a SET operation.
Details
Prior to commit b471873 (10 commits ago) there was a bug where
the "cross" detection was not firing in all cases. This happened
because when looking for candidates for "cross" due to a SET
operation, only the geofences that overlapped the previous
position of the object and the geofences that overlapped the new
position where searched. But, in fac, all of the geofences that
overlapped the union rectangle of the old and new position should
have been searched.
That commit fixed the problem by searching a union rect of the
old and new positions. While this is an accurate solution, it
caused a slowdown on systems that have big/wild position changes
that might cross a huge number of geofences, even when those
geofences did not need actually need "cross" detection.
The fix
With this commit the geofences that have a "cross" detection
are stored in a seperated tree from those that do not. This
allows for a hybrid of the functionality prior and post b471873.
Fixes#583
This commit addresses issue #230, where an AOF file will sometimes
not load due to the file being padded with trailing zeros. It's
uncertain what is causing this corruption, but it appears to be
coming from outside of the tile38-server process. I suspect it's
due to some block store layer in Kubernetes/Docker cloud
environments.
This fix allows for Tile38 to start up by discovering the trailing
zeros while loading the AOF and safely truncating the file as to
not include the zeros in the future.
This commit fixes an issue that happens when running SCAN on a
collection that has objects with fields, causing field values
to be mismatched with their respective keys.
This only occured with json output, and is a regression from #534.
Fixes#569
This commit fixes an issue where the OUTPUT command requires
authentication when a server password has been set with
CONFIG SET requirepass. This was causing problems with clients
that use json responses, like the tile38-cli.
Fixes#564
The current KNN implementation has two areas that can be improved:
- The current behavior is somewhat incorrect. When performing a kNN
query, the current code fetches k items from the index, and then sorts
these items according to Haversine distance. The problem with this
approach is that since the items fetched from the index are ordered by
a Euclidean metric, there is no guarantee that item k + 1 is not closer
than item k in great circle distance, and hence incorrect results can be
returned when closer items beyond k exist.
- The secondary sort is a performance killer. This requires buffering
all k items (again...they were already run through a priority queue in)
the index, and then a sort. Since the items are mostly sorted, and
Go's sort implementation is a quickSort this is the worst case for the
sort algorithm.
Both of these can be fixed by applying a proper distance metric in
the index nearby operation. In addition, this cleans up the code
considerably, removing a number of special cases that applied only
to NEARBY operations.
This change implements a geodetic distance metric that ensures that
the order from the index is correct, eliminating the need for the
secondary sort and special filtering cases in the ScanWriter code.
This commit fixes a case where a roaming geofence will not fire
a "faraway" event when it's supposed to.
The fix required rewriting the nearby/faraway detection logic. It
is now much more accurate and takes overall less memory, but it's
also a little slower per operation because each object proximity
is checked twice per update. Once to compare the old object's
surrounding, and once to evaulated the new object. The two lists
are then used to generate accurate "nearby" and "faraway" results.
This commit addresses an issue that began on 1.19 where the
deprecated tile38 native line protocol was removed in favor of
the more robust resp protocol. In turn the tile38 cli required
that all args are quoteless or quote escaped.
The commit ensures that the server returns the correct error
message and also loosens the strictness of the need for quoted
arguments in the tile38-cli.
fixes#513
This commit cleans up various Go code in the internal directory.
- Ensures comments on exported functions
- Changes all *Server receiver in all files to be "s", instead
of mixed "c", "s", "server", etc.
- Silenced Go warnings for if/else with returns.
- Cleaned up import ordering.
Use the `tls=1` and the set the the `tlscert` and `tlskey` query
string params. The cert and key files must be on the tile38
server and the Nats server must be started using the same files.
nats://54.12.34.121:4222/fleet?tls=1&tlscert=cert.crt&tlskey=cert.key
This commit fixes an issue where Tile38 was using lots of extra
memory to track objects that are marked to expire. This was
creating problems with applications that set big TTLs.
How it worked before:
Every collection had a unique hashmap that stores expiration
timestamps for every object in that collection. Along with
the hashmaps, there's also one big server-wide list that gets
appended every time a new SET+EX is performed.
From a background routine, this list is looped over at least
10 times per second and is randomly searched for potential
candidates that might need expiring. The routine then removes
those entries from the list and tests if the objects matching
the entries have actually expired. If so, these objects are
deleted them from the database. When at least 25% of
the 20 candidates are deleted the loop is immediately
continued, otherwise the loop backs off with a 100ms pause.
Why this was a problem.
The list grows one entry for every SET+EX. When TTLs are long,
like 24-hours or more, it would take at least that much time
before the entry is removed. So for databased that have objects
that use TTLs and are updated often this could lead to a very
large list.
How it was fixed.
The list was removed and the hashmap is now search randomly. This
required a new hashmap implementation, as the built-in Go map
does not provide an operation for randomly geting entries. The
chosen implementation is a robinhood-hash because it provides
open-addressing, which makes for simple random bucket selections.
Issue #502
Each mqtt hook establishes separate connection to the MQTT broker. If
their clientIds are all equal the MQTT broker will disconnect the clients - the
protocol does not allow 2 connected clients with the same name