Commit Graph

25 Commits

Author SHA1 Message Date
tidwall 9e68703841 Update expiration logic
This commit changes the logic for managing the expiration of
objects in the database.

Before: There was a server-wide hashmap that stored the
collection key, id, and expiration timestamp for all objects
that had a TTL. The hashmap was occasionally probed at 20
random positions, looking for objects that have expired. Those
expired objects were immediately deleted, and if there was 5
or more objects deleted, then the probe happened again, with
no delay. If the number of objects was less than 5 then the
there was a 1/10th of a second delay before the next probe.

Now: Rather than a server-wide hashmap, each collection has
its own ordered priority queue that stores objects with TTLs.
Rather than probing, there is a background routine that
executes every 1/10th of a second, which pops the expired
objects from the collection queues, and deletes them.

The collection/queue method is a more stable approach than
the hashmap/probing method. With probing, we can run into
major cache misses for some cases where there is wide
TTL duration, such as in the hours or days. This may cause
the system to occasionally fall behind, leaving should-be
expired objects in memory. Using a queue, there is no
cache misses, all objects that should be expired will be
right away, regardless of the TTL durations.

Fixes #616
2021-07-12 13:37:50 -07:00
tidwall dd4d31ae1b Fix last merge 2021-07-11 10:09:51 -07:00
tidwall 579a41abae Merge branch 'housecanary-fix-knn' 2021-07-11 10:02:59 -07:00
tidwall 991963268a Fix last merge 2021-07-10 19:32:21 -07:00
tidwall f2bbf10c36 Merge branch 'mpoindexter-optimize-field-value-access' 2021-07-10 19:26:29 -07:00
tidwall b467c6f1cb Change 255 to 256 2021-04-28 05:10:18 -07:00
tidwall 6b08f7fa9e Code cleanup
- Removed unused functions and variables
- Wrapped client formatted errors
- Updated deprecated packages
- Changed suggested code patterns
2021-03-31 08:13:44 -07:00
tidwall 72dfaaec63 Updated dependencies 2021-02-07 17:54:56 -07:00
tidwall 3ed048242e Various updates
- Updated all dependencies
- Updated geoindex Box api
2021-02-03 14:30:55 -07:00
tidwall 79bee8523b Updated btree deps 2020-10-27 15:29:50 -07:00
Mike Poindexter 042582aef3 Update comment 2020-04-08 11:38:12 -07:00
Mike Poindexter 2a4272c95f Improve kNN behavior
The current KNN implementation has two areas that can be improved:

- The current behavior is somewhat incorrect. When performing a kNN
query, the current code fetches k items from the index, and then sorts
these items according to Haversine distance. The problem with this
approach is that since the items fetched from the index are ordered by
a Euclidean metric, there is no guarantee that item k + 1 is not closer
than item k in great circle distance, and hence incorrect results can be
returned when closer items beyond k exist.

- The secondary sort is a performance killer. This requires buffering
all k items (again...they were already run through a priority queue in)
the index, and then a sort. Since the items are mostly sorted, and
Go's sort implementation is a quickSort this is the worst case for the
sort algorithm.

Both of these can be fixed by applying a proper distance metric in
the index nearby operation. In addition, this cleans up the code
considerably, removing a number of special cases that applied only
to NEARBY operations.

This change implements a geodetic distance metric that ensures that
the order from the index is correct, eliminating the need for the
secondary sort and special filtering cases in the ScanWriter code.
2020-04-07 20:10:58 -07:00
Mike Poindexter 9a5d608c21 Switch field storage to an array vs map 2020-03-25 10:24:02 -07:00
tidwall b482206894 Minimize sorting of collection fields 2020-03-22 07:58:03 -07:00
Alex Roitman 5faccc3b4c Avoid sorting fields for each written object. 2020-03-03 13:39:43 -08:00
tidwall 639f6e2deb Replaced boxtree for rbang 2019-09-12 18:42:53 -07:00
tidwall 0aecef6a5c Added TIMEOUT command 2019-04-24 05:09:41 -07:00
tidwall 95a5556d61 Added periodic yielding to iterators 2019-03-05 11:33:37 -07:00
tidwall 92c1ce8ef9 Update tinybtree dep 2019-02-11 13:39:29 -07:00
tidwall b2203fcb97 Fix nearby fast-fail 2018-11-11 09:05:26 -07:00
tidwall 372744b192 More hacking vendored circle.go 2018-11-11 09:04:00 -07:00
tidwall 07bae979a5 Added Cursor interface 2018-11-02 06:09:56 -07:00
Alex Roitman 0933c541f4 Refactor cursor/paging. 2018-10-31 22:01:37 -07:00
Alex Roitman b94f3685b6 Move iterating up to the cursor before any tests. 2018-10-31 22:01:24 -07:00
tidwall 6257ddba78 Faster point in polygon / GeoJSON updates
The big change is that the GeoJSON package has been completely
rewritten to fix a few of geometry calculation bugs, increase
performance, and to better follow the GeoJSON spec RFC 7946.

GeoJSON updates

- A LineString now requires at least two points.
- All json members, even foreign, now persist with the object.
- The bbox member persists too but is no longer used for geometry
  calculations. This is change in behavior. Previously Tile38 would
  treat the bbox as the object's physical rectangle.
- Corrections to geometry intersects and within calculations.

Faster spatial queries

- The performance of Point-in-polygon and object intersect operations
  are greatly improved for complex polygons and line strings. It went
  from O(n) to roughly O(log n).
- The same for all collection types with many children, including
  FeatureCollection, GeometryCollection, MultiPoint, MultiLineString,
  and MultiPolygon.

Codebase changes

- The pkg directory has been renamed to internal
- The GeoJSON internal package has been moved to a seperate repo at
  https://github.com/tidwall/geojson. It's now vendored.

Please look out for higher memory usage for datasets using complex
shapes. A complex shape is one that has 64 or more points. For these
shapes it's expected that there will be increase of least 54 bytes per
point.
2018-10-13 04:30:48 -07:00