Family: sp
Authors: KSB Braunsdorf
Mail: [email protected]
Version: 1.5
Bugs: None known.

Introduction

We need to hold a list of integers in memory as we process an expensive operation. At the end we need to traverse the list of the few integers which "hit" in the expensive search, and maybe recover a clue as to "how" it hit.

This module assumes that hits in the scan process are at random, and are very unlikely.

Configuration

Provide a standard machine.h which optionally defines:
#define SP_ALLOC_CACHE 2048
Gather sp_init function calls to calloc(3) into larger cached slabs. Since we never free the memory we allocate for the space this saves a lot of malloc headers and such. Suggested value of 4096 for a medium application hit rate, smaller for very rare hit rates.

By default no cache is enabled.

#define SP_ALLOC_TRIM 8
Declare how much space you are willing to discard to avoid calling calloc(3). Many of the calls we make to calloc are for 2 and 3 element vectors (12 bytes or less), this cache reduces the overhead consumed by that pattern. By allowing the rare request for 20 to 44 element vectors to fall directory to calloc we can save the small "trim part" for the next request for a short vector.
#define SP_RECURSE 0
Use the recursive version of sp_walk, rather than the iterative one when set to non-zero. The recursive one is more complex and slower, as far as I can tell.
#define SP_VERBOSE 0
When set to a non-zero value each function outputs a trace of the major plays in the game. Helps a new coder understand how this all works. Of course a picture helps a lot more.

Synopsis

#include "machine.h"
#include "sparse.h"

Description

This structure provides a cheap way to record which integer values a program has "seen" or "visited". That is to say we build a list of the values we've been presented, and bind a (void *) to each. This is useful for mapping inodes to uids, or dates to events, or any integer domain to any range you can represent with a pointer to some data structure.

These functions are limited to construction of a sparse space: no mechanism (short of exit(3)) is provided to remove any allocated space. This is fine for the use intended. An alternative implementation via AVL trees would trivially provide a "clean" version. More modern allocation system may eliminate this issue.

Most of the configuration is done at run-time. The compile-time options are limited at present.

Provides

extern sp_key sp_fibs[];
The list of the unique 32 (or 64) bit Fibonacci number, with a leading 0 and a trailing value of 2**32-1 (2**64-1). This is used in the code to quickly lookup the next or previous Fibonacci number.
extern size_t sp_unfib(sp_key wN);
Find the position in the sp_key array which contains the Fibonacci that is closest the wN without going over.
extern void **sp_init(sp_key wRange);
Return a (void **) which can represent unsigned integers up to and including wRange (starting at zero). This is the only way to build an empty sparse space, do not try to make your own.
extern void **sp_index(void **ppvFrom, sp_key wFind);
Return a (void **) which is bound to the integer wFind in the sparse space ppcFrom. To mark the value as used set the (void *) indexed by the return value to a non-NULL value (any one works).

Later, when sp_walking the space this (void *) is the second parameter passed to the Map function.

extern int sp_walk(void **ppvFrom, sp_key wLow, sp_key wHi, int (*pfiMap)(sp_key, void *));

Walk the sparse space given by From from wLow to wHi calling Map on each integer that has a non-NULL (void *) bound to it.

int YourMap(sp_key wN, void *pvData) { ... }

The declaration of the function that Map expects for sp_walk.

extern int sp_exists(void **ppvFrom, sp_key wTest);

Test fot the existence of Test in the sparse space given by From. Return zero for "not in the space" and non-zero for "included".

EXAMPLE

See the test driver embedded in the module.

This test driver inserts a few (unsigned long) integers into a sparse space instance with alternating colors ("red" and "blue"), then walks the space to output each integer and the color it was recorded with.

The integers in the test data were selected to test the corner cases in the walk and index functions.

Diagnostics

None.

See Also

avl(3)

To Do List

None. This is an evolutionary dead-end, as far as I can tell. Replacing the (void *) with a CPP macro or typedef to make this more MICE-like really wouldn't gain much, because the internal code that builds the sparse space needs the pointer type, so we'd have to union it in, anyway.

It is possible to expand the range of the sparse space by representing many integers in the (void *) returned, for example each sub-space could represent 2^64-1 integers to allow a space of 2^128-1 integers by the application of 2 layers, any number of layers could be applied to get any range required with very little memory for truly sparse applications.


$Id: sparse.html,v 1.13 2012/03/21 16:15:05 ksb Exp $