NAME
dbminit, fetch, store, dbmclose - somewhat dbm-compatible
database routines
dbzfresh, dbzagain, dbzfetch, dbzstore - database routines
dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug - database
routines
SYNOPSIS
#include <dbz.h>
dbminit(base)
char *base;
datum
fetch(key)
datum key;
store(key, value)
datum key;
datum value;
dbmclose()
dbzfresh(base, size, fieldsep, cmap, tagmask)
char *base;
long size;
int fieldsep;
int cmap;
long tagmask;
dbzagain(base, oldbase)
char *base;
char *oldbase;
datum
dbzfetch(key)
datum key;
dbzstore(key, value)
datum key;
datum value;
dbzsync()
long
dbzsize(nentries)
long nentries;
dbzincore(newvalue)
dbzcancel()
dbzdebug(newvalue)
DESCRIPTION
These functions provide an indexing system for rapid random
access to a text file (the base file). Subject to certain
constraints, they are call-compatible with dbm(3), although
they also provide some extensions. (Note that they are not
file-compatible with dbm or any variant thereof.)
In principle, dbz stores key-value pairs, where both key and
value are arbitrary sequences of bytes, specified to the
functions by values of type datum, typedefed in the header
file to be a structure with members dptr (a value of type
char * pointing to the bytes) and dsize (a value of type int
indicating how long the byte sequence is).
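As a rough illustration of the datum layout just described, the sketch below declares a local stand-in for the type (real code would take it from <dbz.h>) and wraps a C string as a key. The helper name key_from_string is hypothetical, and counting the terminating NUL as part of the key is an assumption made here so that the NUL can serve as a terminator.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for the datum type described above; real code would
 * get it from <dbz.h> instead of declaring it here. */
typedef struct {
    char *dptr;   /* points to the bytes */
    int dsize;    /* length of the byte sequence */
} datum;

/* Hypothetical helper: wrap a C string as a key.  The terminating NUL
 * is counted in dsize so that it can act as a key terminator. */
datum key_from_string(char *s)
{
    datum d;

    assert(s != NULL);
    d.dptr = s;
    d.dsize = (int)strlen(s) + 1;
    return d;
}
```

The datum only borrows the caller's buffer; nothing is copied.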
In practice, dbz is more restricted than dbm. A dbz database
must be an index into a base file, with the database
values being fseek(3) offsets into the base file. Each such
value must ``point to'' a place in the base file where the
corresponding key sequence is found. A key can be no longer
than DBZMAXKEY (a constant defined in the header file)
bytes. No key can be an initial subsequence of another,
which in most applications requires that keys be either
bracketed or terminated in some way (see the discussion of
the fieldsep parameter of dbzfresh, below, for a fine point
on terminators).
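The rule that no key may be an initial subsequence of another can be checked with a few lines of C. This is only a sketch for vetting keys before they are stored; the function names are hypothetical and nothing here is part of dbz itself.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if key a is an initial subsequence of key b (equal keys
 * count too), 0 otherwise. */
int is_initial_subsequence(const char *a, int alen, const char *b, int blen)
{
    assert(alen >= 0 && blen >= 0);
    if (alen > blen)
        return 0;
    return memcmp(a, b, (size_t)alen) == 0;
}

/* Return 1 if either key is an initial subsequence of the other. */
int keys_conflict(const char *a, int alen, const char *b, int blen)
{
    return is_initial_subsequence(a, alen, b, blen) ||
           is_initial_subsequence(b, blen, a, alen);
}
```

With bare keys, "abc" conflicts with "abcd". Including a terminating NUL in each key, or bracketing the keys as in "<abc>" and "<abcd>", removes the conflict.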
Dbminit opens a database, an index into the base file base,
consisting of files base.dir and base.pag which must already
exist. (If the database is new, they should be zero-length
files.) Subsequent accesses go to that database until
dbmclose is called to close the database. The base file
need not exist at the time of the dbminit, but it must exist
before accesses are attempted.
Fetch searches the database for the specified key, returning
the corresponding value if any. Store stores the key-value
pair in the database. Store will fail unless the database
files are writeable. See below for a complication arising
from case mapping.
Dbzfresh is a variant of dbminit for creating a new database
with more control over details. Unlike for dbminit, the
database files need not exist: they will be created if
necessary, and truncated in any case.
Dbzfresh's size parameter specifies the size of the first
hash table within the database, in key-value pairs.
Performance will be best if size is a prime number and the number
of key-value pairs stored in the database does not exceed
about 2/3 of size. (The dbzsize function, given the
expected number of key-value pairs, will suggest a database
size that meets these criteria.) Assuming that an fseek
offset is 4 bytes, the .pag file will be 4*size bytes (the
.dir file is tiny and roughly constant in size) until the
number of key-value pairs exceeds about 80% of size.
(Nothing awful will happen if the database grows beyond 100% of
size, but accesses will slow down somewhat and the .pag file
will grow somewhat.)
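The sizing advice can be worked through in code. The sketch below is an assumption about what a suitable size looks like (the smallest prime for which the expected entry count is at most 2/3 of the table), not a copy of what dbzsize actually computes. The names suggest_size and pag_bytes are hypothetical.

```c
#include <assert.h>

/* Trial-division primality test; adequate for table-sized numbers. */
int is_prime(long n)
{
    long d;

    if (n < 2)
        return 0;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical stand-in for dbzsize: smallest prime p such that
 * nentries is at most 2/3 of p. */
long suggest_size(long nentries)
{
    long p;

    assert(nentries >= 0);
    p = nentries + (nentries + 1) / 2;   /* ceil(1.5 * nentries) */
    while (!is_prime(p))
        p++;
    return p;
}

/* Approximate initial .pag size in bytes: one offset per table slot. */
long pag_bytes(long size, long offset_bytes)
{
    return size * offset_bytes;
}
```

For 100 expected entries this gives a size of 151, and with 4-byte offsets a .pag file of about 604 bytes.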
Dbzfresh's fieldsep parameter specifies the field separator
in the base file. If this is not NUL (0), and the last
character of a key argument is NUL, that NUL compares equal
to either a NUL or a fieldsep in the base file. This permits
use of NUL to terminate key strings without requiring
that NULs appear in the base file. The fieldsep of a database
created with dbminit is the horizontal-tab character.
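The terminator rule can be pictured as a comparison routine. This is a sketch of the rule as stated above, not the code dbz uses; key_matches is a hypothetical name, and text is assumed to point at the place in the base file where the key should be found.

```c
#include <assert.h>

/* Compare a key against the text at some position in the base file.
 * Bytes must match exactly, except that when fieldsep is not NUL a
 * final NUL in the key also matches fieldsep in the base file.
 * Returns 1 on a match, 0 otherwise.  text must hold at least
 * keylen bytes. */
int key_matches(const char *key, int keylen, const char *text, int fieldsep)
{
    int i;

    assert(keylen > 0);
    for (i = 0; i < keylen; i++) {
        if (key[i] == text[i])
            continue;
        if (i == keylen - 1 && key[i] == '\0' &&
            fieldsep != '\0' && text[i] == fieldsep)
            continue;
        return 0;
    }
    return 1;
}
```

A key of "<a@b>" plus its NUL (6 bytes) therefore matches a base-file line beginning "<a@b>" followed by a tab when fieldsep is the tab character.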
For use in news systems, various forms of case mapping (e.g.
uppercase to lowercase) in keys are available. The cmap
parameter to dbzfresh is a single character specifying which
of several mapping algorithms to use. Available algorithms
are:
     0     case-sensitive: no case mapping
     B     same as 0
     NUL   same as 0
     =     case-insensitive: uppercase and lowercase equivalent
     b     same as =
     C     RFC822 message-ID rules, case-sensitive before `@'
           (with certain exceptions) and case-insensitive after
     ?     whatever the local default is, normally C
Mapping algorithm 0 (no mapping) is faster than the others
and is overwhelmingly the correct choice for most
applications. Unless compatibility constraints interfere, it is
more efficient to pre-map the keys, storing mapped keys in
the base file, than to have dbz do the mapping on every
search.
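Pre-mapping keys before they go into the base file might look like the sketch below. It is an approximation only: the "certain exceptions" of the C algorithm are not handled, the choice of the last `@' as the dividing point is an assumption, and premap_key is a hypothetical name rather than a dbz function.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Pre-map a NUL-terminated key in place.
 * cmap '=' or 'b': lowercase the whole key.
 * cmap 'C'       : lowercase only from the last '@' onward (assumed
 *                  dividing point; the exceptions are NOT handled).
 * anything else  : leave the key untouched. */
void premap_key(char *key, int cmap)
{
    char *p;

    assert(key != NULL);
    switch (cmap) {
    case '=':
    case 'b':
        p = key;
        break;
    case 'C':
        p = strrchr(key, '@');   /* NULL if there is no '@' */
        break;
    default:
        p = NULL;
        break;
    }
    if (p == NULL)
        return;
    for (; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
}
```

Keys mapped this way once, when the base file is written, can then be used with mapping algorithm 0.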
For historical reasons, fetch and store expect their key
arguments to be pre-mapped, but expect unmapped keys in the
base file. Dbzfetch and dbzstore do the same jobs but handle
all case mapping internally, so the customer need not
worry about it.
Dbz stores only the database values in its files, relying on
reference to the base file to confirm a hit on a key.
References to the base file can be minimized, greatly
speeding up searches, if a little bit of information about the
keys can be stored in the dbz files. This is ``free'' if
there are some unused bits in an fseek offset, so that the
offset can be tagged with some information about the key.
The tagmask parameter of dbzfresh allows specifying the
location of unused bits. Tagmask should be a mask with one
group of contiguous 1 bits. The bits in the mask should be
unused (0) in most offsets. The bit immediately above the
mask (the flag bit) should be unused (0) in all offsets;
(dbz)store will reject attempts to store a key-value pair in
which the value has the flag bit on. Apart from this
restriction, tagging is invisible to the user. As a special
case, a tagmask of 1 means ``no tagging'', for use with
enormous base files or on systems with unusual offset
representations.
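The constraints on a tag mask reduce to a little bit arithmetic. The sketch below checks a candidate mask and tests offsets against it; the function names are hypothetical, offsets are treated as unsigned long for simplicity, and the special tagmask values 0 and 1 are not handled.

```c
#include <assert.h>

/* Return 1 if mask is a single group of contiguous 1 bits. */
int mask_is_contiguous(unsigned long mask)
{
    unsigned long low;

    if (mask == 0)
        return 0;
    low = mask & (~mask + 1);          /* lowest set bit of the mask */
    return ((mask + low) & mask) == 0; /* adding it must clear the group */
}

/* The flag bit is the bit immediately above the mask. */
unsigned long flag_bit(unsigned long mask)
{
    unsigned long low;

    assert(mask_is_contiguous(mask));
    low = mask & (~mask + 1);
    return mask + low;
}

/* An offset can carry a tag only if none of its mask bits are in use. */
int offset_is_taggable(unsigned long offset, unsigned long mask)
{
    return (offset & mask) == 0;
}

/* An offset can be stored only if its flag bit is clear. */
int offset_is_storable(unsigned long offset, unsigned long mask)
{
    return (offset & flag_bit(mask)) == 0;
}
```

For a mask of 0x7f000000 the flag bit is 0x80000000, and any offset below 0x01000000 leaves the tag bits free.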
A size of 0 given to dbzfresh is synonymous with the local
default; the normal default is suitable for tables of 90,000
to 100,000 key-value pairs. A cmap of 0 (NUL) is synonymous
with the character 0, signifying no case mapping (note that
the character ? specifies the local default mapping,
normally C). A tagmask of 0 is synonymous with the local
default tag mask, normally 0x7f000000 (specifying the top
bit in a 32-bit offset as the flag bit, and the next 7 bits
as the mask, which is suitable for base files up to circa
24MB). Calling dbminit(name) with the database files empty
is equivalent to calling dbzfresh(name,0,'\t','?',0).
When databases are regenerated periodically, as in news, it
is simplest to pick the parameters for a new database based
on the old one. This also permits some memory of past sizes
of the old database, so that a new database size can be
chosen to cover expected fluctuations. Dbzagain is a variant
of dbminit for creating a new database as a new generation
of an old database. The database files for oldbase
must exist. Dbzagain is equivalent to calling dbzfresh with
the same field separator, case mapping, and tag mask as the
old database, and a size equal to the result of applying
dbzsize to the largest number of entries in the oldbase
database and its previous 10 generations.
When many accesses are being done by the same program, dbz
is massively faster if its first hash table is in memory.
If an internal flag is 1, an attempt is made to read the
table in when the database is opened, and dbmclose writes it
out to disk again (if it was read successfully and has been
modified). Dbzincore sets the flag to newvalue (which
should be 0 or 1) and returns the previous value; this does
not affect the status of a database that has already been
opened. The default is 0. The attempt to read the table in
may fail due to memory shortage; in this case dbz quietly
falls back on its default behavior. Stores to an in-memory
database are not (in general) written out to the file until
dbmclose or dbzsync, so if robustness in the presence of
crashes or concurrent accesses is crucial, in-memory
databases should probably be avoided.
Dbzsync causes all buffers etc. to be flushed out to the
files. It is typically used as a precaution against crashes
or concurrent accesses when a dbz-using process will be
running for a long time. It is a somewhat expensive operation,
especially for an in-memory database.
Dbzcancel cancels any pending writes from buffers. This is
typically useful only for in-core databases, since writes
are otherwise done immediately. Its main purpose is to let
a child process, in the wake of a fork, do a dbmclose
without writing its parent's data to disk.
If dbz has been compiled with debugging facilities available
(which makes it bigger and a bit slower), dbzdebug alters
the value (and returns the previous value) of an internal
flag which (when 1; default is 0) causes verbose and cryptic
debugging output on standard output.
Concurrent reading of databases is fairly safe, but there is
no (inter)locking, so concurrent updating is not.
The database files include a record of the byte order of the
processor creating the database, and accesses by processors
with different byte order will work, although they will be
slightly slower. Byte order is preserved by dbzagain.
However, agreement on the size and internal structure of an
fseek offset is necessary, as is consensus on the character
set.
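To give a feel for why foreign byte order costs a little speed, the sketch below converts a 32-bit value read from a file of known byte order into host order. It is a simplified illustration that assumes only plain big-endian and little-endian layouts; it does not describe how dbz actually records or applies byte order, and all names are hypothetical.

```c
#include <assert.h>

/* Return 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    unsigned int probe = 1;

    return *(unsigned char *)&probe == 1;
}

/* Reverse the four low-order bytes of a 32-bit quantity. */
unsigned long swap32(unsigned long v)
{
    return ((v & 0x000000ffUL) << 24) |
           ((v & 0x0000ff00UL) << 8)  |
           ((v & 0x00ff0000UL) >> 8)  |
           ((v & 0xff000000UL) >> 24);
}

/* Convert a value read from a file of known byte order to host order.
 * The swap is the extra work done when the orders differ. */
unsigned long to_host_order(unsigned long v, int file_is_little_endian)
{
    assert(file_is_little_endian == 0 || file_is_little_endian == 1);
    if (file_is_little_endian == host_is_little_endian())
        return v;
    return swap32(v);
}
```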
An open database occupies three stdio streams and their
corresponding file descriptors; a fourth is needed for an
in-memory database. Apart from stdio buffers, memory
consumption is negligible unless the database is in memory.
SEE ALSO
dbz(1), dbm(3)
DIAGNOSTICS
Functions returning int values return 0 for success, -1 for
failure. Functions returning datum values return a value
with dptr set to NULL for failure. Dbminit attempts to have
errno set plausibly on return, but otherwise this is not
guaranteed. An errno of EDOM from dbminit indicates that
the database did not appear to be in dbz format.
HISTORY
The original dbz was written by Jon Zeeff
(zeeff@b-tech.ann-arbor.mi.us). Later contributions by David
Butler and Mark Moraes. Extensive reworking, including this
documentation, by Henry Spencer (henry@zoo.toronto.edu) as
part of the C News project. Hashing function by Peter Honeyman.
BUGS
The dptr members of returned datum values point to static
storage which is overwritten by later calls.
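A caller that needs a returned value to survive a later call can copy it first. The sketch below shows one way to do so; the datum declaration is a local stand-in for the one in <dbz.h>, and datum_copy is a hypothetical helper, not part of dbz.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the datum type from <dbz.h>. */
typedef struct {
    char *dptr;
    int dsize;
} datum;

/* Copy the bytes of d into freshly allocated storage.  Returns a
 * datum with dptr NULL if d is a failure value or if allocation
 * fails.  The caller must free() the returned dptr. */
datum datum_copy(datum d)
{
    datum out;

    out.dptr = NULL;
    out.dsize = 0;
    if (d.dptr == NULL)
        return out;
    assert(d.dsize >= 0);
    out.dptr = (char *)malloc(d.dsize > 0 ? (size_t)d.dsize : 1);
    if (out.dptr == NULL)
        return out;
    memcpy(out.dptr, d.dptr, (size_t)d.dsize);
    out.dsize = d.dsize;
    return out;
}
```

Copying immediately after each fetch keeps the result valid across subsequent calls.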
Unlike dbm, dbz will misbehave if an existing key-value pair
is `overwritten' by a new (dbz)store with the same key. The
user is responsible for avoiding this by using (dbz)fetch
first to check for duplicates; an internal optimization
remembers the result of the first search so there is minimal
overhead in this.
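The fetch-before-store discipline can be wrapped in a small helper. In the sketch below the fetch and store routines are passed in as function pointers so that it compiles without the library; in real use they would be dbzfetch and dbzstore (or fetch and store). The helper name store_if_absent, the local datum declaration, and the toy one-entry routines are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the datum type from <dbz.h>. */
typedef struct {
    char *dptr;
    int dsize;
} datum;

/* Store key/value only if the key is absent.
 * Returns 0 if stored, 1 if the key was already present,
 * -1 if the store itself failed. */
int store_if_absent(datum key, datum value,
                    datum (*fetch_fn)(datum),
                    int (*store_fn)(datum, datum))
{
    datum found;

    assert(fetch_fn != NULL && store_fn != NULL);
    found = fetch_fn(key);
    if (found.dptr != NULL)
        return 1;                 /* duplicate: do not store again */
    return store_fn(key, value) == 0 ? 0 : -1;
}

/* Toy one-entry stand-ins, used only to try the helper out. */
static datum toy_slot;            /* dptr NULL means empty */

datum toy_fetch(datum key)
{
    (void)key;
    return toy_slot;
}

int toy_store(datum key, datum value)
{
    (void)key;
    toy_slot = value;
    return 0;
}
```

Because the fetch immediately precedes the store for the same key, the remembered search result mentioned above should keep the extra cost small.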
Waiting until after dbminit to bring the base file into
existence will fail if chdir(2) has been used meanwhile.
The RFC822 case mapper implements only a first approximation
to the hideously-complex RFC822 case rules.
The prime finder in dbzsize is not particularly quick.
Should implement the dbm functions delete, firstkey, and
nextkey.
On C implementations which trap integer overflow, dbz will
refuse to (dbz)store an fseek offset equal to the greatest
representable positive number, as this would cause overflow
in the biased representation used.
Dbzagain perhaps ought to notice when many offsets in the
old database were too big for tagging, and shrink the tag
mask to match.
Marking dbz's file descriptors close-on-exec would be a
better approach to the problem dbzcancel tries to address,
but that's harder to do portably.