NAME

     dbminit, fetch, store, dbmclose  -  somewhat  dbm-compatible
     database routines
     dbzfresh, dbzagain, dbzfetch, dbzstore - database routines
     dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug -  database
     routines


SYNOPSIS

     #include <dbz.h>

     dbminit(base)
     char *base;

     datum
     fetch(key)
     datum key;

     store(key, value)
     datum key;
     datum value;

     dbmclose()

     dbzfresh(base, size, fieldsep, cmap, tagmask)
     char *base;
     long size;
     int fieldsep;
     int cmap;
     long tagmask;

     dbzagain(base, oldbase)
     char *base;
     char *oldbase;

     datum
     dbzfetch(key)
     datum key;

     dbzstore(key, value)
     datum key;
     datum value;

     dbzsync()

     long
     dbzsize(nentries)
     long nentries;

     dbzincore(newvalue)
     int newvalue;

     dbzcancel()

     dbzdebug(newvalue)
     int newvalue;


DESCRIPTION

     These functions provide an indexing system for rapid  random
     access  to  a text file (the base file).  Subject to certain
     constraints, they are call-compatible with dbm(3),  although
     they  also provide some extensions.  (Note that they are not
     file-compatible with dbm or any variant thereof.)

     In principle, dbz stores key-value pairs, where both key and
     value  are  arbitrary  sequences  of bytes, specified to the
     functions by values of type datum, typedefed in  the  header
     file  to  be  a structure with members dptr (a value of type
     char * pointing to the bytes) and dsize (a value of type int
     indicating how long the byte sequence is).
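
     As a minimal sketch of the layout just described, the  fragment
     below  wraps  a  C  string  in a datum.  The typedef is a local
     stand-in so the fragment compiles on its  own;  real  programs
     get  the  type  from  <dbz.h>.  The helper name make_key, and
     the choice to count the terminating NUL as part  of  the  key,
     are assumptions made for illustration.

```c
#include <string.h>

/* Local stand-in for the datum type that <dbz.h> provides. */
typedef struct {
    char *dptr;   /* pointer to the bytes */
    int dsize;    /* length of the byte sequence */
} datum;

/* Hypothetical helper: wrap a NUL-terminated string as a key,
 * counting the terminating NUL as part of the key. */
datum make_key(char *s)
{
    datum d;

    d.dptr = s;
    d.dsize = (int)strlen(s) + 1;
    return d;
}
```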

     In practice, dbz is more restricted than dbm.  A  dbz  data-
     base  must  be  an index into a base file, with the database
     values being fseek(3) offsets into the base file.  Each such
     value  must  ``point to'' a place in the base file where the
     corresponding key sequence is found.  A key can be no longer
     than  DBZMAXKEY  (a  constant  defined  in  the header file)
     bytes.  No key can be an  initial  subsequence  of  another,
     which  in  most  applications  requires  that keys be either
     bracketed or terminated in some way (see the  discussion  of
     the  fieldsep parameter of dbzfresh, below, for a fine point
     on terminators).
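
     The initial-subsequence rule can be  checked  mechanically.   A
     sketch,  using the hypothetical helpers is_prefix and keys_com-
     patible (neither is part of dbz):

```c
#include <string.h>

/* Return 1 if the alen bytes at a are an initial subsequence of b. */
int is_prefix(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen <= blen && memcmp(a, b, alen) == 0;
}

/* Return 1 if two keys may coexist in one database, that is,
 * neither key is an initial subsequence of the other. */
int keys_compatible(const char *a, size_t alen, const char *b, size_t blen)
{
    return !is_prefix(a, alen, b, blen) && !is_prefix(b, blen, a, alen);
}
```

     Bare keys such as "abc" and "abcd" conflict, while the  brack-
     eted  forms  "<abc>"  and  "<abcd>"  do not, and neither do the
     same strings with their terminating NULs included.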

     Dbminit opens a database, an index into the base file  base,
     consisting of files base.dir and base.pag which must already
     exist.  (If the database is new, they should be  zero-length
     files.)   Subsequent  accesses  go  to  that  database until
     dbmclose is called to close the  database.   The  base  file
     need not exist at the time of the dbminit, but it must exist
     before accesses are attempted.

     Fetch searches the database for the specified key, returning
     the  corresponding value if any.  Store stores the key-value
     pair in the database.  Store will fail unless  the  database
     files  are  writeable.  See below for a complication arising
     from case mapping.

     Dbzfresh is a variant of dbminit for creating a new database
     with  more  control  over  details.  Unlike for dbminit, the
     database files need not exist:   they  will  be  created  if
     necessary, and truncated in any case.

     Dbzfresh's size parameter specifies the size  of  the  first
     hash table within the database, in key-value pairs.  Perfor-
     mance will be best if size is a prime number and the  number
     of  key-value  pairs  stored in the database does not exceed
     about  2/3  of  size.   (The  dbzsize  function,  given  the
     expected  number of key-value pairs, will suggest a database
     size that meets these criteria.)   Assuming  that  an  fseek
     offset  is  4 bytes, the .pag file will be 4*size bytes (the
     .dir file is tiny and roughly constant in  size)  until  the
     number of key-value pairs exceeds about 80% of size.  (Noth-
     ing awful will happen if the database grows beyond  100%  of
     size, but accesses will slow down somewhat and the .pag file
     will grow somewhat.)
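
     The sizing advice above can be sketched as follows.  This is not
     the  algorithm  used  by dbzsize, whose internals are not docu-
     mented here; suggest_size and pag_bytes are hypothetical  names,
     and the rule shown (smallest prime at least 1.5 times the entry
     count) is simply one way to meet the two stated criteria.

```c
/* Return 1 if n is prime (trial division; adequate for a sketch). */
static int is_prime(long n)
{
    long d;

    if (n < 2)
        return 0;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical sizing rule: the smallest prime p such that
 * nentries is no more than about 2/3 of p. */
long suggest_size(long nentries)
{
    long p = nentries + (nentries + 1) / 2;   /* ceil(1.5 * nentries) */

    if (p < 2)
        p = 2;
    while (!is_prime(p))
        p++;
    return p;
}

/* Initial .pag size in bytes, assuming a 4-byte fseek offset. */
long pag_bytes(long size)
{
    return 4 * size;
}
```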

     Dbzfresh's fieldsep parameter specifies the field  separator
     in  the  base  file.   If  this is not NUL (0), and the last
     character of a key argument is NUL, that NUL compares  equal
     to  either  a NUL or a fieldsep in the base file.  This per-
     mits use of NUL to terminate key strings  without  requiring
     that  NULs appear in the base file.  The fieldsep of a data-
     base created with dbminit is the horizontal-tab character.
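
     A sketch of the comparison rule  just  described.   The  function
     key_matches  is  hypothetical,  not  dbz's  own code, and assumes
     the caller guarantees at least klen readable bytes at base.

```c
#include <stddef.h>

/* Return 1 if the klen-byte key matches the bytes at base.
 * When fieldsep is not NUL, a NUL in the last position of the key
 * matches either a NUL or fieldsep in the base file. */
int key_matches(const char *key, size_t klen, const char *base, int fieldsep)
{
    size_t i;

    for (i = 0; i < klen; i++) {
        if (key[i] == base[i])
            continue;
        if (i == klen - 1 && key[i] == '\0' &&
            fieldsep != 0 && base[i] == (char)fieldsep)
            continue;
        return 0;
    }
    return 1;
}
```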

     For use in news systems, various forms of case mapping (e.g.
     uppercase  to  lowercase)  in  keys are available.  The cmap
     parameter to dbzfresh is a single character specifying which
     of  several mapping algorithms to use.  Available algorithms
     are:

          0    case-sensitive:  no case mapping

          B    same as 0

          NUL  same as 0

          =    case-insensitive:    uppercase    and    lowercase
               equivalent

          b    same as =

          C    RFC822 message-ID rules, case-sensitive before `@'
               (with  certain  exceptions)  and  case-insensitive
               after

          ?    whatever the local default is, normally C

     Mapping algorithm 0 (no mapping) is faster than  the  others
     and  is  overwhelmingly the correct choice for most applica-
     tions.  Unless compatibility constraints  interfere,  it  is
     more  efficient  to pre-map the keys, storing mapped keys in
     the base file, than to have dbz  do  the  mapping  on  every
     search.
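
     For a program that pre-maps its own keys,  the  mappings  can  be
     approximated  as below.  This is a simplified sketch, not dbz's
     mapper:  map_key is a hypothetical name, the C case ignores  the
     ``certain  exceptions''  mentioned above, the use of the last `@'
     in the key is an assumption, and ? is treated as C on the assump-
     tion that the local default has not been changed.

```c
#include <ctype.h>
#include <string.h>

/* Map a NUL-terminated key in place according to cmap. */
void map_key(char *s, int cmap)
{
    char *p = s;

    switch (cmap) {
    case '=':
    case 'b':
        break;                  /* lowercase the whole key */
    case 'C':
    case '?':                   /* assumed local default: C */
        p = strrchr(s, '@');    /* lowercase from the last '@' on */
        if (p == NULL)
            return;
        break;
    default:                    /* '0', 'B', NUL: no mapping */
        return;
    }
    for (; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
}
```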

     For historical reasons, fetch and  store  expect  their  key
     arguments  to be pre-mapped, but expect unmapped keys in the
     base file.  Dbzfetch and dbzstore do the same jobs but  han-
     dle  all  case  mapping internally, so the customer need not
     worry about it.

     Dbz stores only the database values in its files, relying on
     reference  to  the  base  file  to  confirm  a hit on a key.
     References to the base file can be minimized, greatly speed-
     ing  up  searches,  if a little bit of information about the
     keys can be stored in the dbz files.  This  is  ``free''  if
     there  are  some unused bits in an fseek offset, so that the
     offset can be tagged with some information  about  the  key.
     The  tagmask  parameter  of  dbzfresh  allows specifying the
     location of unused bits.  Tagmask should be a mask with  one
     group  of contiguous 1 bits.  The bits in the mask should be
     unused (0) in most offsets.  The bit immediately  above  the
     mask  (the  flag  bit)  should be unused (0) in all offsets;
     (dbz)store will reject attempts to store a key-value pair in
     which  the  value has the flag bit on.  Apart from this res-
     triction, tagging is invisible to the user.   As  a  special
     case,  a  tagmask  of  1  means ``no tagging'', for use with
     enormous base  files  or  on  systems  with  unusual  offset
     representations.
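
     The relationship between the tag mask, the flag  bit,  and  an
     offset  can  be  sketched as follows.  The helper names are hy-
     pothetical, the code is not taken from dbz, and the special-case
     tagmask of 1 is not modelled.

```c
/* The flag bit: the single bit immediately above the mask. */
unsigned long flag_bit(unsigned long tagmask)
{
    unsigned long top = tagmask;

    while (top & (top - 1))     /* clear low bits until one remains */
        top &= top - 1;
    return top << 1;
}

/* Return 1 if an offset may be stored at all (flag bit is off). */
int storable(unsigned long offset, unsigned long tagmask)
{
    return (offset & flag_bit(tagmask)) == 0;
}

/* Return 1 if the offset leaves the mask bits free for a tag. */
int taggable(unsigned long offset, unsigned long tagmask)
{
    return (offset & (tagmask | flag_bit(tagmask))) == 0;
}
```

     With a mask of 0x7f000000 the flag bit is 0x80000000; an  offset
     of  0x01000000  is still storable but can no longer carry a tag.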

     A size of 0 given to dbzfresh is synonymous with  the  local
     default; the normal default is suitable for tables of 90,000
     to 100,000 key-value pairs.  A cmap of 0 (NUL) is synonymous
     with  the character 0, signifying no case mapping (note that
     the character ? specifies the local  default  mapping,  nor-
     mally  C).   A  tagmask  of  0  is synonymous with the local
     default tag mask, normally 0x7f000000  (specifying  the  top
     bit  in a 32-bit offset as the flag bit, and the next 7 bits
     as the mask, which is suitable for base files  up  to  circa
     24MB).   Calling dbminit(name) with the database files empty
     is equivalent to calling dbzfresh(name,0,'\t','?',0).

     When databases are regenerated periodically, as in news,  it
     is  simplest to pick the parameters for a new database based
     on the old one.  This also permits some memory of past sizes
     of  the  old  database,  so  that a new database size can be
     chosen to cover expected fluctuations.  Dbzagain is a  vari-
     ant  of dbminit for creating a new database as a new genera-
     tion of an old database.  The  database  files  for  oldbase
     must exist.  Dbzagain is equivalent to calling dbzfresh with
     the same field separator, case mapping, and tag mask as  the
     old  database,  and  a  size equal to the result of applying
     dbzsize to the largest number  of  entries  in  the  oldbase
     database and its previous 10 generations.
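
     The size selection can be pictured as taking the  largest  entry
     count  on record and handing it to dbzsize.  A sketch of the first
     step, assuming the counts are available in an array (how dbz  ac-
     tually  records  them  is  not  described  here;  largest_usage is a
     hypothetical name):

```c
#include <stddef.h>

/* Return the largest entry count among n recorded generations,
 * or 0 if there are none. */
long largest_usage(const long *counts, size_t n)
{
    long best = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (counts[i] > best)
            best = counts[i];
    return best;
}
```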

     When many accesses are being done by the same  program,  dbz
     is  massively  faster  if its first hash table is in memory.
     If an internal flag is 1, an attempt is  made  to  read  the
     table in when the database is opened, and dbmclose writes it
     out to disk again (if it was read successfully and has  been
     modified).   Dbzincore  sets  the  flag  to  newvalue (which
     should be 0 or 1) and returns the previous value; this  does
     not  affect  the  status of a database that has already been
     opened.  The default is 0.  The attempt to read the table in
     may  fail  due  to memory shortage; in this case dbz quietly
     falls back on its default behavior.  Stores to an  in-memory
     database  are not (in general) written out to the file until
     dbmclose or dbzsync, so if robustness  in  the  presence  of
     crashes  or  concurrent accesses is crucial, in-memory data-
     bases should probably be avoided.

     Dbzsync causes all buffers etc. to be  flushed  out  to  the
     files.  It is typically used as a precaution against crashes
     or concurrent accesses when a dbz-using process will be run-
     ning for a long time.  It is a somewhat expensive operation,
     especially for an in-memory database.

     Dbzcancel cancels any pending writes from buffers.  This  is
     typically  useful  only  for in-core databases, since writes
     are otherwise done immediately.  Its main purpose is to  let
     a  child  process,  in  the  wake  of  a fork, do a dbmclose
     without writing its parent's data to disk.

     If dbz has been compiled with debugging facilities available
     (which  makes  it  bigger and a bit slower), dbzdebug alters
     the value (and returns the previous value)  of  an  internal
     flag which (when 1; default is 0) causes verbose and cryptic
     debugging output on standard output.

     Concurrent reading of databases is fairly safe, but there is
     no (inter)locking, so concurrent updating is not.

     The database files include a record of the byte order of the
     processor  creating the database, and accesses by processors
     with different byte order will work, although they  will  be
     slightly slower.  Byte order is preserved by dbzagain.  How-
     ever, agreement on the size and  internal  structure  of  an
     fseek  offset is necessary, as is consensus on the character
     set.
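
     The extra cost on a foreign-order processor comes from  rearrang-
     ing  the bytes of each offset.  As an illustration only, assuming
     a 4-byte offset and a simple  big-endian/little-endian  mismatch
     (dbz's  own  on-disk  byte-order  record  is  more general and is
     not shown here), the conversion amounts to:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}
```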

     An open database occupies  three  stdio  streams  and  their
     corresponding  file  descriptors;  a fourth is needed for an
     in-memory database.  Apart from stdio buffers,  memory  con-
     sumption is negligible except for in-memory databases.


SEE ALSO

     dbz(1), dbm(3)


DIAGNOSTICS

     Functions returning int values return 0 for success, -1  for
     failure  (dbzincore  and dbzdebug instead return the previous
     flag value).  Functions returning datum values return a value
     with dptr set to NULL for failure.  Dbminit attempts to have
     errno  set  plausibly  on  return, but otherwise this is not
     guaranteed.  An errno of EDOM from  dbminit  indicates  that
     the database did not appear to be in dbz format.
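
     A caller would typically test the returned datum before using  it.
     The sketch below assumes the value holds one long fseek offset in
     native representation; that assumption, the local datum stand-in,
     and  the  name  value_to_offset  are  illustrative and not part of
     dbz.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for the datum type that <dbz.h> provides. */
typedef struct {
    char *dptr;
    int dsize;
} datum;

/* Extract an offset from a fetched value.
 * Returns 0 on success, -1 if the fetch failed (dptr is NULL)
 * or the value is not the size of a long. */
int value_to_offset(datum v, long *offset)
{
    if (v.dptr == NULL)
        return -1;
    if (v.dsize != (int)sizeof(long))
        return -1;
    memcpy(offset, v.dptr, sizeof(long));   /* dptr may be unaligned */
    return 0;
}
```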


HISTORY

     The  original  dbz  was  written  by  Jon  Zeeff   (zeeff@b-
     tech.ann-arbor.mi.us).   Later contributions by David Butler
     and Mark Moraes.  Extensive reworking, including this  docu-
     mentation,  by Henry Spencer (henry@zoo.toronto.edu) as part
     of the C News project.  Hashing function by Peter Honeyman.


BUGS

     The dptr members of returned datum values  point  to  static
     storage which is overwritten by later calls.

     Unlike dbm, dbz will misbehave if an existing key-value pair
     is `overwritten' by a new (dbz)store with the same key.  The
     user is responsible for avoiding this  by  using  (dbz)fetch
     first  to  check  for  duplicates;  an internal optimization
     remembers the result of the first search so there is minimal
     overhead in this.

     Waiting until after dbminit to  bring  the  base  file  into
     existence will fail if chdir(2) has been used meanwhile.

     The RFC822 case mapper implements only a first approximation
     to the hideously-complex RFC822 case rules.

     The prime finder in dbzsize is not particularly quick.

     Should implement the dbm  functions  delete,  firstkey,  and
     nextkey.

     On C implementations which trap integer overflow,  dbz  will
     refuse  to  (dbz)store an fseek offset equal to the greatest
     representable positive number, as this would cause  overflow
     in the biased representation used.
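
     To see why the largest offset is a problem, suppose the bias is
     +1  so that a stored 0 can mean ``empty slot''.  That detail is an
     assumption (this page says only that the representation  is  bi-
     ased),  and  bias_offset  is  a  hypothetical  name:

```c
#include <limits.h>

/* Compute the stored form of an offset under an assumed bias of +1.
 * Returns 0 on success, -1 if the offset is negative or if adding
 * the bias would overflow a long. */
int bias_offset(long offset, long *stored)
{
    if (offset < 0 || offset == LONG_MAX)
        return -1;
    *stored = offset + 1;
    return 0;
}
```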

     Dbzagain perhaps ought to notice when many  offsets  in  the
     old  database  were  too big for tagging, and shrink the tag
     mask to match.

     Marking dbz's file  descriptors  close-on-exec  would  be  a
     better  approach  to the problem dbzcancel tries to address,
     but that's harder to do portably.











