MonetDB Source Code: The GDK Data Structures
2013-10-24 20:33
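gdk.h is the encapsulated API of the kernel; the data structures of the storage layer are defined there as well.
This article first goes through the comments in the source code, with brief notes, and then turns to the data structures.
* GDK is a C library that provides ACID properties on a DSM model [Note: DSM is the decomposition storage model; GDK provides ACID guarantees on top of it.]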
* @tex
* [@cite{Copeland85}]
* @end tex
* , using main-memory
* database algorithms
* @tex
* [@cite{Garcia-Molina92}]
* @end tex
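* built on virtual-memory OS primitives and multi-threaded parallelism [Note: it builds on virtual memory and multi-threaded parallel processing.]
* Its implementation has undergone various changes over its decade [Note: a decade of changes, driven by external needs for a fast and robust database system.]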
* of development, many of which were driven by external needs to
* obtain a robust and fast database system.
*
* The coding scheme explored in GDK has also laid a foundation to
* communicate over time experiences and to provide (hopefully)
* helpful advice near to the place where the code-reader needs it.
* Of course, over such a long time the documentation diverges from
* reality. Especially in areas where the environment of this package
* is being described.
* Consider such deviations as historic landmarks, e.g. crystallization
* of brave ideas and mistakes rectified at a later stage.
*
* @+ Short Outline
* The facilities provided in this implementation are: [Note: the list of facilities follows.]
* @itemize
* @item
* GDK or Goblin Database Kernel routines for session management [Note: kernel session management.]
* @item
* BAT routines that define the primitive operations on the [Note: BAT definitions and operations.]
* database tables (BATs).
* @item
* BBP routines to manage the BAT Buffer Pool (BBP). [Note: the buffer pool for BATs.]
* @item
* ATOM routines to manipulate primitive types, define new types [Note: new types are defined through the ADT interface.]
* using an ADT interface.
* @item
* HEAP routines for manipulating heaps: linear spaces of memory [Note: HEAP manages memory heaps.]
* that are GDK's vehicle of mass storage (on which BATs are built).
* @item
* DELTA routines to access inserted/deleted elements within a [Note: DELTA tracks elements inserted or deleted within a transaction.]
* transaction.
* @item
* HASH routines for manipulating GDK's built-in linear-chained [Note: HASH provides linear-chained hash tables for lookups on BATs.]
* hash tables, for accelerating lookup searches on BATs.
* @item
* TM routines that provide basic transaction management primitives. [Note: TM covers basic transaction management.]
* @item
* TRG routines that provided active database support. [DEPRECATED] [Note: TRG offered active-database support.]
* @item
* ALIGN routines that implement BAT alignment management. [Note: ALIGN manages BAT alignment.]
* @end itemize
*
* The Binary Association Table (BAT) is the lowest level of storage [Note: the BAT is the lowest-level storage structure of the system.]
* considered in the Goblin runtime system
* @tex
* [@cite{Goblin}]
* @end tex
* . A BAT is a
* self-descriptive main-memory structure that represents the [Note: a BAT is a self-describing in-memory structure.]
* @strong{binary relationship} between two atomic types. The
* association can be defined over:
* @table @code
* @item void:
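* virtual-OIDs: a densely ascending column of OIDs (takes zero-storage). [Note: the column is dense and ascending, so it is not materialized and occupies no storage.]
* @item bit:
* Booleans, implemented as one byte values. [Note: Boolean values, stored in one byte.]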
* @item bte:
* Tiny (1-byte) integers (8-bit @strong{integer}s).
* @item sht:
* Short integers (16-bit @strong{integer}s).
* @item int:
* This is the C @strong{int} type (32-bit).
* @item oid:
* Unique @strong{long int} values used as object identifiers. Highest
* bit cleared always. Thus, oids are 31-bit numbers on
* 32-bit systems, and 63-bit numbers on 64-bit systems. [Note: an oid is a 31-bit number on 32-bit systems and a 63-bit number on 64-bit systems.]
* @item wrd:
* Machine-word sized integers
* (32-bit on 32-bit systems, 64-bit on 64-bit systems).
* @item ptr:
* Memory pointer values. DEPRECATED. Can only be stored in transient
* BATs.
* @item flt:
* The IEEE @strong{float} type.
* @item dbl:
* The IEEE @strong{double} type.
* @item lng:
* Longs: the C @strong{long long} type (64-bit integers).
* @item str:
* UTF-8 strings (Unicode). A zero-terminated byte sequence.
* @item bat:
* Bat descriptor. This allows for recursive administered tables, but [Note: a BAT descriptor allows recursively administered tables, but it severely complicates transaction management.]
* severely complicates transaction management. Therefore, they CAN
* ONLY BE STORED IN TRANSIENT BATs.
* @end table
*
* This model can be used as a back-end model underlying other -higher
* level- models, in order to achieve @strong{better performance} and
* @strong{data independence} in one go. The relational model and the
* object-oriented model can be mapped on BATs by vertically splitting
* every table (or class) for each attribute. Each such a column is
* then stored in a BAT with type @strong{bat[oid,attribute]}, where
* the unique object identifiers link tuples in the different BATs.
* Relationship attributes in the object-oriented model hence are
* mapped to @strong{bat[oid,oid]} tables, being equivalent to the
* concept of @emph{join indexes} @tex [@cite{Valduriez87}] @end tex .
*
* The set of built-in types can be extended with user-defined types
* through an ADT interface. They are linked with the kernel to
* obtain an enhanced library, or they are dynamically loaded upon
* request.
*
* Types can be derived from other types. They represent something
* different than that from which they are derived, but their internal
* storage management is equal. This feature facilitates the work of
* extension programmers, by enabling reuse of implementation code,
* but is also used to keep the GDK code portable from 32-bit to
* 64-bit machines: the @strong{oid} and @strong{ptr} types are
* derived from @strong{int} on 32-bit machines, but are derived from
* @strong{lng} on 64-bit machines. This requires changes in only two
* lines of code each.
*
* To accelerate lookup and search in BATs, GDK supports one built-in
* search accelerator: hash tables. We choose an implementation
* efficient for main-memory: bucket chained hash
* @tex
* [@cite{LehCar86,Analyti92}]
* @end tex
* . Alternatively, when the table is sorted, it will resort to
* merge-scan operations or binary lookups.
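Note: to speed up lookups on BATs, GDK provides hash tables, implemented as bucket-chained hashing. When the table is sorted, it falls back to merge-scan operations or binary search instead.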
* BATs are built on the concept of heaps, which are large pieces of
* main memory. They can also consist of virtual memory, in case the
* working set exceeds main-memory. In this case, GDK supports
* operations that cluster the heaps of a BAT, in order to improve
* performance of its main-memory.
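Note: BATs are built on heaps. When the working set exceeds main memory, GDK can cluster the heaps of a BAT to improve main-memory performance.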
* @- Rationale
* The rationale for choosing a BAT as the building block for both
* relational and object-oriented system is based on the following
* observations:
Note: the reasons for choosing the BAT are as follows.
* @itemize
* @item -
* Given the fact that CPU speed and main-memory increase in current
* workstation hardware for the last years has been exceeding IO
* access speed increase, traditional disk-page oriented algorithms do
* no longer take best advantage of hardware, in most database
* operations.
*
* Instead of having a disk-block oriented kernel with a large memory
* cache, we choose to build a main-memory kernel, that only under
* large data volumes slowly degrades to IO-bound performance,
* comparable to traditional systems
* @tex
* [@cite{boncz95,boncz96}]
* @end tex
* .
*
* @item -
* Traditional (disk-based) relational systems move too much data
* around to save on (main-memory) join operations.
*
* The fully decomposed store (DSM
* @tex
* [@cite{Copeland85}]
* @end tex
* ) assures that only those attributes of a relation that are needed
* will have to be accessed.
*
* @item -
* The data management issues for a binary association are much
* easier to deal with than traditional @emph{struct}-based approaches
* encountered in relational systems.
*
* @item -
* Object-oriented systems often maintain a double cache, one with the
* disk-based representation and a C pointer-based main-memory
* structure. This causes expensive conversions and replicated
* storage management. GDK does not do such `pointer swizzling'. It
* uses virtual-memory (@strong{mmap()}) and buffer management advice
* (@strong{madvise()}) OS primitives to cache only once. Tables take
* the same form in memory as on disk, making the use of this
* technique transparent
Note: object-oriented databases often keep a double cache, whereas GDK does no pointer swizzling; it relies on the virtual memory provided by the operating system (direct memory mapping).
* @tex
* [@cite{oo7}]
* @end tex
* .
* @end itemize
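Note: the "cache only once" idea can be illustrated with plain POSIX calls. The sketch below maps a file that holds a raw int array and reads it in place, without converting it to a second in-memory form. The function names and the file layout are assumptions made for this example, not the GDK implementation.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes madvise() on glibc in strict ISO C mode */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file holding a plain int array into memory, read-only.
 * Returns the mapped base, or NULL on failure; *count receives the
 * number of ints. The on-disk bytes are used as they are. */
const int *map_int_column(const char *path, size_t *count)
{
    assert(path != NULL && count != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;

    /* Hint the expected access pattern to the OS; a failure is harmless. */
    (void) madvise(base, (size_t) st.st_size, MADV_SEQUENTIAL);

    *count = (size_t) st.st_size / sizeof(int);
    return (const int *) base;
}

/* Release a mapping obtained from map_int_column(). */
void unmap_int_column(const int *base, size_t count)
{
    munmap((void *) base, count * sizeof(int));
}
```

Because the pages are backed by the file, the operating system decides what stays resident, and the program never keeps a separate converted copy.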
*
* A RDBMS or OODBMS based on BATs strongly depends on our ability to
* efficiently support tuples and to handle small joins, respectively.
* The remainder of this document describes the Goblin Database kernel
* implementation in greater detail. It is organized as follows:
Note: the details of the kernel implementation are organized as follows.
* @table @code
* @item @strong{GDK Interface}:
* It describes the global interface with which GDK sessions can be
* started and ended, and environment variables used.
*
* @item @strong{Binary Association Tables}:
* As already mentioned, these are the primary data structure of GDK.
* This chapter describes the kernel operations for creation,
* destruction and basic manipulation of BATs and BUNs (i.e. tuples:
* Binary UNits).
*
* @item @strong{BAT Buffer Pool:}
*
* All BATs are registered in the BAT Buffer Pool. This directory is
* used to guide swapping in and out of BATs. Here we find routines
* that guide this swapping process.
*
* @item @strong{GDK Extensibility:}
*
* Atoms can be defined using a unified ADT interface. There is also
* an interface to extend the GDK library with dynamically linked
* object code.
*
* @item @strong{GDK Utilities:}
*
* Memory allocation and error handling primitives are
* provided. Layers built on top of GDK should use them, for proper
* system monitoring. Thread management is also included here.
*
* @item @strong{Transaction Management:}
*
* For the time being, we just provide BAT-grained concurrency and
* global transactions. Work is needed here.
*
* @item @strong{BAT Alignment:}
* Due to the mapping of multi-ary datamodels onto the BAT model, we
* expect many correspondences among BATs, e.g.
* @emph{bat(oid,attr1),.. bat(oid,attrN)} vertical
* decompositions. Frequent activities will be to jump from one
* attribute to the other (`bunhopping'). If the head columns are
* equal lists in two BATs, merge or even array lookups can be used
* instead of hash lookups. The alignment interface makes these
* relations explicitly manageable.
*
* In GDK, complex data models are mapped with DSM on binary tables.
* Usually, one decomposes @emph{N}-ary relations into @emph{N} BATs
* with an @strong{oid} in the head column, and the attribute in the
* tail column. There may well be groups of tables that have the same
* sets of @strong{oid}s, equally ordered. The alignment interface is
* intended to make this explicit. Implementations can use this
* interface to detect this situation, and use cheaper algorithms
* (like merge-join, or even array lookup) instead.
*
* @item @strong{BAT Iterators:}
*
* Iterators are C macros that generally encapsulate a complex
* for-loop. They would be the equivalent of cursors in the SQL
* model. The macro interface (instead of a function call interface)
* is chosen to achieve speed when iterating main-memory tables.
*
* @item @strong{Common BAT Operations:}
*
* These are much used operations on BATs, such as aggregate functions
* and relational operators. They are implemented in terms of BAT- and
* BUN-manipulation GDK primitives.
* @end table
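Note: the BAT Alignment item above says that equal head columns allow array lookups instead of hash lookups. A minimal sketch of that idea, assuming a dense oid head that starts at `seq`, is shown below. `struct dense_column` and both functions are hypothetical and only mirror the description in the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a column whose head is a dense oid
 * sequence starting at `seq`. Only the tail values are stored. */
struct dense_column {
    size_t seq;      /* first oid of the dense head sequence */
    size_t count;    /* number of tuples */
    const int *tail; /* tail values, one per oid */
};

/* Array lookup: the position of an oid is simply oid - seq.
 * Returns 1 and stores the value on success, 0 if oid is out of range. */
int dense_find(const struct dense_column *c, size_t oid, int *out)
{
    assert(c != NULL && out != NULL);
    if (oid < c->seq || oid - c->seq >= c->count)
        return 0;
    *out = c->tail[oid - c->seq];
    return 1;
}

/* In this sketch two columns are aligned when they cover the same
 * dense oid range; position i in one then matches position i in the other. */
int aligned(const struct dense_column *a, const struct dense_column *b)
{
    return a->seq == b->seq && a->count == b->count;
}
```

When `aligned()` holds, hopping from one attribute to another for the same oid costs one subtraction and one array access per column.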
*
* @+ Interface Files
* In this section we summarize the user interface to the GDK library.
* It consists of a header file (gdk.h) and an object library
* (gdklib.a), which implements the required functionality. The header
* file must be included in any program that uses the library. The
* library must be linked with such a program.
Note: gdk.h collects the entire user interface.
* @- Database Context
*
* The MonetDB environment settings are collected in a configuration
* file. Amongst others it contains the location of the database
* directory. First, the database directory is closed for other
* servers running at the same time. Second, performance enhancements
* may take effect, such as locking the code into memory (if the OS
* permits) and preloading the data dictionary. An error at this
* stage normally leads to an abort.
*/
/* Heap storage modes */
typedef enum {
    STORE_MEM = 0,   /* load into GDKmalloced memory */
    STORE_MMAP = 1,  /* mmap() into virtual memory */
    STORE_PRIV = 2,  /* BAT copy of copy-on-write mmap */
    STORE_INVALID    /* invalid value, used to indicate error */
} storage_t;
typedef struct {
    size_t maxsize;  /* deprecated: kept equal to size */
    size_t free;     /* index where free area starts. */
    size_t size;     /* size of the heap (bytes) */
    char *base;      /* base pointer in memory. */
    str filename;    /* file containing image of the heap */
    unsigned int copied:1,  /* a copy of an existing map. */
        hashash:1,          /* the string heap contains hash values */
        forcemap:1;         /* force STORE_MMAP even if heap exists */
    storage_t storage;      /* storage mode (mmap/malloc). */
    storage_t newstorage;   /* new desired storage mode at re-allocation. */
    bte dirty;       /* specific heap dirty marker */
    bat parentid;    /* cache id of VIEW parent bat */
} Heap;
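Note: the relation between `base`, `free` and `size` can be made concrete with a small append routine. `struct mini_heap` is a reduced, hypothetical version of the Heap record, and the doubling growth policy is an assumption of this sketch rather than what GDK does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Only the three fields needed to show how free and size relate. */
struct mini_heap {
    size_t free; /* index where the free area starts */
    size_t size; /* allocated size in bytes */
    char *base;  /* base pointer in memory */
};

/* Append nbytes to the heap, growing the allocation when needed.
 * Returns the byte offset of the copy, or (size_t) -1 on failure. */
size_t mini_heap_append(struct mini_heap *h, const void *src, size_t nbytes)
{
    assert(h->free <= h->size);
    if (nbytes == 0)
        return h->free;

    if (h->size - h->free < nbytes) {
        size_t newsize = h->size ? h->size : 64;
        while (newsize - h->free < nbytes)
            newsize *= 2; /* overflow is ignored in this sketch */
        char *p = realloc(h->base, newsize);
        if (p == NULL)
            return (size_t) -1;
        h->base = p;
        h->size = newsize;
    }

    size_t off = h->free;
    memcpy(h->base + off, src, nbytes);
    h->free += nbytes; /* the free area now starts after the new data */
    return off;
}
```

Returning an offset rather than a pointer keeps the result valid when the heap is reallocated or mapped at a different address.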
typedef struct {
    int type;    /* type of index entity */
    BUN lim;     /* collision list size */
    BUN mask;    /* number of hash buckets-1 (power of 2) */
    BUN *hash;   /* hash table */
    BUN *link;   /* collision list */
    Heap *heap;  /* heap where the hash is stored */
} Hash;
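Note: the `mask`, `hash` and `link` fields suggest a bucket-chained table in which `hash` holds one chain head per bucket and `link` holds one successor per tuple position. The sketch below builds such a table over an int column. The names, the hash function and the insert-at-head policy are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define NO_POS ((size_t) -1) /* end-of-chain marker (an assumption) */

/* Hypothetical reduced Hash: bucket heads plus one link per tuple. */
struct mini_hash {
    size_t mask;  /* number of buckets - 1; the bucket count is a power of 2 */
    size_t *hash; /* hash[b]: position of the newest tuple in bucket b */
    size_t *link; /* link[p]: next older position in the same bucket */
};

static size_t bucket_of(const struct mini_hash *h, int v)
{
    /* Multiplicative hash, then keep the low bits selected by the mask. */
    return ((size_t) (unsigned) v * 2654435761u) & h->mask;
}

/* Build the table over col[0..n). nbuckets must be a power of two.
 * Returns 0 on success, -1 on allocation failure. */
int mini_hash_build(struct mini_hash *h, const int *col, size_t n,
                    size_t nbuckets)
{
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    h->mask = nbuckets - 1;
    h->hash = malloc(nbuckets * sizeof *h->hash);
    h->link = malloc((n ? n : 1) * sizeof *h->link);
    if (h->hash == NULL || h->link == NULL) {
        free(h->hash);
        free(h->link);
        return -1;
    }
    for (size_t b = 0; b < nbuckets; b++)
        h->hash[b] = NO_POS;
    for (size_t p = 0; p < n; p++) {
        size_t b = bucket_of(h, col[p]);
        h->link[p] = h->hash[b]; /* chain to the previous head */
        h->hash[b] = p;          /* this tuple becomes the new head */
    }
    return 0;
}

/* Return the position of a tuple equal to v, or NO_POS if there is none. */
size_t mini_hash_find(const struct mini_hash *h, const int *col, int v)
{
    for (size_t p = h->hash[bucket_of(h, v)]; p != NO_POS; p = h->link[p])
        if (col[p] == v)
            return p;
    return NO_POS;
}
```

Both arrays store positions instead of pointers, so the table itself could live in a heap, which fits the `heap` field of the record.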
typedef struct {
    union {  /* storage is first in the record */
        int ival;
        oid oval;
        sht shval;
        bte btval;
        wrd wval;
        flt fval;
        ptr pval;
        struct BAT *Bval;  /* this field is only used by mel */
        bat bval;
        str sval;
        dbl dval;
        lng lval;
    } val;
    int len, vtype;
} *ValPtr, ValRecord;
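Note: ValRecord reads like a tagged union, where `vtype` tells which member of `val` is live. A reduced sketch of that pattern follows. The `MINI_*` tags and `struct mini_val` are hypothetical, since the article does not show the real type ids.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type tags for this sketch only. */
enum mini_type { MINI_INT, MINI_DBL, MINI_STR };

/* Reduced version of ValRecord: a union plus a tag for the live member. */
struct mini_val {
    union {
        int ival;
        double dval;
        const char *sval;
    } val;
    int vtype;
};

/* Format the value according to its tag.
 * Returns the snprintf result, or -1 for an unknown tag. */
int mini_val_format(const struct mini_val *v, char *buf, size_t len)
{
    assert(v != NULL && buf != NULL);
    switch (v->vtype) {
    case MINI_INT:
        return snprintf(buf, len, "%d", v->val.ival);
    case MINI_DBL:
        return snprintf(buf, len, "%.2f", v->val.dval);
    case MINI_STR:
        return snprintf(buf, len, "%s", v->val.sval);
    default:
        return -1;
    }
}
```

Code that receives such a record must check the tag before touching the union, because reading a member other than the one last written is not meaningful.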
typedef struct {
    MT_Id tid;  /* which thread created it */
    int stamp;  /* BAT recent creation stamp */
    unsigned int
        copiedtodisk:1,  /* once written */
        dirty:2,         /* dirty wrt disk? */
        dirtyflushed:1,  /* was dirty before commit started? */
        descdirty:1,     /* bat descriptor dirty marker */
        set:1,           /* real set semantics */
        restricted:2,    /* access privileges */
        persistence:1,   /* should the BAT persist on disk? */
        unused:23;       /* value=0 for now */
    int sharecnt;    /* incoming view count */
    char map_head;   /* mmap mode for head bun heap */
    char map_tail;   /* mmap mode for tail bun heap */
    char map_hheap;  /* mmap mode for head atom heap */
    char map_theap;  /* mmap mode for tail atom heap */
} BATrec;
typedef struct {
    /* delta status administration */
    BUN deleted;   /* start of deleted elements */
    BUN first;     /* to store next deletion */
    BUN inserted;  /* start of inserted elements */
    BUN count;     /* tuple count */
    BUN capacity;  /* tuple capacity */
} BUNrec;
typedef struct PROPrec {
    int id;
    ValRecord v;
    struct PROPrec *next;  /* simple chain of properties */
} PROPrec;
/* see also comment near BATassertProps() for more information about
* the properties */
typedef struct {
    str id;                /* label for head/tail column */
    unsigned short width;  /* byte-width of the atom array */
    bte type;              /* type id. */
    bte shift;             /* log2 of bunwidth */
    unsigned int
        varsized:1,   /* varsized (1) or fixedsized (0) */
        key:2,        /* duplicates allowed? */
        dense:1,      /* OID only: only consecutive values */
        nonil:1,      /* nonil isn't propchecked yet */
        nil:1,        /* there is a nil in the column */
        sorted:1,     /* column is sorted in ascending order */
        revsorted:1;  /* column is sorted in descending order */
    oid align;        /* OID for sync alignment */
    BUN nokey[2];     /* positions that prove key ==FALSE */
    BUN nosorted;     /* position that proves sorted==FALSE */
    BUN norevsorted;  /* position that proves revsorted==FALSE */
    BUN nodense;      /* position that proves dense==FALSE */
    oid seq;          /* start of dense head sequence */
    Heap heap;        /* space for the column. */
    Heap *vheap;      /* space for the varsized data. */
    Hash *hash;       /* hash table */
    PROPrec *props;   /* list of dynamic properties stored in the bat descriptor */
} COLrec;
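Note: `shift` is documented as the log2 of the value width, which suggests that the address of the value at position p is computed with a shift instead of a multiplication. The sketch below shows that arithmetic on a hypothetical `struct mini_col`; it assumes fixed-width values whose width is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced column: a flat byte array of fixed-width values. */
struct mini_col {
    char *base;           /* start of the value array */
    unsigned short width; /* bytes per value; a power of two here */
    signed char shift;    /* log2(width) */
};

/* Compute log2 of a power-of-two width, e.g. 4 -> 2 and 8 -> 3. */
signed char width_to_shift(unsigned short width)
{
    signed char s = 0;
    assert(width != 0 && (width & (width - 1)) == 0);
    while ((1u << s) < width)
        s++;
    return s;
}

/* Address of the value at position p: base + p * width,
 * computed as base + (p << shift). */
void *mini_col_loc(const struct mini_col *c, size_t p)
{
    return c->base + (p << c->shift);
}

/* Read an int stored at position p (the column must hold ints). */
int mini_col_get_int(const struct mini_col *c, size_t p)
{
    int v;
    memcpy(&v, mini_col_loc(c, p), sizeof v);
    return v;
}
```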
typedef struct BAT {
    /* static bat properties */
    bat batCacheid;  /* index into BBP */
    /* dynamic column properties */
    COLrec *H;  /* column info */
    COLrec *T;  /* column info */
    /* dynamic bat properties */
    BATrec *P;  /* cache and sort info */
    BUNrec *U;  /* cache and sort info */
} BAT;
typedef struct BATiter {
    BAT *b;
    oid hvid, tvid;
} BATiter;
/*
* The different parts of which a BAT consists are physically stored
* next to each other in the BATstore type.
*/
typedef struct {
    BAT B;     /* storage for BAT descriptor */
    BAT BM;    /* mirror (reverse) BAT */
    COLrec H;  /* storage for head column */
    COLrec T;  /* storage for tail column */
    BATrec P;  /* storage for BATrec */
    BUNrec U;  /* storage for BUNrec */
} BATstore;
/* structure used by HEAP_check functions */
typedef struct {
    size_t minpos;   /* minimum block byte-index */
    size_t maxpos;   /* maximum block byte-index */
    int alignment;   /* block index alignment */
    int *validmask;  /* bitmap with all valid byte-indices
                      * first bit corresponds with 'minpos';
                      * 2nd bit with 'minpos+alignment', etc
                      */
} HeapRepair;
gdk_export void HEAP_initialize(
    Heap *heap,       /* heap -- the heap to initialize */
    size_t nbytes,    /* nbytes -- initial size of the heap */
    size_t nprivate,  /* nprivate -- size of private space */
    int alignment     /* alignment -- restriction for allocated chunks */
);
typedef struct {
    BAT *cache[2];   /* if loaded: BAT* handle + reverse */
    str logical[2];  /* logical name + reverse */
    str bak[2];      /* logical name + reverse backups */
    bat next[2];     /* next BBP slot in linked list */
    BATstore *desc;  /* the BAT descriptor */
    str physical;    /* dir + basename for storage */
    str options;     /* A string list of options */
    int refs;        /* in-memory references on which the loaded status of a BAT relies */
    int lrefs;       /* logical references on which the existence of a BAT relies */
    int lastused;    /* BBP LRU stamp */
    volatile int status;  /* status mask used for spin locking */
    /* MT_Id pid; non-zero thread-id if this BAT is private */
} BBPrec;
typedef struct {
    /* simple attributes */
    char name[IDLENGTH];
    int storage;     /* stored as another type? */
    short linear;    /* atom can be ordered linearly */
    short size;      /* fixed size of atom */
    short align;     /* alignment condition for values */
    short deleting;  /* set if unloading */
    int varsized;    /* variable-size or fixed-sized */
    /* automatically generated fields */
    ptr atomNull;    /* global nil value */
    /* generic (fixed + varsized atom) ADT functions */
    int (*atomFromStr) (const char *s, int *len, ptr *dst);
    int (*atomToStr) (str *s, int *len, const void *src);
    void *(*atomRead) (ptr a, stream *s, size_t cnt);
    int (*atomWrite) (const void *a, stream *s, size_t cnt);
    int (*atomCmp) (const void *v1, const void *v2);
    BUN (*atomHash) (const void *v);
    /* optional functions */
    void (*atomConvert) (ptr v, int direction);
    int (*atomFix) (const void *atom);
    int (*atomUnfix) (const void *atom);
    /* varsized atom-only ADT functions */
    var_t (*atomPut) (Heap *, var_t *off, const void *src);
    void (*atomDel) (Heap *, var_t *atom);
    int (*atomLen) (const void *atom);
    void (*atomHeap) (Heap *, size_t);
    /* optional functions */
    void (*atomHeapConvert) (Heap *, int direction);
    int (*atomHeapCheck) (Heap *, HeapRepair *);
} atomDesc;
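Note: atomDesc is a table of function pointers, which is how the ADT interface lets generic kernel code work on a type it knows nothing about. The sketch below keeps only a name, a size, a compare function and a hash function. `struct mini_atom` and all function names are hypothetical and much simpler than the real descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* A much smaller descriptor than atomDesc. */
struct mini_atom {
    const char *name;
    size_t size; /* fixed size of one value in bytes */
    int (*cmp)(const void *v1, const void *v2);
    size_t (*hash)(const void *v);
};

/* Three-way comparison of two ints: negative, zero or positive. */
static int int_cmp(const void *v1, const void *v2)
{
    int a = *(const int *) v1, b = *(const int *) v2;
    return (a > b) - (a < b);
}

/* Trivial hash: the bit pattern of the int itself. */
static size_t int_hash(const void *v)
{
    return (size_t) *(const unsigned *) v;
}

/* A descriptor for plain C ints. */
const struct mini_atom mini_int_atom = { "int", sizeof(int), int_cmp, int_hash };

/* Generic linear search that only knows the descriptor, not the type.
 * Returns the position of the first match, or n if there is none. */
size_t mini_find(const struct mini_atom *t, const void *col, size_t n,
                 const void *key)
{
    const char *p = col;
    assert(t != NULL && t->cmp != NULL);
    for (size_t i = 0; i < n; i++)
        if (t->cmp(p + i * t->size, key) == 0)
            return i;
    return n;
}
```

Adding a new type in this scheme means filling in another descriptor; `mini_find` itself does not change.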
文章先翻译源码中的注释信息,在解释其中的数据结构。
* GDK is a C library that provides ACID properties on a DSM model ——GDK是c连接库,提供DSM模型上的ACID操作。
* @tex
* [@cite{Copeland85}]
* @end tex
* , using main-memory
* database algorithms
* @tex
* [@cite{Garcia-Molina92}]
* @end tex
* built on virtual-memory OS primitives and multi-threaded parallelism —— 构建在虚拟内存和多线程并行处理.
* Its implementation has undergone various changes over its decade ——是一个经历了10年变化,外部需求构建的一个快速强大的数据库系统。
* of development, many of which were driven by external needs to
* obtain a robust and fast database system.
*
* The coding scheme explored in GDK has also laid a foundation to
* communicate over time experiences and to provide (hopefully)
* helpful advice near to the place where the code-reader needs it.
* Of course, over such a long time the documentation diverges from
* reality. Especially in areas where the environment of this package
* is being described.
* Consider such deviations as historic landmarks, e.g. crystallization
* of brave ideas and mistakes rectified at a later stage.
*
* @+ Short Outline
* The facilities provided in this implementation are: ——提供如下实现:
* @itemize
* @item
* GDK or Goblin Database Kernel routines for session management ——数据库核心session管理
* @item
* BAT routines that define the primitive operations on the ——BAT定义和操作
* database tables (BATs).
* @item
* BBP routines to manage the BAT Buffer Pool (BBP). ——在BAT的Buffer池
* @item
* ATOM routines to manipulate primitive types, define new types ——使用ADT接口定义新类型
* using an ADT interface.
* @item
* HEAP routines for manipulating heaps: linear spaces of memory ——HEAP 内存堆管理
* that are GDK's vehicle of mass storage (on which BATs are built).
* @item
* DELTA routines to access inserted/deleted elements within a ——DELTA 事务中插入删除元素
* transaction.
* @item
* HASH routines for manipulating GDK's built-in linear-chained —— HASH 线形Hash表,在BAT上查找
* hash tables, for accelerating lookup searches on BATs.
* @item
* TM routines that provide basic transaction management primitives. ——TM基础事务管理
* @item
* TRG routines that provided active database support. [DEPRECATED]——TRG 数据库支持
* @item
* ALIGN routines that implement BAT alignment management. ——ALIGN BAT对齐管理
* @end itemize
*
* The Binary Association Table (BAT) is the lowest level of storage ——在BAT是数据库系统中低层存储结构
* considered in the Goblin runtime system
* @tex
* [@cite{Goblin}]
* @end tex
* . A BAT is a
* self-descriptive main-memory structure that represents the ——BAT是一种自描述的内存结构表示
* @strong{binary relationship} between two atomic types. The
* association can be defined over:
* @table @code
* @item void:
* virtual-OIDs: a densely ascending column of OIDs (takes zero-storage).—— 虚拟空间,列递增在OID在(0开始)
* @item bit:
* Booleans, implemented as one byte values. ——值,用1个byte实现
* @item bte:
* Tiny (1-byte) integers (8-bit @strong{integer}s).
* @item sht:
* Short integers (16-bit @strong{integer}s).
* @item int:
* This is the C @strong{int} type (32-bit).
* @item oid:
* Unique @strong{long int} values uses as object identifier. Highest
* bit cleared always. Thus, oids-s are 31-bit numbers on
* 32-bit systems, and 63-bit numbers on 64-bit systems. ——32位系统,oid是31位数字,64位系统是63位数字
* @item wrd:
* Machine-word sized integers
* (32-bit on 32-bit systems, 64-bit on 64-bit systems).
* @item ptr:
* Memory pointer values. DEPRECATED. Can only be stored in transient
* BATs.
* @item flt:
* The IEEE @strong{float} type.
* @item dbl:
* The IEEE @strong{double} type.
* @item lng:
* Longs: the C @strong{long long} type (64-bit integers).
* @item str:
* UTF-8 strings (Unicode). A zero-terminated byte sequence.
* @item bat:
* Bat descriptor. This allows for recursive administered tables, but ——BAT描述符。允许递归管理,但是严重的事务管理。
* severely complicates transaction management. Therefore, they CAN
* ONLY BE STORED IN TRANSIENT BATs.
* @end table
*
* This model can be used as a back-end model underlying other -higher
* level- models, in order to achieve @strong{better performance} and
* @strong{data independence} in one go. The relational model and the
* object-oriented model can be mapped on BATs by vertically splitting
* every table (or class) for each attribute. Each such a column is
* then stored in a BAT with type @strong{bat[oid,attribute]}, where
* the unique object identifiers link tuples in the different BATs.
* Relationship attributes in the object-oriented model hence are
* mapped to @strong{bat[oid,oid]} tables, being equivalent to the
* concept of @emph{join indexes} @tex [@cite{Valduriez87}] @end tex .
*
* The set of built-in types can be extended with user-defined types
* through an ADT interface. They are linked with the kernel to
* obtain an enhanced library, or they are dynamically loaded upon
* request.
*
* Types can be derived from other types. They represent something
* different than that from which they are derived, but their internal
* storage management is equal. This feature facilitates the work of
* extension programmers, by enabling reuse of implementation code,
* but is also used to keep the GDK code portable from 32-bits to
* 64-bits machines: the @strong{oid} and @strong{ptr} types are
* derived from @strong{int} on 32-bits machines, but is derived from
* @strong{lng} on 64 bits machines. This requires changes in only two
* lines of code each.
*
* To accelerate lookup and search in BATs, GDK supports one built-in
* search accelerator: hash tables. We choose an implementation
* efficient for main-memory: bucket chained hash
* @tex
* [@cite{LehCar86,Analyti92}]
* @end tex
* . Alternatively, when the table is sorted, it will resort to
* merge-scan operations or binary lookups.
为了在BAT上查询,GSK提供了hash tables。提供了bucket chained hash。存储的表,通过merge-scan或者二进制查找重新装载。
* BATs are built on the concept of heaps, which are large pieces of
* main memory. They can also consist of virtual memory, in case the
* working set exceeds main-memory. In this case, GDK supports
* operations that cluster the heaps of a BAT, in order to improve
* performance of its main-memory.
BAT构建在堆。GDK提供BAT堆的cluster,用于提高在主存中的性能。
* @- Rationale
* The rationale for choosing a BAT as the building block for both
* relational and object-oriented system is based on the following
* observations:
选择BAT的根据如下:
* @itemize
* @item -
* Given the fact that CPU speed and main-memory increase in current
* workstation hardware for the last years has been exceeding IO
* access speed increase, traditional disk-page oriented algorithms do
* no longer take best advantage of hardware, in most database
* operations.
*
* Instead of having a disk-block oriented kernel with a large memory
* cache, we choose to build a main-memory kernel, that only under
* large data volumes slowly degrades to IO-bound performance,
* comparable to traditional systems
* @tex
* [@cite{boncz95,boncz96}]
* @end tex
* .
*
* @item -
* Traditional (disk-based) relational systems move too much data
* around to save on (main-memory) join operations.
*
* The fully decomposed store (DSM
* @tex
* [@cite{Copeland85})]
* @end tex
* assures that only those attributes of a relation that are needed,
* will have to be accessed.
*
* @item -
* The data management issues for a binary association is much
* easier to deal with than traditional @emph{struct}-based approaches
* encountered in relational systems.
*
* @item -
* Object-oriented systems often maintain a double cache, one with the
* disk-based representation and a C pointer-based main-memory
* structure. This causes expensive conversions and replicated
* storage management. GDK does not do such `pointer swizzling'. It
* used virtual-memory (@strong{mmap()}) and buffer management advice
* (@strong{madvise()}) OS primitives to cache only once. Tables take
* the same form in memory as on disk, making the use of this
* technique transparent
对象数据库经常提供二级cace映射,但是GDK不提供指针重写,而是采用虚拟内存,由操作系统提供。(直接内存映射)
* @tex
* [@cite{oo7}]
* @end tex
* .
* @end itemize
*
* A RDBMS or OODBMS based on BATs strongly depends on our ability to
* efficiently support tuples and to handle small joins, respectively.
* The remainder of this document describes the Goblin Database kernel
* implementation at greater detail. It is organized as follows:
* @table @code
* @item @strong{GDK Interface}:
数据库内核实现详细内容如下,按照如下组织:
* It describes the global interface with which GDK sessions can be
* started and ended, and environment variables used.
*
* @item @strong{Binary Association Tables}: ——BAT表
* As already mentioned, these are the primary data structure of GDK.
* This chapter describes the kernel operations for creation,
* destruction and basic manipulation of BATs and BUNs (i.e. tuples:
* Binary UNits).
*
* @item @strong{BAT Buffer Pool:} ——BAT 缓存
*
* All BATs are registered in the BAT Buffer Pool. This directory is
* used to guide swapping in and out of BATs. Here we find routines
* that guide this swapping process.
*
* @item @strong{GDK Extensibility:} ——GDK扩展
*
* Atoms can be defined using a unified ADT interface. There is also
* an interface to extend the GDK library with dynamically linked
* object code.
*
* @item @strong{GDK Utilities:} ——GDK工具
*
* Memory allocation and error handling primitives are
* provided. Layers built on top of GDK should use them, for proper
* system monitoring. Thread management is also included here.
*
* @item @strong{Transaction Management:} ——事务管理
*
* For the time being, we just provide BAT-grained concurrency and
* global transactions. Work is needed here.
*
* @item @strong{BAT Alignment:} ——对齐
* Due to the mapping of multi-ary datamodels onto the BAT model, we
* expect many correspondences among BATs, e.g.
* @emph{bat(oid,attr1),.. bat(oid,attrN)} vertical
* decompositions. Frequent activities will be to jump from one
* attribute to the other (`bunhopping'). If the head columns are
* equal lists in two BATs, merge or even array lookups can be used
* instead of hash lookups. The alignment interface makes these
* relations explicitly manageable.
*
* In GDK, complex data models are mapped with DSM on binary tables.
* Usually, one decomposes @emph{N}-ary relations into @emph{N} BATs
* with an @strong{oid} in the head column, and the attribute in the
* tail column. There may well be groups of tables that have the same
* sets of @strong{oid}s, equally ordered. The alignment interface is
* intended to make this explicit. Implementations can use this
* interface to detect this situation, and use cheaper algorithms
* (like merge-join, or even array lookup) instead.
*
* @item @strong{BAT Iterators:} ——迭代
*
* Iterators are C macros that generally encapsulate a complex
* for-loop. They would be the equivalent of cursors in the SQL
* model. The macro interface (instead of a function call interface)
* is chosen to achieve speed when iterating main-memory tables.
*
* @item @strong{Common BAT Operations:}——操作
*
* These are much used operations on BATs, such as aggregate functions
* and relational operators. They are implemented in terms of BAT- and
* BUN-manipulation GDK primitives.
* @end table
*
* @+ Interface Files
* In this section we summarize the user interface to the GDK library.
* It consist of a header file (gdk.h) and an object library
* (gdklib.a), which implements the required functionality. The header
* file must be included in any program that uses the library. The
* library must be linked with such a program.
在gdk.h中汇总了所有用户接口。
* @- Database Context
*
* The MonetDB environment settings are collected in a configuration
* file. Amongst others it contains the location of the database
* directory. First, the database directory is closed for other
* servers running at the same time. Second, performance enhancements
* may take effect, such as locking the code into memory (if the OS
* permits) and preloading the data dictionary. An error at this
* stage normally lead to an abort.
*/
/* Heap storage modes */
typedef enum {
STORE_MEM = 0, /* load into GDKmalloced memory */
STORE_MMAP = 1, /* mmap() into virtual memory */
STORE_PRIV = 2, /* BAT copy of copy-on-write mmap */
STORE_INVALID /* invalid value, used to indicate error */
} storage_t;
typedef struct {
size_t maxsize; /* deprecated: kept equal to size */
size_t free; /* index where free area starts. */
size_t size; /* size of the heap (bytes) */
char *base; /* base pointer in memory. */
str filename; /* file containing image of the heap */
unsigned int copied:1, /* a copy of an existing map. */
hashash:1,/* the string heap contains hash values */
forcemap:1; /* force STORE_MMAP even if heap exists */
storage_t storage; /* storage mode (mmap/malloc). */
storage_t newstorage; /* new desired storage mode at re-allocation. */
bte dirty; /* specific heap dirty marker */
bat parentid; /* cache id of VIEW parent bat */
} Heap;
typedef struct {
int type; /* type of index entity */
BUN lim; /* collision list size */
BUN mask; /* number of hash buckets-1 (power of 2) */
BUN *hash; /* hash table */
BUN *link; /* collision list */
Heap *heap; /* heap where the hash is stored */
} Hash;
typedef struct {
union { /* storage is first in the record */
int ival;
oid oval;
sht shval;
bte btval;
wrd wval;
flt fval;
ptr pval;
struct BAT *Bval; /* this field is only used by mel */
bat bval;
str sval;
dbl dval;
lng lval;
} val;
int len, vtype;
} *ValPtr, ValRecord;
typedef struct {
MT_Id tid; /* which thread created it */
int stamp; /* BAT recent creation stamp */
unsigned int
copiedtodisk:1, /* once written */
dirty:2, /* dirty wrt disk? */
dirtyflushed:1, /* was dirty before commit started? */
descdirty:1, /* bat descriptor dirty marker */
set:1, /* real set semantics */
restricted:2, /* access privileges */
persistence:1, /* should the BAT persist on disk? */
unused:23; /* value=0 for now */
int sharecnt; /* incoming view count */
char map_head; /* mmap mode for head bun heap */
char map_tail; /* mmap mode for tail bun heap */
char map_hheap; /* mmap mode for head atom heap */
char map_theap; /* mmap mode for tail atom heap */
} BATrec;
typedef struct {
/* delta status administration */
BUN deleted; /* start of deleted elements */
BUN first; /* to store next deletion */
BUN inserted; /* start of inserted elements */
BUN count; /* tuple count */
BUN capacity; /* tuple capacity */
} BUNrec;
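Read together, the `BUNrec` fields appear to split the tuple array into three consecutive regions: deleted tuples in `[deleted, first)`, unchanged tuples in `[first, inserted)`, and newly inserted tuples in `[inserted, first + count)`. That reading is an assumption based on the field comments, not a statement of GDK's exact behaviour. The sketch below classifies a position under that assumption and shows what a commit would then amount to. `MiniDelta`, `Region` and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    REGION_OUTSIDE,   /* not part of the BAT or its delta */
    REGION_DELETED,   /* deleted in the current transaction */
    REGION_STABLE,    /* present before the transaction, untouched */
    REGION_INSERTED   /* inserted in the current transaction */
} Region;

/* Reduced, hypothetical version of BUNrec. */
typedef struct {
    size_t deleted;   /* start of deleted elements */
    size_t first;     /* first live element */
    size_t inserted;  /* start of inserted elements */
    size_t count;     /* number of live elements, counted from first */
} MiniDelta;

/* Classify position pos under the assumed layout. */
Region mini_delta_region(const MiniDelta *u, size_t pos)
{
    size_t last = u->first + u->count;   /* one past the last live element */
    assert(u->deleted <= u->first && u->first <= u->inserted && u->inserted <= last);
    if (pos < u->deleted || pos >= last)
        return REGION_OUTSIDE;
    if (pos < u->first)
        return REGION_DELETED;
    if (pos < u->inserted)
        return REGION_STABLE;
    return REGION_INSERTED;
}

/* Commit under the assumed layout: both delta regions become empty. */
void mini_delta_commit(MiniDelta *u)
{
    u->deleted = u->first;
    u->inserted = u->first + u->count;
}
```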
typedef struct PROPrec {
int id;
ValRecord v;
struct PROPrec *next; /* simple chain of properties */
} PROPrec;
/* see also comment near BATassertProps() for more information about
* the properties */
typedef struct {
str id; /* label for head/tail column */
unsigned short width; /* byte-width of the atom array */
bte type; /* type id. */
bte shift; /* log2 of bunwidth */
unsigned int
varsized:1, /* varsized (1) or fixedsized (0) */
key:2, /* duplicates allowed? */
dense:1, /* OID only: only consecutive values */
nonil:1, /* nonil isn't propchecked yet */
nil:1, /* there is a nil in the column */
sorted:1, /* column is sorted in ascending order */
revsorted:1; /* column is sorted in descending order */
oid align; /* OID for sync alignment */
BUN nokey[2]; /* positions that prove key ==FALSE */
BUN nosorted; /* position that proves sorted==FALSE */
BUN norevsorted; /* position that proves revsorted==FALSE */
BUN nodense; /* position that proves dense==FALSE */
oid seq; /* start of dense head sequence */
Heap heap; /* space for the column. */
Heap *vheap; /* space for the varsized data. */
Hash *hash; /* hash table */
PROPrec *props; /* list of dynamic properties stored in the bat descriptor */
} COLrec;
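Since `shift` is documented as the log2 of the value width, the address of the value at a given position can be computed with a shift instead of a multiplication. A minimal sketch, assuming power-of-two widths, is shown below. `MiniCol`, `mini_shift_for` and `mini_col_loc` are hypothetical names, and `base` stands in for the `heap.base` pointer of the column.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical version of COLrec. */
typedef struct {
    unsigned short width;  /* byte width of one value */
    signed char shift;     /* log2 of width */
    char *base;            /* start of the value array */
} MiniCol;

/* Return log2(width) for power-of-two widths, or -1 otherwise. */
int mini_shift_for(unsigned short width)
{
    int s = 0;
    if (width == 0 || (width & (width - 1)) != 0)
        return -1;
    while ((1u << s) < width)
        s++;
    return s;
}

/* Address of the value stored at position pos. */
void *mini_col_loc(const MiniCol *c, size_t pos)
{
    assert(c->shift >= 0);
    return c->base + (pos << c->shift);
}
```

For example, a column of 4-byte integers has `shift == 2`, so position 2 lives at byte offset 8.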
typedef struct BAT {
/* static bat properties */
bat batCacheid; /* index into BBP */
/* dynamic column properties */
COLrec *H; /* head column info */
COLrec *T; /* tail column info */
/* dynamic bat properties */
BATrec *P; /* cache and sort info */
BUNrec *U; /* delta status and tuple counts */
} BAT;
typedef struct BATiter {
BAT *b;
oid hvid, tvid;
} BATiter;
/*
* The different parts of which a BAT consists are physically stored
* next to each other in the BATstore type.
*/
typedef struct {
BAT B; /* storage for BAT descriptor */
BAT BM; /* mirror (reverse) BAT */
COLrec H; /* storage for head column */
COLrec T; /* storage for tail column */
BATrec P; /* storage for BATrec */
BUNrec U; /* storage for BUNrec */
} BATstore;
/* structure used by HEAP_check functions */
typedef struct {
size_t minpos; /* minimum block byte-index */
size_t maxpos; /* maximum block byte-index */
int alignment; /* block index alignment */
int *validmask; /* bitmap with all valid byte-indices
* first bit corresponds with 'minpos';
* 2nd bit with 'minpos+alignment', etc
*/
} HeapRepair;
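The `validmask` comment defines a mapping from byte indices to bits: bit 0 stands for `minpos`, bit 1 for `minpos + alignment`, and so on. The sketch below implements that mapping. It is an illustration only: `MiniRepair` and the helper functions are hypothetical, the mask uses `unsigned int` words instead of `int`, `maxpos` is treated as inclusive, and bits are numbered from the least significant bit of each word. The last three points are assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reduced, hypothetical version of HeapRepair. */
typedef struct {
    size_t minpos;            /* minimum block byte-index */
    size_t maxpos;            /* maximum block byte-index (assumed inclusive) */
    int alignment;            /* block index alignment */
    unsigned int *validmask;  /* one bit per aligned byte-index */
} MiniRepair;

#define BITS_PER_WORD (sizeof(unsigned int) * CHAR_BIT)

/* Bit number for pos, or (size_t)-1 if pos is out of range or misaligned. */
size_t mini_repair_bit(const MiniRepair *r, size_t pos)
{
    if (pos < r->minpos || pos > r->maxpos)
        return (size_t)-1;
    if ((pos - r->minpos) % (size_t)r->alignment != 0)
        return (size_t)-1;
    return (pos - r->minpos) / (size_t)r->alignment;
}

/* Mark pos as a valid block start. */
void mini_repair_mark(MiniRepair *r, size_t pos)
{
    size_t bit = mini_repair_bit(r, pos);
    assert(bit != (size_t)-1);
    r->validmask[bit / BITS_PER_WORD] |= 1u << (bit % BITS_PER_WORD);
}

/* Return 1 if pos was marked valid, 0 otherwise. */
int mini_repair_is_valid(const MiniRepair *r, size_t pos)
{
    size_t bit = mini_repair_bit(r, pos);
    if (bit == (size_t)-1)
        return 0;
    return (int)((r->validmask[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1u);
}
```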
gdk_export void HEAP_initialize(
Heap *heap, /* heap -- the heap to initialize */
size_t nbytes, /* nbytes -- initial size of the heap */
size_t nprivate, /* nprivate -- size of private space */
int alignment /* alignment -- restriction for allocated chunks */
);
typedef struct {
BAT *cache[2]; /* if loaded: BAT* handle + reverse */
str logical[2]; /* logical name + reverse */
str bak[2]; /* logical name + reverse backups */
bat next[2]; /* next BBP slot in linked list */
BATstore *desc; /* the BAT descriptor */
str physical; /* dir + basename for storage */
str options; /* A string list of options */
int refs; /* in-memory references on which the loaded status of a BAT relies */
int lrefs; /* logical references on which the existence of a BAT relies */
int lastused; /* BBP LRU stamp */
volatile int status; /* status mask used for spin locking */
/* MT_Id pid; non-zero thread-id if this BAT is private */
} BBPrec;
typedef struct {
/* simple attributes */
char name[IDLENGTH];
int storage; /* stored as another type? */
short linear; /* atom can be ordered linearly */
short size; /* fixed size of atom */
short align; /* alignment condition for values */
short deleting; /* set if unloading */
int varsized; /* variable-size or fixed-sized */
/* automatically generated fields */
ptr atomNull; /* global nil value */
/* generic (fixed + varsized atom) ADT functions */
int (*atomFromStr) (const char *s, int *len, ptr *dst);
int (*atomToStr) (str *s, int *len, const void *src);
void *(*atomRead) (ptr a, stream *s, size_t cnt);
int (*atomWrite) (const void *a, stream *s, size_t cnt);
int (*atomCmp) (const void *v1, const void *v2);
BUN (*atomHash) (const void *v);
/* optional functions */
void (*atomConvert) (ptr v, int direction);
int (*atomFix) (const void *atom);
int (*atomUnfix) (const void *atom);
/* varsized atom-only ADT functions */
var_t (*atomPut) (Heap *, var_t *off, const void *src);
void (*atomDel) (Heap *, var_t *atom);
int (*atomLen) (const void *atom);
void (*atomHeap) (Heap *, size_t);
/* optional functions */
void (*atomHeapConvert) (Heap *, int direction);
int (*atomHeapCheck) (Heap *, HeapRepair *);
} atomDesc;
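`atomDesc` is an abstract data type expressed as a table of function pointers: generic code calls `atomCmp`, `atomHash` and the other entries without knowing the concrete type. The sketch below shows the pattern with a reduced, hypothetical descriptor for `int`. `MiniAtom`, `mini_int_atom`, `mini_atom_find` and the identity hash are assumptions for illustration and are not part of GDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical version of atomDesc. */
typedef struct {
    char name[16];                                    /* type name */
    short size;                                       /* fixed size of one value */
    int (*atomCmp)(const void *v1, const void *v2);   /* <0, 0, >0 */
    size_t (*atomHash)(const void *v);                /* hash of one value */
} MiniAtom;

int int_cmp(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

size_t int_hash(const void *v)
{
    return (size_t)(unsigned int)*(const int *)v;   /* toy identity hash */
}

/* Descriptor instance for int. */
const MiniAtom mini_int_atom = { "int", (short)sizeof(int), int_cmp, int_hash };

/* Type-agnostic linear search over n values of type t stored at base.
 * Returns the position of the first value equal to key, or n if absent. */
size_t mini_atom_find(const MiniAtom *t, const void *base, size_t n, const void *key)
{
    const char *p = base;
    for (size_t i = 0; i < n; i++)
        if (t->atomCmp(p + i * (size_t)t->size, key) == 0)
            return i;
    return n;
}
```

Adding a new type then means filling in another descriptor; `mini_atom_find` itself does not change.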