
MonetDB Source Code: The GDK Data Structures

2013-10-24 20:33
gdk.h is the header that exposes the GDK API; the data structures of the storage layer are also defined there.

This post first goes through the comments in the source code, then presents the data structures they describe.

* GDK is a C library that provides ACID properties on a DSM model

* @tex

* [@cite{Copeland85}]

* @end tex

* , using main-memory

* database algorithms

* @tex

* [@cite{Garcia-Molina92}]

* @end tex

* built on virtual-memory OS primitives and multi-threaded parallelism.

* Its implementation has undergone various changes over its decade

* of development, many of which were driven by external needs to

* obtain a robust and fast database system.

*

* The coding scheme explored in GDK has also laid a foundation to

* communicate over time experiences and to provide (hopefully)

* helpful advice near to the place where the code-reader needs it.

* Of course, over such a long time the documentation diverges from

* reality. Especially in areas where the environment of this package

* is being described.

* Consider such deviations as historic landmarks, e.g. crystallization

* of brave ideas and mistakes rectified at a later stage.

*

* @+ Short Outline

* The facilities provided in this implementation are:

* @itemize

* @item

* GDK or Goblin Database Kernel routines for session management

* @item

* BAT routines that define the primitive operations on the

* database tables (BATs).

* @item

* BBP routines to manage the BAT Buffer Pool (BBP).

* @item

* ATOM routines to manipulate primitive types, define new types

* using an ADT interface.

* @item

* HEAP routines for manipulating heaps: linear spaces of memory

* that are GDK's vehicle of mass storage (on which BATs are built).

* @item

* DELTA routines to access inserted/deleted elements within a

* transaction.

* @item

* HASH routines for manipulating GDK's built-in linear-chained

* hash tables, for accelerating lookup searches on BATs.

* @item

* TM routines that provide basic transaction management primitives.

* @item

* TRG routines that provided active database support. [DEPRECATED]

* @item

* ALIGN routines that implement BAT alignment management.

* @end itemize

*

* The Binary Association Table (BAT) is the lowest level of storage

* considered in the Goblin runtime system

* @tex

* [@cite{Goblin}]

* @end tex

* . A BAT is a

* self-descriptive main-memory structure that represents the

* @strong{binary relationship} between two atomic types. The

* association can be defined over:

* @table @code

* @item void:

* virtual-OIDs: a densely ascending column of OIDs (takes zero-storage).

* @item bit:

* Booleans, implemented as one byte values.

* @item bte:

* Tiny (1-byte) integers (8-bit @strong{integer}s).

* @item sht:

* Short integers (16-bit @strong{integer}s).

* @item int:

* This is the C @strong{int} type (32-bit).

* @item oid:

* Unique @strong{long int} values used as object identifiers. Highest

* bit cleared always. Thus, oids are 31-bit numbers on

* 32-bit systems, and 63-bit numbers on 64-bit systems.

* @item wrd:

* Machine-word sized integers

* (32-bit on 32-bit systems, 64-bit on 64-bit systems).

* @item ptr:

* Memory pointer values. DEPRECATED. Can only be stored in transient

* BATs.

* @item flt:

* The IEEE @strong{float} type.

* @item dbl:

* The IEEE @strong{double} type.

* @item lng:

* Longs: the C @strong{long long} type (64-bit integers).

* @item str:

* UTF-8 strings (Unicode). A zero-terminated byte sequence.

* @item bat:

* Bat descriptor. This allows for recursive administered tables, but

* severely complicates transaction management. Therefore, they CAN

* ONLY BE STORED IN TRANSIENT BATs.

* @end table

*

* This model can be used as a back-end model underlying other -higher

* level- models, in order to achieve @strong{better performance} and

* @strong{data independence} in one go. The relational model and the

* object-oriented model can be mapped on BATs by vertically splitting

* every table (or class) for each attribute. Each such column is

* then stored in a BAT with type @strong{bat[oid,attribute]}, where

* the unique object identifiers link tuples in the different BATs.

* Relationship attributes in the object-oriented model hence are

* mapped to @strong{bat[oid,oid]} tables, being equivalent to the

* concept of @emph{join indexes} @tex [@cite{Valduriez87}] @end tex .
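The vertical decomposition described above can be pictured with a minimal C sketch. It assumes a hypothetical two-attribute `person` relation; `person_age`, `person_height`, and `person_get` are illustrative names, not GDK identifiers. Each attribute lives in its own array, and the shared oid (here simply the array position) links the pieces of one tuple.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t oid;                 /* stand-in for GDK's oid type */

enum { NPERSONS = 3 };

/* One column per attribute; position i holds the value for oid i. */
static const int    person_age[NPERSONS]    = { 31, 45, 27 };
static const double person_height[NPERSONS] = { 1.80, 1.65, 1.72 };

typedef struct {
    int    age;
    double height;
} person;

/* Rebuild one tuple by reading the same oid from every column. */
static person person_get(oid id)
{
    assert(id < NPERSONS);
    person p = { person_age[id], person_height[id] };
    return p;
}
```

A query that only needs ages scans `person_age` and never touches the heights, which is the point of the decomposed store.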

*

* The set of built-in types can be extended with user-defined types

* through an ADT interface. They are linked with the kernel to

* obtain an enhanced library, or they are dynamically loaded upon

* request.

*

* Types can be derived from other types. They represent something

* different than that from which they are derived, but their internal

* storage management is equal. This feature facilitates the work of

* extension programmers, by enabling reuse of implementation code,

* but is also used to keep the GDK code portable from 32-bits to

* 64-bits machines: the @strong{oid} and @strong{ptr} types are

* derived from @strong{int} on 32-bits machines, but are derived from

* @strong{lng} on 64 bits machines. This requires changes in only two

* lines of code each.
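The idea of deriving `oid` from a different base type per platform can be sketched with a conditional typedef. This is an illustration only: `toy_oid`, `TOY_OID_MAX`, and `toy_oid_valid` are hypothetical names, and the real definition in gdk.h may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the base type from the pointer width, as the comment describes. */
#if UINTPTR_MAX > 0xFFFFFFFFu
typedef int64_t toy_oid;            /* 64-bit machine: derived from lng */
#define TOY_OID_MAX INT64_MAX
#else
typedef int32_t toy_oid;            /* 32-bit machine: derived from int */
#define TOY_OID_MAX INT32_MAX
#endif

/* The highest bit is always clear, so a valid oid is never negative
 * when viewed as a signed number. */
static int toy_oid_valid(toy_oid o)
{
    return o >= 0;
}
```

Only the typedef and the limit change between platforms; code written against `toy_oid` stays the same.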

*

* To accelerate lookup and search in BATs, GDK supports one built-in

* search accelerator: hash tables. We choose an implementation

* efficient for main-memory: bucket chained hash

* @tex

* [@cite{LehCar86,Analyti92}]

* @end tex

* . Alternatively, when the table is sorted, it will resort to

* merge-scan operations or binary lookups.

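To speed up lookups on BATs, GDK provides one built-in accelerator, a bucket-chained hash table. When the table is sorted, it falls back on merge-scan operations or binary search instead.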

* BATs are built on the concept of heaps, which are large pieces of

* main memory. They can also consist of virtual memory, in case the

* working set exceeds main-memory. In this case, GDK supports

* operations that cluster the heaps of a BAT, in order to improve

* performance of its main-memory.

BATs are built on heaps. GDK can cluster the heaps of a BAT to improve main-memory performance.

* @- Rationale

* The rationale for choosing a BAT as the building block for both

* relational and object-oriented system is based on the following

* observations:

The reasons for choosing the BAT are as follows:

* @itemize

* @item -

* Given the fact that CPU speed and main-memory increase in current

* workstation hardware for the last years has been exceeding IO

* access speed increase, traditional disk-page oriented algorithms do

* no longer take best advantage of hardware, in most database

* operations.

*

* Instead of having a disk-block oriented kernel with a large memory

* cache, we choose to build a main-memory kernel, that only under

* large data volumes slowly degrades to IO-bound performance,

* comparable to traditional systems

* @tex

* [@cite{boncz95,boncz96}]

* @end tex

* .

*

* @item -

* Traditional (disk-based) relational systems move too much data

* around to save on (main-memory) join operations.

*

* The fully decomposed store (DSM

* @tex

* [@cite{Copeland85}])

* @end tex

* assures that only those attributes of a relation that are needed,

* will have to be accessed.

*

* @item -

* The data management issues for a binary association are much

* easier to deal with than traditional @emph{struct}-based approaches

* encountered in relational systems.

*

* @item -

* Object-oriented systems often maintain a double cache, one with the

* disk-based representation and a C pointer-based main-memory

* structure. This causes expensive conversions and replicated

* storage management. GDK does not do such `pointer swizzling'. It

* uses virtual-memory (@strong{mmap()}) and buffer management advice

* (@strong{madvise()}) OS primitives to cache only once. Tables take

* the same form in memory as on disk, making the use of this

* technique transparent

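Object-oriented databases often maintain a double cache. GDK does no pointer swizzling; it relies on the operating system's virtual memory instead (direct memory mapping).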

* @tex

* [@cite{oo7}]

* @end tex

* .

* @end itemize
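The "cache only once" idea from the last item can be sketched with the two POSIX calls the comment names. This is a minimal illustration, not GDK's loader: `map_column` is a hypothetical helper, and the choice of `MADV_SEQUENTIAL` is an assumption about the access pattern.

```c
#define _DEFAULT_SOURCE             /* exposes madvise() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and return its address, or NULL on failure.
 * On success *len receives the file size in bytes. */
static const void *map_column(const char *path, size_t *len)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    assert(len != NULL);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    /* Advisory only: announce a sequential scan to the OS. */
    (void) madvise(p, (size_t) st.st_size, MADV_SEQUENTIAL);
    *len = (size_t) st.st_size;
    return p;
}
```

Because the bytes in the file are used directly as the in-memory array, no conversion step and no second copy are needed; the caller releases the mapping with `munmap()`.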

*

* A RDBMS or OODBMS based on BATs strongly depends on our ability to

* efficiently support tuples and to handle small joins, respectively.

* The remainder of this document describes the Goblin Database kernel

* implementation at greater detail. It is organized as follows:

* @table @code

* @item @strong{GDK Interface}:

The details of the kernel implementation are organized as follows:

* It describes the global interface with which GDK sessions can be

* started and ended, and environment variables used.

*

* @item @strong{Binary Association Tables}:

* As already mentioned, these are the primary data structure of GDK.

* This chapter describes the kernel operations for creation,

* destruction and basic manipulation of BATs and BUNs (i.e. tuples:

* Binary UNits).

*

* @item @strong{BAT Buffer Pool:}

*

* All BATs are registered in the BAT Buffer Pool. This directory is

* used to guide swapping in and out of BATs. Here we find routines

* that guide this swapping process.

*

* @item @strong{GDK Extensibility:}

*

* Atoms can be defined using a unified ADT interface. There is also

* an interface to extend the GDK library with dynamically linked

* object code.

*

* @item @strong{GDK Utilities:}

*

* Memory allocation and error handling primitives are

* provided. Layers built on top of GDK should use them, for proper

* system monitoring. Thread management is also included here.

*

* @item @strong{Transaction Management:}

*

* For the time being, we just provide BAT-grained concurrency and

* global transactions. Work is needed here.

*

* @item @strong{BAT Alignment:}

* Due to the mapping of multi-ary datamodels onto the BAT model, we

* expect many correspondences among BATs, e.g.

* @emph{bat(oid,attr1),.. bat(oid,attrN)} vertical

* decompositions. Frequent activities will be to jump from one

* attribute to the other (`bunhopping'). If the head columns are

* equal lists in two BATs, merge or even array lookups can be used

* instead of hash lookups. The alignment interface makes these

* relations explicitly manageable.

*

* In GDK, complex data models are mapped with DSM on binary tables.

* Usually, one decomposes @emph{N}-ary relations into @emph{N} BATs

* with an @strong{oid} in the head column, and the attribute in the

* tail column. There may well be groups of tables that have the same

* sets of @strong{oid}s, equally ordered. The alignment interface is

* intended to make this explicit. Implementations can use this

* interface to detect this situation, and use cheaper algorithms

* (like merge-join, or even array lookup) instead.

*

* @item @strong{BAT Iterators:}

*

* Iterators are C macros that generally encapsulate a complex

* for-loop. They would be the equivalent of cursors in the SQL

* model. The macro interface (instead of a function call interface)

* is chosen to achieve speed when iterating main-memory tables.

*

* @item @strong{Common BAT Operations:}

*

* These are much used operations on BATs, such as aggregate functions

* and relational operators. They are implemented in terms of BAT- and

* BUN-manipulation GDK primitives.

* @end table
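Two points from the list above, array lookup on aligned BATs and macro-based iterators, can be sketched together. The sketch assumes a toy column whose head is a dense oid range; `dense_col`, `dense_find`, and `DENSE_LOOP` are hypothetical names and not the GDK macros.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t oid;

/* Toy column whose virtual head is the dense range seq .. seq+count-1. */
typedef struct {
    oid seq;            /* first oid of the dense head */
    size_t count;       /* number of tuples */
    const int *tail;    /* tail values, one per oid */
} dense_col;

/* Array lookup: position = oid - seq, so no hash table is needed. */
static int dense_find(const dense_col *c, oid id)
{
    assert(id >= c->seq && id - c->seq < c->count);
    return c->tail[id - c->seq];
}

/* Iterator macro that wraps the for-loop over all positions. */
#define DENSE_LOOP(c, p) for (size_t p = 0; p < (c)->count; p++)

/* Sum the tail values, walking the column with the macro. */
static long dense_sum(const dense_col *c)
{
    long sum = 0;
    DENSE_LOOP(c, p)
        sum += c->tail[p];
    return sum;
}
```

Two columns with the same `seq` and `count` are aligned in this toy model, so hopping from one attribute to another is a single subtraction and index.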

*

* @+ Interface Files

* In this section we summarize the user interface to the GDK library.

* It consists of a header file (gdk.h) and an object library

* (gdklib.a), which implements the required functionality. The header

* file must be included in any program that uses the library. The

* library must be linked with such a program.

gdk.h collects the entire user interface of the library.

* @- Database Context

*

* The MonetDB environment settings are collected in a configuration

* file. Amongst others it contains the location of the database

* directory. First, the database directory is closed for other

* servers running at the same time. Second, performance enhancements

* may take effect, such as locking the code into memory (if the OS

* permits) and preloading the data dictionary. An error at this

* stage normally leads to an abort.

*/

/* Heap storage modes */

typedef enum {

STORE_MEM = 0, /* load into GDKmalloced memory */

STORE_MMAP = 1, /* mmap() into virtual memory */

STORE_PRIV = 2, /* BAT copy of copy-on-write mmap */

STORE_INVALID /* invalid value, used to indicate error */

} storage_t;

typedef struct {

size_t maxsize; /* deprecated: kept equal to size */

size_t free; /* index where free area starts. */

size_t size; /* size of the heap (bytes) */

char *base; /* base pointer in memory. */

str filename; /* file containing image of the heap */

unsigned int copied:1, /* a copy of an existing map. */

hashash:1,/* the string heap contains hash values */

forcemap:1; /* force STORE_MMAP even if heap exists */

storage_t storage; /* storage mode (mmap/malloc). */

storage_t newstorage; /* new desired storage mode at re-allocation. */

bte dirty; /* specific heap dirty marker */

bat parentid; /* cache id of VIEW parent bat */

} Heap;
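How the `free`, `size`, and `base` fields cooperate can be sketched with a toy append routine. This is an assumption-laden illustration: `toy_heap` and `toy_heap_append` are hypothetical, the doubling growth policy is mine, and the real HEAP routines also deal with storage modes and files.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t free;    /* offset where the unused area starts */
    size_t size;    /* allocated bytes */
    char *base;     /* start of the heap in memory */
} toy_heap;

/* Append len bytes, growing the heap when needed.
 * Returns the offset of the copy, or (size_t) -1 if allocation fails. */
static size_t toy_heap_append(toy_heap *h, const void *src, size_t len)
{
    assert(h != NULL && src != NULL);
    if (h->free + len > h->size) {
        size_t nsize = h->size ? h->size : 64;
        while (nsize < h->free + len)
            nsize *= 2;
        char *nbase = realloc(h->base, nsize);
        if (nbase == NULL)
            return (size_t) -1;
        h->base = nbase;
        h->size = nsize;
    }
    size_t off = h->free;
    memcpy(h->base + off, src, len);
    h->free += len;
    return off;
}
```

Callers keep the returned offset rather than a pointer: offsets stay valid when the heap is reallocated or mapped at a different address.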

typedef struct {

int type; /* type of index entity */

BUN lim; /* collision list size */

BUN mask; /* number of hash buckets-1 (power of 2) */

BUN *hash; /* hash table */

BUN *link; /* collision list */

Heap *heap; /* heap where the hash is stored */

} Hash;
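The roles of `mask`, `hash`, and `link` can be shown with a small bucket-chained hash over an int column. The sketch is hypothetical throughout: fixed sizes, a trivial hash function, and `BUN_NONE` as an end-of-chain marker are assumptions, not the GDK implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t BUN;
#define BUN_NONE ((BUN) -1)         /* hypothetical end-of-chain marker */

enum { NBUCKETS = 8, CAPACITY = 16 };   /* NBUCKETS must be a power of 2 */

typedef struct {
    BUN mask;                       /* number of buckets - 1 */
    BUN hash[NBUCKETS];             /* head position of each bucket chain */
    BUN link[CAPACITY];             /* next position in the same chain */
    int vals[CAPACITY];             /* the indexed column */
    BUN count;                      /* values stored so far */
} toy_hash;

static void toy_hash_init(toy_hash *h)
{
    h->mask = NBUCKETS - 1;
    h->count = 0;
    for (BUN i = 0; i < NBUCKETS; i++)
        h->hash[i] = BUN_NONE;
}

/* Trivial hash: the low bits of the value select the bucket. */
static BUN toy_hash_bucket(const toy_hash *h, int v)
{
    return (BUN) (unsigned int) v & h->mask;
}

/* Append v to the column and put its position at the head of its chain. */
static BUN toy_hash_insert(toy_hash *h, int v)
{
    assert(h->count < CAPACITY);
    BUN pos = h->count++;
    BUN b = toy_hash_bucket(h, v);
    h->vals[pos] = v;
    h->link[pos] = h->hash[b];      /* old head becomes the successor */
    h->hash[b] = pos;
    return pos;
}

/* Walk one chain; return the position of v or BUN_NONE. */
static BUN toy_hash_find(const toy_hash *h, int v)
{
    for (BUN p = h->hash[toy_hash_bucket(h, v)]; p != BUN_NONE; p = h->link[p])
        if (h->vals[p] == v)
            return p;
    return BUN_NONE;
}
```

Both arrays hold positions instead of pointers, so the whole table is position-independent and could live in a heap like any other column data.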

typedef struct {

union { /* storage is first in the record */

int ival;

oid oval;

sht shval;

bte btval;

wrd wval;

flt fval;

ptr pval;

struct BAT *Bval; /* this field is only used by mel */

bat bval;

str sval;

dbl dval;

lng lval;

} val;

int len, vtype;

} *ValPtr, ValRecord;
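`ValRecord` is a tagged union: `vtype` says which member of `val` is valid. A reduced sketch with two members shows the usual access pattern; `toy_val`, the `TOY_*` codes, and the helper functions are hypothetical.

```c
#include <assert.h>

enum { TOY_INT, TOY_DBL };          /* hypothetical type codes */

typedef struct {
    union {                         /* storage comes first, as in ValRecord */
        int ival;
        double dval;
    } val;
    int vtype;                      /* selects the valid union member */
} toy_val;

static toy_val toy_from_int(int v)
{
    toy_val r;
    r.val.ival = v;
    r.vtype = TOY_INT;
    return r;
}

static toy_val toy_from_dbl(double v)
{
    toy_val r;
    r.val.dval = v;
    r.vtype = TOY_DBL;
    return r;
}

/* Read any numeric value as a double, dispatching on the tag. */
static double toy_as_double(const toy_val *v)
{
    switch (v->vtype) {
    case TOY_INT:
        return (double) v->val.ival;
    case TOY_DBL:
        return v->val.dval;
    }
    assert(0 && "unknown vtype");
    return 0.0;
}
```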

typedef struct {

MT_Id tid; /* which thread created it */

int stamp; /* BAT recent creation stamp */

unsigned int

copiedtodisk:1, /* once written */

dirty:2, /* dirty wrt disk? */

dirtyflushed:1, /* was dirty before commit started? */

descdirty:1, /* bat descriptor dirty marker */

set:1, /* real set semantics */

restricted:2, /* access privileges */

persistence:1, /* should the BAT persist on disk? */

unused:23; /* value=0 for now */

int sharecnt; /* incoming view count */

char map_head; /* mmap mode for head bun heap */

char map_tail; /* mmap mode for tail bun heap */

char map_hheap; /* mmap mode for head atom heap */

char map_theap; /* mmap mode for tail atom heap */

} BATrec;

typedef struct {

/* delta status administration */

BUN deleted; /* start of deleted elements */

BUN first; /* to store next deletion */

BUN inserted; /* start of inserted elements */

BUN count; /* tuple count */

BUN capacity; /* tuple capacity */

} BUNrec;

typedef struct PROPrec {

int id;

ValRecord v;

struct PROPrec *next; /* simple chain of properties */

} PROPrec;

/* see also comment near BATassertProps() for more information about

* the properties */

typedef struct {

str id; /* label for head/tail column */

unsigned short width; /* byte-width of the atom array */

bte type; /* type id. */

bte shift; /* log2 of bunwidth */

unsigned int

varsized:1, /* varsized (1) or fixedsized (0) */

key:2, /* duplicates allowed? */

dense:1, /* OID only: only consecutive values */

nonil:1, /* nonil isn't propchecked yet */

nil:1, /* there is a nil in the column */

sorted:1, /* column is sorted in ascending order */

revsorted:1; /* column is sorted in descending order */

oid align; /* OID for sync alignment */

BUN nokey[2]; /* positions that prove key ==FALSE */

BUN nosorted; /* position that proves sorted==FALSE */

BUN norevsorted; /* position that proves revsorted==FALSE */

BUN nodense; /* position that proves dense==FALSE */

oid seq; /* start of dense head sequence */

Heap heap; /* space for the column. */

Heap *vheap; /* space for the varsized data. */

Hash *hash; /* hash table */

PROPrec *props; /* list of dynamic properties stored in the bat descriptor */

} COLrec;
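The `sorted` flag and its `nosorted` witness position suggest a check like the one below. It is a sketch under assumptions: `check_sorted` is a hypothetical function, and recording the second element of the offending pair as the witness is my choice, not necessarily the GDK convention.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if vals[0..n-1] is in ascending order (duplicates allowed).
 * Otherwise returns 0 and stores in *witness the first position whose
 * value is smaller than its predecessor. */
static int check_sorted(const int *vals, size_t n, size_t *witness)
{
    assert(vals != NULL && witness != NULL);
    for (size_t i = 1; i < n; i++) {
        if (vals[i] < vals[i - 1]) {
            *witness = i;
            return 0;
        }
    }
    return 1;
}
```

Keeping a witness makes the negative property cheap to re-verify: one comparison at the stored position proves the column is still unsorted, without another full scan.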

typedef struct BAT {

/* static bat properties */

bat batCacheid; /* index into BBP */

/* dynamic column properties */

COLrec *H; /* column info */

COLrec *T; /* column info */

/* dynamic bat properties */

BATrec *P; /* cache and sort info */

BUNrec *U; /* delta status and tuple counts */

} BAT;

typedef struct BATiter {

BAT *b;

oid hvid, tvid;

} BATiter;

/*

* The different parts of which a BAT consists are physically stored

* next to each other in the BATstore type.

*/

typedef struct {

BAT B; /* storage for BAT descriptor */

BAT BM; /* mirror (reverse) BAT */

COLrec H; /* storage for head column */

COLrec T; /* storage for tail column */

BATrec P; /* storage for BATrec */

BUNrec U; /* storage for BUNrec */

} BATstore;
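The layout of `BATstore` suggests how a mirror BAT can exist without copying data: both descriptors point at the same two column records, with head and tail swapped. The sketch below assumes exactly that reading; `toy_store`, `toy_store_init`, and `toy_mirror` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *id; } toy_col;     /* stand-in for COLrec */
typedef struct { toy_col *H, *T; } toy_bat;     /* stand-in for BAT */

typedef struct {
    toy_bat B;      /* the BAT descriptor */
    toy_bat BM;     /* its mirror */
    toy_col H, T;   /* column records shared by both descriptors */
} toy_store;

/* Wire both descriptors to the same two columns. */
static void toy_store_init(toy_store *s)
{
    assert(s != NULL);
    s->H.id = "head";
    s->T.id = "tail";
    s->B.H = &s->H;
    s->B.T = &s->T;
    s->BM.H = &s->T;    /* mirror: head and tail swapped */
    s->BM.T = &s->H;
}

/* Mirroring switches descriptors; no tuple is touched. */
static toy_bat *toy_mirror(toy_store *s, toy_bat *b)
{
    return b == &s->B ? &s->BM : &s->B;
}
```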

/* structure used by HEAP_check functions */

typedef struct {

size_t minpos; /* minimum block byte-index */

size_t maxpos; /* maximum block byte-index */

int alignment; /* block index alignment */

int *validmask; /* bitmap with all valid byte-indices

* first bit corresponds with 'minpos';

* 2nd bit with 'minpos+alignment', etc

*/

} HeapRepair;
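The comment on `validmask` describes a bitmap in which bit k stands for byte index `minpos + k * alignment`. A minimal sketch of that mapping follows; `toy_repair`, `mark_valid`, and `is_valid` are hypothetical, and treating the lowest bit of each word as the first bit is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct {
    size_t minpos;              /* minimum block byte-index */
    size_t maxpos;              /* maximum block byte-index */
    size_t alignment;           /* distance between candidate indices */
    unsigned int *validmask;    /* one bit per candidate index */
} toy_repair;

#define WORD_BITS (sizeof(unsigned int) * CHAR_BIT)

/* A candidate lies in range and on an alignment boundary. */
static int is_candidate(const toy_repair *r, size_t pos)
{
    return pos >= r->minpos && pos <= r->maxpos &&
           (pos - r->minpos) % r->alignment == 0;
}

/* Set the bit that belongs to byte index pos. */
static void mark_valid(toy_repair *r, size_t pos)
{
    assert(is_candidate(r, pos));
    size_t k = (pos - r->minpos) / r->alignment;
    r->validmask[k / WORD_BITS] |= 1u << (k % WORD_BITS);
}

/* Return 1 if pos is a candidate index whose bit is set, else 0. */
static int is_valid(const toy_repair *r, size_t pos)
{
    if (!is_candidate(r, pos))
        return 0;
    size_t k = (pos - r->minpos) / r->alignment;
    return (int) ((r->validmask[k / WORD_BITS] >> (k % WORD_BITS)) & 1u);
}
```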

gdk_export void HEAP_initialize(

Heap *heap, /* heap -- the heap to initialize. */

size_t nbytes, /* nbytes -- initial size of the heap. */

size_t nprivate, /* nprivate -- size of private space. */

int alignment /* alignment -- restriction for allocated chunks. */

);

typedef struct {

BAT *cache[2]; /* if loaded: BAT* handle + reverse */

str logical[2]; /* logical name + reverse */

str bak[2]; /* logical name + reverse backups */

bat next[2]; /* next BBP slot in linked list */

BATstore *desc; /* the BAT descriptor */

str physical; /* dir + basename for storage */

str options; /* A string list of options */

int refs; /* in-memory references on which the loaded status of a BAT relies */

int lrefs; /* logical references on which the existence of a BAT relies */

int lastused; /* BBP LRU stamp */

volatile int status; /* status mask used for spin locking */

/* MT_Id pid; non-zero thread-id if this BAT is private */

} BBPrec;

typedef struct {

/* simple attributes */

char name[IDLENGTH];

int storage; /* stored as another type? */

short linear; /* atom can be ordered linearly */

short size; /* fixed size of atom */

short align; /* alignment condition for values */

short deleting; /* set if unloading */

int varsized; /* variable-size or fixed-sized */

/* automatically generated fields */

ptr atomNull; /* global nil value */

/* generic (fixed + varsized atom) ADT functions */

int (*atomFromStr) (const char *s, int *len, ptr *dst);

int (*atomToStr) (str *s, int *len, const void *src);

void *(*atomRead) (ptr a, stream *s, size_t cnt);

int (*atomWrite) (const void *a, stream *s, size_t cnt);

int (*atomCmp) (const void *v1, const void *v2);

BUN (*atomHash) (const void *v);

/* optional functions */

void (*atomConvert) (ptr v, int direction);

int (*atomFix) (const void *atom);

int (*atomUnfix) (const void *atom);

/* varsized atom-only ADT functions */

var_t (*atomPut) (Heap *, var_t *off, const void *src);

void (*atomDel) (Heap *, var_t *atom);

int (*atomLen) (const void *atom);

void (*atomHeap) (Heap *, size_t);

/* optional functions */

void (*atomHeapConvert) (Heap *, int direction);

int (*atomHeapCheck) (Heap *, HeapRepair *);

} atomDesc;
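`atomDesc` is a table of function pointers, which is how generic kernel code works on any atom type. A reduced sketch with only a size and a comparison function shows the pattern; `toy_atom`, `toy_atoms`, and `atom_find` are hypothetical and much simpler than the real descriptor.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;                           /* type name */
    size_t size;                                /* fixed size of one value */
    int (*cmp)(const void *a, const void *b);   /* <0, 0, >0 like atomCmp */
} toy_atom;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

static int cmp_dbl(const void *a, const void *b)
{
    double x = *(const double *) a, y = *(const double *) b;
    return (x > y) - (x < y);
}

/* A tiny type table; a user-defined type would add one more row. */
static const toy_atom toy_atoms[] = {
    { "int", sizeof(int), cmp_int },
    { "dbl", sizeof(double), cmp_dbl },
};

/* Type-agnostic linear search: returns the position of key or -1. */
static ptrdiff_t atom_find(const toy_atom *t, const void *col, size_t n,
                           const void *key)
{
    const char *p = col;

    assert(t != NULL && t->cmp != NULL);
    for (size_t i = 0; i < n; i++)
        if (t->cmp(p + i * t->size, key) == 0)
            return (ptrdiff_t) i;
    return -1;
}
```

`atom_find` never names a concrete type; everything type-specific is reached through the descriptor, which is what lets new atoms plug into existing algorithms.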