希尔排序 时间复杂度 证明
2016-04-24 10:52
543 查看
|
implement. However, its complexity analysis is a little more sophisticated.
Idea
The idea of Shellsort is the following:arrange the data sequence in a two-dimensional array
sort the columns of the array
The effect is that the data sequence is partially sorted. The process above is repeated, but each time with a narrower array, i.e. with a smaller number of columns. In the last step, the array consists of only one column. In each step, the sortedness of the
sequence is increased, until in the last step it is completely sorted. However, the number of sorting operations necessary in each step is limited, due to the presortedness of the sequence obtained in the preceding steps.
Example: Let 3 7 9 0 5 1 6 8 4 2 0 6 1 5 7 3 4 9 8 2 be the data sequence to be sorted. First, it is arranged in an array with
7 columns (left), then the columns are sorted (right):
|
|
|
|
Implementation
Actually, the data sequence is not arranged in a two-dimensional array, but held in a one-dimensional array that is indexed appropriately. For instance, data elements at positions 0, 5, 10, 15 etc. would form the first column of an array with 5 columns. The"columns" obtained by indexing in this way are sorted with insertion sort, since this method has a good performance with presorted sequences.
The following program sorts an array a from index position 0 through n-1. The number of columns
used for arranging data in each step is in array cols. Thus, data are arranged in 1391376 columns in the first step and in one column in the last step. (Note that essentially
nothing is done if the number of columns h is larger than the number of data elements n.) Each
column is sorted byinsertion sort. First, data of the second row (beginning at i = h)
are sorted to the correct position in their column, then data of the third row (when i reaches value 2h)
etc.
Algorithm Shellsort
void shellsort (int[] a, int n) { int i, j, k, h, v; int[] cols = {1391376, 463792, 198768, 86961, 33936, 13776, 4592, 1968, 861, 336, 112, 48, 21, 7, 3, 1} for (k=0; k<16; k++) { h=cols[k]; for (i=h; i<n; i++) { v=a[i]; j=i; while (j>=h && a[j-h]>v) { a[j]=a[j-h]; j=j-h; } a[j]=v; } } } |
Analysis
The correctness of the algorithm follows from the fact that in the last step (with h = 1) an ordinary insertion sort is performed on the whole array. But since data are presortedby the preceding steps (h = 3, 7, 21, ...) only few insertion sort steps are sufficient. How many exactly will be the subject of the following analysis. The above sequence of h's
(denoted as h-sequence in the following) is just one of several possible; actually, the performance of Shellsort depends on which h-sequence
is used.
Basics
The postage for a letter is 16F, for a postcard it is 11F. But there are only stamps of 3F and 7F available. Is it possible to stamp the letter and the postcard exactly?
Obviously, the problem is to represent the numbers 16 and 11 as linear combinations k·3 + l·7 with
nonnegative integer coefficientsk and l. Which natural numbers can be combined from multiples
of 3 and 7 and which cannot?
Definition: Let g, h
.
We call a number f a combination of g and h,
if f can be represented as a linear combination f = kg + lhwith
coefficients k, l
0.
The letter can be stamped exactly, since 16 is a combination of 3 and 7, namely 16 = 3·3 + 1·7.
Definition: Let g, h
be
relatively prime. With N(g, h)
we denote the (finite) set of all natural numbers that are not combinations of g and h and with
γ(g, h) the largest of these numbers:
N(g, h) = { f
| ¬
k, l
0 : f = kg + lh }
γ(g, h) = max;(N(g, h))
Example: Let g = 3, h = 7.
Then
N(g, h) = {1,
2, 4, 5, 8, 11}, and γ(g, h) = 11
The postcard cannot be stamped exactly, since 11 is not a combination of 3 and 7.
Proposition: Let g, h
be
relatively prime. Then
γ(g, h) = (g-1)·(h-1) – 1
i.e. every number f with f
(g-1)·(h-1)
is a combination of g and h.
Proof: (Exercise)
h-sortedness
Definition: Let h
0.
A sequence a1, ..., an is
called h-sorted, if for all i
{1, ..., n-h}
it holds that
ai
ai+h
A h-sorted sequence is obtained by arranging the sequence in an array with h columns and then sorting
the columns. A 1-sorted sequence is sorted.
Proposition: Let g, h
.
A g-sorted sequence remains g-sorted after h-sorting
it.
Proof: see [Knu
73]
Definition: A sequence that is g-sorted and h-sorted
is called g,h-sorted.
Proposition: A g,h-sorted
sequence is g+h-sorted.
Proof: We have to show that for all i
{1, ..., n-(g+h)}
it holds that
ai
ai+g+h
But this is the case because ai
ai+g,
since the sequence is g-sorted, and ai+g
ai+g+h,
since the sequence is h-sorted.
As an immediate consequence, we have the following
Proposition: If a sequence is g,h-sorted,
then it is kg+lh-sorted for all k, l
0,
i.e. the sequence is f-sorted for each f that is a combination of g and h.
Proposition: Let a be a g,h-sorted
sequence, where g and h are relatively prime. Then for all elements ai and aj the
following holds:
j – i
(g-1)·(h-1)
ai
aj
i.e. to the right of each ai only the next
(g-1)·(h-1) – 1 elements can be smaller.
Proof: Let j – i = f
(g-1)·(h-1),
then f is a combination of g and h.
Therefore the sequence is f-sorted, which means that
ai
ai+f = aj
Proposition: Let a be a g,h-sorted
sequence, where g and h are relatively prime, and let d be
a variable. If both g and h are inO(d),
then O(n·d) sorting
steps are sufficient for d-sorting the sequence.
Proof: To the right of each element ai at
most (g-1)·(h-1) – 1 elements are smaller. d-sorting
the sequence means arranging it as a two-dimensional array with d columns and sorting the columns.
In the column under ai only every d-th
of these smaller elements can occur. This means that ai has to be exchanged with at
most ((g-1)·(h-1) – 1) / d elements
or, since g and h are in O(d),
with O(d) elements.
Since this holds for all ai (i = 1,
..., n), there are O(n·d)
sorting steps needed for d-sorting the sequence.
From this proposition upper bounds for the complexity of Shellsort can be derived.
Upper bounds
Theorem: With the h-sequence 1, 3, 7, 15, 31, 63,
127, ..., 2k – 1, ... Shellsort needs O(n·
n)
steps for sorting a sequence of length n (Papernov/Stasevic [PS
65]).
Proof: Let ht be
the h closest to
n.
We analyze the behavior of Shellsort separately for the elements hk with k
t and
with k > t.
Let k
t.
Since hk = 2k – 1
we have the conditions mentioned above that hk+1 and hk+2 are
relatively prime and in O(hk).
Therefore, O(n·hk)
sorting steps suffice for hk-sorting the data sequence. Since the hk form
a geometric series, the sum of allhk with k = 1, ..., t is
in O(ht) = O(
n).
Thus O(n·
n)
sorting steps are needed for this part where k
t.
Now let k > t. When the sequence is arranged as an
array with hk columns there are n/hk elements
in each column. Thus,O((n/hk)2)
sorting steps are needed to sort each column, since insertion sort has quadratic complexity. There are hk columns,
therefore the number of sorting steps for hk-sorting the entire data sequence is in O((n/hk)2·hk) = O(n·n/hk).
Again, the n/hkform
a geometric series whose sum is in O(n/ht) = O(
n).
Therefore, again O(n·
n)
steps are needed for this part wherek > t.
It can be shown that for this h-sequence the upper bound is tight. But there is another h-sequence
that leads to a more efficient behavior of Shellsort.
Theorem: With the h-sequence 1, 2, 3, 4, 6, 8, 9,
12, 16, ..., 2p3q, ... Shellsort needs O(n·log(n)2)
steps for sorting a sequence of length n (Pratt [Pra 79]).
Proof: If g = 2 and h = 3,
then γ(g, h) = (g-1)·(h-1) – 1 = 1,
i.e. in a 2,3-sorted sequence to the right of each element only the next element can be smaller. Therefore, O(n)
sorting steps suffice to sort the sequence with insertion sort. Considering elements with odd and with even index separately, it becomes clear that again O(n)
sorting steps suffice to make a 4,6-sorted sequence 2-sorted. Similarly, O(n) sorting steps suffice
to make a 6,9-sorted sequence 3-sorted and so on.
The above h-sequence has the property that for each hk also
2hk and 3hk occurs,
so O(n) sorting steps suffice for each hk.
Altogether there are log(n)2 elements in the h-sequence; thus the complexity of Shellsort
with this h-sequence is inO(n·log(n)2).
The h-sequence of Pratt performs best asymptotically, but it consists of log(n)2 elements.
Particularly, if the data sequence is presorted, a h-sequence with less elements is better, since the data sequence has to be scanned (by the for-i-loop in the program) for each hk,
even if only a few sorting steps are performed.
By combining the arguments of these two theorems h-sequences with O(log(n))
elements can be derived that lead to a very good performance in practice, as for instance the h-sequence of the program (Sedgewick [Sed
96]). But unfortunately, there seems to be no h-sequence that gives Shellsort a worst case performance of O(n·log(n))
(see [Sed 96]).
It is an open question whether possibly the average complexity is in O(n·log(n)).
Sorting network
Shellsort can be implemented as a sorting network if the data-dependent insertionsort is replaced with bubble sort. With the h-sequence
2p3q the sorting network consists of O(n·log(n)2)
comparators. This is the same number of comparators as in bitonic sort.
The following figure shows the corresponding sorting network for n = 8.
Figure 1: Shellsort network for n=8 | |
Simulation
(Java applet for simulation of shellsort)
References
Shellsort was originally published by D.L. Shell [She 59].[Knu 73] | D.E. Knuth: The Art of Computer Programming, Vol. 3 - Sorting and Searching. Addison-Wesley (1973) | |
[Pra 79] | V. Pratt: Shellsort and Sorting Networks. Garland, New York (1979) | |
[PS 65] | A. Papernov, G. Stasevic: A Method of Information Sorting in Computer Memories. Problems of Information Transmission 1, 63-75 (1965) | |
[Sed 88] | R. Sedgewick: Algorithms. 2nd edition, Addison-Wesley (1988) | |
[Sed 96] | R. Sedgewick: Analysis of Shellsort and Related Algorithms. In: Josep Díaz, Maria Serna (Eds.): Algorithms - ESA '96, Fourth Annual European Symposium, Barcelona, Lecture Notes in Computer Science, Vol. 1136, Springer, 1-11 (1996) | |
[She 59] | D.L. Shell: A High-Speed Sorting Procedure. Communications of the ACM, 2, 7, 30-32 (1959) |
Next: |
H.W. Lang Fachhochschule Flensburg lang@fh-flensburg.de Impressum © Created: 29.01.1998 Updated: 15.02.2016 如果我有时间,就翻译一下吧 |
相关文章推荐
- Objective-C 内存管理
- 如何支持多种设备-语言篇
- SQL第一节课
- Java中处理异常中return关键字
- mysqlbinlog的使用
- 栈的应用实践——括号匹配的检验
- Ecshop PC 商品列表页面下拉加载商品
- 对象的引用及传递
- 启动图片
- QML视图元素  Visual Elements In QML Qt 5.6.0 Reference Documentation
- Java枚举
- Matlab SVM
- 第八周进度条
- C#利用POST实现杭电oj的AC自动机器人,AC率高达50%~~
- 加速Android Studio/Gradle构建
- JVM基础(三) JVM内存管理
- 帧率、码流与分辨率相关知识
- Quartus II 中调用ModelSim-Altera 报错问题
- [后端人员耍前端系列]KnockoutJs篇:使用WebApi+Bootstrap+KnockoutJs打造单页面程序
- 爬取博客数据