C语言中基本的输入输出函数
Oct122007
C语言中基本的输入输出函数有:
putchar():把变量中的一个字符常量输出到显示器屏幕上;
getchar():从键盘上输入一个字符常量,此常量就是该函数的值;
printf():把程序中的各类数据,加以格式控制输出到显示器屏幕上;
scanf():从键盘上输入各类数据,并存放到程序变量中;
puts():把数组变量中的一个字符串常量输出到显示器屏幕上;
gets():从键盘上输入一个字符串常量并放到程序的数组中;
sscanf():从一个字符串中按格式提取各类数据。
putchar() 和 getchar() 顾名思义,分别是向输出流输出一个字符和从输入流读入一个字符,比较简单,不再多讲。
例子如下:
char c = getchar();
putchar(c);
格式化输入输出scanf()和printf()是最有用的,所以重点讲一下。
printf():
一般形式:
printf("格式控制", 输出列表);
eg : printf("a=%d,b=%f,c=%c\n",a,b,c);
1. 格式控制
格式控制是用双引号括起来的字符串,也称"转换控制字符串",它包含以下两部分信息.
格式说明:由"%"和格式字符组成,如%d、%f、%c,它的作用是把输出数据转换为指定格式输出。格式说明总是由"%"字符开始的。
普通字符:需要原样输出的字符,或者是一些有特殊含义的字符,如\n,\t。
2. 输出列表
就是需要输出的一些数据,也可以是表达式,如果在函数中需要输出多个变量或表达式,则要用逗号隔开.
一些特殊字符的输出:
单引号、双引号和反斜杠的输出要在前面加转义字符"\"
如:"\'"、"\""、"\\"
%的输出用两个连在一起的%,即printf("%%");
常用的格式说明如下:
格式字符
d 以十进制形式输出带符号整数(正数不输出符号)
o 以八进制形式输出无符号整数(不输出前缀0)
x 以十六进制形式输出无符号整数(不输出前缀0x)
u 以十进制形式输出无符号整数
f 以小数形式输出单精度实数
lf 以小数形式输出双精度实数
e 以指数形式输出单、双精度实数
g 以%f%e中较短的输出宽度输出单、双精度实数
c 输出单个字符
s 输出字符串
这里强调一下:网上很多文章都说f和lf是一样的,即不管单精度、双精度浮点数都可以用f。但我在POJ上做过测试:输出double时用f确实也可以,但读入时用f就报WA,所以大家如果对double进行读写的话,都用lf吧。
说到double,再啰嗦一句:建议大家要用到浮点数时都用double,不要用float,因为在很多情况下float精度不够会导致WA。
特殊:
对64位整数的输入输出,在POJ上的C++环境下(即VC),64位整数是:
__int64 (注意int前面是两个下划线)
输入输出格式为"%I64d"。
在G++环境下(即Dev C++) 64位整数是
long long
输入输出格式为"%lld"。
输出宽度
用十进制整数来表示输出的最少位数。注意若实际位数多于定义的宽度,则按实际位数输出;若实际位数少于定义的宽度,则默认补以空格(带0标志时补0)。
精度
精度格式符以“.”开头,后跟十进制整数。意义是:如果输出数字,则表示小数的位数;如果输出的是字符, 则表示输出字符的个数;若实际位数大于所定义的精度数,则截去超过的部分。
标志格式字符
- 结果左对齐,右边填空格
+ 输出符号(正号或负号)
空格 输出值为正时冠以空格,为负时冠以负号
例如:
double c=24212345.24232;
printf("%020.4f", c); 表示输出精确到小数点后4位,输出共占20位,若有空余的位补0。
scanf:
scanf的很多用法都是和printf对应的,故不再赘述。
说一下scanf一个特别好用的地方,就是可以滤去一些不想要的东西。
举例说明如下:
比如输入为日期 yyyy-mm-dd,就可以这样写:
int year, month, day;
scanf("%d-%d-%d", &year, &month, &day);
再比如:
scanf("%3d %*3d %2d", &m, &n); 输入113 118 69回车(系统将113赋予m,将69赋予n;因为*号表示读入后丢弃相应的数据,所以118不赋予任何变量)
puts()用的不多,且基本都能用printf()代替,故不再多说。
gets()是从输入流中获取一行字符串放入字符数组中:
char in[100];
gets(in);
大家可能最容易出错的地方就是字符串的输入,所以强调一下:
能进行字符,字符串输入的有:
getchar()、scanf("%c")、scanf("%s")、gets()
其中getchar()和scanf("%c")的功能是一样的。
需要注意的是,这两个函数读入的是输入流中当前位置的字符,
比如:
scanf("%d", &n);
c = getchar();
假设输入 67/ (假设"/"代表回车),则scanf读入整数67后,当前输入流的位置在67之后,即指向回车符,所以接下来的getchar()读入的就是一个回车符,即 c = '\n'。
同样,gets()也是从当前位置读入一行字符串。
比如:
scanf("%d", &n);
gets(str);
此时gets读入的将是一个空字符串:它从当前位置读到行末的回车符就停止,并把这个回车符丢弃。
所以通常在用scanf读入一个非字符串的类型之后,如果要读入字符,或字符数组,都用一个额外的getchar()把回车符读掉,若后面跟的不止一个回车符,可能还有多余的空格的话,就用gets()读掉。
和以上不同的是,scanf("%s")读入时会忽略开头的空格、回车和制表符,并以空格、回车或制表符作为字符串结束的标志。
经常会有这样的题,输入第一行是一个整数,接下来每行的第一个是一个字符,用来表示某种操作,后面再跟一些数据,比如:
4
A 100 2
B 23
A 23 89
B 34
像这种输入就需要小心,读入字符时不要读成回车符。
为了防止意外,我一般是这样处理这类输入的:
char model[2];
scanf("%d", &n);
for (…; …; …) {
    scanf("%s", model);
    if (model[0] == 'A') {
    }
    else {
    }
}
sscanf():
sscanf()经常用来分解字符串,功能非常强大,但很多用法要用到类似正则表达式的字符集匹配语法(如%[...]),所以这里只介绍最简单的几种用法,大家如果想了解更多的话,自己去网上找吧。
1.
char str[100],str1[100],str2[100];
gets(str);
sscanf(str, "%s%s", str1, str2);
将读入的一整行字符串按空格,制表符或回车符分割成两个字符串。
2
取指定长度的字符串。如在下例中,取最大长度为4字节的字符串。
sscanf("123456 ", "%4s", str);
对于C++的输入输出就不再详细的讲了,因为cin,cout的速度实在太慢,不推荐使用,我一般都是到万不得已时才用。
比如当你要读入字符串到string 对象中时,就只能用cin了,这时候还有一个常见的问题,就是如何将一整行字符串读入一个string 中,这就要用到getline函数了。
用法为:
getline(cin, str);
第一个参数就是标准输入流cin ,第二个参数是接收读入数据的string对象,本来还有第三个参数,是结束符的标志,但通常用它默认的就可以了,所以不用管。
注意区分这个getline和cin.getline的区别:
cin.getline的用法如下:
char str[20];
cin.getline(str,20); 表示从读入的一行字符串中,最多取19个字符(加上结尾的'\0'共占20个)放入字符数组str中。注意此处的str是字符数组,而上面的str是string对象。
另外需要注意的是,最好不要把cout和printf混用:两者各自维护缓冲区,混用时输出的先后顺序可能被打乱(必要时可以用ios::sync_with_stdio保持同步)。
红黑树(red-black tree)算法,附AVL树的比较
Sep282007
原文网址:http://sage.mc.yu.edu/kbeen/teaching/algorithms/resources/red-black-tree.html
红黑树的实地使用
http://www.linuxforum.net/forum/showthreaded.php?Cat=&Board=program&Number=556347&page=0&view=collapsed&sb=5&o=31&fpart=&vc=
Splay树的介绍
http://www.linuxforum.net/forum/showflat.php?Cat=&Board=linuxK&Number=609842&page=&view=&sb=&o=&vc=1
红黑树的定义
正如在CLRS中定义的那样(译者: CLRS指的是一本著名的算法书Introduction to Algorithms,中文名应该叫算法导论,CLRS是该书作者Cormen, Leiserson, Rivest and Stein的首字母缩写),一棵红黑树是指一棵满足下述性质的二叉搜索树(BST, binary search tree):
1. 每个结点或者为黑色或者为红色。
2. 根结点为黑色。
3. 每个叶结点(实际上就是NULL指针)都是黑色的。
4. 如果一个结点是红色的,那么它的两个子节点都是黑色的(也就是说,不能有两个相邻的红色结点)。
5. 对于每个结点,从该结点到其所有子孙叶结点的路径中所包含的黑色结点数量必须相同。
数据项只能存储在内部结点(internal node)中。我们所指的"叶结点"在其父结点中可能仅仅用一个NULL指针表示,但是将它也看作一个实际的结点有助于描述红黑树的插入与删除算法。叶结点一律为黑色。
定理:一棵拥有n个内部结点的红黑树的树高h<=2log(n+1)
(译者:我认为原文中的有关上述定理的证明是错误的,下面的证明方法是参考CLRS中的证明写出的。)
证明:首先定义一棵红黑树的黑高度Bh为:从这棵红黑树的根结点(但不包括这个根结点)到叶结点的路径上包含的黑色结点(注意,包括叶结点)数量。另外规定叶结点的黑高度为0。
下面我们首先证明一棵有n个内部结点的红黑树满足n>=2^Bh-1。这可以用数学归纳法证明,施归纳于树高h。当h=0时,这相当于是一个叶结点,黑高度Bh为0,而内部结点数量n为0,此时0>=2^0-1成立。假设树高h<=t时,n>=2^Bh-1成立,我们记一棵树高为t+1的红黑树的根结点的左子树的内部结点数量为nl,右子树的内部结点数量为nr,记这两棵子树的黑高度为Bh'(注意这两棵子树的黑高度必然一样),显然这两棵子树的树高<=t,于是有nl>=2^Bh'-1以及nr>=2^Bh'-1,将这两个不等式相加有nl+nr>=2^(Bh'+1)-2,将该不等式左右加1,得到n>=2^(Bh'+1)-1,很显然Bh'+1>=Bh,于是前面的不等式可以变为n>=2^Bh-1,这样就证明了一棵有n个内部结点的红黑树满足n>=2^Bh-1。
下面我们完成剩余部分的证明,记红黑树树高为h。我们先证明Bh>=h/2。在任何一条从根结点到叶结点的路径上(不包括根结点,但包括叶结点),假设其结点数量为m,注意其包含的黑色结点数量即为Bh。当m为偶数时,根据性质4可以看出每一对相邻的结点中至多有一个红色结点,所以有Bh>=m/2;而当m为奇数时,这条路径上除去叶结点后有偶数个结点,于是这些结点中的黑色结点数B'满足B'>=(m-1)/2,将该不等式前后加1得出Bh>=(m+1)/2,可以进一步得出Bh>m/2,综合m为偶数的情况可以得出Bh>=m/2,而m在最大的情况下等于树高h,因此可以证明Bh>=h/2。将Bh>=h/2代入n>=2^Bh-1,最终得到h<=2log(n+1)。证明完毕。
本文余下的内容将阐释如何在不破坏红黑树性质的前提下进行结点的插入与删除,以及为什么插入与删除的处理次数与树高是成比例的,或者说是O(log n)。
Okasaki插入方法
首先用与二叉搜索树一样的方法将一个结点插入到红黑树中,并且颜色为红色。(这个新结点的子结点将是叶结点,根据定义,这些叶结点是黑色的。)此时,我们将或者破坏了性质2(根结点为黑色)或者破坏了性质4(不能有两个相邻的红色结点)。
如果新插入的结点是根结点的话(这意味着在插入前该红黑树是空的),我们仅仅将这个结点的颜色改为黑色,插入操作就完成了。
如果性质4遭到了破坏,这一定是由于新插入结点的父结点也是红色造成的。由于红黑树的根结点必须是黑色的,因此新插入的结点一定会存在一个祖父结点,并且根据性质4这个祖父结点必然是黑色的。此时,由新插入结点的祖父结点为根的子树的结构一共有四种可能性(译者,前面这句话我没有看明白原文,我是用我的理解写出来的,如果有误请指正。),如下面的图解所示。在Okasaki插入方法中,每一种可能出现的子树都被转换为图解正中间的那种子树形式。

(A,B,C与D表示任意的子树。我们曾经说过新插入结点的子结点一定是叶结点,但很快我们就会看到上面的图解适用于更普遍的情况)。
首先,请注意在变换的过程中,元素的顺序<A x B y C z D>保持不变。
另外,注意该变换不会改变从这颗子树的父结点到这颗子树中任何一个叶结点的路径中黑色结点的数量(当然前提是这颗子树有父结点)。我们再一次遇到了这样的情形:即该红黑树只有可能违反性质2(如果y是根结点)或性质4(如果y的父结点是红色的),但这次变换带来了一个好处,即我们现在距离红黑树的根结点靠近了两步。我们可以重复这种操作直到:或者y的父结点为黑色,在这种情况下插入操作完成;或者y成为根结点,在此情况下我们将y染为黑色后插入操作完成。(将根结点染为黑色会对每条从根结点到叶结点的路径增加相同数量的黑色结点,因此如果在染色操作之前性质5没有遭到破坏那么操作之后也不会。)
上述步骤保持了红黑树的性质,并且所花的时间与树高是成比例的,也即O(log n)。
旋转
在红黑树中进行结构调整的操作常常可以用更清晰的术语"旋转"操作来表达,图解如下。

很显然,在旋转操作中,顺序<A x B y C>保持不变。因此,如果一开始是一棵二叉搜索树,并且只用旋转来调整结构,那么得到的仍然是一棵二叉搜索树。
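旋转操作可以用下面的C代码示意(结点结构是假设的,省略了颜色与父指针的维护):

```c
/* 假设的最简结点结构,仅用于演示旋转 */
struct node {
    int key;
    struct node *left, *right;
};

/* 左旋:x的右孩子y上升为子树根,y原来的左子树B挂到x的右边 */
/* 对应图中的 <A x B y C>:旋转前后中序遍历的顺序不变 */
struct node *rotate_left(struct node *x) {
    struct node *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;   /* y成为这棵子树新的根 */
}

/* 右旋是它的镜像操作 */
struct node *rotate_right(struct node *y) {
    struct node *x = y->left;
    y->left = x->right;
    x->right = y;
    return x;
}
```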
在下面的图解中,Okasaki插入方法中的变换操作被表示为一个或者两个旋转操作。

CLRS插入方法
CLRS中给出了一种比Okasaki插入方法更复杂但效率稍高的插入方法。它的时间复杂度仍然是O(log n),但在大O中的常数要更小一些。
CLRS插入方法与Okasaki插入方法一样都是从标准的二叉搜索树插入操作开始的,并且将这个新插入的结点染为红色,它们的区别在于如何处理遭到破坏的性质4(不能存在两个相邻的红色结点)。我们要根据下端红色结点的叔叔结点的颜色区分两种情况。(下端红色结点是指在一对红色父结点/红色子结点中的那个子结点。)让我们先考虑叔叔结点为黑色的情况。根据每个红色结点是其父结点的左子结点还是右子结点,这种情况可以分为四种子情况。下面的图解展示了如何调整红黑树以及如何重新染色。

在这里我们感兴趣的是上面图解中的方法与Okasaki方法的比较。它们有两点不同。第一点是关于如何对最终的子树(图解中间的那个子树)进行染色的。在Okasaki方法中,这颗子树的根结点y被染成红色而它的子结点被染成黑色,然而在CLRS方法中y被染成了黑色而它的子结点被染成了红色。将y染成黑色意味着红黑树性质4(不能存在两个相邻的红色结点)不会在y这一点遭到破坏,因此对树的调整不需要向根结点的方向继续进行下去。在此情况下,CLRS插入方法最多需要进行两次旋转操作即可完成插入。
第二点不同是在这种情况下CLRS方法必须满足一个先决条件,即下端红色结点的叔叔结点必须是黑色的。在上面的图解中我们可以很清楚地看出,如果那个叔叔结点(即子树A或者D的根结点)是红色的,那么最终的树中将存在两个相邻的红色结点,因此这种方法不能适用于叔叔结点为红色的情况。
下面我们考虑下端红色结点的叔叔结点为红色的情况。在这种情况下我们将上端红色结点和它的兄弟结点(即下端红色结点的叔叔结点)染为黑色并且将它们的父结点染为红色。树的结构并没有进行调整。这时根据下端红色结点是其父结点的左子结点还是右子结点以及上端红色结点是其父结点的左子结点还是右子结点可以分出四种情况,但是这四种情况从本质上来说都是相同的。下面图解只描述了一种情况:

很容易看出,在这种操作的过程中从树的根结点到叶结点的路径中的黑色结点数量没有发生变化。在此操作之后,红黑树的性质只有可能在该子树的根结点同时也是整个树的根结点或者该子树的父结点是红色的情况下才会遭到破坏。换句话说,我们又将开始重复上述操作,但我们距离树的根结点又靠近了两步。照这样不断重复该步骤直到:或者(i)z的父结点为黑色,此时插入操作结束;或者(ii)z成为根结点,我们将它染为黑色之后插入操作结束;或者(iii)我们遇到了下端红色结点的叔叔结点为黑色的情况,这时我们只要做一或两次旋转操作即可完成插入。在最坏的情况下,我们必须对新插入的结点到根结点的路径上的每个结点进行染色操作,此时需要的操作数为O(log n)。
删除
为了从红黑树中删除一个结点,我们将从一颗标准二叉搜索树的删除操作开始(参见CLRS,第12章)。我们回顾一下标准二叉搜索树的删除操作的三种情况:
1. 要删除的结点没有子结点。在这种情况下,我们直接将它删除就可以了。如果这个结点是根结点,那么这颗树将成为空树;否则,将它的父结点中相应的子结点指针赋值为NULL。
2. 要删除的结点有一个子结点。与上面一样,直接将它删除。如果它是根结点,那么它的子结点变为根结点;否则,将它的父结点中相应的子结点指针赋值为被删除结点的子结点的指针。
3. 要删除的结点有两个子结点。在这种情况下,我们先找到这个结点的后继结点(successor),也就是它的右子树中最小的那个结点。然后我们将这两个结点中的数据元素互换,之后删除这个后继结点。由于这个后继结点不可能有左子结点,因此删除该后继结点的操作必然会落入上面两种情况之一。
注意,在树中被删除的结点并不一定是那个最初包含要删除的数据项的那个结点。但出于重建红黑树性质的目的,我们只关心最终被删除的那个结点。我们称这个结点为v,并称它的父结点为p(v)。
v的子结点中至少有一个为叶结点。如果v有一个非叶子结点,那么v在这颗树中的位置将被这个子结点取代;否则,它的位置将被一个叶结点取代。我们用u来表示二叉搜索树删除操作后在树中取代了v的位置的那个结点。如果u是叶结点,那么我们可以确定它是黑色的。
如果v是红色的,那么删除操作就完成了---因为这种删除不会破坏红黑树的任何性质。所以,我们下面假定v是黑色的。删除了v之后,从根结点到v的所有子孙叶结点的路径将会比树中其它的从根结点到叶结点的路径拥有更少的黑色结点,这会破坏红黑树的性质5。另外,如果p(v)与u都是红色的,那么性质4也会遭到破坏。但实际上我们解决性质5遭到破坏的方案在不用作任何额外工作的情况下就可以同时解决性质4遭到破坏的问题,所以从现在开始我们将集中精力考虑性质5的问题。
让我们在头脑中给u打上一个黑色记号(black token)。这个记号表示从根结点到这个带记号结点的所有子孙叶结点的路径上都缺少一个黑色结点(在一开始,这是由于v被删除了)。我们会将这个记号一直朝树的顶部移动直到性质5重新恢复。在下面的图解中用一个黑色的方块表示这个记号。如果带有这个记号的结点是黑色的,那么我们称之为双黑色结点(doubly black node)。
注意这个记号只是一个概念上的东西,在树的数据结构中并不存在物理实现。
我们要区分四种不同的情况。
A. 如果带记号的结点是红色的或者它是树的根结点(或两者皆是),只要将它染为黑色就可以完成删除操作。注意,这样就会恢复红黑树的性质4(不能存在两个相邻的红色结点)。而且,性质5也会被恢复,因为这个记号表示从根结点到该结点的所有子孙叶结点的路径需要增加一个黑色结点以便使这些路径与其它的根结点到叶结点路径所包含的黑色结点数量相同。通过将这个红色结点改变为黑色,我们就在这些缺少一个黑色结点的路径上添加了一个黑色结点。
如果带记号的结点是根结点并且为黑色,那么直接将这个标记丢掉就可以了。在这种情况下,树中每条从根结点到叶结点的路径的黑色结点数量都比删除操作前少了一个,并且依旧保持住了性质5。
在余下的情况里,我们可以假设这个带记号的结点是黑色的,并且不是根结点。
B. 如果这个双黑色结点的兄弟结点以及两个侄子结点都是黑色的,那么我们就将它的兄弟结点染为红色之后将这个记号朝树根的方向移动一步。
下面的图解展示了两种可能出现的子情况。环绕y的虚线表示在此我们并不关心y的颜色,而在A,B,C和D的上面的小圆圈表示这些子树的根结点是黑色的(译者:注意这个双黑色结点必然会有两个非叶结点的侄子结点。这是因为这个双黑色结点的记号表示从根结点到该结点的所有子孙叶结点的路径中的黑色结点数量都比其它的根结点到叶结点路径所包含的黑色结点数量少1,而该双黑色结点本身就是一个黑色结点,因此从它的兄弟结点到其子孙叶结点的路径上的黑色结点数量必然要大于1,我们很容易看出如果其兄弟结点的任何一个子结点为叶结点的话这一点是不可能满足的,因此这个双黑色结点必然会有两个非叶结点的侄子结点)。

将那个兄弟结点染为红色,就会从所有到该结点的子孙叶结点的路径上去掉一个黑色结点,因此现在这些路径上的黑色结点数量与到双黑色结点的子孙叶结点的路径上的黑色结点数量一致了。我们将这个记号向上移动到y,这表明现在所有到y的子孙叶结点的路径上缺少一个黑色结点。此时问题仍然没有得到解决,但我们又向树根推进了一步。
很显然,只有带记号的结点的两个侄子结点都是黑色时才能进行上述操作,这是因为如果有一个侄子结点是红色的那么该操作会导致出现两个相邻的红色结点。
C. 如果带记号的结点的兄弟结点是红色的,那么我们就进行一次旋转操作并改变结点颜色。下面的图解展示了两种可能出现的情况:

注意上面的操作并不会改变从根结点到任何叶结点路径上的黑色结点数量,并且它确保了在操作之后这个双黑色结点的兄弟结点是黑色的,这使得后续的操作或者属于情况B,或者属于情况D。
由于这个记号比起操作前离树的根结点更远了,所以看起来似乎我们向后倒退了。但请注意现在这个双黑色结点的父结点是红色的了,所以如果下一步操作属于情况B,那么这个记号将会向上移动到那个红色结点,然后我们只要将它染为黑色就完成了。此外,下面将会展示,在情况D下,我们总是能够将这个记号消耗掉从而完成删除操作。因此这种表面上的倒退现象实际上意味着删除操作就快要完成了。
D. 最终,我们遇到了双黑色结点有一个黑色兄弟结点并至少一个侄子结点是红色的情况。我们下面给出一个结点x的近侄子结点(near nephew)的定义:如果x是其父结点的左子结点,那么x的兄弟结点的左子结点为x的近侄子结点,否则x的兄弟结点的右子结点为x的近侄子结点;而另一个侄子结点则为x的远侄子结点(far nephew)。(在下面的图解中可以看出,x的近侄子结点要比它的远侄子结点距离x更近。)
现在我们会遇到两种子情况:(i)双黑色结点的远侄子结点是黑色的,在此情况下它的近侄子结点一定是红色的;(ii)远侄子结点是红色的,在此情况下它的近侄子结点可以为任何颜色。如下面的图解所示,子情况(i)可以通过一次旋转和变色转换为子情况(ii),而在子情况(ii)下只要通过一次旋转和变色就可以完成删除操作。根据双黑色结点是其父结点的左子结点还是右子结点,下面图解中的两行显示出两种对称的形式。

在这种情况下我们生成了一个额外的黑色结点,记号被丢掉,删除操作完成。从上面图解中很容易看出,所有到带记号结点的子孙叶结点的路径上的黑色结点数量增加了1,而其它的路径上的黑色结点数量保持不变。很显然,在此刻红黑树的任何性质都没有遭到破坏。
将上面的所有情况综合起来,我们可以看出在最坏的情况下我们必须沿着从叶结点到根结点的路径每次都执行常量次数的操作,因此删除操作的时间复杂度为O(log n)。
不是我不明白,这世界变化快。既然说到了AVL树,就来比较一下:
简介:
AVL树又称高度平衡的二叉搜索树,是1962年由两位苏联数学家G.M.Adelson-Velsky和E.M.Landis提出的。引入这种平衡条件的目的是提高二叉搜索树的搜索效率,减少树的平均搜索长度。为此,每向树中插入一个结点时都要调整树的结构,使之保持平衡,从而降低树的高度,减少树的平均搜索长度。
AVL树的定义:
一棵AVL树满足以下的条件:
1>它的左子树和右子树都是AVL树
2>左子树和右子树的高度差不能超过1
从条件1可以看出这是个递归定义,如GNU一样。
性质:
1>一棵n个结点的AVL树,其高度保持在O(log2(n)),不会超过(3/2)log2(n+1)
2>一棵n个结点的AVL树的平均搜索长度保持在O(log2(n))。
3>一棵n个结点的AVL树删除一个结点并做平衡化旋转所需要的时间为O(log2(n))。
从性质1来看,红黑树是以牺牲严格的高度平衡为代价的:它只要求部分地达到平衡,但同样能以O(log2 n)的时间复杂度进行搜索、插入、删除操作。此外,由于它的设计,任何不平衡都会在三次旋转之内解决。当然,还有一些更好的、但实现起来更复杂的数据结构能够做到一步旋转之内达到平衡,但红黑树能够给我们一个比较"便宜"的解决方案。红黑树的算法时间复杂度和AVL相同,但统计性能比AVL树更高。
看看人家怎么评价的:
AVL trees are actually easier to implement than RB trees because there are fewer cases. And AVL trees require O(1) rotations on an insertion, whereas red-black trees require O(lg n).
In practice, the speed of AVL trees versus red-black trees will depend on the data that you're inserting. If your data is well distributed, so that an unbalanced binary tree would generally be acceptable (i.e. roughly in random order), but you want to handle bad cases anyway, then red-black trees will be faster because they do less unnecessary rebalancing of already acceptable data. On the other hand, if a pathological insertion order (e.g. increasing order of key) is common, then AVL trees will be faster, because the stricter balancing rule will reduce the tree's height.
Splay trees might be even faster than either RB or AVL trees, depending on your data access distribution. And if you can use a hash instead of a tree, then that'll be fastest of all.
试着翻译一下:
由于需要处理的情况较少,AVL树实际上比红黑树更容易实现。而且AVL树插入时只需要O(1)次旋转,红黑树则需要O(lg n)次。
实际上,AVL树和红黑树的速度取决于你插入的数据。如果数据分布较好(大致相当于随机顺序),以致不平衡的二叉树本来也可以接受,只是你仍想防备坏情况,那么红黑树会更快,因为它对本来已经可以接受的数据做的多余再平衡更少。另一方面,如果病态的插入顺序(例如按键值递增插入)很常见,那么AVL树会更快,因为更严格的平衡规则会降低树的高度。
Splay树可能比红黑树和AVL树都快,这取决于你的数据访问分布。而如果可以用哈希表代替树,那将是最快的。
Splay树是什么树,我不是很清楚,我没有详细的查阅.
感受一下带来的变革
//-->
/*
* 翻一下老皇历(2.4)
*/
struct vm_area_struct* find_vma(struct mm_struct* mm,unsigned long addr)
{
struct vm_area_struct* vma = NULL;
if(mm)
{
/*
* check the cache first.
*/
/*
* (Check hit rate is typically around 35%.)
*/
/*
* 首先查找一下最近一次访问的虚拟地址空间是否在CACHE中
*/
vma = mm->mmap_cache;
if(!(vma && vma->vm_end > addr && vma->vm_start <= addr))
{
/*
* miss hit 未命中,继续查找线性表或者是AVL树
*/
if(!mm->mmap_avl)
{
/*
* go though the liner list
*/
vma = mm->mmap;
while(vma && vma->vm_end <= addr)
{
vma = vma->vm_next;
}
}
else
{
/*
* Then go though the AVL tree quickly
*/
struct vm_area_struct* tree = mm->mmap_avl;
vma = NULL;
for(;;)
{
if(tree == vm_avl_empty)
{
/*
* 结点为空,失败
*/
break;
}
if(tree->vm_end > addr)
{
vma = tree;
if(tree->vm_start <= addr)
{
/*
* 找到,快速退出循环
*/
break;
}
tree = tree->vm_avl_left;
}
else
{
tree = tree->vm_avl_right;
}
}
}
if(vma)
{
/*
* 查找成功,记在CACHE中
*/
mm->mmap_cache = vma;
}
}
}
return vma;
}
//<--
/*
* 再贴新年历(2.6)
*/
//-->
struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr)
{
struct vm_area_struct *vma = NULL;
if (mm) {
/* Check the cache first. */
/* (Cache hit rate is typically around 35%.) */
/*
* 首先查找一下最近一次访问的虚拟地址空间是否在CACHE中
*/
vma = mm->mmap_cache;
if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
struct rb_node * rb_node;
/*
* miss hit 未命中,直接查找red-black tree
*/
rb_node = mm->mm_rb.rb_node;
vma = NULL;
while (rb_node) {
struct vm_area_struct * vma_tmp;
vma_tmp = rb_entry(rb_node,
struct vm_area_struct, vm_rb);
if (vma_tmp->vm_end > addr) {
vma = vma_tmp;
if (vma_tmp->vm_start <= addr)
break;
rb_node = rb_node->rb_left;
} else
rb_node = rb_node->rb_right;
}
/*
* 查找成功,记在CACHE中
*/
if (vma)
mm->mmap_cache = vma;
}
}
return vma;
}
//<--
在这儿只是做了一些小的方面的比较和在内核中的真正的应用,很多的地方没有分析到,还望各位同仁多多指正和拓展。
Red-Black Trees
Contents:
Red-Black Tree Definition
Okasaki's Insertion Method
Rotations
The CLRS Insertion Method
Deletion
Exercise
The information on this page is drawn in part from Introduction to Algorithms, second edition, by Cormen, Leiserson, Rivest and Stein (CLRS); Purely Functional Data Structures, by Chris Okasaki; Franklyn Turbak's lecture notes; Chee Yap's lecture notes, which were on the web, but the link is now broken; and comments from the students in my Algorithms class.
Red-Black Tree Definition
As defined in CLRS, a red-black tree is a binary search tree (BST) that satisfies the following properties:
- Every node is either red or black.
- The root is black.
- Every leaf (NIL) is black.
- If a node is red, then both its children are black. (Or in other words, two red nodes may not be adjacent.)
- For each node, all paths from the node to descendant leaves contain the same number of black nodes.
Theorem: The height h of a red-black tree with n internal nodes is no greater than 2log(n+1).
Proof: By property 5, every root-to-leaf path in the tree has the same number of black nodes; let this number be B. So there are no leaves in this tree at depth less than B, which means the tree has at least as many internal nodes as a complete binary tree of height B. Therefore, n≥2^B-1. This implies B≤log(n+1). By property 4, at most every other node on a root-to-leaf path is red. Therefore, h≤2B. Putting these together, we have h≤2log(n+1). Q.E.D.
The remainder of this page shows how insertion and deletion can be done without violating the above properties, and in time proportional to the height of the tree, or O(log n). Parenthetical remarks relate this description to that in CLRS. If you don't have CLRS you can ignore these.
Okasaki's Insertion Method
A node is inserted the same way as it would for a BST, and colored red. (The children of the new node will be leaves, which are by definition black.) At that point either property 2 (root is black) or property 4 (no two adjacent red nodes) might be violated.
If the inserted node is the root (meaning the tree was previously empty) then we simply change its color to black and we're done.
If property 4 is violated, it must be because the new node's parent is also red. Since the root must be black, the new node must also have a grandparent in the tree, and by property 4 it must be black. There are four possibilities for how this subtree can be structured, vis a vis the new node, its parent and its grandparent, as shown in the diagram below. In Okasaki's method, each possible subtree is transformed into the subtree shown in the middle of the diagram.
(A, B, C and D refer to arbitrary subtrees. We have said that the children of the new node must both be leaves, but we will see in a moment that the diagram applies more generally).
First, notice that the ordering <AxByCzD> is preserved by these transformations.
Second, notice that these transformations do not change the number of black nodes on any path from the parent of this subtree (if one exists) to the leaves. We are again in the position where the only properties that can be violated are property 2 (if y is the root) and property 4 (if y's parent is red). But the advantage is that we are now two steps closer to the root. We can repeat this operation until either y's parent is black, in which case we're done, or y is the root, in which case we color y black and we're done. (Coloring the root black increases the number of black nodes on every path from the root to the leaves by the same amount, so property 5 will not be violated after that re-coloring if it wasn't violated before.)
This procedure preserves the red-black properties, and it takes time proportional to the height of the tree, which is O(log n).
Rotations
Restructuring operations on red-black trees (and many other kinds of trees) can often be expressed more clearly in terms of the rotation operation, diagrammed below.
Clearly the order <AxByC> is preserved by the rotation operation. Therefore, if we start with a BST, and only restructure using rotations, then we will still have a BST. In the remainder of this tutorial, we will only restructure with rotations, and so we won't need to say anything more about the proper ordering of the elements.
In the diagram below, the transformations in Okasaki's insertion method are shown as one or two rotations. For comparison, here is the previous diagram. It makes a nice visualization to put the two diagrams in about the same place on your screen, and go back and forth between them using your browser's forward and back buttons.
The CLRS Insertion Method
CLRS gives an insertion method that is more complicated than Okasaki's, but slightly more efficient. Its time complexity is still O(log n), but the constant embedded in the big-Oh is smaller. You can safely skip this section and go to deletion.
As with Okasaki's method, this method starts with standard BST insertion, and colors the new node red. It differs in how it handles violations of property 4 (no two red nodes adjacent). We distinguish two cases, based on the color of the uncle of the bottom red node. (The bottom red node is the child node in the red-parent/red-child pair. Its uncle is the red-parent's sibling.) First, let's consider the case where the uncle is black. There are four subcases, depending on whether each red node is a left or right child. The diagram below shows how the tree should be restructured and re-colored. (This is cases 2 and 3 in CLRS.)
It is interesting to see how this diagram compares with Okasaki's insertion with rotations. (Again, it is interesting to go back and forth between these two images using your browser's forward and back buttons.) There are two differences. The first is in how the final subtree (in the middle) is colored. In Okasaki's method, the root y of the subtree is red and its children are black, while here y is black and its children are red. Making y black means that property 4, no two adjacent red nodes, will not be violated at y, so the procedure does not need to continue up the tree toward the root. The CLRS insertion method never requires more than two rotations.
The second difference is that in the CLRS method this case has a precondition that the uncle of the bottom red node is black. It can clearly be seen in the diagram that if the uncle (which is the root of either subtree A or D) were red, then there would be two adjacent red nodes in the final tree, and so this method could not be used.
It is straightforward to verify that the red-black properties are satisfied in the final tree, if the only violation initially is the adjacency of the two red nodes shown.
It remains to consider the case where the uncle of the bottom red node is red. (This is case 1 in CLRS.) In that case the top red node and its sibling (the uncle) are turned black, and their parent is turned red. The tree is not restructured. There are four cases, depending on whether the bottom red node is a left or right child, and whether the top red node is a left or right child, but they are all essentially the same. One case is pictured here:
It can easily be seen that the number of black nodes on a path from the root of the tree to the leaves is not changed by this operation. Afterwards, the only violation of the red-black properties would be if the root of the subtree is the root of the tree, or if it has a red parent. In other words, we might have to start over, but two steps closer to the root. So, the procedure is repeated until either (i) the parent of z is black, in which case we're done; (ii) z is the root, in which case we color it black and we're done; or (iii) we encounter the black-uncle case, in which case we do one or two rotations and we're done. In the worst case we will have to re-color nodes all the way up to the root, requiring O(log n) operations.
Deletion
To delete an item from a red-black tree, we start by performing the standard BST delete operation (see CLRS, chapter 12). To review, standard BST deletion has three cases:
- The node to be removed has no children. In this case we simply remove it. If it is the root, then the tree becomes empty; otherwise its parent's appropriate child pointer is set to null.
- The node to be removed has one child. Again, we simply remove it. If it is the root, then its child becomes the root; otherwise, its parent's appropriate child pointer is set to point to the removed node's child.
- The node to be removed has two children. In this case we find the node's successor, which is the minimum node in its right subtree. The data elements in these two nodes are swapped, and then we remove the successor node. Since the successor node cannot have a left child, one of the first two cases will apply.
The node that is removed from the tree might not be the node that originally contained the deleted data item. For the purposes of re-establishing the red-black properties we are only concerned with the node that is removed. Call this node v, and let its parent be p(v).
At least one of v's children must be a leaf. If v has a non-leaf child, its place in the structure of the tree is taken over by that child; otherwise its place is taken over by a leaf. Let u be the child that replaces v in the tree structure after BST deletion. If u is a leaf, then we know it is black.
If v is red, we are done - no violations of the red-black properties will be introduced. So assume v is black. All paths from the root to leaves that descended from v will have fewer black nodes than other root-to-leaf paths in the tree, which is a violation of property 5. If p(v) and u are both red then property 4 will also be violated, but it will turn out that our solution to the violation of property 5 will also solve the property 4 violation without doing any more work, so for now we will focus on the property 5 problem.
Let's imagine assigning a black token to u. The token indicates that the simple paths to leaves descending from the node holding the token are short a black node (initially that is because v was removed). We will move the token up the tree until we encounter a situation where property 5 may be restored. The token is represented as a black square in the diagrams below. If the node with the token is black, then we call it a doubly black node.
Note that the token is a conceptual device, and is not physically implemented in the tree data structure.
We distinguish four mutually exclusive cases.
-
If the node with the token is red or the root of the tree (or both), simply set its color to black and we're done. Note that this removes any violation of property 4 (no two red nodes adjacent). It also removes a violation of property 5, for the following reason. The token indicated that leaves descending from this node needed another black node on their paths to the root in order to match other root-to-leaf paths in the tree. By changing this red node to black, we add one more black node on precisely those paths that were short a black node.
If the token is at the root and the root is black, the token is simply dropped. In this case, every root-to-leaf path in the tree has had its number of black nodes reduced by one, and property 5 still holds.
For the remaining cases, we can assume that the node with the token is black, and is not the root.
-
If the sibling and both nephews of the doubly black node are also black, then we change the sibling to red and move the token up one step towards the root. (This is case 2 in CLRS.)
In the diagram below, which shows the two possible subcases, the dashed line around y indicates that we don't care at this point what color y is, and the small circles above A, B, C or D indicate that the root of that subtree is black.
Changing the sibling to red removes one black node from the paths to its descendant leaves, so that those paths now have the same number of black nodes as those descending from the doubly black node. We move the token up to the parent y to indicate that now all paths descending from y are short a black node. We haven't solved the problem, but we have pushed it up towards the root one step.
Clearly this operation can only be done if both nephews are black, since otherwise we would end up with adjacent red nodes.
-
If the sibling of the doubly black node is red, then we perform a rotation and change colors. (This is CLRS case 1.) The two possibilities are shown in the following diagram:
This step doesn't change the number of black nodes on the path from the root to any leaf, but it does guarantee that the doubly black node's sibling is black, which enables either case B or case D in the next step.
It might seem that we are going backwards, since the token is now farther from the root than it was before. But the parent of the doubly black node is now red, so if the next step is case B, the token will be passed up to a red node, which will then turn black and we will be done. Furthermore, as shown below, case D always consumes the token and finishes the operation. So this apparent step backward is actually a sign that we are almost done.
-
Finally, we have the case where the doubly black node has a black sibling and at least one red nephew. Define the near nephew of a node x to be the left child of x's sibling if x is a left child, and the right child of x's sibling if x is a right child; define x's far nephew to be its other nephew. (In the diagram, x's near nephew is closer to x than its far nephew.)
We now have two subcases: (i) the doubly black node's far nephew is black, so its near nephew must be red; and (ii) the far nephew is red, so the near nephew can be either color. (This is cases 3 and 4 in CLRS.) As shown in the following diagram, subcase (i) is transformed to subcase (ii) by a rotation and change of colors, and subcase (ii) is solved by another rotation and change of colors. The two rows are symmetrical, depending on whether the doubly black node is a left or right child.
In this case we generate an extra black node, the token is dropped, and we are done. It can easily be verified by looking at the diagram that the number of black nodes on paths to leaves below the token is increased by one, while the number of black nodes on other paths is unchanged. And it is also clear that none of the other red-black properties is violated.
Exercise
Starting with an empty red-black tree, successively insert the following values: 260, 240, 140, 320, 250, 170, 60, 100, 20, 40, 290, 280, 270, 30, 265, 275, 277, and 278. If you use Okasaki's insertion method, and if I haven't made any mistakes, you should end up with this tree:
Last modified: 12/30/2005 07:23:33
k_eckel's mindview
细想想,生活中的种种不如意,做选择时的惶恐不安,选择后的多种如果,以及在失去某个机会或者在一次至关重要的考查中失败,都不是那么的影响巨大。生命是一连串长期而持续的积累,一次失败或者一次成功都不足以彻底改变一个人的生活。你要做的就是静静的积累,默默的努力,朝着自己的目标前进。
生命是一种长期而持续的累积过程,绝不会因为单一的事件而毁了一个人的一生,也不会因为单一的事件而救了一个人的一生。
属于我们该得的,迟早会得到;属于我们不该得的,即使侥幸巧取也不可能长久保有。
如果我们看清这个事实,许多所谓"人生的重大抉择"就可以淡然处之,根本无需焦虑。
而所谓"人生的困境",也往往当下就变得无足挂齿。
"生命是一种长期而持续的累积过程,不会因为一时的际遇而终止增或减。"
生命的真相是一种长期而持续的累积过程(这是偶发的际遇无法剥夺的),而不是一时顺逆的际遇。
如果我们能看清这个事实,生命的过程就真是"功不唐捐",没什么好贪求,也没什么好焦虑的了!
剩下来,我们所需要做的无非只是想清楚自己要从人生获得什么,然后安安稳稳,诚诚恳恳的去累积就是了。
当我们面对两个可能的方案,而焦虑的不知何所抉择时,通常表示这两个方案或者一样好,或者一样坏,因而实际上选择哪个都一样,唯一的差别只是先后之序而已。而且,愈是让我们焦虑得厉害的,其实差别越小,愈不值得焦虑。反而真正有明显的好坏差别时,我们轻易的就知道该怎么做了。
可是我们却经常看不到长远的将来,短视的盯着两案短期内的得失:想选甲案,就舍不得乙案的好处;想选乙案,又舍不得甲案的好处。如果看得够远,人生常则八,九十,短则五,六十年,先做哪一件事又有什么关系?
在"朝三暮四"这个成语故事里,主人原本喂养猴子的橡实是"早上四颗下午三颗",后来改为"朝三暮四",猴子就不高兴而坚持改回到"朝四暮三"。"朝三暮四"和"朝四暮三"有什么本质区别?
福祸如何,谁能全面知晓?
我们又有什么好得意?又有什么好忧虑?
人生的得与失,有时候怎么也说不清楚,有时候却再简单不过了:我们得到平日累积的成果,而失去我们不曾努力累积的!
所以重要的不是和别人比成就,而是努力去做自己想做的。
功不唐捐,最后该得到的不会少你一分,不该得到的也不会多你一分。
HOWTO: Terminal as the desktop background
The objective is to have a gnome terminal running as the desktop background, right above the actual background image, that won't be displayed by the statusbar or ticker.
It should look something like this:
Full transparency
or
Semi-transparent with shadows (using Xgl)
OK, let's get started...
1) Download devilspie
sudo apt-get install devilspie
mkdir ~/.devilspie
nano ~/.devilspie/DesktopConsole.ds
(if
(matches (window_name) "DesktopConsole")
(begin
(set_workspace 4)
(below)
(undecorate)
(skip_pager)
(skip_tasklist)
(wintype "utility")
(geometry "+50+50")
(geometry "924x668")
)
)
- I use workspace 4, but you can use whatever you like.
- You should at least adjust the geometry lines to match your screen.
- Read the devilspie wiki for other commands!
4) Create a new gnome-terminal profile named "DesktopConsole"
- in the "General" tab, untick "show menubar by default..."
- in the "Scrolling" tab, select "Scrollbar is" -> Disabled.
- in the "Effects" tab, set "Transparent background" and shade to "None" (or to whatever you prefer)
5) Add devilspie and gnome-terminal to the Startup Programs in your session:
in System->preferences->sessions, "Startup Programs" tab, add the 2 programs:
devilspie
gnome-terminal --window-with-profile=DesktopConsole

Quotes
Aug222007
"Programs must be written for people to read, and only incidentally for machines to execute."
- Abelson & Sussman, SICP, preface to the first edition
emacs中文的显示和输入
不使用中文的locale也是可以显示和输入中文的,因为从原理上说,对一个X应用程序,只要它能正确识别要显示文本的编码,然后找到相应的字体就能正确显示;而对中文的输入来说,在能正确显示的前提下,只要能知道来自输入法的文本的编码也能正确输入。那这是不是说中文的locale没有存在的价值呢?不是的,看一个例子:
在各个locale变量都是en_US.UTF-8的环境下打开emacs(我用的是emacs22),带上-q选项使不加载~/.emacs文件,在*scratch*里输入"学习"两个字(如果有乱码的话可以先尝试打开一个带中文的utf-8文件,然后再在*scratch*里输入,似乎是emacs22.0.50.1的bug),你会发现两个字的字体有较大差别。把光标移到"学"字上,用C-u C-x =查看字符信息,发现其charset是japanese-jisx0208,font是-JIS-Fixed-...;再查看"习"字,charset是chinese-gb2312,font是-arphic-ar pl...。显然这里emacs把"学"字认为是一个日语charset里的字符,而把"习"字认为是一个中文charset里的字符,由于在缺省情况下emacs为两个charset使用了不同的字体,就使两个字看起来不大一致。
虽然可以在fontset里对这个japanese charset强制指定使用和中文charset相同的字体,比如simsun,但这究竟不是一个好办法,因为在一般情况下中文的字体都是没有包含全部日文的。不过这里为什么把这两个字认为是这两个charset,而不是utf-8?在en_US.UTF-8的环境下从输入法输出的字符应该是UTF-8才对。这就和字体有关了:由于字体本身的特点,一种字体通常只包括了一种语言的某个字符集里的字符,至多再加上ascii字符,比如simsun字体除了ascii字符外,只包括了gbk字符集里的字符。这样对于utf-8这样的多语言编码,就必须把它编码的各个字符归到各个字体使用的字符集才能使用这些字体。而在utf-8使用的unicode里,有一些字符是中日韩共用的,应该把他们归到哪种字符集呢?没有一个特定的使用环境似乎不好办,这里emacs在en_US下把"学"字归入了日文字符集,这对中文的使用者就是不合适的。所以你必须告诉emacs你的使用环境,这里中文locale就发挥作用了:设置LANG=zh_CN.UTF-8;或者你只希望中文的显示和输入正确而保持英文环境,也可以只设置LC_CTYPE=zh_CN.UTF-8。
25 useful commands in Linux/UNIX
Aug21200725)host
host is a simple utility for performing DNS lookups. It is normally used to convert names to IP addresses and vice versa. When no arguments or options are given, host prints a short summary of its command line arguments and options.
Example:
$ host mail.yahoo.com
mail.yahoo.com is an alias for login.yahoo.com.
login.yahoo.com is an alias for login-global.yahoo8.akadns.net.
login-global.yahoo8.akadns.net is an alias for login.yahoo.akadns.net.
login.yahoo.akadns.net has address 69.147.112.160
24)dig
dig (domain information groper) is a flexible tool for interrogating DNS name servers. It performs DNS lookups and displays the answers that are returned from the name server(s) that were queried. Most DNS administrators use dig to troubleshoot DNS problems because of its flexibility, ease of use and clarity of output. Other lookup tools tend to have less functionality than dig.
$ dig mail.yahoo.com
; <<>> DiG 9.4.1-P1 <<>> mail.yahoo.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9867
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 8, ADDITIONAL: 8

;; QUESTION SECTION:
;mail.yahoo.com.                 IN      A

;; ANSWER SECTION:
mail.yahoo.com.          195    IN      CNAME   login.yahoo.com.
login.yahoo.com.         65     IN      CNAME   login-global.yahoo8.akadns.net.
login-global.yahoo8.akadns.net. 122 IN  CNAME   login.yahoo.akadns.net.
login.yahoo.akadns.net.  51     IN      A       69.147.112.160

;; AUTHORITY SECTION:
akadns.net.              84671  IN      NS      zc.akadns.org.
akadns.net.              84671  IN      NS      zd.akadns.org.
akadns.net.              84671  IN      NS      eur1.akadns.net.
akadns.net.              84671  IN      NS      use3.akadns.net.
akadns.net.              84671  IN      NS      usw2.akadns.net.
akadns.net.              84671  IN      NS      asia9.akadns.net.
akadns.net.              84671  IN      NS      za.akadns.org.
akadns.net.              84671  IN      NS      zb.akadns.org.

;; ADDITIONAL SECTION:
za.akadns.org.           23366  IN      A       195.219.3.169
zb.akadns.org.           23366  IN      A       206.132.100.105
zc.akadns.org.           23366  IN      A       61.200.81.111
zd.akadns.org.           23366  IN      A       63.209.3.132
eur1.akadns.net.         17773  IN      A       213.254.204.197
use3.akadns.net.         17773  IN      A       204.2.178.133
usw2.akadns.net.         17773  IN      A       208.185.132.166
asia9.akadns.net.        17773  IN      A       220.73.220.4

;; Query time: 27 msec
;; SERVER: 24.92.226.9#53(24.92.226.9)
;; WHEN: Mon Aug 20 13:34:17 2007
;; MSG SIZE  rcvd: 421
23)mkdir
The mkdir utility creates the directories named as operands, in the order specified, using mode ``rwxrwxrwx'' (0777) as modified by the current umask(2).
Example:
$ mkdir test
$ ls -l | grep test
drwxr-xr-x 2 owner group 512 Aug 20 13:35 test
22)rm
The rm utility attempts to remove the non-directory type files specified on the command line. If the permissions of the file do not permit writing, and the standard input device is a terminal, the user is prompted (on the standard error output) for confirmation.
Example (file):
$ rm test2
$ ls -l | grep test2
Example (dir):
what you get when you try to rm a dir is:
$ rm test
rm: test: is a directory
to get around this do:
$ rm -r test
$ ls -l | grep test
21)cp
In the first synopsis form, the cp utility copies the contents of the source_file to the target_file. In the second synopsis form, the contents of each named source_file is copied to the destination target_directory. The names of the files themselves are not changed. If cp detects an attempt to copy a file to itself, the copy will fail.
Example:
$ cp test test2
$ ls -l | grep test
-rw-r--r-- 1 owner group 0 Aug 20 13:40 test
-rw-r--r-- 1 owner group 0 Aug 20 13:41 test2
copy a directory do:
$ cp -r test test2
$ ls -l | grep test
drwxr-xr-x 2 owner group 512 Aug 20 13:42 test
drwxr-xr-x 2 owner group 512 Aug 20 13:42 test2
20)grep
grep searches the named input FILEs (or standard input if no files are named, or the file name - is given) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
Example:
$ ls
example test three
$ ls | grep th
three
19)ls
For each operand that names a file of a type other than directory, ls displays its name as well as any requested, associated information. For each operand that names a file of type directory, ls displays the names of files contained within that directory, as well as any requested, associated information.
Example:
$ ls
example test three
$ ls -l
total 0
-rw-r--r-- 1 owner group 0 Aug 20 13:44 example
-rw-r--r-- 1 owner group 0 Aug 20 13:44 test
-rw-r--r-- 1 owner group 0 Aug 20 13:44 three
18)startx
The startx script is a front end to xinit that provides a somewhat nicer user interface for running a single session of the X Window System. It is often run with no arguments.
Example:
To use startx, the user must have a .xinitrc file in their home directory. Examples of the file are:
$ cat ~/.xinitrc
exec fluxbox
(this will start fluxbox)
$ cat ~/.xinitrc
exec gnome-session
(this will start gnome)
$ cat ~/.xinitrc
exec startkde
(this will start kde)
17)nano
nano is a small, free and friendly editor which aims to replace Pico, the default editor included in the non-free Pine package. Rather than just copying Pico's look and feel, nano also implements some missing (or disabled by default) features in Pico, such as "search and replace" and "go to line and column number".
Example:
$nano test
(will open test file for edit to exit type: Ctrl+X)
16)pwd
The pwd utility writes the absolute pathname of the current working directory to the standard output.
Example:
$ pwd
/usr/home/username/test
15)cat
The cat utility reads files sequentially, writing them to the standard output. The file operands are processed in command-line order. If file is a single dash (`-') or absent, cat reads from the standard input. If file is a UNIX domain socket, cat connects to it and then reads it until EOF. This complements the UNIX domain binding capability available in inetd(8).
Example:
$ cat test
this is the contents of the file test
14)man
The man utility formats and displays the on-line manual pages. This version knows about the MANPATH and PAGER environment variables, so you can have your own set(s) of personal man pages and choose whatever program you like to display the formatted pages. If section is specified, man only looks in that section of the manual. You may also specify the order to search the sections for entries and which preprocessors to run on the source files via command line options or environment variables. If enabled by the system administrator, formatted man pages will also be compressed with the ``/usr/bin/gzip -c'' command to save space.
Example:
$ man find
Show information about the command find
13)kill
The kill utility sends a signal to the processes specified by the pid operands.
Example:
$ kill 694
(694 is the id of the program running)
12)locate
The locate program searches a database for all pathnames which match the specified pattern. The database is recomputed periodically (usually weekly or daily), and contains the pathnames of all files which are publicly accessible.
Example:
#locate Xorg.0.log
/var/log/Xorg.0.log
/var/log/Xorg.0.log.old
(you may need to run /usr/libexec/locate.updatedb to update the locate database)
11)ifconfig
The ifconfig utility is used to assign an address to a network interface and/or configure network interface parameters. The ifconfig utility must be used at boot time to define the network address of each interface present on a machine; it may also be used at a later time to redefine an interface's address or other operating parameters.
Example:
# ifconfig
em0: flags=8843 metric 0 mtu 1500
options=18b
ether 00:16:41:16:28:2a
inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255
media: Ethernet autoselect (1000baseTX )
status: active
lo0: flags=8049 metric 0 mtu 16384
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
inet6 ::1 prefixlen 128
inet 127.0.0.1 netmask 0xff000000
10)ssh
ssh (SSH client) is a program for logging into a remote machine and for executing commands on a remote machine. It is intended to replace rlogin and rsh, and provide secure encrypted communications between two untrusted hosts over an insecure network. X11 connections and arbitrary TCP ports can also be forwarded over the secure channel.
Example:
#ssh localhost
Password:
Welcome to FreeBSD-World
(you wouldn't normally use localhost; you would use the IP address or hostname of the computer you are connecting to)
9)gzip
The gzip program compresses and decompresses files using Lempel-Ziv coding (LZ77). If no files are specified, gzip will compress from standard input, or decompress to standard output. When in compression mode, each file will be replaced with another file with the suffix, set by the -S suffix option, added, if possible. In decompression mode, each file will be checked for existence, as will the file with the suffix added.
Example:
$ gzip -9 test.tar
$ ls | grep test.tar
test.tar.gz
$ gzip -d test.tar.gz
$ ls | grep test.tar
test.tar
8)bzip2
bzip2 compresses files using the Burrows-Wheeler block sorting text compression algorithm, and Huffman coding. Compression is generally considerably better than that achieved by more conventional LZ77/LZ78-based compressors, and approaches the performance of the PPM family of statistical compressors.
Example:
$ bzip2 -9 test.tar
$ ls | grep test.tar
test.tar.bz2
$ bzip2 -d test.tar.bz2
$ ls | grep test.tar
test.tar
7)zip
zip is a compression and file packaging utility for Unix, VMS, MSDOS, OS/2, Windows NT, Minix, Atari and Macintosh, Amiga and Acorn RISC OS.
Example:
$ zip test.zip test
adding: test/ (stored 0%)
$ ls | grep test
test
test.zip
$ unzip test.zip
Archive: test.zip
6)tar
tar creates and manipulates streaming archive files. This implementation can extract from tar, pax, cpio, zip, jar, ar, and ISO 9660 cdrom images and can create tar, pax, cpio, ar, and shar archives.
Example:
$ tar -cf t2.tar example test three
$ ls | grep t2
t2.tar
$ tar -xf t2.tar
(files are now extracted)
5)mount
The mount utility calls the nmount(2) system call to prepare and graft a special device or the remote node (rhost:path) on to the file system tree at the point node. If either special or node are not provided, the appropriate information is taken from the fstab(5) file.
Example:
# cat /etc/fstab
# Device Mountpoint FStype Options Dump Pass#
/dev/ad4s1b none swap sw 0 0
/dev/ad4s1a / ufs rw 1 1
/dev/acd0 /cdrom cd9660 ro,noauto 0 0
# mount /cdrom
# ls /cdrom
4.1 TRANS.TBL etc
# umount /cdrom
# ls /cdrom
#
(/cdrom is already listed in the fstab; if it were not, I would have to do 'mount /dev/acd0 /cdrom')
4)passwd
The passwd utility changes the user's local, Kerberos, or NIS password. If the user is not the super-user, passwd first prompts for the current password and will not continue unless the correct password is entered.
Example:
$ passwd
Changing local password for username
Old Password:
New Password:
Retype New Password:
$
3)ping
The ping utility uses the ICMP protocol's mandatory ECHO_REQUEST datagram to elicit an ICMP ECHO_RESPONSE from a host or gateway. ECHO_REQUEST datagrams (``pings'') have an IP and ICMP header, followed by a ``struct timeval'' and then an arbitrary number of ``pad'' bytes used to fill out the packet.
Example:
$ ping -c 5 www.yahoo.com
PING www.yahoo-ht3.akadns.net (69.147.114.210): 56 data bytes
64 bytes from 69.147.114.210: icmp_seq=0 ttl=47 time=48.814 ms
64 bytes from 69.147.114.210: icmp_seq=1 ttl=47 time=32.916 ms
64 bytes from 69.147.114.210: icmp_seq=2 ttl=47 time=32.361 ms
64 bytes from 69.147.114.210: icmp_seq=3 ttl=47 time=33.912 ms
64 bytes from 69.147.114.210: icmp_seq=4 ttl=47 time=33.846 ms
--- www.yahoo-ht3.akadns.net ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 32.361/36.370/48.814/6.249 ms
2)tail
The tail utility displays the contents of file or, by default, its standard input, to the standard output.
# tail /var/log/auth.log
Aug 19 20:38:05 laptop su: username to root on /dev/ttyv0
Aug 20 00:10:38 laptop login: login on ttyv0 as username
Aug 20 00:11:30 laptop su: username to root on /dev/ttyp0
Aug 20 00:56:32 laptop su: username to root on /dev/ttyp1
Aug 20 01:31:26 laptop su: username to root on /dev/ttyv0
Aug 20 10:25:58 laptop login: login on ttyv0 as username
Aug 20 10:26:21 laptop su: username to root on /dev/ttyp0
Aug 20 13:58:06 laptop su: username to root on /dev/ttyp1
Aug 20 14:18:23 laptop su: username to root on /dev/ttyp1
Aug 20 14:27:39 laptop su: username to root on /dev/ttyp1
1)top
Top displays the top processes on the system and periodically updates this information.
Example:
last pid: 5327; load averages: 0.00, 0.00, 0.00 up 0+04:05:41 14:30:28
44 processes: 1 running, 43 sleeping
CPU states: 0.2% user, 0.0% nice, 1.3% system, 0.0% interrupt, 98.5% idle
Mem: 198M Active, 608M Inact, 128M Wired, 26M Cache, 111M Buf, 31M Free
Swap: 1024M Total, 8K Used, 1024M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
694 username 1 96 0 53004K 48212K select 1 4:23 0.10% Xorg
10306 username 1 96 0 7496K 4300K select 1 0:04 0.05% xterm
Notes on Programming in C
Aug182007Notes on Programming in C
Rob Pike, AT&T Bell Laboratories
Introduction
Kernighan and Plauger's The Elements of Programming Style (Prentice Hall, 1978) was an important and rightly influential book. But sometimes I feel its concise rules were taken as a cookbook approach to good style instead of the succinct expression of a philosophy they were meant to be. If the book claims that variable names should be chosen meaningfully, doesn't it then follow that variables whose names are small essays on their use are even better? Isn't MaximumValueUntilOverflow a better name than maxval? I don't think so.
What follows is a set of short essays that collectively encourage a philosophy of clarity in programming rather than giving hard rules. I don't expect you to agree with all of them, because they are opinion and opinions change with the times. But they've been accumulating in my head, if not on paper until now, for a long time, and are based on a lot of experience, so I hope they help you understand how to plan the details of a program. (I've yet to see a good essay on how to plan the whole thing.) If you find them idiosyncratic, fine; if you disagree with them, fine; but if they make you think about why you disagree, that's better. Under no circumstances should you program the way I say to because I say to; program the way you think expresses best what you're trying to accomplish in the program. And do so consistently and ruthlessly.
Your comments are welcome.
Issues of typography
A program is a sort of publication. It's meant to be read by the programmer, another programmer (perhaps yourself a few days, weeks or years later), and lastly a machine. The machine doesn't care how pretty the program is - if the program compiles, the machine's happy - but people do, and they should. Sometimes they care too much: pretty printers mechanically produce pretty output that accentuates irrelevant detail in the program, which is as sensible as putting all the prepositions in English text in bold font. Although many people think programs should look like the Algol-68 report (and some systems even require you to edit programs in that style), a clear program is not made any clearer by such presentation, and a bad program is only made laughable.
Typographic conventions consistently held are important to clear presentation, of course - indentation is probably the best known and most useful example - but when the ink obscures the intent, typography has taken over. So even if you stick with plain old typewriter-like output, be conscious of typographic silliness. Avoid decoration; for instance, keep comments brief and banner-free. Say what you want to say in the program, neatly and consistently. Then move on.
Variable names
Ah, variable names. Length is not a virtue in a name; clarity of expression is. A global variable rarely used may deserve a long name, maxphysaddr say. An array index used on every line of a loop needn't be named any more elaborately than i. Saying index or elementnumber is more to type (or calls upon your text editor) and obscures the details of the computation. When the variable names are huge, it's harder to see what's going on. This is partly a typographic issue; consider
for(i=0 to 100)
array[i]=0
vs.
for(elementnumber=0 to 100)
array[elementnumber]=0;
The problem gets worse fast with real examples. Indices are just notation, so treat them as such.
Pointers also require sensible notation. np is just as mnemonic as nodepointer if you consistently use a naming convention from which np means `node pointer' is easily derived. More on this in the next essay.
As in all other aspects of readable programming, consistency is important in naming. If you call one variable maxphysaddr, don't call its cousin lowestaddress.
Finally, I prefer minimum-length but maximum-information names, and then let the context fill in the rest. Globals, for instance, typically have little context when they are used, so their names need to be relatively evocative. Thus I say maxphysaddr (not MaximumPhysicalAddress) for a global variable, but np not NodePointer for a pointer locally defined and used. This is largely a matter of taste, but taste is relevant to clarity.
I eschew embedded capital letters in names; to my prose-oriented eyes, they are too awkward to read comfortably. They jangle like bad typography.
The use of pointers
C is unusual in that it allows pointers to point to anything. Pointers are sharp tools, and like any such tool, used well they can be delightfully productive, but used badly they can do great damage (I sunk a wood chisel into my thumb a few days before writing this). Pointers have a bad reputation in academia, because they are considered too dangerous, dirty somehow. But I think they are powerful notation, which means they can help us express ourselves clearly.
Consider: When you have a pointer to an object, it is a name for exactly that object and no other. That sounds trivial, but look at the following two expressions:
np
node[i]
The first points to a node, the second evaluates to (say) the same node. But the second form is an expression; it is not so simple. To interpret it, we must know what node is, what i is, and that i and node are related by the (probably unspecified) rules of the surrounding program. Nothing about the expression in isolation can show that i is a valid index of node, let alone the index of the element we want. If i and j and k are all indices into the node array, it's very easy to slip up, and the compiler cannot help. It's particularly easy to make mistakes when passing things to subroutines: a pointer is a single thing; an array and an index must be believed to belong together in the receiving subroutine.
An expression that evaluates to an object is inherently more subtle and error-prone than the address of that object. Correct use of pointers can simplify code:
parent->link[i].type
vs.
lp->type.
If we want the next element's type, it's
parent->link[++i].type
or
(++lp)->type.
i advances but the rest of the expression must stay constant; with pointers, there's only one thing to advance.
Typographic considerations enter here, too. Stepping through structures using pointers can be much easier to read than with expressions: less ink is needed and less effort is expended by the compiler and computer. A related issue is that the type of the pointer affects how it can be used correctly, which allows some helpful compile-time error checking that array indices cannot share. Also, if the objects are structures, their tag fields are reminders of their type, so
np->left
is sufficiently evocative; if an array is being indexed the array will have some well-chosen name and the expression will end up longer:
node[i].left.
Again, the extra characters become more irritating as the examples become larger.
As a rule, if you find code containing many similar, complex expressions that evaluate to elements of a data structure, judicious use of pointers can clear things up. Consider what
if(goleft)
p->left=p->right->left;
else
p->right=p->left->right;
would look like using a compound expression for p. Sometimes it's worth a temporary variable (here p) or a macro to distill the calculation.
Procedure names
Procedure names should reflect what they do; function names should reflect what they return. Functions are used in expressions, often in things like if's, so they need to read appropriately.
if(checksize(x))
is unhelpful because we can't deduce whether checksize returns true on error or non-error; instead
if(validsize(x))
makes the point clear and makes a future mistake in using the routine less likely.
Comments
A delicate matter, requiring taste and judgement. I tend to err on the side of eliminating comments, for several reasons. First, if the code is clear, and uses good type names and variable names, it should explain itself. Second, comments aren't checked by the compiler, so there is no guarantee they're right, especially after the code is modified. A misleading comment can be very confusing. Third, the issue of typography: comments clutter code.
But I do comment sometimes. Almost exclusively, I use them as an introduction to what follows. Examples: explaining the use of global variables and types (the one thing I always comment in large programs); as an introduction to an unusual or critical procedure; or to mark off sections of a large computation.
There is a famously bad comment style:
i=i+1; /* Add one to i */
and there are worse ways to do it:
/**********************************
* *
* Add one to i *
* *
**********************************/
i=i+1;
Don't laugh now, wait until you see it in real life.
Avoid cute typography in comments, avoid big blocks of comments except perhaps before vital sections like the declaration of the central data structure (comments on data are usually much more helpful than on algorithms); basically, avoid comments. If your code needs a comment to be understood, it would be better to rewrite it so it's easier to understand. Which brings us to
Complexity
Most programs are too complicated - that is, more complex than they need to be to solve their problems efficiently. Why? Mostly it's because of bad design, but I will skip that issue here because it's a big one. But programs are often complicated at the microscopic level, and that is something I can address here.
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.) For example, binary trees are always faster than splay trees for workaday problems.
Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
The following data structures are a complete list for almost all practical programs:
array
linked list
hash table
binary tree
Of course, you must also be prepared to collect these into compound data structures. For instance, a symbol table might be implemented as a hash table containing linked lists of arrays of characters.
Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. (See Brooks p. 102.)
Rule 6. There is no Rule 6.
Programming with data
Algorithms, or details of algorithms, can often be encoded compactly, efficiently and expressively as data rather than, say, as lots of if statements. The reason is that the complexity of the job at hand, if it is due to a combination of independent details, can be encoded. A classic example of this is parsing tables, which encode the grammar of a programming language in a form interpretable by a fixed, fairly simple piece of code. Finite state machines are particularly amenable to this form of attack, but almost any program that involves the `parsing' of some abstract sort of input into a sequence of some independent `actions' can be constructed profitably as a data-driven algorithm.
Perhaps the most intriguing aspect of this kind of design is that the tables can sometimes be generated by another program - a parser generator, in the classical case. As a more earthy example, if an operating system is driven by a set of tables that connect I/O requests to the appropriate device drivers, the system may be `configured' by a program that reads a description of the particular devices connected to the machine in question and prints the corresponding tables.
One of the reasons data-driven programs are not common, at least among beginners, is the tyranny of Pascal. Pascal, like its creator, believes firmly in the separation of code and data. It therefore (at least in its original form) has no ability to create initialized data. This flies in the face of the theories of Turing and von Neumann, which define the basic principles of the stored-program computer. Code and data are the same, or at least they can be. How else can you explain how a compiler works? (Functional languages have a similar problem with I/O.)
Function pointers
Another result of the tyranny of Pascal is that beginners don't use function pointers. (You can't have function-valued variables in Pascal.) Using function pointers to encode complexity has some interesting properties.
Some of the complexity is passed to the routine pointed to. The routine must obey some standard protocol - it's one of a set of routines invoked identically - but beyond that, what it does is its business alone. The complexity is distributed.
There is this idea of a protocol, in that all functions used similarly must behave similarly. This makes for easy documentation, testing, growth and even making the program run distributed over a network - the protocol can be encoded as remote procedure calls.
I argue that clear use of function pointers is the heart of object-oriented programming. Given a set of operations you want to perform on data, and a set of data types you want to respond to those operations, the easiest way to put the program together is with a group of function pointers for each type. This, in a nutshell, defines class and method. The O-O languages give you more of course - prettier syntax, derived types and so on - but conceptually they provide little extra.
Combining data-driven programs with function pointers leads to an astonishingly expressive way of working, a way that, in my experience, has often led to pleasant surprises. Even without a special O-O language, you can get 90% of the benefit for no extra work and be more in control of the result. I cannot recommend an implementation style more highly. All the programs I have organized this way have survived comfortably after much development - far better than with less disciplined approaches. Maybe that's it: the discipline it forces pays off handsomely in the long run.
Include files
Simple rule: include files should never include include files. If instead they state (in comments or implicitly) what files they need to have included first, the problem of deciding which files to include is pushed to the user (programmer) but in a way that's easy to handle and that, by construction, avoids multiple inclusions. Multiple inclusions are a bane of systems programming. It's not rare to have files included five or more times to compile a single C source file. The Unix /usr/include/sys stuff is terrible this way.
There's a little dance involving #ifdef's that can prevent a file being read twice, but it's usually done wrong in practice - the #ifdef's are in the file itself, not the file that includes it. The result is often thousands of needless lines of code passing through the lexical analyzer, which is (in good compilers) the most expensive phase.
Just follow the simple rule.
Rob Pike / rob@research.att.com / Sep. 1987
Ruth-Aaron Pairs
Aug172007
/*
 * Program: ruth-aaron.cc
 * ----------------------
 * Identifies and prints the first 20 Ruth-Aaron pairs:
 * consecutive integers whose sums of prime factors
 * (counted with repetition) are equal.
 */

#include <iostream>
using namespace std;

bool isPrime(int n)
{
    if (n == 1) return false;
    if (n == 2) return true;

    for (int k = 2; k * k <= n; k++) {
        if (n % k == 0)
            return false;
    }
    return true;
}

// Sum of the prime factors of num, counted with repetition.
// A prime simply contributes itself.
int factorSum(int num)
{
    int sum = 0;
    while (num != 1) {
        for (int i = 2; i <= num; i++) {
            if (isPrime(i) && num % i == 0) {
                sum += i;
                num /= i;
                break;
            }
        }
    }
    return sum;
}

int main()
{
    const int kNumRuthAaronPairsNeeded = 20;
    // named constant so 20 isn't a magic number

    cout << "This program prints out the first "
         << kNumRuthAaronPairsNeeded
         << " Ruth-Aaron pairs...." << endl;
    cout << "And here they are:" << endl;
    cout << "------------------" << endl;
    cout << endl;

    int found = 0;
    for (int j = 1; found < kNumRuthAaronPairsNeeded; j++) {
        if (factorSum(j) == factorSum(j + 1)) {
            found++;
            cout << " " << found << "). "
                 << j << " and " << j + 1 << endl;
        }
    }
    cout << endl;

    cout << "All done!" << endl;
    return 0;
}