A personal web log, by karpar: 2009

对齐吧，哈！

May162009

struct和 union用 sizeof 看字节对齐
作者：佚名出处：中国自学编程网收集整理发布日期：2007-10-29

union u
{
　double a;
　int b;
};

union u2
{
　char a[13];
　int b;
};

union u3
{
　char a[13];
　char b;
};

　　都知道union的大小取决于它所有的成员中，占用空间最大的一个成员的大小。所以对于u来说，大小就是最大的double类型成员a了，所以sizeof(u)=sizeof(double)=8。但是对于u2和u3，最大的空间都是char[13]类型的数组，为什么u3的大小是13，而u2是16呢？关键在于u2中的成员int b。由于int类型成员的存在，使u2的对齐方式变成4，也就是说，u2的大小必须在4的对界上，所以占用的空间变成了16（最接近13的对界）。

　　结论：复合数据类型，如union，struct，class的对齐方式为成员中对齐方式最大的成员的对齐方式。

　　顺便提一下CPU对界问题，32的C++采用8位对界来提高运行速度，所以编译器会尽量把数据放在它的对界上以提高内存命中率。对界是可以更改的，使用#pragma pack(x)宏可以改变编译器的对界方式，默认是8。C++固有类型的对界取编译器对界方式与自身大小中较小的一个。例如，指定编译器按2对界，int类型的大小是4，则int的对界为2和4中较小的2。在默认的对界方式下，因为几乎所有的数据类型都不大于默认的对界方式8（除了long double），所以所有的固有类型的对界方式可以认为就是类型自身的大小。中国自学编程网 www.zxbc.cn 更改一下上面的程序：

#pragma pack(2)
union u2
{
　char a[13];
　int b;
};

union u3
{
　char a[13];
　char b;
};
#pragma pack(8)

　　由于手动更改对界方式为2，所以int的对界也变成了2，u2的对界取成员中最大的对界，也是2了，所以此时sizeof(u2)=14。

　　结论：C++固有类型的对界取编译器对界方式与自身大小中较小的一个。

　　9、struct的sizeof问题

　　因为对齐问题使结构体的sizeof变得比较复杂，看下面的例子：(默认对齐方式下)

struct s1
{
　char a;
　double b;
　int c;
　char d;
};

struct s2
{
　char a;
　char b;
　int c;
　double d;
};

　　同样是两个char类型，一个int类型，一个double类型，但是因为对界问题，导致他们的大小不同。计算结构体大小可以采用元素摆放法，我举例子说明一下：首先，CPU判断结构体的对界，根据上一节的结论，s1和s2的对界都取最大的元素类型，也就是double类型的对界8。然后开始摆放每个元素。

　　对于s1，首先把a放到8的对界，假定是0，此时下一个空闲的地址是1，但是下一个元素d是double类型，要放到8的对界上，离1最接近的地址是8了，所以d被放在了8，此时下一个空闲地址变成了16，下一个元素c的对界是4，16可以满足，所以c放在了16，此时下一个空闲地址变成了20，下一个元素d需要对界1，也正好落在对界上，所以d放在了20，结构体在地址21处结束。由于s1的大小需要是8的倍数，所以21-23的空间被保留，s1的大小变成了24。 [Page]

　　对于s2，首先把a放到8的对界，假定是0，此时下一个空闲地址是1，下一个元素的对界也是1，所以b摆放在1，下一个空闲地址变成了2；下一个元素c的对界是4，所以取离2最近的地址4摆放c，下一个空闲地址变成了8，下一个元素d的对界是8，所以d摆放在8，所有元素摆放完毕，结构体在15处结束，占用总空间为16，正好是8的倍数。

　　这里有个陷阱，对于结构体中的结构体成员，不要认为它的对齐方式就是他的大小，看下面的例子：

struct s1
{
　char a[8];
};

struct s2
{
　double d;
};

struct s3
{
　s1 s;
　char a;
};

struct s4
{
　s2 s;
　char a;
};

　　s1和s2大小虽然都是8，但是s1的对齐方式是1，s2是8（double），所以在s3和s4中才有这样的差异。

　　所以，在自己定义结构体的时候，如果空间紧张的话，最好考虑对齐因素来排列结构体里的元素。

结论:struct 里面的元素是顺序存储的,每个元素占用的字节数根据对齐字节数N(struct 里占用字节最多的元素与CPU对齐字节数中较小的一个)进行调整.如果从左至右M个元素加起来的字节数大于N,则按从右至左舍去K个元素直至M-K个元素加起来的字节数小于等于N,如果等于N则不用字节填充,小于N则把M-K-1的元素填充直至=N.

学习OpenCV遇到一段内存分配alignment的代码

KARPAR 0 评论

使用的是C语言中的malloc函数, 在分配的时候多申请了CV_MALLOC_ALIGN*((size >= 4096) + 1) + sizeof(char*)
大小的空间. 多申请空间的用处暂时先不分析.

下面的cvAlignPtr函数用于将指针对其到CV_MALLOC_ALIGN边界, 对于我们常规的PC来说是32bit, 也就是4字节.
cvAlignPtr函数在后面会详细讨论.

下面语句将ptr0记录到(ptr - sizeof(char*)), 可以把它看作一个指针. 最后返回ptr.
细心的朋友可能会发现, 前面malloc分配的是ptr0, 现在返回的却是ptr, 这个是为什么呢?

这个的原因还是先放下(我也不懂), 但是返回ptr而不返回ptr0带来的影响至少有2个:

1. 返回的ptr指针不能通过C语言的free函数释放(这也是cvAlloc/cvFree必须配对使用的原因).
2. 在cvFree的时候, 可以根据(ptr - sizeof(char*))对应的值来检测该内存是不是由icvDefaultAlloc申请.

这样应该说可以增加程序的健壮性, icvDefaultFree可以不傻瓜似的对于任何指针都进行释放.

#include <stdio.h>
#include <assert.h>
#include <malloc.h>

#define CV_MALLOC_ALIGN 32

inline void* cvAlignPtr( const void* ptr, int align=32 )
{
  assert( (align & (align-1)) == 0 );
  return (void*)( ((size_t)ptr + align - 1) & ~(size_t)(align-1) );
}

static void*
alloc(int size) {
  char *ptr, *ptr0 = (char*)malloc(
    (size_t)(size + CV_MALLOC_ALIGN*((size >= 4096) + 1) + sizeof(char*)));

  if( !ptr0 )
    return 0;

  // align the pointer                                                                                                              
  ptr = (char*)cvAlignPtr(ptr0 + sizeof(char*) + 1, CV_MALLOC_ALIGN);
  *(char**)(ptr - sizeof(char*)) = ptr0;

  printf("ptr %x \t ptr0 %x\n", ptr, ptr0);
  return ptr;
}

static int
icvDefaultFree( void* ptr)
{
  // Pointer must be aligned by CV_MALLOC_ALIGN                                                                                     
  if( ((size_t)ptr & (CV_MALLOC_ALIGN-1)) != 0 )
    return 3;

  printf("ptr free %x\n", ptr);
  printf("ptr-1 %x\n", *((char **)ptr-1));
  free( *((char**)ptr - 1) );

  return 0;
}

int main(int argc, char *argv[])
{

  char *p = (char *)alloc(4097);
  icvDefaultFree(p);
  return 0;
}

How to setup Google C++ Testing Framework (gtest) on Visual Studio 2005

May112009

KARPAR Programming 0 评论

受教了，呵呵，link的库要一个样子呀，呵呵。
What Arlaharen said was basically right, except he left out the part which explains your linker errors. First of all, you need to build your application without the CRT as a runtime library. You should always do this anyways, as it really simplifies distribution of your application. If you don't do this, then all of your users need the Visual C++ Runtime Library installed, and those who do not will complain about mysterious DLL's missing on their system... for the extra few hundred kilobytes that it costs to link in the CRT statically, you save yourself a lot of headache later in support (trust me on this one -- I've learned it the hard way!).

Anyways, to do this, you go to the target's properties -> C/C++ -> Code Generation -> Runtime Library, and it needs to be set as "Multi-Threaded" for your Release build and "Multi-Threaded Debug" for your Debug build.

Since the gtest library is built in the same way, you need to make sure you are linking against the correct version of it, or else the linker will pull in another copy of the runtime library, which is the error you saw (btw, this shouldn't make a difference if you are using MFC or not). You need to build gtest as both a Debug and Release mode and keep both copies. You then link against gtest.lib/gtest_main.lib in your Release build and gtestd.lib/gtest_maind.lib in your Debug build.

Also, you need to make sure that your application points to the directory where the gtest header files are stored (in properties -> C/C++ -> General -> Additional Include Directories), but if you got to the linker error, I assume that you already managed to get this part correct, or else you'd have a lot more compiler errors to deal with first.

from：

http://stackoverflow.com.sixxs.org/questions/531941/how-to-setup-google-c-testing-framework-gtest-on-visual-studio-2005/532124

ido-mode rocks

May012009

KARPAR Emacs 0 评论

Overview of Ido

;;in the .emacs
(require 'ido)
(ido-mode t)
To switch between buffers C-x b:

type some characters appearing in the buffer name, RET to visit the buffer in the front the list.
use C-s (next) or C-r (previous) to move through the list.
[Tab] display possible completion in a buffer (or visit the buffer if there is only one possible completion).
use C-f to fall back to find file or C-b to fall back to switch to buffer.
To find a file C-x C-f:

type some characters appearing in the file name, RET to open the file in the front of the list.
C-s (next) or C-r (previous) to move through the list.
[Tab] display possible completion in a buffer (or open the file or go down the directory if there is only one possible completion).
type RET to go down inside the directory in front of the list.
type [backspace] to go up to the parent directory.
type two slashes to go to the root directory.
if you feel lost type C-f to go back temporarly to the normal find-file
do C-x C-d to enter Dired in this directory
to create a new file matching the text you entered do C-j (note that this is needed if the text you entered matches an existing file, because simply typing enter would rather open the existing one)
After C-x C-f in ido-mode, M-r doesn’t work as expected. With the default find-file, you can regexp search for recently opened files. With ido, the file names do show up in ido-work-file-list but I always get a ‘no earlier matching history item’ when searching for a history item.

More about pionter and array

KARPAR 0 评论

#include <stdio.h>


void strSort(u_char **a, int n);

typedef unsigned char u_char;
#define SIZE 5
int main() 
{
  int i,j;
  
/*   for (i = 0; i != 0; i < 2) */
/*     i++; */
/*   printf("%d\n", i); */

  char* a[2] = {"abc", "ef"};
  
  for (i = 0; i < 2; i++){
    
    for (j = 0; j < 3; j++) 
      printf("%x, ", &a[i][j]);  
    printf("\n");
  }
  

  char **pa = a;
  
  char **p2 = pa + 1;
  
  printf("%x, ", a);
  printf("%x, ", pa);
  printf("%x \n", p2);

  printf("%x, ", *p2 - 1);
  printf("%x, ", p2[-1]);
  printf("%x,\n", p2[0]);
  
  printf("%x, %x\n", *p2[-1] + 1, *p2[0]+1);

  printf("%x, %x\n", *(p2[-1] + 1), *(p2[0]+1));

///
  char b[2][3] = { "aa", "bb"};
  
  char (*pb)[3] = b;

  printf("%c,  %c", pb[0][1], (*(pb + 1))[1]);


////

  u_char *tobesort[SIZE] = {"cd", "ab", "ef", "ac", "dg"};
  
  strSort(tobesort, SIZE);
  for (i = 0; i < SIZE; i++)
    printf("%s, ", tobesort[i]);
  printf("\n");
  return 0;
  
}

#define swap(a, b, c) c = a, a = b, b = c

void strSort(u_char **a, int n)
{
  /* insertion sort of string */
  
  u_char **aj, **ak;
  u_char *s, *t, *tmp;
  
  
  for (aj = a+1; n-- > 1; aj++)
    for (ak = aj; ak > a; ak--) {
      for (s = ak[0], t = ak[-1]; *s && *t ; t++, s++)
        if (*s != *t)
          break;
      if (*s >= *t)
        break;
      swap(ak[0], ak[-1], tmp);
    }
}

c/c++ array memory allocation

Apr292009

KARPAR Programming 0 评论

By declaring char *a[] = {"aa”，“bb"},we get a memory layout

By declaring char ttt[3][3] = {...}, we get a memory layerout

所以前者获得的不是连续内存，而后者是连续内存。a[0]存放的是指向char *类型的指针，而ttt[0][0]存放就是字符内容(char)。a的地址和a[0]不同，而ttt的地址和k的地址一样。当声明char *pa = (char *)a, char * pt = (char *)ttt的时候，pa的地址等于a的地址，pt的地址和ttt的地址一样。由于a[0]的类型为char *,所以pa的类型转换为为(char **)后就可以获得一个指向string的指针。由此引出qsort函数的使用问题，重点当然是compare函数的区别。直接上列子，^_^

char a[1000][20];
qsort(a,1000,sizeof(char)*20,comp);
int comp(const void *a,const void *b
{
return strcmp((char *)a,(char *)b);
}

char* a[] = {....};

int cmp (const void *a , const void *b ){
return strcmp(*(char**)a, *(char**)b);
}

地址的区别例子

#include <stdio.h>

int main() {

char *a[] = {"aa", "bb", "cc"};

char k[3][3] = {"aa", "bb", "cc"};

int i;

char *pa = (char *)a;

int *pINT = (int *)a;

char *pk = (char *)k;

for (i = 0; i < 3; i++) {

printf("a[%d]: %x, ", i, a[i]);

printf("k[%d]: %x \n", i, k[i]);

}

printf("a: %x, ", a);

printf("k: %x\n", k);

printf("pa: %x, ", pa);

printf("pk: %x\n", pk);

printf("*pa: %x, ", *pa);

printf("(char **)pa :%x, ", *(char **)pa);

printf("*pa: %x, ", *pa);

printf("*pk: %x \n", *pk);

printf("pINT %x\n", pINT);

// get second element!

printf("%s\n", *((char **)(pa + sizeof(a[0]))));

return 0;

}

dereferencing pointer to incomplete type

Apr062009

KARPAR Programming 0 评论

这个错误的原因主要是没有把结构体的声明放到头文件，而放在了实现文件里面，当外部进行引用结构体内成员的时候，Gcc就会提示error啦！

News Algorithms Exposed!

Mar132009

KARPAR Programming 1 评论

Y Combinator's Hacker News:

Formula:
(p - 1) / (t + 2)^1.5
Description:
Votes divided by age factor
p = votes (points) from users.
t = time since submission in hours.
p is subtracted by 1 to negate submitters vote.
age factor is (time since submission in hours plus two) to the power of 1.5.

Reddit:

Formula:

Description:

First of all, the time 7:46:43 am on December 8th 2005 is a constant used to determine the relative age of a submission. (It is likely the time the site launched but I have not been able to confirm this) The time the story was submitted minus the constant date is t_s. t_s works as the force that pulls the stories down the frontpage.
y represents the relationship of up votes to down votes.
45000 is the amount of seconds in 12.5 hours. This constant is used in combination with yt_s to "water down" votes as they are made farther and farther from the time the article was submitted.
log₁₀ is also used to make early votes carry more weight than late votes. In this case, the first 10 votes have exactly as much weight as votes 11 through 101.

StumbleUpon:

Formula:
(Initial stumbler audience / # domain) + ((% stumbler audience / # domain) + organic bonus – nonfriend) – (% stumbler audience + organic bonus) + N

Description:
The initial stumbler "power" (Audience of the initial stumbler divided by the amount of times that stumbler has stumbled the given domain) is added to the sum of all the subsequent stumbler's powers.
Subsequent stumbler power is ((Percentage of audience stumbler makes up divided by the number of times given stumbler has stumbled domain) + a predetermined power boost for using the toolbar - a predetermined power drain if stumblers are connected) + (% of the stumbler audience + a predetermined boost for using the toolbar)
N is a "safety variable" so that the assumed algorithm is flexible. It represents a random number.

C语言运算符表

Mar072009

KARPAR Programming 0 评论

C语言运算符表

运算符按照优先级大小由上向下排列，在同一行的运算符具有相同优先级。第二行是所有的一元运算符。

运算符	解释	结合方式
() [] -> .	括号（函数等），数组，两种结构成员访问	由左向右
! ~ ++ -- + - * & (类型) sizeof	否定，按位否定，增量，减量，正负号，间接，取地址，类型转换，求大小	由右向左
* / %	乘，除，取模	由左向右
+ -	加，减	由左向右
<< >>	左移，右移	由左向右
< <= >= >	小于，小于等于，大于等于，大于	由左向右
== !=	等于，不等于	由左向右
&	按位与	由左向右
^	按位异或	由左向右
\|	按位或	由左向右
&&	逻辑与	由左向右
\|\|	逻辑或	由左向右
? :	条件	由右向左
= += -= *= /= &= ^= \|= <<= >>=	各种赋值	由右向左
,	逗号（顺序）	由左向右

bits mirror

Mar062009

KARPAR Programming 0 评论

0 000 000 000 0
1 001 010 100 4
2 010 100 010 2
3 011 110 110 6
4 100 001 001 1
5 101 011 101 5
6 110 101 011 3
7 111 111 111 7
// data: array
// N:size of array

void mirror(char *data, const int N)
{
 int target = 0;
 for (int position = 0; position< N; position++) {
   if (target > position) 
     swap(data[position], data[target];
   int Mask = N;
   while (target & Mask >>= 1){
     // unset
     target &= ~Mask;
   }
   target |= Mask;
 }
}

a linux kernel macro

Mar032009

KARPAR Linux, Programming 0 评论

/*
* Macro used to acces an element in a list.
* Type casting a '0' to type* then accessing its member....?
*/

//this is a common trick for calculating the offset of a member in a
//struct. The offsetof() macro works this way too.

#define list_entry(ptr, type, member) \
((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

//you cheat the compiler, telling it to treat a NULL pointer as a pointer
//to the struct. Then you try to access a member of through that pointer
//and store its address. Because the we used NULL, the result is the
//number of bytes between the struct's beginning and the member's address.
//This whole story is expressed in this expression: &((type *)0)->member))
//Now since we got the offset of a specific member in a struct, we need to
//subtract it from the enclosing struct to get that enclosing struct's
//address: (char *)(ptr)-(unsigned long). Finally, we cast the result to
//the desired type.

/*
* Example of its use
*/
struct super_block {
...
struct list_head s_files;
...
} *sb = &some_super_block;

struct file {
...
struct list_head f_list;
...
} *file;

struct list_head *p;

for (p = sb->s_files.next; p != &sb->s_files; p = p->next) {
struct file *file = list_entry(p, struct file, f_list);
do something to 'file'
}

'#' in Gcc

Mar022009

KARPAR Linux 0 评论

#define __A(v) abcd#v
#define __B(v) abcd##v
#define __C(v) v##abcd
这三个区别是:
__A(test1)展开后变成: abcd "test1" (加双引号并插入空格)
__B(test2)展开后变成: abcdtest2
__C(test3)展开后变成: test3abcd

A bit of Code style!

Mar012009

KARPAR Programming 0 评论

> +static int format_decode(const char *fmt, enum format_type *type, int *flags,
> + int *field_width, int *base, int *precision,
> + int *qualifier)

To

struct printf_spec {
enum format_type type;
int flags, field_width, base, precision, qualifier;
};

and then pass things around like

static int format_decode(const char *fmt, struct printf_spec *spec)
{
..

I suspect it would make the code look nicer too (instead of doing

"*base = x", you'd see "spec->base = x" and it would look less like

line noise in the callee, an the caller could justdo a single "struct

format_spec spec = { 0, }" to initialize that thing).

Linux information localization

Feb202009

KARPAR Linux, Programming 0 评论

Linux software information of Chinese language contains a number of steps as follows.

* Preparation:
Check your Linux platform whether to support the Chinese Locale/X Locale, if we do not support, they can be downloaded from the Internet.
* Setup locale:
Our program should import the string header file and define the macro translation:
# include "locale.h"
# include "libintl.h"
# define _ (String) gettext (String)
# define N_ (String) gettext (String)
# define __ (String) (String) to initialize locale,

Setlocale (LC_ALL, ""); or
Setlocale (LC_ALL, "zh_CN.EUC"); or
Setlocale (LC_ALL, "zh_TW.Big5");

The first line is setting an environment variable, LC_ALL, or special options LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES initialization. The other two directly set GB / GBK or Big5 as locale. For design flexibility, it is best to use the first approach.

* Specify location information:
extracts information specified by the directories and files
#Define PACKAGE "test" //this package name
#Define LOCALEDIR "/usr/share/locale"
bindtextdomain(PACKAGE, LOCALEDIR);
textdomain(PACKAGE);

* Designated information translation:
Specified in the software to translate the string, for example,
Printf ( "% s", _ ( "Chinese"));
Here the string "Chinese" has been designated from the locale file to extract information.

* Information Extraction:
Assumptions test.c procedures has been prepared, you can use the command
xgettext-a-o test.po test.c

The information extracted.

* Translation information:
Test.po information document contains the information to be translated:
Msgid "Chinese"
Msgstr ""

It can be translated into Chinese. If at the same time support GB/Big5, also translated into the two documents.
msgid "Chinese"
msgstr "中文"

* Formatting information:
The above-mentioned documents into test.po formatted documents can be used to extract test.mo:
Msgfmt-o test.mo test.po

* Installation information:
copy test.mo the above-mentioned documents to the definition of the locale directory

A personal web log, by karpar

Pages

对齐吧，哈！

学习OpenCV遇到一段内存分配alignment的代码

How to setup Google C++ Testing Framework (gtest) on Visual Studio 2005

ido-mode rocks

More about pionter and array

c/c++ array memory allocation

dereferencing pointer to incomplete type

News Algorithms Exposed!

C语言运算符表

bits mirror

a linux kernel macro

'#' in Gcc

A bit of Code style!

Linux information localization

Tags

Archive