setsockopt/getsockopt APIs about SO_SNDBUF/SO_RCVBUF option

Time: 2014 年 5 月 12 日
Category: tcp/ip internals

今天写单元测试时发现setsockopt/getsockopt在设置SO_SNDBUF/SO_RCVBUF参数时不对应,getsockopt拿到的并不是setsockopt时的值

下面代码可以说明这个问题:

    on = 1024;
    BUG_ON(tcp_setopt(sfd, TP_NOBLOCK, &on, sizeof(on)));
    BUG_ON(tcp_getopt(sfd, TP_NOBLOCK, &on, &optlen));
    BUG_ON(on != 1024);

内核最终设置的值会发生变化,例如可能double,等等,redhat上的BUG反映:https://bugzilla.redhat.com/show_bug.cgi?id=170694

我的内核版本是:Centos 6.5 2.6.32-431.5.1.el6.x86_64 #1 SMP

翻了一下内核源码,linux-2.6.32-358.6.2.el6/net/core/

/*
 *	This is meant for all protocols to use and covers goings on
 *	at the socket level. Everything here is generic.
 */

int sock_setsockopt(struct socket *sock, int level, int optname,
		    char __user *optval, unsigned int optlen)
{
	// ...
	case SO_SNDBUF:
		/* Don't error on this BSD doesn't and if you think
		   about it this is right. Otherwise apps have to
		   play 'guess the biggest size' games. RCVBUF/SNDBUF
		   are treated in BSD as hints */

		if (val > sysctl_wmem_max)
			val = sysctl_wmem_max;
set_sndbuf:
		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
		if ((val * 2) < SOCK_MIN_SNDBUF)
			sk->sk_sndbuf = SOCK_MIN_SNDBUF;
		else
			sk->sk_sndbuf = val * 2;

		/*
		 *	Wake up sending tasks if we
		 *	upped the value.
		 */
		sk->sk_write_space(sk);
		break;
	// ...

原来如此,内核对输入的参数做了适当调整

Event-driven architecture, state machines et al.

Time: 2014 年 3 月 23 日
Category: nanomsg

http://250bpm.com/blog:25

In my previous blog post I’ve described the problems with callback-based architectures and hinted that the solution may be replacing the callbacks by events and state machines. In this post I would like to discuss the proposed solution in more detail. Specifically, I am going to define what the events and state machines actually are and explain why they are useful.

While the article may be used as an intro to nanomsg’s internal architecture it can be also be though of as an opinion piece of possible interest to anybody dealing with event-driven architectures and/or state machines.

Event-driven architecture

First, let’s define the very notion of an “event”.

In classic object-oriented paradigm “event” translates directly to “message”. Here’s what Wikipedia says: “In OOP, each object is capable of receiving messages, processing data, and sending messages to other objects.”

However, when you look at actual implementation of object-oriented languages like C++ or Java, “messages” are implemented as “methods”, which in the end turn out to be classic function calls.

So, it seems that events are actually function calls and replacing callbacks (= function calls) by events (= function calls) makes no sense. Obviously, there’s some kind of terminological mess here.

In reality, of course, events are not exactly the same as function calls, but in majority of cases they behave in the same way so it’s hard to tell the difference. To sort it out, let’s illustrate the difference using an example.

Imagine method foo() of object A calling method bar() of object B. Object B in turn calls method baz() of object A. It’s a classic example of a callback. The sequence diagram for the scenario looks like this:
events1.png

The obvious ploblem here is that A::baz() is executed while A::foo() is still in progress. What it means is that, on one hand A::baz() may be executed while object A is in inconsistent state (keep in mind that just half of A::foo() have been executed so far!) and on the other hand, state of A changes mysteriosly under the feet of A::foo() while it calls B::bar().

Events, in contrast, are executed in sequential manner. Next event is processed only after first one was fully processed. The sequence diagram for the same scenario would thus look like this:

events2.png

Technically, instead of calling a function there’s an event enqueued. The main event loop processes any events in the event queue in one-by-one manner.

As can be seen the problem encountered previously doesn’t happen any more.

For those who haven’t used event-driven model yet, it may look somewhat hard to implement. However, the implementation is pretty straightforward. Here’s a sample code in C++:

struct event {
    void (*fn) (void *self);
    void *self;
};
queue <event> events;

void event_loop () {
    while (!events.empty ()) {
        events.front ().fn (events.front ().self); events.pop ();
    }
}

void raise (event e) {
    events.push (e);
}

A small technical footnote for implementers of event-driven systems in C: Storing enqueued events in an intrusive containers (queues) seems to be a good idea, especially when doing low-level programming. Put aside the fact that it’s extremely fast, intrusive containers enforce that no event is enqueued twice in parallel. That in turn means that there’s no way to overflow the event queue — the number of events in the queue is strictly limited by the amount of business objects at hand. It’s kind of similar to keeping the stack size limited in kernel-level programming.

Combining events and functions

All the above being said, if you are working with a language where events are not natively supported, turning all the function calls into events is an overkill. Especially, storing all the arguments when event is raised to be retrieved when it is being processed tends to be pretty annoying.

What turned out to be a pretty good solution in nanomsg is arranging all the objects into an ownership tree and using functions when moving from root to leaves and events when moving from leaves to the root:

events4.png

The solution has two nice features:

  1. Most of the invocations in any system are root-to-leave-directed which means that most invocations in the codebase are simple function calls. The events are relatively rare and don’t overwhelm the system.
  2. Given that posting an event is somehow more complex than invoking a function it makes the developer stop and think twice before adding a new leave-to-root-directed invocation. That in turn helps keeping interaction patterns inside the codebase as simple and tree-like as possible.

State machines

As several readers have noted, events alone are sufficient to solve the callback problem and there’s no actual need for full-blown state machines. The question thus is why to use state machines at all.

First, let’s define what we mean by state machine. The “state machines” in computer engineering are not “finite state machines” of automata theory. Forget about Turing-incompleteness of finite state machines or their handiness for implementing regular expressions and tokenisers.

When we refer to state machines we mean objects with small part of their state (typically a single enum) singled out and referred to as the state:

events5.png

As can be seen, nothing changes from the technical point of view. What really happens is that the developer decides what part of the object is going to be the state and communicates that in a formal manner (by naming the member ‘state’), so that any other developer can check the code and understand which member is supposed to be the state.

Singling out the state has nothing to do with automata theory or even with computer science per se. Rather, it has to do with psychology of programming. And given that most developers are not accustomed to think in terms of psychology, let me give a couple of short warm-up examples.

Think of object-oriented programming. There’s no technical reason to split the application into a set of discrete objects. After all, classic procedural programming works no worse that OO programing. The reason why OO programming is preferred is that human brain thinks in terms of objects and manipulating objects, not in terms of object-less actions. By re-arranging a program into objects — which, in reality, means just singling out “this” parameter in a function call — the whole program suddenly becomes much easier to grasp intuitively.

As an analogy, think of human sense of sight. The AI research made it clear that what initially looked like a relatively simple sense actually involves extremely sophisticated, albeit unconscious, computing machinery inside the brain. Using sight we are able to immediately and effortlessly spot minor details that we would have extremely hard time to notice if we were presented with the binary format of the same picture.

The point to take from the analogy is that human ability to deal with certain tasks varies by whole orders of magnitude depending on the presentation of the problem (e.g. picture vs. binary dump of the bitmap). In the case of OO programming we take advantage of the fact that we are very good at manipulating objects. The machinery for that is hard-wired into the brain and we can use it automatically, without thinking about it, or even noticing that any work was done.

But let’s get back to the state machines now. Is there a similar unconscious mechanism in play there?

Well, it turns out that human brain is extremely good at grasping narratives (stories) and fares relatively poor at dealing with almost anything else. The fact is widely acknowledged in social sciences. To give a current example, press deals pretty well with reporting the story (a narrative) of Edward Snowden fleeing American government, but doesn’t do nearly as well with reporting the immensely more important fact (not a narrative) of global surveillance. The audience readily absorbs the story but becomes distracted and even disinterested when presented with abstract facts about inner working of NSA.

That being said, the state machines are an instrument for turning abstract processing mechanisms (objects) into narratives.

All you have to do is to draw all the possible states as boxes and connect them by arrows representing valid state transitions. Let’s have a look at the TCP protocol state machine, for example:

events3.png

If you start with the CLOSED state and follow the arrows you can generate life stories (narratives) of different TCP endpoints. Do that for two minutes and you’ll get an good idea of how TCP endpoint works.

Now imagine that instead of the abstracted state machine you would be presented with the full state of a TCP endpoint (including all the RTOs, CWNDs etc.) and the algorithms to manipulate that state. Because of the missing narrative structure, you would have a very hard time to figure out what’s going on.

To throw in a bit of personal experience, I’ve done some coding with events alone, with no explicit state machines. It turned out to be very hard to get it right. When thinking about it in the retrospect the problem wasn’t caused, interestingly, by the lack of a narrative. Rather, it was the fact that human brain is so good at generating narratives that new short-lived narratives popped into my head all the time. And with no set formally-defined narrative to rely on, they were often mutually incompatible, even contradictory, which in turn led to confusion and incoherent, buggy code.

Combining events and state machines

While it is possible to use events without state machines (as already discussed) or state machines without events (using standard methods instead of events to drive the state machine) the combination of the two results in an extremely powerfull tool for implementing asynchronous systems. In the next blog post I would like to dive into technicalities of such implementation, including topics such as structuring of the code, handling of arguments and return values, nested state machines et c.

Martin Sústrik, August 1st, 2013

眼界和梦想

Time: 2014 年 2 月 22 日
Category: Life

早上一起床,和大叔(大学舍友)聊天,惊闻(only me, maybe…)大飞到CMU大学(卡内基梅隆)读计算机了,很是震撼,包括上一年小明考上研究生了,等等,这些事情对我来说都是没怎么想到的。

到底是什么因素,导致我们走在了不同的路上,是因为眼界和梦想吗?

2013总结

Time: 2013 年 12 月 22 日
Category: Life

时间过的真快,转眼就已经毕业一年半了,回忆一下2013年,主要是工作,顺带对一些家庭和生活的感悟,算是对自己的一个总结吧。

一、关于工作

2013年的工作主题:

  1. 理解基础系统

2013年所做的过的事情:

  1. 开发分布式存储系统
  2. 研究git核心代码
  3. 研究c语言编译器lcc源码,go语言编译器runtime的实现,lua虚拟机的实现
  4. 研究xv6操作系统源码
  5. 研究linux内核文件系统源码,tcp/ip协议栈的实现,参与社区开发
  6. 研究openstack/分布式存储系统ceph,理解云计算的基础架构
  7. 开发pio

2013年花了大部分时间在基础系统的学习和理解上,目标是初步实现基础完备,为后面的深入研究打下基础。2013年最大的收获就是找准了自己对方法论的定义:是一套指导编码和实施的原则和方法的集合。以及找到了积累方法论的途径:在工作中不断的理解,反思,和总结。

2013年的不足和自我总结:

  1. 悟性不够,思维不够敏捷,需要提高大脑的活跃度,善于思考
  2. 代码产出低
  3. 学习的过程没有记录,所谓好记性不如烂笔头

2014年的工作主题:

  1. 基础架构
  2. 软件质量保障

2014年需要TODO的:

  1. 坚持每周写博文
  2. 15w的代码产出
  3. 关注基础架构,并思考实际编码中的软件质量保障,希望可以积累出一些相关的方法论。
  4. 多看书,上豆瓣积累一个书单

二、关于家庭和生活

9月份后给家里打电话的次数就越来越少了,一个月偶尔一两次,这一点做的很不好。应该多关心家里,父母身体不好,做孩子的应该上心。另外,和老婆沟通的也不够,导致她对毕业季的找工作安排不妥当,需要检讨。

其次,没能够坚持运动,注意身体,避免职业病。

htc touch hd升级安卓2.3.7

Time: 2013 年 11 月 23 日
Category: Life

最近为了响应公司的号召,体验一把微米,于是决心研究下怎么把老爷机htc touch hd从屌丝wm6.1逆袭成安卓2.3.6。牛人早就把材料准备好了,移步请看:

  1. [ROM] {NAND} AOSP-GBX | < Kernel: 2.6.27 > | 2013.01.20
  2. Installation guide from Windows Mobile to the Android

研究了一下,安卓过程大概如下:

  1. 首先升级机器的HardSPL,我的是SPL1.56,对于经常刷机的机油来说,1.56不陌生了,我的默认就是。
  2. win xp请安装ActiveSync 4.5,vista/win7/win8会自动安装驱动,一般在你手机USB连接电脑时,自动搜索和更新驱动程序。然后HTC与电脑连接
  3. 从googlecode上下载rom文件,http://code.google.com/p/nandroid-for-htc-blackstone/downloads/list,我下载的是  FagyiDROID-V1-ODEXED.zip
  4. 从mediafile上下载其他刷机需要的文件,http://www.mediafire.com/?85m45naii6dpk,只需要三个即可:Adb-fastboot.zip,BLACIMG.NBH,BlackstoneAdvancedRUU和recovery.img

其他过程按照xda指引来做就是了,非常简单。刚装完是英文的,重启后变成中文了,真不错。

密码保护:谈软件质量保障

Time:
Category: Programming practices

这是一篇受密码保护的文章。您需要提供访问密码:


向beyond致敬–被影响的一代人

Time: 2013 年 11 月 22 日
Category: Life

一直以来可以坚持的事情有三:

  1. 看星爷的电影
  2. 听beyond的歌
  3. 陈扬的心灵地图电台

陈老师的电台在高二那年断了,非常可惜,虽然这几年还时不时拨弄一下收音机,但是再也听不到能匹美陈老师那么富有磁性的声音了,^_^。在那个连mp3和md都还不知道是什么东东的年代,大部分学生对心灵地图这个节目应该很熟悉的

另外就是beyond乐队,受beyond影响最深的一代应该80后,但是到我们读初中那会,很多人仍然是beyond的fans,主要是因为beyond的歌确实很励志。我记得我初中的第一个晚上时,因为是农村,条件没城里那么好,宿舍就是食堂二层临时搭建的工床,宿舍里热闹的炸开了窝,这个时候就两个人在玩弄磁带机,一个是我,在听心灵地图,另一个是我旁边的哥们,在听别安的歌,后来我们成了兄弟。

每个人都有一段很长的故事,偶尔回忆一下。。。

c语言可变参数类型

Time: 2013 年 11 月 21 日
Category: Programming practices

使用可变参数时,va_arg宏的第2个参数不能被指定为char、short或者float类型。因为在可变参数函数传递时,char和short会被提升为int类型,而float会被提升到double类型 。

例如,以下的代码是错误的

a = va_arg(ap, char);

因为我们无法传递一个char类型参数,如果传递了,它将会被自动转化为int类型。上面的式子应该写成:

a = va_arg(ap, int);

coredump记

Time: 2013 年 11 月 17 日
Category: Programming practices

造成程序coredump的成因很多:

  1. index类的索引造成的数组越界
  2. 多线程环境下使用了非线程安全的函数
  3. 多线程与释构
  4. 多线程环境下的数据互斥
  5. 空指针
  6. 对象delete后仍然被访问

为了避免这些情况,需要谨慎处理所有相关的资源:

  1. 对象必须全部初始化后才能使用
  2. 尽可能的不要缓存对象,使用统一的内存管理来避免内存碎片化的问题。除非你确定对象被重用时,相关的资源被正确的初始化。
  3. 对象该由谁创建,由谁来释放,delete需要全局感知,避免游离的对象指针
  4. 对于共享的资源,注意互斥和死锁
  5. 全局对象使用指针,统一new和delete
  6. 特别注意资源进入临界区的时机,例如初始化工作确认完毕。

数据+结构

Time: 2013 年 10 月 21 日
Category: Programming practices

数据是业务流密切相关的东西,容易发生变化,而结构通常都是大同小异。很多时候我们不得不为了各种各样的需求而设计复杂的数据结构。而且一旦它们之间呈现出复杂的依赖关系时,你会发现这就是一个噩梦:

  • 结构的逻辑变的相当复杂,而且难以分离
  • code review变得困难,让质量无法保证,代码难以维护

能不能让这个事情变的更简单一些呢?看看内核的红黑树和链表:

  • 链表
    • http://lxr.free-electrons.com/source/include/linux/list.h
  • 红黑数
    • http://lxr.free-electrons.com/source/include/linux/rbtree.h
    • http://lxr.free-electrons.com/source/lib/rbtree_test.c

一旦你习惯了这种设计模式,会发现它很有好处,而且可以轻松巧妙的解决很多问题。例如,在网络编程中,定时器是频繁被用到的一个组件,你会怎么设计它?

第 1 页,共 9 页12345...最旧 »