Searching for bugs

In this post, I will talk about another task I’ve been working on:
“Making use of an RCU-protected pointer after passing it to call_rcu() or similar function (“call_rcu_bh()”, “call_rcu_sched()”, “call_srcu()”)”.

First, let’s see what “call_rcu()” does.
The write-side RCU primitives allow the caller to defer an action, such as freeing the object a pointer refers to, until all pre-existing RCU read-side critical sections have finished (meaning that no reader can still be using the old pointer and it is safe to free the object).
RCU’s API provides two ways for this:
– “synchronize_rcu()” (called “synchronize_kernel()” in older kernels)
– “call_rcu()”

When “synchronize_rcu()” is called, it blocks until all pre-existing RCU read-side critical sections have ended.
But in some cases you can’t or don’t want to wait, because blocking would be inefficient. So, instead of “synchronize_rcu()”, you use “call_rcu()”.
“call_rcu()” returns immediately and arranges for the callback function (its second parameter) to be invoked after all pre-existing RCU read-side critical sections have completed.
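
To make the deferred callback idea more concrete, here is a minimal userspace sketch. It is not the kernel implementation: “toy_call_rcu()”, “toy_grace_period_end()”, “struct toy_rcu_head” and “struct item” are hypothetical names that only mimic the shape of the real API, and the end of a grace period is simulated by an explicit function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the kernel's struct rcu_head (hypothetical, userspace only). */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

/* Callbacks queued by toy_call_rcu() and not yet invoked. */
static struct toy_rcu_head *toy_pending;

/* Queue the callback and return immediately; nothing is invoked here. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(struct toy_rcu_head *head))
{
    assert(head != NULL && func != NULL);
    head->func = func;
    head->next = toy_pending;
    toy_pending = head;
}

/* Simulate the end of a grace period: invoke every queued callback.
 * Returns the number of callbacks that were invoked. */
static int toy_grace_period_end(void)
{
    int invoked = 0;

    while (toy_pending) {
        struct toy_rcu_head *head = toy_pending;

        toy_pending = head->next;   /* unlink before the callback frees it */
        head->func(head);
        invoked++;
    }
    return invoked;
}

/* An object that embeds the head, like p in the examples of this post. */
struct item {
    int a;
    struct toy_rcu_head head;
};

/* Counts how many items the callback has freed so far. */
static int toy_freed;

/* Callback: recover the enclosing item from its embedded head and free it. */
static void item_free_cb(struct toy_rcu_head *head)
{
    struct item *p = (struct item *)((char *)head - offsetof(struct item, head));

    free(p);
    toy_freed++;
}
```

After “toy_call_rcu(&p->head, item_free_cb)” returns, the item is still allocated, but it is freed as soon as the simulated grace period ends. In the kernel you don’t control when that happens, which is why touching “p” after the call is dangerous.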

Why do we need “call_rcu_bh()”, “call_rcu_sched()” and “call_srcu()”?
That is because of RCU’s flavors:

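RCU-sched treats any region with preemption disabled, including hardware interrupt handlers, as a read-side critical section, so its grace periods also wait for those regions.
RCU-bh treats regions with softirqs (bottom halves) disabled as read-side critical sections; it exists so that grace periods keep completing under network denial-of-service attacks.
SRCU permits sleeping in RCU read-side critical sections.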

The key to solving this task is to identify when the pointer goes dead, that is, the point after which the object it refers to may already have been freed.
Examples:

/* BUG */
call_rcu(&p->head, func);
/* the object p points to may already have been freed */
p->a = 1;

/* OK */
call_rcu(&p->head, func);
/* a new object is stored in the same pointer variable, so p is valid again */
p = kmalloc(sizeof(*p), GFP_KERNEL);
p->a = 1;

I started with this Coccinelle semantic patch, which looks for a “call_rcu()” call and checks whether the pointer is used after that call.

@@
identifier f, p;
@@

f(...) {
... when any
* call_rcu((<+...p...+>), ...);
... when any
* (<+...p...+>)
... when any
}

For the similar functions, I replaced “call_rcu()” with the function of interest.
I didn’t find bugs using this approach, but I will show you some cases that at first sight might look like bugs.

– file “kernel/rcu/rcutorture.c”:

rcu_read_lock(); /* Make it impossible to finish a grace period. */
call_rcu(&rh1, rcu_torture_leak_cb); /* Start grace period. */
local_irq_disable(); /* Make it harder to start a new grace period. */
call_rcu(&rh2, rcu_torture_leak_cb);
call_rcu(&rh2, rcu_torture_err_cb); /* Duplicate callback. */
local_irq_enable();
rcu_read_unlock();

This case (the two calls with rh2) is OK: the duplicate callback is posted on purpose, for debugging.

– file “net/bridge/br_multicast.c”:

if (!old)
    goto out;

call_rcu_bh(&mdb->rcu, br_mdb_free);

out:
    rcu_assign_pointer(*mdbp, mdb);

A few lines before the “call_rcu_bh()” call there is a “goto out”, so when the “if” condition is true (that is, when “old” is NULL) execution jumps straight to the label and never reaches the call.

– file “arch/powerpc/mm/hugetlbpage.c”:

call_rcu_sched(&(*batchp)->rcu, hugepd_free_rcu_callback);
*batchp = NULL;

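Everything is OK here: the statement after the call only stores NULL in the same pointer variable, it doesn’t dereference the pointer that was passed to “call_rcu_sched()”.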

The first approach was the normal one and the simpler one to start with, but every case that I found was OK.

The second approach is the following:
first step: find “call_rcu()” calls whose first argument involves a global variable, like “call_rcu(&g->head, …)” (g is a global variable).
second step: find the functions that contain these calls.
third step: find the functions that call the functions found in the second step.
fourth step: check whether the functions found in the third step use the protected pointer in a bad way.
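
The kind of code these four steps are after can be sketched in plain userspace C. Everything here is hypothetical and only for illustration: “toy_call_rcu()” just records the callback instead of waiting for a grace period, and “struct conf”, “fn()”, “ff_suspicious()” and “ff_ok()” are made-up names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct rcu_head (hypothetical). */
struct toy_rcu_head {
    void (*func)(struct toy_rcu_head *head);
};

struct conf {
    int a;
    struct toy_rcu_head head;
};

static struct conf conf_a, conf_b;

/* Step 1: a global pointer that ends up being passed to call_rcu(). */
static struct conf *g = &conf_a;

/* Hypothetical callback; the kernel version would free the object. */
static void conf_free_cb(struct toy_rcu_head *head)
{
    (void)head;
}

/* Toy call_rcu(): only records the callback, which marks the object
 * as handed over. No grace period is modelled in this sketch. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(struct toy_rcu_head *head))
{
    assert(head != NULL && func != NULL);
    head->func = func;
}

/* An object is "handed over" once a callback has been recorded for it. */
static int handed_over(const struct conf *c)
{
    return c->head.func != NULL;
}

/* Step 2: fn() is the function that contains the call on the global. */
static void fn(void)
{
    toy_call_rcu(&g->head, conf_free_cb);
}

/* Steps 3 and 4: ff_suspicious() calls fn() and then still writes
 * through g. Returns 1 because the object written to was handed over. */
static int ff_suspicious(void)
{
    fn();
    g->a = 1;
    return handed_over(g);
}

/* OK variant: g is pointed at a fresh object before it is used again.
 * Returns 0 because the object written to was never handed over. */
static int ff_ok(void)
{
    fn();
    g = &conf_b;
    g->a = 1;
    return handed_over(g);
}
```

The use in “ff_suspicious()” can’t be seen by looking at “fn()” alone, because the call and the later use sit in different functions. That is why the search has to go one level up the call chain.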

The Coccinelle script used for this:

@ locally @
identifier l;
type t;
position p;
@@

t l;
... when any
call_rcu@p((<+...l...+>), ...);

@ globally @
identifier fn;
identifier g != locally.l;
position p2 != locally.p;
@@

fn(...) {
... when any
call_rcu@p2((<+...g...+>), ...);
... when any
}

@ other_func @
identifier globally.fn, ff;
@@

ff(...) {
... when any
* fn(...)
... when any
}

In the first rule, I look for uses of “call_rcu()” whose first argument involves a local variable, and I record their positions. In the second rule, I use those positions to exclude the calls where the variable might be local, which leaves the calls made on non-local variables. In the third rule, I search for calls to the functions found by the second rule.

In a few days I will tell you about two interesting cases that I found using this script and about their solutions (or whether they need one at all).
