Ehren's Blog

Function declaration escape analysis

Posted in Seneca by ehren on April 4, 2010

It’s been quite a while since I’ve blogged. Frankly I’ve got behind with my work. However, I believe I have hit a huge breakthrough finding dead code. I don’t want to get too ahead of myself but if things work as I think, I can now identify every unused function/member function in mozilla-central with a near 0 false positive rate.

I’ve mentioned many times that assignments to function pointers are a huge pain when it comes to recognizing call graph edges. I’ve been able to handle function local address taking for some time now but the problem of functions referenced in global variables (usually jump tables) has proved elusive. In fact, it’s currently not possibly to process global variables at all with Treehydra. This leads to thousands of false positives in the analysis and a bunch of special case handling.

It suddenly dawned on me that GCC might have already done the work for me. ipa-type-escape.c, which “determines which types in the program contain only instances that are completely encapsulated by the compilation unit” seems to fit the bill, but I ended up with less than stellar results trying to print out any escaping function declarations. In fact, I don’t think it really is useful for my purposes.

However, the technique of processing global variables using the varpool is exactly what I needed. In fact, I can very easily write a GCC plugin to print off all the globally escaping declarations in a compilation unit. Unfortunately, getting a plugin to build that uses more than the standard set of routines is a bit of a challenge (more stuff needs to be linked in) so I just hacked it into dehydra_plugin.c. It works though!

Here’s the code:

static tree find_funcs_callback(tree *tp, int *walk_subtrees, void *data) {
  tree t = *tp;
    // dump function (I use dehydra specific code)
  return NULL_TREE;
static void find_funcs(tree decl) {
  walk_tree(&decl, find_funcs_callback, NULL, NULL);

// This needs to go in the execute function of an IPA pass.
// I just stuck it into dehydra's gcc_plugin_post_parse
struct varpool_node *vnode;

Now all I have to do is mark every function printed by this routine as escaping. I’ve been able to match callgraph‘s serialization almost completely so this will be a breeze.

I’ve also found that the current way callgraph treats inheritance chains (using a table of ‘implementor’ – ‘interface’ pairs) is not particularly useful for finding dead code. In fact, a whole bunch of functions in the implementors table are being left out of the node table. I’ve been able to rectify this by treating method overriding just like any call edge. In particular, I have the base class ‘call’ the derived class which will fit right into my existing algorithm (becuase of dynamic dispatch, once the base method is called all bets are off on whether or not some derived method is called). In order to get any results previously, I’ve just been identifying base method-derived method pairs by textually matching the prototypes. Looking at my current results, I’ve found a disproportionate number of static methods which suggests that this technique is too conservative. Here’s the old script btw. With these callgraph chages the next one will be much simpler (with no method name parsing!)

I’ve had some success already with dead code in content and particularly in layout, even with this rudimentary script. I also certainly have more to file. With this new approach, though, I think I’ll be able able to find and file all of it by release 1.0.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: