Ehren's Blog

The Dead

Posted in Uncategorized by ehren on March 24, 2012

that region where dwell the vast hosts of the dead

Conservatively detecting unused functions in a large codebase requires more information than can be gleamed from a simplistic observation of compilation (using GCC or clang, for example). Functions called by ASM alone are one problem. Functions called only via dlsym are another. These are not insurmountable problems (ASM can be parsed; dlsym accepts a constant identifier).

In addition, mozilla-central contains a large amount of third party code. Removing unused functions in these libraries is at odds with merging upstream changes. There are also large numbers of functions used on one platform (or build setting) but not another. Propagating the appropriate #ifdefs is the solution here but it’s a large task.

I’d like to offer one approach with an eye towards identifying the easiest candidates for removal. This is a combination of some of my existing work, by now admittedly crusty, based on Dehydra and Callgraph, but with a new (too me) simple insight about how to group the members of the subgraph of unused functions: partition them by the “is called by or is a (dead) caller of” relation.

This has the effect that if a false positive is in the results, all functions declared unused because they’ve been transitively called by the false positive get grouped with that false positive.

The output of a tool employing the above is here (source here).

Some notes on the results

The largest partition consists of 541 nodes (warning: giant file). It is the result of a single false positive, legacy_Open, called only via PR_FindFunctionSymbol, a dlsym wrapper. I have attempted to mitigate these situations by colouring green those functions having __attribute__(visibility(“hidden”)) (visibility specified in the flags passed to GCC is also detected). The count of functions having this visibility, but not transitively called by a function that doesn’t, is listed in the ‘Transitively Hidden’ column. This info is useless here, however, because the function is exported using a linker map.

It would be interesting to determine if the visibility is always correctly applied; I seem to remember seeing warnings related to the attribute in the past. There may be technical reasons why internal C++ must be visible but, in any case, its visibility does not preclude its removal. Here’s proof. (Also, I calculate that only 17.4% of mozilla-central has hidden visibility).

Moving on, the next largest partition has 507 nodes. They are all in the pulled-in chromium sources however. Apparently, it is genuinely dead because the minimum ranked note, evrpc_register_rpc, is only used in a test case (via a macro).

xptcstubs’ PrepareAndDispatch is an example of a false positive due to ASM (graph here).

Anyhow, I will leave the reader to examine the rest which gets more interesting once you skip the mega partitions.

Potential improvements

The non-callgraph portion of the tool is really just a couple of hacky scripts. The same thing could be implemented against the DXR callgraph (it would need to determine the targets of function pointers in some regard).

On this topic of handling function pointers, note that the call graph of this tool uses the naive address-taken approach. However, it has been ‘optimized’ for the dead code problem by treating the taking of a function’s address inside the scope of a function as a normal call. This makes the call graph useless for practically anything else but I plan on adding an addressTaker/addressTakee table in the future.

In fact, such an addition would make it easier to implement a simple type-based alias analysis where functions noted to have their address taken at global or local scope are matched against pointers using their return type and the type of their parameters (they’d then be pruned from the address taken list). I’m not sure if there would be problems with this beyond unsafe casting.

(More precise attempts at function pointer may-alias analysis would first require an interprocedural control flow graph but that is the simplest step. I’m somewhat tempted to build an ICFG using a hackish framework where visiting a function body = recompiling a .ii file. Were one to go about this, doing some basic form of interprocedural constant propagation and seeing what unused blocks fall out might be more fruitful.)

In any case, I do have some numbers regarding how many more functions one could expect to declare dead by improving alias information. At one point I had made a change that affected the creation of SQL statements building the graph. By mistake I had left out the generation of SQL for all address taking, regardless of scope. This resulted in the tool flagging just over 50000 functions as dead. Given that it currently flags 11479 out of 187081 (6% — seems low), one might expect to find another 2300 (6% of 39000) or so (these numbers include declarations, definitions, gcc builtins and the like). Although come to think of it, that’s a dubious estimate because function scope address taking is already handled in an, albeit, imprecise manner and the number of functions whose address is taken at file scope is only 5949 (using the 6% figure, only 357 more would be found).

Finally, another area of unnecessary conservatism is that, as you’ll note in graphs such as this one, there is an edge going both ways from a method declared in a base class to its override in a derived class (for example, there is an edge from nsSVGAElement::GetLinkState() to Link::GetLinkState() because of a direct function call, but there are also two edges linking nsSVGAElement::GetLinkState() and nsIContent::GetLinkState() because of their inheritance relationship). The edge really only needs to go from the base to the derived though. Once this is changed, the tool will presumably flag instances where a method overrides a base class method but all calls go through the derived implementation.

RTL level function removal

Posted in Seneca by ehren on April 21, 2010

Over the past few days I’ve been focusing on getting the call graph portion of my dead code analysis in check. It turns out that function local const (or static) initializations are not accessible during later gcc passes. Luckily, walking the front end tree representation, which is accessible via Treehydra‘s process_cp_pre_genericize, does the trick. This takes care of all remaining false positives of which I am aware.

The downside is that going through all these extra tree codes is sloooow. After a bunch of false starts the damn thing is still running on one of the CDOT‘s development machines (probably an 8+ hour compile).

For a little while though, I’ve been thinking of ways to automatically remove identified dead code without actually patching the source. By applying Taras’ assembler name patch to dehydra I can now identify precisely which code to remove. The question now is how to remove it.

My first thought was a hack using objcopy. I could first output a list of the mangled names of dead functions and then, after running a build with -ffunction-sections, run a script like this:

#!/bin/sh
while read asmname
do
  find objdir -name "*.o" | xargs objcopy --remove-section=".text.$asmname"
done < asmlist.txt

(and then relinking). This works, but only for non-member functions.

The other option was some sort of gcc hack to remove as much code as possible when obcopy can’t do the job. I first tried removing every statement in every basic block of the function (see gsi_remove). This seems to only work with a non-branching cfg however (even when I leave a valid return statement). I then tried cgraph_remove_node with an IPA pass plugin which blows up if a function’s referenced anywhere else.

Today I arrived at a solution that, although requiring a direct patch of GCC, seems to be ideal. Surprisingly, it’s possible to hook in right before assembly generation, and it’s easy too:

diff --git a/gcc/final.c b/gcc/final.c
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -4090,16 +4090,34 @@
 {
   if (symbol_queue)
     {
       free (symbol_queue);
       symbol_queue = NULL;
       symbol_queue_size = 0;
     }
 }
+
+static bool
+is_dead(const char* name) 
+{
+  char asmname[100];
+  FILE* fp = fopen("/home/ehren/asmlist.txt", "r");
+  if (!fp)
+    return false;
+
+  while (fscanf(fp, "%s", asmname) != EOF) {
+    if (strcmp(asmname, name) == 0) {
+      fclose(fp);
+      return true;
+    }
+  }
+  fclose(fp);
+  return false;
+}
 
 /* Turn the RTL into assembly.  */
 static unsigned int
 rest_of_handle_final (void)
 {
   rtx x;
   const char *fnname;
 
@@ -4109,17 +4127,19 @@
   x = DECL_RTL (current_function_decl);
   gcc_assert (MEM_P (x));
   x = XEXP (x, 0);
   gcc_assert (GET_CODE (x) == SYMBOL_REF);
   fnname = XSTR (x, 0);
 
   assemble_start_function (current_function_decl, fnname);
   final_start_function (get_insns (), asm_out_file, optimize);
-  final (get_insns (), asm_out_file, optimize);
+  if (!is_dead(fnname)) {
+    final (get_insns (), asm_out_file, optimize);
+  }
   final_end_function ();
 
 #ifdef TARGET_UNWIND_INFO
   /* ??? The IA-64 ".handlerdata" directive must be issued before
      the ".endp" directive that closes the procedure descriptor.  */
   output_function_exception_table (fnname);
 #endif

With this, functions are removed completely (except from the symbol table) and the bodies of virtuals are replaced with a couple words worth of NOPs. Yes, opening up the file with a hardcoded path for every function is ugly but later on I could always read it into a global somewhere else (and do a binary search).

The only downside here is I won’t get any link time errors if a few false positives slip through, as opposed to with objcopy. In my experiments, calling a function that doesn’t exist results in an immediate segmentation fault (makes sense), but storing the return of a NOP-body virtual just leaves you with an uninitialized value.

Hopefully, I’ll soon have some good results on the analysis front to actually test this on Mozilla.

Dead code progress

Posted in Seneca by ehren on April 7, 2010

So far things are on track with my attempts to developed an unused function finding tool. Now that the function pointer/jump table problem has been solved other more subtle issues have come to light.

The first was a problem with callgraph‘s handling of inheritance chains. As I mentioned previously, it was necessary to add each method to the node table (see schema reference) both when the method’s body is processed (as is already the case) but also when the method’s type is processed. At some point I should really develop some tests here but this affects the recognition of pure virtual functions in a number of complicated cases.

However, I also ran into another issue where certain methods were not finding themselves into the inheritance chains in which they belong. This seems to be only when a virtual function overrides a base class function that has not been defined in the ‘next up’ base class (A defines virtual foo, B derives from A, C derives from B and redefines virtual foo). This could be a Treehydra issue or a maybe a problem with the GCC binfo data structure (or maybe I’m just misunderstanding things).

Either way, my solution has been to process all base classes and subclasses of a method both when the type is processed and also when the method body is processed. This appears to be a working solution (although it certainly does not improve callgraph compilation times).

Once this was handled I started to get some pretty good results but I noticed scriptable methods were getting into the mix. After a few hours of fruitless hacking it turned out I just forgot to properly define NS_SCRIPTABLE (since I’m not running a –with-static-checking build). After rebuilding again, I believe I finally attained a 0% false positive rate.

This time I hit another problem though. A bunch of genuinely dead methods turned up by my most recent (defective) analysis were not showing up. In fact, very few methods were showing up at all. Investigating, it turns out I’ve been quite overzealous in marking a method as scriptable. My previous technique was to check if __attribute__((user("NS_script")) was present in the declaration attributes of a function and also to check if it is present in the type attributes of the class and any base class. This excludes a bunch of juicy dead stuff like nsCharsetMenu::SetCharsetCheckmark (gimple isn count 75 ftw) which is a member of a non-scriptable class that derives from two scriptable interfaces (which do not declare SetCharsetCheckmark).

Naturally, the solution when marking methods scriptable because of their base classes is only to mark the method scriptable when the base class declares the method and is scriptable. Come to think of it, I probably don’t even need to do this because of the way I group together base and derived methods.

Anyway, my current status is waiting for a build with these changes to finish. We will see if there are more issues.

Function declaration escape analysis v2

Posted in Seneca by ehren on April 6, 2010

I don’t want to get too excited until I’ve run this through 4000000 loc but I believe I’ve solved the problem of being unable to process global initializations of const/static global variables. Earlier, I posted a message to the GCC mailing list describing my troubles with varpool. I did receive a helpful response in that there is nothing inherent about const/static declarations that would prevent one from getting at the lhs of their initialization.

Today I experimented with walking as many trees in as many places as I could find without much luck. I then tried compiling this sample code with -Wunused:

int foo() {
  return 0;
}

typedef struct {
  int (*p) ();
} Table;

static Table t[] = {
  { foo }
};

As expected, GCC warns about the unused static variable.

Taking a look at toplevel.c It didn’t take too long to find a solution (and this works for const, static, and const static):

Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c  (revision 157978)
+++ gcc/toplev.c  (working copy)
@@ -844,12 +844,26 @@
   return output_something;
 }

+static tree 
+find_funcs_callback(tree *tp, int *walk_subtrees, void *data)
+{
+  tree t = *tp;
+
+  if (TREE_CODE(t) == FUNCTION_DECL)
+    fprintf(stderr, "address held: %s\n", IDENTIFIER_POINTER(DECL_NAME(t)));
+
+  return NULL_TREE;
+}
+
 /* A subroutine of check_global_declarations.  Issue appropriate warnings
    for the global declaration DECL.  */

 void
 check_global_declaration_1 (tree decl)
 {
+  if (DECL_INITIAL(decl))
+    walk_tree(&DECL_INITIAL(decl), find_funcs_callback, NULL, NULL);
+
   /* Warn about any function declared static but not defined.  We don't
      warn about variables, because many programs have static variables
      that exist only to get some text into the object file.  */

I’ve got to fix up the output to match callgraph‘s serialization but this could be it.

Problems with const static initializations

Posted in Seneca by ehren on April 5, 2010

Unfortunately I spoke too soon about developing an airtight dead code finder. The technique of processing file scope variables I mentioned in my previous post has a serious drawback: it doesn’t work for const static data. This is a show stopper when it comes to peeking into most jump tables.

I’ve been able to print type information for all globals using the dehydra hooks placed into c-common.c however it seems like const initializations are not even handled at this level. I have my suspicions that there’s no way to recover the FUNCTION_DECL node in this case, likely because gcc has no use for the info at this level.

Although I may be able to make do by simply manually filtering as many callback functions as I can this approach is not quite ideal. I’ll have to think more about this but I’m now thinking that the lto streamer might be of use. There’s also the chance that there’s another way of using the cgraph to get at this data.

The other possibility is ditching gcc entirely and using elsa to dump the data. I’ll report back when I know more.

Function declaration escape analysis

Posted in Seneca by ehren on April 4, 2010

It’s been quite a while since I’ve blogged. Frankly I’ve got behind with my work. However, I believe I have hit a huge breakthrough finding dead code. I don’t want to get too ahead of myself but if things work as I think, I can now identify every unused function/member function in mozilla-central with a near 0 false positive rate.

I’ve mentioned many times that assignments to function pointers are a huge pain when it comes to recognizing call graph edges. I’ve been able to handle function local address taking for some time now but the problem of functions referenced in global variables (usually jump tables) has proved elusive. In fact, it’s currently not possibly to process global variables at all with Treehydra. This leads to thousands of false positives in the analysis and a bunch of special case handling.

It suddenly dawned on me that GCC might have already done the work for me. ipa-type-escape.c, which “determines which types in the program contain only instances that are completely encapsulated by the compilation unit” seems to fit the bill, but I ended up with less than stellar results trying to print out any escaping function declarations. In fact, I don’t think it really is useful for my purposes.

However, the technique of processing global variables using the varpool is exactly what I needed. In fact, I can very easily write a GCC plugin to print off all the globally escaping declarations in a compilation unit. Unfortunately, getting a plugin to build that uses more than the standard set of routines is a bit of a challenge (more stuff needs to be linked in) so I just hacked it into dehydra_plugin.c. It works though!

Here’s the code:

static tree find_funcs_callback(tree *tp, int *walk_subtrees, void *data) {
  tree t = *tp;
  
  if (TREE_CODE(t) == FUNCTION_DECL) {
    // dump function (I use dehydra specific code)
  }
  
  return NULL_TREE;
}
 
static void find_funcs(tree decl) {
  walk_tree(&decl, find_funcs_callback, NULL, NULL);
}

// This needs to go in the execute function of an IPA pass.
// I just stuck it into dehydra's gcc_plugin_post_parse
struct varpool_node *vnode;
FOR_EACH_STATIC_VARIABLE(vnode)
  find_funcs(DECL_INITIAL(vnode->decl));

Now all I have to do is mark every function printed by this routine as escaping. I’ve been able to match callgraph‘s serialization almost completely so this will be a breeze.

I’ve also found that the current way callgraph treats inheritance chains (using a table of ‘implementor’ – ‘interface’ pairs) is not particularly useful for finding dead code. In fact, a whole bunch of functions in the implementors table are being left out of the node table. I’ve been able to rectify this by treating method overriding just like any call edge. In particular, I have the base class ‘call’ the derived class which will fit right into my existing algorithm (becuase of dynamic dispatch, once the base method is called all bets are off on whether or not some derived method is called). In order to get any results previously, I’ve just been identifying base method-derived method pairs by textually matching the prototypes. Looking at my current results, I’ve found a disproportionate number of static methods which suggests that this technique is too conservative. Here’s the old script btw. With these callgraph chages the next one will be much simpler (with no method name parsing!)

I’ve had some success already with dead code in content and particularly in layout, even with this rudimentary script. I also certainly have more to file. With this new approach, though, I think I’ll be able able to find and file all of it by release 1.0.

Dead code update (and problems finding uninitialized class members)

Posted in Seneca by ehren on February 25, 2010

Well, I’ve fallen way behind with this blog again. Unfortunately, what I’ve done so far hasn’t worked out exactly as expected either. Anyway, here’s an update.

Finding Dead Code
I’ve refined the script I posted previously to take into account the fact that indirect calls to virtual functions register in g++ as calls to the base method (among a few other things). It can be viewed here. Running this new analysis, I’ve learned that finding dead methods is even more difficult than finding dead functions. Why? A tonne of stuff is exported as part of the Mozilla public api. A function or even an entire class may seem dead but it’s likely used via javascript/xpcom inside the mozilla tree (or maybe in an extension).

As far as I know, anything that derives from a class defined in an idl file probably can’t be marked dead and I suspect detecting this case will be most difficult. One thing I could take into account, however, are methods marked as NS_SCRIPTABLE ie __attribute__((user("NS_script"))). It would be easy to add this info to the node table in callgraph. Also, in regard to plain functions, I think it would be prudent to register function local address taking (of functions).

On the bright side I think I’ve found a handful of unused functions (view sample output of the script showing these here). I may be able to gleam more info once I’m able to actually query my paths table though. Unfortunately, the latest incarnation of the script is only about 85% finished:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                         
28977 ehren     20   0 17804  12m 2132 R 100.1  0.4   2331:07 python

That’s CPU time!

Uninitialized class members
I’ve also put a bit of work into a more immediate analysis. Bug 525063 proposes an error for any class with uninitialized data members. It seemed like an interesting project perhaps as an introduction to ESP analysis (especially since the only state transitions are assignments and passing by reference). In fact, dmandelin’s already done work on an analysis for uninitialized variables in general which I originally attempted to adapt. This was not the easiest task however, although I have just recently learned why I was not successful (I think it relates to a malfunctioning of create_decl_set which I have yet to diagnose).

Giving up on ESP for the moment, I tried my hand doing things the simple way. That is, if there’s any code at all in the constructor that initializes a variable, then I consider that variable initialized.

To accomplish this, I first wrote a rather ugly and unreadable version of the script. I then polished it up a bit. My days of using JS arrays for maps and sets is hopefully over and, in fact, I think I’d be able to take another run at an ESP analysis now. Unfortunately though, running my simple script has uncovered more fundamental problems.

The bug comments mention classes with an overridden operator new (containing a memset(0) to initialize the data members) as one special case but this is nowhere near the largest problem. Consider this mdc page on xpcom:

You can provide an initialization function for your class. This will be called immediately after your class is allocated and the constructor is called. The init function takes 0 arguments, returns an nsresult, and must be public. You can call it anything you like, just reference it from NS_GENERIC_FACTORY_CONSTRUCTOR_INIT (discussed below).

Now we have the situation of variable initialization split across multiple functions. In fact, it’s even more complicated! Here’s an example:

mCSSAware (part of nsHTMLEditor) is one ‘uninitialized’ member turned up by the script. Rather than being directly initialized in Init() (or in the constructor), the initialization takes place in UpdateForFlags which is called from Init() here. Luckily in this case the initializations are all within the same translation unit although I suspect this is not always the case.

So, what would be the solution? If it’s worth doing, some kind of interprocedural analysis will have to be performed. I was thinking of collecting all the ‘uninitialized’ fields during compilation of both the constructor and any other functions identified as an inititializer. Initializers must include any member function called by an initializer (note that pass by reference is just simple initialization). If the set of uninitialized fields in a particular constructor intersected with the sets of fields from the other initializers is non-empty, you’ve found a genuine uninitialized field.

I was thinking that if every initialization takes place in the same translation unit it might be possible to do this within the current static-checking framework ie by compiling only once. I’d basically just have to identify the set of data members not initialized for every member function while building a mini call graph to eventually determine which of those member functions is really an initializer. It would then be a simple matter to take the intersection of the relevant sets of uninitialized fields.

However, I suspect there’s probably more than a few cases where member function definitions are spread across translation units (Actually that might be a neat dehydra analysis if this behaviour’s unwanted). To deal with this case I’d have to store the sets of ‘uninitialized’ data members until the entire build is complete while also determining the set of initializing functions. I’d then have to determine which data members are really initialized during a post-compilation analysis.

It might also be possible to produce the errors while the build process is still under way if I was able to query an sqlite database from within a treehydra script. I’m under the impression that the only way to use sqlite from js is by using xpcshell though so I don’t know if this suggestion would work. You’d still have to run a callgraph build prior to compilation anyway which takes quite some time.

There’s also the issue of the accuracy of any analysis when the call graph will likely have to be pretty dumb ie unfeasible code in a particular function that contains a call will end up registering as an edge. I suppose it would also be necessary to relate a functions parameter’s (and any global data) to whether or not a particular call within a function is feasible or not (and so on). The ESP paper does discuss a framework for interprocedural analysis but that’s way out of my scope. It would probably still be a worthwhile analysis if this kind of stuff is not taken into account, however it kind of defeats the purpose of performing an ESP analysis to find which members are uninitialized.

Anyway, I’ll report back (hopefully without such a long delay) when I’ve determined the next course of action.

A static analysis for fallthrough switch cases in C++

Posted in Seneca by ehren on February 8, 2010

For my 0.5 release in DPS911 I’d like to discuss some work on a Treehydra script for detecting fallthrough cases in a switch statement. I’ve actually been working on this off and on for some time, but it’s only been within the last few hours that I’ve arrived at something airtight.

Actually, I began work on such a script quite a while ago, after completing similar work on finding unreachable blocks in the control flow graph. Most of the serious bugs turned up by this analysis were all caused by forgetting the break statement (see bug 536438 and bug 536646) so a warning/error for this seemed natural. Also, I assumed this would be easy: just find all the basic blocks corresponding to the case labels of the switch and, if the successor of a block corresponds to the next case, you’ve got a fallthrough.

To make this more clear, here’s a simple switch statement that falls through:

int foo(int x) {
  switch(x) {
    case 0:
      x++;
    case 1:
      x--;
  }
}

And here’s a -fdump-tree-all printing of the cfg:

;; Function int foo(int) (_Z3fooi)

int foo(int) (x)
{
  # BLOCK 2
  # PRED: ENTRY (fallthru)
  switch (x)
    {
      case 0: goto <L0>;
      case 1: goto <L1>;
      default : goto <L2>;
    }
  # SUCC: 3 4 5

  # BLOCK 3
  # PRED: 2
<L0>:;
  x = x + 1;
  # SUCC: 4 (fallthru)

  # BLOCK 4
  # PRED: 2 3 (fallthru)
<L1>:;
  x = x + -1;
  # SUCC: 5 (fallthru)

  # BLOCK 5
  # PRED: 2 4 (fallthru)
<L2>:;
  return;
  # SUCC: EXIT
}

Notice something funny? There’s now a default case label! GCC will always insert this dummy default by the time you’ve reached the cfg and, even worse, there’s no way to distinguish a switch statement with a real default from one without, once you’ve reached this level. Unfortunately, I spent a great deal of time trying to do just that, at various times thinking I finally had the solution only to discover a new test case that would totally blow apart the analysis. I think I’ve tried maybe a dozen different approaches only to give up in disgust at various points. See here for one totally invalid early attempt.

So what’s the solution? I needed information from two separate passes of the compiler: The cfg, but also the initial c++ ast representation to find out if the default was really a default. To this end, I knew walk_tree would be needed. I was even able to find a bug in Treehydra during some initial experiments. However, as before, once I thought I had the solution, a more complex test case would ruin everything.

Anyway, here’s the finalish script which also employs a fallthrough() function/annotation to suppress the warnings. Once I’ve had a bit of time to sleep on it, I’m going to refactor a bit and then post to the bug. I also need to consider how to integrate this into a –with-static-checking build. There are a number of issues I had to take into account that I haven’t mentioned, as well, such as when the order of the cases is mixed up (eg case 1 before case 0) and also when labels and gotos come into the mix. (Properly dealing with switches embedded within switches … was the last hurdle.)

For a demonstration, here’s a nasty bit of code together with a sample run of the script.

So, for my 0.6 release, I’ll have to get back to dead code, hopefully being able to exploit callgraph to the fullest. Perhaps I can find another analysis to start work on as well.

Computing connected components with SQLite (and other attempts finding dead code)

Posted in Seneca by ehren on February 4, 2010

In the last couple of weeks I’ve put a bit of effort into identifying unused functions in mozilla-central. Unfortunately, I have not been keeping up with this blog, which is a major requirement in DPS911.

As I previously reported, I’ve been using a Treehydra generated call graph (callgraph) which records caller-callee relationships in a giant sqlite3 database. As mentioned, my first attempt was to ignore the directedness of each edge in this graph, and then compute the connected components. I believe this is equivalent to computing the weakly connected components in a directed graph.

I have based my code almost entirely off work done David MacIver who has an interesting post here where you can read more about his algorithm. It works by initially placing each verticex into its own component and then iteratively merging distinct components where there is an edge between two vertices in the respective components. Unfortunately, his Ruby written for Mysql is not directly usable for my purposes since SQLite, which I’m working with, does not support the update join syntax. Here’s a bit of MacIver’s code for example:

    update items
    join (
      select component1 source, min(component2) target
      from components_to_merge
      group by source
    ) new_components
    on new_components.source = component_id
    set items.component_id = least(items.component_id, target)

To create the equivalent code, I used a temporary table and a bit of kludge:

    INSERT INTO components_to_merge
    SELECT component2, component1 FROM components_to_merge;

    DROP TABLE IF EXISTS new_components;
    CREATE TABLE new_components
    AS 
    SELECT component1 source, min(component2) target
    FROM components_to_merge
    GROUP BY source;

    UPDATE node
    SET componentID = min((SELECT min(componentID, target)
                           FROM new_components
                           WHERE source = componentID),
                          componentID)
    WHERE componentID in
        (SELECT source FROM new_components);

That update statement is horribly inefficient but ultimately the job gets done. All told, it takes about an hour on one of the CDOT’s development machines. As MacIver noted in his original blog post, you’re basically done once the first merge has completed. The complete script, written in python, can be viewed here.

Note that a few domain specific refinements had to take place here. For example, any indirect calls to virtual functions will be registered as a call to the function in the base class. Callgraph already deals with this using an implementors table but I had to add an extra column for the return type of the function in order to accurately construct the name of both member functions. I then merge their components:

# set interface component to implementor component 
cursor.execute('SELECT implementor, interface, method, type \
                FROM implementors')
list = cursor.fetchall()
for row in list:
    # format: type interface::method
    basename = "%s %s::%s" %(row[3], row[1], row[2])
    # format: type implementor::method
    derivedname = "%s %s::%s" %(row[3], row[0], row[2])
    cursor.execute("UPDATE node \
                    SET componentID = (SELECT componentID \
                                       FROM node \
                                       WHERE name = '%s') \
                    WHERE name = '%s'" %(basename, derivedname))

Also, whenever a function is called outside of the compilation unit in which it’s defined, the call will point to the location of the forward declaration of the function (eg the .h file). To get around this, I simply merge all components that contain nodes with the same name:

SELECT n.name, min(nn.componentID)
FROM node n, node nn
WHERE n.name = nn.name AND
n.componentID != nn.componentID
GROUP by n.componentID

Of course, some nuance is required to interpret the data produced by this script. As I mentioned previously, the analysis is complicated by the fact that many functions “never called” are in fact called indirectly via a function pointer. This means that most connected components of size one aren’t particularly interesting. In fact, since I’m merging declarations and forward declarations, most components of size two also aren’t particularly interesting (at least those with elements containing the same name). This script takes the above into account and can be used to produce these results:

A list of connected components (that aren’t the largest component) with size greater than 3
and
A list of components of size 2 (that aren’t the largest component) that have members with distinct names.

The vast majority of these are false positives mostly for the reasons mentioned above, but In the last week I finally bit the bullet and filed a bug, along with a patch, to remove a bit of the dead code identified. Unfortunately, these functions are all in the cairo library, which means that applying such a patch would make tracking upstream changes unnecessarily difficult.

A number of issues have been identified here though. Previously, in connection with an analysis to find non-static functions called only within a particular compilation unit, Taras filed this bug to add -fdata-sections
-ffunction-sections -Wl,-gc-sections
to the build config in order to strip out dead symbols. Interestingly enough, running a few of my own tests with this configuration, I was not able to strip out all of these dead cairo functions. Even more bizarrely, the symbols of a number of dead static (!) functions seemed to work themselves into libxul?!

As an aside, the nm utilty would have been quite useful when working on the alwayszero project. (To get similar results I was previously running debug builds and then grepping the output of objdump).

A path based algorithm
There are issues with the above analysis though. For example, a function could be unused yet call functions that are used. This script begins at a particular node and iterates backwards, transitively adding all callers. If the callers to be added are exhausted within a reasonable number of iterations, we’ve found a dead path. Note that to narrow down this analysis, I’ve disregarded everything already identified by the component based analysis ie I do not start with any node not in the largest connected component.

One thing I should note is that this script requires explicit transaction management to run (using cursor.execute("BEGIN TRANSACTION") and connection.commit()). For anyone faced with mysterious insert/update anomalies in sqlite, this may be the answer.

Anyway, getting useful data out of this is a little bit more complicated. For example, to see info about every function in all paths that contain only functions, the following query can be used:

SELECT * FROM node n
JOIN path p ON (n.id = p.id)
WHERE pathID NOT IN 
(SELECT pathID FROM node n
 JOIN path p ON (p.id = n.id)
 WHERE isMethod = 0) 
ORDER BY pathID;

Has this analysis yielded any more candidates for removal though? Unfortunately, I have not taken indirect calls to overridden member functions into account eg Base* b = new Derived(); b->foo();, so an analysis for dead methods is currently out. There is an easy fix, however, it will require another 18 hr run of the script (building a path backward from a start set of 100000+ nodes = a boatload of time). Analyzing just functions though, ie using the query above, I have found some interesting possibilities. For example, there’s quite a bit of dead stuff in nameprep.c, but I suspect this is an even bigger nightmare to patch than cairo.

I will report back soon, however, when I have more concrete results.

Project plan for DPS911 – Open Source Project

Posted in Seneca by ehren on January 15, 2010

It’s a new semester and I’m in a new class focused on open source development. DPS911 is the continuation of DPS909 and requires students to make 7 releases over the 14 week term. I’ll mostly be focused on static analysis work related to Mozilla.

Wrapping up my old project
I likely will not be putting any more work (this semester) into my treehydra analysis + gcc plugin for virtual functions that always return zero. However, I should mention a few things that weren’t quite resolved as of my last post. After reducing the number of instances where the plugin is invoked to optimize value returning function calls, I was able to conclude that no significant size reduction is made in any of mozilla’s shared libraries. By identifying the functions which are optimized, however, I was able to check if they are present in, eg, libxul at all (this was accomplished by running a debug build and then grepping the objdumps). They are, in fact, not present and I suspect if I put more effort into patching all of the functions identified and/or using callgraph to find functions whose return value depends solely on other functions which always return zero, I might have better results.

A new project
At least initially, I will be putting my effort into finding and removing dead code. I had a bit of a warm up for this over the break by working on an analysis to find functions called only within the compilation unit of their definition that aren’t static(bug 536427). This bug illustrates some of the problems I will encounter with dead code elimination, namely that it’s not always clear whether a function is part of a public api (and thus not a candidate for removal or demarcation as static). This may be my biggest obstacle since I’m pretty sure the only solution is a manual examination/understanding of the code. As an aside, I attempted to read a little into linkage issues that might have bearing on whether or not a symbol is exported as part of a public api. For example, I probably shouldn’t be looking at functions marked with eg dllexport (Windows specific). In this regard, I was curious what marking a declaration as extern would imply. Interestingly, the author of gnu gold asserts that the extern keyword as applied to function declarations is entirely superfluous.

The other stumbling block is function pointers or rather the fact that many functions are only called via a pointer. This makes it nearly impossible to get any useful data from simple call graph queries like “which functions are never called”. I’m pretty sure it would be easy to extend callgraph to note whenever a function address is assigned to a pointer/used as a call arg, so long as this happens within a function body. The problem is this frequently happens outside of a function body eg there are tonnes of globally defined and initialized jump tables (particularly in the C code).

Ultimately, I think manually identifying functions whose address escapes in a pointer will be the best approach for my purposes. However, Taras has mentioned that getting more of this kind of data (like which functions are dead or improperly not static) into DXR would be helpful (and so cutting down on the false positives mechanically might not be the worst idea).

Release Plan
I haven’t yet given too much detail about where I’m going with dead code but I expect that work on this should cover at least from 0.4 to 0.6. After this I may start work on what seems like a complicated bug involving symbol rearrangement to improve Firefox startup time. I may also try my hand at more static analysis scripts for detecting coding errors if time permits. Over the break, I did some work on an analysis pass for finding unreachable blocks in the control flow graph (bug 535646) and it was quite fun finding errors in obscure places (in fact, I have more to file). I’m also curious about what other useful analyses can be carried out without tracking any values.

Anyway, here’s a tentative schedule for the next few releases:

0.4 – Finalize my current graph based dead code detection algorithm and hopefully get a few dead functions filed. I’ll explain more about this in a coming post, but my current analysis treats the call graph as an undirected graph and then computes the connected components to find the dead stuff. This likely ignores a wide swath of potentially dead functions ie those which call live functions but are not called by any live functions.

0.5/0.6 – Consider directedness in the analysis. Upon reflection, this is more complicated than I had previously considered.

I’ll update this release schedule with info on 0.7 to 1.0 once I better know what needs to be done.

Follow

Get every new post delivered to your Inbox.