Ehren's Blog

Dead code update (and problems finding uninitialized class members)

Posted in Seneca by ehren on February 25, 2010

Well, I’ve fallen way behind with this blog again. Unfortunately, what I’ve done so far hasn’t worked out exactly as expected either. Anyway, here’s an update.

Finding Dead Code
I’ve refined the script I posted previously to take into account the fact that indirect calls to virtual functions register in g++ as calls to the base method (among a few other things). It can be viewed here. Running this new analysis, I’ve learned that finding dead methods is even more difficult than finding dead functions. Why? A tonne of stuff is exported as part of the Mozilla public api. A function or even an entire class may seem dead but it’s likely used via javascript/xpcom inside the mozilla tree (or maybe in an extension).

As far as I know, anything that derives from a class defined in an idl file probably can’t be marked dead and I suspect detecting this case will be most difficult. One thing I could take into account, however, are methods marked as NS_SCRIPTABLE ie __attribute__((user("NS_script"))). It would be easy to add this info to the node table in callgraph. Also, in regard to plain functions, I think it would be prudent to register function local address taking (of functions).

On the bright side I think I’ve found a handful of unused functions (view sample output of the script showing these here). I may be able to gleam more info once I’m able to actually query my paths table though. Unfortunately, the latest incarnation of the script is only about 85% finished:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                         
28977 ehren     20   0 17804  12m 2132 R 100.1  0.4   2331:07 python

That’s CPU time!

Uninitialized class members
I’ve also put a bit of work into a more immediate analysis. Bug 525063 proposes an error for any class with uninitialized data members. It seemed like an interesting project perhaps as an introduction to ESP analysis (especially since the only state transitions are assignments and passing by reference). In fact, dmandelin’s already done work on an analysis for uninitialized variables in general which I originally attempted to adapt. This was not the easiest task however, although I have just recently learned why I was not successful (I think it relates to a malfunctioning of create_decl_set which I have yet to diagnose).

Giving up on ESP for the moment, I tried my hand doing things the simple way. That is, if there’s any code at all in the constructor that initializes a variable, then I consider that variable initialized.

To accomplish this, I first wrote a rather ugly and unreadable version of the script. I then polished it up a bit. My days of using JS arrays for maps and sets is hopefully over and, in fact, I think I’d be able to take another run at an ESP analysis now. Unfortunately though, running my simple script has uncovered more fundamental problems.

The bug comments mention classes with an overridden operator new (containing a memset(0) to initialize the data members) as one special case but this is nowhere near the largest problem. Consider this mdc page on xpcom:

You can provide an initialization function for your class. This will be called immediately after your class is allocated and the constructor is called. The init function takes 0 arguments, returns an nsresult, and must be public. You can call it anything you like, just reference it from NS_GENERIC_FACTORY_CONSTRUCTOR_INIT (discussed below).

Now we have the situation of variable initialization split across multiple functions. In fact, it’s even more complicated! Here’s an example:

mCSSAware (part of nsHTMLEditor) is one ‘uninitialized’ member turned up by the script. Rather than being directly initialized in Init() (or in the constructor), the initialization takes place in UpdateForFlags which is called from Init() here. Luckily in this case the initializations are all within the same translation unit although I suspect this is not always the case.

So, what would be the solution? If it’s worth doing, some kind of interprocedural analysis will have to be performed. I was thinking of collecting all the ‘uninitialized’ fields during compilation of both the constructor and any other functions identified as an inititializer. Initializers must include any member function called by an initializer (note that pass by reference is just simple initialization). If the set of uninitialized fields in a particular constructor intersected with the sets of fields from the other initializers is non-empty, you’ve found a genuine uninitialized field.

I was thinking that if every initialization takes place in the same translation unit it might be possible to do this within the current static-checking framework ie by compiling only once. I’d basically just have to identify the set of data members not initialized for every member function while building a mini call graph to eventually determine which of those member functions is really an initializer. It would then be a simple matter to take the intersection of the relevant sets of uninitialized fields.

However, I suspect there’s probably more than a few cases where member function definitions are spread across translation units (Actually that might be a neat dehydra analysis if this behaviour’s unwanted). To deal with this case I’d have to store the sets of ‘uninitialized’ data members until the entire build is complete while also determining the set of initializing functions. I’d then have to determine which data members are really initialized during a post-compilation analysis.

It might also be possible to produce the errors while the build process is still under way if I was able to query an sqlite database from within a treehydra script. I’m under the impression that the only way to use sqlite from js is by using xpcshell though so I don’t know if this suggestion would work. You’d still have to run a callgraph build prior to compilation anyway which takes quite some time.

There’s also the issue of the accuracy of any analysis when the call graph will likely have to be pretty dumb ie unfeasible code in a particular function that contains a call will end up registering as an edge. I suppose it would also be necessary to relate a functions parameter’s (and any global data) to whether or not a particular call within a function is feasible or not (and so on). The ESP paper does discuss a framework for interprocedural analysis but that’s way out of my scope. It would probably still be a worthwhile analysis if this kind of stuff is not taken into account, however it kind of defeats the purpose of performing an ESP analysis to find which members are uninitialized.

Anyway, I’ll report back (hopefully without such a long delay) when I’ve determined the next course of action.


One Response

Subscribe to comments with RSS.

  1. Taras said, on February 25, 2010 at 12:51 pm

    It struck me that I may have forgotten to link you to my old working dead code finder.

    According to my code, I filter out code that where the a base class is marked as scriptable or a method is. One day it would be nice to build a list of methods called from js…I bet there aren’t that many.

    The rest of the virtual functions and ones defined in idl are fair game for dead code.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: