Dead code update (and problems finding uninitialized class members)
Well, I’ve fallen way behind with this blog again. Unfortunately, what I’ve done so far hasn’t worked out exactly as expected either. Anyway, here’s an update.
Finding Dead Code
As far as I know, anything that derives from a class defined in an idl file probably can’t be marked dead and I suspect detecting this case will be most difficult. One thing I could take into account, however, are methods marked as
__attribute__((user("NS_script"))). It would be easy to add this info to the node table in callgraph. Also, in regard to plain functions, I think it would be prudent to register function local address taking (of functions).
On the bright side I think I’ve found a handful of unused functions (view sample output of the script showing these here). I may be able to gleam more info once I’m able to actually query my paths table though. Unfortunately, the latest incarnation of the script is only about 85% finished:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 28977 ehren 20 0 17804 12m 2132 R 100.1 0.4 2331:07 python
That’s CPU time!
Uninitialized class members
I’ve also put a bit of work into a more immediate analysis. Bug 525063 proposes an error for any class with uninitialized data members. It seemed like an interesting project perhaps as an introduction to ESP analysis (especially since the only state transitions are assignments and passing by reference). In fact, dmandelin’s already done work on an analysis for uninitialized variables in general which I originally attempted to adapt. This was not the easiest task however, although I have just recently learned why I was not successful (I think it relates to a malfunctioning of create_decl_set which I have yet to diagnose).
Giving up on ESP for the moment, I tried my hand doing things the simple way. That is, if there’s any code at all in the constructor that initializes a variable, then I consider that variable initialized.
To accomplish this, I first wrote a rather ugly and unreadable version of the script. I then polished it up a bit. My days of using JS arrays for maps and sets is hopefully over and, in fact, I think I’d be able to take another run at an ESP analysis now. Unfortunately though, running my simple script has uncovered more fundamental problems.
The bug comments mention classes with an overridden
operator new (containing a
memset(0) to initialize the data members) as one special case but this is nowhere near the largest problem. Consider this mdc page on xpcom:
You can provide an initialization function for your class. This will be called immediately after your class is allocated and the constructor is called. The init function takes 0 arguments, returns an nsresult, and must be public. You can call it anything you like, just reference it from NS_GENERIC_FACTORY_CONSTRUCTOR_INIT (discussed below).
Now we have the situation of variable initialization split across multiple functions. In fact, it’s even more complicated! Here’s an example:
mCSSAware (part of nsHTMLEditor) is one ‘uninitialized’ member turned up by the script. Rather than being directly initialized in
Init() (or in the constructor), the initialization takes place in UpdateForFlags which is called from
Init() here. Luckily in this case the initializations are all within the same translation unit although I suspect this is not always the case.
So, what would be the solution? If it’s worth doing, some kind of interprocedural analysis will have to be performed. I was thinking of collecting all the ‘uninitialized’ fields during compilation of both the constructor and any other functions identified as an inititializer. Initializers must include any member function called by an initializer (note that pass by reference is just simple initialization). If the set of uninitialized fields in a particular constructor intersected with the sets of fields from the other initializers is non-empty, you’ve found a genuine uninitialized field.
I was thinking that if every initialization takes place in the same translation unit it might be possible to do this within the current static-checking framework ie by compiling only once. I’d basically just have to identify the set of data members not initialized for every member function while building a mini call graph to eventually determine which of those member functions is really an initializer. It would then be a simple matter to take the intersection of the relevant sets of uninitialized fields.
However, I suspect there’s probably more than a few cases where member function definitions are spread across translation units (Actually that might be a neat dehydra analysis if this behaviour’s unwanted). To deal with this case I’d have to store the sets of ‘uninitialized’ data members until the entire build is complete while also determining the set of initializing functions. I’d then have to determine which data members are really initialized during a post-compilation analysis.
It might also be possible to produce the errors while the build process is still under way if I was able to query an sqlite database from within a treehydra script. I’m under the impression that the only way to use sqlite from js is by using xpcshell though so I don’t know if this suggestion would work. You’d still have to run a callgraph build prior to compilation anyway which takes quite some time.
There’s also the issue of the accuracy of any analysis when the call graph will likely have to be pretty dumb ie unfeasible code in a particular function that contains a call will end up registering as an edge. I suppose it would also be necessary to relate a functions parameter’s (and any global data) to whether or not a particular call within a function is feasible or not (and so on). The ESP paper does discuss a framework for interprocedural analysis but that’s way out of my scope. It would probably still be a worthwhile analysis if this kind of stuff is not taken into account, however it kind of defeats the purpose of performing an ESP analysis to find which members are uninitialized.
Anyway, I’ll report back (hopefully without such a long delay) when I’ve determined the next course of action.