Ehren's Blog

Project plan for DPS911 – Open Source Project

Posted in Seneca by ehren on January 15, 2010

It’s a new semester and I’m in a new class focused on open source development. DPS911 is the continuation of DPS909 and requires students to make 7 releases over the 14 week term. I’ll mostly be focused on static analysis work related to Mozilla.

Wrapping up my old project
I likely will not be putting any more work (this semester) into my treehydra analysis + gcc plugin for virtual functions that always return zero. However, I should mention a few things that weren’t quite resolved as of my last post. After reducing the number of instances where the plugin is invoked to optimize value returning function calls, I was able to conclude that no significant size reduction is made in any of mozilla’s shared libraries. By identifying the functions which are optimized, however, I was able to check if they are present in, eg, libxul at all (this was accomplished by running a debug build and then grepping the objdumps). They are, in fact, not present and I suspect if I put more effort into patching all of the functions identified and/or using callgraph to find functions whose return value depends solely on other functions which always return zero, I might have better results.

A new project
At least initially, I will be putting my effort into finding and removing dead code. I had a bit of a warm up for this over the break by working on an analysis to find functions called only within the compilation unit of their definition that aren’t static(bug 536427). This bug illustrates some of the problems I will encounter with dead code elimination, namely that it’s not always clear whether a function is part of a public api (and thus not a candidate for removal or demarcation as static). This may be my biggest obstacle since I’m pretty sure the only solution is a manual examination/understanding of the code. As an aside, I attempted to read a little into linkage issues that might have bearing on whether or not a symbol is exported as part of a public api. For example, I probably shouldn’t be looking at functions marked with eg dllexport (Windows specific). In this regard, I was curious what marking a declaration as extern would imply. Interestingly, the author of gnu gold asserts that the extern keyword as applied to function declarations is entirely superfluous.

The other stumbling block is function pointers or rather the fact that many functions are only called via a pointer. This makes it nearly impossible to get any useful data from simple call graph queries like “which functions are never called”. I’m pretty sure it would be easy to extend callgraph to note whenever a function address is assigned to a pointer/used as a call arg, so long as this happens within a function body. The problem is this frequently happens outside of a function body eg there are tonnes of globally defined and initialized jump tables (particularly in the C code).

Ultimately, I think manually identifying functions whose address escapes in a pointer will be the best approach for my purposes. However, Taras has mentioned that getting more of this kind of data (like which functions are dead or improperly not static) into DXR would be helpful (and so cutting down on the false positives mechanically might not be the worst idea).

Release Plan
I haven’t yet given too much detail about where I’m going with dead code but I expect that work on this should cover at least from 0.4 to 0.6. After this I may start work on what seems like a complicated bug involving symbol rearrangement to improve Firefox startup time. I may also try my hand at more static analysis scripts for detecting coding errors if time permits. Over the break, I did some work on an analysis pass for finding unreachable blocks in the control flow graph (bug 535646) and it was quite fun finding errors in obscure places (in fact, I have more to file). I’m also curious about what other useful analyses can be carried out without tracking any values.

Anyway, here’s a tentative schedule for the next few releases:

0.4 – Finalize my current graph based dead code detection algorithm and hopefully get a few dead functions filed. I’ll explain more about this in a coming post, but my current analysis treats the call graph as an undirected graph and then computes the connected components to find the dead stuff. This likely ignores a wide swath of potentially dead functions ie those which call live functions but are not called by any live functions.

0.5/0.6 – Consider directedness in the analysis. Upon reflection, this is more complicated than I had previously considered.

I’ll update this release schedule with info on 0.7 to 1.0 once I better know what needs to be done.