Ehren's Blog

Reworking my analysis

Posted in Seneca by ehren on November 24, 2009

This weekend I’ve been stuck trying to come up with a Treehydra analysis similar to outparams.js that will root out those functions which always return zero (the ‘syntactic’ Dehydra analyis I posted previously was a bit naive). Trying to get this done for my 0.2 though was a grave mistake. It’s somewhat frustrating because I know this script already does what I want, namely it will ensure that an ‘out parameter’ is written upon a zero return. However, my aim is slightly different in that I don’t care about any other properties of the state when I’m in a state that contains a zero return; I just care that every state with a return is a zero-return state.

There’s another file that’s used by outparams.js called mayreturn.js that “determines the set of variables that may transitively reach the return statement” (eg w = x; x = y; y = z; return z;). It works by checking for either a RETURN_EXPR or a GIMPLE_MODIFY_STMT, in an instruction under consideration. If a RETURN_EXPR is found, the variable being returned is added to the current state. If a GIMPLE_MODIFY_STMT is found (ie an assignment) and the lhs is already in the state then the rhs is added to the state. Because this is implemented in the flow function, a change in the state will initiate another iteration through all the blocks of the function with the flow function called on each instruction. Or rather I suppose the iteration is done over only the predecessor blocks (given that mayreturn implements a ‘backward analysis’).

The main thing I’m confused about is that mayreturn.js seems to identify only a single variable as the ultimate return variable. This is even though a BackwardAnalysis (which mayreturn implements) iterates over every block (at least initially). As to its use in outparams.js, the mayreturn analysis is applied here, in process_tree which I’m certain is only invoked once.
So how are multiple returns handled?

This mozilla wiki page on abstract interpretation has been incredibly helpful in getting an idea of how all this stuff works. Going off it, here would be my idea for an always zero return checker:

I think the abstract values are already defined for me ie Zero_NonZero.Lattice.ZERO and Zero_NonZero.Lattice.NONZERO. Maybe I’m confused here and I actually need a new state ie RETURNS_ZERO and RETURNS_NON_ZERO.

The flow function should be something like ‘If stmt is an assignment by the zero constant, and if the lhs of the assignment is in the transitive return variable set identified by mayreturn.js, then set the state to ZERO (or RETURNS_ZERO).

Because I’m still unclear how multiple returns are handled though, I’m not really sure how to get started. There is a place in outparams.js where a given substate is checked for whether it returns zero (and if it does and an outparam is not written to, a warning is issued). Likewise, there is also a check for whether an outparam is written to upon returning a failure code. Unfortunately, the notion of ‘substate’ is at the heart of ESP analysis which is not explained in that Mozilla wiki page I posted above… although there’s an ESP heading with no body :). My vague understanding is that now you’re dealing with sets of states. My only recourse in getting more info is this paper where ESP (Error Detection via Scalable Program Analysis) is introduced. (Naturally it’s 100% crazy stuff).

Edit: hmm… just as I wrote that last paragraph something dawned on me. I think everything I just wrote is nonsense or at least irrelevant (I’ll leave it though because I have to get my blogging up).

Anyway, consider the checkSubstate function in outparams.js. I tried modifying it as well as the post analysis error checker where it’s called:

diff -r 41c1b69b3ed3 xpcom/analysis/outparams.js
--- a/xpcom/analysis/outparams.js	Mon Nov 23 22:17:06 2009 -0500
+++ b/xpcom/analysis/outparams.js	Tue Nov 24 00:20:11 2009 -0500
@@ -522,38 +522,48 @@ function unwrap_outparam(arg, state) {
   }
   if (outparam) return outparam;
   return arg;
 }
 
 // Check for errors. Must .run() analysis before calling this.
 OutparamCheck.prototype.check = function(isvoid, fndecl) {
   let state = this.cfg.x_exit_block_ptr.stateOut;
+  var alwayszero = true;
   for (let substate in state.substates.getValues()) {
-    this.checkSubstate(isvoid, fndecl, substate);
+    if(!this.checkSubstate(isvoid, fndecl, substate)) {
+      alwayszero = false;
+      break;
+    }
+  }
+  if (alwayszero) {
+    print("alwayszero function found! location: " + location_of(this.fndecl) + " name: " +  function_decl_name(this.fndecl));
+  }
 }
 
 OutparamCheck.prototype.checkSubstate = function(isvoid, fndecl, ss) {
   if (isvoid) {
+    return false;
     this.checkSubstateSuccess(ss);
   } else {
     let [succ, fail] = ret_coding(fndecl);
     let rv = ss.get(this.retvar);
     // We want to check if the abstract value of the rv is entirely
     // contained in the success or failure condition.
     if (av.meet(rv, succ) == rv) {
+      return true;
       this.checkSubstateSuccess(ss);
     } else if (av.meet(rv, fail) == rv) {
+      return false;
       this.checkSubstateFailure(ss);
     } else {
       // This condition indicates a bug in outparams.js. We'll just
       // warn so we don't break static analysis builds.
       warning("Outparams checker cannot determine rv success/failure",
               location_of(fndecl));
+      return false;
       this.checkSubstateSuccess(ss);
       this.checkSubstateFailure(ss);
     }
   }
 }
 
 /* @return     The return statement in the function
  *             that writes the return value in the given substate.

This is ugly hackery by it seems to do the trick.

Anyway, click here to see all the 3074 functions found. Checking for duplicates ie cat outparam-like-analysis.txt | sort | uniq | wc -l, I get 2874 which would seem to be accurate given that I’m dealing with absolute filename paths (ie the many instances when an alwayszero function is present in a header file should be taken care of). Interestingly this is way more than the the 396 functions found by another (unposted) analysis that I previously tried which checked for alwayszero methods (either virtual or non-virtual). See this post for a similar analysis.

There may be a few false positives too, but hopefully not a great many (I haven’t thoroughly checked the results so they could be totally wrong). At this point I’m not too concerned about interpreting the data because there are lots of non-virtuals in the mix. I think this can be changed with a function similar to is_constructor, though. I will have to strip out all of the non-relevant code which is not completely trivial, but it shouldn’t be too hard. I suppose I don’t really need to do this until the final version, which should be a safety checker for my plugin ie if a function has the alwayszero attribute then it must always return zero. I’ll have to solve my plugin/build issues first, though.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: