Ehren's Blog

RTL level function removal

Posted in Seneca by ehren on April 21, 2010

Over the past few days I’ve been focusing on getting the call graph portion of my dead code analysis in check. It turns out that function local const (or static) initializations are not accessible during later gcc passes. Luckily, walking the front end tree representation, which is accessible via Treehydra‘s process_cp_pre_genericize, does the trick. This takes care of all remaining false positives of which I am aware.

The downside is that going through all these extra tree codes is sloooow. After a bunch of false starts the damn thing is still running on one of the CDOT‘s development machines (probably an 8+ hour compile).

For a little while though, I’ve been thinking of ways to automatically remove identified dead code without actually patching the source. By applying Taras’ assembler name patch to dehydra I can now identify precisely which code to remove. The question now is how to remove it.

My first thought was a hack using objcopy. I could first output a list of the mangled names of dead functions and then, after running a build with -ffunction-sections, run a script like this:

#!/bin/sh
while read asmname
do
  find objdir -name "*.o" | xargs objcopy --remove-section=".text.$asmname"
done < asmlist.txt

(and then relinking). This works, but only for non-member functions.

The other option was some sort of gcc hack to remove as much code as possible when obcopy can’t do the job. I first tried removing every statement in every basic block of the function (see gsi_remove). This seems to only work with a non-branching cfg however (even when I leave a valid return statement). I then tried cgraph_remove_node with an IPA pass plugin which blows up if a function’s referenced anywhere else.

Today I arrived at a solution that, although requiring a direct patch of GCC, seems to be ideal. Surprisingly, it’s possible to hook in right before assembly generation, and it’s easy too:

diff --git a/gcc/final.c b/gcc/final.c
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -4090,16 +4090,34 @@
 {
   if (symbol_queue)
     {
       free (symbol_queue);
       symbol_queue = NULL;
       symbol_queue_size = 0;
     }
 }
+
+static bool
+is_dead(const char* name) 
+{
+  char asmname[100];
+  FILE* fp = fopen("/home/ehren/asmlist.txt", "r");
+  if (!fp)
+    return false;
+
+  while (fscanf(fp, "%s", asmname) != EOF) {
+    if (strcmp(asmname, name) == 0) {
+      fclose(fp);
+      return true;
+    }
+  }
+  fclose(fp);
+  return false;
+}
 
 /* Turn the RTL into assembly.  */
 static unsigned int
 rest_of_handle_final (void)
 {
   rtx x;
   const char *fnname;
 
@@ -4109,17 +4127,19 @@
   x = DECL_RTL (current_function_decl);
   gcc_assert (MEM_P (x));
   x = XEXP (x, 0);
   gcc_assert (GET_CODE (x) == SYMBOL_REF);
   fnname = XSTR (x, 0);
 
   assemble_start_function (current_function_decl, fnname);
   final_start_function (get_insns (), asm_out_file, optimize);
-  final (get_insns (), asm_out_file, optimize);
+  if (!is_dead(fnname)) {
+    final (get_insns (), asm_out_file, optimize);
+  }
   final_end_function ();
 
 #ifdef TARGET_UNWIND_INFO
   /* ??? The IA-64 ".handlerdata" directive must be issued before
      the ".endp" directive that closes the procedure descriptor.  */
   output_function_exception_table (fnname);
 #endif

With this, functions are removed completely (except from the symbol table) and the bodies of virtuals are replaced with a couple words worth of NOPs. Yes, opening up the file with a hardcoded path for every function is ugly but later on I could always read it into a global somewhere else (and do a binary search).

The only downside here is I won’t get any link time errors if a few false positives slip through, as opposed to with objcopy. In my experiments, calling a function that doesn’t exist results in an immediate segmentation fault (makes sense), but storing the return of a NOP-body virtual just leaves you with an uninitialized value.

Hopefully, I’ll soon have some good results on the analysis front to actually test this on Mozilla.

Advertisements

Dead code progress

Posted in Seneca by ehren on April 7, 2010

So far things are on track with my attempts to developed an unused function finding tool. Now that the function pointer/jump table problem has been solved other more subtle issues have come to light.

The first was a problem with callgraph‘s handling of inheritance chains. As I mentioned previously, it was necessary to add each method to the node table (see schema reference) both when the method’s body is processed (as is already the case) but also when the method’s type is processed. At some point I should really develop some tests here but this affects the recognition of pure virtual functions in a number of complicated cases.

However, I also ran into another issue where certain methods were not finding themselves into the inheritance chains in which they belong. This seems to be only when a virtual function overrides a base class function that has not been defined in the ‘next up’ base class (A defines virtual foo, B derives from A, C derives from B and redefines virtual foo). This could be a Treehydra issue or a maybe a problem with the GCC binfo data structure (or maybe I’m just misunderstanding things).

Either way, my solution has been to process all base classes and subclasses of a method both when the type is processed and also when the method body is processed. This appears to be a working solution (although it certainly does not improve callgraph compilation times).

Once this was handled I started to get some pretty good results but I noticed scriptable methods were getting into the mix. After a few hours of fruitless hacking it turned out I just forgot to properly define NS_SCRIPTABLE (since I’m not running a –with-static-checking build). After rebuilding again, I believe I finally attained a 0% false positive rate.

This time I hit another problem though. A bunch of genuinely dead methods turned up by my most recent (defective) analysis were not showing up. In fact, very few methods were showing up at all. Investigating, it turns out I’ve been quite overzealous in marking a method as scriptable. My previous technique was to check if __attribute__((user("NS_script")) was present in the declaration attributes of a function and also to check if it is present in the type attributes of the class and any base class. This excludes a bunch of juicy dead stuff like nsCharsetMenu::SetCharsetCheckmark (gimple isn count 75 ftw) which is a member of a non-scriptable class that derives from two scriptable interfaces (which do not declare SetCharsetCheckmark).

Naturally, the solution when marking methods scriptable because of their base classes is only to mark the method scriptable when the base class declares the method and is scriptable. Come to think of it, I probably don’t even need to do this because of the way I group together base and derived methods.

Anyway, my current status is waiting for a build with these changes to finish. We will see if there are more issues.

Function declaration escape analysis v2

Posted in Seneca by ehren on April 6, 2010

I don’t want to get too excited until I’ve run this through 4000000 loc but I believe I’ve solved the problem of being unable to process global initializations of const/static global variables. Earlier, I posted a message to the GCC mailing list describing my troubles with varpool. I did receive a helpful response in that there is nothing inherent about const/static declarations that would prevent one from getting at the lhs of their initialization.

Today I experimented with walking as many trees in as many places as I could find without much luck. I then tried compiling this sample code with -Wunused:

int foo() {
  return 0;
}

typedef struct {
  int (*p) ();
} Table;

static Table t[] = {
  { foo }
};

As expected, GCC warns about the unused static variable.

Taking a look at toplevel.c It didn’t take too long to find a solution (and this works for const, static, and const static):

Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c  (revision 157978)
+++ gcc/toplev.c  (working copy)
@@ -844,12 +844,26 @@
   return output_something;
 }

+static tree 
+find_funcs_callback(tree *tp, int *walk_subtrees, void *data)
+{
+  tree t = *tp;
+
+  if (TREE_CODE(t) == FUNCTION_DECL)
+    fprintf(stderr, "address held: %s\n", IDENTIFIER_POINTER(DECL_NAME(t)));
+
+  return NULL_TREE;
+}
+
 /* A subroutine of check_global_declarations.  Issue appropriate warnings
    for the global declaration DECL.  */

 void
 check_global_declaration_1 (tree decl)
 {
+  if (DECL_INITIAL(decl))
+    walk_tree(&DECL_INITIAL(decl), find_funcs_callback, NULL, NULL);
+
   /* Warn about any function declared static but not defined.  We don't
      warn about variables, because many programs have static variables
      that exist only to get some text into the object file.  */

I’ve got to fix up the output to match callgraph‘s serialization but this could be it.

Problems with const static initializations

Posted in Seneca by ehren on April 5, 2010

Unfortunately I spoke too soon about developing an airtight dead code finder. The technique of processing file scope variables I mentioned in my previous post has a serious drawback: it doesn’t work for const static data. This is a show stopper when it comes to peeking into most jump tables.

I’ve been able to print type information for all globals using the dehydra hooks placed into c-common.c however it seems like const initializations are not even handled at this level. I have my suspicions that there’s no way to recover the FUNCTION_DECL node in this case, likely because gcc has no use for the info at this level.

Although I may be able to make do by simply manually filtering as many callback functions as I can this approach is not quite ideal. I’ll have to think more about this but I’m now thinking that the lto streamer might be of use. There’s also the chance that there’s another way of using the cgraph to get at this data.

The other possibility is ditching gcc entirely and using elsa to dump the data. I’ll report back when I know more.

Function declaration escape analysis

Posted in Seneca by ehren on April 4, 2010

It’s been quite a while since I’ve blogged. Frankly I’ve got behind with my work. However, I believe I have hit a huge breakthrough finding dead code. I don’t want to get too ahead of myself but if things work as I think, I can now identify every unused function/member function in mozilla-central with a near 0 false positive rate.

I’ve mentioned many times that assignments to function pointers are a huge pain when it comes to recognizing call graph edges. I’ve been able to handle function local address taking for some time now but the problem of functions referenced in global variables (usually jump tables) has proved elusive. In fact, it’s currently not possibly to process global variables at all with Treehydra. This leads to thousands of false positives in the analysis and a bunch of special case handling.

It suddenly dawned on me that GCC might have already done the work for me. ipa-type-escape.c, which “determines which types in the program contain only instances that are completely encapsulated by the compilation unit” seems to fit the bill, but I ended up with less than stellar results trying to print out any escaping function declarations. In fact, I don’t think it really is useful for my purposes.

However, the technique of processing global variables using the varpool is exactly what I needed. In fact, I can very easily write a GCC plugin to print off all the globally escaping declarations in a compilation unit. Unfortunately, getting a plugin to build that uses more than the standard set of routines is a bit of a challenge (more stuff needs to be linked in) so I just hacked it into dehydra_plugin.c. It works though!

Here’s the code:

static tree find_funcs_callback(tree *tp, int *walk_subtrees, void *data) {
  tree t = *tp;
  
  if (TREE_CODE(t) == FUNCTION_DECL) {
    // dump function (I use dehydra specific code)
  }
  
  return NULL_TREE;
}
 
static void find_funcs(tree decl) {
  walk_tree(&decl, find_funcs_callback, NULL, NULL);
}

// This needs to go in the execute function of an IPA pass.
// I just stuck it into dehydra's gcc_plugin_post_parse
struct varpool_node *vnode;
FOR_EACH_STATIC_VARIABLE(vnode)
  find_funcs(DECL_INITIAL(vnode->decl));

Now all I have to do is mark every function printed by this routine as escaping. I’ve been able to match callgraph‘s serialization almost completely so this will be a breeze.

I’ve also found that the current way callgraph treats inheritance chains (using a table of ‘implementor’ – ‘interface’ pairs) is not particularly useful for finding dead code. In fact, a whole bunch of functions in the implementors table are being left out of the node table. I’ve been able to rectify this by treating method overriding just like any call edge. In particular, I have the base class ‘call’ the derived class which will fit right into my existing algorithm (becuase of dynamic dispatch, once the base method is called all bets are off on whether or not some derived method is called). In order to get any results previously, I’ve just been identifying base method-derived method pairs by textually matching the prototypes. Looking at my current results, I’ve found a disproportionate number of static methods which suggests that this technique is too conservative. Here’s the old script btw. With these callgraph chages the next one will be much simpler (with no method name parsing!)

I’ve had some success already with dead code in content and particularly in layout, even with this rudimentary script. I also certainly have more to file. With this new approach, though, I think I’ll be able able to find and file all of it by release 1.0.