Ehren's Blog

A GCC Hack (My 0.1 Release)

Posted in Seneca by ehren on October 24, 2009

It’s late and I have not been able to fix what I believe is a bug preventing access to the attribute flags of indirect calls to virtual member functions from within tree-ssa-ccp.c (as described in my last post). I’ve posted a clearer overview of the bug that doesn’t involve my attribute on the project wiki in case anyone has a suggestion. For now though, I’ve decided to bypass the call flags problem and move onto the optimization since I have to release something for my 0.1 in DPS909.

Seneca’s IT people are still blocking subversion so I won’t post a proper patch but when you find out what I’ve done you won’t want to compile any air traffic control software with my custom GCC anyway. I’ll just cut to the chase and post the function that contains my modifications (I haven’t even created a separate handle_attribute function but this is just a proof of concept):

/* Evaluate statement STMT.  If the statement produces an output value and
   its evaluation changes the lattice value of its output, return
   SSA_PROP_INTERESTING and set *OUTPUT_P to the SSA_NAME holding the
   output value.
   
   If STMT is a conditional branch and we can determine its truth
   value, set *TAKEN_EDGE_P accordingly.  If STMT produces a varying
   value, return SSA_PROP_VARYING.  */

static enum ssa_prop_result
ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
{
  tree def;
  ssa_op_iter iter;

  if (dump_file && (dump_flags & TDF_DETAILS))
    {
      fprintf (dump_file, "\nVisiting statement:\n");
      print_gimple_stmt (dump_file, stmt, 0, dump_flags);
    }

  switch (gimple_code (stmt))
    {
      case GIMPLE_ASSIGN:
        /* If the statement is an assignment that produces a single
           output value, evaluate its RHS to see if the lattice value of
           its output has changed.  */
        return visit_assignment (stmt, output_p);

      case GIMPLE_CALL:
        /* A value-returning call also performs an assignment.  */
        if (gimple_call_lhs (stmt) != NULL_TREE)
          {            
            /* Handle alwayszero functions */
            FILE* fp = fopen ("alwayszero.txt", "r");  /* Hack */  
            if (fp)
         /* if (gimple_call_flags (stmt) & ECF_ALWAYSZERO) */
              {
                tree lhs = gimple_get_lhs (stmt);
                tree zero = build_int_cst (TREE_TYPE (lhs), 0);
                prop_value_t zero_constant;
                zero_constant.value = zero;
                zero_constant.lattice_val = CONSTANT;
                set_lattice_value(lhs, zero_constant);
                *output_p = lhs;

                fclose(fp);

                return SSA_PROP_INTERESTING;
              }
            else
              return visit_assignment (stmt, output_p);
          }
        break;

      case GIMPLE_COND:
      case GIMPLE_SWITCH:
        /* If STMT is a conditional branch, see if we can determine
           which branch will be taken.   */
        /* FIXME.  It appears that we should be able to optimize
           computed GOTOs here as well.  */
        return visit_cond_stmt (stmt, taken_edge_p);

      default:
        break;
    }

  /* Any other kind of statement is not interesting for constant
     propagation and, therefore, not worth simulating.  */
  if (dump_file && (dump_flags & TDF_DETAILS))
    fprintf (dump_file, "No interesting values produced.  Marked VARYING.\n");

  /* Definitions made by statements other than assignments to
     SSA_NAMEs represent unknown modifications to their outputs.
     Mark them VARYING.  */
  FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_ALL_DEFS)
    {
      prop_value_t v = { VARYING, NULL_TREE };
      set_lattice_value (def, v);
    }

  return SSA_PROP_VARYING;
}

As you can see, if alwayszero.txt is in the current directory, EVERY assignment with a rhs function call that’s complex enough to invoke the propagation engine will have its lhs set to zero.

Here’s a demonstration:

/* blah.h */
class Blah {
 public:
  virtual int blah() __attribute__((alwayszero));
};

/* blah.cpp */
#include "blah.h"
int Blah::blah() 
{
  return 5;
}

/* main.cpp */
#include <cstdio>
#include "blah.h"
int main()
{
  Blah b;
  Blah* p = &b;
  int x;
  x = p->blah();
  printf("%d\n", x);
  return 0;
}

With the system’s compiler:

$ g++ main.cpp blah.cpp -O2
In file included from main.cpp:4:
blah.h:5: warning: ‘alwayszero’ attribute directive ignored
In file included from blah.cpp:3:
blah.h:5: warning: ‘alwayszero’ attribute directive ignored
$ ./a.out
5
$

With the custom compiler:

$ ~/gcc/dist/bin/g++ main.cpp blah.cpp -O2
$ ./a.out
0
$

(note that the alwayszero attribute is unnecessary here since with this hack every propagation worthy value returning call will have it’s return value set to zero when alwayszero.txt is present. See this post for an overview of adding the attribute.)

I should also mention that when asking on #gcc about the attribute recognition problem, I was pointed towards tree-vrp.c. I ended up misinterpreted this advice as a suggestion about where to look for alternative methods of accessing the call flags and not how to approach the optimization in general. However, this file is where I found the build_int_cst function which creates a new tree representing an integer constant.

To make sure the optimization works as expected, I modified main.cpp to look like this:

/* main.cpp */
#include <cstdio>
#include "blah.h"
int main()
{
  Blah b;
  Blah* p = &b;
  int x;
  x = p->blah();
  if (x) 
    {
       printf("blah\n");
       printf("blah\n");
       printf("blah\n");
       printf("blah\n");
       printf("blah\n");
       /* etc (2000+ more times) */
    }
  return 0;
}

The resulting executable size when compiled with the system’s GCC is over 16 MB. The executable size when compiled with the modified version is about 7 KB. Also, with more applicability to the Mozilla use case, when changing the blah member function of the test program to return 0 instead of 5 the unnecessary branch is still kept when compiling with the normal GCC (executable size is still 16 MB), even though it’s clear this branch will never be used. I now see, as Taras mentioned, that dynamic linking mucks everything up (edit: oops dynamic dispatch != dynamic linking).

Before I go to sleep I’d like to briefly return to the call flag recognition problem. The following is an alternative way of determining the attributes of a gimple call:

    tree t = gimple_call_fn (stmt);       
    if (t != NULL_TREE)
      {
        tree fntype = TREE_TYPE (t);
        tree attr = lookup_attribute ("alwayszero", TYPE_ATTRIBUTES (fntype));
        if (attr != NULL_TREE)
          /* do stuff */
      }

This also does not work from tree-ssa-ccp.c with my Blah example.

Also, I noticed that the alwayszero attribute came up in the static mailing list under a discussion of plugins. Once the call flags issue has been solved I suppose my next step will be to convert this to a GCC plugin which will probably involve moving this optimization into a separate pass. I will have to investigate how to modularize this code while still hooking into the propagation engine.

I’d also like to remark that I finished before midnight!! (HST)

*Edit: A solution to the attribute problem is probably not essential since this code should be moved into its own pass anyway.

A Gimple Call Flags Mystery

Posted in Seneca by ehren on October 21, 2009

It’s been a while since I’ve presented any updates with my gcc alwayszero attribute project (see previous post here). At this point I’ve identified one place where, if several modifications are made, I am certain that the proper semantics for the attribute can be achieved. There is another option more akin to the rv = call(); --> rv = 0; call(); idea, but I’ll discuss that further below.

More than a week ago I had hit on the idea of placing my optimization in tree-ssa-ccp.c. To quote directly from the comments in this file:

Constant assignments of the form VAR = CST are propagated from the assignments into uses of VAR, which in turn may generate new constants.

I think if I go too far into my fairly murky understanding of this stuff I won’t be able to finish the post, so I’ll just launch into my first idea for the optimization.

During the ccp pass (I suppose multiple “passes” take place), values associated with SSA names (ie the unique singly assigned variable names which can be viewed with -fdump-tree-ssa) are tagged as one of:

/* Possible lattice values.  */
typedef enum
{
  UNINITIALIZED,
  UNDEFINED,
  CONSTANT,
  VARYING
} ccp_lattice_t;

Statments marked VARYING cannot have their outgoing edges simulated further to root out any obvious values. A value corresponding to a name marked CONSTANT however will be propagated to other statements which depend solely on this value.

In several places, this pass checks whether a statement is a value returning call ie if is_gimple_call (stmt) && gimple_call_lhs (stmt) != NULL_TREE. My thought is that at some point during the ccp pass I should check if the call flags for such a statement contain the alwayszero flag and if so set the lhs of the assignment to 0.

I still have to determine how to actually set this value but I should note that (at least in the context of this pass) the value corresponding to an SSA_NAME is represented using a prop_value_t. Looking in tree-ssa-propagate.h we find the definition:

struct prop_value_d {
    /* Lattice value.  Each propagator is free to define its own
       lattice and this field is only meaningful while propagating.
       It will not be used by substitute_and_fold.  */
    unsigned lattice_val;

    /* Propagated value.  */
    tree value;
};

typedef struct prop_value_d prop_value_t;

Perhaps there is a better way, but my first thought would be to somehow define a new tree representing the constant zero and then pass a new prop_value_t with value this new tree (and also lattice_val CONSTANT), to the set_lattice_value (tree var, prop_value_t new_val) function found in tree-ssa-ccp.

Before I get ahead of myself though the first step would be recognizing a value returning gimple call in tree-ssa-ccp. This is extremely simple (I mentioned the condition above) but for some time I was confounded that no gimple calls were being recognized any time I’d compile any of my own test examples (I’m printing to stderr everywhere in gcc). The weird thing though was that these printfs were showing up in the compilation of gcc itself (even though I’m using the system’s gcc in the compile). I suspected that the functions in my test code were being inlined and perhaps optimized out in some other way before even reaching the ccp stage. I did hit upon the idea of simply compiling the function separately and then linking it into another optimized compile but this didn’t quite work (I was able to recognize my call flag from other places using this method though).

Eventually I decided that if gcc recognized gimple calls from tree-ssa-ccp while compiling itself (the systems compiler must only be used in the preliminary stages) why don’t I just try adding my alwayszero attribute to a whole bunch of functions within gcc? Well, it turns out that a lot of the front end including the attribute recognition stuff depends on the system compiler so that was a no go. So, why not compile my own modified gcc with a build of itself? Seems logical to me although I found that I couldn’t just add my attribute everywhere without causing other problems. Anyway, I decided to ditch this crazy effort and just take Taras’s advice to use a virtual member function. It turns out I only need an extremely simple example to tickle the propagator in the right places:

/* blah.h */
class Blah {
 public:
  virtual int blah() __attribute__((alwayszero));
};

/* blah.cpp */
#include "blah.h"
int Blah::blah() {
  return 0;
}

/* main.cpp */
#include "blah.h"
int main() 
{
  Blah b;
  Blah* p = &b;
  int x;
  x = p->blah();
  return 0;
}

Anyway, the next step is to check whether a value returning gimple call has the alwayszero flag. This should be very simple since it just means recognizing a nonzero value for gimple_call_flags (stmt) & ECF_ALWAYSZERO. So, in the same places where I added printfs to check for a gimple call, I also checked the flag values. And now, as before, I get the proper result when compiling gcc but not when compiling my own test code.

Here’s a bit of build output from gcc’s compilation:

tree-ssa-ccp.c: ccp_visit_stmt: gimple_call found
flags: 112, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: visit_assignment: gimple_call found
flags: 112, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: likely_value: gimple_call found
flags: 112, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: evaluate_stmt: gimple_call found
flags: 112, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: ccp_fold: gimple_call found
flags: 112, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: get_default_value: gimple_call found
flags: 112, ECF_ALWAYSZERO: 1024

And here’s output from the compilation of my Blah class test program above:

$ ~/gcc/dist/bin/g++ main.cpp blah.cpp -O2
tree-ssa-ccp.c: ccp_visit_stmt: gimple_call found
flags: 0, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: visit_assignment: gimple_call found
flags: 0, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: likely_value: gimple_call found
flags: 0, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: evaluate_stmt: gimple_call found
flags: 0, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: ccp_fold: gimple_call found
flags: 0, ECF_ALWAYSZERO: 1024
tree-ssa-ccp.c: get_default_value: gimple_call found
flags: 0, ECF_ALWAYSZERO: 1024
alwayszero recognized in tree-optimize.c (execute_fixup_cfg)

Note the last line however. Using the test expression I outlined above, it is possible to recognize my attribute from an unrelated area in gcc. For some reason though, at least with this example, my flag value is not getting to the ccp pass. This is not a problem with my attribute itself since I’ve also tried compiling my Blah example using the attribute nothrow (bit 6) which clearly can be recognized at this level in principle since it is present in my example gcc build output posted above.

I did consider that the options for the ccp pass are set to automatically quash attributes so I tried adding PROP_gimple_any (signifies full gimple grammar) to the ‘properties required’ bitflag in the pass_ccp options specifier:

struct gimple_opt_pass pass_ccp = 
{
 {
  GIMPLE_PASS,
  "ccp",        /* name */
  gate_ccp,       /* gate */
  do_ssa_ccp,       /* execute */
  NULL,         /* sub */
  NULL,         /* next */
  0,          /* static_pass_number */
  TV_TREE_CCP,        /* tv_id */
  PROP_cfg | PROP_ssa | PROP_gimple_any,      /* properties_required  */
  0,          /* properties_provided */
  0,          /* properties_destroyed */
  0,          /* todo_flags_start */
  TODO_dump_func | TODO_verify_ssa
  | TODO_verify_stmts | TODO_ggc_collect/* todo_flags_finish */
 }
};

Naturally, this is not necessary (and doesn’t improve things) since some code (like those gcc functions) does allow an attribute to get through to the ccp pass.

At this point I’m somewhat stumped but I’d like to discuss some alternatives unrelated to the problem outlined above. Actually that last code example I posted is an example of how easy it is to define a new gimple optimization pass in gcc. This struct (from tree-pass.h) offers some idea of the info required:

struct register_pass_info
{
  struct opt_pass *pass;            /* New pass to register.  */
  const char *reference_pass_name;  /* Name of the reference pass for hooking
                                       up the new pass.  */
  int ref_pass_instance_number;     /* Insert the pass at the specified
                                       instance number of the reference pass.
                                       Do it for every instance if it is 0.  */
  enum pass_positioning_ops pos_op; /* how to insert the new pass.  */
};

A call is then made to extern void register_pass (struct register_pass_info *). If I didn’t want to involve myself with the propagator at all, I could take more of a syntactic approach, i.e. in a new pass I could simply null out the lhs of a value returning gimple call and then insert a new block before the gimple call’s block that contains rv = 0. The code for this may even look something like the block manipulation code found in tree-propagate.c. This route seems more complicated however (and I think will require a cfg fixup). It might also be possible to try the propagator based optimization I described above in a new pass (before the ccp pass perhaps), however I would think grafting it into tree-ssa-ccp.c should be my first step.

Anyway, for the moment I’m going to continue trying to get my attribute recognized from tree-ssa-ccp.c since I suspect it’s again only a matter of using too simple test code.

On another note I might have to start using gdb on gcc which will likely involve using emacs as is described here. The horror.

* Edit: A gcc dev pointed me in the right direction (I think) for fixing the attribute problem. Details later.
* Edit2: Fix not as straightforward as I thought.

Adding a new function attribute to GCC

Posted in Seneca by ehren on October 7, 2009

I’ve recently made a bit of a breakthrough in my quest to add an alwayszero function attribute to GCC (as described in my last post). Previously, I tried grepping for the windows specific attribute “dllexport” which I believe is never used by GCC (as opposed to “const”, “pure”, etc…). This approach allowed me to add a new bitflag representing my attribute to a FUNCTION_DECL node. Giddy with the thought that I actually found code to modify (tree.h and calls.c), I immediately proceeded with a build which failed for reasons I was never able to identify. After getting some help with working on the CDOT computers remotely from my professors David Humphrey and Chris Tyler on IRC I was ready to try again. As an aside Dave got me started with GNU Screen which has proved indispensable. However, after all the work, I realized a crucial piece of the puzzle was missing: where exactly should GCC recognize the attribute string “alwayszero”? Faced with the sinking feeling that the recognition of an attribute would be harder than the actual optimization I retired for the night.

The next morning I tried a new approach by consulting GCC’s bugzilla. Searching for bugs marked enhancement and containing the word “attribute” I came across Bug 36892 Support __attribute__((deprecated(“text string”))). Reviewing
H. J. Lu’s accepted revision filled in the missing piece: c-common.c contains a table which includes the attribute name as well as a pointer to a call back function intended to handle the attribute. This definitely counts as spamming the planet but here’s my patch for a new do nothing attribute:

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 152509)
+++ gcc/tree.h	(working copy)
@@ -3127,6 +3127,10 @@
    but may have arbitrary side effects).  */
 #define DECL_IS_NOVOPS(NODE) (FUNCTION_DECL_CHECK (NODE)->function_decl.novops_flag)
 
+/* Nonzero in a FUNCTION_DECL means this function should be treated
+   as an "alwayszero" function (always returns 0 regardless of actual return. */
+#define DECL_IS_ALWAYSZERO(NODE) (FUNCTION_DECL_CHECK (NODE)->function_decl.alwayszero_flag)
+
 /* Used in FUNCTION_DECLs to indicate that they should be run automatically
    at the beginning or end of execution.  */
 #define DECL_STATIC_CONSTRUCTOR(NODE) \
@@ -3242,8 +3246,8 @@
   unsigned disregard_inline_limits : 1;
   unsigned pure_flag : 1;
   unsigned looping_const_or_pure_flag : 1;
+  unsigned alwayszero_flag : 1;
 
-
   /* 3 bits left */
 };
 
@@ -5049,7 +5053,10 @@
 /* Function does not read or write memory (but may have side effects, so
    it does not necessarily fit ECF_CONST).  */
 #define ECF_NOVOPS		  (1 << 9)
+/* Function always returns 0 */
+#define ECF_ALWAYSZERO           (1 << 10)
 
+
 extern int flags_from_decl_or_type (const_tree);
 extern int call_expr_flags (const_tree);
 
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 152509)
+++ gcc/calls.c	(working copy)
@@ -608,6 +608,9 @@
       if (DECL_IS_NOVOPS (exp))
 	flags |= ECF_NOVOPS;
 
+      if (DECL_IS_ALWAYSZERO (exp))
+	flags |= ECF_ALWAYSZERO;
+
       if (TREE_NOTHROW (exp))
 	flags |= ECF_NOTHROW;
 
Index: gcc/c-common.c
===================================================================
--- gcc/c-common.c	(revision 152509)
+++ gcc/c-common.c	(working copy)
@@ -530,6 +530,7 @@
 static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
+static tree handle_alwayszero_attribute (tree *, tree, tree, int, bool *);
 
 static void check_function_nonnull (tree, int, tree *);
 static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT);
@@ -824,6 +825,8 @@
 			      handle_target_attribute },
   { "optimize",               1, -1, true, false, false,
 			      handle_optimize_attribute },
+  { "alwayszero",               0, 0, true,  false, false,
+			      handle_alwayszero_attribute },
   { NULL,                     0, 0, false, false, false, NULL }
 };
 
@@ -7808,6 +7811,24 @@
 
   return NULL_TREE;
 }
+
+/* Derived from handle_nothrow_attribute */
+static tree
+handle_alwayszero_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+    DECL_IS_ALWAYSZERO(*node) = 1;
+  else
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
+
 
 /* Check for valid arguments being passed to a function.
    ATTRS is a list of attributes.  There are NARGS arguments in the array

Once I’ve made more progress on the optimization front it will be quite straightforward to go back and restrict the attribute to, for example, non-void functions, functions of a particular type, etc. It would actually be easy to add an argument to the attribute as well ie __attribute__((alwaysX("0"))) but I don’t want to get ahead of myself.

I should note that the comments I received in my previous post were all extremely helpful as well. I’ve been slightly concerned for the past few days that any sensible use of the alwayszero attribute would be unnecessary in light of existing gcc optimizations. Rereading Taras’ encouraging post I’m left with the impression that, in addition to his proposed devirtualizer, dynamic linking also eliminates the redundancy of an alwayszero attribute.

Taras’ comments also got me on the right track in terms of the ‘optimization’ required. Browsing this page on GCC debugging options, I noticed the compiler flag -fdump-tree-all which dumps a cornucopia of pretty printed intermediate GCC representations to file.

For example this program:

#include <stdio.h>
int blah(void) {
    printf("blah\n");
    return 5;
}
int main(void) {
    int c;
    blah();
    while (blah())
        printf("1");
    if (blah())
        printf("2");
    if (c = blah() == 5)
        printf("3");
    return 0;
}

Results in the following GIMPLE representation:

blah ()
{
  int D.1587;

  __builtin_puts (&"blah"[0]);
  D.1587 = 5;
  return D.1587;
}

main ()
{
  int D.1596;
  int D.1597;
  int D.1598;
  int D.1599;
  int c;

  blah ();
  goto <D.1593>;
  <D.1592>:;
  __builtin_putchar (49);
  <D.1593>:;
  D.1596 = blah ();
  if (D.1596 != 0)
    {
      goto <D.1592>;
    }
  else
    {
      goto <D.1594>;
    }
  <D.1594>:;
  D.1597 = blah ();
  if (D.1597 != 0)
    {
      __builtin_putchar (50);
    }
  else
    {
      
    }
  D.1598 = blah ();
  c = D.1598 == 5;
  if (c != 0)
    {
      __builtin_putchar (51);
    }
  else
    {
      
    }
  D.1599 = 0;
  return D.1599;
}

(can also be obtained directly using -fdump-tree-gimple)

I’m still a bit murky on how all of these representations fit together since, for example, there is low gimple, high gimple (middle high gimple?), etc, but Taras’ described optimization seems straightforward at this level. Viewing this page on SSA form, which I was led to by Benjamin Smedberg’s comment in my previous post, I would think that I should make my change before GCC builds its control flow graph, but this will require more investigation. I’ve actually seen that it’s quite simple to access an attribute tree code from down in the gimple, so this will be my next step.

Also, if I finish Kenneth Louden’s compiler book, Robert Morgan’s Building An Optimizing Compiler might be worth checking out since it’s referenced in the above SSA doc, as well as in several comments within the GCC source (of course the YorkU Library doesn’t have it). One thing at a time though.