Ehren's Blog

Mystery bytes

Posted in Seneca by ehren on December 15, 2009

As previously reported, I’ve been trying to determine why my gcc plugin to optimize away virtual functions that always return zero has resulted in a rather mediocre code size reduction. In fact, the problem is far worse that I previously realized.

It turns out the plugin has been adding 12240 bytes to libxul.so! This was truly horrible, but I have found the cause. As I mentioned before, the vast majority of the ‘optimizations’ performed by the plugin were quite useless. Any code of the form return call(); is considered by GCC at the GIMPLE level to be a ‘value returning gimple call’ which will end up with a 3 address code representation like so:
    t1 = call();
    return t1;

The plugin then uselessly converts this to:
    dummy_var = call();
    t1 = 0;
    return t1;

The problem is that this adds a significant amount of cruft to the generated code. It turns out GCC, and probably most compilers, when encountering return call();, will simply jump to the callee, letting the epilogue of the callee do the work of the caller. For example, I previously used the example of nsHTMLModElement::GetAttributeNS which looks like this:

 
NS_SCRIPTABLE NS_IMETHOD
GetAttributeNS(const nsAString & namespaceURI, const nsAString & localName,
	       nsAString & _retval NS_OUTPARAM)
{
  return _to GetAttributeNS(namespaceURI, localName, _retval);
}

(See this post for a GIMPLE breakdown)

Without my plugin, GCC generates this code for the function:

0000000000000000 <_ZN16nsHTMLModElement14GetAttributeNSERK18nsAString_internalS2_RS0_>:
   0: e9 00 00 00 00        jmpq   5 <_ZThn56_N16nsHTMLModElement14GetAttributeNSERK18nsAString_internalS2_RS0_>

Here the prologue is left off as well since all behavior is deferred to the callee. With the ‘optimization’ however the return value is explicitly zeroed out, and so GCC must generate code for a complete procedure:

0000000000000000 <_ZN16nsHTMLModElement14GetAttributeNSERK18nsAString_internalS2_RS0_>:
   0: 48 83 ec 08           sub    $0x8,%rsp
   4: e8 00 00 00 00        callq  9 <_ZN16nsHTMLModElement14GetAttributeNSERK18nsAString_internalS2_RS0_+0x9>
   9: 31 c0                 xor    %eax,%eax
   b: 48 83 c4 08           add    $0x8,%rsp
   f: c3                    retq

So what is to be done?

My first thought was to abandon this approach to the optimization entirely and go back to something like my 0.1 release. I was even able to build a do nothing implementation of the propagation engine by copying tree-ssa-propagate.h into the GCC plugin include directory. Unfortunately, writing one of these passes requires a lot of code because you have to do all of the folding yourself. I also don’t think it’s possible to just copy all of the code from the conditional constant propagation pass into a plugin (I’ve tried… GCC doesn’t export enough to the plugin framework).

Another approach was to add a couple of ad hoc conditions to the plugin so that statements of the form return call(); are not ‘optimized’. One thing I noticed was that if a function has only one return of the form return call(); , that return will be in the same basic block as the call. Therefore I could use code like this to weed out the boring stuff:

static bool
boring_block (basic_block bb)
{
  gimple ret_stmt = gimple_seq_last_stmt (bb_seq (bb));
  return gimple_code (ret_stmt) == GIMPLE_RETURN;
}

This worked pretty well. The number of ‘optimized’ call sites went down to 55 (from 1305) which was just what I wanted given that the majority of those call sites are not optimization worthy. Unfortunately, I was still increasing increasing the size of libxul.so, except this time by 14 bytes.

One possibility was that functions with multiple returns end up looking like this (this is after the ‘optimization’ has taken place):

<bb 6>:
  D.74138_10 = &this_6(D)->D.72191;
  dummy_var.24_17 = nsSVGGFrame::AttributeChanged (D.74138_10, aNameSpaceID_2(D), aAttribute_4(D), aModType_11(D));
  D.74137_12 = 0;

<bb 7>:
  # D.74137_1 = PHI <0(5), D.74137_12(6)>
  return D.74137_1;
}

(See the code for this function here. Note that it has one 0-constant return and one ‘alwayszero call’ return. It doesn’t fit my boring_block condition above, but this is still a useless optimization.)

So I moved onto the next approach. It appears that any time the return value is checked the basic block of the call will contain a conditional statement of some form (see the GIMPLE code for nsSplittableFrame::AttributeChanged in this post, for an example). This led me to a new condition for only performing useful optimizations:

static bool
is_interesting_block (gimple_stmt_iterator gsi)
{
  for (; !gsi_end_p (gsi); gsi_next (&gsi))
    {
      gimple stmt = gsi_stmt (gsi);
      enum gimple_code code = gimple_code (stmt);
      if (code == GIMPLE_COND || code == GIMPLE_SWITCH)
        return true;
    }
  return false;
}

With this, the number of optimized call sites is reduced to 26, and I’m certain all of these optimizations result in useful code pruning (although I haven’t run a tree-dump build yet). Here’s the weird thing though: the size results are exactly the same as when I was optimizing 55 callsites. That is, the plugin is still adding 14 bytes to libxul.so! Argh.

Here’s an even bigger mystery: I ran a build using a ‘disabled’ version of the plugin. It registers a new pass, but no optimizations are performed. The results are identical to when I was running the plugin with 55 callsites optimized and with 26 callsites optimized: 14 bytes are added to libxul.so. My first thought is that adding a new pass somehow interferes with existing GCC optimizations, especially since the plugin must set the TODO_update_ssa flag (since a new statement is added). Why do my 26 useful optimizations not have any affect though?

I’m going to have to run another tree-dump build of mozilla-central and then play around a bit more with objdump. Something is very strange here.

Advertisements

More analysis troubleshooting

Posted in Seneca by ehren on December 9, 2009

I wrote a little Dehydra script to determine why my optimization has resulted in a mediocre size reduction. As I suspected, most of the call sites are of the form return call(); but I can now confirm this with this script:

function isAlwaysZero(c)
{
  if (!c.attributes)
    return false;
  for each (let a in c.attributes)
    if (a.name == 'user' && a.value == 'NS_alwayszero')
      return true;
  return false;
}

function process_function(decl, body)
{
  // for calls that are simply returned
  let boringCalls = 0;
  // for calls where the return value is assigned to a variable
  let interestingCalls = 0;

  for (v in iterate_vars(body)) {
    if (v.isFcall && v.isReturn && isAlwaysZero(v))
      // the function call is returned
      ++boringCalls;
    else if (v.assign && v.assign[0].isFcall && isAlwaysZero(v.assign[0]))
      // the function call is assigned to a variable
      ++interestingCalls;
  }

  if (interestingCalls == 0 && boringCalls > 0)
    warning("boring alwayszero function (boring calls: " + boringCalls +
            "). name: " + decl.name, decl.loc);
  else if (interestingCalls > 0)
    warning("interesting alwayszero function (interesting calls: " +
            interestingCalls + ", boring calls: " + boringCalls + "). name: " +
            decl.name, decl.loc);
}

There may be a couple of corner cases I’m excluding here eg something like if (call()) but this should take of every instance when NS_FAILED is involved.

Anyway, here are the results:

The number of interesting call sites ie places where the return value of the call is assigned to a variable is only 67. Surprisingly, the number of boring call sites ie places where the return value of the call is simply returned is a whopping 2495. Here’s the results for all the boring callsites and all the interesting callsites, if you’re interested.

One thing I’m a bit confused about is that although there are 2495+67 = 2562 call sites where my plugin should be invoked, it is only being invoked 1305 times, which I have previously reported here (I can also confirm this from running a Mozilla build with -fdump-tree-all). Whatever the answer though, I’m pretty certain that the poor results are due to a small number of checks made on calls to functions I have patched.

Possible reasons for poor binary size reduction

Posted in Seneca by ehren on December 4, 2009

Just for a bit of context here’s the basic reason why I was only able to achieve 24 bytes worth of optimization with the current number of functions patched.

Consider GetAttributeNS which is defined in a macro here

It looks like this:

NS_SCRIPTABLE NS_IMETHOD
GetAttributeNS(const nsAString & namespaceURI, const nsAString & localName,
	       nsAString & _retval NS_OUTPARAM)
{
  return _to GetAttributeNS(namespaceURI, localName, _retval);
}

Here’s what it looks like (in gimple ssa form) after being run through the plugin:

nsHTMLModElement::GetAttributeNS(nsAString_internal const&, nsAString_internal const&, nsAString_internal&) (struct nsHTMLModElement * const this, const struct nsAString_internal & namespaceURI, const struct nsAString_internal & localName, struct nsAString_internal & _retval)
{
  nsresult dummy_var.33;
  struct nsGenericElement * D.44812;
  nsresult D.44811;

<bb 2>:
  D.44812_2 = &this_1(D)->D.43447.D.33567.D.30722.D.30660;
  dummy_var.33_9 = nsGenericElement::GetAttributeNS (D.44812_2, namespaceURI_3(D), localName_4(D), _retval_5(D));
  D.44811_6 = 0;
  return D.44811_6;
}

The optimization’s performed (because it’s a value returning gimple call) but there’s no real code elimination. Even though this function now knows that the call to an alwayszero function is 0 it doesn’t really help because it just returns the value. And the function returning the value is itself virtual!

(Note: It’s somewhat shameless to post the above because I just ripped it out of a pastebin from an irc conversation, but there it is.)

Are there any significant optimizations?

The next question is whether or not my plugin is able to cut out any unnecessary NS_FAILED calls since that’s kind of the point. In fact, it has in some cases.

Consider nsImageFrame::AttributeChanged

It looks like this:

NS_IMETHODIMP
nsImageFrame::AttributeChanged(PRInt32 aNameSpaceID,
                               nsIAtom* aAttribute,
                               PRInt32 aModType)
{
  nsresult rv = nsSplittableFrame::AttributeChanged(aNameSpaceID,
                                                    aAttribute, aModType);
  if (NS_FAILED(rv)) {
    return rv;
  }
  if (nsGkAtoms::alt == aAttribute)
  {
    PresContext()->PresShell()->FrameNeedsReflow(this,
                                                 nsIPresShell::eStyleChange,
                                                 NS_FRAME_IS_DIRTY);
  }

  return NS_OK;
}

nsSplittableFrame::AttributeChanged is an alwayszero function so nsImageFrame::AttributeChanged will be processed by the plugin. Here’s what that processing will do:

nsImageFrame::AttributeChanged(int, nsIAtom*, int) (struct nsImageFrame * const this, PRInt32 aNameSpaceID, struct nsIAtom * aAttribute, PRInt32 aModType)
{
  nsresult dummy_var.291;
  struct nsIPresShell * D.96275;
  struct nsStyleContext * D.96272;
  struct nsRuleNode * D.96271;
  struct nsPresContext * D.96269;
  struct nsPresContext * D.96269;
  nsresult rv;
  int (*__vtbl_ptr_type) (void) D.93660;
  int (*__vtbl_ptr_type) (void) * D.93659;
  int (*__vtbl_ptr_type) (void) * D.93658;
  struct nsIFrame * D.93655;
  struct nsIAtom * alt.65;
  long int D.93648;
  long int D.93647;
  int rv.64;
  struct nsFrame * D.93645;

<bb 2>:
  D.93645_3 = &this_2(D)->D.53019.D.51667;
  dummy_var.291_32 = nsFrame::AttributeChanged (D.93645_3, aNameSpaceID_4(D), aAttribute_5(D), aModType_6(D));
  rv_7 = 0;
  rv.64_8 = (int) rv_7;
  D.93647_9 = rv.64_8 < 0;
  D.93648_10 = __builtin_expect (D.93647_9, 0);
  if (D.93648_10 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  rv_11 = rv_7;
  goto <bb 7>;

<bb 4>:
  alt.65_12 = alt;
  if (alt.65_12 == aAttribute_5(D))
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  D.93655_13 = &this_2(D)->D.53019.D.51667.D.51581.D.46508;
  D.96272_14 = D.93655_13->mStyleContext;
  D.96272_26 = D.96272_14;
  D.96271_27 = D.96272_26->mRuleNode;
  D.96269_28 = D.96271_27->mPresContext;
  D.96269_30 = D.96269_28;
  D.96269_15 = D.96269_30;
  D.96275_16 = D.96269_15->mShell;
  D.96275_29 = D.96275_16;
  D.96275_31 = D.96275_29;
  D.96275_17 = D.96275_31;
  D.93658_18 = D.96275_17->D.15138.D.14613._vptr.nsISupports;
  D.93659_19 = D.93658_18 + 200;
  D.93660_20 = *D.93659_19;
  D.93655_21 = &this_2(D)->D.53019.D.51667.D.51581.D.46508;
  OBJ_TYPE_REF(D.93660_20;D.96275_17->25) (D.96275_17, D.93655_21, 2, 1024);

<bb 6>:
  rv_22 = 0;

<bb 7>:
  # rv_1 = PHI <rv_11(3), 0(6)>
  return rv_1;
}

Clearly the optimization has been performed but has GCC propagated the 0, cutting out the NS_FAILED?

It has, in fact (here’s the IR after a later pass):

nsImageFrame::AttributeChanged(int, nsIAtom*, int) (struct nsImageFrame * const this, PRInt32 aNameSpaceID, struct nsIAtom * aAttribute, PRInt32 aModType)
{
  struct nsIPresShell * D.96275;
  struct nsStyleContext * D.96272;
  struct nsRuleNode * D.96271;
  struct nsPresContext * D.96269;
  struct nsPresContext * D.96269;
  int (*__vtbl_ptr_type) (void) D.93660;
  int (*__vtbl_ptr_type) (void) * D.93659;
  int (*__vtbl_ptr_type) (void) * D.93658;
  struct nsIFrame * D.93655;
  struct nsIAtom * alt.65;
  struct nsFrame * D.93645;

<bb 2>:
  D.93645_3 = &this_2(D)->D.53019.D.51667;
  nsFrame::AttributeChanged (D.93645_3, aNameSpaceID_4(D), aAttribute_5(D), aModType_6(D));
  alt.65_12 = alt;
  if (alt.65_12 == aAttribute_5(D))
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  D.96272_14 = this_2(D)->D.53019.D.51667.D.51581.D.46508.mStyleContext;
  D.96271_27 = D.96272_14->mRuleNode;
  D.96269_28 = D.96271_27->mPresContext;
  D.96275_16 = D.96269_28->mShell;
  D.93658_18 = D.96275_16->D.15138.D.14613._vptr.nsISupports;
  D.93659_19 = D.93658_18 + 200;
  D.93660_20 = *D.93659_19;
  D.93655_21 = &this_2(D)->D.53019.D.51667.D.51581.D.46508;
  OBJ_TYPE_REF(D.93660_20;D.96275_16->25) (D.96275_16, D.93655_21, 2, 1024);

<bb 4>:
  return 0;
}

Note that dummy_var is gone too.

I would conclude that the optimization works for Firefox. It’s just that so far I’ve patched some really boring functions!

Binary size results after optimization

Posted in Seneca by ehren on December 4, 2009

Here’s a quick rundown of the binary size reduction from running my plugin on mozilla-central:

These libraries had a minor code size decrease:

without-plugin 93 ./dist/lib/libfreebl3.so -> /home/ehren/mozilla-central-without-plugin/objdir/nss/freebl/Linux_SIN
with-plugin 90 ./dist/lib/libfreebl3.so -> /home/ehren/mozilla-central-with-plugin/objdir/nss/freebl/Linux_SINGLE_SH
without-plugin 68 ./dist/lib/libnss3.so -> /home/ehren/mozilla-central-without-plugin/objdir/nss/nss/libnss3.so
with-plugin 65 ./dist/lib/libnss3.so -> /home/ehren/mozilla-central-with-plugin/objdir/nss/nss/libnss3.so
without-plugin 75 ./dist/lib/libnssckbi.so -> /home/ehren/mozilla-central-without-plugin/objdir/nss/nssckbi/libnssck
with-plugin 72 ./dist/lib/libnssckbi.so -> /home/ehren/mozilla-central-with-plugin/objdir/nss/nssckbi/libnssckbi.so
without-plugin 74 ./dist/lib/libnssdbm3.so -> /home/ehren/mozilla-central-without-plugin/objdir/nss/nssdbm/libnssdbm
with-plugin 71 ./dist/lib/libnssdbm3.so -> /home/ehren/mozilla-central-with-plugin/objdir/nss/nssdbm/libnssdbm3.so
without-plugin 76 ./dist/lib/libnssutil3.so -> /home/ehren/mozilla-central-without-plugin/objdir/nss/nssutil/libnssu
with-plugin 73 ./dist/lib/libnssutil3.so -> /home/ehren/mozilla-central-with-plugin/objdir/nss/nssutil/libnssutil3.s
without-plugin 72 ./dist/lib/libsmime3.so -> /home/ehren/mozilla-central-without-plugin/objdir/nss/smime/libsmime3.s
with-plugin 69 ./dist/lib/libsmime3.so -> /home/ehren/mozilla-central-with-plugin/objdir/nss/smime/libsmime3.so
without-plugin 76 ./dist/lib/libsoftokn3.so -> /home/ehren/mozilla-central-without-plugin/objdir/nss/softokn/libsoft
with-plugin 73 ./dist/lib/libsoftokn3.so -> /home/ehren/mozilla-central-with-plugin/objdir/nss/softokn/libsoftokn3.s
without-plugin 68 ./dist/lib/libssl3.so -> /home/ehren/mozilla-central-without-plugin/objdir/nss/ssl/libssl3.so
with-plugin 65 ./dist/lib/libssl3.so -> /home/ehren/mozilla-central-with-plugin/objdir/nss/ssl/libssl3.so

Total savings: 24 bytes across 8 libraries.

Everything else was unchanged. The full results can be viewed here.

Note: this is after patching 1677 member functions which resulted in 1305 optimizations performed (this means there are about 1305 value returning calls to these 1677 functions).

Btw I obtained this data running find . -name '*.so' | xargs ls -l > ~/with-plugin.txt on both build directories, and then by sorting and merging the results ie cat without-plugin.txt with-plugin.txt | sort -k 8,15 > merged-size-results.txt

Incidentally, I think the the code size increase I reported previously was just me reading some diff results wrong.

A Treehydra + GCC plugin combo for alwayszero functions

Posted in Seneca by ehren on December 4, 2009

Yesterday (err technically two days ago), I was able to cleanly add my user(("alwayszero")) attribute to about 1780 of the ~6300 functions I have identified as alwayszero within mozilla-central. I was even able to get through a build using my pluginified GCC 4.5. Further, by adding a bit of debug info right into the plugin I was able to count the number of times the optimization was executed ie every time the return value of a call to an alwayszero function is stored. Unfortunately, the results were somewhat disappointing in that the optimization was only executed 1300 times. This doesn’t take into account any additional constant propagation that resulted from these optimizations but, by comparing the sizes of the .so files in the object directory to the .so files of an equivalent non-plugin enabled GCC 4.5 build, there was no code size advantage to be had. In fact a handful of the files increased by 5 or so bytes!

I suppose it’s premature to derive any conclusions at this point, though, with less that a third of the functions patched with my attribute.

Anyway, the next step in completing the patching process was to clean up my analysis to:
1) remove all of the outparams.js code that’s unnecessary for determining the final return value
and
2) ensure that the attribute has not been misapplied (this will result in very bad/interesting things happening at runtime)

During this step, I somehow introduced some very bizarre errors into my Treehydra script that I’ve only now been able to resolve. At some point I’d like to examine the source of these errors but at the moment there are bigger issues.

Perhaps foolishly, I’ve now placed all of my code into bugzilla. The treehydra analysis certainly has a few issues, since, for one thing, if it’s placed into trunk as is (with FIND_NEW_FUNCTIONS = true) static-checking builds will contain an overwhelming amount of alwayszero identifying output. However, at this point, I think that if there are any major issues left (other than writing a Pork patch) they can probably best be discovered in the bug.

One thing I didn’t put in bugzilla was the insane technique I’ve used to identify and patch as many alwayszero functions as possible. For the non-squeamish, I’ll describe the process now.

I first create a typescript of a build running my analysis script. It spits out output like this:

alwayszero function found! location: /home/ehren/mozilla-central/xpcom/glue/nsCategoryCache.cpp:126:51 name: Observe class: /home/ehren/mozilla-central/xpcom/glue/nsCategoryCache.h:65:59 overloadCount: 0 args: nsCategoryObserver*, nsISupports*, const char*, const PRUnichar*

Next I grep for all of these lines removing any duplicates:
grep 'alwayszero function found!' typescript | sort | uniq > uniq.txt

Rather than patch my main mozilla-central repository, I typically create a copy and then rename all the directories in this file. For example with vim:
:%s/mozilla-central/mozilla-central-Copy/g

Overloaded methods are a big headache for the techniques I’m using because, as you’ll see, the matches I’m making are only by function name. So, after identifying them in my treehydra script, I simply remove them (there are only about 130 anyway):

grep 'overrideCount: 0' uniq.txt > no-overrides.txt

Now it’s time for the first shell script. I use the following to print out the (possible) location of the declaration of the function within its class:

# finddecl.sh
while read line ; do
  name=$(echo $line | cut -d' ' -f 7)
  path=$(echo $line | cut -d' ' -f 5)
  classpath=$(echo $line | cut -d' ' -f 9)
  lineno=$(echo $classpath | cut -d':' -f 2)
  classpath=$(echo $classpath | cut -d':' -f 1) # strip off line info
  # start at the beginning of the class and returns the first line containing the 
  # function's name. (actually it starts at line 1 but the d actions skips these lines)
  nameinfo=$(sed -n "1,$lineno d; /$name/{p;=;q;}" $classpath)
  echo $name path: $path classpath: $classpath nameinfo: $nameinfo
done < $1

I then run it like so:

./finddecl.sh no-overrides.txt > found-decls.txt

This output will contain a tonne of blank nameinfo entries because most of the functions I’ve found are behind macros. There are also just lots of phony results, mainly because my search is extremely imprecise. To cut down on the junk, therefore, I do this:

grep '\(virtual\)\|\(NS_IMETHOD \)' found-decls.txt > found-decls-trimmed.txt

All of this will result in output like so:

GetRoleInternal path: /home/ehren/mozilla-central-Copy/accessible/src/html/nsHTMLSelectAccessible.cpp:963:58 classpath: /home/ehren/mozilla-central-Copy/accessible/src/html/nsHTMLSelectAccessible.h nameinfo: virtual nsresult GetRoleInternal(PRUint32 *aRole); 235

Now we’re ready for the second shell script:

# patchdecls.sh
while read line ; do
  fnname=$(echo $line | cut -d' ' -f 1)
  classpath=$(echo $line | cut -d' ' -f 5)
  lineno=$(echo $line | sed -n "s/.*[^0-9]\([0-9]\+\)$/\1/p") # thanks chris!
  # make sure we haven't already applied the attribute
  match=$(sed -n "$lineno s/NS_ALWAYSZERO/&/p" $classpath)
  if [ "$match" == "" ]
  then
    # in-place replace
    sed -in "$lineno s/ $fnname/ NS_ALWAYSZERO&/" $classpath
  fi
done < $1

Running it will result in a patched mozilla-central-Copy:

./patchdecls.sh found-decls-trimmed.txt

(All of this could be done with one script but there are certain advantages to breaking it up)

Earlier, as I mentioned, I did run such a patched build with GCC 4.5 using my optimization plugin. However, after finalizing my treehydra analysis, I know that a number of the functions patched should not have been. The fact that I encountered no compile time errors (naturally I wouldn’t) is a reminder of the danger of this plugin without a complementary static analysis. I still have my crazy unsafe build which I am tempted to try in the CDOT tomorrow. Could be a new FOX special: “when functions return zero”.

Now that I recently enabled a ‘safety checking’ feature in the analysis, where an error raised if a function is inappropriately tagged as alwayszero, the above is not an issue though. In fact, I am more confident in my analysis now that it has identified a number of misapplied attributes than when it simply was only identifying zero-return functions. For those interested, here’s a ramshackle list of functions which my shellscript’s inappropriately patched. Notice that there’s only one syntax error. Note that even though there are only 22 of them, I had a miserable time manually fix them all/waiting for the compiler to identify them!

To wrap this up, I should mention the results of running a final static-checking build after all the patching had been finalized. As you can see here, there are only 4514 left to go. Woohoo!

Towards a Giant Patch

Posted in Seneca by ehren on November 30, 2009

Well, I just spent a considerable amount of time trying to parse C++ with a regex, which wasn’t particularly enjoyable. Since I last posted, I’ve been able to pair down outparams.js to just what I need for my project. It’s basically now just a ZeroNonzero analysis together with a post analysis to round up all the alwayszero functions. This has resulted in increasing the number of functions found to around 6300. After getting a little help from ctyler to write a script that displays 100 lines of context around these functions, I’m also pretty certain that they all really do return zero, too.

The next step was finding the location of the function declarations within their respective classes. Of course, this information is not stored with the tree node representing a function declaration. Not that it matters, but apparently GCC considers the function name right before the definition to be the declaration (and I guess when present anywhere else it’s a forward declaration). Anyway, the next best thing is getting the path and line number of the class definition, which I’ve done, so I can then search for the function name from that point on.

One problem with this regex approach is that if I wanted to be exact about the matches, I’d need to consider type information about the function’s parameters. It’s actually quite easy to get this info with Treehydra, but there are a number of complications. Default parameter values eg int foo(int x = 0);, are one monkey wrench, for example. One thing I’m stuck on is being able to place shell variables into the lhs of a sed regex while still preserving the ability to use characters with special meanings like .* etc (being able to place special characters into the substituted variable would be even better). To get around this problem, I’ve checked in Treehydra whether the function is overloaded in the class and if so, printed a message about it. I can then exclude such functions, which only amount to about 100, from my results, simplifying things considerably.

Another problem is that many, perhaps the majority, of these functions are hidden behind macros. This might not be such a bad thing though, since a relatively small number of manual edits could perhaps affect thousands of declarations. There are some relative paths in the analysis results as well, but the instances are few enough that manual edits are feasible.

Anyway, I do have some results of declarations which are ready to be patched here. There’s only 1782 of them which means more than two thirds must be handled in other ways. What I might do is finalize the analysis to not only emit errors when my attribute has been applied to a function that may not return zero, but also warn about those remaining functions that should have the attribute but don’t. I can then decide how to proceed, once I’ve got a patched and plugin enabled build up and running.

There’s also the Pork route, which by way of Elsa, has the apparent advantage of providing both the location of the function definition and also the location of the declaration within the class. It should be able to handle the macros too but my understanding’s pretty murky here. Beyond getting the thing built, I haven’t looked into it that much, though.

Help from GNU, Analysis update, and possible contrib

Posted in Seneca by ehren on November 26, 2009

A bunch of stuff has happened regarding a Firefox build with GCC 4.5. I ended up filing a bug report about the link error I mentioned in a previous post. After sending a message to the GCC mailing list referring to it, I received a swift and helpful response. This led to an easy fix on the Mozilla side.

In other news, I reran the outparams.js based analysis I posted before. This time I only checked virtual member functions ie if (!DECL_VIRTUAL_P(func_decl)) return; . The result after removing as many duplicates as possible (without performing any substitutions) is a surprising 2664 alwayszero functions (2512 are in .cpp files and should thus be unique). At this point I’m a bit suspicious of these numbers especially since my earlier Dehydra analysis only turned up 243. I’ll have to go over them carefully tomorrow.

Here’s the list, btw.

Actually, if anyone’s looking for a contribution opportunity in OSD600/DPS909, finding any functions, even one, in the list that are not virtual or that don’t always return 0 would be helpful.

Reworking my analysis

Posted in Seneca by ehren on November 24, 2009

This weekend I’ve been stuck trying to come up with a Treehydra analysis similar to outparams.js that will root out those functions which always return zero (the ‘syntactic’ Dehydra analyis I posted previously was a bit naive). Trying to get this done for my 0.2 though was a grave mistake. It’s somewhat frustrating because I know this script already does what I want, namely it will ensure that an ‘out parameter’ is written upon a zero return. However, my aim is slightly different in that I don’t care about any other properties of the state when I’m in a state that contains a zero return; I just care that every state with a return is a zero-return state.

There’s another file that’s used by outparams.js called mayreturn.js that “determines the set of variables that may transitively reach the return statement” (eg w = x; x = y; y = z; return z;). It works by checking for either a RETURN_EXPR or a GIMPLE_MODIFY_STMT, in an instruction under consideration. If a RETURN_EXPR is found, the variable being returned is added to the current state. If a GIMPLE_MODIFY_STMT is found (ie an assignment) and the lhs is already in the state then the rhs is added to the state. Because this is implemented in the flow function, a change in the state will initiate another iteration through all the blocks of the function with the flow function called on each instruction. Or rather I suppose the iteration is done over only the predecessor blocks (given that mayreturn implements a ‘backward analysis’).

The main thing I’m confused about is that mayreturn.js seems to identify only a single variable as the ultimate return variable. This is even though a BackwardAnalysis (which mayreturn implements) iterates over every block (at least initially). As to its use in outparams.js, the mayreturn analysis is applied here, in process_tree which I’m certain is only invoked once.
So how are multiple returns handled?

This mozilla wiki page on abstract interpretation has been incredibly helpful in getting an idea of how all this stuff works. Going off it, here would be my idea for an always zero return checker:

I think the abstract values are already defined for me ie Zero_NonZero.Lattice.ZERO and Zero_NonZero.Lattice.NONZERO. Maybe I’m confused here and I actually need a new state ie RETURNS_ZERO and RETURNS_NON_ZERO.

The flow function should be something like ‘If stmt is an assignment by the zero constant, and if the lhs of the assignment is in the transitive return variable set identified by mayreturn.js, then set the state to ZERO (or RETURNS_ZERO).

Because I’m still unclear how multiple returns are handled though, I’m not really sure how to get started. There is a place in outparams.js where a given substate is checked for whether it returns zero (and if it does and an outparam is not written to, a warning is issued). Likewise, there is also a check for whether an outparam is written to upon returning a failure code. Unfortunately, the notion of ‘substate’ is at the heart of ESP analysis which is not explained in that Mozilla wiki page I posted above… although there’s an ESP heading with no body :). My vague understanding is that now you’re dealing with sets of states. My only recourse in getting more info is this paper where ESP (Error Detection via Scalable Program Analysis) is introduced. (Naturally it’s 100% crazy stuff).

Edit: hmm… just as I wrote that last paragraph something dawned on me. I think everything I just wrote is nonsense or at least irrelevant (I’ll leave it though because I have to get my blogging up).

Anyway, consider the checkSubstate function in outparams.js. I tried modifying it as well as the post analysis error checker where it’s called:

diff -r 41c1b69b3ed3 xpcom/analysis/outparams.js
--- a/xpcom/analysis/outparams.js	Mon Nov 23 22:17:06 2009 -0500
+++ b/xpcom/analysis/outparams.js	Tue Nov 24 00:20:11 2009 -0500
@@ -522,38 +522,48 @@ function unwrap_outparam(arg, state) {
   }
   if (outparam) return outparam;
   return arg;
 }
 
 // Check for errors. Must .run() analysis before calling this.
 OutparamCheck.prototype.check = function(isvoid, fndecl) {
   let state = this.cfg.x_exit_block_ptr.stateOut;
+  var alwayszero = true;
   for (let substate in state.substates.getValues()) {
-    this.checkSubstate(isvoid, fndecl, substate);
+    if(!this.checkSubstate(isvoid, fndecl, substate)) {
+      alwayszero = false;
+      break;
+    }
+  }
+  if (alwayszero) {
+    print("alwayszero function found! location: " + location_of(this.fndecl) + " name: " +  function_decl_name(this.fndecl));
+  }
 }
 
 OutparamCheck.prototype.checkSubstate = function(isvoid, fndecl, ss) {
   if (isvoid) {
+    return false;
     this.checkSubstateSuccess(ss);
   } else {
     let [succ, fail] = ret_coding(fndecl);
     let rv = ss.get(this.retvar);
     // We want to check if the abstract value of the rv is entirely
     // contained in the success or failure condition.
     if (av.meet(rv, succ) == rv) {
+      return true;
       this.checkSubstateSuccess(ss);
     } else if (av.meet(rv, fail) == rv) {
+      return false;
       this.checkSubstateFailure(ss);
     } else {
       // This condition indicates a bug in outparams.js. We'll just
       // warn so we don't break static analysis builds.
       warning("Outparams checker cannot determine rv success/failure",
               location_of(fndecl));
+      return false;
       this.checkSubstateSuccess(ss);
       this.checkSubstateFailure(ss);
     }
   }
 }
 
 /* @return     The return statement in the function
  *             that writes the return value in the given substate.

This is ugly hackery by it seems to do the trick.

Anyway, click here to see all the 3074 functions found. Checking for duplicates ie cat outparam-like-analysis.txt | sort | uniq | wc -l, I get 2874 which would seem to be accurate given that I’m dealing with absolute filename paths (ie the many instances when an alwayszero function is present in a header file should be taken care of). Interestingly this is way more than the the 396 functions found by another (unposted) analysis that I previously tried which checked for alwayszero methods (either virtual or non-virtual). See this post for a similar analysis.

There may be a few false positives too, but hopefully not a great many (I haven’t thoroughly checked the results so they could be totally wrong). At this point I’m not too concerned about interpreting the data because there are lots of non-virtuals in the mix. I think this can be changed with a function similar to is_constructor, though. I will have to strip out all of the non-relevant code which is not completely trivial, but it shouldn’t be too hard. I suppose I don’t really need to do this until the final version, which should be a safety checker for my plugin ie if a function has the alwayszero attribute then it must always return zero. I’ll have to solve my plugin/build issues first, though.

Firefox build issues with GCC 4.5

Posted in Seneca by ehren on November 23, 2009

I’ve hit a large snag with my project. Unfortunately, getting a plugin to work with the backported gcc 4.3.4 is not as straightforward as I had assumed. I badly misread the ifdefs in dehydra_plugin.c here, thinking I’d be able to define my own pass struct and hook it in like with 4.5. Oops.

As far as I can tell If I want to run my optimization with the patched 4.3 I’ll have to add my gimple manipulations to one of the callbacks used by Treehyhdra. This might be doable but I haven’t tried it. The alternative is to create another patch against 4.3.4, either adding a new pass, as with the plugin, or by trying a bit of conditional constant propagation hackery as with my 0.1. The solution may not be ideal in either case.

As an attempt to sidestep the issue, I’ve been attempting to get Firefox to compile with 4.5. I was a bit confused until asking on #gcc about what’s needed in the bug report. Namely whether I need to include only the preprocessed .ii version of the ‘problem file’ or whether I need preprocessed versions of every file in the directory, as I had seen with another report (especially since I can’t get the .ii file to compile independently). Someone informed me that the one .ii is enough and so here’s the new bug. So far no one’s bitten yet so I think I’ll have to reproduce the problem with the irrelevant functions from jsxml.cpp taken out.

Anyway, this bug can be sidestepped by turning down -O3 to -O0, but I’ve hit another issue during the linking of another file. Here’s the output. I’ve reproduced the issue with a development version of binutils as well but I’m thinking this is actually a Mozilla bug. I’ll have to consult with someone who would know.

On the analysis front, I’ve got something in the works about my recent travails with Treehydra but now that I’ve started to write it I think it needs it’s own post.

Analysis and type madness

Posted in Seneca by ehren on November 18, 2009

I’ve created a simple Dehydra script to check for alwayszero functions. There are some issues however. Here’s the script:

function process_function(f, body)
{
  if (f.type.type.name == 'void' || !f.isVirtual) {
    return;
  }

  var alwayszero = true;

  function processStatements(stmts) {
    for(var j = 0; j < stmts.statements.length; j++) {
      var s = stmts.statements[j];
      if (s.isReturn && s.value != 0) {
        alwayszero = false; 
      }
    }
  }

  for (var i = 0; i < body.length; i++) {
    processStatements(body[i]);
  }

  if (alwayszero) {
    print("alwayszero function: " + f.loc.file + " on line " + f.loc.line +
          ", column " + f.loc.column + " : " + f.type.type.name + " " + f.name);
  }
}

I actually have a more elegant version using a for in loop and iterate_vars but after watching that Douglas Crockford lecture I’m a bit paranoid about such things (both report the same results though).

After doing a bit of post-processing on the output I can report the following:

Functions meeting the above criteria are encountered 5920 times during compilation but most of these are duplicates since there are quite a few method definitions within include files.

Removing all of the duplicates reduces the number to 243 instances encountered (shouldn’t be too difficult to patch manually or with some shell script hackery).

A link to the list of 243 unique instances is here. The full output list can be obtained here (warning: big file).

Unfortunately I have encountered at least one false positive with this script:

gfxFont.cpp on line 960, column 35 : gfxFont::RunMetrics gfxFont::Measure(gfxTextRun*, PRUint32, PRUint32, gfxFont::BoundingBoxType, gfxContext*, gfxFont::Spacing*)

Viewing the definition here, I’m not sure why this function was captured by my analysis but I’ll have to investigate. Ultimately, I think doing this with a Treehydra script might make more sense. There’s already an existing script that does all the heavy lifting, but I haven’t been able to link this into an independent analysis pass on mozilla-central (yet).

Edit: Argh… found another false positive here:
nsSVGContainerFrame.cpp on line 264, column 82 : gfxRect nsSVGDisplayContainerFrame::GetBBoxContribution(const gfxMatrix&)

More Issues:

When I get the plugin ready to go with mozilla, I’ll add a script to xpcom/analysis to ensure that every function which has the user(("alwayszero")) attribute meets the above criteria.

As to excluding non-virtual member functions from the analysis, as far as I can tell GCC should already be able to optimize these away. Perhaps there’s a corner case I haven’t considered though.

Another issue I thought of today is that this script can potentially flag floating point functions as alwayszero which will result in bad things happening if I apply my plugin as originally written to them.

Luckily a fix to the plugin was not that difficult:

static bool
sane_tree_type (tree t)
{
  enum tree_code code = TREE_CODE (TREE_TYPE (t));
  return code == INTEGER_TYPE || code == BOOLEAN_TYPE;
}

 /* other stuff --- see previous posts for context */

static unsigned int
execute_alwayszero_opt (void)
{
  gimple_stmt_iterator gsi;
  basic_block bb;

  FOR_EACH_BB (bb)
    {
      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
        {
          gimple stmt = gsi_stmt (gsi);

          tree lhs = gimple_get_lhs (stmt);

          if (is_gimple_call (stmt) && lhs != NULL_TREE &&
              is_alwayszero_function (stmt) && sane_tree_type (lhs))
            {
              tree zero = build_int_cst (TREE_TYPE (lhs), 0);
              gimple assign = gimple_build_assign_stat (lhs, zero);
              tree var = create_tmp_var (TREE_TYPE (lhs), "dummy_var");
              add_referenced_var (var);
              mark_sym_for_renaming (var);
              gimple_call_set_lhs (stmt, var);
              gsi_insert_after (&gsi, assign, GSI_CONTINUE_LINKING);
            }
        }
    }

  return 0;
} 

This will exclude pointer types as well which is probably a good thing. If it’s really necessary to optimize away alwayszero REAL_TYPE functions this could be done as well, but they will have to be handled separately. Actually this should be pretty easy but it’s not really a priority at the moment. It probably makes more sense to check the type of the function rather than that of the lhs, but consider the above a proof of concept.

The next step is getting my plugin to work with the GCC 4.3.4 with backported plugin support used to build Dehydra which I now know will not be too difficult. The main issue is that the gimple_stmt_iterator is nowhere to be seen (even though it shows up in the Changelog … wtf) so I’ll have to use the block_stmt_iterator instead which will require a slight bit more ugliness.

The main problem though is getting a plugin to build under the GCC 4.3.4 with backported plugin support used to build Dehydra. 4.5 includes a separate plugin include directory and an easy to use -print-file-name=plugin flag to get at the directory. I’ve tried hacking a build rule into the existing configure script in Dehydra but I’m a complete noob with make and configure stuff so I haven’t had any success yet. Hopefully I can rectify this shortly.

Also, I’ve been attempting to get FF built with 4.5 but have encountered some issues. Once I get a nice sequence of steps that follows the instructions on this page I’ll bring this to the attention of the GCC people.