Ehren's Blog

Popping my GIMPLES (a plan)

Posted in Seneca by ehren on September 30, 2009

This week marks the deadline for our initial project plan in DPS909 and I must admit that if left to my own devices I would be putting this post off even more than I already have. However, especially after a conversation with humph and mhoye last Friday on IRC, I must admit that working on open technology means working in the open, even if you don’t have all the pieces together quite yet.

In a previous post I offered a brief overview of my project which involves adding support for an alwayszero function attribute to gcc. The description provided by Taras in the bug gives a general outline:

nsresult rv = call();
if (NS_FAILED(rv)....) <-- gcc should be able to optimize this away if it knows that value of call() is always zero

Previously, I had been tackling this project solely by code reading, including grepping the sources and going file by file through the code base (my kingdom for a good multiline grep). Although I believe this approach will bear fruit in the coming week, I would like to mention and dispense with one whopping illusion I had upon beginning the project:

* That it can be completed without knowledge of compiler design and formal grammar.

My solution has been to find the best textbook I can (and I think I’ve done it). On Thursday I finally got my hands on a copy of Compiler Construction – Principles and Practice by Kenneth Louden, and I’ve since been burning through it. I’ve already got through the first 4 chapters which so far have given a fairly rigorous overview of Scanning (including LEX), Parsing and CFGs, and (various kinds of) Top Down Parsing (next on the docket: Bottom Up Parsing and Semantic Analysis).

At this point you may be thinking I’ve gone totally off the rails. However, this book has helped me to better grasp another important text I had overlooked: The GCC Internals Manual. Admittedly, my problem has nothing to do with Scanning (Converting lexemes into tokens). It also has nothing to do with Parsing Algorithms, which build an (abstract) syntax tree out of the tokens. But, as far as I can tell at this point, the problem will require me to traverse this parse or syntax tree in order to locate the start of the next statement sequence after the unnecessary test of the function’s return value. I would expect that the calling function of a function can be accessed as a parent node of the function. I’ll return to this question of where exactly my modifications should take place shortly.

I should mention that another step–Semantic Analysis–is undertaken by the compiler which involves, for example, recognizing context sensitive aspects of the language. For example, in C you must declare an identifier (variable name, etc) before using it ie in an undeclared context an identifier is illegal but if declared it is legal (I haven’t got to the chapter on this subject yet but apparently this (static) semantic analysis is undertaken using a lookup table).

Now that I’ve given a broad overlay, let’s consider actual GCC specifics. The book I’m reading explains an important dichotomy in compiler design between the front end (those operations which depend entirely upon the source language) and the back end (those operations which depend totally on the target language). Apparently for C and C++ the front end builds its internal language representation using this tree structure. The docs also mention a similar structure known as GENERIC which, although bypassed by C and C++, is used by a number of different languages (FORTRAN is parsed to its own unique representation which is then converted to GENERIC).

Likely, most of the reading I’ve done is unnecessary in that it doesn’t matter how the parsing is actually completed. All I’m concerned about is taking that tree, once constructed, and bypassing one or more statement sequences that depend on the return value of an alwayszero function. I should be able to access the calling function of an alwayszero function as a parent node of that function. Functions, for example, are represented using a FUNCTION_DECL node. This page also lists the relevant control structure nodes which I may have to consider one by one in order to remove unnecessary branches. Unfortunately the docs don’t mention accessing a calling function from a child node so this may have to be gleamed by viewing the definition of the FUNCTION_DECL node. It is mentioned here that a call to a function is represented by a CALL_EXPR node, but I’m really after the callee and not the caller. My guess is that a calling function can be accessed by a more generic (not GENERIC) tree macro.

At this point I am tantalizingly close to the first stage of my problem: creating and recognizing a function attribute. GCC defines a macro DECL_ATTRIBUTES (tree decl) which returns a TREE_LIST representing the attribute. My first step is to add and recognize a new attribute which I hope can be completed shortly. See this page for the documentation of function and variable attributes.

At this stage I need to specify my initial release plan for the project which is as follows:

0.1 – Oct 19, 2009: Create and Recognize a function attribute. Also hopefully gain access to the caller of a function, given the FUNCTION_DECL node of that function.

0.2 – Nov 16, 2009: Perform some amount of branch optimization. Taras’ code example in the bug outline illustrates some of the complexity in this optimization, in that the actual call to the function is not within the test expression of the if statement. Conceivably I’ll be ably to iterate through all of the statements of the caller to determine what’s unnecessary.

0.3 – Dec 7, 2009: If not a release worthy patch, hopefully I’ll at least have some functional code that meets the specs.

One particular concern I have is that necessary optimizations might not take place at the tree level. After the parse, gcc converts the tree structure into a kind of three address code called GIMPLE (part of the middle end). After this GIMPLE is translated into an assembly like language called RTL, which combined with knowledge of the target architecture, allows for the object code output. Naturally, it’s probably much easier to write segmented model x86 assembly code than to dig into these representations.

One other thing is that Taras did mention the possibility of using static analysis (perhaps with Dehydra) to confirm correct behavior of the attribute. David Humphrey pointed out an interesting blog post by Taras where he describes the addition by Zack Weinberg of an NS_MUST_OVERRIDE static annotation, combined with a script to ensure that NS_MUST_OVERRIDE methods are always overridden. I could imagine that using a similar Dehydra annotation would suffice to ensure that a function annotated as alwayszero never has it’s return value checked, but I would think that this introduces the danger of not checking the return value of a function that is mistakenly assumed to be alwayszero. Perhaps two different annotations could be defined both for the caller and the callee (and a match could be made between the two), but I’d rather attempt the gcc modification just for the cool factor (I realize this a very dangerous standard in deciding how to tackle a bug).

Please see the project wiki for updates, especially as more concrete developments take place.

Advertisements

Building Firefox (in progress)

Posted in Seneca by ehren on September 24, 2009

I’ve finally gotten around to building Firefox on my Debian system, and as I expected, it is going fairly smoothly.

Things would not have been so easy without Mathew Lam’s in depth post on building the browser under unstable (using the simple trunk build instructions). By knowing that I had to follow the Linux build prerequisites very carefully, I was able to get away with only overlooking one dependency (libxt-dev).

Interestingly, I came across a blog post by Mark Shroyer on building Firefox 3.5.3 that uses a simple apt-get build-dep iceweasel to install most of the dependencies (just like as is described for that other operating system). Until now I hadn’t realize that build-dep was an apt option and not just part of the name of some meta-package (the space should have tipped me off). Unfortunately though, this option has not been implemented in aptitude which I prefer for various reasons (most recommend not mixing the two either).

Look at that! While I was yabbering my build finished! This was actually pretty speedy at ~ 2 hrs. Now to finish this post so I can test out my new browser.

Managing high volume mailing lists with Gmail

Posted in Seneca by ehren on September 22, 2009

If there’s anyone in OSD600/DPS909 that has yet to sign up for a mailing list (I’m sure no one), I recently came across an interesting post that may be of help. If you use Gmail and you’re concerned about inundating your inbox, there’s a way to organize those messages into a separate folder.

Here’s the full post, but the instructions are quite simple:

1. Create a label named, for example, “WSG list”
2. Create a filter and add the mail address wsg@webstandardsgroup.org to the To field.
3. In the “next step” of the filter, tick the “Skip the Inbox (archive it)” check box and the “Apply label” checkbox and choose “WSG list” from the dropdown

So far my mailbox is not overflowing either way, but if you want to sign up for everything this might be the way to go.

Building GCC from trunk

Posted in Seneca by ehren on September 20, 2009

I’ve recently been assigned a project for DPS909 that involves extending gcc to support an always nonzero function attribute.

Originally, I had expressed some interest in one of the potential gdb projects listed on the wiki, but after being introduced to Taras Glek by David Humphrey on #static, I was informed that the code base for gdb is not exactly the prettiest of the open source world.

I’d like to return shortly with a more conceptual overview of my bug including (hopefully) an overview of how function attributes are implemented in gcc, but for now I have to complete the zeroith step: building gcc from source.

Luckily the GNU Project has put together an excellent series of documents explaining the process, which I will follow very closely. My platform will be a newly installed copy of Debian Testing (Squeeze) i686 on an 8 or 9 year old Dell Dimension.

The first step is getting the sources, so I’ll need svn:

d8200:/home/ehren# aptitude install subversion

Testing that it works:

ehren@d8200:~$ svn ls svn://gcc.gnu.org/svn/gcc
branches/
emptydir/
hooks/
tags/
trunk/

It works! Now let’s check out trunk:

ehren@d8200:~$ mkdir gcc
ehren@d8200:~$ cd gcc
ehren@d8200:~$ svn checkout svn://gcc.gnu.org/svn/gcc/trunk srcdir

Once it’s pulled in the sources we’ll be ready to configure the build. The documents configuration section is somewhat unclear about the proper directory structure for the build but here’s the layout I created:

ehren@d8200:~/gcc$ ls
objdir srcdir dist

srcdir contains the svn checkout, dist contains the “top level installation directory” where the tools will eventually be placed. This directory must be specified (see below), otherwise the tools will be placed in /usr/local. objdir will include our unlinked binaries (I guess) and is the directory where we’ll issue the make commands.

Anyway, I’m getting ahead of myself. As mentioned, we must first specify our top level installation directory (the docs recommend that $HOME or an absolute path should be used over ~):

ehren@d8200:~/Development$ cd objdir/
ehren@d8200:~/Development/objdir$ ../srcdir/configure --prefix=/home/ehren/gcc/dist

uh oh… looks like I should have read the prerequisites more carefully:

checking for correct version of gmp.h... no
checking for the correct version of mpc.h... no
configure: error: Building GCC requires GMP 4.2+ and MPFR 2.3.2+.
Try the --with-gmp and/or --with-mpfr options to specify their locations.
Copies of these libraries' source code can be found at their respective
hosting sites as well as at ftp://gcc.gnu.org/pub/gcc/infrastructure/.
See also http://gcc.gnu.org/install/prerequisites.html for additional info.
If you obtained GMP and/or MPFR from a vendor distribution package, make
sure that you have installed both the libraries and the header files.
They may be located in separate packages.

A quick search of the repositories finds the correct packages however:

d8200:/home/ehren# aptitude install libgmp3c2 libmpfr-dev

Rerunning the above command:

ehren@d8200:~/gcc/objdir$ ../srcdir/configure --prefix=/home/ehren/gcc/dist/
...
configure: creating ./config.status
config.status: creating Makefile

Ok, we’re ready to make!

ehren@d8200:~/gcc/objdir$ make

And now we play the waiting game.

4+ hours later make fails with something along the lines of “cannot find jar … cannot find zip”.
Argh, I really should have read the prerequisites more carefully. Luckily make will pick up where it left off.

So, after a quick aptitude install fastjar zip we’re back on track:

ehren@d8200:~/gcc/objdir$ make

Another 3-4 hours later:

...
make[1]: Leaving directory `/home/ehren/gcc/objdir'
ehren@d8200:~/gcc/objdir$

Woohoo! We are now ready for the install:

ehren@d8200:~/gcc/objdir$ make install

Within a few minutes make exits along with this helpful note:

Libraries have been installed in:
/home/ehren/gcc/dist/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the `LD_LIBRARY_PATH' environment variable
during execution
- add LIBDIR to the `LD_RUN_PATH' environment variable
during linking
- use the `-Wl,-rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.

Here’s our “top level install directory”:

ehren@d8200:~/gcc/dist$ ls
bin include lib libexec share

We’re finished, but can we compile something with the compiler we compiled? Let’s see:

ehren@d8200:~$ cat hello.c
#include <stdio.h>

int main(void) {
    printf("hello world!\n");
}

ehren@d8200:~$ gcc/dist/bin/gcc hello.c
ehren@d8200:~$ ./a.out
hello world!
ehren@d8200:~$

I’m not linking against the new libraries, but it works (and it only took 8 hours). Oh yeah!

edit (Dec 19, 09): I get quite a few hits on this page so I cleaned up the names I’ve used for the directory structure to make this a bit less confusing. Also, if you don’t want to spend 8 hours on a build I would suggest adding the following to your configure options:

--disable-bootstrap CFLAGS="-g3 -O0" --enable-languages=c,c++

An Introduction

Posted in Seneca by ehren on September 15, 2009

Hello, and welcome. This is my first blog post for DPS909 – Topics in Open Source Development at Seneca College (first post ever actually!). I’ve actually known about Seneca’s open source partnerships and projects for some time now. In fact, before even coming to Seneca I had heard about the impressive work done by students which is confirmed by browsing the project wiki.

I’m a 5th Semester BSD student at Seneca and although I’ve been using open source software in various forms for some time, I’ve always been a user. Reflecting on The Cathedral and the Bazaar, which I read recently for class, I find the ethos of not reinventing the wheel somewhat alien to my “development mode” at this point. It’s not that it doesn’t make practical sense, however I find it somewhat contradictory that there can be so much work to do in open source development when so much has already been completed and re-completed (at least in broad strokes). I suppose this gets to the heart of software development as an incremental process however.

For now, I’ll conclude so as not to further run afoul of another mantra: release early, release often!