Ehren's Blog

Towards a Giant Patch

Posted in Seneca by ehren on November 30, 2009

Well, I just spent a considerable amount of time trying to parse C++ with a regex, which wasn’t particularly enjoyable. Since I last posted, I’ve been able to pare down outparams.js to just what I need for my project. It’s now basically just a ZeroNonzero analysis together with a post-analysis pass to round up all the alwayszero functions. This has increased the number of functions found to around 6300. After getting a little help from ctyler to write a script that displays 100 lines of context around these functions, I’m also pretty certain that they all really do return zero.

The next step was finding the location of the function declarations within their respective classes. Of course, this information is not stored with the tree node representing a function declaration. Not that it matters, but apparently GCC considers the function name right before the definition to be the declaration (and I guess when present anywhere else it’s a forward declaration). Anyway, the next best thing is getting the path and line number of the class definition, which I’ve done, so I can then search for the function name from that point on.
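As a rough illustration of that search, here is a minimal sketch in plain JavaScript. It is not the script used for this post: findDeclarationLine, escapeRegExp and the sample class are hypothetical, and it assumes the file has already been read into an array of lines.

```javascript
// Hypothetical helper: escape regex metacharacters so that a name such as
// "operator()" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Scan forward from the class definition (1-based line number) and return
// the 1-based number of the first line where `name` is followed by "(".
// Returns -1 if nothing matches.
function findDeclarationLine(lines, classLine, name) {
  // (^|[^\w~]) keeps "Init" from matching inside "ReInit" or "~Init".
  const pattern = new RegExp("(^|[^\\w~])" + escapeRegExp(name) + "\\s*\\(");
  for (let i = classLine - 1; i < lines.length; i++) {
    if (pattern.test(lines[i])) {
      return i + 1;
    }
  }
  return -1;
}

// Made-up sample input.
const source = [
  "class nsFoo {",
  "public:",
  "  nsresult ReInit();",
  "  nsresult Init();",
  "};"
];
console.log(findDeclarationLine(source, 1, "Init")); // prints 4
```

A name-only match like this cannot tell overloads apart, and it will not see declarations that are generated by macros.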

One problem with this regex approach is that if I wanted to be exact about the matches, I’d need to consider type information about the function’s parameters. It’s actually quite easy to get this info with Treehydra, but there are a number of complications. Default parameter values, e.g. int foo(int x = 0);, are one monkey wrench. Another thing I’m stuck on is placing shell variables into the lhs of a sed regex while still preserving the ability to use characters with special meanings like .* (being able to place special characters into the substituted variable would be even better). To get around the need for parameter types, I’ve checked in Treehydra whether the function is overloaded in the class and, if so, printed a message about it. I can then exclude such functions, which only amount to about 100, from my results, simplifying things considerably.
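The overload filter can be pictured with a small sketch in plain JavaScript. This is an assumption about the shape of the data, not the actual Treehydra code: the record fields (cls, name) and both function names are hypothetical, and the member list is assumed to hold one entry per declaration in the class.

```javascript
// Collect "Class::name" keys that occur more than once in the member list,
// i.e. names that are overloaded within their class.
function overloadedKeys(members) {
  const counts = new Map();
  for (const m of members) {
    const key = m.cls + "::" + m.name;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const overloaded = new Set();
  for (const [key, n] of counts) {
    if (n > 1) overloaded.add(key);
  }
  return overloaded;
}

// Keep only the candidates whose name is unique within their class.
function dropOverloaded(candidates, members) {
  const overloaded = overloadedKeys(members);
  return candidates.filter(c => !overloaded.has(c.cls + "::" + c.name));
}

// Made-up sample data.
const members = [
  { cls: "nsFoo", name: "Init" },
  { cls: "nsFoo", name: "Init" },     // a second overload of Init
  { cls: "nsFoo", name: "Shutdown" },
  { cls: "nsBar", name: "Init" }
];
const candidates = [
  { cls: "nsFoo", name: "Init" },
  { cls: "nsFoo", name: "Shutdown" },
  { cls: "nsBar", name: "Init" }
];
const kept = dropOverloaded(candidates, members);
console.log(kept.map(c => c.cls + "::" + c.name).join(", "));
// prints: nsFoo::Shutdown, nsBar::Init
```

Once overloaded names are gone, a name-only regex is enough to identify a declaration within its class, which is what makes the exclusion worthwhile.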

Another problem is that many, perhaps the majority, of these functions are hidden behind macros. This might not be such a bad thing though, since a relatively small number of manual edits could perhaps affect thousands of declarations. There are some relative paths in the analysis results as well, but the instances are few enough that manual edits are feasible.

Anyway, I do have some results of declarations which are ready to be patched here. There are only 1782 of them, which means more than two thirds of the roughly 6300 functions must be handled in other ways. What I might do is finalize the analysis to not only emit errors when my attribute has been applied to a function that may not return zero, but also warn about those remaining functions that should have the attribute but don’t. I can then decide how to proceed once I’ve got a patched and plugin-enabled build up and running.
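The error/warning split boils down to two set differences. Here is a minimal sketch in plain JavaScript, assuming the analysis results and the annotated functions are available as lists of qualified names. The function classify and the sample names are hypothetical, and the real check would live inside the GCC plugin script rather than run as a separate step.

```javascript
// alwaysZero: names the analysis found to always return zero.
// annotated:  names that carry the attribute in the source.
function classify(alwaysZero, annotated) {
  const proven = new Set(alwaysZero);
  const marked = new Set(annotated);
  return {
    // Attribute present, but the function may not return zero.
    errors: annotated.filter(name => !proven.has(name)),
    // Always zero according to the analysis, but the attribute is missing.
    warnings: alwaysZero.filter(name => !marked.has(name))
  };
}

// Made-up sample data.
const report = classify(
  ["nsFoo::Init", "nsFoo::Shutdown", "nsBar::Init"],
  ["nsFoo::Init", "nsBaz::Open"]
);
console.log("errors: " + report.errors.join(", "));
// prints: errors: nsBaz::Open
console.log("warnings: " + report.warnings.join(", "));
// prints: warnings: nsFoo::Shutdown, nsBar::Init
```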

There’s also the Pork route, which, by way of Elsa, has the apparent advantage of providing both the location of the function definition and the location of the declaration within the class. It should be able to handle the macros too, but my understanding’s pretty murky here. Beyond getting the thing built, I haven’t looked into it that much, though.

2 Responses

  1. David Humphrey said, on November 30, 2009 at 11:40 am

    For relative paths like /home/ehren/new-tools/tools/mozilla-central/content/html/content/src/../../../base/src/nsGenericElement.h I do the following in my analysis script:

    function fix_path(loc) {
      // loc is {file: 'some/path', line: #, column: #}. Normalize paths as follows:
      // from: /home/dave/gcc-dehydra/installed/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/exception:59
      // to: /home/dave/gcc-dehydra/installed/include/c++/4.3.0/exception:59

      if (!loc)
        return;

      // Walk the components from last to first so that each ".." cancels
      // the component that precedes it in the original path.
      var parts = loc.file.split("/").reverse();
      var fixed;
      var skip = 0;

      for (var i = 0; i < parts.length; i++) {
        if (parts[i] == "..") {
          skip++;
          continue;
        }

        if (skip == 0) {
          // The first component kept starts the result, even if the
          // path ended in "..".
          if (fixed === undefined)
            fixed = parts[i];
          else
            fixed = parts[i] + "/" + fixed;
        } else {
          skip--;
        }
      }
      loc.file = fixed;
    }

  2. ehren said, on November 30, 2009 at 2:52 pm

    Unfortunately, the only paths that really cause trouble are all like “../../../../dist/include/xpc_map_end.h”. Luckily, there are only about 15 or so functions like this.

