
Can Klocwork (or other tools) be aware of types, typedefs and #define directives?

Developer https://www.devze.com 2023-03-14 13:49 Source: Internet

I have been looking for tools to help detect errors that prevent a program from running properly as 64-bit code. Most recently, I've been toying with Klocwork and its custom checkers feature, which lets me navigate the source code as a tree using XPath. This is useful as a "smarter" alternative to regular expressions, but I have not been able to make it aware of types.

For example, let's say I'd like to find every instance of a for loop that uses either an int or a long to count. The following code is easy to find.

for (int i = 0; i < 10; i++) {
    // ...
}

Searching for this code is trivial because the variable definition is right inside the loop. However, consider the following example.

int i;
// ...
for (i = 0; i < 10; i++) {
    // ...
}

This is difficult to find because the variable definition is separate from the loop, and the necessary XPath expression would be either unwieldy or bug-prone.

So, can custom Klocwork rules find expressions like this where type-awareness is necessary, including resolving typedef and #define statements? Are there other tools which can do this?

EDIT 1: Consider the following example.

typedef int myint;
void Bar();  // assumed to be defined elsewhere

void Foo() {
    int i;
    for (i = 0; i < 10; i++) {
        Bar();
    }

    myint j;
    for (j = 0; j < 10; j++) {
        Bar();
    }
}

The solution provided by ahmeddirie finds the first loop because the type of i is explicitly defined as int. The second loop is not found, however, because the typedef has obscured the underlying type. What tools keep track of types in a way that would identify the second loop variable j as indeed being an int?
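A typedef introduces an alias rather than a distinct type, so any tool that resolves declarations the way a compiler does will see j as an int. The following C++11 snippet checks this with decltype and std::is_same; the MYLONG macro is a hypothetical addition showing that a #define leaves nothing behind but the type it expands to.

```cpp
#include <type_traits>

typedef int myint;   // the alias from the question
#define MYLONG long  // hypothetical macro; the preprocessor replaces it with "long"

// Counters declared through the alias and through the macro.
myint j = 0;
MYLONG k = 0;

// A typedef names the same type as its target, so j's declared type is int.
static_assert(std::is_same<decltype(j), int>::value, "myint is an alias for int");

// The compiler proper never sees MYLONG, only the "long" it expands to.
static_assert(std::is_same<decltype(k), long>::value, "MYLONG expands to long");
```

Because both assertions are checked at compile time, the file only compiles if the alias and the macro resolve as described, which is the same resolution a type-aware checker has to perform.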


You can use Clang (http://clang.llvm.org) or Elsa (https://github.com/dsw/oink-stack/) to generate an AST after type propagation and template instantiation. Both provide a decent C++ API and some means of dumping the AST as readable text, and both are free.


I'm not entirely sure this is what you want, but you can resolve types quite easily with KAST's built-in functions. For example, this expression answers your question as asked (although perhaps not your underlying need):

//ForStmt / Init::ExprStmt / Expr::BinaryExpr [ $type := Left.getTypeName() ] [ $type = 'int' | $type.contains('long') ]

This will find ‘for’ loops that use ‘int’ or ‘long int’ counter types quite handily, and can obviously be applied to any element of an expression-based statement.

Type definitions are amenable to this kind of manipulation, whether programmer-defined or language-defined. Preprocessor definitions, however, will only yield their native language type (i.e., the macro itself isn’t available for manipulation via KAST, only what it expands to).
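The predicate in that expression can be modeled outside Klocwork. The C++ sketch below is hypothetical (it is not KAST and does not parse C++): it follows a table of typedef aliases down to a built-in type name and then applies the same test as the expression above, 'int' or anything containing 'long'. Macros are deliberately absent from the table because, as noted above, only their expansion is visible.

```cpp
#include <map>
#include <string>

// Hypothetical alias table: typedef name -> the type it was defined as.
using AliasTable = std::map<std::string, std::string>;

// Follow typedef chains (e.g. counter_t -> myint -> int) until a name is not an alias.
// Assumes the table has no cycles, which a valid C++ program guarantees.
std::string resolveType(const AliasTable& aliases, std::string name) {
    for (auto it = aliases.find(name); it != aliases.end(); it = aliases.find(name)) {
        name = it->second;
    }
    return name;
}

// Mirrors the KAST predicate: $type = 'int' | $type.contains('long').
bool isSuspiciousCounterType(const AliasTable& aliases, const std::string& declared) {
    const std::string type = resolveType(aliases, declared);
    return type == "int" || type.find("long") != std::string::npos;
}
```

With the table {myint -> int, counter_t -> myint}, for example, counter_t resolves to int and is flagged, unsigned long is flagged because it contains long, and short is not.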


The company I work for, Semantic Designs Inc., provides tools incorporating a general infrastructure for analysis and transformation of programs and specific analysis components for various programming languages. Together these are known as DMS. In the case of C++, DMS includes integrated lexers, preprocessors, parsers, and name and type resolution components for each of GCC3, GCC4, ISO14882c1998 (ANSI), Visual C++ 6.0, and unmanaged Visual Studio 2005 C++. For various dialects of C, there are also a control flow analyzer, a side-effects analyzer, and a symbol dependency analyzer, which have been used to implement tools such as a pointer checker, a dead code remover, a function profiler, and a program slicer.

The name and type resolution components provide complete symbol table information and look-up capabilities, so that references to identifiers can be readily associated with their types and other declarative information. The information is like that captured and used by a compiler, but is retained along with abstract syntax trees in a form amenable to adaptive reuse by any tool incorporating the component.

Semantic Designs recently built a custom tool that dealt specifically with the types of index variables in loop declarations, as in your example. In this case the problem was to upgrade GCC2 code that used the -fno-for-scope compiler switch, which provided a scope resolution rule for loop variables that was not supported in later GCC dialects. The tool had to transform the loops, moving the declarations of their loop variables into an outer context that preserved the -fno-for-scope scoping rule. Where such changes were not necessary, no change was made.

Thus, the tool had to discern the type associated with each reference to a loop variable, distinguishing between variables where one scope masks another, and reconstruct the code so that GCC3 and GCC4 name resolution would result in the same semantic interpretation as GCC2 with -fno-for-scope. This required access to the symbol table information associated with each variable reference and, where code was being moved, the ability to reconstruct the correct syntactic form of the type declaration for any variable whose declaration was moved. The symbol table and identifier reference table provided by the DMS C++ name and type resolution component contained all the required information, and a module for reconstructing prescribed type syntax allowed for the synthesis of correct new type declarations.
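The masking-scope problem can be illustrated with a toy scoped symbol table. In this hypothetical C++ sketch (not the DMS API), each scope maps names to type names, lookup searches from the innermost scope outward, and a flag decides whether a for-init declaration goes into its own loop scope (standard C++, as in GCC3 and later) or into the enclosing function body (as with GCC2's -fno-for-scope). The variable name x and the float/int pairing are illustrative assumptions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical scoped symbol table: one name -> type map per open scope,
// innermost scope last.
class SymbolTable {
public:
    void push() { scopes_.emplace_back(); }
    void pop() { scopes_.pop_back(); }
    void declare(const std::string& name, const std::string& type) {
        scopes_.back()[name] = type;
    }
    // Search from the innermost scope outward; an empty string means undeclared.
    std::string lookup(const std::string& name) const {
        for (auto s = scopes_.rbegin(); s != scopes_.rend(); ++s) {
            auto it = s->find(name);
            if (it != s->end()) return it->second;
        }
        return "";
    }
private:
    std::vector<std::map<std::string, std::string>> scopes_;
};

// Models:  float x;  void f() { for (int x = 0; ...) {...}  use(x); }
// and returns the type that the use of x after the loop resolves to.
std::string typeOfUseAfterLoop(bool noForScope) {
    SymbolTable table;
    table.push();                       // global scope
    table.declare("x", "float");
    table.push();                       // function body
    if (noForScope) {
        table.declare("x", "int");      // -fno-for-scope: x belongs to the function body
    } else {
        table.push();                   // standard C++: the for statement has its own scope
        table.declare("x", "int");
        table.pop();                    // ... which closes when the loop ends
    }
    return table.lookup("x");           // the reference after the loop
}
```

typeOfUseAfterLoop(true) yields "int" and typeOfUseAfterLoop(false) yields "float", which is exactly the change in meaning such a tool has to detect and prevent.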

For example, consider the following code:

// loop variable hides variable in global scope
// will change meaning without -fno-for-scope
// fix: move decl. of globcnt before for-loop
//   optionally rename globcnt loop variable

float globcnt = 0.0;

int Foo::foo3() {
    for (int globcnt = 0; globcnt < 5; globcnt++) {
        globalInt += globcnt;
    }
    globalInt += 2*globcnt + 1;
    return 0;
}

GCC2 -fno-for-scope semantics dictate that the reference to globcnt after the loop is to the loop variable, whereas GCC3 considers the loop variable out of scope at that point and resolves the reference to the global variable. The tool transformed this code to:

float globcnt = 0.0;

int Foo::foo3() {
    int globcnt = 0;
    for (; globcnt < 5; globcnt++) {
        globalInt += globcnt;
    }
    globalInt += 2*globcnt + 1;
    return 0;
}

Had the code not been transformed, GCC4 would have resolved the final reference to globcnt to the global float, which is still 0.0, so the statement after the loop in Foo::foo3 would always have added 1 to globalInt. Transformed, the value added depends on the loop iterations (2*5 + 1 = 11), as originally designed for GCC2. The tool had to recognize that the final reference to globcnt was to the local variable of type int, not to the global variable of type float, which it could do via symbol table lookup, and to act accordingly.
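The arithmetic can be checked directly by compiling both versions side by side under standard C++ scoping. In this sketch, globalInt, the struct Foo, the renamed members foo3_original and foo3_transformed, and the helper delta are hypothetical stand-ins for code the example does not show. Both functions return 0; what differs is how much they add to globalInt: 0+1+2+3+4 = 10 from the loop in each case, plus 1 afterwards in the original versus 2*5 + 1 = 11 afterwards in the transformed version.

```cpp
// Stand-ins for declarations the original snippet does not show (assumptions).
int globalInt = 0;
float globcnt = 0.0;

struct Foo {
    int foo3_original();     // the loop as written, compiled with standard scoping
    int foo3_transformed();  // the version produced by the tool
};

int Foo::foo3_original() {
    for (int globcnt = 0; globcnt < 5; globcnt++) {
        globalInt += globcnt;
    }
    globalInt += 2*globcnt + 1;   // global float (0.0): always adds 1
    return 0;
}

int Foo::foo3_transformed() {
    int globcnt = 0;
    for (; globcnt < 5; globcnt++) {
        globalInt += globcnt;
    }
    globalInt += 2*globcnt + 1;   // local int, now 5: adds 11
    return 0;
}

// Hypothetical helper: how much one call of fn changes globalInt.
int delta(int (Foo::*fn)()) {
    Foo foo;
    const int before = globalInt;
    (foo.*fn)();
    return globalInt - before;
}
```

delta(&Foo::foo3_original) is 11 and delta(&Foo::foo3_transformed) is 21, while both calls return 0.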

On the other hand, the tool recognized in the following code that there were no references to i outside the loop, so it was acceptable (and preferred) to leave the loop variable declaration intact.

int Foo::foo0() {
    for (int i = 0; i < 10; i++) {
        globalInt += i*i;
    }
    return 0;
}
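The decision the tool made for foo0 versus foo3 can be reduced to a simple rule. The following C++ sketch is a hypothetical simplification, not Semantic Designs' implementation: given the references in the enclosing block, each identified only by name and line (the Reference struct and needsHoisting are illustrative names), hoist the loop variable's declaration only if its name is used after the loop. A real tool would consult the symbol table rather than compare names, since a later declaration could itself mask the loop variable.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of one identifier reference in a block.
struct Reference {
    std::string name;
    int line;
};

// Under -fno-for-scope, a use of the loop variable's name after the loop (but
// still inside the enclosing block) binds to the loop variable, so the
// declaration must be hoisted to keep that meaning. Otherwise leave the loop alone.
bool needsHoisting(const std::string& loopVar, int loopEndLine, int blockEndLine,
                   const std::vector<Reference>& refs) {
    for (const Reference& r : refs) {
        if (r.name == loopVar && r.line > loopEndLine && r.line <= blockEndLine) {
            return true;
        }
    }
    return false;
}
```

For a foo3-like block, where globcnt is used on a line after the loop ends, the rule returns true; for a foo0-like block, where i appears only inside the loop, it returns false and the declaration stays where it is.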