Static code analysis: the method, its advantages and disadvantages
By Andrey Karpov
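Static code analysis is the process of detecting errors and defects in a program’s source code. It can be viewed as an automated code review, so let us look at code review first.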
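Code review is one of the oldest and safest methods of detecting defects. It consists of reading the source code attentively as a group and recommending how to improve it. The process reveals errors, as well as code fragments that may turn into errors later. The program’s algorithm should be clear directly from the program text and its comments. If it is not, the code needs improving.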
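Code review usually works well because programmers tend to overlook errors in code they wrote themselves. To learn more about the code review method, see the book “Code Complete” by Steve McConnell.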
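The only serious disadvantage of joint code review is its extremely high cost: you need to gather several programmers regularly to review new code, or to re-review code after the recommended changes have been applied. The programmers also need regular breaks, because attention fades quickly when large amounts of code are reviewed in a short time, and a review carried out by tired people is of no use.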
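So, on the one hand, you want to review code regularly; on the other hand, doing so is too expensive. Static code analysis tools are a compromise. They can tirelessly examine a program’s source code and point the programmer to fragments that deserve attention. Of course, a program can never fully replace a code review performed by a team of programmers, but the ratio of benefit to cost has made static analysis a practice adopted by many companies.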
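The tasks handled by static code analysis software fall into 3 categories:
1. Detecting errors in programs. This is discussed in detail below.
2. Recommendations on code formatting. Some static analyzers can check whether the source code conforms to the formatting standard adopted by your company, for example the number of indents in various constructs and the use of spaces and tabs.
3. Metrics computation. A software metric is a measure that gives a numerical value for some property of the software or its specifications. Many different metrics can be computed with the help of suitable tools.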
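Static code analysis tools can serve other purposes as well. For instance, static analysis can be used to monitor and train new employees who are not yet familiar with the company’s coding rules.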
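Many commercial and free static code analyzers exist. Wikipedia has a large list of them on the page “List of tools for static code analysis”. The range of supported languages is wide too (游戏邦 note: including C, C++, C#, Java, Ada, Fortran, Perl and Ruby).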
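Like any other error detection method, static analysis has strengths and weaknesses. Keep in mind that no software testing method is ideal. Different methods produce different results for different classes of software. Only a combination of methods will give you the highest software quality.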
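The main advantage of static analysis is that it greatly reduces the cost of eliminating defects. The earlier an error is found, the cheaper it is to fix. According to the data in McConnell’s “Code Complete”, fixing an error at the testing stage costs ten times more than fixing it at the coding stage:
Figure 1. Average cost of fixing defects at each stage of development (游戏邦 note: the data for the table are taken from “Code Complete” by McConnell).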
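Static analysis lets you find a large number of errors quickly at the coding stage, which significantly reduces the development cost of the whole project. For example, the PVS-Studio static code analyzer can run in the background right after compilation and tell the programmer about potential errors.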
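Static code analysis has other advantages as well:
1. Full code coverage. Static analyzers check even those code fragments that get control very rarely. Such fragments usually stay outside the scope of testing by other methods. This lets you find defects in exception handlers or in the logging system.
2. Static analysis does not depend on the compiler you use or on the environment in which the compiled program runs. This lets you find hidden errors that may show up only years later, such as undefined behavior errors. Errors of this kind can surface when you switch to another compiler version or use different optimization switches.
3. You can easily and quickly detect misprints and the consequences of Copy-Paste. Finding these errors by other methods usually wastes too much time and effort. It is a pity to spend an hour debugging only to discover that the error is an expression of the “strcmp(A, A)” kind. People tend to forget such cases when discussing typical errors, but practice shows that detecting them takes a lot of time.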
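Disadvantages of static code analysis
1. Static analysis is usually weak at diagnosing memory leaks and concurrency errors. To detect such errors, you need to execute part of the program virtually, which is very hard to implement, and such algorithms take too much memory and processor time. Static analyzers therefore usually limit themselves to diagnosing simple cases. Dynamic analysis tools are a more effective way to detect memory leaks and concurrency errors.
2. A static analysis tool warns you about odd fragments, and the code in question may in fact be correct. Such warnings are called false positives. Only the programmer can tell whether the analyzer points to a real error or to a false positive. Reviewing false positives takes time and effort, and it draws attention away from the fragments that really contain errors.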
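The errors detected by static analyzers are diverse. Some analyzers focus on a particular area or type of defect, while others support certain coding standards, for instance MISRA-C:1998, MISRA-C:2004, Sutter-Alexandrescu Rules and Meyers-Klaus Rules.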
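The field of static analysis is developing actively: new diagnostic rules and standards keep appearing, while some rules become obsolete. That is why comparing analyzers by the list of defects they can detect makes no sense. The only way to compare tools is to run them on real projects and count the real errors they find. (This article was compiled by 游戏邦/gamerboom.com. Reprints must retain the copyright notice; to reprint, please contact 游戏邦.)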
Static code analysis
Andrey Karpov
Static code analysis is the process of detecting errors and defects in software’s source code. Static analysis can be viewed as an automated code review process. Let’s look at code review first.
Code review is one of the oldest and safest methods of defect detection. It consists of joint attentive reading of the source code and giving recommendations on how to improve it. This process reveals errors, as well as code fragments that can become errors in the future. It is also generally accepted that the code’s author should not explain how a certain part of the program works during the review. The program’s execution algorithm should be clear directly from the program text and comments. If it is not, the code needs improving.
Code review usually works well because programmers notice errors in somebody else’s code much more easily than in their own. To learn more about the code review method, please see the wonderful book “Code Complete” by Steve McConnell.
The only crucial disadvantage of the joint code review method is its extremely high price: you need to gather several programmers regularly to review fresh code or to re-review code after the recommended changes have been applied to it. The programmers also need to rest regularly, because their attention weakens quickly when they review large code fragments at a time, and a review done in that state is of no use.
So, on the one hand, you want to review your code regularly. On the other hand, it is too expensive. Static code analysis tools are a compromise solution. They can tirelessly process the source texts of programs and recommend which code fragments the programmer should look at. Of course, a program can never replace a complete code review performed by a team of programmers, but the ratio of usefulness to price makes static analysis a rather good practice adopted by many companies.
The tasks solved by static code analysis software can be divided into 3 categories:
1. Detecting errors in programs. We will discuss this in detail below.
2. Recommendations on code formatting. Some static analyzers allow you to check whether the source code conforms to the code formatting standard accepted in your company. This means checking the number of indents in various constructs, the use of spaces/tabs and so on.
3. Metrics computation. A software metric is a measure that gives you a numerical value of some property of the software or its specifications. There are many different metrics that can be computed with the help of certain tools.
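To make the third category more concrete, here is a minimal sketch of a metric computed without running the program. It uses Python’s standard `ast` module to count decision points per function, which is a rough approximation of cyclomatic complexity. The function name `branch_complexity`, the set of node types in `BRANCH_NODES` and the sample source are assumptions made for this illustration; they are not taken from any particular analyzer.

```python
import ast

# Node types treated as decision points in this simplified metric (an assumption).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def branch_complexity(source: str) -> dict:
    """Return {function name: 1 + number of decision points} for each function."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Walk only the subtree of this function and count branching nodes.
            decisions = sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
            result[node.name] = 1 + decisions
    return result


# Hypothetical code under analysis; it is parsed, never executed.
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"

def identity(x):
    return x
'''

metrics = branch_complexity(SAMPLE)
print(metrics)  # {'classify': 4, 'identity': 1}
```

Real metric tools compute many more values (lines of code, nesting depth, coupling and so on), but the principle is the same: the number comes from the structure of the source text alone.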
There are also other ways of using static code analysis tools. For instance, static analysis can be used as a method to control and teach new workers who are not yet familiar enough with the company’s programming rules.
There are a lot of commercial and free static code analyzers. Wikipedia contains a large list of them on the page “List of tools for static code analysis”. The list of languages that static code analyzers support is long too (C, C++, C#, Java, Ada, Fortran, Perl, Ruby, …).
Like any other error detection methodology, static analysis has its strong and weak points. You should understand that there are no ideal software testing methods. Different methods will produce different results for different software classes. Only combining various methods will enable you to achieve the highest quality of your software.
The main advantage of static analysis is this: it enables you to greatly reduce the price of eliminating defects in software. The earlier an error is detected, the lower the price of fixing it. According to the data given in the book “Code Complete” by McConnell, fixing an error at the testing stage costs ten times more than at the code writing stage:
Figure 1. Average cost of fixing defects, depending on when they were introduced and detected (the data for the table are taken from the book “Code Complete” by S. McConnell).
Static analysis tools allow you to quickly detect a lot of errors at the coding stage, which significantly reduces the development cost of the whole project. For example, the PVS-Studio static code analyzer can run in the background right after compilation is done and tell the programmer about potential errors if there are any (see incremental analysis mode).
Other advantages of static code analysis are the following:
1. Full code coverage. Static analyzers check even those code fragments that get control very rarely. These code fragments usually cannot be tested through other methods. This allows you to find defects in exception handlers or in the logging system.
2. Static analysis doesn’t depend on the compiler you are using or on the environment where the compiled program will be executed. This allows you to find hidden errors that may reveal themselves only a few years later, for instance undefined behavior errors. Such errors can show up when you switch to another compiler version or use other code optimization switches. Another interesting example of hidden errors is discussed in the article “Overwriting memory – why?”.
3. You can easily and quickly detect misprints and the consequences of Copy-Paste usage. Detecting these errors through other methods is usually a waste of time and effort. It’s a pity when you have spent an hour debugging just to find out that the error is in an expression of the “strcmp(A, A)” kind. People usually don’t remember such troubles when discussing typical errors, but practice shows that it takes a lot of time to detect them.
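The first and third points can be illustrated with a small sketch. It is a toy rule, not the algorithm of PVS-Studio or any other real analyzer: it parses the source with Python’s standard `ast` module and reports calls whose first two arguments are textually identical, in the spirit of “strcmp(A, A)”. The names `find_identical_args`, `same_user` and the Python-style `strcmp` call in the sample are hypothetical and exist only for this example.

```python
import ast


def find_identical_args(source: str) -> list:
    """Return (line, function name) for calls whose first two arguments are identical."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and len(node.args) >= 2:
            first, second = node.args[0], node.args[1]
            # ast.dump() ignores line and column by default, so equal text compares equal.
            if ast.dump(first) == ast.dump(second):
                name = node.func.id if isinstance(node.func, ast.Name) else "<call>"
                findings.append((node.lineno, name))
    return sorted(findings)


# Hypothetical code under analysis; it is parsed, never executed,
# so it does not matter that strcmp is not defined anywhere.
SAMPLE = '''
def same_user(a, b):
    try:
        return strcmp(a.name, b.name) == 0
    except AttributeError:
        # Rarely executed handler: the copy-pasted argument is still reported.
        return strcmp(a, a) == 0
'''

findings = find_identical_args(SAMPLE)
print(findings)  # [(7, 'strcmp')]
```

The suspicious call sits in an exception handler that a test run might never reach, yet the check finds it because every line of the source is examined, whether or not it ever gets control.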
Disadvantages of static code analysis
1. Static analysis is usually poor at diagnosing memory leaks and concurrency errors. To detect such errors you actually need to execute a part of the program virtually. This is very difficult to implement, and such algorithms take too much memory and processor time. Static analyzers therefore usually limit themselves to diagnosing simple cases. A more efficient way to detect memory leaks and concurrency errors is to use dynamic analysis tools.
2. A static analysis tool warns you about odd fragments, which means that the flagged code may actually be quite correct. Such warnings are called false positives. Only the programmer can tell whether the analyzer points to a real error or whether it is just a false positive. The need to review false positives takes up work time and weakens attention to those code fragments that really contain errors.
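A false positive is easy to reproduce with a toy rule. The sketch below, again built on Python’s standard `ast` module, flags every comparison whose two operands are textually identical. The rule, the name `find_self_comparisons` and the sample functions are assumptions made for this illustration only.

```python
import ast


def find_self_comparisons(source: str) -> list:
    """Return line numbers of comparisons whose two operands are textually identical."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare) and len(node.comparators) == 1:
            if ast.dump(node.left) == ast.dump(node.comparators[0]):
                lines.append(node.lineno)
    return sorted(lines)


# Hypothetical code under analysis; it is parsed, never executed.
SAMPLE = '''
def is_nan(value):
    return value != value

def same_size(width, height):
    return width == width
'''

warnings = find_self_comparisons(SAMPLE)
print(warnings)  # [3, 6]
```

Both lines look equally odd to the rule. Line 6 is a genuine misprint (`height` was meant), while line 3 is a deliberate idiom: a floating-point NaN is the only value that is not equal to itself. The tool cannot tell the two apart, so the programmer has to review both warnings.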
Errors detected by static analyzers are rather diverse. See, for example, the list of diagnostics implemented in the PVS-Studio tool. Some analyzers focus on a certain area or type of defects, while others support certain coding standards, for instance, MISRA-C:1998, MISRA-C:2004, Sutter-Alexandrescu Rules, Meyers-Klaus Rules, etc.
The field of static analysis is actively developing: new diagnostic rules and standards appear, while some rules become obsolete. That’s why there is no sense in trying to compare analyzers on the basis of the defects they can detect. The only way to compare tools is to check them on a set of projects and count the number of real errors they have found. This subject is discussed in detail in the article “Difficulties of comparing code analyzers, or don’t forget about usability”. (Source: Gamasutra)