In this assignment you will use two static analysis tools to automatically detect potential defects.
The first static analysis tool is Facebook's Infer, which focuses on memory errors, leaks, race conditions, and API issues. Infer is open source.
The second tool is Cppcheck, a static analysis tool specifically for C/C++ programs. It can detect a wide range of bugs, including those related to bad practice, correctness, and security. Cppcheck is open source.
You may work with a partner for this assignment. If you do you must use the same partner for all sub-components of this assignment. Use Gradescope's partner selection feature. Only one partner needs to submit the report on Gradescope, but if you both do, nothing fatal happens.
You should use the setup from HW0 to run Infer.
As an optional alternative, many users report that Facebook's Infer tool does not run on the Windows Subsystem for Linux (WSL) or similar shortcuts for using Ubuntu- or Linux-like interfaces. Headless Virtual Box configurations (instructions) are reported to work very well. Officially, however, the HW0 setup is the supported configuration for the class.
It is your responsibility to download, compile, run and analyze the subject program and associated tools (or use the precompiled one: we recommend using the precompiled version since it is known to work with the HW0 setup). Getting the code and tools to work in some manner is part of the assignment. You can post on the forum for help and compare notes bemoaning various architectures (e.g., windows vs. mac vs. linux, etc.). Ultimately, however, it is your responsibility to read the documentation for these programs and tools and use some elbow grease to make them work.
We will make use of the lighttpd webserver (pronounced "lighty"), version 1.4.17, as one subject program for this homework. A local mirror copy of lighttpd-1.4.17.tar.gz is available, but you can also get it from the original website. It is about 55,000 lines of code in about 90 files. While somewhat small for this class, some analysis tool licenses have LOC limits or scalability issues, so it was chosen as an indicative compromise.
While not as large or popular as Apache, at various points lighttpd has been used by YouTube, xkcd and Wikimedia. Much like Apache, old versions of it have a number of known security vulnerabilities.
The Common Vulnerabilities and Exposures system is one approach for tracking security vulnerabilities. A CVE is basically a formal description, prepared by security experts, of a software bug that has security implications.
There are at least ten CVEs associated with lighttpd 1.4.17 tracked in various lists (such as cvedetails or mitre). For example, CVE-2014-2324 has the description "Multiple directory traversal vulnerabilities in (1) mod_evhost and (2) mod_simple_vhost in lighttpd before 1.4.35 allow remote attackers to read arbitrary files via a .. (dot dot) in the host name, related to request_check_hostname." You can dig into the information listed in, or linked from, a CVE (or just look at subsequent versions of the program where the bug is fixed!) to track down details. Continuing the above example, mod_evhost refers to source file mod_evhost.c, mod_simple_vhost refers to file mod_simple_vhost.c, and request_check_hostname is in file request.c. You will need such information when evaluating whether or not a tool finds these security bugs.
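For example, once the lighttpd source is unpacked, a quick search can confirm where a CVE's named components live. The sketch below assumes the usual src/ layout of the lighttpd tarball; adjust the paths if your copy differs:

$ cd lighttpd-1.4.17
# Locate the source files named in CVE-2014-2324
$ ls src/mod_evhost.c src/mod_simple_vhost.c
# Find the function mentioned in the CVE description
$ grep -rn "request_check_hostname" src/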
ActiveMQ is a message broker, written in Java, which supports multiple industry standards. We will make use of it, version 5.15.0, as another subject program for this homework. A local mirror copy of activemq-version.tar.gz is available, which you can also obtain from the original website.
There are at least ten CVEs associated with activemq-5.15.0. For example, CVE-2023-46604 has the description "The Java OpenWire protocol marshaller is vulnerable to Remote Code Execution. This vulnerability may allow a remote attacker with network access to either a Java-based OpenWire broker or client to run arbitrary shell commands by manipulating serialized class types in the OpenWire protocol to cause either the client or the broker (respectively) to instantiate any class on the classpath. Users are recommended to upgrade both brokers and clients to version 5.15.16, 5.16.7, 5.17.6, or 5.18.3 which fixes this issue." You can find details about CVEs, similar to lighttpd above.
We will make use of the JSON-C library as one subject program for this homework. A local precompiled copy of json-c-with-build.zip is available, but you can also get it from the github repo (via git clone). It is about 15,000 lines of code in about 70 files. (As above, we focus on slightly smaller projects for this assignment.)
JSON-C is a C library for working with JSON (JavaScript Object Notation) data. It lets C programs easily create, parse, and manipulate JSON structures like objects, arrays, strings, and numbers. It’s widely used in system software and embedded environments where C is common but JSON is the preferred data format for configuration, logging, or APIs. For this assignment, you will need to precompile JSON-C before running the static analysis tools. We provide a local mirror copy: here. If you'd prefer, you may also download and compile JSON-C yourself for full credit.
Once you have downloaded and unzipped JSON-C, you will need to compile it in your instance:
$ sudo apt update
$ sudo apt install -y cmake make gcc
# We will be putting JSON-C precompiled files in a build directory inside JSON-C
$ mkdir build
$ cd build
$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ..
You will know you did it right if you get output that looks something like this:
-- Performing Test BSYMBOLIC_WORKS - Success
-- Performing Test VERSION_SCRIPT_WORKS
-- Performing Test VERSION_SCRIPT_WORKS - Success
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
Warning: doxygen not found, the 'doc' target will not be included
CMake Deprecation Warning at apps/CMakeLists.txt:2 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of CMake.

  Update the VERSION argument value or use a ... suffix to tell CMake that
  the project does not need compatibility with older versions.

-- Wrote /home/ubuntu/json-c/build/apps_config.h
-- Configuring done (7.8s)
-- Generating done (0.1s)
-- Build files have been written to: /home/ubuntu/json-c/build
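Before moving on, you can also confirm that the compilation database was generated; compile_commands.json is the file Infer will read later. If it is missing, re-run the cmake command above.

# Still inside json-c/build/
$ ls -lh compile_commands.json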
You should now be ready to run Infer and Cppcheck on the project.
The Infer tool is a static analyzer — it detects bugs in programs without running them. The primary website is fbinfer.com.
Unfortunately, some versions of Infer can be obnoxious to build and install, despite their handy installation guide. Also, many users report that Infer does not run on Windows Subsystem for Linux (WSL) or similar setups; a headless Virtual Box configuration (instructions) is recommended.
Instead (but see above about "your responsibility"), a precompiled, runs-on-the-HW0-setup version of Infer is available locally here. Once you have transferred and unpacked it, the main binary can be found at infer-linux-x86_64-v1.2.0/bin/infer. You can use either the pre-compiled one or compile it yourself for full credit (any version at all of Infer is full credit).
Once you have Infer built or downloaded, applying it to lighttpd should be as simple as:
$ cd lighttpd-1.4.17
$ sh configure
$ /path/to/infer/bin/infer run -- make
You may see an "ASCII Art" progress bar while Infer is running. Infer should ultimately produce output similar to (but everything is fine if you get very different numbers):
Found 100 issues
  (console output truncated to 5, see '/home/ubuntu/hw4/lighttpd-1.4.17/infer-out/report.txt' for the full list)

Issue Type(ISSUED_TYPE_ID): #
  Memory Leak(MEMORY_LEAK_C): 47
  Dead Store(DEAD_STORE): 26
  Uninitialized Value(PULSE_UNINITIALIZED_VALUE): 25
  Null Dereference(NULLPTR_DEREFERENCE): 2
(Before you worry about getting different numbers, double-check the prose above: it is fine to get different numbers. Similarly, it is common for this tool to only report a few "types" of defects: if you only see a few "types" of defects, you are running the tool correctly, even if Cppcheck reports more "types" of defects.) You will have to read through the output carefully and analyze the reported defects. Some will be true positives (i.e., real bugs in the code) and some will be false positives (i.e., spurious warnings that do not correspond to real bugs).
Once you have downloaded and unzipped JSON-C to your instance, running Infer on JSON-C is similarly direct.
$ sudo apt install libjson-c-dev
$ cd json-c/build/
$ /path/to/infer/bin/infer run --compilation-database compile_commands.json
...
Found 22 issues

Issue Type(ISSUED_TYPE_ID): #
  Null Dereference(NULLPTR_DEREFERENCE): 9
  Use After Free(USE_AFTER_FREE): 6
  Memory Leak(MEMORY_LEAK_C): 5
  Uninitialized Value(PULSE_UNINITIALIZED_VALUE): 1
  Dead Store(DEAD_STORE): 1
You can find Infer's output in the infer-out folder.
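For example, you can list the generated files and page through the full report (the exact set of files inside infer-out may vary with the Infer version):

$ ls infer-out/
$ less infer-out/report.txt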
Cppcheck, similar to Infer, is also a static program analyzer, but it is specialized to C/C++. That is, while it can detect a wide range of bugs in C/C++ code, it (as of now) cannot analyze programs written in other languages. Cppcheck focuses on detecting undefined behavior, memory errors, and other bugs while minimizing false positives. The official Cppcheck website is https://cppcheck.sourceforge.io and a local copy of the source code can be found here.
Cppcheck is designed as a local analysis tool for individual developers or integration into automated build systems. Its analysis is typically performed on-demand by developers working with the code, producing reports in formats such as plain text.
For this part of the assignment, you will run Cppcheck yourself on the lighttpd codebase and review its output directly. You will use the results for your report. You can install Cppcheck directly on your EC2 instance by running the following:
$ sudo apt-get install cppcheck
Once you have Cppcheck installed, applying it to lighttpd should be as simple as:
$ cd lighttpd-1.4.17/
# This next command may take an hour or two!
$ cppcheck --enable=all --inconclusive --force --std=c11 --suppress=missingIncludeSystem . 2> cppcheck-lighttpd.log
This process may take a while! It will generate a logfile called cppcheck-lighttpd.log in your current directory. We can check the size and length of the logfile.
$ cd lighttpd-1.4.17/
# Check log file size
$ ls -lh cppcheck-lighttpd.log
-rw-rw-r-- 1 ubuntu ubuntu 135K Jul 22 19:17 cppcheck-lighttpd.log
# Do not worry if your file is larger or smaller than 135K.
# Check number of lines in log file
$ wc -l cppcheck-lighttpd.log
2516 cppcheck-lighttpd.log
# Do not worry if your file is larger or smaller than 2516 lines.
You should open this file to examine the warnings and potential issues reported by Cppcheck.
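If you would like a quick overview before reading the whole log, one sketch is to count how often each Cppcheck severity word appears. The exact log format varies between Cppcheck versions, so treat these counts as approximate:

$ grep -oE "\b(error|warning|style|performance|portability|information)\b" cppcheck-lighttpd.log | sort | uniq -c | sort -rn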
Once you have downloaded and unzipped the JSON-C project to your instance, running Cppcheck on JSON-C is similarly direct.
$ cd json-c/
$ cppcheck --enable=all --inconclusive --std=c99 --language=c --force --suppress=missingIncludeSystem . 2> cppcheck-jsonc.log
This will generate a logfile called cppcheck-jsonc.log in your current directory. We can check the size and length of the logfile.
$ cd json-c/
$ ls -lh cppcheck-jsonc.log
-rw-rw-r-- 1 ubuntu ubuntu 65K Jul 24 19:10 cppcheck-jsonc.log
$ wc -l cppcheck-jsonc.log
962 cppcheck-jsonc.log
# Do not worry if your file is larger or smaller.
You should open this file and review the warnings and potential issues reported by Cppcheck.
While ChatGPT can be applied to a program's source code as a static analysis, it does not make use of traditional program semantics information (e.g., abstract syntax trees, dataflow analyses, etc.). Instead, it uses natural language processing (NLP) techniques to analyze code based on learned patterns and examples. Developers might use ChatGPT to identify potential bugs, clarify unfamiliar code, or get suggestions for improvements. You can access ChatGPT at https://chat.openai.com. In this course, you will use the free version.
Note that you can use any other generative AI assistant you prefer, as long as it is free (i.e., costs no money). This is a course requirement associated with equity between students. If you'd like to use another tool, just name it in your report and use it instead wherever this document references ChatGPT.
For this assignment, you will use ChatGPT to analyze potential issues in one source file from each of two projects: lighttpd and json-c. You will then compare and contrast ChatGPT’s output with the results reported by two traditional static analysis tools: Infer and Cppcheck. To ground this comparison, choose a source file that is flagged by Infer or Cppcheck and locate a specific bug reported by one of the tools. Provide that source code (or a relevant excerpt) to ChatGPT and ask whether it detects any bugs.
Your goal is to evaluate how ChatGPT’s response compares to the static analysis report. You will use this for your final report.
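One convenient way to grab an excerpt for ChatGPT is to print the lines surrounding the warning's reported location. The file name and line range below are placeholders only; substitute whatever Infer or Cppcheck actually flagged in your run:

# Example only: use the file and line numbers from your own tool report
$ sed -n '80,140p' src/mod_evhost.c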
Below are some additional subject programs that you may choose to use in this homework. Note that these programs are written in different languages. You may choose one in C/C++ for Cppcheck.
Note that the report requires you to choose an additional program (from the list above) and analyze it.
You must write a detailed PDF report reflecting on your experiences with these static analysis tools. Your report must include your University email address(es). In particular, all of the following are required:
The grading staff will select a small number of excerpts from particularly high-quality or instructive reports and share them with the class. If your report is selected you will receive extra credit.
Students are often anxious about a particular length requirement for this report. Unfortunately, some students include large screenshots and others do not, so raw length counts are not as useful as one might hope. Instead, I will say that in HW4 (and HW6, upcoming) we often see varying levels of "insight" or "critical thinking" from students. I know that's the sort of wishy-washy phrasing that students hate to hear ("How can I show insight?"). But some of the questions (e.g., "what does cost mean in this report?") highlight places where some students give one direct answer and some students consider many nuances. Often considering many nuances is a better fit (but note that if you make things too long you lose points for not being concise or readable -- yes, this is tough).
Let us consider an example from the previous homework. Suppose we had asked you whether mutation testing worked or not. Some students might be tempted to respond with something like "Yes, mutation testing worked because it put test suite A ahead of test suite B, and we know A is better than B because it has more statement coverage." That's a decent answer ... but it overlooks the fact that statement coverage is not actually the ground truth. (It is somewhat akin to saying "yes, we know the laser range finder is good because it agrees with my old bent ruler".) Students who give that direct answer get most of the credit, but students who explain that nuance, perhaps finding some other ways to indicate whether mutation testing worked or not, and what that even means, would get the most credit (and will also have longer reports). Students are often concerned about length, but from a grading perspective, the real factor is the insight provided.
You must also complete a short written reflection about how you collaborated with others and how you made decisions about correctness in this assignment. Download the reflection template, edit it with your answers, and submit the completed file via the autograder.io server.
Important: The reflection report is graded independently from the written report, and by a separate grading team. If you report using AI tools in the reflection report, this will not affect your HW grades. You will only lose points in your written report if your report appears to be AI-generated without meaningful revision.
Submit a single PDF report (HW4a) via Gradescope. You must also submit a separate contribution reflection (HW4b) via the autograder. Your PDF report must include your name and UM email ID (as well as your partner's name and email ID, if applicable). The reflection must be submitted individually.
There is no explicit format (e.g., for headings or citations) required. For example, you may either use an essay structure or a point-by-point list of question answers.
In this section we detail previous student issues and resolutions:
Question: When I try to run infer on lighttpd, it dies when trying to build the first file with an error like:
External Error: *** capture command failed:
*** make
*** exited with code 2
Run the command again with `--keep-going` to try and ignore this error.
Answer: Some students have reported that being careful to run all of the commands, such as with this exact sequence, works:
tar xzf infer-*tar.gz
tar xzf lighttpd-1.4.17.tar.gz
cd lighttpd-1.4.17
sh configure
../infer-*/infer/bin/infer run -- make
Question: When I try to run infer, I get some output but then Fatal error: out of memory. What can I do?
Answer: You may need to assign your EC2 setup more memory (see HW0 for setup). You may also need to choose a different subject program. Some students have reported this when analyzing cpython — perhaps a different program would work for you.
Question: When I try to run infer on libpng, it dies when trying to build the first file with an error like:
External Error: *** capture command failed:
*** make
*** exited with code 2
Run the command again with `--keep-going` to try and ignore this error.
Answer: One student reported that being careful to install all of the required build utilities, such as with this exact sequence, resolved the issue:
sudo apt install make
sudo apt install python-is-python3
Question: When I try to run infer on a program (e.g., lighttpd), it seems to produce no reports or output when I run infer run -- make. Instead, if I look very carefully at the output, hidden near the bottom is a warning like:
** Error running the reporting script:
Answer: You must have your directories set up so that infer/bin/infer is "next to" other files like infer/lib/python/report.py. Infer uses those extra scripts to actually generate human-readable reports. If you tried to copy the infer binary somewhere else, it won't work. Make sure you have all of the components of infer in consistent locations.
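As a quick sanity check (the directory name below follows the precompiled archive described above; yours may differ if you built Infer yourself):

$ ls /path/to/infer-linux-x86_64-v1.2.0/
# bin/ and lib/ should both appear here; keep them together rather than copying bin/infer elsewhere
$ ls /path/to/infer-linux-x86_64-v1.2.0/bin/infer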
Question: I'm not certain why "false positives" and "false negatives" are relevant for comparing the tools. I'm also not certain how we tell if something is a false positive or a false negative. Can you elaborate?
Answer: We can elaborate a bit, but I will note that this aspect of the assignment is assessing your mastery of course concepts. That is, why false positives and false negatives might be important, and how to distinguish between them, are critical software engineering concepts and might come up on the exam as well. You may want to double-check your notes on these, including on the readings. Now for more detail:
Suppose you are able to determine the false positive rate of one tool — or approximate it. For example, suppose you find that Tool #1 produces twice as many false positives as Tool #2. Well, then you might combine that with some of the reading for the class. For example, the FindBugs reading notes "Our ranking and false positive suppression mechanisms are crucial to keeping the displayed warnings relevant and valuable, so that users don’t start ignoring the more recent, important warnings" (among other comments on false alarms), while the Coverity reading notes "False positives do matter. In our experience, more than 30% easily cause problems. People ignore the tool. True bugs get lost in the false. A vicious cycle starts where ..." among other comments on false alarms. You might also check out the Parnin and Orso reading, and so on.
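As a concrete, illustrative calculation (the numbers here are invented): if you manually triage a random sample of 20 Infer reports and judge 8 of them to be spurious, you would estimate a false positive rate of 8/20 = 40% for Infer on that program; repeating the same sampling for Cppcheck would give you a defensible basis for a comparison like the one above.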
Something similar could be considered for false negatives. To give a prose example rather than a reading list this time, a report might include a claim like: "Many developers will dislike a tool that claims to find Race Conditions but actually misses 99% of them. If the tool has that many false negatives, developers will feel they cannot gain confidence in the quality of the software and will instead turn to other techniques, such as testing, that increase confidence in quality assurance." I'm not saying that is a good or a bad argument, but it is an example of the sort of analytic text or line of reasoning that might be applicable here.
Students often wonder: "How do I know if the tool is missing a bug?" Unfortunately, that's a real challenge. There are at least two ways students usually approach that problem, and both require labor or effort. Similarly, determining if a report is a false alarm usually requires reading it and comprehending the code nearby.
I can't really say much more here without giving away too much of what we expect from you on this part of the assignment, but I can reiterate that soundness and completeness (false positives and false negatives) are significant concepts in EECS 481 and that you should include them, coupled with your knowledge of the human element of such tools, in your assessment of the tools.