The mainline or stock kernel is principally distributed as a compressed tape archive (.tar.gz or .tar.bz2) file available from the nearest kernel source mirror, in Ireland's case ftp://ftp.ie.kernel.org. The stock kernel is always the one released by the maintainer of the tree. For example, at time of writing, the stock kernels for 2.2.x are those released by Alan Cox, for 2.4.x by Marcelo Tosatti and for 2.5.x by Linus Torvalds. At each release, the full tar file is available as well as a smaller patch which contains the differences between the two releases. Patching is the preferred method of upgrading because it uses far less bandwidth. Contributions made to the kernel are almost always in the form of patches, which are simply the output of the GNU tool diff in unified format.
This method of sending patches to the mailing list to be merged initially sounds clumsy but it is remarkably efficient in the kernel development environment. The principal advantage of patches is that it is very easy to show what changes have been made, rather than sending the full file and viewing both versions side by side. A developer familiar with the code being patched can easily see what impact the changes will have and whether they should be merged. In addition, it is very easy to quote the patch in an email reply and request more information about particular parts of it. There are a number of scripts available to which emails can be piped; they strip away the mail and keep only the patch.
BitKeeper allows comments to be associated with each patch, and these may be displayed as a list as part of the release information for each kernel. For Linux, this means that patches preserve the email that originally submitted the patch, or the information pulled from the tree, so that the progress of kernel development is a lot more transparent. On release, a summary of the patch titles from each developer is displayed as a list and a detailed patch summary is also available.
As BitKeeper is a proprietary product, which has sparked any number of flame wars with free software developers, email and patches are still considered the only way to generate discussion on code changes. In fact, some patches will simply not be considered for merging unless some discussion on the main mailing list is observed. As a number of CVS and plain patch portals to the BitKeeper tree are available and patches are still the preferred means of discussion, at no point is a developer required to have BitKeeper to make contributions to the kernel, but the tool is still something that developers should be aware of.
The two tools for creating and applying patches are diff and patch, both of which are GNU utilities available from the GNU website. diff is used to generate patches and patch is used to apply them. While the tools may be used in a wide variety of ways, there is a "preferred" usage.
Patches generated with diff should always be unified diffs and generated from one directory above the kernel source root. Unified diffs are considered the easiest form of context diff to read: each block states the line number at which it begins and how many lines it covers, and then marks each line with a +, a - or a blank. A + means the line is added, a - means the line is removed, and a blank means the line is unchanged and is there just to provide context. The reasoning behind generating from one directory above the kernel root is that it is easy to see quickly what version the patch has been applied against, and scripting the application of patches is easier if each patch is generated the same way.
As an example, take a very simple change made to mm/page_alloc.c which simply adds a small piece of commentary. The patch is generated as follows. Note that the command may equally be typed on one line without the backslashes.
mel@joshua: kernels/ $ diff -u \
        linux-2.4.20-clean/mm/page_alloc.c \
        linux-2.4.20-mel/mm/page_alloc.c > example.patch
This generates a unified context diff (-u switch) between the two files and places the patch in example.patch as shown in Figure 2.1.1.
From this patch, it is clear even at a casual glance which file is affected (page_alloc.c) and what line the change starts at (76), and the new lines added are clearly marked with a +. In a patch, there may be several "hunks", each of which is introduced by a line starting with @@. Each hunk is treated separately during patch application.
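The shape of a unified diff can be seen with a small experiment. The following is only a sketch: the directories demo-clean and demo-mel and the file mm/example.c are hypothetical stand-ins for two kernel trees, not files from the kernel source.

```shell
#!/bin/sh
# Sketch only: demo-clean/, demo-mel/ and mm/example.c are hypothetical
# stand-ins for a clean kernel tree and a modified one.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p demo-clean/mm demo-mel/mm
printf 'int a;\nint b;\nint c;\n' > demo-clean/mm/example.c
printf 'int a;\n/* b counts pages */\nint b;\nint c;\n' > demo-mel/mm/example.c

# Run diff from one directory above both trees.
# diff exits with status 1 when the files differ, which is expected here.
diff -u demo-clean/mm/example.c demo-mel/mm/example.c > example.patch
diff_status=$?

cat example.patch
```

The output has a two-line header naming the old and new files, followed by a single hunk introduced by @@ -1,3 +1,4 @@. That header says that three lines of the old file starting at line 1 become four lines of the new file, and the added comment is the only line marked with a +.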
Patches broadly come in two varieties: plain text, such as the one above, which is sent to the mailing list, and a compressed form made with gzip (.gz extension) or bzip2 (.bz2 extension). It can generally be assumed that patches are generated from one level above the kernel root, so they can be applied with the option -p1. A plain text patch to a clean tree can be easily applied as follows.
mel@joshua: kernels/ $ cd linux-2.4.20-clean/
mel@joshua: linux-2.4.20-clean/ $ patch -p1 < ../example.patch
patching file mm/page_alloc.c
mel@joshua: linux-2.4.20-clean/ $
To apply a compressed patch, simply decompress the patch to stdout first and pipe it to patch. A bzip2 patch is handled the same way with bzip2 -dc.
mel@joshua: linux-2.4.20-clean/ $ gzip -dc ../example.patch.gz | patch -p1
If a hunk can be applied but the line numbers are different, the hunk number and the number of lines by which it was offset will be output. These are generally safe warnings and may be ignored. If there are slight differences in the context, the hunk will still be applied and the level of "fuzziness" will be printed, which should be double checked. If a hunk fails to apply, it will be saved to filename.c.rej, the original file will be saved to filename.c.orig, and the hunk will have to be applied manually.
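Before modifying a tree, it can be useful to rehearse a patch. The sketch below assumes GNU patch, whose --dry-run option reports what would happen without changing any files. The toy trees and file names are hypothetical, and the tree is copied first so the clean copy stays pristine.

```shell
#!/bin/sh
# Sketch only: the demo-* trees and mm/example.c are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p demo-clean/mm demo-mel/mm
printf 'int a;\nint b;\nint c;\n' > demo-clean/mm/example.c
printf 'int a;\n/* b counts pages */\nint b;\nint c;\n' > demo-mel/mm/example.c
diff -u demo-clean/mm/example.c demo-mel/mm/example.c > example.patch

# Work on a copy so the clean tree is never modified.
cp -r demo-clean demo-test
cd demo-test || exit 1

# Rehearse first: --dry-run prints the usual messages but changes nothing.
if patch -p1 --dry-run < ../example.patch; then
    patch -p1 < ../example.patch
else
    echo "patch would not apply cleanly, tree left untouched" >&2
fi

# Any hunk that failed to apply would have been left behind as a .rej file.
rejects=$(find . -name '*.rej')
if [ -z "$rejects" ]; then
    echo "no rejected hunks"
else
    echo "rejected hunks to apply by hand:"
    echo "$rejects"
fi
```

Searching for .rej files after a large patch is a quick way to find every hunk that still needs manual attention.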
When code is small and manageable, it is not particularly difficult to browse through the code. Generally, related operations are clustered together in the same file and there is not much coupling between modules. The kernel unfortunately does not always exhibit this behavior. Functions of interest may be spread across multiple files or contained as inline functions in header files. To complicate matters, files of interest may be buried beneath architecture specific directories, making tracking them down time consuming.
An early solution to the problem of easy code browsing was ctags, which could generate tag files from a set of source files. These tags could be used to jump to the C file and line where a function is defined with editors such as Vi and Emacs. This does not work well when there are multiple functions of the same name, which is the case for architecture code, or if the type of a variable needs to be identified.
A more comprehensive solution is available with the Linux Cross-Referencing (LXR) tool available from http://lxr.linux.no. The tool provides the ability to represent source code as browsable web pages. Global identifiers such as global variables, macros and functions become hyperlinks. When one is clicked, the location where it is defined is displayed along with every file and line where it is referenced. This makes code navigation very convenient and is almost essential when reading the code for the first time.
The tool is very easily installed as the documentation is very clear. For the research of this document, it was deployed at http://monocle.csis.ul.ie, which was used to mirror recent development branches. All code snippets shown in this and the companion document were taken from LXR so that the line numbers would be visible.
As separate modules share code across multiple C files, it can be difficult to see what functions are affected by a given code path without tracing through all the code manually. For a large or deep code path, this can be extremely time consuming to answer what should be a simple question.
Based partially on the work of Martin Devera, I developed a tool called gengraph. The tool can be used to generate call graphs from any given C code that has been compiled with a patched version of gcc.
During compilation with the patched compiler, a cdep file is generated for each C file, listing all functions and macros that are contained in other C files as well as any function call that is made. These files are distilled with a program called genfull to generate a full call graph of the entire source code, which can be rendered with dot, part of the GraphViz project.
In kernel 2.4.20, there were a total of 14593 entries in the full.graph file generated by genfull. This call graph is essentially useless on its own because of its size, so a second tool called gengraph is provided. At its most basic, this program takes just the name of a function as an argument and generates a call graph with the requested function at the top. This can result in unnecessary depth in the graph, or in graphing functions that the user is not interested in, so there are three options for limiting graph generation. The first is to limit by depth, where functions that are X deep in a call chain are ignored. The second is to ignore a function totally, so that neither it nor any of the functions it calls appear on the call graph. The last is to display a function but not traverse it, which is convenient when the function is covered on a separate call graph.
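The three limiting options can be illustrated with a small traversal over an edge list. This is not gengraph's implementation or its command-line interface. The input format (one "caller callee" pair per line), the variable names and the function names are all assumptions made for the sketch, which emits edges in the dot language.

```shell
#!/bin/sh
# Sketch only: not gengraph itself. The edge list format and every name
# below are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# One "caller callee" pair per line.
cat > "$work/edges.txt" <<'EOF'
main init
main run
init log_msg
run step
run log_msg
step helper
helper deep
EOF

graph=$(awk -v root=main -v maxdepth=3 -v ignore=log_msg -v notraverse=step '
function walk(fn, depth,    list, n, i, c) {
    # Stop at the depth limit, and never descend into no-traverse functions.
    if (depth >= maxdepth || (fn in leaf)) return
    n = split(callees[fn], list, " ")
    for (i = 1; i <= n; i++) {
        c = list[i]
        if (c in skip) continue          # ignored functions never appear
        if (!((fn, c) in seen)) {        # print each edge only once
            seen[fn, c] = 1
            print "  \"" fn "\" -> \"" c "\";"
        }
        walk(c, depth + 1)
    }
}
BEGIN {
    n = split(ignore, a, ",");     for (i = 1; i <= n; i++) skip[a[i]] = 1
    n = split(notraverse, a, ","); for (i = 1; i <= n; i++) leaf[a[i]] = 1
}
{ callees[$1] = callees[$1] " " $2 }
END {
    print "digraph calls {"
    walk(root, 0)
    print "}"
}' "$work/edges.txt")

printf '%s\n' "$graph"
```

With these settings, log_msg is dropped entirely, step is drawn but its callees helper and deep are not, and the depth limit guarantees termination even if the call graph contains recursion. The text in $graph can be passed to dot for rendering.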
All call graphs shown in this or the companion document are generated with the gengraph package, which is freely available at http://www.csn.ul.ie/mel/projects/gengraph. It is often much easier to understand a subsystem at first glance when a call graph is available. The package has been tested with a number of other open source projects based on C and has wider application than just the kernel.
The untarring of sources, management of patches and building of kernels is initially interesting but quickly palls. To cut down on the tedium of patch management, a tool called patchset was developed for the management of kernel sources.
It uses files called set configurations to specify what kernel source tar to use, what patches to apply, what configuration to use for the build and what the resulting kernel is to be called. A sample specification file to build kernel 2.4.20-rmap15a is:
linux-2.4.18.tar.gz
2.4.20-rmap15a
config_joshua
1 patch-2.4.19.gz
1 patch-2.4.20.gz
1 2.4.20-rmap15a
The first line says to unpack a source tree starting with linux-2.4.18.tar.gz. The second line specifies that the kernel will be called 2.4.20-rmap15a and the third line specifies which config file to use for building the kernel. Each line after that has two parts. The first part says what patch depth to use, i.e. what number to use with the -p switch to patch. As discussed earlier, this is usually 1. The second part is the name of the patch stored in the patches directory. The above example has two patches to update the kernel to 2.4.20 before applying 2.4.20-rmap15a.
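Reading such a file takes only a few lines of shell. The sketch below is not taken from patchset. It assumes that each entry sits on its own line, uses a hypothetical file name, and only prints the steps it would take rather than unpacking or patching anything.

```shell
#!/bin/sh
# Sketch only: not the patchset implementation. The file name and the
# wording of the printed plan are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

cat > "$work/2.4.20-rmap15a.set" <<'EOF'
linux-2.4.18.tar.gz
2.4.20-rmap15a
config_joshua
1 patch-2.4.19.gz
1 patch-2.4.20.gz
1 2.4.20-rmap15a
EOF

plan=$(
    {
        # The first three lines are fixed fields.
        read -r tarball
        read -r name
        read -r config
        echo "unpack $tarball"
        echo "name kernel $name"
        # Every remaining line is "<patch depth> <patch name>".
        while read -r depth patch; do
            [ -n "$patch" ] || continue
            echo "apply $patch with patch -p$depth"
        done
        echo "use config $config"
    } < "$work/2.4.20-rmap15a.set"
)

printf '%s\n' "$plan"
```

Printing a plan instead of running commands makes it easy to check that the patches are listed in the intended order before any tree is modified.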
The package comes with three scripts. The first, make-kernel.sh, will unpack the kernel to the kernels/ directory and build it if requested. If the target distribution is Debian, it can also create Debian packages for easy installation. The second, make-gengraph, will unpack the kernel but, instead of building an installable kernel, will generate the files required to use gengraph for creating call graphs. The last, make-lxr, will install the kernel to the LXR root and update the versions so that the new kernel will be displayed on the web page.
With the three scripts, a large amount of the tedium involved with managing
kernel patches is eliminated. The tool is fully documented and freely
available from http://www.csn.ul.ie/mel/projects/patchset.