gprof: improving software performance

Since my work is mainly focused on improving the accuracy of GNSS processing algorithms, I do not always spend the time required to optimize the code I am writing. Every now and then, I make a few adjustments to the code structure to avoid common pitfalls such as excessive memory allocation, which is particularly time-consuming. I did such an exercise last week and thought I would share the process in case someone else wishes to shave a few seconds off their software's processing time.

For this purpose, I used a free program called gprof, available on Linux or through Cygwin for Windows users. Gprof produces an execution profile of your C/C++ or Fortran program. Put simply, it measures the time your program spends in every function, how often each function is called, and so on. Its basic functionality is extremely simple to use, and I will give the main steps below. For more details regarding the different features, I recommend a good blog post by The Geek Stuff or, if you are more patient, the full documentation.

Step 1: Add a compilation flag

It is comforting to know that gprof requires no alterations to your source code. Instead, the profiling instrumentation is added to your program during compilation by supplying the “-pg” compilation flag, for example:

g++ -pg *.cpp -o ppp_gprof.exe

where I named my software executable ppp_gprof.exe.
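Note that if you compile and link in separate steps, the -pg flag must be supplied at both stages; otherwise the profiling support code never makes it into the executable. A minimal two-step example (the file names here are only placeholders, not my actual sources):

g++ -pg -c main.cpp filter.cpp
g++ -pg main.o filter.o -o ppp_gprof.exe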

Step 2: Run the software

The second step consists of simply executing your software with the data set of your choice. I usually prefer longer observation sessions or high-rate data for this purpose, since processing more epochs better emphasizes the functions that are particularly time-consuming. Assuming your software only needs an option file (my_option_file.opt) as input, the command would be:

./ppp_gprof.exe my_option_file.opt

Since the code has not been compiled with optimizations, the execution might be a little slower than usual. Apart from the output your software normally generates, you will also notice that a file called gmon.out has been created in the working directory. This is a binary file from which you can extract all the useful information, as explained in the next steps. Keep in mind that gmon.out is only written if your program terminates normally (by returning from main or calling exit), so let the run finish rather than killing it.

Step 3: Convert gmon.out to a human-readable format

The binary file can be converted to a human-readable (and frankly quite intuitive) file using:

gprof ppp_gprof.exe gmon.out > analysis.out

where analysis.out will be the ASCII file that you need for the analysis.
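As a side note, the default report is fairly verbose. A few standard gprof options can trim it down: -b suppresses the explanatory text printed between sections, -p restricts the output to the flat profile, and -q restricts it to the call graph. For instance:

gprof -b ppp_gprof.exe gmon.out > analysis.out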

Step 4: Analyze the profile

As an example, I processed a 7-hour data set collected at a 1-Hz interval and containing both GPS and GLONASS satellites. Since I had already done this profiling exercise a couple of years ago, many of the “horrors” initially contained within the software had been taken care of. Nevertheless, there was still plenty of room for improvement.

The table below is an excerpt of the first section of the gprof output (analysis.out): the flat profile. It shows, in decreasing order, the most time-consuming functions of the program. For brevity, I trimmed the table to the first two entries:

[Flat profile excerpt from analysis.out: top two entries]
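For readers who have never seen gprof output, each line of the flat profile follows the layout below. The header is gprof's standard format; the sample row is purely illustrative (the function name and all numbers are made up, not taken from my run):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 42.50     10.20     10.20     5000     2.04     3.10  some_function(...)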

The most time-consuming function is, by far, a string comparison function from the C++ standard template library. The gprof report shows that over 10% of the run time (25 seconds) was spent in this function. This revelation came as a surprise since I do not even call this function in my code! Then I realized that I am a big user of the map container which, for people not familiar with C++, is basically an associative array. I often use maps to store data (observations, satellite data, code biases, etc.) per satellite. However, my thinking is that the time spent in this function is probably drastically reduced when compiling with optimization flags such as -O3, so I did not consider replacing the map containers at this point.
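To show where those hidden string comparisons come from, here is a hedged sketch (the container layout and satellite IDs are only representative of how I store per-satellite data, not my actual code): a std::map keyed by strings must compare keys at every tree node visited during a lookup.

#include <map>
#include <string>

// Illustrative only: per-satellite storage keyed by a RINEX-style
// satellite ID such as "G01" or "R05".
std::map<std::string, double> code_bias;

double get_bias(const std::string& sat_id)
{
    // Each find() walks a balanced tree and performs O(log N)
    // string comparisons, which is where the standard library's
    // compare function shows up in the flat profile.
    auto it = code_bias.find(sat_id);
    return (it != code_bias.end()) ? it->second : 0.0;
}

Keying such containers by a small integer (an internal satellite index, for instance) would reduce each comparison to a single instruction but, as mentioned above, I expected -O3 to remove much of this overhead anyway.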

The second most time-consuming function is, without surprise this time, the matrix inversion subroutine. Gprof tells me that it has been called 53056 times, which makes sense (twice for every one of the 26528 epochs in a sequential least-squares adjustment), and that it took close to 3% of the total time. The sequential adjustment, similar to a Kalman filter, is probably not the most efficient approach, but changing this processing strategy at this point is a non-trivial endeavour. My goal with this exercise was to eliminate obvious coding inadequacies, not to tackle a full restructuring of the software.

The second part of the gprof output is the call-graph analysis. It tells you which function calls which functions and how much time is being spent in each of them. For example:

[Call-graph entry for Filter::Compute_Solution from analysis.out]
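The general shape of such an entry is sketched below. Only the call counts and the two timings quoted in the next paragraph come from my run; the caller's name (PPP::Process) is a stand-in for my main program, and the remaining fields are elided since I am not reproducing the full table:

index % time    self  children    called     name
                 ...     ...    26528/26528      PPP::Process [...]
[7]      ...     ...     ...    26528            Filter::Compute_Solution [7]
                11.19    ...    26528/26528          Filter::Update_NEQ_Constraints [...]
                 9.87    ...    26528/26528          Filter::Solve_Filter [...]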

This example shows the 7th most time-consuming function of my program (Filter::Compute_Solution). It tells me that this function was executed 26528 times (the number of epochs in the RINEX file) and that it was called by the PPP program. The Compute_Solution function called other functions, referred to as “children”. The software spent 23.58 seconds in Compute_Solution and its children and, clearly, it is Update_NEQ_Constraints (11.19 sec) and Solve_Filter (9.87 sec) that required the most processing resources. This is not a surprise to me since these are the two functions calling the matrix inversion subroutine.

Hopefully, this example gives you a good idea of what you can do with gprof and motivates you to try it for yourself. The interesting part of the analysis is in the details: functions that you did not expect to consume many resources and that reveal poor coding practices. Was this exercise useful for me? The initial run of the data set took 169 sec to process when compiled with optimizations. After a few fine-tuning adjustments, the same data set took 117 sec, a 30% improvement. In my mind, this was a Friday afternoon extremely well spent!



Comments: 1

Garrett (Thursday, 04 May 2017 11:20)

Good tool. I'll try to use it over the coming weeks.

I've been googling for something like this, but the documentation for the tools I found was too intense.

I have some routines that use a rather sub-optimal method to search and return data.