I released Devel::NYTProf v3 on Christmas Eve 2009. Over the next couple of months a few more features were added. The v3 work had involved a complete rewrite of the subroutine profiler and heavy work on much else besides. At that point I felt I’d done enough with NYTProf for now and it was time to focus on other more pressing projects.

Over those months I’d also started working on enhancements for PostgreSQL PL/Perl. That project turned into something of an epic adventure with more than its fair share of highs and lows and twists and turns. The dust is only just settling now. I would have blogged about it but security issues arose that led the PostgreSQL team to consider removing the plperl language entirely. Fortunately I was able to help avoid that by removing Safe.pm entirely! At some point I hope to write a blog post worthy of the journey. Meanwhile, if you’re using PostgreSQL, you really do want to upgrade to the latest point-release.

One of my goals in enhancing PostgreSQL PL/Perl was to improve the integration with NYTProf. I wanted to be able to profile PL/Perl code embedded in the database server. With PostgreSQL 8.4 I could get the profiler to run, with some hackery, but in the report the subroutines were all __ANON__ and you couldn’t see the source code, so there were no statement timings. It was useless.

The key problem was that Devel::NYTProf couldn’t see into string evals properly. To fix that I had to go back spelunking deep in the NYTProf guts again; mostly in the data model and report generation code. With NYTProf v4, string evals are now treated as files, mostly, and a whole new level of insight is opened up!

In the rest of this post I’ll be describing this and other new features.

Seeing Into String Evals

Let’s start by taking a look at a small example:

perl -d:NYTProf -e 'eval("sub { $_ }")->() for 1,2,2'

That executes three string evals, each of which defines an anonymous subroutine which is then executed. Two of the three evals have identical source code.
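Spelled out, the one-liner does roughly this (an expanded equivalent I’ve written for illustration, not code from the report):

```perl
use strict;
use warnings;

my @results;
for my $n (1, 2, 2) {
    local $_ = $n;
    # $_ interpolates into the double-quoted string *before* eval
    # compiles it, so perl compiles the source strings "sub { 1 }",
    # "sub { 2 }", and "sub { 2 }" in turn - three separate string
    # evals, two with identical source code.
    my $code = eval "sub { $_ }";
    push @results, $code->();
}
print "@results\n";
```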

With NYTProf 3.11 the report for the “-e” file looked like this:

Two key things to note: Firstly, there’s no link to drill-down to see the actual source code executed by the eval. (In this example we can see the source, but that’s rare in practice.) Secondly, the three anonymous subroutines have been merged. You can’t see individual details like call counts, callers, or timings.

(In case you’re wondering, the main::BEGIN subroutine is defined as a side effect of loading NYTProf, and the main::RUNTIME subroutine is a dummy created by NYTProf to act as the ‘root caller’. You’ll see it appear as the caller of the anonymous subs in a later screenshot.)

With NYTProf 4.00 the same report looks like this:

Now you can see much more detail right there. The two evals with identical source code have been merged, as have the identical anonymous subroutines defined by them. The eval and anonymous sub with different source code have been kept separate. What you can’t easily see from the image is that the “string eval” texts in the grey annotation are links. This is where it gets more interesting…

Clicking on the “2 string evals (merged)” link takes us to a typical NYTProf report page showing the performance-annotated source code executed by the eval:

What you’re looking at here is source code that never existed as a file. (That second line containing a semicolon was added by perl as part of the implementation of eval.)

In the table at the top, you’ll see “Eval Invoked At” with a link that’ll take you to the eval statement that executed this source code. You’ll also see a “Sibling evals” row. That’s added in cases where an eval was executed multiple times and not all were merged into a single report. Finally, because this particular eval includes data merged from others, the report includes a clear banner alerting you to how many evals were merged to produce this report page.

Update: You’ll need a recent version of perl (5.8.9+, 5.10.1+, or 5.12+) to see the eval source code.



So why does NYTProf go to all the effort of merging evals and anonymous subs? Here’s a real-world example using a profile of perlcritic:

Without merging, those 4196 evals would have produced 4196 report pages, their sheer volume making them almost useless. Now it’s clear from the report that only a few distinct source code strings are used, and the report for each is far more useful.

In that example perlcritic is compiling lots of tiny snippets of code. Many applications use string eval to compile large quantities of code. The ORLite module is one example. It dynamically generates and compiles a large chunk of code with many subroutines that implement a customized interface for a specific SQLite database file.
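The technique ORLite uses can be sketched like this. This is a simplified illustration of the general pattern, not ORLite’s actual code; the package and column names are made up:

```perl
use strict;
use warnings;

# Build a chunk of Perl source defining a package with several subs,
# then compile the whole thing with a single string eval. This is the
# general "generate and eval a big code string" pattern; the real
# ORLite-generated code is far more elaborate.
sub make_accessors {
    my ($package, @columns) = @_;
    my $code = "package $package;\n";
    for my $col (@columns) {
        # Each generated sub reads one field from a hash-based object.
        $code .= "sub $col { \$_[0]{$col} }\n";
    }
    eval $code;        # one string eval defining many subroutines
    die $@ if $@;
}

make_accessors('My::Row', qw(id name));
my $row = bless { id => 7, name => 'ada' }, 'My::Row';
print $row->id, " ", $row->name, "\n";
```

With NYTProf v4, each sub compiled this way shows up in the report attributed to the eval that defined it, with its source visible.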

With NYTProf 3.11 you couldn’t see the hundreds of lines of source code, or the per-caller subroutine performance, or the individual statement performance. All you could see was a list of sub calls and the overall time spent in each:

That was better than nothing, but now, with NYTProf v4, you can see everything!

I see a SQLite in the Distance

An aside: getting all this working correctly, especially the data model manipulations required for merging evals and anonymous subroutines, was far more painful than I’d anticipated. I both blessed and cursed the test suite on numerous occasions! I think it would be wise for the NYTProf reporting and data model code to read the data from, and manipulate the data in, an SQLite database. That would yield simpler, more maintainable code. It would also enable nytprofhtml to be used for presenting performance data from other sources, including perl6. If you’re interested in working on this, starting with a utility to load an nytprof.out file into a SQLite database, please contact the mailing list.
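A loader along those lines might start out like this sketch, assuming DBI and DBD::SQLite are available. The schema and table names here are purely illustrative, not a proposed design:

```perl
use strict;
use warnings;
use DBI;

# Hypothetical first step for a profile-to-SQLite loader: store
# per-subroutine totals in a table that report code could query.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do(q{
    CREATE TABLE subroutine (
        name        TEXT PRIMARY KEY,  -- e.g. "main::foo"
        call_count  INTEGER,
        incl_time   REAL,              -- inclusive seconds
        excl_time   REAL               -- exclusive seconds
    )
});

# In a real loader these rows would come from reading nytprof.out;
# here we insert a made-up row just to exercise the schema.
my $ins = $dbh->prepare('INSERT INTO subroutine VALUES (?, ?, ?, ?)');
$ins->execute('main::example', 42, 1.25, 0.75);

my ($count) = $dbh->selectrow_array(
    'SELECT call_count FROM subroutine WHERE name = ?',
    undef, 'main::example');
print "$count\n";
```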

String Eval Timings

There’s a slight caveat worth noting about the timings shown for string evals that define subroutines.

Timings for string evals are taken from the statement profiler (the subroutine profiler doesn’t pay attention to evals). So the “time spent executing the eval” is the sum of the time spent executing any statements within the eval.

That’s fine for evals that don’t define subroutines. For those that do, the time for the eval includes not only the time spent executing the eval itself but also time spent executing statements in subroutines defined within the eval but called later from outside it.
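Here’s a minimal illustration of why the two get conflated (the variable names are mine):

```perl
use strict;
use warnings;

# The statement profiler charges this eval for the statements it runs
# directly AND for statements in the sub it defines, even though that
# sub is called later, from outside the eval.
my $slow = eval q{
    my $base = 2 ** 10;       # runs during the eval itself
    sub { $base + $_[0] }     # runs later, but its statement time is
                              # still attributed to this string eval
};
die $@ if $@;
print $slow->(24), "\n";      # called well after the eval returned
```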

Hence the careful wording you can see in the example from perlcritic shown previously:

# spent 2.65s executing statements in 1369 string evals (merged)
# includes 1.83s spent executing 122015 calls to 1 sub defined therein.

I could have automatically subtracted the subroutine time from the eval time but I was wary about doing that (for some reason that currently escapes me). Maybe that’ll change in the near future. Further out, a future version of NYTProf might use the subroutine profiler logic to time evals. That’s a deeper change that would give a more natural view of the timings.

Other Changes in v4

Subroutines that couldn’t be associated with a perl source file, such as xsubs in packages with no perl source, previously didn’t appear in reports at all, so their caller and timing information couldn’t be seen. Now it can.

Similarly, subroutine calls that couldn’t be associated with a specific line, such as calls made by perl to END blocks, are now shown in reports. They appear as calls from line 0.

NYTProf v3 added renaming of BEGIN subs, so a BEGIN (or use) on line 3 would be called BEGIN@3 and so kept distinct from others in the same package. NYTProf v4 takes that further by detecting the rare cases where the modified name isn’t unique and adding a sequence number, like BEGIN@3.59.

NYTProf v2 added the savesrc option to enable storing a copy of the profiled perl source code in the profile data file itself. That makes report generation immune to later changes to the source files. NYTProf v4 now enables that option by default.

The report generator used to only generate report lines up to the maximum number of source lines present. If there was no source code available, for whatever reason, you’d get an empty report for that file, even though there was useful information to report. Now the report generates enough lines to ensure all available profile information gets included. This is especially useful for old perl versions where source code is more likely to be unavailable. Also, the report generator now collapses groups of three or more blank lines.

Nicholas Clark contributed changes to refine the timing of the beginning and end of profiling. Now END blocks defined at runtime are included in the profile, and compilation-only checks (e.g., perl -c) can also be profiled.

You may be aware that the POSIX::_exit function exits the process immediately, without flushing stdio buffers and without giving perl a chance to clean up. That meant NYTProf didn’t get a chance to finish, so the profile wasn’t usable. NYTProf v4 now intercepts calls to POSIX::_exit and cleans up properly.
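You can see the cleanup-skipping behaviour of POSIX::_exit with a small experiment. This sketch runs it in a forked child so the parent survives to report back (so it assumes a platform with fork, e.g. Linux):

```perl
use strict;
use warnings;
use POSIX ();

# Demonstrate that POSIX::_exit bypasses perl's normal shutdown:
# an END block registered in the child never gets to run.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Register the END block at runtime, in the child only, so the
    # parent's own exit doesn't trigger it.
    eval 'END { print "child END ran\n" }';
    POSIX::_exit(0);   # immediate exit: no END blocks, no stdio flush,
                       # and (pre-v4) no chance for NYTProf to finish
}
waitpid($pid, 0);
print "child exit status: ", $? >> 8, "\n";
```

The child prints nothing: its END block is silently skipped, which is exactly why a profiler has to intercept the call.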

Finally, tired of waiting for nytprofhtml to produce a report from a long profile run? The new --minimal (-m) option makes nytprofhtml skip building reports for the rarely used ‘blocks’ and ‘subs’ levels of detail, and skip generating the graphviz .dot files. That saves a lot of time.

Enjoy!

Translations: Belorussian translation thanks to Bohdan Zograf.