.. SPDX-License-Identifier: GPL-2.0

======================
Histogram Design Notes
======================

:Author: Tom Zanussi <zanussi@kernel.org>

This document attempts to provide a description of how the ftrace
histograms work and how the individual pieces map to the data
structures used to implement them in trace_events_hist.c and
tracing_map.c.

Note: All the ftrace histogram command examples assume the working
directory is the ftrace /tracing directory. For example::

  # cd /sys/kernel/tracing

Also, the histogram output displayed for those commands will
generally be truncated - only enough to make the point is displayed.

'hist_debug' trace event files
==============================

If the kernel is compiled with CONFIG_HIST_TRIGGERS_DEBUG set, an
event file named 'hist_debug' will appear in each event's
subdirectory. This file can be read at any time and will display some
of the hist trigger internals described in this document. Specific
examples and output will be described in test cases below.

Basic histograms
================

First, basic histograms. Below is pretty much the simplest thing you
can do with histograms - create one with a single key on a single
event and cat the output::

  # echo 'hist:keys=pid' >> events/sched/sched_waking/trigger

  # cat events/sched/sched_waking/hist

  { pid:      18249 } hitcount:          1
  { pid:      13399 } hitcount:          1
  { pid:      17973 } hitcount:          1
  { pid:      12572 } hitcount:          1
  ...
  { pid:         10 } hitcount:        921
  { pid:      18255 } hitcount:       1444
  { pid:      25526 } hitcount:       2055
  { pid:       5257 } hitcount:       2055
  { pid:      27367 } hitcount:       2055
  { pid:       1728 } hitcount:       2161

  Totals:
      Hits: 21305
      Entries: 183
      Dropped: 0

This creates a histogram on the sched_waking event using pid as a key
and with a single value, hitcount, which exists for every histogram
whether or not it is explicitly specified.

The hitcount value is a per-bucket value that's automatically
incremented on every hit for the given key, which in this case is the
pid.

So in this histogram, there's a separate bucket for each pid, and each
bucket contains a value for that bucket, counting the number of times
sched_waking was called for that pid.
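
The bucket-per-key behaviour described above can be modelled in a few
lines of userspace Python. This is only a toy sketch, not the kernel
implementation: the event records are hypothetical dictionaries, and
the function name build_histogram() is made up for illustration.

```python
from collections import Counter


def build_histogram(events, key="pid"):
    """Toy model of 'hist:keys=pid': one bucket per key value,
    and one implicit hitcount value per bucket."""
    hitcount = Counter()
    for event in events:
        hitcount[event[key]] += 1  # bumped once per hit for that key
    return hitcount


# Hypothetical sched_waking records; only the pid field matters here.
events = [{"pid": 10}, {"pid": 1728}, {"pid": 10}, {"pid": 18249}]
hist = build_histogram(events)

# Print in the spirit of the 'hist' file, sorted by ascending hitcount.
for pid, hits in sorted(hist.items(), key=lambda kv: kv[1]):
    print(f"{{ pid: {pid:>10} }} hitcount: {hits:>10}")
print(f"Hits: {sum(hist.values())}  Entries: {len(hist)}")
```

Here Hits corresponds to the sum of all hitcounts and Entries to the
number of distinct keys, mirroring the Totals section of the output.
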
|
||
|
|
||
|
Each histogram is represented by a hist_data struct.

To keep track of each key and value field in the histogram, hist_data
keeps an array of these fields named fields[]. The fields[] array
contains struct hist_field representations of each histogram val and
key in the histogram (variables are also included here, but are
discussed later). So for the above histogram we have one key and one
value; in this case the one value is the hitcount value, which all
histograms have, regardless of whether they define that value or not,
which the above histogram does not.

Each struct hist_field contains a pointer to the ftrace_event_field
from the event's trace_event_file along with various bits related to
that such as the size, offset, type, and a hist_field_fn_t function,
which is used to grab the field's data from the ftrace event buffer
(in most cases - some hist_fields such as hitcount don't directly map
to an event field in the trace buffer - in these cases the function
implementation gets its value from somewhere else). The flags field
indicates which type of field it is - key, value, variable, variable
reference, etc., with value being the default.
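
As a rough illustration of the size/offset/fn() idea, the sketch below
models a hist_field as a small Python dataclass. Everything in it is an
assumption made for the example: the record layout, the flag values,
and the names HistField, fetch_int() and fetch_hitcount() are
hypothetical stand-ins, not the kernel's definitions.

```python
import struct
from dataclasses import dataclass
from typing import Callable

FL_KEY = 1 << 0       # stand-in for HIST_FIELD_FL_KEY (value is made up)
FL_HITCOUNT = 1 << 1  # stand-in for HIST_FIELD_FL_HITCOUNT (value is made up)


@dataclass
class HistField:
    name: str
    offset: int   # byte offset of the field within the trace record
    size: int     # size of the field in bytes
    flags: int
    fn: Callable[["HistField", bytes], int]


def fetch_int(field, record):
    """Grab a signed little-endian integer using the field's offset and size."""
    raw = record[field.offset:field.offset + field.size]
    return int.from_bytes(raw, "little", signed=True)


def fetch_hitcount(field, record):
    """hitcount maps to no event field: it just contributes 1 per hit."""
    return 1


# Hypothetical record layout: 4 bytes of header followed by a 4-byte pid.
record = struct.pack("<Ii", 0, 999)

fields = [
    HistField("hitcount", 0, 8, FL_HITCOUNT, fetch_hitcount),  # val
    HistField("pid", 4, 4, FL_KEY, fetch_int),                 # key
]

for f in fields:
    print(f.name, f.fn(f, record))
```

The point of the per-field function is that callers never need to know
where a value comes from; they simply call fn() on each field.
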
|
||
|
|
||
|
The other important hist_data data structure in addition to the
fields[] array is the tracing_map instance created for the histogram,
which is held in the .map member. The tracing_map implements the
lock-free hash table used to implement histograms (see
kernel/trace/tracing_map.h for much more discussion about the
low-level data structures implementing the tracing_map). For the
purposes of this discussion, the tracing_map contains a number of
buckets, each bucket corresponding to a particular tracing_map_elt
object hashed by a given histogram key.
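
To make the bucket idea concrete, here is a deliberately simplified,
single-threaded toy of an open-addressed table in Python. It is a
sketch under stated assumptions, not the kernel algorithm: the real
tracing_map is lock-free, whereas this model has no concurrency at all,
uses Python's built-in hash() in place of the kernel's hash function,
and assumes the entry array is twice max_elts in size. The class name
ToyTracingMap is hypothetical.

```python
class ToyTracingMap:
    """Single-threaded toy: an array of (key_hash, elt) entries with
    linear probing. A key_hash of 0 marks an unused entry."""

    def __init__(self, max_elts):
        self.max_elts = max_elts
        self.map_size = 2 * max_elts  # assumption: table is oversized 2x
        self.entries = [(0, None)] * self.map_size
        self.n_elts = 0

    @staticmethod
    def _hash(key):
        h = hash(key) & 0xFFFFFFFF
        return h or 1  # 0 is reserved for "unused"

    def insert(self, key):
        """Return the element for key, claiming a free entry if needed.
        Returns None when no element is left (a "drop")."""
        h = self._hash(key)
        idx = h % self.map_size
        for _ in range(self.map_size):
            entry_hash, elt = self.entries[idx]
            if entry_hash == h and elt["key"] == key:
                return elt                      # existing bucket
            if entry_hash == 0:
                if self.n_elts == self.max_elts:
                    return None                 # out of elements
                elt = {"key": key, "sum": 0}
                self.entries[idx] = (h, elt)    # claim the entry
                self.n_elts += 1
                return elt
            idx = (idx + 1) % self.map_size     # probe the next entry
        return None


m = ToyTracingMap(max_elts=2)
m.insert(999)["sum"] += 1
m.insert(999)["sum"] += 1
print(m.insert(999))
```

Because at most max_elts of the map_size entries are ever claimed, a
probe in this toy always reaches either the matching entry or a free
one.
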
|
||
|
|
||
|
Below is a diagram, the first part of which describes the hist_data
and associated key and value fields for the histogram described above.
As you can see, there are two fields in the fields array, one val field
for the hitcount and one key field for the pid key.

Below that is a diagram of a run-time snapshot of what the tracing_map
might look like for a given run. It attempts to show the
relationships between the hist_data fields and the tracing_map
elements for a couple of hypothetical keys and values::
|
||
|
|
||
|
+------------------+
|
||
|
| hist_data |
|
||
|
+------------------+ +----------------+
|
||
|
| .fields[] |---->| val = hitcount |----------------------------+
|
||
|
+----------------+ +----------------+ |
|
||
|
| .map | | .size | |
|
||
|
+----------------+ +--------------+ |
|
||
|
| .offset | |
|
||
|
+--------------+ |
|
||
|
| .fn() | |
|
||
|
+--------------+ |
|
||
|
. |
|
||
|
. |
|
||
|
. |
|
||
|
+----------------+ <--- n_vals |
|
||
|
| key = pid |----------------------------|--+
|
||
|
+----------------+ | |
|
||
|
| .size | | |
|
||
|
+--------------+ | |
|
||
|
| .offset | | |
|
||
|
+--------------+ | |
|
||
|
| .fn() | | |
|
||
|
+----------------+ <--- n_fields | |
|
||
|
| unused | | |
|
||
|
+----------------+ | |
|
||
|
| | | |
|
||
|
+--------------+ | |
|
||
|
| | | |
|
||
|
+--------------+ | |
|
||
|
| | | |
|
||
|
+--------------+ | |
|
||
|
n_keys = n_fields - n_vals | |
|
||
|
|
||
|
The hist_data n_vals and n_fields delineate the extent of the fields[] | |
|
||
|
array and separate keys from values for the rest of the code. | |
|
||
|
|
||
|
Below is a run-time representation of the tracing_map part of the | |
|
||
|
histogram, with pointers from various parts of the fields[] array | |
|
||
|
to corresponding parts of the tracing_map. | |
|
||
|
|
||
|
The tracing_map consists of an array of tracing_map_entrys and a set | |
|
||
|
of preallocated tracing_map_elts (abbreviated below as map_entry and | |
|
||
|
map_elt). The total number of map_entrys in the hist_data.map array = | |
|
||
|
map->max_elts (actually map->map_size but only max_elts of those are | |
|
||
|
used. This is a property required by the map_insert() algorithm). | |
|
||
|
|
||
|
If a map_entry is unused, meaning no key has yet hashed into it, its | |
|
||
|
.key value is 0 and its .val pointer is NULL. Once a map_entry has | |
|
||
|
been claimed, the .key value contains the key's hash value and the | |
|
||
|
.val member points to a map_elt containing the full key and an entry | |
|
||
|
for each key or value in the map_elt.fields[] array. There is an | |
|
||
|
entry in the map_elt.fields[] array corresponding to each hist_field | |
|
||
|
in the histogram, and this is where the continually aggregated sums | |
|
||
|
corresponding to each histogram value are kept. | |
|
||
|
|
||
|
The diagram attempts to show the relationship between the | |
|
||
|
hist_data.fields[] and the map_elt.fields[] with the links drawn | |
|
||
|
between diagrams::
|
||
|
|
||
|
+-----------+ | |
|
||
|
| hist_data | | |
|
||
|
+-----------+ | |
|
||
|
| .fields | | |
|
||
|
+---------+ +-----------+ | |
|
||
|
| .map |---->| map_entry | | |
|
||
|
+---------+ +-----------+ | |
|
||
|
| .key |---> 0 | |
|
||
|
+---------+ | |
|
||
|
| .val |---> NULL | |
|
||
|
+-----------+ | |
|
||
|
| map_entry | | |
|
||
|
+-----------+ | |
|
||
|
| .key |---> pid = 999 | |
|
||
|
+---------+ +-----------+ | |
|
||
|
| .val |--->| map_elt | | |
|
||
|
+---------+ +-----------+ | |
|
||
|
. | .key |---> full key * | |
|
||
|
. +---------+ +---------------+ | |
|
||
|
. | .fields |--->| .sum (val) |<-+ |
|
||
|
+-----------+ +---------+ | 2345 | | |
|
||
|
| map_entry | +---------------+ | |
|
||
|
+-----------+ | .offset (key) |<----+
|
||
|
| .key |---> 0 | 0 | | |
|
||
|
+---------+ +---------------+ | |
|
||
|
| .val |---> NULL . | |
|
||
|
+-----------+ . | |
|
||
|
| map_entry | . | |
|
||
|
+-----------+ +---------------+ | |
|
||
|
| .key | | .sum (val) or | | |
|
||
|
+---------+ +---------+ | .offset (key) | | |
|
||
|
| .val |--->| map_elt | +---------------+ | |
|
||
|
+-----------+ +---------+ | .sum (val) or | | |
|
||
|
| map_entry | | .offset (key) | | |
|
||
|
+-----------+ +---------------+ | |
|
||
|
| .key |---> pid = 4444 | |
|
||
|
+---------+ +-----------+ | |
|
||
|
| .val | | map_elt | | |
|
||
|
+---------+ +-----------+ | |
|
||
|
| .key |---> full key * | |
|
||
|
+---------+ +---------------+ | |
|
||
|
| .fields |--->| .sum (val) |<-+ |
|
||
|
+---------+ | 65523 | |
|
||
|
+---------------+ |
|
||
|
| .offset (key) |<----+
|
||
|
| 0 |
|
||
|
+---------------+
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
+---------------+
|
||
|
| .sum (val) or |
|
||
|
| .offset (key) |
|
||
|
+---------------+
|
||
|
| .sum (val) or |
|
||
|
| .offset (key) |
|
||
|
+---------------+
|
||
|
|
||
|
Abbreviations used in the diagrams::

  hist_data = struct hist_trigger_data
  hist_data.fields = struct hist_field
  fn = hist_field_fn_t
  map_entry = struct tracing_map_entry
  map_elt = struct tracing_map_elt
  map_elt.fields = struct tracing_map_field

Whenever a new event occurs and it has a hist trigger associated with
it, event_hist_trigger() is called. event_hist_trigger() first deals
with the key: for each subkey in the key (in the above example, there
is just one subkey corresponding to pid), the hist_field that
represents that subkey is retrieved from hist_data.fields[] and the
hist_field_fn_t fn() associated with that field, along with the
field's size and offset, is used to grab that subkey's data from the
current trace record.

Once the complete key has been retrieved, it's used to look that key
up in the tracing_map. If there's no tracing_map_elt associated with
that key, an empty one is claimed and inserted in the map for the new
key. In either case, the tracing_map_elt associated with that key is
returned.

Once a tracing_map_elt is available, hist_trigger_elt_update() is
called. As the name implies, this updates the element, which basically
means updating the element's fields. There's a tracing_map_field
associated with each key and value in the histogram, and each of these
corresponds to the key and value hist_fields created when the
histogram was created. hist_trigger_elt_update() goes through each
value hist_field and, as for the keys, uses the hist_field's fn() and
size and offset to grab the field's value from the current trace
record. Once it has that value, it simply adds that value to that
field's continually-updated tracing_map_field.sum member. Some
hist_field fn()s, such as for the hitcount, don't actually grab
anything from the trace record (the hitcount fn() just increments the
counter sum by 1), but the idea is the same.

Once all the values have been updated, hist_trigger_elt_update() is
done and returns. Note that there are also tracing_map_fields for
each subkey in the key, but hist_trigger_elt_update() doesn't look at
them or update anything - those exist only for sorting, which can
happen later.
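
The three steps above (build the key, find or claim the element, add
each value into its running sum) can be sketched in userspace Python.
This is a toy model only: the records are hypothetical dictionaries, a
plain dict stands in for the tracing_map, and the Python function
below merely borrows the name event_hist_trigger() for readability.

```python
def field(name):
    """Return a toy fn() that grabs one named field from a record."""
    return lambda record: record[name]


def hitcount(record):
    """The hitcount fn() ignores the record and contributes 1."""
    return 1


hist = {
    "keys": [field("common_pid"), field("call_site")],
    "vals": [hitcount, field("bytes_req"), field("bytes_alloc")],
    "map": {},  # stand-in for the tracing_map
}


def event_hist_trigger(hist, record):
    # 1. Build the compound key from each subkey's fn().
    key = tuple(fn(record) for fn in hist["keys"])
    # 2. Look the key up, claiming a zeroed element if it is new.
    elt = hist["map"].setdefault(key, [0] * len(hist["vals"]))
    # 3. Add each value into its running sum.
    for i, fn in enumerate(hist["vals"]):
        elt[i] += fn(record)
    return elt


# Hypothetical kmalloc records.
records = [
    {"common_pid": 42, "call_site": 0x1000, "bytes_req": 100, "bytes_alloc": 128},
    {"common_pid": 42, "call_site": 0x1000, "bytes_req": 100, "bytes_alloc": 128},
    {"common_pid": 7, "call_site": 0x2000, "bytes_req": 24, "bytes_alloc": 32},
]
for r in records:
    event_hist_trigger(hist, r)

print(hist["map"])
```

Note that the sketch keeps sums only for values; as in the text, the
key fields are not updated on a hit.
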
|
||
|
|
||
|
Basic histogram test
--------------------

This is a good example to try. It produces 3 value fields and 2 key
fields in the output::

  # echo 'hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger

To see the debug data, cat the kmem/kmalloc's 'hist_debug' file. It
will show the trigger info of the histogram it corresponds to, along
with the address of the hist_data associated with the histogram, which
will become useful in later examples. It then displays the number of
total hist_fields associated with the histogram along with a count of
how many of those correspond to keys and how many correspond to values.

It then goes on to display details for each field, including the
field's flags and the position of each field in the hist_data's
fields[] array, which is useful information for verifying that things
internally appear correct, and which again will become even more
useful in further examples::

  # cat events/kmem/kmalloc/hist_debug

  # event histogram
  #
  # trigger info: hist:keys=common_pid,call_site.sym:vals=hitcount,bytes_req,bytes_alloc:sort=hitcount:size=2048 [active]
  #

  hist_data: 000000005e48c9a5

  n_vals: 3
  n_keys: 2
  n_fields: 5

  val fields:

    hist_data->fields[0]:
      flags:
        VAL: HIST_FIELD_FL_HITCOUNT
      type: u64
      size: 8
      is_signed: 0

    hist_data->fields[1]:
      flags:
        VAL: normal u64 value
      ftrace_event_field name: bytes_req
      type: size_t
      size: 8
      is_signed: 0

    hist_data->fields[2]:
      flags:
        VAL: normal u64 value
      ftrace_event_field name: bytes_alloc
      type: size_t
      size: 8
      is_signed: 0

  key fields:

    hist_data->fields[3]:
      flags:
        HIST_FIELD_FL_KEY
      ftrace_event_field name: common_pid
      type: int
      size: 8
      is_signed: 1

    hist_data->fields[4]:
      flags:
        HIST_FIELD_FL_KEY
      ftrace_event_field name: call_site
      type: unsigned long
      size: 8
      is_signed: 0
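
The ordering seen in the output above (hitcount first, then the other
values, then the keys) can be reproduced with a small toy parser. This
is a sketch that only mimics the layout shown in this example; it is
not how the kernel parses triggers, it handles only the simple
keys=/values= form used here, and the function name layout_fields() is
made up.

```python
def layout_fields(trigger):
    """Toy model of the fields[] layout: vals first (hitcount always at
    index 0), then keys. Modifiers such as '.sym' are dropped."""
    parts = dict(p.split("=", 1) for p in trigger.split(":")[1:])
    raw_vals = parts.get("values", parts.get("vals", ""))
    vals = [v for v in raw_vals.split(",") if v and v != "hitcount"]
    keys = [k.split(".")[0] for k in parts["keys"].split(",")]
    fields = ["hitcount"] + vals + keys
    n_vals = 1 + len(vals)
    n_keys = len(keys)
    return fields, n_vals, n_keys


trigger = ("hist:keys=common_pid,call_site.sym"
           ":values=bytes_req,bytes_alloc,hitcount")
fields, n_vals, n_keys = layout_fields(trigger)

for i, name in enumerate(fields):
    kind = "val" if i < n_vals else "key"
    print(f"fields[{i}]: {kind} {name}")
print(f"n_vals: {n_vals}  n_keys: {n_keys}  n_fields: {len(fields)}")
```

With this layout, everything below index n_vals is a value and
everything from n_vals up to n_fields is a key, which is the
n_keys = n_fields - n_vals relationship noted in the diagrams.
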
|
||
|
|
||
|
The commands below can be used to clean things up for the next test::

  # echo '!hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger

Variables
=========

Variables allow data to be saved by one hist trigger and retrieved by
another hist trigger. For example, a trigger on the sched_waking event
can capture a timestamp for a particular pid, and later a sched_switch
event that switches to that pid can grab the timestamp and use it to
calculate a time delta between the two events::

  # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>
          events/sched/sched_waking/trigger

  # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >>
          events/sched/sched_switch/trigger

In terms of the histogram data structures, variables are implemented
as another type of hist_field and for a given hist trigger are added
to the hist_data.fields[] array just after all the val fields. To
distinguish them from the existing key and val fields, they're given a
new flag type, HIST_FIELD_FL_VAR (abbreviated FL_VAR) and they also
make use of a new .var.idx field member in struct hist_field, which
maps them to an index in a new map_elt.vars[] array added to the
map_elt specifically designed to store and retrieve variable values.
The diagram below shows those new elements and adds a new variable
entry, ts0, corresponding to the ts0 variable in the sched_waking
trigger above.
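
The role of .var.idx can be sketched with a toy map whose elements
carry a vars[] list next to their sums. This is an illustrative model
only: the class ToyMap, its add_var() and elt() methods and the
on_sched_waking() helper are hypothetical names, and the timestamps
are simply the example values used in the diagrams.

```python
class ToyMap:
    """Toy map whose elements hold per-key sums and per-key variables."""

    def __init__(self):
        self.n_vars = 0
        self.elts = {}

    def add_var(self):
        """Hand out the next free slot in each element's vars[] list."""
        idx = self.n_vars
        self.n_vars += 1
        return idx

    def elt(self, key):
        return self.elts.setdefault(
            key, {"hitcount": 0, "vars": [None] * self.n_vars})


waking_map = ToyMap()
ts0_idx = waking_map.add_var()  # plays the role of the ts0 field's .var.idx


def on_sched_waking(pid, timestamp_usecs):
    elt = waking_map.elt(pid)
    elt["hitcount"] += 1
    elt["vars"][ts0_idx] = timestamp_usecs  # ts0=common_timestamp.usecs


on_sched_waking(999, 113345679876)
on_sched_waking(4444, 213499240729)
print(waking_map.elts)
```

Each key gets its own copy of the variable, so the same index selects
a different timestamp depending on which element is being looked at.
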
|
||
|
|
||
|
sched_waking histogram
|
||
|
----------------------::
|
||
|
|
||
|
+------------------+
|
||
|
| hist_data |<-------------------------------------------------------+
|
||
|
+------------------+ +-------------------+ |
|
||
|
| .fields[] |-->| val = hitcount | |
|
||
|
+----------------+ +-------------------+ |
|
||
|
| .map | | .size | |
|
||
|
+----------------+ +-----------------+ |
|
||
|
| .offset | |
|
||
|
+-----------------+ |
|
||
|
| .fn() | |
|
||
|
+-----------------+ |
|
||
|
| .flags | |
|
||
|
+-----------------+ |
|
||
|
| .var.idx | |
|
||
|
+-------------------+ |
|
||
|
| var = ts0 | |
|
||
|
+-------------------+ |
|
||
|
| .size | |
|
||
|
+-----------------+ |
|
||
|
| .offset | |
|
||
|
+-----------------+ |
|
||
|
| .fn() | |
|
||
|
+-----------------+ |
|
||
|
| .flags & FL_VAR | |
|
||
|
+-----------------+ |
|
||
|
| .var.idx |----------------------------+-+ |
|
||
|
+-----------------+ | | |
|
||
|
. | | |
|
||
|
. | | |
|
||
|
. | | |
|
||
|
+-------------------+ <--- n_vals | | |
|
||
|
| key = pid | | | |
|
||
|
+-------------------+ | | |
|
||
|
| .size | | | |
|
||
|
+-----------------+ | | |
|
||
|
| .offset | | | |
|
||
|
+-----------------+ | | |
|
||
|
| .fn() | | | |
|
||
|
+-----------------+ | | |
|
||
|
| .flags & FL_KEY | | | |
|
||
|
+-----------------+ | | |
|
||
|
| .var.idx | | | |
|
||
|
+-------------------+ <--- n_fields | | |
|
||
|
| unused | | | |
|
||
|
+-------------------+ | | |
|
||
|
| | | | |
|
||
|
+-----------------+ | | |
|
||
|
| | | | |
|
||
|
+-----------------+ | | |
|
||
|
| | | | |
|
||
|
+-----------------+ | | |
|
||
|
| | | | |
|
||
|
+-----------------+ | | |
|
||
|
| | | | |
|
||
|
+-----------------+ | | |
|
||
|
n_keys = n_fields - n_vals | | |
|
||
|
| | |
|
||
|
|
||
|
This is very similar to the basic case. In the above diagram, we can | | |
|
||
|
see a new .flags member has been added to the struct hist_field | | |
|
||
|
struct, and a new entry added to hist_data.fields representing the ts0 | | |
|
||
|
variable. For a normal val hist_field, .flags is just 0 (modulo | | |
|
||
|
modifier flags), but if the value is defined as a variable, the .flags | | |
|
||
|
contains a set FL_VAR bit. | | |
|
||
|
|
||
|
As you can see, the ts0 entry's .var.idx member contains the index | | |
|
||
|
into the tracing_map_elts' .vars[] array containing variable values. | | |
|
||
|
This idx is used whenever the value of the variable is set or read. | | |
|
||
|
The map_elt.vars idx assigned to the given variable is assigned and | | |
|
||
|
saved in .var.idx by create_tracing_map_fields() after it calls | | |
|
||
|
tracing_map_add_var(). | | |
|
||
|
|
||
|
Below is a representation of the histogram at run-time, which | | |
|
||
|
populates the map, along with correspondence to the above hist_data and | | |
|
||
|
hist_field data structures. | | |
|
||
|
|
||
|
The diagram attempts to show the relationship between the | | |
|
||
|
hist_data.fields[] and the map_elt.fields[] and map_elt.vars[] with | | |
|
||
|
the links drawn between diagrams. For each of the map_elts, you can | | |
|
||
|
see that the .fields[] members point to the .sum or .offset of a key | | |
|
||
|
or val and the .vars[] members point to the value of a variable. The | | |
|
||
|
arrows between the two diagrams show the linkages between those | | |
|
||
|
tracing_map members and the field definitions in the corresponding | | |
|
||
|
hist_data fields[] members.::
|
||
|
|
||
|
+-----------+ | | |
|
||
|
| hist_data | | | |
|
||
|
+-----------+ | | |
|
||
|
| .fields | | | |
|
||
|
+---------+ +-----------+ | | |
|
||
|
| .map |---->| map_entry | | | |
|
||
|
+---------+ +-----------+ | | |
|
||
|
| .key |---> 0 | | |
|
||
|
+---------+ | | |
|
||
|
| .val |---> NULL | | |
|
||
|
+-----------+ | | |
|
||
|
| map_entry | | | |
|
||
|
+-----------+ | | |
|
||
|
| .key |---> pid = 999 | | |
|
||
|
+---------+ +-----------+ | | |
|
||
|
| .val |--->| map_elt | | | |
|
||
|
+---------+ +-----------+ | | |
|
||
|
. | .key |---> full key * | | |
|
||
|
. +---------+ +---------------+ | | |
|
||
|
. | .fields |--->| .sum (val) | | | |
|
||
|
. +---------+ | 2345 | | | |
|
||
|
. +--| .vars | +---------------+ | | |
|
||
|
. | +---------+ | .offset (key) | | | |
|
||
|
. | | 0 | | | |
|
||
|
. | +---------------+ | | |
|
||
|
. | . | | |
|
||
|
. | . | | |
|
||
|
. | . | | |
|
||
|
. | +---------------+ | | |
|
||
|
. | | .sum (val) or | | | |
|
||
|
. | | .offset (key) | | | |
|
||
|
. | +---------------+ | | |
|
||
|
. | | .sum (val) or | | | |
|
||
|
. | | .offset (key) | | | |
|
||
|
. | +---------------+ | | |
|
||
|
. | | | |
|
||
|
. +---------------->+---------------+ | | |
|
||
|
. | ts0 |<--+ | |
|
||
|
. | 113345679876 | | | |
|
||
|
. +---------------+ | | |
|
||
|
. | unused | | | |
|
||
|
. | | | | |
|
||
|
. +---------------+ | | |
|
||
|
. . | | |
|
||
|
. . | | |
|
||
|
. . | | |
|
||
|
. +---------------+ | | |
|
||
|
. | unused | | | |
|
||
|
. | | | | |
|
||
|
. +---------------+ | | |
|
||
|
. | unused | | | |
|
||
|
. | | | | |
|
||
|
. +---------------+ | | |
|
||
|
. | | |
|
||
|
+-----------+ | | |
|
||
|
| map_entry | | | |
|
||
|
+-----------+ | | |
|
||
|
| .key |---> pid = 4444 | | |
|
||
|
+---------+ +-----------+ | | |
|
||
|
| .val |--->| map_elt | | | |
|
||
|
+---------+ +-----------+ | | |
|
||
|
. | .key |---> full key * | | |
|
||
|
. +---------+ +---------------+ | | |
|
||
|
. | .fields |--->| .sum (val) | | | |
|
||
|
+---------+ | 2345 | | | |
|
||
|
+--| .vars | +---------------+ | | |
|
||
|
| +---------+ | .offset (key) | | | |
|
||
|
| | 0 | | | |
|
||
|
| +---------------+ | | |
|
||
|
| . | | |
|
||
|
| . | | |
|
||
|
| . | | |
|
||
|
| +---------------+ | | |
|
||
|
| | .sum (val) or | | | |
|
||
|
| | .offset (key) | | | |
|
||
|
| +---------------+ | | |
|
||
|
| | .sum (val) or | | | |
|
||
|
| | .offset (key) | | | |
|
||
|
| +---------------+ | | |
|
||
|
| | | |
|
||
|
| +---------------+ | | |
|
||
|
+---------------->| ts0 |<--+ | |
|
||
|
| 213499240729 | | |
|
||
|
+---------------+ | |
|
||
|
| unused | | |
|
||
|
| | | |
|
||
|
+---------------+ | |
|
||
|
. | |
|
||
|
. | |
|
||
|
. | |
|
||
|
+---------------+ | |
|
||
|
| unused | | |
|
||
|
| | | |
|
||
|
+---------------+ | |
|
||
|
| unused | | |
|
||
|
| | | |
|
||
|
+---------------+ | |
|
||
|
|
||
|
For each used map entry, there's a map_elt pointing to an array of | |
|
||
|
.vars containing the current value of the variables associated with | |
|
||
|
that histogram entry. So in the above, the timestamp associated with | |
|
||
|
pid 999 is 113345679876, and the timestamp variable in the same | |
|
||
|
.var.idx for pid 4444 is 213499240729. | |
|
||
|
|
||
|
sched_switch histogram | |
|
||
|
---------------------- | |
|
||
|
|
||
|
The sched_switch histogram paired with the above sched_waking | |
|
||
|
histogram is shown below. The most important aspect of the | |
|
||
|
sched_switch histogram is that it references a variable on the | |
|
||
|
sched_waking histogram above. | |
|
||
|
|
||
|
The histogram diagram is very similar to the others so far displayed, | |
|
||
|
but it adds variable references. You can see the normal hitcount and | |
|
||
|
key fields along with a new wakeup_lat variable implemented in the | |
|
||
|
same way as the sched_waking ts0 variable, but in addition there's an | |
|
||
|
entry with the new FL_VAR_REF (short for HIST_FIELD_FL_VAR_REF) flag. | |
|
||
|
|
||
|
Associated with the new var ref field are a couple of new hist_field | |
|
||
|
members, var.hist_data and var_ref_idx. For a variable reference, the | |
|
||
|
var.hist_data goes with the var.idx, which together uniquely identify | |
|
||
|
a particular variable on a particular histogram. The var_ref_idx is | |
|
||
|
just the index into the var_ref_vals[] array that caches the values of | |
|
||
|
each variable whenever a hist trigger is updated. Those resulting | |
|
||
|
values are then finally accessed by other code such as trace action | |
|
||
|
code that uses the var_ref_idx values to assign param values. | |
|
||
|
|
||
|
The diagram below describes the situation for the sched_switch | |
|
||
|
histogram referred to before::
|
||
|
|
||
|
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> | |
|
||
|
events/sched/sched_switch/trigger | |
|
||
|
| |
|
||
|
+------------------+ | |
|
||
|
| hist_data | | |
|
||
|
+------------------+ +-----------------------+ | |
|
||
|
| .fields[] |-->| val = hitcount | | |
|
||
|
+----------------+ +-----------------------+ | |
|
||
|
| .map | | .size | | |
|
||
|
+----------------+ +---------------------+ | |
|
||
|
+--| .var_refs[] | | .offset | | |
|
||
|
| +----------------+ +---------------------+ | |
|
||
|
| | .fn() | | |
|
||
|
| var_ref_vals[] +---------------------+ | |
|
||
|
| +-------------+ | .flags | | |
|
||
|
| | $ts0 |<---+ +---------------------+ | |
|
||
|
| +-------------+ | | .var.idx | | |
|
||
|
| | | | +---------------------+ | |
|
||
|
| +-------------+ | | .var.hist_data | | |
|
||
|
| | | | +---------------------+ | |
|
||
|
| +-------------+ | | .var_ref_idx | | |
|
||
|
| | | | +-----------------------+ | |
|
||
|
| +-------------+ | | var = wakeup_lat | | |
|
||
|
| . | +-----------------------+ | |
|
||
|
| . | | .size | | |
|
||
|
| . | +---------------------+ | |
|
||
|
| +-------------+ | | .offset | | |
|
||
|
| | | | +---------------------+ | |
|
||
|
| +-------------+ | | .fn() | | |
|
||
|
| | | | +---------------------+ | |
|
||
|
| +-------------+ | | .flags & FL_VAR | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | .var.idx | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | .var.hist_data | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | .var_ref_idx | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | . | |
|
||
|
| | . | |
|
||
|
| | . | |
|
||
|
| | +-----------------------+ <--- n_vals | |
|
||
|
| | | key = pid | | |
|
||
|
| | +-----------------------+ | |
|
||
|
| | | .size | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | .offset | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | .fn() | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | .flags | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | .var.idx | | |
|
||
|
| | +-----------------------+ <--- n_fields | |
|
||
|
| | | unused | | |
|
||
|
| | +-----------------------+ | |
|
||
|
| | | | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | | | | |
|
||
|
| | +---------------------+ | |
|
||
|
| | n_keys = n_fields - n_vals | |
|
||
|
| | | |
|
||
|
| | | |
|
||
|
| | +-----------------------+ | |
|
||
|
+---------------------->| var_ref = $ts0 | | |
|
||
|
| +-----------------------+ | |
|
||
|
| | .size | | |
|
||
|
| +---------------------+ | |
|
||
|
| | .offset | | |
|
||
|
| +---------------------+ | |
|
||
|
| | .fn() | | |
|
||
|
| +---------------------+ | |
|
||
|
| | .flags & FL_VAR_REF | | |
|
||
|
| +---------------------+ | |
|
||
|
| | .var.idx |--------------------------+ |
|
||
|
| +---------------------+ |
|
||
|
| | .var.hist_data |----------------------------+
|
||
|
| +---------------------+
|
||
|
+---| .var_ref_idx |
|
||
|
+---------------------+
|
||
|
|
||
|
Abbreviations used in the diagrams::

  hist_data = struct hist_trigger_data
  hist_data.fields = struct hist_field
  fn = hist_field_fn_t
  FL_KEY = HIST_FIELD_FL_KEY
  FL_VAR = HIST_FIELD_FL_VAR
  FL_VAR_REF = HIST_FIELD_FL_VAR_REF

When a hist trigger makes use of a variable, a new hist_field is
created with flag HIST_FIELD_FL_VAR_REF. For a VAR_REF field, the
var.idx and var.hist_data take the same values as the referenced
variable, as well as the referenced variable's size, type, and
is_signed values. The VAR_REF field's .name is set to the name of the
variable it references. If a variable reference was created using the
explicit system.event.$var_ref notation, the hist_field's system and
event_name members are also set.

So, in order to handle an event for the sched_switch histogram,
because we have a reference to a variable on another histogram, we
need to resolve all variable references first. This is done via the
resolve_var_refs() calls made from event_hist_trigger(). This grabs
the var_refs[] array from the hist_data representing the sched_switch
histogram. For each one of those, the referenced variable's
var.hist_data along with the current key is used to look up the
corresponding tracing_map_elt in that histogram. Once found, the
referenced variable's var.idx is used to look up the variable's value
using tracing_map_read_var(elt, var.idx), which yields the value of
the variable for that element, ts0 in the case above. Note that the
hist_fields representing the variable and the variable reference have
the same var.idx, so this is straightforward.
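
The lookup described above can be sketched as follows. This is a toy
model under stated assumptions, not the kernel code: histograms are
plain dictionaries, the Python resolve_var_refs() only borrows the
kernel function's name, and returning None when a referenced variable
has no value for the key is an assumption made for the example.

```python
def resolve_var_refs(var_refs, key):
    """For each reference, find the element for `key` in the referenced
    histogram and read the variable at var_idx. The results are cached
    at each reference's var_ref_idx."""
    var_ref_vals = [None] * len(var_refs)
    for ref in var_refs:
        elt = ref["hist_data"]["map"].get(key)
        if elt is None or elt["vars"][ref["var_idx"]] is None:
            return None  # assumption: unresolved reference, give up
        var_ref_vals[ref["var_ref_idx"]] = elt["vars"][ref["var_idx"]]
    return var_ref_vals


# sched_waking histogram: pid 999 has ts0 saved in vars[0].
waking = {"map": {999: {"vars": [113345679876]}}}

# The $ts0 reference shares var_idx and hist_data with the variable.
ts0_ref = {"name": "ts0", "hist_data": waking,
           "var_idx": 0, "var_ref_idx": 0}
switch = {"var_refs": [ts0_ref], "map": {}}


def sched_switch_trigger(hist, next_pid, now_usecs):
    vals = resolve_var_refs(hist["var_refs"], next_pid)
    if vals is None:
        return None
    # wakeup_lat=common_timestamp.usecs-$ts0
    wakeup_lat = now_usecs - vals[ts0_ref["var_ref_idx"]]
    elt = hist["map"].setdefault(next_pid, {"hitcount": 0, "vars": [None]})
    elt["hitcount"] += 1
    elt["vars"][0] = wakeup_lat
    return wakeup_lat


print(sched_switch_trigger(switch, 999, 113345679876 + 25))
```

The key used for the lookup is the sched_switch key (next_pid), which
is what ties a wakeup of a given pid to the later switch to that pid.
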
|
||
|
|
||
|
Variable and variable reference test
|
||
|
------------------------------------
|
||
|
|
||
|
This example creates a variable on the sched_waking event, ts0, and
|
||
|
uses it in the sched_switch trigger. The sched_switch trigger also
|
||
|
creates its own variable, wakeup_lat, but nothing yet uses it::
|
||
|
|
||
|
# echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
|
||
|
|
||
|
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger
|
||
|
|
||
|
Looking at the sched_waking 'hist_debug' output, in addition to the
|
||
|
normal key and value hist_fields, in the val fields section we see a
|
||
|
field with the HIST_FIELD_FL_VAR flag, which indicates that that field
|
||
|
represents a variable. Note that in addition to the variable name,
|
||
|
contained in the var.name field, it includes the var.idx, which is the
|
||
|
index into the tracing_map_elt.vars[] array of the actual variable
|
||
|
location. Note also that the output shows that variables live in the
|
||
|
same part of the hist_data->fields[] array as normal values::
|
||
|
|
||
|
# cat events/sched/sched_waking/hist_debug
|
||
|
|
||
|
# event histogram
|
||
|
#
|
||
|
# trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
|
||
|
#
|
||
|
|
||
|
hist_data: 000000009536f554
|
||
|
|
||
|
n_vals: 2
|
||
|
n_keys: 1
|
||
|
n_fields: 3
|
||
|
|
||
|
val fields:
|
||
|
|
||
|
hist_data->fields[0]:
|
||
|
flags:
|
||
|
VAL: HIST_FIELD_FL_HITCOUNT
|
||
|
type: u64
|
||
|
size: 8
|
||
|
is_signed: 0
|
||
|
|
||
|
hist_data->fields[1]:
|
||
|
flags:
|
||
|
HIST_FIELD_FL_VAR
|
||
|
var.name: ts0
|
||
|
var.idx (into tracing_map_elt.vars[]): 0
|
||
|
type: u64
|
||
|
size: 8
|
||
|
is_signed: 0
|
||
|
|
||
|
key fields:
|
||
|
|
||
|
hist_data->fields[2]:
|
||
|
flags:
|
||
|
HIST_FIELD_FL_KEY
|
||
|
ftrace_event_field name: pid
|
||
|
type: pid_t
|
||
|
size: 8
|
||
|
is_signed: 1
|
||
|
|
||
|
Moving on to the sched_switch trigger hist_debug output, in addition
to the unused wakeup_lat variable, we see a new section displaying
variable references. Variable references are displayed in a separate
section because in addition to being logically separate from
variables and values, they actually live in a separate hist_data
array, var_refs[].

In this example, the sched_switch trigger has a reference to a
variable on the sched_waking trigger, $ts0. Looking at the details,
we can see that the var.hist_data value of the referenced variable
matches the previously displayed sched_waking trigger, and the var.idx
value matches the previously displayed var.idx value for that
variable. Also displayed is the var_ref_idx value for that variable
reference, which is where the value for that variable is cached for
use when the trigger is invoked::

    # cat events/sched/sched_switch/hist_debug

    # event histogram
    #
    # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global [active]
    #

    hist_data: 00000000f4ee8006

    n_vals: 2
    n_keys: 1
    n_fields: 3

    val fields:

      hist_data->fields[0]:
        flags:
          VAL: HIST_FIELD_FL_HITCOUNT
        type: u64
        size: 8
        is_signed: 0

      hist_data->fields[1]:
        flags:
          HIST_FIELD_FL_VAR
        var.name: wakeup_lat
        var.idx (into tracing_map_elt.vars[]): 0
        type: u64
        size: 0
        is_signed: 0

    key fields:

      hist_data->fields[2]:
        flags:
          HIST_FIELD_FL_KEY
        ftrace_event_field name: next_pid
        type: pid_t
        size: 8
        is_signed: 1

    variable reference fields:

      hist_data->var_refs[0]:
        flags:
          HIST_FIELD_FL_VAR_REF
        name: ts0
        var.idx (into tracing_map_elt.vars[]): 0
        var.hist_data: 000000009536f554
        var_ref_idx (into hist_data->var_refs[]): 0
        type: u64
        size: 8
        is_signed: 0

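The way a variable reference is matched up with its variable can be
modeled in a few lines. The sketch below is a simplified userspace
model in Python, not the kernel code: HistData, VarRef and this
resolve_var_refs() are hypothetical stand-ins that only mirror the
roles of hist_data, the var_refs[] entries and the cached values. It
assumes that a reference identifies its variable by the
(var.hist_data, var.idx) pair, that the same key locates the element
on the referenced histogram, and that resolution fails when the
variable has not been set:

```python
from dataclasses import dataclass, field


@dataclass
class HistData:
    """Stand-in for one hist trigger: maps a key to that element's vars[] list."""
    name: str
    elts: dict = field(default_factory=dict)  # key -> list of variable values


@dataclass
class VarRef:
    """Stand-in for a HIST_FIELD_FL_VAR_REF hist_field."""
    name: str
    hist_data: HistData  # var.hist_data: the histogram owning the variable
    var_idx: int         # var.idx: slot in that histogram's elt vars[]
    var_ref_idx: int     # slot where the resolved value is cached


def resolve_var_refs(var_refs, key):
    """Build the cached value list for one event; None if a variable is unset."""
    var_ref_vals = [None] * len(var_refs)
    for ref in var_refs:
        elt_vars = ref.hist_data.elts.get(key)
        if elt_vars is None or elt_vars[ref.var_idx] is None:
            return None  # no matching entry, so there is nothing to use
        var_ref_vals[ref.var_ref_idx] = elt_vars[ref.var_idx]
    return var_ref_vals


# Hypothetical data: sched_waking saved ts0 (var.idx 0) for pid 1234.
waking = HistData("sched_waking", {1234: [5000]})
refs = [VarRef("ts0", waking, var_idx=0, var_ref_idx=0)]

print(resolve_var_refs(refs, 1234))
print(resolve_var_refs(refs, 4321))
```

The first call finds the saved timestamp for pid 1234; the second
finds no entry for pid 4321 and so resolves nothing.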
The commands below can be used to clean things up for the next test::

    # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger

    # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

Actions and Handlers
====================

Adding onto the previous example, we will now do something with that
wakeup_lat variable, namely send it and another field as a synthetic
event.

The onmatch() handler below basically says that whenever we have a
sched_switch event, if we have a matching sched_waking event, in this
case if we have a pid in the sched_waking histogram that matches the
next_pid field on this sched_switch event, we retrieve the
variables specified in the wakeup_latency() trace action, and use
them to generate a new wakeup_latency event into the trace stream.

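The matching behavior described above can be illustrated with a small
model. This is a Python sketch under stated assumptions, not the
kernel implementation: the two functions and the waking_vars and
trace_stream containers are hypothetical names, timestamps are plain
integers in microseconds, and the model consumes the saved timestamp
once it has been used:

```python
waking_vars = {}    # models the sched_waking histogram: pid -> ts0 (usecs)
trace_stream = []   # stand-in for the trace buffer


def on_sched_waking(pid, timestamp_usecs):
    """Models hist:keys=pid:ts0=common_timestamp.usecs"""
    waking_vars[pid] = timestamp_usecs


def on_sched_switch(next_pid, timestamp_usecs):
    """Models keys=next_pid with wakeup_lat=common_timestamp.usecs-$ts0."""
    ts0 = waking_vars.pop(next_pid, None)
    if ts0 is None:
        return  # no matching sched_waking entry: no event is generated
    wakeup_lat = timestamp_usecs - ts0
    # Models the wakeup_latency($wakeup_lat,next_pid) trace action.
    trace_stream.append(("wakeup_latency", wakeup_lat, next_pid))


on_sched_waking(1234, 1000)
on_sched_switch(1234, 1250)   # matches pid 1234, latency 250 usecs
on_sched_switch(99, 1300)     # no sched_waking entry for pid 99
print(trace_stream)
```

Only the switch to pid 1234 produces a wakeup_latency entry, because
only that next_pid has a matching pid in the sched_waking model.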
Note that, given the way trace actions such as wakeup_latency() (which
could equivalently be written trace(wakeup_latency,$wakeup_lat,next_pid))
are implemented, the parameters specified to the trace action must be
variables. In this case, $wakeup_lat is obviously a variable, but
next_pid isn't, since it's just naming a field in the sched_switch
trace event. Since this is something that almost every trace() and
save() action does, a special shortcut is implemented to allow field
names to be used directly in those cases. How it works is that under
the covers, a temporary variable is created for the named field, and
this variable is what is actually passed to the trace action. In the
code and documentation, this type of variable is called a 'field
variable'.

Fields on other trace events' histograms can be used as well. In that
case we have to generate a new histogram and an unfortunately named
'synthetic_field' (the use of synthetic here has nothing to do with
synthetic events) and use that special histogram field as a variable.

The diagram below illustrates the new elements described above in the
context of the sched_switch histogram using the onmatch() handler and
the trace() action.

First, we define the wakeup_latency synthetic event::

    # echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events

Next, the sched_waking hist trigger as before::

    # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

Finally, we create a hist trigger on the sched_switch event that
generates a wakeup_latency() trace event. In this case we pass
next_pid into the wakeup_latency synthetic event invocation, which
means it will be automatically converted into a field variable::

    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger

The diagram for the sched_switch event is similar to previous examples
but shows the additional field_vars[] array for hist_data and shows
the linkages between the field_vars and the variables and references
created to implement the field variables. The details are discussed
below::

      +------------------+
      | hist_data        |
      +------------------+   +-----------------------+
        | .fields[]      |-->| val = hitcount        |
        +----------------+   +-----------------------+
        | .map           |     | .size               |
        +----------------+     +---------------------+
    +---| .field_vars[]  |     | .offset             |
    |   +----------------+     +---------------------+
    |+--| .var_refs[]    |     | .offset             |
    ||  +----------------+     +---------------------+
    ||                         | .fn()               |
    ||  var_ref_vals[]         +---------------------+
    ||  +-------------+        | .flags              |
    ||  | $ts0        |<---+   +---------------------+
    ||  +-------------+    |   | .var.idx            |
    ||  | $next_pid   |<-+ |   +---------------------+
    ||  +-------------+  | |   | .var.hist_data      |
    ||+>| $wakeup_lat |  | |   +---------------------+
    ||| +-------------+  | |   | .var_ref_idx        |
    ||| |             |  | | +-----------------------+
    ||| +-------------+  | | | var = wakeup_lat      |
    |||        .         | | +-----------------------+
    |||        .         | |   | .size               |
    |||        .         | |   +---------------------+
    ||| +-------------+  | |   | .offset             |
    ||| |             |  | |   +---------------------+
    ||| +-------------+  | |   | .fn()               |
    ||| |             |  | |   +---------------------+
    ||| +-------------+  | |   | .flags & FL_VAR     |
    |||                  | |   +---------------------+
    |||                  | |   | .var.idx            |
    |||                  | |   +---------------------+
    |||                  | |   | .var.hist_data      |
    |||                  | |   +---------------------+
    |||                  | |   | .var_ref_idx        |
    |||                  | |   +---------------------+
    |||                  | |              .
    |||                  | |              .
    |||                  | |              .
    |||                  | |              .
    ||| +--------------+ | |              .
    +-->| field_var    | | |              .
     || +--------------+ | |              .
     ||   | var        | | |              .
     ||   +------------+ | |              .
     ||   | val        | | |              .
     || +--------------+ | |              .
     || | field_var    | | |              .
     || +--------------+ | |              .
     ||   | var        | | |              .
     ||   +------------+ | |              .
     ||   | val        | | |              .
     ||   +------------+ | |              .
     ||        .         | |              .
     ||        .         | |              .
     ||        .         | | +-----------------------+ <--- n_vals
     || +--------------+ | | | key = pid             |
     || | field_var    | | | +-----------------------+
     || +--------------+ | |   | .size               |
     ||   | var        |--+|   +---------------------+
     ||   +------------+ |||   | .offset             |
     ||   | val        |-+||   +---------------------+
     ||   +------------+ |||   | .fn()               |
     ||                  |||   +---------------------+
     ||                  |||   | .flags              |
     ||                  |||   +---------------------+
     ||                  |||   | .var.idx            |
     ||                  |||   +---------------------+ <--- n_fields
     ||                  |||
     ||                  |||   n_keys = n_fields - n_vals
     ||                  ||| +-----------------------+
     ||                  |+->| var = next_pid        |
     ||                  | | +-----------------------+
     ||                  | |   | .size               |
     ||                  | |   +---------------------+
     ||                  | |   | .offset             |
     ||                  | |   +---------------------+
     ||                  | |   | .flags & FL_VAR     |
     ||                  | |   +---------------------+
     ||                  | |   | .var.idx            |
     ||                  | |   +---------------------+
     ||                  | |   | .var.hist_data      |
     ||                  | | +-----------------------+
     ||                  +-->| val for next_pid      |
     ||                  | | +-----------------------+
     ||                  | |   | .size               |
     ||                  | |   +---------------------+
     ||                  | |   | .offset             |
     ||                  | |   +---------------------+
     ||                  | |   | .fn()               |
     ||                  | |   +---------------------+
     ||                  | |   | .flags              |
     ||                  | |   +---------------------+
     ||                  | |   |                     |
     ||                  | |   +---------------------+
     ||                  | |
     ||                  | |
     ||                  | | +-----------------------+
     +|------------------|-|>| var_ref = $ts0        |
      |                  | | +-----------------------+
      |                  | |   | .size               |
      |                  | |   +---------------------+
      |                  | |   | .offset             |
      |                  | |   +---------------------+
      |                  | |   | .fn()               |
      |                  | |   +---------------------+
      |                  | |   | .flags & FL_VAR_REF |
      |                  | |   +---------------------+
      |                  | +---| .var_ref_idx        |
      |                  |   +-----------------------+
      |                  |   | var_ref = $next_pid   |
      |                  |   +-----------------------+
      |                  |     | .size               |
      |                  |     +---------------------+
      |                  |     | .offset             |
      |                  |     +---------------------+
      |                  |     | .fn()               |
      |                  |     +---------------------+
      |                  |     | .flags & FL_VAR_REF |
      |                  |     +---------------------+
      |                  +-----| .var_ref_idx        |
      |                      +-----------------------+
      |                      | var_ref = $wakeup_lat |
      |                      +-----------------------+
      |                        | .size               |
      |                        +---------------------+
      |                        | .offset             |
      |                        +---------------------+
      |                        | .fn()               |
      |                        +---------------------+
      |                        | .flags & FL_VAR_REF |
      |                        +---------------------+
      +------------------------| .var_ref_idx        |
                               +---------------------+

As you can see, for a field variable, two hist_fields are created: one
representing the variable, in this case next_pid, and one to actually
get the value of the field from the trace stream, like a normal val
field does. These are created separately from normal variable
creation and are saved in the hist_data->field_vars[] array. See
below for how these are used. In addition, a reference hist_field is
also created, which is needed to reference field variables such as
the $next_pid variable in the trace() action.

Note that $wakeup_lat is also a variable reference, referencing the
value of the expression common_timestamp.usecs-$ts0, and so also needs
to have a hist field entry representing that reference created.

When hist_trigger_elt_update() is called to get the normal key and
value fields, it also calls update_field_vars(), which goes through
each field_var created for the histogram (available from
hist_data->field_vars[]), calls val->fn() to get the data from the
current trace record, and then uses the var's var.idx to set the
corresponding variable in the appropriate tracing_map_elt, at
elt->vars[var.idx].

Once all the variables have been updated, resolve_var_refs() can be
called from event_hist_trigger(), and not only can our $ts0 and
$next_pid references be resolved but the $wakeup_lat reference as
well. At this point, the trace() action can simply access the values
assembled in the var_ref_vals[] array and generate the trace event.

The same process occurs for the field variables associated with the
save() action.

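The field variable update step can be sketched as follows. This is a
simplified Python model rather than the kernel code: the FieldVar
class and this update_field_vars() are hypothetical stand-ins for
struct field_var and the kernel function of the same name, a trace
record is modeled as a dict, and slot 0 of the element's vars[] is
assumed to already hold wakeup_lat:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldVar:
    """Stand-in for struct field_var: a variable plus the field feeding it."""
    name: str
    var_idx: int                      # var.idx: slot in elt.vars[]
    val_fn: Callable[[dict], object]  # val->fn(): reads the field from a record


def update_field_vars(field_vars, elt_vars, record):
    """Copy each named field from the current record into its variable slot."""
    for fv in field_vars:
        elt_vars[fv.var_idx] = fv.val_fn(record)


# Hypothetical values: wakeup_lat (slot 0) is 250, next_pid uses slot 1.
field_vars = [FieldVar("next_pid", 1, lambda rec: rec["next_pid"])]
elt_vars = [250, None]

update_field_vars(field_vars, elt_vars, {"next_pid": 1234, "prev_pid": 10})
print(elt_vars)
```

After the call, the element's variable slots hold both the computed
latency and the value read from the next_pid field, so references to
either can be resolved in the same way.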
Abbreviations used in the diagram::

  hist_data = struct hist_trigger_data
  hist_data.fields = struct hist_field
  field_var = struct field_var
  fn = hist_field_fn_t
  FL_KEY = HIST_FIELD_FL_KEY
  FL_VAR = HIST_FIELD_FL_VAR
  FL_VAR_REF = HIST_FIELD_FL_VAR_REF

trace() action field variable test
----------------------------------

This example adds to the previous test example by finally making use
of the wakeup_lat variable, but in addition also creates a couple of
field variables that then are all passed to the wakeup_latency() trace
action via the onmatch() handler.

First, we create the wakeup_latency synthetic event::

    # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events

Next, the sched_waking trigger from previous examples::

    # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

Finally, as in the previous test example, we calculate the wakeup
latency using the $ts0 reference from the sched_waking trigger and
assign it to the wakeup_lat variable, then use it along with a couple
of sched_switch event fields, next_pid and next_comm, to generate a
wakeup_latency trace event. The next_pid and next_comm event fields
are automatically converted into field variables for this purpose::

    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger

The sched_waking hist_debug output shows the same data as in the
previous test example::

    # cat events/sched/sched_waking/hist_debug

    # event histogram
    #
    # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
    #

    hist_data: 00000000d60ff61f

    n_vals: 2
    n_keys: 1
    n_fields: 3

    val fields:

      hist_data->fields[0]:
        flags:
          VAL: HIST_FIELD_FL_HITCOUNT
        type: u64
        size: 8
        is_signed: 0

      hist_data->fields[1]:
        flags:
          HIST_FIELD_FL_VAR
        var.name: ts0
        var.idx (into tracing_map_elt.vars[]): 0
        type: u64
        size: 8
        is_signed: 0

    key fields:

      hist_data->fields[2]:
        flags:
          HIST_FIELD_FL_KEY
        ftrace_event_field name: pid
        type: pid_t
        size: 8
        is_signed: 1

The sched_switch hist_debug output shows the same key and value fields
as in the previous test example. Note that wakeup_lat is still in the
val fields section, but that the new field variables are not there:
although the field variables are variables, they're held separately in
the hist_data's field_vars[] array. Although the field variables and
the normal variables are located in separate places, you can see that
the actual variable locations for those variables in the
tracing_map_elt.vars[] array do have increasing indices as expected:
wakeup_lat takes the var.idx = 0 slot, while the field variables for
next_pid and next_comm have the values var.idx = 1 and var.idx = 2.
Note also that those are the same values displayed for the variable
references corresponding to those variables in the variable reference
fields section. Since there are two triggers and thus two hist_data
addresses, those addresses also need to be accounted for when doing
the matching: you can see that the first variable reference refers to
var.idx 0 on the sched_waking hist trigger (see the hist_data address
associated with that trigger), while the second refers to var.idx 0 on
the sched_switch hist trigger. All the remaining variable references
refer to the sched_switch hist trigger as well.

Finally, the action tracking variables section just shows the system
and event name for the onmatch() handler::

    # cat events/sched/sched_switch/hist_debug

    # event histogram
    #
    # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm) [active]
    #

    hist_data: 0000000008f551b7

    n_vals: 2
    n_keys: 1
    n_fields: 3

    val fields:

      hist_data->fields[0]:
        flags:
          VAL: HIST_FIELD_FL_HITCOUNT
        type: u64
        size: 8
        is_signed: 0

      hist_data->fields[1]:
        flags:
          HIST_FIELD_FL_VAR
        var.name: wakeup_lat
        var.idx (into tracing_map_elt.vars[]): 0
        type: u64
        size: 0
        is_signed: 0

    key fields:

      hist_data->fields[2]:
        flags:
          HIST_FIELD_FL_KEY
        ftrace_event_field name: next_pid
        type: pid_t
        size: 8
        is_signed: 1

    variable reference fields:

      hist_data->var_refs[0]:
        flags:
          HIST_FIELD_FL_VAR_REF
        name: ts0
        var.idx (into tracing_map_elt.vars[]): 0
        var.hist_data: 00000000d60ff61f
        var_ref_idx (into hist_data->var_refs[]): 0
        type: u64
        size: 8
        is_signed: 0

      hist_data->var_refs[1]:
        flags:
          HIST_FIELD_FL_VAR_REF
        name: wakeup_lat
        var.idx (into tracing_map_elt.vars[]): 0
        var.hist_data: 0000000008f551b7
        var_ref_idx (into hist_data->var_refs[]): 1
        type: u64
        size: 0
        is_signed: 0

      hist_data->var_refs[2]:
        flags:
          HIST_FIELD_FL_VAR_REF
        name: next_pid
        var.idx (into tracing_map_elt.vars[]): 1
        var.hist_data: 0000000008f551b7
        var_ref_idx (into hist_data->var_refs[]): 2
        type: pid_t
        size: 4
        is_signed: 0

      hist_data->var_refs[3]:
        flags:
          HIST_FIELD_FL_VAR_REF
        name: next_comm
        var.idx (into tracing_map_elt.vars[]): 2
        var.hist_data: 0000000008f551b7
        var_ref_idx (into hist_data->var_refs[]): 3
        type: char[16]
        size: 256
        is_signed: 0

    field variables:

      hist_data->field_vars[0]:

        field_vars[0].var:
        flags:
          HIST_FIELD_FL_VAR
        var.name: next_pid
        var.idx (into tracing_map_elt.vars[]): 1

        field_vars[0].val:
        ftrace_event_field name: next_pid
        type: pid_t
        size: 4
        is_signed: 1

      hist_data->field_vars[1]:

        field_vars[1].var:
        flags:
          HIST_FIELD_FL_VAR
        var.name: next_comm
        var.idx (into tracing_map_elt.vars[]): 2

        field_vars[1].val:
        ftrace_event_field name: next_comm
        type: char[16]
        size: 256
        is_signed: 0

    action tracking variables (for onmax()/onchange()/onmatch()):

      hist_data->actions[0].match_data.event_system: sched
      hist_data->actions[0].match_data.event: sched_waking

The commands below can be used to clean things up for the next test::

    # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger

    # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

    # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events

action_data and the trace() action
----------------------------------

As mentioned above, when the trace() action generates a synthetic
event, all the parameters to the synthetic event either already are
variables or are converted into variables (via field variables), and
finally all those variable values are collected via references to them
into a var_ref_vals[] array.

The values in the var_ref_vals[] array, however, don't necessarily
follow the same ordering as the synthetic event params. To address
that, struct action_data contains another array, var_ref_idx[], that
maps the trace action params to the var_ref_vals[] values. Below is a
diagram illustrating that for the wakeup_latency() synthetic event::

    +------------------+    wakeup_latency()
    | action_data      |    event params               var_ref_vals[]
    +------------------+    +-----------------+        +-----------------+
      | .var_ref_idx[] |--->| $wakeup_lat idx |---+    |                 |
      +----------------+    +-----------------+   |    +-----------------+
      | .synth_event   |    | $next_pid idx   |---|-+  | $wakeup_lat val |
      +----------------+    +-----------------+   | |  +-----------------+
                                     .            | +->| $next_pid val   |
                                     .            |    +-----------------+
                                     .            |             .
                            +-----------------+   |             .
                            |                 |   |             .
                            +-----------------+   |    +-----------------+
                                                  +--->| $wakeup_lat val |
                                                       +-----------------+

Basically, how this ends up getting used in the synthetic event probe
function, trace_event_raw_event_synth(), is as follows::

  for each field i in .synth_event
    val_idx = .var_ref_idx[i]
    val = var_ref_vals[val_idx]

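The same indirection can be written as a runnable sketch. This is a
Python model of the mapping only, not the kernel probe function; the
function name synth_event_fields() and the numeric values are
hypothetical, while the slot layout follows the var_ref_idx values
shown in the sched_switch hist_debug output ($ts0 in slot 0,
$wakeup_lat in slot 1, $next_pid in slot 2):

```python
def synth_event_fields(var_ref_idx, var_ref_vals):
    """Return the synthetic event's field values in parameter order."""
    return [var_ref_vals[val_idx] for val_idx in var_ref_idx]


# Cached values for one event: slot 0 = $ts0, 1 = $wakeup_lat, 2 = $next_pid.
var_ref_vals = [5000, 250, 1234]

# wakeup_latency($wakeup_lat,next_pid): param 0 -> slot 1, param 1 -> slot 2.
var_ref_idx = [1, 2]

print(synth_event_fields(var_ref_idx, var_ref_vals))
```

Note that $ts0 occupies a slot in the cached values even though it is
not a parameter of the synthetic event, which is why the params cannot
simply be read from the start of the array.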
action_data and the onXXX() handlers
------------------------------------

The hist trigger onXXX() handlers other than onmatch(), such as onmax()
and onchange(), also make use of and internally create hidden
variables. This information is contained in the
action_data.track_data struct, and is also visible in the hist_debug
output as will be described in the example below.

Typically, the onmax() or onchange() handlers are used in conjunction
with the save() and snapshot() actions. For example::

    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger

or::

    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).snapshot()' >> /sys/kernel/tracing/events/sched/sched_switch/trigger

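The combination of onmax() and save() can be pictured with a small
model. The Python sketch below is illustrative only and is not how the
kernel code is structured: the OnMaxSave class is a hypothetical name,
a trace record is modeled as a dict, and a single tracker stands in
for what would be kept per histogram entry:

```python
class OnMaxSave:
    """Track the running maximum of a value and save fields on each new max."""

    def __init__(self, save_fields):
        self.save_fields = save_fields
        self.max_val = None  # plays the role of the hidden tracking variable
        self.saved = {}      # plays the role of the variables filled by save()

    def update(self, tracked_val, record):
        """Return True if tracked_val is a new maximum and fields were saved."""
        if self.max_val is not None and tracked_val <= self.max_val:
            return False
        self.max_val = tracked_val
        self.saved = {name: record[name] for name in self.save_fields}
        return True


tracker = OnMaxSave(["next_comm", "prev_pid"])
tracker.update(120, {"next_comm": "bash", "prev_pid": 10})
tracker.update(90, {"next_comm": "sshd", "prev_pid": 11})        # not a new max
tracker.update(300, {"next_comm": "cyclictest", "prev_pid": 12})
print(tracker.max_val, tracker.saved)
```

The saved fields always describe the event that produced the current
maximum; events with smaller values leave both the maximum and the
saved fields untouched.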
save() action field variable test
---------------------------------

For this example, instead of generating a synthetic event, the save()
action is used to save field values whenever an onmax() handler
detects that a new max latency has been hit. As in the previous
example, the values being saved are also field values, but in this
case, are kept in a separate hist_data array named save_vars[].

As in previous test examples, we set up the sched_waking trigger::

    # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

In this case, however, we set up the sched_switch trigger to save some
sched_switch field values whenever we hit a new maximum latency. For
both the onmax() handler and save() action, variables will be created,
which we can use the hist_debug files to examine::

    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger

The sched_waking hist_debug output shows the same data as in the
previous test examples::

    # cat events/sched/sched_waking/hist_debug

    #
    # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
    #

    hist_data: 00000000e6290f48

    n_vals: 2
    n_keys: 1
    n_fields: 3

    val fields:

      hist_data->fields[0]:
        flags:
          VAL: HIST_FIELD_FL_HITCOUNT
        type: u64
        size: 8
        is_signed: 0

      hist_data->fields[1]:
        flags:
          HIST_FIELD_FL_VAR
        var.name: ts0
        var.idx (into tracing_map_elt.vars[]): 0
        type: u64
        size: 8
        is_signed: 0

    key fields:

      hist_data->fields[2]:
        flags:
          HIST_FIELD_FL_KEY
        ftrace_event_field name: pid
        type: pid_t
        size: 8
        is_signed: 1

The output of the sched_switch trigger shows the same val and key
values as before, but also shows a couple of new sections.

First, the action tracking variables section now shows the
actions[].track_data information describing the special tracking
variables and references used to track, in this case, the running
maximum value. The actions[].track_data.var_ref member contains the
reference to the variable being tracked, in this case the $wakeup_lat
variable. In order to perform the onmax() handler function, there
also needs to be a variable that tracks the current maximum by getting
updated whenever a new maximum is hit. In this case, we can see that
an auto-generated variable named '__max' has been created and is
visible in the actions[].track_data.track_var variable.

Finally, in the new 'save action variables' section, we can see that
the 4 params to the save() function have resulted in 4 field variables
being created for the purposes of saving the values of the named
fields when the max is hit. These variables are kept in a separate
save_vars[] array off of hist_data, so are displayed in a separate
section::

    # cat events/sched/sched_switch/hist_debug

    # event histogram
    #
    # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) [active]
    #

    hist_data: 0000000057bcd28d

    n_vals: 2
    n_keys: 1
    n_fields: 3

    val fields:

      hist_data->fields[0]:
        flags:
          VAL: HIST_FIELD_FL_HITCOUNT
        type: u64
        size: 8
        is_signed: 0

      hist_data->fields[1]:
        flags:
          HIST_FIELD_FL_VAR
        var.name: wakeup_lat
        var.idx (into tracing_map_elt.vars[]): 0
        type: u64
        size: 0
        is_signed: 0

    key fields:

      hist_data->fields[2]:
        flags:
          HIST_FIELD_FL_KEY
        ftrace_event_field name: next_pid
        type: pid_t
        size: 8
        is_signed: 1

    variable reference fields:

      hist_data->var_refs[0]:
        flags:
          HIST_FIELD_FL_VAR_REF
        name: ts0
        var.idx (into tracing_map_elt.vars[]): 0
        var.hist_data: 00000000e6290f48
        var_ref_idx (into hist_data->var_refs[]): 0
        type: u64
        size: 8
        is_signed: 0

      hist_data->var_refs[1]:
        flags:
          HIST_FIELD_FL_VAR_REF
        name: wakeup_lat
        var.idx (into tracing_map_elt.vars[]): 0
        var.hist_data: 0000000057bcd28d
        var_ref_idx (into hist_data->var_refs[]): 1
        type: u64
        size: 0
        is_signed: 0

    action tracking variables (for onmax()/onchange()/onmatch()):

      hist_data->actions[0].track_data.var_ref:
        flags:
          HIST_FIELD_FL_VAR_REF
        name: wakeup_lat
        var.idx (into tracing_map_elt.vars[]): 0
        var.hist_data: 0000000057bcd28d
        var_ref_idx (into hist_data->var_refs[]): 1
        type: u64
        size: 0
        is_signed: 0

      hist_data->actions[0].track_data.track_var:
        flags:
          HIST_FIELD_FL_VAR
        var.name: __max
        var.idx (into tracing_map_elt.vars[]): 1
        type: u64
        size: 8
        is_signed: 0

    save action variables (save() params):

      hist_data->save_vars[0]:

        save_vars[0].var:
        flags:
          HIST_FIELD_FL_VAR
        var.name: next_comm
        var.idx (into tracing_map_elt.vars[]): 2

        save_vars[0].val:
        ftrace_event_field name: next_comm
        type: char[16]
        size: 256
        is_signed: 0

      hist_data->save_vars[1]:

        save_vars[1].var:
        flags:
          HIST_FIELD_FL_VAR
        var.name: prev_pid
        var.idx (into tracing_map_elt.vars[]): 3

        save_vars[1].val:
        ftrace_event_field name: prev_pid
        type: pid_t
        size: 4
        is_signed: 1

      hist_data->save_vars[2]:

        save_vars[2].var:
        flags:
          HIST_FIELD_FL_VAR
        var.name: prev_prio
        var.idx (into tracing_map_elt.vars[]): 4

        save_vars[2].val:
        ftrace_event_field name: prev_prio
        type: int
        size: 4
        is_signed: 1

      hist_data->save_vars[3]:

        save_vars[3].var:
        flags:
          HIST_FIELD_FL_VAR
        var.name: prev_comm
        var.idx (into tracing_map_elt.vars[]): 5

        save_vars[3].val:
        ftrace_event_field name: prev_comm
        type: char[16]
        size: 256
        is_signed: 0

The commands below can be used to clean things up for the next test::

    # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger

    # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

A couple special cases
======================

While the above covers the basics of the histogram internals, there
are a couple of special cases that should be discussed, since they
tend to create even more confusion. Those are field variables on other
histograms, and aliases, both described below through example tests
using the hist_debug files.

Test of field variables on other histograms
-------------------------------------------

This example is similar to the previous examples, but in this case,
the sched_switch trigger references a hist trigger field on another
event, namely the sched_waking event. In order to accomplish this, a
field variable is created for the other event, but since an existing
histogram can't be used, as existing histograms are immutable, a new
histogram with a matching variable is created and used, and we'll see
that reflected in the hist_debug output shown below.

First, we create the wakeup_latency synthetic event. Note the
addition of the prio field::

    # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events

As in previous test examples, we set up the sched_waking trigger::

    # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

Here we set up a hist trigger on sched_switch to send a wakeup_latency
event using an onmatch handler naming the sched_waking event. Note
that the third param being passed to wakeup_latency() is prio,
which is a field name that needs to have a field variable created for
it. There isn't, however, any prio field on the sched_switch event, so
it would seem that it wouldn't be possible to create a field variable
for it. The matching sched_waking event does have a prio field, so it
should be possible to make use of it for this purpose. The problem
with that is that it's not currently possible to define a new variable
on an existing histogram, so it's not possible to add a new prio field
variable to the existing sched_waking histogram. It is, however,
possible to create an additional new 'matching' sched_waking histogram
for the same event, meaning that it uses the same key and filters, and
define the new prio field variable on that.

Here's the sched_switch trigger::

    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger

And here's the output of the hist_debug information for the
|
||
|
sched_waking hist trigger. Note that there are two histograms
|
||
|
displayed in the output: the first is the normal sched_waking
|
||
|
histogram we've seen in the previous examples, and the second is the
|
||
|
special histogram we created to provide the prio field variable.
|
||
|
|
||
|
Looking at the second histogram below, we see a variable with the name
synthetic_prio. This is the field variable created for the prio field
on that sched_waking histogram::

  # cat events/sched/sched_waking/hist_debug

  # event histogram
  #
  # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
  #

  hist_data: 00000000349570e4

  n_vals: 2
  n_keys: 1
  n_fields: 3

  val fields:

    hist_data->fields[0]:
      flags:
        VAL: HIST_FIELD_FL_HITCOUNT
      type: u64
      size: 8
      is_signed: 0

    hist_data->fields[1]:
      flags:
        HIST_FIELD_FL_VAR
      var.name: ts0
      var.idx (into tracing_map_elt.vars[]): 0
      type: u64
      size: 8
      is_signed: 0

  key fields:

    hist_data->fields[2]:
      flags:
        HIST_FIELD_FL_KEY
      ftrace_event_field name: pid
      type: pid_t
      size: 8
      is_signed: 1


  # event histogram
  #
  # trigger info: hist:keys=pid:vals=hitcount:synthetic_prio=prio:sort=hitcount:size=2048 [active]
  #

  hist_data: 000000006920cf38

  n_vals: 2
  n_keys: 1
  n_fields: 3

  val fields:

    hist_data->fields[0]:
      flags:
        VAL: HIST_FIELD_FL_HITCOUNT
      type: u64
      size: 8
      is_signed: 0

    hist_data->fields[1]:
      flags:
        HIST_FIELD_FL_VAR
      ftrace_event_field name: prio
      var.name: synthetic_prio
      var.idx (into tracing_map_elt.vars[]): 0
      type: int
      size: 4
      is_signed: 1

  key fields:

    hist_data->fields[2]:
      flags:
        HIST_FIELD_FL_KEY
      ftrace_event_field name: pid
      type: pid_t
      size: 8
      is_signed: 1

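With more than one histogram attached to an event, it can help to list
only the trigger info and hist_data address of each. The following is
a minimal sketch, assuming the hist_debug layout shown above;
list_hists is a hypothetical helper name::

  # list_hists FILE
  # Print each histogram's trigger info followed by its top-level
  # hist_data address. Nested 'var.hist_data:' lines are ignored
  # because their first word differs.
  list_hists() {
      awk '
          $1 == "#" && $2 == "trigger" && $3 == "info:" {
              sub(/^.*trigger info: */, "")
              print "trigger:   " $0
          }
          $1 == "hist_data:" { print "hist_data: " $2 }
      ' "$1"
  }

Calling list_hists events/sched/sched_waking/hist_debug from the
tracing directory should print one trigger/hist_data pair per
histogram, which for this example means two pairs.
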
Looking at the sched_switch histogram below, we can see a reference to
the synthetic_prio variable on sched_waking, and looking at the
associated hist_data address we see that it is indeed associated with
the new histogram. Note also that the other references are to a
normal variable, wakeup_lat, and to a normal field variable, next_pid,
the details of which are in the field variables section::

  # cat events/sched/sched_switch/hist_debug

  # event histogram
  #
  # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) [active]
  #

  hist_data: 00000000a73b67df

  n_vals: 2
  n_keys: 1
  n_fields: 3

  val fields:

    hist_data->fields[0]:
      flags:
        VAL: HIST_FIELD_FL_HITCOUNT
      type: u64
      size: 8
      is_signed: 0

    hist_data->fields[1]:
      flags:
        HIST_FIELD_FL_VAR
      var.name: wakeup_lat
      var.idx (into tracing_map_elt.vars[]): 0
      type: u64
      size: 0
      is_signed: 0

  key fields:

    hist_data->fields[2]:
      flags:
        HIST_FIELD_FL_KEY
      ftrace_event_field name: next_pid
      type: pid_t
      size: 8
      is_signed: 1

  variable reference fields:

    hist_data->var_refs[0]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: ts0
      var.idx (into tracing_map_elt.vars[]): 0
      var.hist_data: 00000000349570e4
      var_ref_idx (into hist_data->var_refs[]): 0
      type: u64
      size: 8
      is_signed: 0

    hist_data->var_refs[1]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: wakeup_lat
      var.idx (into tracing_map_elt.vars[]): 0
      var.hist_data: 00000000a73b67df
      var_ref_idx (into hist_data->var_refs[]): 1
      type: u64
      size: 0
      is_signed: 0

    hist_data->var_refs[2]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: next_pid
      var.idx (into tracing_map_elt.vars[]): 1
      var.hist_data: 00000000a73b67df
      var_ref_idx (into hist_data->var_refs[]): 2
      type: pid_t
      size: 4
      is_signed: 0

    hist_data->var_refs[3]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: synthetic_prio
      var.idx (into tracing_map_elt.vars[]): 0
      var.hist_data: 000000006920cf38
      var_ref_idx (into hist_data->var_refs[]): 3
      type: int
      size: 4
      is_signed: 1

  field variables:

    hist_data->field_vars[0]:

      field_vars[0].var:
      flags:
        HIST_FIELD_FL_VAR
      var.name: next_pid
      var.idx (into tracing_map_elt.vars[]): 1

      field_vars[0].val:
      ftrace_event_field name: next_pid
      type: pid_t
      size: 4
      is_signed: 1

  action tracking variables (for onmax()/onchange()/onmatch()):

    hist_data->actions[0].match_data.event_system: sched
    hist_data->actions[0].match_data.event: sched_waking

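To confirm which histogram each variable reference resolves to, the
name and var.hist_data of every reference can be paired up. The
following is a minimal sketch, assuming the hist_debug layout shown
above; var_ref_owners is a hypothetical helper name::

  # var_ref_owners FILE
  # For each variable reference in a hist_debug file, print
  # "<name> <var.hist_data>". Only var_refs entries carry a plain
  # 'name:' line followed by a 'var.hist_data:' line.
  var_ref_owners() {
      awk '
          $1 == "name:"          { name = $2 }
          $1 == "var.hist_data:" { print name, $2 }
      ' "$1"
  }

For the sched_switch output above, the synthetic_prio line would carry
the hist_data address of the second sched_waking histogram, while the
ts0 line would carry the address of the first.
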
The commands below can be used to clean things up for the next test::

  # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger

  # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

  # echo '!wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events

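To check that the cleanup took effect, the synthetic_events file can
be searched for the event name. The following is a minimal sketch,
assuming each line of synthetic_events begins with the event name;
synth_event_defined is a hypothetical helper name::

  # synth_event_defined FILE NAME
  # Succeed if NAME is the first word of some line in FILE.
  synth_event_defined() {
      awk -v name="$2" '$1 == name { found = 1 } END { exit !found }' "$1"
  }

  # Only meaningful when run from the tracing directory.
  if [ -r synthetic_events ]; then
      if synth_event_defined synthetic_events wakeup_latency; then
          echo "wakeup_latency is still defined"
      else
          echo "wakeup_latency has been removed"
      fi
  fi
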
Alias test
----------

This example is very similar to previous examples, but demonstrates
the alias flag.

First, we create the wakeup_latency synthetic event::

  # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events

Next, we create a sched_waking trigger similar to previous examples,
but in this case we save the pid in the waking_pid variable::

  # echo 'hist:keys=pid:waking_pid=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

For the sched_switch trigger, instead of using $waking_pid directly in
the wakeup_latency synthetic event invocation, we create an alias of
$waking_pid named $woken_pid, and use that in the synthetic event
invocation instead::

  # echo 'hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger

Looking at the sched_waking hist_debug output, in addition to the
normal fields, we can see the waking_pid variable::

  # cat events/sched/sched_waking/hist_debug

  # event histogram
  #
  # trigger info: hist:keys=pid:vals=hitcount:waking_pid=pid,ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
  #

  hist_data: 00000000a250528c

  n_vals: 3
  n_keys: 1
  n_fields: 4

  val fields:

    hist_data->fields[0]:
      flags:
        VAL: HIST_FIELD_FL_HITCOUNT
      type: u64
      size: 8
      is_signed: 0

    hist_data->fields[1]:
      flags:
        HIST_FIELD_FL_VAR
      ftrace_event_field name: pid
      var.name: waking_pid
      var.idx (into tracing_map_elt.vars[]): 0
      type: pid_t
      size: 4
      is_signed: 1

    hist_data->fields[2]:
      flags:
        HIST_FIELD_FL_VAR
      var.name: ts0
      var.idx (into tracing_map_elt.vars[]): 1
      type: u64
      size: 8
      is_signed: 0

  key fields:

    hist_data->fields[3]:
      flags:
        HIST_FIELD_FL_KEY
      ftrace_event_field name: pid
      type: pid_t
      size: 8
      is_signed: 1

The sched_switch hist_debug output shows that a variable named
woken_pid has been created, and that it has the HIST_FIELD_FL_ALIAS
flag set. It also has the HIST_FIELD_FL_VAR flag set, which is why it
appears in the val fields section.

Despite that implementation detail, an alias variable is actually more
like a variable reference; in fact it can be thought of as a reference
to a reference. The implementation copies the var_ref->fn() from the
variable reference being referenced, in this case the waking_pid fn(),
which is hist_field_var_ref(), and makes that the fn() of the alias.
The hist_field_var_ref() fn() requires the var_ref_idx of the variable
reference it's using, so waking_pid's var_ref_idx is also copied to
the alias. The end result is that when the value of the alias is
retrieved, it does the same thing the original reference would have
done and retrieves the same value from the var_ref_vals[] array. You
can verify this in the output by noting that the var_ref_idx of the
alias, in this case woken_pid, is the same as the var_ref_idx of the
reference, waking_pid, in the variable reference fields section.

Additionally, once it gets that value, since it is also a variable, it
then saves that value into its var.idx. So the var.idx of the
woken_pid alias is 0, which it fills with the value from var_ref_idx 0
when its fn() is called to update itself. You'll also notice that
there's a woken_pid var_ref in the variable reference fields section.
That is the reference to the woken_pid alias variable, and you can see
that it retrieves the value from the same var.idx as the woken_pid
alias, 0, and then in turn saves that value in its own var_ref_idx
slot, 3. The value at this position is finally what gets assigned to
the $woken_pid slot in the trace event invocation::

  # cat events/sched/sched_switch/hist_debug

  # event histogram
  #
  # trigger info: hist:keys=next_pid:vals=hitcount:woken_pid=$waking_pid,wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm) [active]
  #

  hist_data: 0000000055d65ed0

  n_vals: 3
  n_keys: 1
  n_fields: 4

  val fields:

    hist_data->fields[0]:
      flags:
        VAL: HIST_FIELD_FL_HITCOUNT
      type: u64
      size: 8
      is_signed: 0

    hist_data->fields[1]:
      flags:
        HIST_FIELD_FL_VAR
        HIST_FIELD_FL_ALIAS
      var.name: woken_pid
      var.idx (into tracing_map_elt.vars[]): 0
      var_ref_idx (into hist_data->var_refs[]): 0
      type: pid_t
      size: 4
      is_signed: 1

    hist_data->fields[2]:
      flags:
        HIST_FIELD_FL_VAR
      var.name: wakeup_lat
      var.idx (into tracing_map_elt.vars[]): 1
      type: u64
      size: 0
      is_signed: 0

  key fields:

    hist_data->fields[3]:
      flags:
        HIST_FIELD_FL_KEY
      ftrace_event_field name: next_pid
      type: pid_t
      size: 8
      is_signed: 1

  variable reference fields:

    hist_data->var_refs[0]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: waking_pid
      var.idx (into tracing_map_elt.vars[]): 0
      var.hist_data: 00000000a250528c
      var_ref_idx (into hist_data->var_refs[]): 0
      type: pid_t
      size: 4
      is_signed: 1

    hist_data->var_refs[1]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: ts0
      var.idx (into tracing_map_elt.vars[]): 1
      var.hist_data: 00000000a250528c
      var_ref_idx (into hist_data->var_refs[]): 1
      type: u64
      size: 8
      is_signed: 0

    hist_data->var_refs[2]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: wakeup_lat
      var.idx (into tracing_map_elt.vars[]): 1
      var.hist_data: 0000000055d65ed0
      var_ref_idx (into hist_data->var_refs[]): 2
      type: u64
      size: 0
      is_signed: 0

    hist_data->var_refs[3]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: woken_pid
      var.idx (into tracing_map_elt.vars[]): 0
      var.hist_data: 0000000055d65ed0
      var_ref_idx (into hist_data->var_refs[]): 3
      type: pid_t
      size: 4
      is_signed: 1

    hist_data->var_refs[4]:
      flags:
        HIST_FIELD_FL_VAR_REF
      name: next_comm
      var.idx (into tracing_map_elt.vars[]): 2
      var.hist_data: 0000000055d65ed0
      var_ref_idx (into hist_data->var_refs[]): 4
      type: char[16]
      size: 256
      is_signed: 0

  field variables:

    hist_data->field_vars[0]:

      field_vars[0].var:
      flags:
        HIST_FIELD_FL_VAR
      var.name: next_comm
      var.idx (into tracing_map_elt.vars[]): 2

      field_vars[0].val:
      ftrace_event_field name: next_comm
      type: char[16]
      size: 256
      is_signed: 0

  action tracking variables (for onmax()/onchange()/onmatch()):

    hist_data->actions[0].match_data.event_system: sched
    hist_data->actions[0].match_data.event: sched_waking

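The var_ref_idx comparison described above can also be scripted. The
following is a minimal sketch, assuming the hist_debug layout shown
above; alias_matches_ref is a hypothetical helper name::

  # alias_matches_ref FILE ALIAS REF
  # Succeed if the var_ref_idx printed for the alias variable ALIAS (a
  # 'var.name:' entry) equals the var_ref_idx printed for the variable
  # reference REF (a 'name:' entry).
  alias_matches_ref() {
      awk -v alias="$2" -v ref="$3" '
          $1 ~ /^hist_data->/ { cur = "" }        # new entry starts
          $1 == "var.name:"   { cur = "var:" $2 } # a variable
          $1 == "name:"       { cur = "ref:" $2 } # a variable reference
          $1 == "var_ref_idx" {
              if (cur == "var:" alias) a = $NF
              if (cur == "ref:" ref)   r = $NF
          }
          END { exit !(a != "" && a == r) }
      ' "$1"
  }

For the output above, alias_matches_ref
events/sched/sched_switch/hist_debug woken_pid waking_pid would be
expected to succeed, since both entries show a var_ref_idx of 0.
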
The commands below can be used to clean things up for the next test::

  # echo '!hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger

  # echo '!hist:keys=pid:waking_pid=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger

  # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events