Commit 66a8cb9

Steven Rostedt authored and committed
ring-buffer: Add place holder recording of dropped events
Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put any placeholder where the event was dropped.

This is not a trivial thing to add, because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In producer/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add a placeholder for dropped events: when there is more room to write new data, a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A placeholder cannot be inserted into the ring buffer, since there may never be room for it, and a reader could come in at any time and miss the placeholder.

Luckily, the way the ring buffer works, the read side can find out whether events were lost, and how many. Every time a write takes place, if it overwrites the header page (the next read), it updates an "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perform the swap, and then check whether the number changed; the difference, if it did change, is the number of events dropped. This can be stored by the reader and returned to callers of the reader.

Since the reader-page swap will fail if the writer moved the head page after the reader set up the swap, this gives room to record the overruns without worrying about races: if the reader sets up the pages, records the overrun, and then performs the swap, a successful swap means the overrun variable has not been updated since it was read during the setup.

For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer.
This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits (actually fewer) are needed for the actual size. We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this would change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough.

Note, the notification of dropped events is only used by consuming reads or by peeking at the ring buffer. Iterating over the ring buffer does not keep this information, because the necessary data is only available when a page swap is made, and the iterator does not swap out pages.

Cc: Robert Richter <robert.richter@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1 parent eb0c537 commit 66a8cb9

7 files changed, +79 −16 lines

drivers/oprofile/cpu_buffer.c

Lines changed: 2 additions & 2 deletions
@@ -186,14 +186,14 @@ int op_cpu_buffer_write_commit(struct op_entry *entry)
 struct op_sample *op_cpu_buffer_read_entry(struct op_entry *entry, int cpu)
 {
 	struct ring_buffer_event *e;
-	e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
+	e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL, NULL);
 	if (e)
 		goto event;
 	if (ring_buffer_swap_cpu(op_ring_buffer_read,
 				 op_ring_buffer_write,
 				 cpu))
 		return NULL;
-	e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
+	e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL, NULL);
 	if (e)
 		goto event;
 	return NULL;

include/linux/ring_buffer.h

Lines changed: 4 additions & 2 deletions
@@ -120,9 +120,11 @@ int ring_buffer_write(struct ring_buffer *buffer,
 		      unsigned long length, void *data);
 
 struct ring_buffer_event *
-ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts);
+ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts,
+		 unsigned long *lost_events);
 struct ring_buffer_event *
-ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts);
+ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts,
+		    unsigned long *lost_events);
 
 struct ring_buffer_iter *
 ring_buffer_read_start(struct ring_buffer *buffer, int cpu);

kernel/trace/ring_buffer.c

Lines changed: 66 additions & 6 deletions
@@ -318,6 +318,9 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
 #define TS_MASK		((1ULL << TS_SHIFT) - 1)
 #define TS_DELTA_TEST	(~TS_MASK)
 
+/* Flag when events were overwritten */
+#define RB_MISSED_EVENTS	(1 << 31)
+
 struct buffer_data_page {
 	u64		 time_stamp;	/* page time stamp */
 	local_t		 commit;	/* write committed index */
@@ -416,6 +419,12 @@ int ring_buffer_print_page_header(struct trace_seq *s)
 			       (unsigned int)sizeof(field.commit),
 			       (unsigned int)is_signed_type(long));
 
+	ret = trace_seq_printf(s, "\tfield: int overwrite;\t"
+			       "offset:%u;\tsize:%u;\tsigned:%u;\n",
+			       (unsigned int)offsetof(typeof(field), commit),
+			       1,
+			       (unsigned int)is_signed_type(long));
+
 	ret = trace_seq_printf(s, "\tfield: char data;\t"
 			       "offset:%u;\tsize:%u;\tsigned:%u;\n",
 			       (unsigned int)offsetof(typeof(field), data),
@@ -439,6 +448,8 @@ struct ring_buffer_per_cpu {
 	struct buffer_page		*tail_page;	/* write to tail */
 	struct buffer_page		*commit_page;	/* committed pages */
 	struct buffer_page		*reader_page;
+	unsigned long			lost_events;
+	unsigned long			last_overrun;
 	local_t				commit_overrun;
 	local_t				overrun;
 	local_t				entries;
@@ -2835,6 +2846,7 @@ static struct buffer_page *
 rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct buffer_page *reader = NULL;
+	unsigned long overwrite;
 	unsigned long flags;
 	int nr_loops = 0;
 	int ret;
@@ -2895,6 +2907,18 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	/* The reader page will be pointing to the new head */
 	rb_set_list_to_head(cpu_buffer, &cpu_buffer->reader_page->list);
 
+	/*
+	 * We want to make sure we read the overruns after we set up our
+	 * pointers to the next object. The writer side does a
+	 * cmpxchg to cross pages which acts as the mb on the writer
+	 * side. Note, the reader will constantly fail the swap
+	 * while the writer is updating the pointers, so this
+	 * guarantees that the overwrite recorded here is the one we
+	 * want to compare with the last_overrun.
+	 */
+	smp_mb();
+	overwrite = local_read(&(cpu_buffer->overrun));
+
 	/*
 	 * Here's the tricky part.
 	 *
@@ -2926,6 +2950,11 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->reader_page = reader;
 	rb_reset_reader_page(cpu_buffer);
 
+	if (overwrite != cpu_buffer->last_overrun) {
+		cpu_buffer->lost_events = overwrite - cpu_buffer->last_overrun;
+		cpu_buffer->last_overrun = overwrite;
+	}
+
 	goto again;
 
  out:
@@ -3002,8 +3031,14 @@ static void rb_advance_iter(struct ring_buffer_iter *iter)
 		rb_advance_iter(iter);
 }
 
+static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	return cpu_buffer->lost_events;
+}
+
 static struct ring_buffer_event *
-rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts)
+rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts,
+	       unsigned long *lost_events)
 {
 	struct ring_buffer_event *event;
 	struct buffer_page *reader;
@@ -3055,6 +3090,8 @@ rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts)
 			ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
 							 cpu_buffer->cpu, ts);
 		}
+		if (lost_events)
+			*lost_events = rb_lost_events(cpu_buffer);
 		return event;
 
 	default:
@@ -3165,12 +3202,14 @@ static inline int rb_ok_to_lock(void)
  * @buffer: The ring buffer to read
  * @cpu: The cpu to peak at
  * @ts: The timestamp counter of this event.
+ * @lost_events: a variable to store if events were lost (may be NULL)
  *
  * This will return the event that will be read next, but does
  * not consume the data.
  */
 struct ring_buffer_event *
-ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)
+ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts,
+		 unsigned long *lost_events)
 {
 	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
 	struct ring_buffer_event *event;
@@ -3185,7 +3224,7 @@ ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)
 	local_irq_save(flags);
 	if (dolock)
 		spin_lock(&cpu_buffer->reader_lock);
-	event = rb_buffer_peek(cpu_buffer, ts);
+	event = rb_buffer_peek(cpu_buffer, ts, lost_events);
 	if (event && event->type_len == RINGBUF_TYPE_PADDING)
 		rb_advance_reader(cpu_buffer);
 	if (dolock)
@@ -3227,13 +3266,17 @@ ring_buffer_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
 /**
  * ring_buffer_consume - return an event and consume it
  * @buffer: The ring buffer to get the next event from
+ * @cpu: the cpu to read the buffer from
+ * @ts: a variable to store the timestamp (may be NULL)
+ * @lost_events: a variable to store if events were lost (may be NULL)
  *
  * Returns the next event in the ring buffer, and that event is consumed.
  * Meaning, that sequential reads will keep returning a different event,
  * and eventually empty the ring buffer if the producer is slower.
  */
 struct ring_buffer_event *
-ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts)
+ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts,
+		    unsigned long *lost_events)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event = NULL;
@@ -3254,9 +3297,11 @@ ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts)
 	if (dolock)
 		spin_lock(&cpu_buffer->reader_lock);
 
-	event = rb_buffer_peek(cpu_buffer, ts);
-	if (event)
+	event = rb_buffer_peek(cpu_buffer, ts, lost_events);
+	if (event) {
+		cpu_buffer->lost_events = 0;
 		rb_advance_reader(cpu_buffer);
+	}
 
 	if (dolock)
 		spin_unlock(&cpu_buffer->reader_lock);
@@ -3405,6 +3450,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 	cpu_buffer->write_stamp = 0;
 	cpu_buffer->read_stamp = 0;
 
+	cpu_buffer->lost_events = 0;
+	cpu_buffer->last_overrun = 0;
+
 	rb_head_page_activate(cpu_buffer);
 }
 
@@ -3684,6 +3732,7 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
 	unsigned int commit;
 	unsigned int read;
 	u64 save_timestamp;
+	int missed_events = 0;
 	int ret = -1;
 
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
@@ -3716,6 +3765,10 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
 	read = reader->read;
 	commit = rb_page_commit(reader);
 
+	/* Check if any events were dropped */
+	if (cpu_buffer->lost_events)
+		missed_events = 1;
+
 	/*
 	 * If this page has been partially read or
 	 * if len is not big enough to read the rest of the page or
@@ -3779,6 +3832,13 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
 	}
 	ret = read;
 
+	cpu_buffer->lost_events = 0;
+	/*
+	 * Set a flag in the commit field if we lost events
+	 */
+	if (missed_events)
+		local_add(RB_MISSED_EVENTS, &bpage->commit);
+
  out_unlock:
 	spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 

kernel/trace/ring_buffer_benchmark.c

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ static enum event_status read_event(int cpu)
 	int *entry;
 	u64 ts;
 
-	event = ring_buffer_consume(buffer, cpu, &ts);
+	event = ring_buffer_consume(buffer, cpu, &ts, NULL);
 	if (!event)
 		return EVENT_DROPPED;
 

kernel/trace/trace.c

Lines changed: 2 additions & 2 deletions
@@ -1556,7 +1556,7 @@ peek_next_entry(struct trace_iterator *iter, int cpu, u64 *ts)
 	if (buf_iter)
 		event = ring_buffer_iter_peek(buf_iter, ts);
 	else
-		event = ring_buffer_peek(iter->tr->buffer, cpu, ts);
+		event = ring_buffer_peek(iter->tr->buffer, cpu, ts, NULL);
 
 	ftrace_enable_cpu();
 
@@ -1635,7 +1635,7 @@ static void trace_consume(struct trace_iterator *iter)
 {
 	/* Don't allow ftrace to trace into the ring buffers */
 	ftrace_disable_cpu();
-	ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts);
+	ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts, NULL);
 	ftrace_enable_cpu();
 }
 

kernel/trace/trace_functions_graph.c

Lines changed: 3 additions & 2 deletions
@@ -489,9 +489,10 @@ get_return_for_leaf(struct trace_iterator *iter,
 		 * We need to consume the current entry to see
 		 * the next one.
 		 */
-		ring_buffer_consume(iter->tr->buffer, iter->cpu, NULL);
+		ring_buffer_consume(iter->tr->buffer, iter->cpu,
+				    NULL, NULL);
 		event = ring_buffer_peek(iter->tr->buffer, iter->cpu,
-					 NULL);
+					 NULL, NULL);
 	}
 
 	if (!event)

kernel/trace/trace_selftest.c

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ static int trace_test_buffer_cpu(struct trace_array *tr, int cpu)
 	struct trace_entry *entry;
 	unsigned int loops = 0;
 
-	while ((event = ring_buffer_consume(tr->buffer, cpu, NULL))) {
+	while ((event = ring_buffer_consume(tr->buffer, cpu, NULL, NULL))) {
 		entry = ring_buffer_event_data(event);
 
 		/*
