Skip to content

How does fio measure latency

Vincent Fu edited this page Jun 8, 2022 · 8 revisions

Along with bandwidth and IOPS, latency is a frequently reported performance metric. How quickly a storage device can respond to requests has implications for data center deployments. Because of how consequential these latency measurements can be, it is useful to have a clear understanding of how exactly fio measures latency.

Fio reports up to three latency measurements. The table below lists them and paraphrases the descriptions from the documentation.

Measurement Abbreviation Description
Submission latency slat time required to submit an I/O
Completion latency clat time required to complete an I/O after it has been submitted
Total latency lat time from when the I/O was created to when it was completed

How does fio actually compute these measurements? This discussion will be based on fio 3.30. The starting point for latency measurement is struct io_u which fio uses to keep track of information for each I/O unit created. The first two members of this structure are timestamps start_time and issue_time:

/*
 * The io unit
 */
struct io_u {
        struct timespec start_time;
        struct timespec issue_time;
<snip>
}

These timestamps are used in a straightforward way to calculate the three latency measurements:

Figure 1 Fio latency definitions

  • Submission latency is the difference between issue_time and start_time.
  • Completion latency is the difference between completion time and issue_time.
  • Total latency is the difference between completion time and start_time.
  • There is no need to persistently store completion time because it is not needed after completion and total latency are calculated.

Latency measurement for asynchronous I/O

There are many variations of how fio uses these timestamps to produce latency measurements, but let us begin with the steps for an asynchronous I/O. Below is a selective call tree for latency measurement for an asynchronous I/O:

do_io
   get_io_u				# start_time set
   io_u_submit
      td_io_queue
         io_ops->queue			# call ioengine's queue()
         td_io_commit 			# if submission batch size met
            io_ops->commit		# if ioengine's commit() is defined
               fill issue_time
               io_u_queued		# record submission latency only for ioengines *with* commit function
         fill issue_time
   io_queue_event
      io_u_queued			# record submission latency only for ioengines *without* commit function
   wait_for_completions			# only called when it is time to begin reaping I/O
      io_u_queued_complete
         td_io_getevents
            io_ops->getevents		# get completions from ioengine
         init_icd			# record completion time
         ios_completed
            io_completed
               account_io_completion	# calculate completion and total latency

When fio's main I/O worker function do_io() creates an I/O for submission, it calls get_io_u() which sets the io_u's start_time member. After some housekeeping related to error handling and verification, io_u_submit() calls td_io_queue() which hands off the io_u to the ioengine's queue function. Asynchronous ioengines have either a two-step queue and commit I/O submission process or they submit I/Os with a single-step queue function call.

For asynchronous ioengines with commit functions (e.g., io_uring, libaio), the commit function is called to actually submit the I/O to the operating system. Once this is done, the io_u's issue_time is recorded. The ioengine is also responsible for calling io_u_queued() to calculate and record the submission latency measurement. If the batch submit size is greater than one, multiple I/Os may be queued before the ioengine's commit function is called.

For asynchronous ioengines without commit functions, where submission is accomplished in a single step in the ioengine's queue function (e.g., rados, rbd), submission latency is calculated in a slightly different way. The io_u's issue_time is recorded within td_io_queue() and eventually the submission latency measurement is calculated and recorded within io_u_queued().

When it is time to begin checking for I/O completions, inside do_io()'s main loop is a call to the aptly named wait_for_completions(). This sets off a chain of function calls where the ioengine's getevents handler is called to obtain completed I/Os, init_icd() is called to record the completion time, and finally account_io_completion() is called to calculate and record the completion and total latencies.

Latency measurement for synchronous I/O

Synchronous ioengines begin at the same point as asynchronous ioengines with do_io() creating an I/O for submission by calling get_io_u() which sets the io_u's start_time. A selective call tree for synchronous ioengine latency measurement is below:

do_io
  get_io_u			# start_time set
  io_u_submit
    td_io_queue
      fill issue_time
      io_ops->queue		# call ioengine's queue()
  io_queue_event
    io_u_sync_complete
      init_icd			# record completion time
      io_completed
        account_io_completion	# calculate completion and total latency

Once the io_u is initialized fio does some housekeeping and then records the io_u's issue_time shortly before handing off the I/O to the synchronous ioengine's queue handler. Notice that issue_time is recorded before calling the ioengine's queue function whereas for asynchronous ioengines issue_time is recorded after the call to queue (and if it exists a call to the ioengine's commit). After the synchronous ioengine has submitted the I/O to the operating system and control has returned to fio, execution reaches io_queue_event(). Then eventually init_icd() is called to record a completion time stamp and then io_completed() calls account_io_completion() to calculate completion and total latency.

This latency measurement pathway is much simpler than it was for asynchronous ioengines because I/Os have already been completed by the time the ioengine's queue function returns. Note that submission latency is never calculated for synchronous ioengines because it is not reported and submission latency would only encompasses fio's housekeeping activities. Also note that the above call tree describes only the most common case where there are no errors and a request is fully completed.

Summary

The diagram below summarizes fio's latency measurement pathways. It contains the main branches for the different paths and illustrates the different points where timestamps are recorded and latencies calculated.

Figure 2 Fio latency summary

Observations

  1. For asynchronous ioengines with commit functions, issue_time is actually assigned twice, once inside the ioengine code and a second time inside td_io_queue(). This appears to be unintentional and I will be submitting a patch to clean this up. Based on how submission, completion, and total latency are calculated the sum of mean submission latency and mean completion latency should be equal to the mean total latency, but if fio sets issue_time twice then this equality does not hold.

The listing below is the result of issuing a single I/O, and when we evaluate the relationship among submission, completion, and total latency we see that 61424 + 242709 != 308346. The sum is actually 304133 which means that about 4000ns were lost.

user@ubuntu:~/fio-dev$ fio-canonical/fio --name=test --ioengine=io_uring --number_ios=1 --rw=randread --size=1M
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.30-48-g26fa
Starting 1 process

test: (groupid=0, jobs=1): err= 0: pid=172145: Mon Jun  6 16:12:42 2022
  read: IOPS=1000, BW=4000KiB/s (4096kB/s)(4096B/1msec)
    slat (nsec): min=61424, max=61424, avg=61424.00, stdev= 0.00
    clat (nsec): min=242709, max=242709, avg=242709.00, stdev= 0.00
     lat (nsec): min=308346, max=308346, avg=308346.00, stdev= 0.00
<snip>

The listing below has issue_time filled in only once via the ioengine code and now the equality does hold: 53701 + 259566 = 313267.

user@ubuntu:~/fio-dev$ fio-latency/fio --name=test --ioengine=io_uring --number_ios=1 --rw=randread --size=1M
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.30-48-g26fa-dirty
Starting 1 process

test: (groupid=0, jobs=1): err= 0: pid=172220: Mon Jun  6 16:12:47 2022
  read: IOPS=1000, BW=4000KiB/s (4096kB/s)(4096B/1msec)
    slat (nsec): min=53701, max=53701, avg=53701.00, stdev= 0.00
    clat (nsec): min=259566, max=259566, avg=259566.00, stdev= 0.00
     lat (nsec): min=313267, max=313267, avg=313267.00, stdev= 0.00
<snip>
  1. Some asynchronous ioengines (io_uring, libaio) support trim commands but trim commands are synchronous operations. We have the flag FIO_ASYNCIO_SYNC_TRIM to identify these ioengines and fio follows the synchronous latency measurement path when trim commands are handled by these ioengines. This leads to ugly code like:
if (!td_ioengine_flagged(td, FIO_SYNCIO) &&
        (!td_ioengine_flagged(td, FIO_ASYNCIO_SYNC_TRIM) ||
         io_u->ddir != DDIR_TRIM)) {

We should have a helper to identify when these ioengines are handling trim commands to make this code easier to read.

  1. We should update the documentation to make explicit that the timer begins counting when fio initializes an io_u. Submission latency for asynchronous ioengines is more than the time the operating system takes to queue an I/O. Submission latency also includes housekeeping that fio does.

  2. The documentation for completion latency says that it is very close to zero for synchronous ioengines. This appears to be a typographical error and is likely meant to convey instead that submission latency is very close to zero.

  3. cscope is immensely useful for navigating fio's codebase. While writing this blog post it was very helpful to use cscope to be able to quickly see all of the places where start_time and issue_time were touched.

  4. How is latency measured with options like io_uring's sqthread_poll? This option has a polling thread in the kernel monitor the submission queue for new requests. We should think about how to handle this case appropriately for submission latency measurement and confirm that fio is doing the right thing.