fio-plot Could not find any (matching) JSON files in the specified directory + Solution (K8s) #64

Open
manjo-git opened this issue Nov 9, 2021 · 25 comments

Comments

@manjo-git

I generated JSON-formatted data by running the fio benchmark directly (not using the script provided here). Here is a snippet from the JSON file:

{
"fio version" : "fio-3.28",
"timestamp" : 1636479730,
"timestamp_ms" : 1636479730127,
"time" : "Tue Nov 9 17:42:10 2021",
"global options" : {
"directory" : "/scratch/fio-759f77f784-rvkxg",
"ioengine" : "libaio",
"direct" : "1",
"size" : "10G",
"iodepth" : "64",
"numjobs" : "100",
"bs" : "512K",
"rw" : "randrw",
"runtime" : "1800",
"name" : "rwlatency-test-job"
},
"jobs" : [
{
"jobname" : "testjob",
.
. < removed lines till the end of file >
.
}

I installed fio-plot for system-wide use with pip3 install fio-plot on Ubuntu 20.04.3 LTS and updated PATH to point to /home//.local/bin/. I stored the fio results file test2.json under /tmp/fioout/. When I run fio-plot on the directory that contains the JSON file, I get the following error message.

$ fio-plot -i ./fioout/ -T "rwlatency-test-job" -d 64 -n 100 -l -r randrw

Could not find any (matching) JSON files in the specified directory /tmp/fioout

Are the correct directories specified?

If so, please check the -d ([64]) -n ([100]) and -r (randrw) parameters.

Can someone please tell me what I might be doing wrong here?

@louwrentius
Owner

Hello! What is the name of the json output file?

@manjo-git
Author

manjo-git commented Nov 9, 2021

The JSON output file is called test2.json, and it is in the directory /tmp/fioout/.

@manjo-git
Author

Debugging this a bit, it looks like the script needs the file name to startswith() settings["rw"] and endswith() .json. It then split()s the name and reads iodepth and numjobs from it. So I changed the filename to randrwtest-64-100.json, because my rw is randrw, iodepth is 64, and numjobs is 100.
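
The matching rule described in this comment can be sketched roughly as follows. This only illustrates the behavior as observed here and is not fio-plot's actual code; the function name `filename_matches` and its parameters are hypothetical.

```python
def filename_matches(filename, rw, iodepths, numjobs):
    """Return True if `filename` looks like '<rw>...-<iodepth>-<numjobs>.json'.

    Hypothetical helper: it mirrors the rule described in this thread,
    not the real fio-plot implementation.
    """
    if not (filename.startswith(rw) and filename.endswith(".json")):
        return False
    # Drop the extension, then split on '-' to reach the two numeric fields.
    parts = filename[: -len(".json")].split("-")
    if len(parts) < 3:
        return False
    try:
        iodepth, jobs = int(parts[-2]), int(parts[-1])
    except ValueError:
        return False
    return iodepth in iodepths and jobs in numjobs


print(filename_matches("test2.json", "randrw", [64], [100]))
print(filename_matches("randrwtest-64-100.json", "randrw", [64], [100]))
```

Under this rule a file named test2.json can never match, regardless of its contents, because the lookup depends on the name alone.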

But now it looks for a key called 'job options', which my JSON file does not have. It looks like this is hard-coded in the script. Could you please explain what this is doing?

$ ls /tmp/FIOTEST/
randrwtest-64-100.json

$ fio-plot -i /tmp/FIOTEST/ --source "Manoj Iyer" -T "rwlatency-test-job" -d 64 -n 100 -l -r randrw
found filename /tmp/FIOTEST/randrwtest-64-100.json
Traceback (most recent call last):
File "/home/maniye01/.local/bin/fio-plot", line 8, in <module>
sys.exit(main())
File "/home/maniye01/.local/lib/python3.8/site-packages/fio_plot/__init__.py", line 27, in main
data = routing_dict[item]"get_data"
File "/home/maniye01/.local/lib/python3.8/site-packages/fio_plot/fiolib/getdata.py", line 42, in get_json_data
parsed_data = jsonimport.get_flat_json_mapping(settings, dataset)
File "/home/maniye01/.local/lib/python3.8/site-packages/fio_plot/fiolib/jsonimport.py", line 163, in get_flat_json_mapping
"iodepth": int(get_nested_value(record, m["iodepth"])),
File "/home/maniye01/.local/lib/python3.8/site-packages/fio_plot/fiolib/jsonimport.py", line 88, in get_nested_value
dictionary = dictionary[item]
KeyError: 'job options'

@manjo-git
Author

manjo-git commented Nov 9, 2021

randrw-64-100.json.txt

Here is my JSON file that is now correctly named. But the script is looking for "job options" which is not in the json file. Can you please tell me what needs to be modified to make it work ?

@manjo-git
Author

It looks like you require per-job iodepth options and not just global ones. Browsing around, it looks like a fix for this was proposed in #2, but that was for an older version of fio-plot. Your newer version handles this in jsonimport.py, and it is broken there.

@manjo-git
Author

jsonimport.py: looks like the following needs to be defined per-job in the fio job config file:
"iodepth": (jobOptions + ["iodepth"]),
"numjobs": (jobOptions + ["numjobs"]),
"bs": (jobOptions + ["bs"]),
"rw": (jobOptions + ["rw"]),
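
One possible way to tolerate JSON where these options appear only under "global options" is a lookup that falls back to the global section. The following is a minimal sketch, assuming the JSON layout shown earlier in this thread; `get_option` is a hypothetical helper and is not part of fio-plot.

```python
import json


def get_option(data, job, name):
    """Look up `name` in the job's "job options", then in "global options".

    Hypothetical helper; raises KeyError if the option is in neither place.
    """
    job_options = job.get("job options", {})
    if name in job_options:
        return job_options[name]
    return data.get("global options", {})[name]


# Trimmed-down version of the JSON from the first comment: no "job options" key.
sample = json.loads("""
{
  "global options": {"iodepth": "64", "numjobs": "100", "bs": "512K", "rw": "randrw"},
  "jobs": [{"jobname": "testjob"}]
}
""")

job = sample["jobs"][0]
for key in ("iodepth", "numjobs", "bs", "rw"):
    print(key, "=", get_option(sample, job, key))
```

Per-job values take precedence in this sketch, which matches how a fio job section overrides [global].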

@louwrentius
Owner

Thanks for figuring out what the root cause is and showing the old code that did address the issue. I have to spend some time getting my head into the code and validating the change against a few known-good 'datasets' before I release it.

I'll try and see if I can fix it this week.

In the meantime, as a temporary workaround: if you use the bench-fio script to run the benchmark, I'm quite positive that its output will be handled properly. Not sure if that suits your needs.

@manjo-git
Author

That is fine. I think I can modify my fio config file to fit what the script needs. Also, can you share what your typical fio config looks like, and how you specify iodepth and numjobs values of 1, 2 ... 128? Can you please post your config?

@paul-mccutcheon

The bench-fio script output also cannot be processed; I received the same error:

[root@gold384 bin]# ./fio-plot -i /root/fio-plot/bin/RAID_ARRAY/sdb/4k --source "https://louwrentius.com" -T "Gold384" -l -r randread

Could not find any (matching) JSON files in the specified directory /root/fio-plot/bin/RAID_ARRAY/sdb/4k

Are the correct directories specified?

If so, please check the -d ([1, 2, 4, 8, 16, 32, 64]) -n ([1]) and -r (randread) parameters.

[root@gold384 bin]# ls -al RAID_ARRAY/sdb/4k/*json
-rw-r--r--. 1 root root 6946 Nov 10 22:50 RAID_ARRAY/sdb/4k/randread-16-8.json
-rw-r--r--. 1 root root 6926 Nov 10 22:46 RAID_ARRAY/sdb/4k/randread-1-8.json
-rw-r--r--. 1 root root 6926 Nov 10 22:47 RAID_ARRAY/sdb/4k/randread-2-8.json
-rw-r--r--. 1 root root 6953 Nov 10 22:51 RAID_ARRAY/sdb/4k/randread-32-8.json
-rw-r--r--. 1 root root 6931 Nov 10 22:48 RAID_ARRAY/sdb/4k/randread-4-8.json
-rw-r--r--. 1 root root 6972 Nov 10 22:52 RAID_ARRAY/sdb/4k/randread-64-8.json
-rw-r--r--. 1 root root 6935 Nov 10 22:49 RAID_ARRAY/sdb/4k/randread-8-8.json
-rw-r--r--. 1 root root 6954 Nov 10 22:57 RAID_ARRAY/sdb/4k/randwrite-16-8.json
-rw-r--r--. 1 root root 6930 Nov 10 22:53 RAID_ARRAY/sdb/4k/randwrite-1-8.json
-rw-r--r--. 1 root root 6932 Nov 10 22:54 RAID_ARRAY/sdb/4k/randwrite-2-8.json
-rw-r--r--. 1 root root 6964 Nov 10 22:58 RAID_ARRAY/sdb/4k/randwrite-32-8.json
-rw-r--r--. 1 root root 6941 Nov 10 22:55 RAID_ARRAY/sdb/4k/randwrite-4-8.json
-rw-r--r--. 1 root root 6984 Nov 10 22:59 RAID_ARRAY/sdb/4k/randwrite-64-8.json
-rw-r--r--. 1 root root 6948 Nov 10 22:56 RAID_ARRAY/sdb/4k/randwrite-8-8.json
[root@gold384 bin]#

@paul-mccutcheon

[root@gold384 4k]# more randread-16-8.json
{
"fio version" : "fio-3.19",
"timestamp" : 1636581018,
"timestamp_ms" : 1636581018574,
"time" : "Wed Nov 10 22:50:18 2021",
"global options" : {
"runtime" : "60",
"filename" : "/dev/sdb"
},
"jobs" : [
{
"jobname" : "iotest",
"groupid" : 0,
"error" : 0,
"eta" : 0,
"elapsed" : 61,
"job options" : {
"rw" : "randread",
"bs" : "4k",
"ioengine" : "libaio",
"iodepth" : "16",
"numjobs" : "8",
"direct" : "1",
"group_reporting" : "1",
"invalidate" : "1",
"loops" : "1",
"write_bw_log" : "RAID_ARRAY/sdb/4k/randread-iodepth-16-numjobs-8",
"write_lat_log" : "RAID_ARRAY/sdb/4k/randread-iodepth-16-numjobs-8",
"write_iops_log" : "RAID_ARRAY/sdb/4k/randread-iodepth-16-numjobs-8",
"log_avg_msec" : "500"
},
"read" : {
"io_bytes" : 18102784000,
"io_kbytes" : 17678500,
"bw_bytes" : 301682898,
"bw" : 294612,
"iops" : 73653.051362,
"runtime" : 60006,
"total_ios" : 4419625,
"short_ios" : 0,
"drop_ios" : 0,
"slat_ns" : {
"min" : 2632,

@louwrentius
Owner

Hi, so what I recommend is using the included bench-fio tool for running the benchmarks. I'm not sure what exactly you want to test. Normally you start out with iodepth=1 and numjobs=1 and go from there, depending on what a realistic workload would look like for you.

By default, the bench-fio tool does numjobs/iodepth 1 2 4 8 16 32 64 to get a rough impression.

@paul-mccutcheon

Thanks Louwrentius - I had actually run the bench-fio tool to gather the stats originally. fio-plot is still unable to process the data at present. If you need any more info, let me know.

@paul-mccutcheon

The problem was actually the fio_plot usage examples beneath the charts; I was following those. Anyway, it does actually work nicely. One question that has just arisen during our testing is whether it is possible to run sequential I/O tests with the bench-fio script. It appears to have only random r/w as the default setting?

thanks again - the plots look really nice!

@paul-mccutcheon

So, I am assuming sequential mode is actually referred to as simply "read" or "write" (as it is in fio)? It would be nice to include this in your help pages, just to clarify. :)

@manjo-git
Author

manjo-git commented Nov 11, 2021

@paul-mccutcheon can you please share your command line that worked for you? I have rw: randrw which requires a -f read/write filter.

@louwrentius
Owner

So, I am assuming sequential mode is actually referred to as simply "read" or "write" (as it is in fio)? It would be nice to include this in your help pages, just to clarify. :)

Yes that is correct.

The previous issue with 'could not find json' is that you also need to specify the appropriate -d and -n parameters. I got the impression that you figured that one out in the meantime. I'm curious where in the manual it was unclear or confusing how this works.
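
To see which iodepth and numjobs values a result directory actually contains, and therefore which values the error message's -d and -n lists need to cover, the filenames can be summarized. This is a rough sketch, assuming the `<rw>-<iodepth>-<numjobs>.json` naming visible in the directory listing above; `summarize` and `PATTERN` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed naming scheme from this thread: <rw>-<iodepth>-<numjobs>.json
PATTERN = re.compile(r"^(?P<rw>[a-z]+)-(?P<iodepth>\d+)-(?P<numjobs>\d+)\.json$")


def summarize(filenames):
    """Group the iodepth and numjobs values found in result filenames by rw mode."""
    found = defaultdict(lambda: {"iodepth": set(), "numjobs": set()})
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            entry = found[match["rw"]]
            entry["iodepth"].add(int(match["iodepth"]))
            entry["numjobs"].add(int(match["numjobs"]))
    # Convert sets to sorted lists so the result is easy to read and compare.
    return {rw: {k: sorted(v) for k, v in values.items()} for rw, values in found.items()}


names = ["randread-16-8.json", "randread-1-8.json", "randwrite-2-8.json", "notes.txt"]
for rw, values in summarize(names).items():
    print(rw, "iodepth:", values["iodepth"], "numjobs:", values["numjobs"])
```

For the listing posted earlier, every file has numjobs 8, while the error message shows that fio-plot was looking for numjobs 1.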

With regards!

@louwrentius
Owner

louwrentius commented Nov 11, 2021

@manjo-git I'm curious about the exact fio command line you used to generate that JSON. If you wouldn't mind sharing it, that would be helpful.

If you need help with that randrw stuff and the -f filter, please share your bench-fio command and I can help.

@manjo-git
Author

manjo-git commented Nov 12, 2021

@louwrentius I am explaining my whole setup just for clarity, but you could as well use the relevant parts of the config file and run this stand alone with fio. I have a 4 node kube cluster running rook+ceph with 24TB storage. I have 3 pods running fio, and the docker container for this fio can be found here: https://hub.docker.com/repository/docker/manjo8/fio
I have 2 yaml files that I use to launch fio tests. These yamls were ideas borrowed from https://joshua-robinson.medium.com/storage-benchmarking-with-fio-in-kubernetes-14cf29dc5375

  1. config.yaml
  2. fio tests yaml

config
kind: ConfigMap
apiVersion: v1
metadata:
  name: fio-job-config
  namespace: rook-ceph
data:
  fio.job: |-
    [global]
    ioengine=libaio
    direct=1
    size=10G
    group_reporting
    time_based
    runtime=180
    name=fio-rand-rw

    [testjob1]
    iodepth=1
    numjobs=1
    bs=4k
    rw=randrw
    rwmixread=60
    rwmixwrite=40
    nrfiles=100

fio test yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fio
  namespace: rook-ceph
  labels:
    app: fio
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fio
  template:
    metadata:
      labels:
        app: fio
    spec:
      containers:
      - name: fio
        image: manjo8/fio:latest
        command: ["sh"]
        args: ["-c", "echo ${HOSTNAME} && mkdir -p /scratch/${HOSTNAME} && fio /configs/fio.job --eta=never --directory=/scratch/${HOSTNAME} --output-format=json"]
        volumeMounts:
        - name: fio-config-vol
          mountPath: /configs
        - name: fio-data
          mountPath: /scratch
        imagePullPolicy: Always
      restartPolicy: Always
      volumes:
      - name: fio-config-vol
        configMap:
          name: fio-job-config
      - name: fio-data
        persistentVolumeClaim:
          claimName: my-pvc

The tests are launched using kubectl apply -f config.yaml ; kubectl apply -f fio-deployment.yaml. I collect data for iodepth 1 - 128 and numjobs 1 - 128. I store the JSON output in files named the way the script needs, for example randrw-1-1.json, randrw-2-1.json, etc., and for numjobs randrw-128-[1-128].json. Then I run your fio-plot script. I was expecting fio-plot to pick up all the JSON files and generate the graphs for me.

@manjo-git
Author

manjo-git commented Nov 12, 2021

@louwrentius let me generate files like randrw-1-1.json, randrw-2-1.json, randrw-3-1.json ... randrw-64-1.json, i.e. keep numjobs constant and iterate over iodepth. Then keep iodepth at 64 and iterate over numjobs 1 - 100, generate those JSON files, and run your script. That seems to be the way bench-fio generates those JSON files. It looks like the way I was generating data might not be what fio-plot expects: I was iterating over iodepth and numjobs at the same time.

@louwrentius
Owner

It is really about which graph you want to view. The 3D graph can do ranges of iodepth and numjobs at the same time. The other graphs are only really usable if you fix either numjobs or iodepth. I think this is covered in the manual.

I am quite curious what the exact bench-fio or fio command line was that was used to generate the json that fio-plot could not parse. If you could share this, I would be quite interested. Even if it is maybe not the right way to go about things.

@manjo-git
Author

manjo-git commented Nov 19, 2021

FYI, for those who are not using the bench-fio script to generate the files that fio-plot needs, but are running the fio benchmark directly (or as a pod in k8s), here is what you need.

  1. Your filename has to follow a specific format. In my case rw is randrw, so my file name is randrw-{iodepth}-{numjobs}.json.
  2. You can keep numjobs constant, say numjobs=1, and iterate over iodepth values of 1,2,4,8,16,32,64. Or keep iodepth constant, e.g. iodepth=32, and iterate over numjobs values of 1,2,4,8,16,32,64.
  3. If, for example, your iodepth=32 and numjobs=64, store the resulting JSON output in a file called randrw-32-64.json.
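
The naming scheme and the two kinds of sweep in the list above can be written out as a short sketch. The helper name `result_filename` and the `VALUES` list are illustrative only.

```python
VALUES = [1, 2, 4, 8, 16, 32, 64]


def result_filename(rw, iodepth, numjobs):
    """Build the '<rw>-<iodepth>-<numjobs>.json' name described above."""
    return f"{rw}-{iodepth}-{numjobs}.json"


# Vary iodepth with numjobs fixed at 1.
iodepth_sweep = [result_filename("randrw", depth, 1) for depth in VALUES]
# Vary numjobs with iodepth fixed at 32.
numjobs_sweep = [result_filename("randrw", 32, jobs) for jobs in VALUES]

print(iodepth_sweep)
print(numjobs_sweep)
```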

Here is my sample config. I am running fio as a pod in k8s on a rook/ceph cluster.

kind: ConfigMap
apiVersion: v1
metadata:
  name: fio-job-config
  namespace: rook-ceph
data:
  fio.job: |-
    [global]
    ioengine=libaio
    direct=1
    size=10G
    group_reporting
    time_based
    runtime=180
    name=fio-rand-rw

    [testjob1]
    iodepth=32
    numjobs=64
    bs=4k
    rw=randrw
    rwmixread=60
    rwmixwrite=40
    nrfiles=100

What I found is that you cannot have multiple jobs in the same file. You need exactly one job per config; change the value of iodepth/numjobs, run each configuration separately, and capture the results separately. After this you can run the fio-plot script as the manual says, and it will generate nice graphs.
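
Since each iodepth/numjobs combination needs its own single-job config and its own output file, the job files can be generated from a template. This is a minimal sketch based on the fio.job contents above; `render_job` and `plan_runs` are hypothetical helpers, and actually running fio and saving its output is left to the caller.

```python
JOB_TEMPLATE = """\
[global]
ioengine=libaio
direct=1
size=10G
group_reporting
time_based
runtime=180
name=fio-rand-rw

[testjob1]
iodepth={iodepth}
numjobs={numjobs}
bs=4k
rw=randrw
rwmixread=60
rwmixwrite=40
nrfiles=100
"""


def render_job(iodepth, numjobs):
    """Return the text of a fio job file containing exactly one job section."""
    return JOB_TEMPLATE.format(iodepth=iodepth, numjobs=numjobs)


def plan_runs(iodepths, numjobs_values):
    """Yield (job_text, output_name) pairs, one per separate fio run."""
    for numjobs in numjobs_values:
        for iodepth in iodepths:
            yield render_job(iodepth, numjobs), f"randrw-{iodepth}-{numjobs}.json"


for text, output in plan_runs([1, 2, 4], [1]):
    print(output)
```

Each pair corresponds to one fio invocation whose JSON output would be saved under the given name.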

@louwrentius louwrentius changed the title fio-plot Could not find any (matching) JSON files in the specified directory fio-plot Could not find any (matching) JSON files in the specified directory + Solution Feb 3, 2022
@louwrentius louwrentius changed the title fio-plot Could not find any (matching) JSON files in the specified directory + Solution fio-plot Could not find any (matching) JSON files in the specified directory + Solution (K8s) Feb 3, 2022
@louwrentius
Owner

louwrentius commented Feb 3, 2022

I will leave this open because of the solution. I consider this also a bug because the code is such that it should be able to handle multiple jobs in the json output.

@leonanu

leonanu commented Aug 30, 2022

I still can't get this to work...

OS: Ubuntu 22.04 x86_64
fio: 3.28

bench.ini:

[benchfio]
target = /dev/sdb
output = EXASCEND-480G-SATA
type = device
size = 10G
iodepth = 1,2,4,8,16,32,64
numjobs = 1,2,4,8,16,32,64
direct = 1
engine = libaio
precondition = False
precondition_repeat = False
runtime = 30
destructive = True

# bench-fio ./bench.ini
This works well and there are .json & .log files in EXASCEND-480G-SATA directory.

But I got this after running fio-plot:
# fio-plot -i EXASCEND-480G-SATA --source "http://www.example.com" -T "EXASCEND 480G SATA SSD" -L -t iops -r randrw

Could not find any (matching) JSON files in the specified directory /home/workspace/fio-test/EXASCEND-480G-SATA

Are the correct directories specified?

If so, please check the -d ([1, 2, 4, 8, 16, 32, 64]) -n ([1, 2, 4, 8, 16, 32, 64]) and -r (randrw) parameters.

I read all the information above, but I still can't get this to work.

@leonanu

leonanu commented Aug 30, 2022

I found that this problem is caused by 'fio_plot/fiolib/jsonimport.py'.

The function list_json_files cannot list the JSON files in the result directory correctly.

directory["files"] is empty:
{'directory': '/home/workspace/fio-test/EXASCEND-480G-SATA', 'files': []}

@leonanu

leonanu commented Aug 30, 2022

Sorry... After I pointed --input-directory to EXASCEND-480G-SATA/sdb/4k, everything worked.
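
A quick way to find which subdirectory to pass as the input directory is to walk the benchmark output and list the folders that directly contain .json files. The sketch below uses only the Python standard library; `dirs_with_json` is a hypothetical helper, and the temporary directory merely imitates the sdb/4k layout from this comment.

```python
import os
import tempfile


def dirs_with_json(root):
    """Return the directories under `root` that directly contain .json files."""
    hits = []
    for current, _subdirs, files in os.walk(root):
        if any(name.endswith(".json") for name in files):
            hits.append(current)
    return sorted(hits)


# Demo on a throwaway tree shaped like <output>/sdb/4k/<result>.json
with tempfile.TemporaryDirectory() as root:
    leaf = os.path.join(root, "sdb", "4k")
    os.makedirs(leaf)
    open(os.path.join(leaf, "randread-1-1.json"), "w").close()
    for path in dirs_with_json(root):
        print(os.path.relpath(path, root))
```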


4 participants