# also check man 5 plfsrc for more info about this file
#
###############
# plfsrc rules
###############
#
# Official rule:
# The backends directive is IMMUTABLE. Once a mount point is defined, its
# backends can NEVER be changed. If multiple machines (e.g. FTAs and multiple
# clusters) share a PLFS mount point, they must all define the exact same set
# of backends in the exact same order.
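#
# For example (hypothetical paths), every machine sharing this mount point
# must carry identical lines in its plfsrc:
#mount_point /plfs/scratch
#backends /panfs/vol1/.plfs_store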
#
# All other directives are merely hints and can be set to any value without
# affecting correctness.
#
# Explanation:
# Once a mount point is assigned a set of backends, the mapping between that
# mount point and that set of backends can NEVER be changed. Changing backends
# for an existing mount point WILL ORPHAN files on that mount point. If multiple
# clusters and FTAs share a PLFS mount point, they MUST have the exact same set
# of backends, in the exact same order, defined for that mount point.
#
# NO OTHER values in the plfsrc affect correctness. They are just hints to help
# with performance. Therefore we can safely have different values for
# threadpool_size and num_hostdirs on different machines even if those machines
# share a mount point, and we can change those values whenever we want. Existing
# files will not be adversely affected in any way, although performance may
# change.
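#
# For example, changing "num_hostdirs 41" to "num_hostdirs 101" in an existing
# plfsrc is always safe; changing its backends line is not.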
###############
# num_hostdirs:
# this should be set to a number near the square root of the total number of
# PEs on the system
###############
#
# Official rule:
# Let N be the size in PEs of the largest job that will be run on the system.
# Let S be the sqrt of that.
# Let P be the lowest prime number greater than or equal to S.
# Set num_hostdirs to be P.
#
# For example, say that the largest job that can be run on cluster C is 10K PEs.
# The sqrt of 10,000 is 100, and the lowest prime greater than or equal to 100
# is 101, so num_hostdirs should be set to 101.
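#
# In plfsrc form, for that 10K-PE system:
#num_hostdirs 101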
#
# Explanation:
#
# PLFS stores file data and file metadata in a PLFS structure called a container,
# which is built from directories and files on an underlying file system. The
# number of files within a container is roughly equal to the number of PEs that
# wrote the file. We store these files within subdirs within the container. At
# exascale, we might have a billion cores, and we cannot put a billion files in
# a single directory. Therefore we use num_hostdirs to create a balanced internal
# hierarchy within the container. With 10K PEs, we will have about 100 hostdirs,
# each holding about 100 files. However, the distribution of files into subdirs
# is not deterministic; rather, we rely on hashing, and hashing distributes most
# evenly when the number of buckets is prime. That is why the rule asks for a
# prime number.
#
# [This means that for 1B cores (sqrt of 10^9 is about 31,623), we would have
# about 30K files per directory. We might need to move to a three-level
# hierarchy for exascale.]
#num_hostdirs 41
###############
# threadpool_size:
# this should be set to the number of threads you want PLFS to run on a machine
###############
#
# Official rule:
# On a compute node, set it to 1. On an FTA, set it to 16.
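#
# In plfsrc form, on compute nodes:
#threadpool_size 1
# and on FTAs:
#threadpool_size 16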
#
# Explanation:
# We observed that for two different read operations, FTA performance was very
# low because a lack of concurrency engaged too few PanFS components. One
# operation was reading all of the index files within the container, which PLFS
# does on the read open(). The second was reading from multiple data files
# within the container to fill a large logical read. We then added threads to
# increase concurrency and engage more PanFS hardware, and got much better
# results. But we assume that large jobs run on the compute nodes, so we don't
# want to add even more concurrency there: the large number of PEs should
# already provide enough concurrency to engage all of the PanFS hardware, and in
# fact adding more streams can hurt per-client PanFS performance. This is just a
# performance hint; it can be set to any value without affecting correctness.
#threadpool_size 1
# this must match where FUSE is mounted and the logical paths passed to ADIO
#mount_point /tmp/plfs
# these must be full paths; there can be just a single one
#backends /tmp/.plfs_store
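# For example (hypothetical paths), multiple backends are given as a
# comma-separated list:
#backends /panfs/vol1/.plfs_store,/panfs/vol2/.plfs_store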