What does /proc/PID/ have?
==========================
/proc/[pid]
There is a numerical subdirectory for each running process; the subdirectory is named by the process ID.
Each /proc/[pid] subdirectory contains the pseudo-files and directories described below. These files are normally owned by the effective user and effective group ID
of the process. However, as a security measure, the ownership is made root:root if the process's "dumpable" attribute is set to a value other than 1. This attribute
may change for the following reasons:
* The attribute was explicitly set via the prctl(2) PR_SET_DUMPABLE operation.
* The attribute was reset to the value in the file /proc/sys/fs/suid_dumpable (described below), for the reasons described in prctl(2).
Resetting the "dumpable" attribute to 1 reverts the ownership of the /proc/[pid]/* files to the process's effective UID and effective GID.
attr/
-----
The files in this directory provide an API for security modules. The contents of this directory are files that can be read and written in order to set security-
related attributes. This directory was added to support SELinux, but the intention was that the API be general enough to support other security modules. For the
purpose of explanation, examples of how SELinux uses these files are provided below.
This directory is present only if the kernel was configured with CONFIG_SECURITY.
autogroup
---------
Process's Autogroup (Task Group) Membership
Since Linux 2.6.38, the kernel provides a feature known as autogrouping to improve interactive desktop performance in the face of multiprocess, CPU-intensive workloads such
as building the Linux kernel with large numbers of parallel build processes (i.e., the make(1) -j flag).
This feature operates in conjunction with the CFS scheduler and requires a kernel that is configured with CONFIG_SCHED_AUTOGROUP. On a running system, this feature is
enabled or disabled via the file /proc/sys/kernel/sched_autogroup_enabled; a value of 0 disables the feature, while a value of 1 enables it. The default value in this file
is 1, unless the kernel was booted with the noautogroup parameter.
A new autogroup is created when a new session is created via setsid(2); this happens, for example, when a new terminal window is started. A new process created by fork(2)
inherits its parent's autogroup membership. Thus, all of the processes in a session are members of the same autogroup. An autogroup is automatically destroyed when the
last process in the group terminates.
When autogrouping is enabled, all of the members of an autogroup are placed in the same kernel scheduler "task group". The CFS scheduler employs an algorithm that equalizes
the distribution of CPU cycles across task groups. The benefits of this for interactive desktop performance can be described via the following example.
Suppose that there are two autogroups competing for the same CPU (i.e., presume either a single CPU system or the use of taskset(1) to confine all the processes to the same
CPU on an SMP system). The first group contains ten CPU-bound processes from a kernel build started with make -j10. The other contains a single CPU-bound process: a video
player. The effect of autogrouping is that the two groups will each receive half of the CPU cycles. That is, the video player will receive 50% of the CPU cycles, rather
than just 9% of the cycles, which would likely lead to degraded video playback. The situation on an SMP system is more complex, but the general effect is the same: the
scheduler distributes CPU cycles across task groups such that an autogroup that contains a large number of CPU-bound processes does not end up hogging CPU cycles at the
expense of the other jobs on the system.
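The 50% versus 9% figures in the example above follow from simple fair-share arithmetic. The sketch below is only an illustrative model, assuming a single CPU, equal nice values everywhere, and purely CPU-bound processes; the function name is made up for this example and is not a kernel interface.

```python
def cpu_share_per_process(group_sizes, autogroup_enabled):
    """Return the CPU fraction that one process in each group receives.

    Hypothetical model: one CPU, all processes CPU-bound, all nice values equal.
    """
    if autogroup_enabled:
        # The CPU is split evenly across groups, then evenly inside each group.
        return [1.0 / len(group_sizes) / n for n in group_sizes]
    # Without autogrouping, all processes compete as equals.
    total = sum(group_sizes)
    return [1.0 / total for _ in group_sizes]


# Ten "make -j10" jobs in one session, a single video player in another.
for enabled in (True, False):
    build, player = cpu_share_per_process([10, 1], autogroup_enabled=enabled)
    print(f"autogroup={enabled}: player {player:.0%}, each build job {build:.0%}")
```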
A process's autogroup (task group) membership can be viewed via the file /proc/[pid]/autogroup:
$ cat /proc/1/autogroup
/autogroup-1 nice 0
This file can also be used to modify the CPU bandwidth allocated to an autogroup. This is done by writing a number in the "nice" range to the file to set the autogroup's
nice value. The allowed range is from +19 (low priority) to -20 (high priority). (Writing values outside of this range causes write(2) to fail with the error EINVAL.)
The autogroup nice setting has the same meaning as the process nice value, but applies to distribution of CPU cycles to the autogroup as a whole, based on the relative nice
values of other autogroups. For a process inside an autogroup, the CPU cycles that it receives will be a product of the autogroup's nice value (compared to other
autogroups) and the process's nice value (compared to other processes in the same autogroup).
The use of the cgroups(7) CPU controller to place processes in cgroups other than the root CPU cgroup overrides the effect of autogrouping.
The autogroup feature groups only processes scheduled under non-real-time policies (SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE). It does not group processes scheduled under
real-time and deadline policies. Those processes are scheduled according to the rules described in sched(7).
auxv
----
This contains the contents of the ELF interpreter information passed to the process at exec time. The format is one unsigned long ID plus one unsigned long value for
each entry. The last entry contains two zeros. See also getauxval(3).
Permission to access this file is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
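The entry layout described above (pairs of unsigned long, terminated by a pair of zeros) can be decoded as in the following sketch. It assumes the entries use the native byte order; the word_size parameter is an assumption of this example, needed because a 32-bit process has 4-byte entries even on a 64-bit kernel.

```python
import struct


def parse_auxv(data, word_size=struct.calcsize("L")):
    """Decode raw /proc/[pid]/auxv bytes into a list of (id, value) pairs."""
    fmt = {4: "=II", 8: "=QQ"}[word_size]
    entries = []
    for a_type, a_val in struct.iter_unpack(fmt, data):
        if a_type == 0 and a_val == 0:  # terminating entry: two zeros
            break
        entries.append((a_type, a_val))
    return entries


# Hand-built sample: ID 6 is AT_PAGESZ and ID 11 is AT_UID in <elf.h>.
sample = struct.pack("=QQQQQQ", 6, 4096, 11, 1000, 0, 0)
print(parse_auxv(sample, word_size=8))
```

On a live system the input would come from open("/proc/self/auxv", "rb").read(); getauxval(3) remains the simpler choice when only the current process is of interest.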
cgroup
------
See cgroups(7).
clear_refs
----------
This is a write-only file, writable only by the owner of the process.
The following values may be written to the file:
1 (since Linux 2.6.22)
Reset the PG_Referenced and ACCESSED/YOUNG bits for all the pages associated with the process. (Before kernel 2.6.32, writing any nonzero value to this file
had this effect.)
2 (since Linux 2.6.32)
Reset the PG_Referenced and ACCESSED/YOUNG bits for all anonymous pages associated with the process.
3 (since Linux 2.6.32)
Reset the PG_Referenced and ACCESSED/YOUNG bits for all file-mapped pages associated with the process.
Clearing the PG_Referenced and ACCESSED/YOUNG bits provides a method to measure approximately how much memory a process is using. One first inspects the values in
the "Referenced" fields for the VMAs shown in /proc/[pid]/smaps to get an idea of the memory footprint of the process. One then clears the PG_Referenced and
ACCESSED/YOUNG bits and, after some measured time interval, once again inspects the values in the "Referenced" fields to get an idea of the change in memory footprint
of the process during the measured interval. If one is interested only in inspecting the selected mapping types, then the value 2 or 3 can be used instead of 1.
Further values can be written to affect different properties:
4 (since Linux 3.11)
Clear the soft-dirty bit for all the pages associated with the process. This is used (in conjunction with /proc/[pid]/pagemap) by the checkpoint/restore
system to discover which pages of a process have been dirtied since the file /proc/[pid]/clear_refs was written to.
5 (since Linux 4.0)
Reset the peak resident set size ("high water mark") to the process's current resident set size value.
Writing any value to /proc/[pid]/clear_refs other than those listed above has no effect.
The /proc/[pid]/clear_refs file is present only if the CONFIG_PROC_PAGE_MONITOR kernel configuration option is enabled.
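The measurement procedure described above (clear the bits, wait, then sum the "Referenced" fields) might be scripted roughly as follows. This is a sketch: it assumes the usual "Referenced:   N kB" line layout in /proc/[pid]/smaps, and both function names are invented for this example.

```python
import re
import time


def total_referenced_kb(smaps_text):
    """Sum the "Referenced:" fields (in kB) over all VMAs in smaps output."""
    matches = re.findall(r"^Referenced:\s+(\d+) kB", smaps_text, re.M)
    return sum(int(kb) for kb in matches)


def referenced_after_interval(pid, seconds, mode="1"):
    """Clear the referenced bits, wait, and report how much was touched since.

    mode is the value written to clear_refs:
    "1" = all pages, "2" = anonymous pages, "3" = file-mapped pages.
    """
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write(mode)
    time.sleep(seconds)
    with open(f"/proc/{pid}/smaps") as f:
        return total_referenced_kb(f.read())


# The parsing step on a small hand-written smaps fragment.
fragment = "Rss:        8 kB\nReferenced: 4 kB\nRss:       12 kB\nReferenced: 12 kB\n"
print(total_referenced_kb(fragment), "kB referenced")
```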
cmdline
-------
This read-only file holds the complete command line for the process, unless the process is a zombie. In the latter case, there is nothing in this file: that is, a
read on this file will return 0 characters. The command-line arguments appear in this file as a set of strings separated by null bytes ('\0'), with a further null
byte after the last string.
When you read /proc/PID/cmdline, the kernel simply copies out the range of the process's memory where the argv[] strings were placed at process startup. That is why
the contents are NUL-separated, and also why they change if the process overwrites that memory.
For example, I once hit a problem where /proc/PID/cmdline contained an incorrect value. It turned out that the program was writing through an uninitialized char *s.
The code looked like this:
{
char *s;
...
sprintf(s, "%d\n", getpid());
...
}
The uninitialized pointer happened to point into the memory that holds the argv strings, so sprintf() overwrote them.
To fix the problem, use a real buffer such as char s[16] instead of char *s.
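The NUL-separated layout of cmdline described above can be split back into an argument list as in this minimal sketch. The function name is chosen for this example only.

```python
def split_cmdline(raw):
    """Split raw /proc/[pid]/cmdline bytes into a list of argument strings."""
    if not raw:  # zombie process: the file is empty
        return []
    if raw.endswith(b"\0"):  # drop the null byte after the last string
        raw = raw[:-1]
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


print(split_cmdline(b"/bin/sh\0-c\0echo hello\0"))
```

Splitting on the null byte rather than on whitespace keeps arguments that contain spaces, such as "echo hello", in one piece.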
comm
----
This file exposes the process's comm value, that is, the command name associated with the process. Different threads in the same process may have different comm values,
accessible via /proc/[pid]/task/[tid]/comm. A thread may modify its comm value, or that of any other thread in the same thread group (see the discussion of
CLONE_THREAD in clone(2)), by writing to the file /proc/self/task/[tid]/comm. Strings longer than TASK_COMM_LEN (16) characters (including the terminating null byte) are silently truncated.
This file provides a superset of the prctl(2) PR_SET_NAME and PR_GET_NAME operations, and is employed by pthread_setname_np(3) when used to rename threads other than
the caller.
coredump_filter
---------------
See core(5).
cpuset
------
See cpuset(7).
cwd@
----
This is a symbolic link to the current working directory of the process. To find out the current working directory of process 20, for instance, you can do this:
$ cd /proc/20/cwd; /bin/pwd
Note that the pwd command is often a shell built-in, and might not work properly. In bash(1), you may use pwd -P.
In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
Permission to dereference or read (readlink(2)) this symbolic link is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
environ
-------
This file contains the initial environment that was set when the currently executing program was started via execve(2). The entries are separated by null bytes
('\0'), and there may be a null byte at the end. Thus, to print out the environment of process 1, you would do:
$ strings /proc/1/environ
If, after an execve(2), the process modifies its environment (e.g., by calling functions such as putenv(3) or modifying the environ(7) variable directly), this file
will not reflect those changes.
Furthermore, a process may change the memory location that this file refers to via prctl(2) operations such as PR_SET_MM_ENV_START.
Permission to access this file is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
Note that strings(1) by default treats a newline as the end of a string, so an environment value that contains '\n' is printed as several separate lines. Use the '-w' option (strings -w /proc/PID/environ) to keep such values intact.
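Splitting on null bytes directly avoids the newline problem altogether. The following sketch turns the raw contents of environ into a dictionary; the function name is hypothetical.

```python
def parse_environ(raw):
    """Turn raw /proc/[pid]/environ bytes into a dict of name -> value."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:  # a trailing null byte yields an empty piece
            continue
        name, _, value = entry.partition(b"=")
        env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


# A value containing a newline stays in one piece.
print(parse_environ(b"HOME=/root\0MSG=line1\nline2\0"))
```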
exe@
----
Under Linux 2.2 and later, this file is a symbolic link containing the actual pathname of the executed command. This symbolic link can be dereferenced normally;
attempting to open it will open the executable. You can even type /proc/[pid]/exe to run another copy of the same executable that is being run by process [pid]. If
the pathname has been unlinked, the symbolic link will contain the string '(deleted)' appended to the original pathname. In a multithreaded process, the contents of
this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
Permission to dereference or read (readlink(2)) this symbolic link is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
Under Linux 2.0 and earlier, /proc/[pid]/exe is a pointer to the binary which was executed, and appears as a symbolic link. A readlink(2) call on this file under
Linux 2.0 returns a string in the format:
[device]:inode
For example, [0301]:1502 would be inode 1502 on device major 03 (IDE, MFM, etc. drives) minor 01 (first partition on the first drive).
find(1) with the -inum option can be used to locate the file.
fd/
---
This is a subdirectory containing one entry for each file which the process has open, named by its file descriptor, and which is a symbolic link to the actual file.
Thus, 0 is standard input, 1 standard output, 2 standard error, and so on.
For file descriptors for pipes and sockets, the entries will be symbolic links whose content is the file type with the inode. A readlink(2) call on this file returns
a string in the format:
type:[inode]
For example, socket:[2248868] will be a socket and its inode is 2248868. For sockets, that inode can be used to find more information in one of the files under
/proc/net/.
For file descriptors that have no corresponding inode (e.g., file descriptors produced by bpf(2), epoll_create(2), eventfd(2), inotify_init(2), perf_event_open(2),
signalfd(2), timerfd_create(2), and userfaultfd(2)), the entry will be a symbolic link with contents of the form
anon_inode:<file-type>
In many cases (but not all), the file-type is surrounded by square brackets.
For example, an epoll file descriptor will have a symbolic link whose content is the string anon_inode:[eventpoll].
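The link formats described above (a plain pathname, type:[inode], and anon_inode:<file-type>) can be told apart with a small classifier. This is a sketch based only on those three shapes; the function name and the returned tuple layout are assumptions of this example.

```python
import re


def classify_fd_target(target):
    """Classify the string that readlink(2) returns for a /proc/[pid]/fd entry."""
    prefix = "anon_inode:"
    if target.startswith(prefix):
        # The file type is often, but not always, wrapped in square brackets.
        return ("anon_inode", target[len(prefix):].strip("[]"))
    m = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if m:  # pipes and sockets: type:[inode]
        return (m.group(1), int(m.group(2)))
    return ("path", target)


for t in ("socket:[2248868]", "anon_inode:[eventpoll]", "/dev/pts/0"):
    print(t, "->", classify_fd_target(t))
```

On a live system the input strings would come from os.readlink() on each entry of /proc/[pid]/fd, subject to the permission check mentioned in this section.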
In a multithreaded process, the contents of this directory are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
Programs that take a filename as a command-line argument, but don't take input from standard input if no argument is supplied, and programs that write to a file named
as a command-line argument, but don't send their output to standard output if no argument is supplied, can nevertheless be made to use standard input or standard output
by using /proc/[pid]/fd files as command-line arguments. For example, assuming that -i is the flag designating an input file and -o is the flag designating an
output file:
$ foobar -i /proc/self/fd/0 -o /proc/self/fd/1 ...
and you have a working filter.
/proc/self/fd/N is approximately the same as /dev/fd/N in some UNIX and UNIX-like systems. Most Linux MAKEDEV scripts symbolically link /dev/fd to /proc/self/fd, in
fact.
Most systems provide symbolic links /dev/stdin, /dev/stdout, and /dev/stderr, which respectively link to the files 0, 1, and 2 in /proc/self/fd. Thus the example
command above could be written as:
$ foobar -i /dev/stdin -o /dev/stdout ...
Permission to dereference or read (readlink(2)) the symbolic links in this directory is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see
ptrace(2).
fdinfo/
-------
This is a subdirectory containing one entry for each file which the process has open, named by its file descriptor. The files in this directory are readable only by
the owner of the process. The contents of each file can be read to obtain information about the corresponding file descriptor. The content depends on the type of
file referred to by the corresponding file descriptor.
For regular files and directories, we see something like:
$ cat /proc/12015/fdinfo/4
pos: 1000
flags: 01002002
mnt_id: 21
The fields are as follows:
pos This is a decimal number showing the file offset.
flags This is an octal number that displays the file access mode and file status flags (see open(2)). If the close-on-exec file descriptor flag is set, then flags
will also include the value O_CLOEXEC.
Before Linux 3.1, this field incorrectly displayed the setting of O_CLOEXEC at the time the file was opened, rather than the current setting of the close-on-
exec flag.
mnt_id This field, present since Linux 3.15, is the ID of the mount point containing this file. See the description of /proc/[pid]/mountinfo.
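The three common fields can be parsed as in the sketch below, which honors the bases given above (decimal pos, octal flags). The function name and the ACCESS_MODES table are made up for this example; the mask 0o3 is the value of O_ACCMODE on Linux.

```python
def parse_fdinfo(text):
    """Parse the common pos/flags/mnt_id fields of a /proc/[pid]/fdinfo file."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if key == "pos":
            info["pos"] = int(value)  # decimal file offset
        elif key == "flags":
            info["flags"] = int(value, 8)  # octal, as in open(2)
        elif key == "mnt_id":
            info["mnt_id"] = int(value)
    return info


ACCESS_MODES = {0: "O_RDONLY", 1: "O_WRONLY", 2: "O_RDWR"}

info = parse_fdinfo("pos:    1000\nflags:  01002002\nmnt_id: 21\n")
print(info, ACCESS_MODES[info["flags"] & 0o3])
```

Type-specific lines (tfd, inotify, fanotify, and so on) are skipped by this sketch and would need their own handling.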
For eventfd file descriptors (see eventfd(2)), we see (since Linux 3.8) the following fields:
pos: 0
flags: 02
mnt_id: 10
eventfd-count: 40
eventfd-count is the current value of the eventfd counter, in hexadecimal.
For epoll file descriptors (see epoll(7)), we see (since Linux 3.8) the following fields:
pos: 0
flags: 02
mnt_id: 10
tfd: 9 events: 19 data: 74253d2500000009
tfd: 7 events: 19 data: 74253d2500000007
Each of the lines beginning tfd describes one of the file descriptors being monitored via the epoll file descriptor (see epoll_ctl(2) for some details). The tfd
field is the number of the file descriptor. The events field is a hexadecimal mask of the events being monitored for this file descriptor. The data field is the
data value associated with this file descriptor.
For signalfd file descriptors (see signalfd(2)), we see (since Linux 3.8) the following fields:
pos: 0
flags: 02
mnt_id: 10
sigmask: 0000000000000006
sigmask is the hexadecimal mask of signals that are accepted via this signalfd file descriptor. (In this example, bits 2 and 3 are set, corresponding to the signals
SIGINT and SIGQUIT; see signal(7).)
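The mapping from mask bits to signal numbers can be checked with a few lines of code. In this sketch the lowest bit of the mask stands for signal 1, which matches the example above (0x6 yields signals 2 and 3); the function name is invented for the example.

```python
import signal


def signals_in_mask(sigmask_hex):
    """Return the signal numbers whose bits are set in a hexadecimal sigmask."""
    mask = int(sigmask_hex, 16)
    # The lowest bit of the mask stands for signal 1, the next for signal 2, ...
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


numbers = signals_in_mask("0000000000000006")
print(numbers, [signal.Signals(n).name for n in numbers])
```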
For inotify file descriptors (see inotify(7)), we see (since Linux 3.8) the following fields:
pos: 0
flags: 00
mnt_id: 11
inotify wd:2 ino:7ef82a sdev:800001 mask:800afff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:2af87e00220ffd73
inotify wd:1 ino:192627 sdev:800001 mask:800afff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:27261900802dfd73
Each of the lines beginning with "inotify" displays information about one file or directory that is being monitored. The fields in this line are as follows:
wd A watch descriptor number (in decimal).
ino The inode number of the target file (in hexadecimal).
sdev The ID of the device where the target file resides (in hexadecimal).
mask The mask of events being monitored for the target file (in hexadecimal).
If the kernel was built with exportfs support, the path to the target file is exposed as a file handle, via three hexadecimal fields: fhandle-bytes, fhandle-type, and
f_handle.
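An "inotify" line can be decoded as follows, applying the bases listed above (decimal wd, hexadecimal for the rest). This is a sketch that leaves the three file-handle fields alone; the function name is hypothetical.

```python
def parse_inotify_line(line):
    """Parse one "inotify ..." line from /proc/[pid]/fdinfo/[fd] into a dict."""
    tag, *pairs = line.split()
    if tag != "inotify":
        raise ValueError("not an inotify line")
    fields = dict(pair.split(":", 1) for pair in pairs)
    return {
        "wd": int(fields["wd"]),  # decimal
        "ino": int(fields["ino"], 16),  # hexadecimal
        "sdev": int(fields["sdev"], 16),
        "mask": int(fields["mask"], 16),
        "ignored_mask": int(fields["ignored_mask"], 16),
    }


line = ("inotify wd:2 ino:7ef82a sdev:800001 mask:800afff ignored_mask:0 "
        "fhandle-bytes:8 fhandle-type:1 f_handle:2af87e00220ffd73")
print(parse_inotify_line(line))
```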
For fanotify file descriptors (see fanotify(7)), we see (since Linux 3.8) the following fields:
pos: 0
flags: 02
mnt_id: 11
fanotify flags:0 event-flags:88002
fanotify ino:19264f sdev:800001 mflags:0 mask:1 ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:4f261900a82dfd73
The fourth line displays information defined when the fanotify group was created via fanotify_init(2):
flags The flags argument given to fanotify_init(2) (expressed in hexadecimal).
event-flags
The event_f_flags argument given to fanotify_init(2) (expressed in hexadecimal).
Each additional line shown in the file contains information about one of the marks in the fanotify group. Most of these fields are as for inotify, except:
mflags The flags associated with the mark (expressed in hexadecimal).
mask The events mask for this mark (expressed in hexadecimal).
ignored_mask
The mask of events that are ignored for this mark (expressed in hexadecimal).
For details on these fields, see fanotify_mark(2).
gid_map
-------
See user_namespaces(7).
e.g.
$ unshare --user --mount --fork --pid --map-root-user chroot /home/chenqi/playground/container-root /sbin/busybox sh -l
$ cat /proc/20629/gid_map
0 1000 1
The line above describes a single GID mapping: a range of length 1, starting at GID 0 in the container's user namespace, is mapped to GID 1000 in the parent user namespace.
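Translating an ID through such a map is a range lookup. The sketch below follows the reading given above (first field inside the namespace, second field in the parent namespace, third field the range length); see user_namespaces(7) for the exact rules. Both function names are invented for this example.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    return [tuple(int(n) for n in line.split())
            for line in text.splitlines() if line.strip()]


def to_parent_id(id_inside, id_map):
    """Translate an ID inside the namespace to the ID seen in the parent namespace."""
    for inside, outside, length in id_map:
        if inside <= id_inside < inside + length:
            return outside + (id_inside - inside)
    return None  # the ID is not mapped


gid_map = parse_id_map("0 1000 1\n")
print(to_parent_id(0, gid_map), to_parent_id(1, gid_map))
```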
io
--
This file contains I/O statistics for the process, for example:
# cat /proc/3828/io
rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
The fields are as follows:
rchar: characters read
The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read(2) and similar system
calls. It includes things such as terminal I/O and is unaffected by whether or not actual physical disk I/O was required (the read might have been satisfied
from pagecache).
wchar: characters written
The number of bytes which this task has caused, or shall cause, to be written to disk. Similar caveats apply here as with rchar.
syscr: read syscalls
Attempt to count the number of read I/O operations—that is, system calls such as read(2) and pread(2).
syscw: write syscalls
Attempt to count the number of write I/O operations—that is, system calls such as write(2) and pwrite(2).
read_bytes: bytes read
Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. This is accurate for block-backed filesystems.
write_bytes: bytes written
Attempt to count the number of bytes which this process caused to be sent to the storage layer.
cancelled_write_bytes:
The big inaccuracy here is truncate. If a process writes 1MB to a file and then deletes the file, it will in fact perform no writeout. But it will have been
accounted as having caused 1MB of write. In other words: this field represents the number of bytes which this process caused to not happen, by truncating
pagecache. A task can cause "negative" I/O too. If this task truncates some dirty pagecache, some I/O which another task has been accounted for (in its
write_bytes) will not be happening.
Note: In the current implementation, things are a bit racy on 32-bit systems: if process A reads process B's /proc/[pid]/io while process B is updating one of these
64-bit counters, process A could see an intermediate result.
Permission to access this file is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
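The counters are easy to load into a dictionary for comparison. A minimal sketch, with a hypothetical function name, using the sample output shown above:

```python
def parse_io(text):
    """Parse /proc/[pid]/io text into a dict of counter name -> int."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


sample = """rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
"""
io = parse_io(sample)
# rchar counts every byte passed to read(2)-style calls, while read_bytes
# counts only what had to be fetched from the storage layer.
print(io["rchar"], io["read_bytes"])
```

In this sample read_bytes is 0 even though rchar is large, which is consistent with the reads having been satisfied without storage I/O.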
latency
-------
The relevant kernel configuration options are:
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_LATENCYTOP=y
This file is used by the latencytop tool, though I doubt that many people still use it.
limits
------
This file displays the soft limit, hard limit, and units of measurement for each of the process's resource limits (see getrlimit(2)). Up to and including Linux
2.6.35, this file is protected to allow reading only by the real UID of the process. Since Linux 2.6.36, this file is readable by all users on the system.
loginuid
--------
The login user ID of the process.
-1 means that loginuid was not set (the file then reads 4294967295, which is -1 as an unsigned 32-bit integer). This is normal behavior for processes that were not spawned by any login process (e.g., daemons). loginuid is -1 by default;
the pam_loginuid module changes it to your user ID whenever you log in (on a tty, in a display manager, or via ssh), and this value is preserved by child processes.
map_files/
----------
This subdirectory contains entries corresponding to memory-mapped files (see mmap(2)). Entries are named by memory region start and end address pair (expressed as
hexadecimal numbers), and are symbolic links to the mapped files themselves. Here is an example, with the output wrapped and reformatted to fit on an 80-column
display:
# ls -l /proc/self/map_files/
lr--------. 1 root root 64 Apr 16 21:31
3252e00000-3252e20000 -> /usr/lib64/ld-2.15.so
...
Although these entries are present for memory regions that were mapped with the MAP_FILE flag, the way anonymous shared memory (regions created with the MAP_ANON |
MAP_SHARED flags) is implemented in Linux means that such regions also appear in this directory. Here is an example where the target file is the deleted /dev/zero
one:
lrw-------. 1 root root 64 Apr 16 21:33
7fc075d2f000-7fc075e6f000 -> /dev/zero (deleted)
This directory appears only if the CONFIG_CHECKPOINT_RESTORE kernel configuration option is enabled. Privilege (CAP_SYS_ADMIN) is required to view the contents of
this directory.
* When we trace a program with strace(1), we always see mmap(2) calls at startup. Where do they come from?
They are issued by the run-time linker/loader (ld.so or ld-linux.so).
It maps /etc/ld.so.cache with mmap(2) and later unmaps it with munmap(2).
It maps libc.so.6 (and any other required shared libraries) with mmap(2) and keeps using those mappings.
maps
----
A file containing the currently mapped memory regions and their access permissions. See mmap(2) for some further information about memory mappings.
Permission to access this file is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
The format of the file is:
address           perms offset  dev   inode       pathname
00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/dbus-daemon
00651000-00652000 r--p 00051000 08:02 173521      /usr/bin/dbus-daemon
00652000-00655000 rw-p 00052000 08:02 173521      /usr/bin/dbus-daemon
00e03000-00e24000 rw-p 00000000 00:00 0           [heap]
00e24000-011f7000 rw-p 00000000 00:00 0           [heap]
...
35b1800000-35b1820000 r-xp 00000000 08:02 135522  /usr/lib64/ld-2.15.so
35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522  /usr/lib64/ld-2.15.so
35b1a20000-35b1a21000 rw-p 00020000 08:02 135522  /usr/lib64/ld-2.15.so
35b1a21000-35b1a22000 rw-p 00000000 00:00 0
35b1c00000-35b1dac000 r-xp 00000000 08:02 135870  /usr/lib64/libc-2.15.so
35b1dac000-35b1fac000 ---p 001ac000 08:02 135870  /usr/lib64/libc-2.15.so
35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870  /usr/lib64/libc-2.15.so
35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870  /usr/lib64/libc-2.15.so
...
7f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0   [stack:986]
...
7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0   [stack]
7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0   [vdso]
The address field is the address space in the process that the mapping occupies. The perms field is a set of permissions:
r = read
w = write
x = execute
s = shared
p = private (copy on write)
The offset field is the offset into the file/whatever; dev is the device (major:minor); inode is the inode on that device. 0 indicates that no inode is associated
with the memory region, as would be the case with BSS (uninitialized data).
The pathname field will usually be the file that is backing the mapping. For ELF files, you can easily coordinate with the offset field by looking at the Offset
field in the ELF program headers (readelf -l).
There are additional helpful pseudo-paths:
[stack]
The initial process's (also known as the main thread's) stack.
[stack:<tid>] (since Linux 3.4)
A thread's stack (where the <tid> is a thread ID). It corresponds to the /proc/[pid]/task/[tid]/ path.
[vdso] The virtual dynamically linked shared object. See vdso(7).
[heap] The process's heap.
If the pathname field is blank, this is an anonymous mapping as obtained via mmap(2). There is no easy way to coordinate this back to a process's source, short of
running it through gdb(1), strace(1), or similar.
Under Linux 2.0, there is no field giving pathname.
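A line in the maps format described above can be parsed as in the following sketch. The Mapping type and the function name are made up for this example; the only format assumptions are the six whitespace-separated fields listed in this section, with the pathname possibly empty.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    pathname: str


def parse_maps_line(line):
    """Parse one line of /proc/[pid]/maps into a Mapping tuple."""
    # The pathname may be missing or contain spaces, so split at most 5 times.
    parts = line.split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    pathname = parts[5].strip() if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), pathname)


m = parse_maps_line("00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/dbus-daemon")
print(m.pathname, hex(m.end - m.start), "bytes,", "private" if "p" in m.perms else "shared")
```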
A brief introduction to the vDSO: for efficiency, some frequently used system calls (such as gettimeofday(2)) are implemented in the vDSO, so that an ordinary function call plus a few memory accesses
replaces the more expensive transition into the kernel.
mem
---
This file can be used to access the pages of a process's memory through open(2), read(2), and lseek(2).
Permission to access this file is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
mountinfo
---------
This file contains information about mount points in the process's mount namespace (see mount_namespaces(7)). It supplies various information (e.g., propagation
state, root of mount for bind mounts, identifier for each mount and its parent) that is missing from the (older) /proc/[pid]/mounts file, and fixes various other
problems with that file (e.g., nonextensibility, failure to distinguish per-mount versus per-superblock options).
The file contains lines of the form:
36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
The numbers in parentheses are labels for the descriptions below:
(1) mount ID: a unique ID for the mount (may be reused after umount(2)).
(2) parent ID: the ID of the parent mount (or of self for the top of the mount tree).
(3) major:minor: the value of st_dev for files on this filesystem (see stat(2)).
(4) root: the pathname of the directory in the filesystem which forms the root of this mount.
(5) mount point: the pathname of the mount point relative to the process's root directory.
(6) mount options: per-mount options.
(7) optional fields: zero or more fields of the form "tag[:value]"; see below.
(8) separator: the end of the optional fields is marked by a single hyphen.
(9) filesystem type: the filesystem type in the form "type[.subtype]".
(10) mount source: filesystem-specific information or "none".
(11) super options: per-superblock options.
Currently, the possible optional fields are shared, master, propagate_from, and unbindable. See mount_namespaces(7) for a description of these fields. Parsers
should ignore all unrecognized optional fields.
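The field layout above, including the variable number of optional fields terminated by a single hyphen, can be handled as in this sketch. The function name and dictionary keys are chosen for this example. Note that the kernel escapes some characters in pathnames (a space appears as \040), which this sketch does not decode.

```python
def parse_mountinfo_line(line):
    """Parse one line of /proc/[pid]/mountinfo into a dict keyed by field name."""
    fields = line.split()
    # Field (8): the first lone hyphen after the six fixed fields ends the
    # optional fields, however many of them there are.
    sep = fields.index("-", 6)
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "major_minor": fields[2],
        "root": fields[3],
        "mount_point": fields[4],
        "mount_options": fields[5].split(","),
        "optional": fields[6:sep],  # unrecognized tags are kept, not rejected
        "fs_type": fields[sep + 1],
        "mount_source": fields[sep + 2],
        "super_options": fields[sep + 3].split(","),
    }


info = parse_mountinfo_line(
    "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue")
print(info["mount_point"], info["fs_type"], info["optional"])
```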
For more information on mount propagation see: Documentation/filesystems/sharedsubtree.txt in the Linux kernel source tree.
mounts
------
This file lists all the filesystems currently mounted in the process's mount namespace (see mount_namespaces(7)). The format of this file is documented in fstab(5).
Since kernel version 2.6.15, this file is pollable: after opening the file for reading, a change in this file (i.e., a filesystem mount or unmount) causes select(2)
to mark the file descriptor as having an exceptional condition, and poll(2) and epoll_wait(2) mark the file as having a priority event (POLLPRI). (Before Linux
2.6.30, a change in this file was indicated by the file descriptor being marked as readable for select(2), and being marked as having an error condition for poll(2)
and epoll_wait(2).)
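The polling behavior described above might be used as in the following sketch. The helper name mounts_changed and its default arguments are assumptions made for illustration. It waits once, using poll(2) through Python's select module, and reports whether a priority or error event arrived before the timeout.

```python
import select


def mounts_changed(path="/proc/self/mounts", timeout_ms=1000):
    """Return True if a mount or unmount is signalled within timeout_ms."""
    with open(path) as f:
        poller = select.poll()
        # POLLPRI is the documented signal; POLLERR covers the older behavior.
        poller.register(f.fileno(), select.POLLPRI | select.POLLERR)
        for _fd, revents in poller.poll(timeout_ms):
            if revents & (select.POLLPRI | select.POLLERR):
                return True
    return False
```

A long-running monitor would keep the file open, re-read it from offset 0 after each event, and poll again, rather than reopening the file for every check.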
mountstats
----------
This file exports information (statistics, configuration information) about the mount points in the process's mount namespace (see mount_namespaces(7)). Lines in
this file have the form:
device /dev/sda7 mounted on /home with fstype ext3 [statistics]
          (1)                (2)              (3)      (4)
The fields in each line are:
(1) The name of the mounted device (or "nodevice" if there is no corresponding device).
(2) The mount point within the filesystem tree.
(3) The filesystem type.
(4) Optional statistics and configuration information. Currently (as at Linux 2.6.26), only NFS filesystems export information via this field.
This file is readable only by the owner of the process.
net/
----
TODO
ns/
---
This subdirectory contains one entry for each namespace that supports being manipulated by setns(2); for more information, see namespaces(7). For example:
$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 cgroup -> cgroup:[4026531835]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 ipc -> ipc:[4026531839]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 mnt -> mnt:[4026531840]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 net -> net:[4026531969]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid -> pid:[4026531836]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 pid_for_children -> pid:[4026531834]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 user -> user:[4026531837]
lrwxrwxrwx. 1 mtk mtk 0 Apr 28 12:46 uts -> uts:[4026531838]
Bind mounting (see mount(2)) one of the files in this directory to somewhere else in the filesystem keeps the corresponding namespace of the process specified by pid alive
even if all processes currently in the namespace terminate.
Opening one of the files in this directory (or a file that is bind mounted to one of these files) returns a file descriptor for the corresponding namespace of the process specified by pid. As long as this file descriptor remains open, the namespace will remain alive, even if all processes in the namespace terminate. The file descriptor can be passed to setns(2).
In Linux 3.7 and earlier, these files were visible as hard links. Since Linux 3.8, they appear as symbolic links. If two processes are in the same namespace, then the
inode numbers of their /proc/[pid]/ns/xxx symbolic links will be the same; an application can check this using the stat.st_ino field returned by stat(2). The content of
this symbolic link is a string containing the namespace type and inode number as in the following example:
$ readlink /proc/$$/ns/uts
uts:[4026531838]
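The two checks described above (comparing inode numbers, and reading the link content) can be sketched as follows. The function names parse_ns_link and same_namespace are illustrative, not an existing API. Comparing st_dev alongside st_ino is an extra precaution that goes slightly beyond the text above.

```python
import os
import re


def parse_ns_link(target):
    """Split link content such as 'uts:[4026531838]' into (type, inode)."""
    match = re.fullmatch(r"([a-z_]+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def same_namespace(pid_a, pid_b, ns_type):
    """Return True if both processes are in the same namespace of ns_type."""
    st_a = os.stat(f"/proc/{pid_a}/ns/{ns_type}")
    st_b = os.stat(f"/proc/{pid_b}/ns/{ns_type}")
    # The text above names st_ino; st_dev is compared too as a precaution.
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)


print(parse_ns_link("uts:[4026531838]"))
```

For example, same_namespace(os.getpid(), 1, "net") would report whether the caller shares a network namespace with PID 1, subject to the usual permission checks on /proc/1/ns.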
numa_maps
---------
See numa(7).
Non-Uniform Memory Access (NUMA) refers to multiprocessor systems whose memory is divided into multiple memory nodes. The access time of a memory node depends on the relative locations of the accessing CPU and the accessed node.
This file (available since Linux 2.6.14) displays information about a process's NUMA memory policy and allocation.
Each line contains information about a memory range used by the process, displaying—among other information—the effective memory policy for that memory range and on which
nodes the pages have been allocated.
numa_maps is a read-only file. When /proc/[pid]/numa_maps is read, the kernel will scan the virtual address space of the process and report how memory is used. One line is
displayed for each unique memory range of the process.
The first field of each line shows the starting address of the memory range. This field allows a correlation with the contents of the /proc/[pid]/maps file, which contains
the end address of the range and other information, such as the access permissions and sharing.
The second field shows the memory policy currently in effect for the memory range. Note that the effective policy is not necessarily the policy installed by the process for
that memory range. Specifically, if the process installed a "default" policy for that range, the effective policy for that range will be the process policy, which may or
may not be "default".
The rest of the line contains information about the pages allocated in the memory range, as follows:
N<node>=<nr_pages>
The number of pages allocated on <node>. <nr_pages> includes only pages currently mapped by the process. Page migration and memory reclaim may have temporarily
unmapped pages associated with this memory range. These pages may show up again only after the process has attempted to reference them. If the memory range represents
a shared memory area or file mapping, other processes may currently have additional pages mapped in a corresponding memory range.
file=<filename>
The file backing the memory range. If the file is mapped as private, write accesses may have generated COW (Copy-On-Write) pages in this memory range. These pages
are displayed as anonymous pages.
heap Memory range is used for the heap.
stack Memory range is used for the stack.
huge Huge memory range. The page counts shown are huge pages and not regular sized pages.
anon=<pages>
The number of anonymous pages in the range.
dirty=<pages>
Number of dirty pages.
mapped=<pages>
Total number of mapped pages, if different from dirty and anon pages.
mapmax=<count>
Maximum mapcount (number of processes mapping a single page) encountered during the scan. This may be used as an indicator of the degree of sharing occurring in a
given memory range.
swapcache=<count>
Number of pages that have an associated entry on a swap device.
active=<pages>
The number of pages on the active list. This field is shown only if different from the number of pages in this range. This means that some inactive pages exist in
the memory range that may be removed from memory by the swapper soon.
writeback=<pages>
Number of pages that are currently being written out to disk.
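As a sketch of how the fields above fit together, the following parser splits one numa_maps line into its start address, policy, per-node page counts, flags, and counters. The function name and the sample line are hypothetical and were written for illustration; they are not taken from a real system.

```python
def parse_numa_maps_line(line):
    """Parse one numa_maps line into a dictionary of its fields."""
    fields = line.split()
    entry = {
        "start": int(fields[0], 16),  # first field: start address (hex)
        "policy": fields[1],          # second field: effective memory policy
        "file": None,
        "nodes": {},                  # N<node>=<nr_pages>
        "flags": [],                  # bare words such as heap, stack, huge
        "counts": {},                 # anon=, dirty=, mapped=, mapmax=, ...
        "other": {},                  # anything not recognized above
    }
    for token in fields[2:]:
        key, sep, value = token.partition("=")
        if not sep:
            entry["flags"].append(key)
        elif key == "file":
            entry["file"] = value
        elif key.startswith("N") and key[1:].isdigit():
            entry["nodes"][int(key[1:])] = int(value)
        elif value.isdigit():
            entry["counts"][key] = int(value)
        else:
            entry["other"][key] = value
    return entry


# Hypothetical line, for illustration only.
sample = "7f5c3a000000 default file=/usr/lib/libc.so.6 mapped=120 mapmax=32 active=100 N0=100 N1=20"
entry = parse_numa_maps_line(sample)
print(entry["policy"], entry["nodes"], entry["counts"])
```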
oom_adj
-------
This file can be used to adjust the score used to select which process should be killed in an out-of-memory (OOM) situation. The kernel uses this value for a bit-shift
operation of the process's oom_score value: valid values are in the range -16 to +15, plus the special value -17, which disables OOM-killing altogether for this
process. A positive score increases the likelihood of this process being killed by the OOM-killer; a negative score decreases the likelihood.
The default value for this file is 0; a new process inherits its parent's oom_adj setting. A process must be privileged (CAP_SYS_RESOURCE) to update this file.
Since Linux 2.6.36, use of this file is deprecated in favor of /proc/[pid]/oom_score_adj.
oom_score
---------
This file displays the current score that the kernel gives to this process for the purpose of selecting a process for the OOM-killer. A higher score means that the
process is more likely to be selected by the OOM-killer. The basis for this score is the amount of memory used by the process, with increases (+) or decreases (-)
for factors including:
* whether the process creates a lot of children using fork(2) (+);
* whether the process has been running a long time, or has used a lot of CPU time (-);
* whether the process has a low priority (i.e., a nice value > 0) (+);
* whether the process is privileged (-); and
* whether the process is making direct hardware access (-).
The oom_score also reflects the adjustment specified by the oom_score_adj or oom_adj setting for the process.
oom_score_adj
-------------
This file can be used to adjust the badness heuristic used to select which process gets killed in out-of-memory conditions.
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted. The units are
roughly a proportion along that range of allowed memory the process may allocate from, based on an estimation of its current memory and swap use. For example, if a
task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500.
There is an additional factor included in the badness score: root processes are given 3% extra memory over other tasks.
The amount of "allowed" memory depends on the context in which the OOM-killer was called. If it is due to the memory assigned to the allocating task's cpuset being
exhausted, the allowed memory represents the set of mems assigned to that cpuset (see cpuset(7)). If it is due to a mempolicy's node(s) being exhausted, the allowed
memory represents the set of mempolicy nodes. If it is due to a memory limit (or swap limit) being reached, the allowed memory is that configured limit. Finally, if
it is due to the entire system being out of memory, the allowed memory represents all allocatable resources.
The value of oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 (OOM_SCORE_ADJ_MIN) to
+1000 (OOM_SCORE_ADJ_MAX). This allows user space to control the preference for OOM-killing, ranging from always preferring a certain task to completely disabling
OOM-killing for it. The lowest possible value, -1000, is equivalent to disabling OOM-killing entirely for that task, since it will always report a badness score of 0.
Consequently, it is very simple for user space to define the amount of memory to consider for each task. Setting an oom_score_adj value of +500, for example, is
roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least 50% more memory. A
value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task's allowed memory from being considered as scoring against the task.
For backward compatibility with previous kernels, /proc/[pid]/oom_adj can still be used to tune the badness score. Its value is scaled linearly with oom_score_adj.
Writing to /proc/[pid]/oom_score_adj or /proc/[pid]/oom_adj will change the other with its scaled value.
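The arithmetic described above can be modelled roughly as follows. This is a deliberately simplified sketch, not the kernel's actual computation: the function name approx_badness is made up, the 3% allowance for root processes is ignored, and the clamping to the 0..1000 range is an assumption based on the stated score range.

```python
def approx_badness(used, allowed, oom_score_adj=0):
    """Rough model: share of allowed memory in use (0..1000) plus the adjustment."""
    if not -1000 <= oom_score_adj <= 1000:
        raise ValueError("oom_score_adj must be in the range -1000..1000")
    if oom_score_adj == -1000:
        return 0  # OOM-killing disabled: the score is always 0
    base = 1000 * used // allowed
    # Assumed clamp, so that the result stays within the documented range.
    return max(0, min(1000, base + oom_score_adj))


print(approx_badness(50, 100))        # half of the allowed memory in use
print(approx_badness(50, 100, -500))  # the same task with a -500 adjustment
```

In this model, a task using a quarter of its allowed memory with oom_score_adj set to +500 scores 750, the same as an unadjusted task using three quarters of the allowed memory.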
pagemap
-------
This file shows the mapping of each of the process's virtual pages into physical page frames or swap area. It contains one 64-bit value for each virtual page, with
the bits set as follows:
63 If set, the page is present in RAM.
62 If set, the page is in swap space.
61 (since Linux 3.5)
The page is a file-mapped page or a shared anonymous page.
60-56 (since Linux 3.11)
Zero
55 (since Linux 3.11)
PTE is soft-dirty (see the kernel source file Documentation/vm/soft-dirty.txt).
54-0 If the page is present in RAM (bit 63), then these bits provide the page frame number, which can be used to index /proc/kpageflags and /proc/kpagecount.
If the page is present in swap (bit 62), then bits 4-0 give the swap type, and bits 54-5 encode the swap offset.
Before Linux 3.11, bits 60-55 were used to encode the base-2 log of the page size.
To employ /proc/[pid]/pagemap efficiently, use /proc/[pid]/maps to determine which areas of memory are actually mapped and seek to skip over unmapped regions.
The /proc/[pid]/pagemap file is present only if the CONFIG_PROC_PAGE_MONITOR kernel configuration option is enabled.
Permission to access this file is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
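The bit layout above can be decoded as in the following sketch. The function names decode_pagemap_entry and read_pagemap_entry are illustrative. The code assumes the Linux 3.11 and later layout (bit 55 as soft-dirty) and that entries are stored in the machine's native byte order.

```python
import os
import struct


def decode_pagemap_entry(entry):
    """Decode one 64-bit pagemap value according to the bit layout above."""
    info = {
        "present": bool((entry >> 63) & 1),
        "swapped": bool((entry >> 62) & 1),
        "file_or_shared_anon": bool((entry >> 61) & 1),
        "soft_dirty": bool((entry >> 55) & 1),
    }
    low = entry & ((1 << 55) - 1)  # bits 54-0
    if info["present"]:
        info["pfn"] = low
    elif info["swapped"]:
        info["swap_type"] = low & 0x1F  # bits 4-0
        info["swap_offset"] = low >> 5  # bits 54-5
    return info


def read_pagemap_entry(pid, vaddr, page_size=None):
    """Read and decode the entry for the page containing virtual address vaddr."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * 8)  # one 8-byte value per virtual page
        data = f.read(8)
    if len(data) != 8:
        raise ValueError("no pagemap entry at this address")
    (entry,) = struct.unpack("=Q", data)  # assumption: native byte order
    return decode_pagemap_entry(entry)
```

Seeking directly to the entry for a given address is what makes it practical to skip unmapped regions found via /proc/[pid]/maps.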
personality
-----------
This read-only file exposes the process's execution domain, as set by personality(2). The value is displayed in hexadecimal notation.
Permission to access this file is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).
Linux supports different execution domains, or personalities, for each process. Among other things, execution domains tell Linux how to map signal numbers into signal
actions. The execution domain system allows Linux to provide limited support for binaries compiled under other UNIX-like operating systems.
projid_map
----------
TODO
root@
-----
UNIX and Linux support the idea of a per-process root of the filesystem, set by the chroot(2) system call. This file is a symbolic link that points to the process's
root directory, and behaves in the same way as exe, and fd/*.
Note however that this file is not merely a symbolic link. It provides the same view of the filesystem (including namespaces and the set of per-process mounts) as
the process itself. An example illustrates this point. In one terminal, we start a shell in new user and mount namespaces, and in that shell we create some new
mount points:
$ PS1='sh1# ' unshare -Urnm
sh1# mount -t tmpfs tmpfs /etc # Mount empty tmpfs at /etc
sh1# mount --bind /usr /dev # Mount /usr at /dev
sh1# echo $$
27123
In a second terminal window, in the initial mount namespace, we look at the contents of the corresponding mounts in the initial and new namespaces:
$ PS1='sh2# ' sudo sh
sh2# ls /etc | wc -l # In initial NS
309
sh2# ls /proc/27123/root/etc | wc -l # /etc in other NS
0 # The empty tmpfs dir
sh2# ls /dev | wc -l # In initial NS
205
sh2# ls /proc/27123/root/dev | wc -l # /dev in other NS
11 # Actually bind
# mounted to /usr
sh2# ls /usr | wc -l # /usr in initial NS
11
In a multithreaded process, the contents of the /proc/[pid]/root symbolic link are not available if the main thread has already terminated (typically by calling
pthread_exit(3)).
Permission to dereference or read (readlink(2)) this symbolic link is governed by a ptrace access mode PTRACE_MODE_READ_FSCREDS check; see ptrace(2).
sched
-----
Not much information.
schedstat
---------
This file reports, on a per-process level, some of the same information that the kernel's schedstats facility collects. It contains three fields, which for this
process give:
1) time spent on the CPU
2) time spent waiting on a runqueue
3) number of timeslices run on this CPU
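Reading the three fields might look like the following sketch. The function name parse_schedstat and the dictionary keys are illustrative, and the sample values are made up. The units of the two time fields are not stated here, so the code leaves them uninterpreted.

```python
def parse_schedstat(text):
    """Split the contents of a schedstat file into its three fields."""
    on_cpu, runqueue_wait, timeslices = (int(x) for x in text.split())
    return {
        "on_cpu": on_cpu,                # 1) time spent on the CPU
        "runqueue_wait": runqueue_wait,  # 2) time spent waiting on a runqueue
        "timeslices": timeslices,        # 3) number of timeslices run
    }


# Made-up sample values, for illustration only.
print(parse_schedstat("1500 300 12\n"))
```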
sessionid
---------
Related to Linux audit.
setgroups
---------
See user_namespaces(7).
"""
In the case of gid_map, use of the setgroups(2) system call must first be denied by writing "deny" to the /proc/[pid]/setgroups file (see below) before writing to
gid_map.
"""
smaps
-----
This file shows memory consumption for each of the process's mappings. (The pmap(1) command displays similar information, in a form that may be easier for parsing.)
For each mapping there is a series of lines such as the following:
00400000-0048a000 r-xp 00000000 fd:03 960637 /bin/bash
Size: 552 kB
Rss: 460 kB
Pss: 100 kB
Shared_Clean: 452 kB
Shared_Dirty: 0 kB
Private_Clean: 8 kB
Private_Dirty: 0 kB
Referenced: 460 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
ProtectionKey: 0
VmFlags: rd ex mr mw me dw
The first of these lines shows the same information as is displayed for the mapping in /proc/[pid]/maps. The following lines show the size of the mapping, the amount
of the mapping that is currently resident in RAM ("Rss"), the process's proportional share of this mapping ("Pss"), the number of clean and dirty shared pages in the
mapping, and the number of clean and dirty private pages in the mapping. "Referenced" indicates the amount of memory currently marked as referenced or accessed.
"Anonymous" shows the amount of memory that does not belong to any file. "Swap" shows how much would-be-anonymous memory is also used, but out on swap.
The "KernelPageSize" line (available since Linux 2.6.29) is the page size used by the kernel to back the virtual memory area. This matches the size used by the MMU
in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64kB as a base page size may still use 4kB pages for the MMU on
older processors. To distinguish the two attributes, the "MMUPageSize" line (also available since Linux 2.6.29) reports the page size used by the MMU.
The "Locked" line indicates whether the mapping is locked in memory.
The "ProtectionKey" line (available since Linux 4.9, on x86 only) contains the memory protection key (see pkeys(7)) associated with the virtual memory area. This
entry is present only if the kernel was built with the CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS configuration option.
The "VmFlags" line (available since Linux 3.8) represents the kernel flags associated with the virtual memory area, encoded using the following two-letter codes:
rd - readable
wr - writable
ex - executable
sh - shared
mr - may read
mw - may write
me - may execute
ms - may share
gd - stack segment grows down
pf - pure PFN range
dw - disabled write to the mapped file
lo - pages are locked in memory
io - memory mapped I/O area
sr - sequential read advise provided
rr - random read advise provided
dc - do not copy area on fork
de - do not expand area on remapping
ac - area is accountable
nr - swap space is not reserved for the area
ht - area uses huge tlb pages
nl - non-linear mapping
ar - architecture specific flag
dd - do not include area into core dump
sd - soft-dirty flag
mm - mixed map area
hg - huge page advise flag
nh - no-huge page advise flag
mg - mergeable advise flag
The /proc/[pid]/smaps file is present only if the CONFIG_PROC_PAGE_MONITOR kernel configuration option is enabled.
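The per-mapping layout described above lends itself to a simple line-oriented parser. The following is a minimal sketch. The names parse_smaps and HEADER are illustrative, and it assumes that every mapping starts with a line beginning with a hexadecimal address range and that size fields end in "kB".

```python
import re

# A mapping header starts with an address range such as "00400000-0048a000 ".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+ ")


def parse_smaps(text):
    """Return one dictionary per mapping found in smaps-formatted text."""
    mappings = []
    for line in text.splitlines():
        if HEADER.match(line):
            # Same columns as /proc/[pid]/maps; the pathname, if any, is last.
            mappings.append({"header": line.split(None, 5), "fields": {}, "flags": []})
        elif mappings and ":" in line:
            key, _, value = line.partition(":")
            parts = value.split()
            if key == "VmFlags":
                mappings[-1]["flags"] = parts       # two-letter codes
            elif len(parts) == 2 and parts[1] == "kB":
                mappings[-1]["fields"][key] = int(parts[0])
            else:
                mappings[-1]["fields"][key] = value.strip()
    return mappings


sample = """00400000-0048a000 r-xp 00000000 fd:03 960637 /bin/bash
Size: 552 kB
Rss: 460 kB
Pss: 100 kB
ProtectionKey: 0
VmFlags: rd ex mr mw me dw
"""
maps = parse_smaps(sample)
print(sum(m["fields"].get("Pss", 0) for m in maps), "kB Pss in total")
```

Summing "Pss" over all mappings, as in the last line, gives an estimate of the process's proportional memory use.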
stack
-----
This file provides a symbolic trace of the function calls in this process's kernel stack. This file is provided only if the kernel was built with the
CONFIG_STACKTRACE configuration option.
Permission to access this file is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).