\mainmatter
# An Introduction to Micromaps {#Ch1}
\chapterauthor{J{\"u}rgen Symanzik}
Welcome to the _Micromap Plots in R Book_. In this chapter, we provide a general
overview of micromap visualizations. We present the three major classes of micromaps
and we introduce the nomenclature used throughout the book.
For each of the three major classes of micromaps,
we look at the historical developments, their design, features, and interpretation, and
the primary software and online developments for their construction.
Also provided are an outlook on the remaining
chapters of this book and how to best use the accompanying R code,
webpage, and newly developed **micromapExtra**\index{R Packages!micromapExtra} R package.
## Introduction {#Ch1-Introduction}
Micromap visualizations\index{Micromap visualizations} are an important tool
for the visualization of statistical data in a geographic context,
providing a rich context for interpretation.
Micromap visualizations\index{Micromap visualizations} are based on numerous
small panels\index{Panel} that are arranged in specific ways.
Arguably, the most important types of panels\index{Panel}
are those that contain one or multiple series of small maps, the _micromaps_,
that are used to show spatial patterns. These maps are often used
to link to panels that contain underlying statistical data and region identifiers.
There exist three major classes of micromaps, i.e.,
linked micromap plots,\index{Linked micromap plot}
introduced in Section \@ref(Ch1-LinkedMicromapPlots),
conditioned micromaps,\index{Conditioned micromaps}
introduced in Section \@ref(Ch1-ConditionedMicromaps), and
comparative micromaps,\index{Comparative micromaps}
introduced in Section \@ref(Ch1-ComparativeMicromaps).
In each of these three sections, we will first look at the
historical development of that class of micromaps,
then discuss their design, features, and interpretation,
and finish with an overview of past and current software and online developments.
Our nomenclature will closely follow the one from @SCMW2017.
Alternatives can sometimes be found elsewhere in the literature,
but it should be easy to match alternative terms with the terminology used in this book.
As an example, linked micromap plots\index{Linked micromap plot}
were originally called _map row plots_ [@OCCP1996]. They have often been abbreviated as _LMplots_.
Conditioned micromaps\index{Conditioned micromaps} were initially
called _conditioned choropleth maps_ [@CWC2000] and have been abbreviated as _CCmaps_.
What follows in Section \@ref(Ch1-Outlook)
is an outlook on the 14 chapters in this book.
Finally, Section \@ref(Ch1-HowToUse) will provide
some recommendations on how to best use this book and
the accompanying R code, webpage, and the newly developed
**micromapExtra**\index{R Packages!micromapExtra} R package.
## Linked Micromap Plots {#Ch1-LinkedMicromapPlots}
### Historical Development {#Ch1-LinkedMicromapPlotsHistoricalDevelopment}
Linked micromap plots\index{Linked micromap plot} were introduced as the first
of the three major classes of micromap visualizations.\index{Micromap visualizations}
They were originally presented at the Joint Statistical Meetings (JSM)
in Chicago, Illinois, in 1996 [@OCCP1996].
A series of applications was published soon thereafter,
dealing with unemployment data of the United States and Washington, D.C. [@CaPi1996],
carbon dioxide (CO~2~) emissions in the
Organization for Economic Co-operation and Development (OECD) countries
and wheat price and labor force statistics in the United States and Washington, D.C. [@COCPC1998],
and various climate variables in Omernik ecoregions
(that were named after James M. Omernik) [@COPC1998;@COPC2000].
The early linked micromap plots\index{Linked micromap plot} development was
based on ideas from several other visualization developments of the 1980s and 1990s. In particular,
their development was heavily inspired by concepts developed for the conversion of
statistical tables into graphical displays [@Carr1994a;@CN95],
the small multiples principle that was made widely popular by Edward R. Tufte [@Tu83;@Tu90;@Tu97],
as well as early map caricatures, such as Mark S. Monmonier's state visibility map [@Mon93].
Linked micromap plots\index{Linked micromap plot} quickly gained popularity
among researchers at United States (U.S.) Federal Agencies such as
the U.S. Department of Agriculture – National Agricultural Statistics Service (USDA–NASS),
various branches of the U.S. Environmental Protection Agency (USEPA),
the National Cancer Institute (NCI),
the U.S. Census Bureau, and
the U.S. Bureau of Labor Statistics (BLS).
In addition to static linked micromap plots,\index{Linked micromap plot}
some of these agencies (in particular the NCI, the USEPA, and the USDA–NASS) developed early
interactive and web-based versions of linked micromap plots\index{Linked micromap plot}
as discussed in more detail in @SC2008.
As indicated in @OCCP1996, linked micromap plots\index{Linked micromap plot}
"give equal consideration to presenting data in attribute [i.e., statistical] space and
in geographic space." In contrast, traditionally used choropleth maps\index{Choropleth map} emphasize
the map visualization and put less emphasis on the visualization of the statistical data.
According to @PoLa2018, a choropleth map\index{Choropleth map}
is a "map in which regions with differing occurrence rates
of conditions of interest (e.g., cancer) are visually distinguished by different color or shading
corresponding to rates at which the designated conditions have occurred in each region during the
period of observation."
@SC2008 pointed out three main advantages of
linked micromap plots\index{Linked micromap plot} over
choropleth maps.\index{Choropleth map}
First, small map regions (such as Washington, D.C., in a map of the
United States) may be hard to see in a choropleth map.\index{Choropleth map}
In linked micromap plots,\index{Linked micromap plot} small map regions
are often enlarged and sometimes also pulled to the outside of the main map area,
resulting in a kind of map caricature that makes the smaller
map regions more visible.
Further changes of the underlying maps often take place, such as moving
far-away subregions closer to the main geographic region (such as Alaska and Hawaii
for the United States)
and simplifying the boundaries of the subregions (think of river boundaries or rugged coastlines)
for faster plotting
and obtaining a less dominant, i.e., thinner, borderline.
Second, when converting data from an ordered statistical variable into five to eight
discrete colors for coloring the map area in a choropleth map,\index{Choropleth map}
there is an immediate loss of information.
The exact numerical values of the variable are lost, the ranking of the values is lost,
and the relative proximities of the values that are translated into the same color class
(which represents an interval of values) are lost. This problem is solved by
using row-labeled statistical displays such as dotplots\index{Dotplot}
that are linked to the micromaps in linked micromap plots,\index{Linked micromap plot}
rather than showing the statistical data directly on the map via a small
set of discrete colors.
Third, it is difficult to show more than one statistical variable directly
on a map. This becomes even more difficult when trying to display confidence
bounds or the minimum, first quartile, median, third quartile, and maximum
that make up the summary statistics that are visually displayed in a boxplot.\index{Boxplot}
This problem is resolved in linked micromap plots\index{Linked micromap plot}
by linking the statistical graphic displays to certain spatial areas,
rather than displaying the statistical information directly on a map.
Also, multiple types of statistical graphic displays can be shown simultaneously
in a linked micromap plot.\index{Linked micromap plot}
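The information loss described in the second point can be illustrated with a few lines of base R. The following is a minimal sketch: the ten rates and the five equal-width classes are invented for demonstration and do not come from any data set used in this book.

```r
# Hypothetical rates for ten regions (made up for illustration only)
rates <- c(4.1, 4.3, 4.9, 5.0, 7.8, 7.9, 9.6, 12.4, 12.5, 15.0)

# Five equal-width color classes, as a choropleth map might use
breaks <- seq(from = 0, to = 15, by = 3)
class_id <- cut(rates, breaks = breaks, include.lowest = TRUE)

# Side-by-side view of the exact values and the class each one falls into
binned <- data.frame(rate = rates, color_class = class_id)
print(binned)

# Ten distinct values collapse into only a handful of occupied classes;
# within a class, the values, their ranks, and their spacing are no longer visible
n_distinct_values <- length(unique(rates))
n_classes_used <- length(unique(class_id))
print(c(values = n_distinct_values, classes = n_classes_used))
```

A dotplot linked to the map keeps all ten positions on a continuous axis, so none of this information has to be discarded.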
Readers who are interested in learning more about the background and history
of linked micromap plots\index{Linked micromap plot} are encouraged
to read the book chapters and encyclopedia entries by
@SC2008 and @SCMW2017. Moreover,
for readers who want to learn the full background behind all three main types
of micromap visualizations,\index{Micromap visualizations}
we strongly recommend reading the excellent book by @CP2010.
### Design, Features, and Interpretation {#Ch1-LinkedMicromapPlotsDesignFeaturesInterpretation}
We will use three mock-up linked micromap plots,\index{Linked micromap plot}
shown in Figures \@ref(fig:Ch1-NJ-Layout1), \@ref(fig:Ch1-NJ-Layout2), and \@ref(fig:Ch1-NJ-Layout3),
to introduce the main nomenclature for linked micromap plots\index{Linked micromap plot}
and how to interpret such plots. These three figures are based
on actual boundaries for New Jersey in the United States, obtained from
@USCensus2013Shapefiles. The statistical data are entirely made up for demonstration purposes.
In particular, the values for `StatVar4`, one of the four statistical variables,
originate from a Normal distribution with
mean 55 and standard deviation 10 and have been randomly assigned
to the 21 counties in New Jersey.
```{r Ch1-NJ-Layout1, fig.cap = 'Mock-up linked micromap plot\\index{Linked micromap plot}, based on the 21 counties of New Jersey. `StatVar1` is used as the sorting variable.', fig.width = 7, fig.height = 6, echo = FALSE}
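library(micromap)
library(raster)
library(sf) # st_simplify() and as_Spatial() are called unqualified below
library(magrittr) # provides the %>% pipe used below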
nj_shapefile <- raster::shapefile(
x = "data/NJ_Shapefiles/co34_d00.shp",
verbose = FALSE
)
# add missing CRS
proj4string(nj_shapefile) <- sp::CRS(projargs = "+init=epsg:4326")
# thin shapefile
nj_shapefile_thin <- sf::st_as_sf(nj_shapefile) %>%
st_simplify(dTolerance = 50) %>%
as_Spatial()
# create map table
nj_polys_table <- create_map_table(
tmp.map = nj_shapefile_thin,
IDcolumn = "NAME"
)
# create mock data frame
stat_county <- c(
"Atlantic", "Bergen", "Burlington", "Camden", "Cape May", "Cumberland", "Essex",
"Gloucester", "Hudson", "Hunterdon", "Mercer", "Middlesex", "Monmouth", "Morris",
"Ocean", "Passaic", "Salem", "Somerset", "Sussex", "Union", "Warren"
)
stat_data <- c(
2, 17, 7, 6, 1, 3, 15,
5, 16, 11, 10, 13, 9, 19,
8, 18, 4, 12, 21, 14, 20
)
set.seed(123)
nj_data <- data.frame(
County = stat_county,
StatVar1 = stat_data * 3 + 5 + runif(n = 21, min = -1.5, max = 1.5),
StatVar2 = stat_data * 2 + 3 + runif(n = 21, min = -3, max = 3),
StatVar3 = 95 - 3 * stat_data + runif(n = 21, min = -5, max = 5),
StatVar4 = rnorm(n = 21, mean = 55, sd = 10)
)
nj_data$StatVar2[8] <- 35 # create outlier
nj_data$StatVar2[14] <- 15 # create outlier
# create linked micromap plot
mmplot(
stat.data = nj_data,
map.data = nj_polys_table,
map.link = c("County", "ID"),
panel.types = c("map", "dot_legend", "labels", "dot", "dot", "dot", "dot"),
panel.data = list(NA, NA, "County", "StatVar1", "StatVar2", "StatVar3", "StatVar4"),
ord.by = "StatVar1",
rev.ord = TRUE,
grouping = c(4, 4, 5, 4, 4),
colors = RColorBrewer::brewer.pal(n = 5, name = "YlGnBu"),
vertical.align = "center",
panel.att = list(
list(
1,
header = "Light Gray Means\nPreviously Displayed",
map.all = TRUE,
fill.regions = "aggregate",
active.border.color = "black",
active.border.size = 1.2,
inactive.border.color = gray(0.7),
inactive.border.size = 1,
panel.width = 1.3
),
list(
2,
point.type = 20,
point.border = TRUE,
point.size = 2,
panel.width = 1.5
),
list(
3,
header = "Counties",
align = "left",
right.margin = -0.1,
left.margin = -1,
text.size = 0.9,
panel.width = 0.7
),
list(
4,
header = "StatVar1",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(0, 25, 50, 75),
xaxis.labels = list(0, 25, 50, 75),
xaxis.title = "Percent"
),
list(
5,
header = "StatVar2",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(0, 25, 50),
xaxis.labels = list(0, 25, 50),
xaxis.title = "Percent"
),
list(
6,
header = "StatVar3",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(25, 50, 75, 100),
xaxis.labels = list(25, 50, 75, 100),
xaxis.title = "Percent"
),
list(
7,
header = "StatVar4",
graph.bgcolor = "lightgray",
right.margin = 0.25,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(20, 40, 60, 80),
xaxis.labels = list(20, 40, 60, 80),
xaxis.title = "Percent"
)
)
)
```
```{r Ch1-NJ-Layout2, fig.cap = 'Mock-up linked micromap plot\\index{Linked micromap plot}, based on the 21 counties of New Jersey. `StatVar2` is used as the sorting variable. Further changes are the use of two-ended cumulative maps, the reduced number of perceptual groups, and the introduction of a median row.', fig.width = 7, fig.height = 6, echo = FALSE}
# create linked micromap plot
mmplot(
stat.data = nj_data,
map.data = nj_polys_table,
map.link = c("County", "ID"),
panel.types = c("map", "dot_legend", "labels", "dot", "dot", "dot", "dot"),
panel.data = list(NA, NA, "County", "StatVar1", "StatVar2", "StatVar3", "StatVar4"),
ord.by = "StatVar2",
rev.ord = TRUE,
grouping = 5,
median.row = TRUE,
colors = RColorBrewer::brewer.pal(n = 5, name = "BrBG"),
vertical.align = "center",
panel.att = list(
list(
1,
header = "Two-ended\nCumulative Maps",
map.all = TRUE,
fill.regions = "two ended",
active.border.color = "black",
active.border.size = 1.2,
inactive.border.color = gray(0.7),
inactive.border.size = 1,
panel.width = 1.3
),
list(
2,
point.type = 20,
point.border = TRUE,
point.size = 2,
panel.width = 1.5
),
list(
3,
header = "Counties",
align = "left",
right.margin = -0.1,
left.margin = -1,
text.size = 0.9,
panel.width = 0.7
),
list(
4,
header = "StatVar1",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(0, 25, 50, 75),
xaxis.labels = list(0, 25, 50, 75),
xaxis.title = "Percent"
),
list(
5,
header = "StatVar2",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(0, 25, 50),
xaxis.labels = list(0, 25, 50),
xaxis.title = "Percent"
),
list(
6,
header = "StatVar3",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(25, 50, 75, 100),
xaxis.labels = list(25, 50, 75, 100),
xaxis.title = "Percent"
),
list(
7,
header = "StatVar4",
graph.bgcolor = "lightgray",
right.margin = 0.25,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(20, 40, 60, 80),
xaxis.labels = list(20, 40, 60, 80),
xaxis.title = "Percent"
)
)
)
```
In general, linked micromap plots\index{Linked micromap plot}
consist of four to seven columns of panels.\index{Panel}
One to four of these columns are statistical graphics columns that can be used to display
various statistical displays such as dotplots\index{Dotplot} (with and without confidence bounds),
boxplots\index{Boxplot},
scatterplots\index{Scatterplot}, time series\index{Time Series}, and other graphical displays
of observed or estimated statistical data. Sometimes, an additional global summary statistic is
also shown in these columns. The three remaining columns are used for the maps,
a color-coded legend, and a column with the subregion names, abbreviations, or
some other meaningful subregion identifiers.
The color-coded legend links the subregion names, the map regions, and the statistical data.
There exists no fixed order in which the columns have to be arranged.
In particular, there is no strong recommendation where the column with the maps should be placed.
This often depends on personal preferences of the authors and plot designers.
There exist examples in the literature where the maps were placed on the left,
e.g., in @Mast2013LinkedMicromaps and @DaWo2021,
on the right,
e.g., in @Wartenberg2009,
and even in the middle,
e.g., in @TaSt2019.
Each row in a linked micromap plot\index{Linked micromap plot} represents one subregion.
The rows with the subregions are sorted according to a
sorting variable that may or may not be one of the variables that are displayed
in the statistical graphics columns.
Moreover, the rows are arranged in
perceptual groups.\index{Perceptual group}
Ideally, each perceptual group\index{Perceptual group} should contain the same
number of subregions, but that is not always possible.
Table \@ref(tab:Ch1-PartitioningTable) provides suggestions
for common partitionings
of the number of subregions in each perceptual group,\index{Perceptual group}
depending on the total number of subregions in the area of interest.
In practice, having close to five subregions in most of the perceptual groups\index{Perceptual group}
is ideal from a cognitive perspective. However, sometimes it is desirable to highlight
one subregion in a median row and emphasize which subregions fall above and which
fall below with respect to the sorting variable. Those partitionings
that contain a `1` in Table \@ref(tab:Ch1-PartitioningTable)
should be interpreted as partitionings with a median row.
Within each perceptual group,\index{Perceptual group} the subregions shown in the map,
the color-coded legend next to the subregion names, and the graphics in the statistical graphics columns are linked
by the color from the color-coded legend. This allows the reader to quickly link information
from the statistical graphics columns
to the underlying subregions in the maps and to the subregion names via the color-coded legend. Within each
perceptual group,\index{Perceptual group} a particular color, shown in the color-coded legend, represents a single
subregion in the map column and all statistical data for this subregion in
the graphs in the statistical graphics columns.
The same set of colors is used in all perceptual groups.\index{Perceptual group}
It might be tempting to assume that subregions that appear in the same color
in different perceptual groups\index{Perceptual group} are somehow related,
but this is not the case. Rather, the colors could be used as an ordering
criterion within each perceptual group.\index{Perceptual group}
When using a sequential color scheme that goes from light to dark
as in Figure \@ref(fig:Ch1-NJ-Layout1), the lightest color (here yellow)
always represents the largest value according to the variable that is used as the sorting variable
in each perceptual group,\index{Perceptual group}
the second lightest color (here green) always represents the second largest value
in each perceptual group,\index{Perceptual group}
and so on.
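The ordering role of the colors can be sketched in base R. The example below is an illustration under stated assumptions, not the internal logic of the **micromap** package: the 21 values are random, the region names are placeholders, and the five shade labels stand in for an actual sequential palette.

```r
# Hypothetical values for 21 regions (random, for illustration only)
set.seed(1)
values <- round(runif(n = 21, min = 0, max = 100), digits = 1)
regions <- paste0("Region", 1:21)

# Perceptual groups of sizes 4, 4, 5, 4, 4 and a light-to-dark set of shades
group_sizes <- c(4, 4, 5, 4, 4)
shades <- c("lightest", "second lightest", "middle", "second darkest", "darkest")

# Sort from highest to lowest, then assign each row a group and a
# position within its group; the position determines the shade
ord <- order(values, decreasing = TRUE)
layout <- data.frame(
  region = regions[ord],
  value = values[ord],
  group = rep(seq_along(group_sizes), times = group_sizes),
  shade = shades[sequence(group_sizes)]
)
print(layout)
```

In every group, the row labeled `lightest` holds the largest value of that group, which is why the same shade in two different groups says nothing about a relationship between those two regions.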
(ref:Ch1-PartitioningTable-reference) @SC2008
```{r Ch1-PartitioningTable, echo = FALSE}
library(kableExtra)
partitionings <- matrix(
data = c(
"1", "1", "",
"2", "2", "",
"3", "3", "",
"4", "4", "",
"5", "5", "",
"6", "3 3", "",
"7", "3 1 3", "2 3 2",
"8", "4 4", "",
"9", "4 1 4", "3 3 3",
"10", "5 5", "",
"11", "5 1 5", "3 5 3",
"12", "5 2 5", "4 4 4",
"13", "5 3 5", "4 5 4",
"14", "5 4 5", "",
"15", "5 5 5", "",
"16", "5 3 3 5", "4 4 4 4",
"17", "5 3 1 3 5", "3 4 3 4 3",
"18", "5 4 4 5", "4 5 5 4",
"19", "5 4 1 4 5", "4 4 3 4 4",
"20", "5 5 5 5", "",
"21", "5 5 1 5 5", "4 4 5 4 4",
"22", "5 5 2 5 5", "5 4 4 4 5",
"23", "5 5 3 5 5", "",
"24", "5 5 4 5 5", "",
"25", "5 5 5 5 5", "",
"26", "5 5 3 3 5 5", "5 4 4 4 4 5",
"27", "5 5 3 1 3 5 5", "4 4 4 3 4 4 4",
"28", "5 5 4 4 5 5", "4 5 5 5 5 4",
"29", "5 5 4 1 4 5 5", "4 4 4 5 4 4 4",
"30", "5 5 5 5 5 5", "",
"31", "5 5 5 1 5 5 5", "4 4 5 5 5 4 4",
"32", "5 5 5 2 5 5 5", "5 5 4 4 4 5 5",
"33", "5 5 5 3 5 5 5", "4 5 5 5 5 5 4",
"34", "5 5 5 4 5 5 5", "",
"35", "5 5 5 5 5 5 5", "",
"36", "5 5 5 3 3 5 5 5", "4 4 5 5 5 5 4 4",
"37", "5 5 5 3 1 3 5 5 5", "4 4 4 4 5 4 4 4 4",
"38", "5 5 5 4 4 5 5 5", "4 5 5 5 5 5 5 4",
"39", "5 5 5 4 1 4 5 5 5", "4 4 4 5 5 5 4 4 4",
"40", "5 5 5 5 5 5 5 5", "",
"41", "5 5 5 5 1 5 5 5 5", "4 4 5 5 5 5 5 4 4",
"42", "5 5 5 5 2 5 5 5 5", "5 5 5 4 4 4 5 5 5",
"43", "5 5 5 5 3 5 5 5 5", "4 5 5 5 5 5 5 5 4",
"44", "5 5 5 5 4 5 5 5 5", "",
"45", "5 5 5 5 5 5 5 5 5", "",
"46", "5 5 5 5 3 3 5 5 5 5", "4 4 5 5 5 5 5 5 4 4",
"47", "5 5 5 5 3 1 3 5 5 5 5", "4 4 4 4 5 5 5 4 4 4 4",
"48", "5 5 5 5 4 4 5 5 5 5", "4 5 5 5 5 5 5 5 5 4",
"49", "5 5 5 5 4 1 4 5 5 5 5", "4 4 4 5 5 5 5 5 4 4 4",
"50", "5 5 5 5 5 5 5 5 5 5", "",
"51", "5 5 5 5 5 1 5 5 5 5 5", "4 4 5 5 5 5 5 5 5 4 4"
),
nrow = 51, ncol = 3, byrow = TRUE
)
knitr::kable(
partitionings,
align = c(rep("c", times = 3)),
booktabs = TRUE,
caption = "Full-symmetry partitionings that target groups of size 5. The left column (Number) contains the number of regions. The middle column (Partitioning 1) puts the smallest counts in the middle. Full-symmetry alternatives that avoid small counts appear in the right column (Partitioning 2). Abandoning full symmetry can lead to fewer panels. The table ends with 51 regions (the number of U.S. states plus Washington, D.C.), but it can be easily extended. Table originally published in (ref:Ch1-PartitioningTable-reference).",
col.names = c("Number", "Partitioning 1", "Partitioning 2")
) %>%
kable_styling(
latex_options = "striped",
font_size = 8
) %>%
column_spec(1, width = "0.5in") %>%
column_spec(2, width = "2in") %>%
column_spec(3, width = "2in")
```
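When extending Table \@ref(tab:Ch1-PartitioningTable) or choosing a partitioning for a new map, it can help to check a candidate programmatically. The following sketch uses three small helper functions that are hypothetical (they are not part of any package): one parses a partitioning written as in the table, one checks that it is fully symmetric and adds up to the number of subregions, and one detects a median row.

```r
# Turn a partitioning string such as "5 5 1 5 5" into an integer vector
parse_partitioning <- function(x) {
  as.integer(strsplit(trimws(x), split = "\\s+")[[1]])
}

# A full-symmetry partitioning reads the same in both directions
# and its group sizes add up to the number of subregions n
is_valid_partitioning <- function(sizes, n) {
  sum(sizes) == n && identical(sizes, rev(sizes))
}

# A median row is a single subregion in the middle of an odd number of groups
has_median_row <- function(sizes) {
  k <- length(sizes)
  k > 1 && k %% 2 == 1 && sizes[(k + 1) / 2] == 1
}

# The two partitionings listed for 21 subregions
p1 <- parse_partitioning("5 5 1 5 5")
p2 <- parse_partitioning("4 4 5 4 4")

print(c(valid = is_valid_partitioning(p1, n = 21), median_row = has_median_row(p1)))
print(c(valid = is_valid_partitioning(p2, n = 21), median_row = has_median_row(p2)))
```

In the `mmplot()` calls of this chapter, the second partitioning is passed directly as `grouping = c(4, 4, 5, 4, 4)`, whereas the first one is requested via `grouping = 5` together with `median.row = TRUE`.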
Figures \@ref(fig:Ch1-NJ-Layout1) and \@ref(fig:Ch1-NJ-Layout2) each show a seven-column
linked micromap plot\index{Linked micromap plot} for the 21 counties of New Jersey.
The first column shows the maps, the second column shows the color-coded legend,
and the third column shows the subregion identifiers, here the county names in New Jersey.
Columns four, five, six, and seven are the statistical graphics columns
that show dotplots,\index{Dotplot} by far the most frequently used
graph type in linked micromap plots.\index{Linked micromap plot}
In Figure \@ref(fig:Ch1-NJ-Layout1), the rows are sorted
from highest (at the top) to lowest (at the bottom) according to `StatVar1`.
This layout makes use of Partitioning 2 from Table \@ref(tab:Ch1-PartitioningTable)
and uses five perceptual groups and no median row.
Each consecutive map shows in light gray which subregions have been colored previously in the map(s) above,
resulting in some aggregated maps.
In the map at the bottom, eventually all subregions have been colored.
With this sorting, the maps show a strong spatial pattern. The subregions
with the largest values of `StatVar1` can be found (by construction) in the northern part of New Jersey
while the subregions with the smallest values of `StatVar1` can be found (by construction) in the
southern part of New Jersey.
In Figure \@ref(fig:Ch1-NJ-Layout2), the rows are sorted
from highest (at the top) to lowest (at the bottom) according to `StatVar2`.
This layout makes use of Partitioning 1 from Table \@ref(tab:Ch1-PartitioningTable)
and uses four perceptual groups and a median row.
A median row is often a good solution if the
number of subregions is odd such as for the 21 counties of New Jersey.
Here, Somerset is the county that appears (by construction) in the median row.
This implies that it has the 11th highest and 11th lowest `StatVar2` value.
Somerset does not appear in a map by itself, but rather is added to the maps
in the panels\index{Panel} above and below the median row in a neutral (gray) color,
thus increasing the number of subregions shown in each of these two maps by one (i.e., six here).
This plot makes use of a two-ended aggregate coloring of the maps.
The subregions from all previous perceptual groups again are filled in the
subsequent perceptual groups. But, this filling proceeds from
the top perceptual group to the median row only
and also from the bottom perceptual group to the median row
by sequentially
filling the subregions that have already been displayed on the more extreme ends.
This makes it easier to distinguish between the top 50% and bottom 50% of the data
according to the sorting variable. Here, it becomes obvious that
Morris (missing in the northern part of New Jersey)
and Gloucester (missing in the southern part of New Jersey)
are potential spatial outliers with respect to `StatVar2`.
Morris has a much lower value of `StatVar2` than its geographic neighbors and
Gloucester has a much higher value of `StatVar2` than its geographic neighbors.
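The two-ended cumulative filling described above can be written down as a small bookkeeping exercise. The sketch below is our own illustration of that description and makes no claim about how the **micromap** package implements `fill.regions = "two ended"`; the region names `R1` to `R21` are placeholders for 21 subregions that are already sorted by the sorting variable.

```r
# 21 sorted placeholder regions in groups of 5, 5, 1 (median row), 5, 5
group_sizes <- c(5, 5, 1, 5, 5)
regions <- paste0("R", 1:21)
group_id <- rep(seq_along(group_sizes), times = group_sizes)
median_group <- 3

# Status of every region in the map that belongs to perceptual group g
status_for_panel <- function(g) {
  status <- rep("not yet shown", times = length(regions))
  if (g < median_group) {
    # upper half: cumulate from the top toward the median row
    status[group_id < g] <- "previously shown"
  } else {
    # lower half: cumulate from the bottom toward the median row
    status[group_id > g] <- "previously shown"
  }
  status[group_id == g] <- "active"
  # the median region is added to the maps directly above and below it
  if (abs(g - median_group) == 1) {
    status[group_id == median_group] <- "median"
  }
  setNames(status, regions)
}

# One map per perceptual group; the median row has no map of its own
map_groups <- setdiff(seq_along(group_sizes), median_group)
panels <- lapply(map_groups, status_for_panel)
names(panels) <- paste0("group", map_groups)

print(lapply(panels, table))
```

In the two maps next to the median row, all regions from one half of the sorted data are filled, which is what makes a region that is missing from its geographic neighborhood stand out.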
Finally, we want to describe and interpret the patterns from the
statistical graphics columns. In Figures \@ref(fig:Ch1-NJ-Layout1) and \@ref(fig:Ch1-NJ-Layout2),
the dots for `StatVar1` and `StatVar2` run almost in parallel, i.e.,
high values of `StatVar1` are associated with high values of `StatVar2` and
low values of `StatVar1` are associated with low values of `StatVar2`.
There are only two exceptions, the previously mentioned Morris and Gloucester,
that do not fit the overall pattern. Overall, such a pattern is an
indicator of a strong positive (linear) association between these two variables.
In contrast, the dots for `StatVar1` and `StatVar3` (and `StatVar2` and `StatVar3`)
diverge, forming a crude caret shape
(resembling an upside-down V), i.e.,
high values of `StatVar1` (and `StatVar2`) are associated with low values of `StatVar3` and
low values of `StatVar1` (and `StatVar2`) are associated with high values of `StatVar3`.
Overall, such a pattern is an
indicator of a strong negative (linear) association between these two variables.
This can be confirmed numerically.
The correlation coefficients $r$ are
about `r round(cor(nj_data$StatVar1, nj_data$StatVar2), digits = 2)`
for `StatVar1` and `StatVar2`,
about `r round(cor(nj_data$StatVar1, nj_data$StatVar3), digits = 2)`
for `StatVar1` and `StatVar3`,
and about `r round(cor(nj_data$StatVar2, nj_data$StatVar3), digits = 2)`
for `StatVar2` and `StatVar3`.
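Readers who want to reproduce these numbers outside of the book can rebuild the mock data on their own. The sketch below repeats the data-generating steps from the first figure chunk of this chapter (without the county names, which do not affect the random numbers) and then computes all pairwise correlations at once. Exact values depend on the random number generator of the R version in use, so no expected output is listed here.

```r
# Ranks that induce the north-south pattern in the mock data
stat_data <- c(
  2, 17, 7, 6, 1, 3, 15,
  5, 16, 11, 10, 13, 9, 19,
  8, 18, 4, 12, 21, 14, 20
)

# Same seed and same order of random draws as in the figure chunk
set.seed(123)
mock <- data.frame(
  StatVar1 = stat_data * 3 + 5 + runif(n = 21, min = -1.5, max = 1.5),
  StatVar2 = stat_data * 2 + 3 + runif(n = 21, min = -3, max = 3),
  StatVar3 = 95 - 3 * stat_data + runif(n = 21, min = -5, max = 5),
  StatVar4 = rnorm(n = 21, mean = 55, sd = 10)
)
mock$StatVar2[8] <- 35 # outlier (Gloucester)
mock$StatVar2[14] <- 15 # outlier (Morris)

# Pearson correlations for all pairs of variables
cor_matrix <- round(cor(mock), digits = 2)
print(cor_matrix)
```

The signs in `cor_matrix` match the visual impression: parallel runs of dots correspond to positive entries and diverging runs to negative entries.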
```{r Ch1-NJ-Layout3, fig.cap = 'Mock-up linked micromap plot\\index{Linked micromap plot}, based on the 21 counties of New Jersey. `StatVar4` is used as the sorting variable. The maps are placed on the right.', fig.width = 7, fig.height = 6, echo = FALSE}
# create linked micromap plot
mmplot(
stat.data = nj_data,
map.data = nj_polys_table,
map.link = c("County", "ID"),
panel.types = c("dot_legend", "labels", "dot", "dot", "dot", "dot", "map"),
panel.data = list(NA, "County", "StatVar1", "StatVar2", "StatVar3", "StatVar4", NA),
ord.by = "StatVar4",
rev.ord = TRUE,
grouping = 5,
median.row = TRUE,
colors = RColorBrewer::brewer.pal(n = 5, name = "RdYlBu"),
vertical.align = "center",
panel.att = list(
list(
1,
point.type = 20,
point.border = TRUE,
point.size = 2,
panel.width = 1.6
),
list(
2,
header = "Counties",
align = "left",
right.margin = -0.1,
left.margin = -1,
text.size = 0.9,
panel.width = 0.7
),
list(
3,
header = "StatVar1",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(0, 25, 50, 75),
xaxis.labels = list(0, 25, 50, 75),
xaxis.title = "Percent"
),
list(
4,
header = "StatVar2",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(0, 25, 50),
xaxis.labels = list(0, 25, 50),
xaxis.title = "Percent"
),
list(
5,
header = "StatVar3",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(25, 50, 75, 100),
xaxis.labels = list(25, 50, 75, 100),
xaxis.title = "Percent"
),
list(
6,
header = "StatVar4",
graph.bgcolor = "lightgray",
right.margin = 0.1,
left.margin = -0.6,
point.size = 1.5,
xaxis.ticks = list(20, 40, 60, 80),
xaxis.labels = list(20, 40, 60, 80),
xaxis.title = "Percent"
),
list(
7,
header = "Two-ended\nCumulative Maps",
map.all = TRUE,
fill.regions = "two ended",
active.border.color = "black",
active.border.size = 1.2,
inactive.border.color = gray(0.7),
inactive.border.size = 1,
panel.width = 1.3
)
)
)
```
In Figure \@ref(fig:Ch1-NJ-Layout3), the rows are sorted
from highest (at the top) to lowest (at the bottom) according to `StatVar4`.
This layout also makes use of Partitioning 1 from Table \@ref(tab:Ch1-PartitioningTable)
and uses four perceptual groups and a median row.
Moreover, it places the maps on the right.
There is no strong noticeable pattern when comparing `StatVar1`, `StatVar2`, and `StatVar3`
with `StatVar4`.
While this can also be seen in the previously created
Figures \@ref(fig:Ch1-NJ-Layout1) and \@ref(fig:Ch1-NJ-Layout2),
this becomes most obvious in Figure \@ref(fig:Ch1-NJ-Layout3).
In fact, the correlation coefficients $r$ are
about `r round(cor(nj_data$StatVar1, nj_data$StatVar4), digits = 2)`
for `StatVar1` and `StatVar4`,
about `r round(cor(nj_data$StatVar2, nj_data$StatVar4), digits = 2)`
for `StatVar2` and `StatVar4`,
and about `r round(cor(nj_data$StatVar3, nj_data$StatVar4), digits = 2)`
for `StatVar3` and `StatVar4`.
The reader should not be tempted to overinterpret the moderate correlation coefficient $r$ of
about `r round(cor(nj_data$StatVar2, nj_data$StatVar4), digits = 2)`
for `StatVar2` and `StatVar4` as this is entirely due to chance:
The values for `StatVar4` originate from a Normal distribution with
mean 55 and standard deviation 10 and have been randomly assigned
to the 21 counties in New Jersey.
Therefore, the band of five connected subregions in the top perceptual group
in Figure \@ref(fig:Ch1-NJ-Layout3) is entirely due to chance.
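
To get a feel for how large a correlation coefficient can become by chance alone with only 21 observations, the following minimal sketch repeatedly draws an unrelated Normal variable with mean 55 and standard deviation 10 and correlates it with a fixed variable. The variable `fixed_var`, the threshold of 0.3, and the number of simulations are assumptions made for illustration; the sketch does not use `nj_data`.

```r
set.seed(2022)
n_counties <- 21
n_sim <- 10000

# Hypothetical fixed variable standing in for one of StatVar1 to StatVar3.
fixed_var <- runif(n = n_counties, min = 0, max = 75)

# Draw a fresh, unrelated Normal(55, 10) variable each time and record r.
r_null <- replicate(
  n_sim,
  cor(fixed_var, rnorm(n = n_counties, mean = 55, sd = 10))
)

# Spread of r under the null hypothesis of no association.
quantile(r_null, probs = c(0.025, 0.5, 0.975))

# Share of simulations in which |r| reaches at least 0.3 by chance alone.
mean(abs(r_null) >= 0.3)
```

With so few observations, correlation coefficients of noticeable size are not unusual under the null hypothesis, which is why a single moderate value of $r$ deserves caution.
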
It is easy to be misguided by an apparent pattern in a statistical graphic
or a map visualization. Considerable work has been done over the past 15 years
to answer the question "Is what we see really there?" [@WCHB2010].
Readers are encouraged to assess their visual capabilities to detect true patterns using the examples
from @WCHB2010 and @BDMSTW2017 where line-ups of one data-based graphic (or map) have been compared
with several graphics (or maps) that have been created under the null hypothesis of
no pattern (or no spatial dependence).
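
The data behind such a line-up can be sketched in a few lines of R. The example below hides one vector of observed values among several null versions that are obtained by randomly reassigning the same values to the subregions. Permutation is only one possible way to generate data under the null hypothesis and is an assumption here, as are the names `observed`, `lineup`, and `true_pos` and the choice of 20 panels; the values are simulated and not taken from the cited work.

```r
set.seed(42)
n_panels <- 20

# Hypothetical observed values for 21 subregions (not the book's nj_data).
observed <- rnorm(n = 21, mean = 55, sd = 10)

# Position of the data-based panel, hidden among the null panels.
true_pos <- sample(n_panels, size = 1)

# Null panels: the same values, randomly reassigned to the subregions, which
# removes any spatial dependence while keeping the distribution of values.
lineup <- lapply(seq_len(n_panels), function(i) {
  if (i == true_pos) observed else sample(observed)
})
```

Each element of `lineup` could then be drawn as one small map, and a viewer who reliably picks out panel `true_pos` provides evidence that the observed pattern is more than chance.
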
For readers who cannot immediately interpret the patterns from
parallel dotplots,\index{Dotplot} it is sometimes helpful to visualize (or mentally visualize)
these patterns via scatterplots\index{Scatterplot} or scatterplot matrices\index{Scatterplot matrix}
as shown in Figure \@ref(fig:Ch1-NJ-Scatterplot). Doing such a visual translation
from parallel dotplots\index{Dotplot} to
scatterplots\index{Scatterplot} or scatterplot matrices\index{Scatterplot matrix}
a few times should help to correctly interpret the patterns
from parallel dotplots\index{Dotplot} in the future.
```{r Ch1-NJ-Scatterplot, fig.cap = 'Mock-up scatterplot of the four statistical variables shown in the three previous linked micromap plots\\index{Linked micromap plot}, based on the 21 counties of New Jersey.', fig.width = 5, fig.height = 5, echo = FALSE}
graphics::pairs(x = nj_data[, 2:5])
```
### Software and Online Developments {#Ch1-LinkedMicromapPlotsSoftwareDevelopments}
As mentioned in Section \@ref(Ch1-LinkedMicromapPlotsHistoricalDevelopment),
linked micromap plots\index{Linked micromap plot} were
first presented at the JSM
in Chicago, Illinois, in 1996 [@OCCP1996].
Lead contributors were Daniel (Dan) B. Carr from George Mason University and
Anthony R. Olsen and Susanne M. Pierson from the U.S. Environmental Protection Agency in Corvallis, Oregon.
During the first five years of their existence, almost all
linked micromap plots\index{Linked micromap plot}
were created in S-Plus\index{S-Plus}. The basis for these was a set of panel functions
that were eventually translated into R functions and later formed the basis for
the R-based linked micromap plots\index{Linked micromap plot} in @CP2010.
These panel functions are further used and described in Chapter \@ref(Ch10)
for non-traditional linked micromap plots.\index{Linked micromap plot}
Other experimental approaches to construct linked micromap plots\index{Linked micromap plot}
in R have been described in @SC2013.
Linda W. Pickle and B. Sue Bell from the National Cancer Institute,
Jim X. Chen from George Mason University, and a few others
soon developed an interest in the use of interactive linked micromap plots\index{Linked micromap plot}
for cancer data [@CCBPZ2002;@WCCBP2002;@CBPZL2003;@CCWS2006;@BHPW2006].
This resulted in the creation of stand-alone Java code and the Java-based
National Cancer Institute’s State Cancer Profiles web site in 2002.
This web site allowed the user to easily compare graphs of
national or state trends for Whites, Blacks, Hispanics, Asian/Pacific Islanders, and American Indians/Alaska
Natives for both genders for a variety of cancer types.
While the State Cancer Profiles web site still exists, the interactive linked micromap plots\index{Linked micromap plot}
component was retired around 2014/2015 after about 12 years of use.
Jürgen Symanzik (first at George Mason University and later at Utah State University)
and his colleagues and students experimented with a variety
of other software approaches to creating linked micromap plots\index{Linked micromap plot}.
These included linked micromap plots\index{Linked micromap plot} based on the
Graphics Production Library\index{Graphics Production Library} (GPL)\index{GPL|see {Graphics Production Library}}
[@CVR96;@WRCR2000], developed at the Bureau of Labor Statistics,
for the USEPA's Cumulative Exposure Project (CEP) [@SWWCWA1998;@SACWWW1999;@SCAWWW1999].
The GPL, extended and renamed to nViZn\index{nViZn} (read "envision") [@WRRN2001] and distributed by Illumitek, Inc./SPSS,
was also used in @JoSy2001, @SyJo2001ASA, @SHG2002ASA, and @HSG2003
for the construction of interactive linked micromap plots.\index{Linked micromap plot}
Add historical and online developments. Also introduce both main R packages that are used throughout the book.
Summarize their main features, differences, and limitations.
### Challenges and Open Research Questions {#Ch1-LinkedMicromapPlotsChallenges}
One of the main challenges regarding linked micromap plots\index{Linked micromap plot}
relates to geographic regions with many subareas. The challenge is that more
than twelve perceptual groups\index{Perceptual group} with five subregions
each typically do not fit on a single printed page.
Possible solutions have been
discussed for U.S. states with many counties such as Iowa and Tennessee
that have between 60 and 120 counties [@Carr2001].
However, those were specially tailored solutions for these two states.
Neither of the two current R packages for linked micromap plots\index{Linked micromap plot}
supports similar solutions.
Both current R packages only allow users to stitch together multiple linked micromap plots,\index{Linked micromap plot}
e.g., for the 254 counties of Texas [@PMWOK2015JSS]
or for the 55 sub-boroughs of New York City [@MPS2019ASA;@Medri2021],
an approach that is less sophisticated than the designs in @Carr2001.
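
The stitching idea can be sketched independently of any plotting package: sort all subregions once, then cut the sorted rows into pages of at most twelve perceptual groups with five subregions each. The data frame `regions`, its 130 simulated rows, and the column names are hypothetical, and the sketch only prepares the data; drawing each page would still be left to one of the R packages.

```r
# Hypothetical region with 130 subregions and one simulated variable.
set.seed(7)
regions <- data.frame(
  Subregion = paste0("Region", 1:130),
  Value = rnorm(n = 130, mean = 55, sd = 10)
)

# Sort once across all subregions so that the ranking carries over pages.
regions <- regions[order(regions$Value, decreasing = TRUE), ]

# Twelve perceptual groups of five subregions each, i.e., 60 rows per page.
rows_per_page <- 12 * 5
regions$Page <- ceiling(seq_len(nrow(regions)) / rows_per_page)

# One data frame per page, each of which could be drawn as its own plot.
pages <- split(regions, regions$Page)
vapply(pages, nrow, integer(1))
```
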
Other conceptual challenges for linked micromap plots\index{Linked micromap plot}
are imposed by the underlying geographic regions. No good solutions have been
developed so far for long and narrow countries such as Chile where many of the
provinces are so narrow that they can barely be colored without
considerably distorting the map.
Problems exist if the geographic region consists of many small polygons, such
as the numerous islands of the Philippines and Indonesia.
Problems also exist when there are numerous big and numerous small subregions
that cannot be resized in any obvious way as is the case
for the 89 federal subjects of Russia where the largest subregions are
about three orders of magnitude bigger than the smallest ones.
## Conditioned Micromaps {#Ch1-ConditionedMicromaps}
### Historical Development {#Ch1-ConditionedMicromapsHistoricalDevelopment}
### Design, Features, and Interpretation {#Ch1-ConditionedMicromapsDesignFeaturesInterpretation}
### Software and Online Developments {#Ch1-ConditionedMicromapsSoftwareDevelopments}
@Baulier2011 created conditioned micromaps\index{Conditioned micromaps} in JMP.
See this ccmap macro from Friendly for CCmaps in SAS:
http://euclid.psych.yorku.ca/datavis/sasmac/ccmap.html
### Challenges and Open Research Questions {#Ch1-ConditionedMicromapsChallenges}
## Comparative Micromaps {#Ch1-ComparativeMicromaps}
### Historical Development {#Ch1-ComparativeMicromapsHistoricalDevelopment}
### Design, Features, and Interpretation {#Ch1-ComparativeMicromapssDesignFeaturesInterpretation}
### Software and Online Developments {#Ch1-ComparativeMicromapsSoftwareDevelopments}
@Baulier2011 created comparative micromaps\index{Comparative micromaps} in JMP.
### Challenges and Open Research Questions {#Ch1-ComparativeMicromapsChallenges}
## Outlook on the Book Chapters {#Ch1-Outlook}
## How to Use this Book and the Accompanying R Code, Webpage, and R Package {#Ch1-HowToUse}
## INSTRUCTIONS FOR CHAPTER AUTHORS: Use of Bookdown {#Ch1-InstructionsChapterAuthors}
Please look at the source code for this chapter (i.e., the file `01-introduction.Rmd`) carefully.
It provides an overview of how to label and reference other chapters, sections, figures, and tables in your chapter.
It also outlines how and what to index and how to include citations in your chapter.
Eventually, this chapter will become the real introduction for our `Micromap Plots in R` book.
Minimal templates for actual chapters can be found in the files `02-micromap.Rmd`, `03-micromapST.Rmd`, etc.
The ultimate summary for this chapter will be based on the following text:
This chapter will provide a brief overview of the history of micromap plots,
main application areas, and existing software for the creation of micromaps.
A summary of the following eleven chapters of this book will also be provided.
## Use of Bookdown {#Ch1-Bookdown}
This section contains some basic information related to bookdown. For further details, see
https://bookdown.org/
and specifically
https://bookdown.org/yihui/bookdown/.
For an overview of how to use bookdown for CRC Press / Taylor & Francis books, see
https://yihui.org/en/2018/08/bookdown-crc/
and
https://www.routledge.com/bookdown-Authoring-Books-and-Technical-Documents-with-R-Markdown/Xie/p/book/9781138700109.
## Creating Figures and Tables {#Ch1-FigsAndTables}
Here are some basic examples of how to create figures and tables in bookdown.
We have a figure in Figure \@ref(fig:Ch1-CarsScatterplot) that makes
use of the _cars_\index{Datasets!cars} dataset
and also a table in Table \@ref(tab:Ch1-IrisTable) that makes
use of the _iris_\index{Datasets!iris} dataset.
See these examples for how to create automatic figure and table numbers in your R code chunks
and how to reference them in the main text. Also, please cite all datasets that
are used in your chapter.
Use meaningful identifiers that start with the letters **Ch**,
followed by the number of your chapter and a dash (such as Ch1-, Ch2-, etc.) so that we
can eventually cross-reference figures across chapters (and also avoid using the
same identifier more than once in different chapters).
```{r Ch1-CarsScatterplot, out.width = '90%', fig.cap = 'A trivial scatterplot of the cars dataset.'}
par(mar = c(4, 4, 1, 0.1))
plot(cars, pch = 19)
```
```{r Ch1-IrisTable}
knitr::kable(
  head(iris),
  caption = "A table of the iris data.",
  booktabs = TRUE
)
```
## Indexing {#Ch1-Indexing}
As already done in the previous sections, index all R packages, such as the
**ggplot2**\index{R Packages!ggplot2} and
**ggmap**\index{R Packages!ggmap} R packages.
Also provide an index entry for all datasets, such as the
_cars_\index{Datasets!cars} and the _iris_\index{Datasets!iris} datasets.
For R packages and datasets, use the actual R spelling. Do not change the capitalization.
One final word on indexing: Please capitalize the first word of the index entry in a sequence of words, e.g.,
perceptual group,\index{Perceptual group}
color blindness,\index{Color blindness}
and quantile-quantile plot.\index{Quantile-quantile plot}
In case you want to use abbreviations, please introduce them first, e.g.,
linked micromap plots\index{Linked micromap plot} (LMplots\index{LMplot|see {Linked micromap plot}}) and
conditioned choropleth maps\index{Conditioned choropleth map} (CCmaps\index{CCmap|see {Conditioned choropleth map}}).
Introduce a cross-reference index entry for the abbreviation (see the examples above),
but always list the full-length index entry for the index.
See https://en.wikibooks.org/wiki/LaTeX/Indexing
for other indexing options.
## Citations and References {#Ch1-CitationsReferences}
Here are some examples of citations: @Xie2022knitr and @Xie2022bookdown are references from the preface.
These citations appear as nouns in the text.
These are some micromap articles, book chapters, and books [@Carr2001;@SC2008;@CP2010].
These are references for the **micromap**\index{R Packages!micromap} [@PaOl2015] and
**micromapST**\index{R Packages!micromapST} [@CP2015CRAN] R packages.
All of these citations appear in parentheses.
**Note the use of the semicolon in the first set of articles, book chapters, and books.**
Also note that three different bib files have been used here to create the final bibliography.
See here for further details on citations in bookdown:
https://bookdown.org/yihui/bookdown/citations.html.
**I have provided an updated bibtex file with a large number of micromap-related references,
called `referencesMicromaps.bib` - see
https://github.com/symanzik/MicromapPlotsInR/blob/master/referencesMicromaps.bib.
See Appendix \@ref(Ch99-MicromapReferenceOverview) for the references that are already listed
in the bibtex file.
Most of the underlying articles, book chapters, posters, etc. have been made available via a Box folder now.
You should have received an e-mail invitation to this Box folder on 3/25/2022.
If you did not receive such an invitation or cannot access the files,
please let me know.**
## Micromap Examples {#Ch1-MicromapExamples}
While Section \@ref(Ch1-FigsAndTables) discussed general figure and table creation, this section focuses on micromaps.
The following examples have been taken from the `lmplot()` help page of the **micromap**\index{R Packages!micromap} R package.
Figure \@ref(fig:Ch1-micromap1) shows the first basic linked micromap plot\index{Linked micromap plot}
that makes use of the _USstates_\index{Datasets!USstates} and _edPov_\index{Datasets!edPov} datasets.
Overall, write and format your R code according to the tidyverse R style, summarized at
https://style.tidyverse.org/index.html.