School of Computing, Southern Adventist University, PO Box 370, Collegedale, TN, 37315, USA.
Geoscience Research Institute, 11060 Campus Street, Loma Linda, CA, 92350, USA.
Corresponding author(s). E-mail(s): [email protected]
Contributing authors: [email protected], [email protected]
Bi-plots are commonly used in geochemical analyses. However, they become cumbersome in multivariate analyses. This paper therefore explores the application of unsupervised machine learning techniques, specifically principal component analysis (PCA) and K-Means clustering, to large geochemical data sets from two distinct geological regions: Hawaii and the Peninsular Ranges Batholith (PRB) in Southern California. The IBM Foundational Methodology for Data Science was used to ensure proper data preparation and analysis. PCA provided dimensionality reduction and revealed which features contributed most strongly to the variance within the data. K-Means clustering allowed for deeper interpretation of the data. The analysis yielded valuable insights into the composition and differentiation of magma and rocks from the two regions. Future work should include a deeper analysis of the clusters and a determination of how geochemical plots relate to underlying geochemical processes.
Keywords: Geochemistry, PCA, K-Means, Machine Learning, Hawaii, Peninsular Ranges Batholith (PRB)