From ff69ec1a6e44332d05b380e92c75b05ce2b753b1 Mon Sep 17 00:00:00 2001
From: Sangjoon Bob Lee
Date: Tue, 25 Jun 2024 12:26:28 -0400
Subject: [PATCH] Update README.md

---
 README.md | 183 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 94 insertions(+), 89 deletions(-)

diff --git a/README.md b/README.md
index bcb54fe..6df92e2 100644
--- a/README.md
+++ b/README.md
@@ -4,105 +4,98 @@
 [![Integration tests](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml/badge.svg)](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml)
 ![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg) ![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg) ![Python 3.11](https://img.shields.io/badge/python-3.12-blue.svg)
-## Description
-CIF Bond Analyzer (CBA) is an interactive, command-line Python application designed for the high-throughput extraction of bonding information from CIF (Crystallographic Information File) file.
+CIF Bond Analyzer (CBA) is an interactive, command-line-based application designed for the high-throughput extraction of bonding information from CIF (Crystallographic Information File) files. CBA offers (1) Site Analysis, (2) System Analysis for binary/ternary systems, and (3) Coordination Analysis. The outputs are saved in .json, .xlsx, and .png formats.
-## Overview
+This README.md serves as both a tutorial and documentation.
-Defect .cif files. CBA is a prompt-based and codeless application built in Python. To begin, CBA detects folders containing .cif files located at the project level. It also counts .cif files that are nested within the folder.
+## Demo
-Preprocess .cif files and standarlize site labels. Due to atomic mixing, site labels may have a comma and symbols such as `M` is used. CBA reformats them that is easily parsable into an element. Also, we noticed that many files have problems with the author section and publication, we also remove the author loop section.
+CBA is designed to be used interactively, without writing any code.
-Move ill-formatted files.
-4. Choose one of the options
-5. Generate a unitcell and a supercell by applying +-1, +-1, +-1 shifts in fractional coordinates.
+![CBA-demo-gif](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/fad16f21-93d8-4954-8efe-c04fbc68a9b7)
-6. Generate a supercell for each file and determine the shortest distance and pair from each atomic site. The atomic site is selected based on the atom with the greatest number of minimum distances in the surrounding atoms.
+## Installation and tutorial
-## Demo
+Run each of the following lines in a command-line application.
-The program has been designed to be run with intuitive user-interactive commands only.
+```text
+$ git clone https://github.com/bobleesj/cif-bond-analyzer.git
+$ cd cif-bond-analyzer
+$ pip install -r requirements.txt
+$ python main.py
+```
-![CBA-demo-gif](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/fad16f21-93d8-4954-8efe-c04fbc68a9b7)
+Once the code is executed using `python main.py`, a prompt appears and asks you to choose one of the three analysis options:
+```text
+Welcome! Please choose an option to proceed:
+[1] Conduct site analysis.
+[2] Conduct system analysis.
+[3] Conduct coordination analysis.
+Enter your choice (1-3): 1
+```
-## How to use
+For any option, CBA asks you to choose the folder(s) containing .cif files.
-Download required depdencies. The code has been tested on Python version 3.10, 3.11, 3.12.
+```text
-```bash
-pip install -r requirements.txt
+Folders with .cif files:
+1. 20240623_ErCoIn_nested, 16 files, 136 nested files
+2. 20240612_ternary_only, 2 files
+3. 20240611_ternary_binary_combined, 5 files
+4. 20240623_teranry_3_unique_elements, 3 files
+5. 20240611_binary_2_unique_elements, 4 files
+
+Would you like to process each folder above sequentially?
+(Default: Y) [Y/n]:
 ```
-Run via:
+You may then process the folders sequentially or choose specific folders by entering the numbers shown next to them in the prompt.
-```bash
-python main.py
-```
+## Preprocess
+This section describes the formatting and supercell generation methods.
-## Options
+### 1. Format files
-CBA supports 3 options belows.
+CBA uses the `CifEnsemble` object from `cifkit` to conduct preprocessing automatically.
-```text
-python main.py
+CBA standardizes the site labels in `atom_site_label`. Some site labels may contain a comma or a symbol such as `M` due to atomic mixing. CBA reformats each `atom_site_label` so it can be parsed into an element type that matches `atom_site_type_symbol`.
-Welcome! Please choose an option to proceed:
-[1] Conduct site analysis.
-[2] Conduct system analysis.
-[3] Conduct coordination analysis.
-Enter your choice (1-3):
-```
+CBA removes the content of `publ_author_address`. This section is often incorrectly formatted and would otherwise require manual modification.
+CBA relocates any ill-formatted files such as duplicate labels in `atom_site_label`, missing fractional coordinates, and files that generate a supercell.
-### Option 1. Site Analysis
+### 2. Supercell generation
-From a single `.cif` file, a supercell is generated and determines the shortest distance and the connecting site.
+For each `.cif` file, a unit cell is generated by applying the symmetry operations. A supercell is then generated by applying +-1, +-1, +-1 shifts from the unit cell.
-#### Output 1.1 text summary
+### 3. Atomic mixing info
-A text file `summary.txt` is generated in the folder to provide an overview of the shortest bonding pairs and missing pairs in the selected folders.
+Each bonding pair is assigned one of four atomic mixing categories:
-```txt
-Summary:
-Pair: In-In, Count: 4, Distances: 2.736, 2.782, 2.785, 2.793
-Pair: Pd-Ge, Count: 4, Distances: 2.449, 2.455, 2.489, 2.672
-Pair: Pd-Sb, Count: 4, Distances: 2.505, 2.700, 2.737, 2.793
-Pair: Si-Si, Count: 4, Distances: 1.975, 2.289, 2.325, 2.533
-Pair: Rh-Ge, Count: 2, Distances: 2.484, 2.495
-Pair: Ru-Si, Count: 2, Distances: 2.394, 2.519
-Pair: Sb-Sb, Count: 2, Distances: 2.573, 2.793
-Pair: Co-Ga, Count: 1, Distances: 2.485
-Pair: Co-Sb, Count: 1, Distances: 2.594
-Pair: Co-Sn, Count: 1, Distances: 2.737
+- **Full occupancy** is assigned when a single atomic site occupies the fractional coordinate with an occupancy value of 1.
+- **Full occupancy with mixing** is assigned when multiple atomic sites collectively occupy the fractional coordinate to a sum of 1.
+- **Deficiency without mixing** is assigned when a single atomic site occupies the fractional coordinate with an occupancy value less than 1.
+- **Deficiency with atomic mixing** is assigned when multiple atomic sites occupy the fractional coordinate with a sum less than 1.
-Missing pairs:
-Co-In
-Co-Ir
-Co-Ni
-Co-Pd
-Co-Pt
-Co-Rh
-Co-Si
-Fe-Co
-```
+## Analysis Options
-#### Output 1.2 histograms
+CBA provides 3 options for analysis.
-In the `output` folder, histograms per shortest pair distance from each atom will be saved.
+### Option 1. Site Analysis
-![Histograms for label pair](https://s9.gifyu.com/images/SViMv.png)
+Site Analysis determines the shortest distance and its nearest neighbor for each label in `atom_site_label`.
-To modify the histograms, run `python plot-histogram.py`. This script allows you to interactively specify parameters, such as the bin width and x-axis range:
+For each atom in the unit cell, Euclidean distances are calculated from the atom to all atoms in the supercell. The position of the atom in the unit cell for each site label is determined based on the atom with the greatest number of shortest distances to its neighbors.
-#### Output 1.3 Excel and JSON
+Assume a `.cif` file contains four site labels: `Er1`, `Er2`, `Er3`, and `Er4`. The bonding pair from the site label `Er4` to its nearest neighbor `Er2` is unique and is recorded. The bonding pair from `Er3` to `Er2` is also considered unique. However, the pairs `Er4-Er2` and `Er2-Er4` are considered identical. Of these two pairs, the one with the shorter distance is recorded.
-For each folder, CBA generates `.xlsx` and `.json` files containing the shortest distance and the connecting site from each reference site.
+#### Output 1.1 Excel and JSON
- It also determines the atomic mixing and occupacny information at the pair level. It extracts the tag from the .cif file if provided.
+For each folder, CBA generates `.xlsx` and `.json` files containing the site data described above. It also determines the atomic mixing and occupancy information at the pair level. It extracts the tag from the .cif file if provided.
 `site_pairs.json` is produced shown below.
@@ -119,13 +112,6 @@ For each folder, CBA generates `.xlsx` and `.json` files containing the shortest
       }
     ],
     "1955204": [
-      {
-        "dist": 2.404,
-        "mixing": "full_occupancy",
-        "formula": "Er2Co17",
-        "tag": "hex",
-        "structure": "Th2Ni17"
-      },
       {
         "dist": 2.46,
         "mixing": "full_occupancy",
@@ -140,16 +126,7 @@ For each folder, CBA generates `.xlsx` and `.json` files containing the shortest
         "tag": "hex",
         "structure": "Th2Ni17"
       }
-    ],
-    "1644636": [
-      {
-        "dist": 2.49,
-        "mixing": "full_occupancy",
-        "formula": "ErCo2",
-        "tag": "lt",
-        "structure": "TbFe2"
-      }
-    ],
+    ]
   }
 }
 ```
@@ -176,15 +153,6 @@ For each folder, CBA generates `.xlsx` and `.json` files containing the shortest
         "tag": "hex",
         "structure": "Th2Ni17"
       }
-    ],
-    "1644636": [
-      {
-        "dist": 2.49,
-        "mixing": "full_occupancy",
-        "formula": "ErCo2",
-        "tag": "lt",
-        "structure": "TbFe2"
-      }
     ]
   }
 }
@@ -194,6 +162,43 @@ An Excel file containing the information and each sheet having the bond pair.

 ![Excel screenshot](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/d6bed0df-b9ea-4922-967b-4656bb3ab3e0)

+#### Output 1.2 text summary
+
+A text file `output/summary_element.txt` is generated in the folder to provide an overview of the shortest bonding pairs and missing pairs in the selected folders across all files.
+
+```txt
+Summary:
+Pair: In-In, Count: 4, Distances: 2.736, 2.782, 2.785, 2.793
+Pair: Pd-Ge, Count: 4, Distances: 2.449, 2.455, 2.489, 2.672
+Pair: Pd-Sb, Count: 4, Distances: 2.505, 2.700, 2.737, 2.793
+Pair: Si-Si, Count: 4, Distances: 1.975, 2.289, 2.325, 2.533
+Pair: Rh-Ge, Count: 2, Distances: 2.484, 2.495
+Pair: Ru-Si, Count: 2, Distances: 2.394, 2.519
+Pair: Sb-Sb, Count: 2, Distances: 2.573, 2.793
+Pair: Co-Ga, Count: 1, Distances: 2.485
+Pair: Co-Sb, Count: 1, Distances: 2.594
+Pair: Co-Sn, Count: 1, Distances: 2.737
+
+Missing pairs:
+Co-In
+Co-Ir
+Co-Ni
+Co-Pd
+Co-Pt
+Co-Rh
+Co-Si
+Fe-Co
+```
+
+#### Output 1.3 histograms
+
+In the `output` folder, histograms per shortest pair distance from each atom will be saved.
+
+![Histograms for label pair](https://s9.gifyu.com/images/SViMv.png)
+
+To modify the histograms, run `python plot-histogram.py`. This script allows you to interactively specify parameters, such as the bin width and x-axis range:
+
+

 ### Option 2. System Analysis