From f5083ea72f177cc0b47ea15cd31a301098b0d4f7 Mon Sep 17 00:00:00 2001
From: Sangjoon Bob Lee
Date: Tue, 25 Jun 2024 15:16:24 -0400
Subject: [PATCH] Update README.md

---
 .github/ISSUE_TEMPLATE/bug_report.md      | 24 ++-
 .github/ISSUE_TEMPLATE/feature_request.md |  7 +-
 README.md                                 | 246 ++++++++++------------
 3 files changed, 128 insertions(+), 149 deletions(-)

diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
index dd84ea7..9b77ea7 100644
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -1,10 +1,9 @@
 ---
 name: Bug report
 about: Create a report to help us improve
-title: ''
-labels: ''
-assignees: ''
-
+title: ""
+labels: ""
+assignees: ""
 ---

 **Describe the bug**
@@ -12,6 +11,7 @@ A clear and concise description of what the bug is.

 **To Reproduce**
 Steps to reproduce the behavior:
+
 1. Go to '...'
 2. Click on '....'
 3. Scroll down to '....'
@@ -24,15 +24,17 @@ A clear and concise description of what you expected to happen.
 If applicable, add screenshots to help explain your problem.

 **Desktop (please complete the following information):**
- - OS: [e.g. iOS]
- - Browser [e.g. chrome, safari]
- - Version [e.g. 22]
+
+- OS: [e.g. iOS]
+- Browser [e.g. chrome, safari]
+- Version [e.g. 22]

 **Smartphone (please complete the following information):**
- - Device: [e.g. iPhone6]
- - OS: [e.g. iOS8.1]
- - Browser [e.g. stock browser, safari]
- - Version [e.g. 22]
+
+- Device: [e.g. iPhone6]
+- OS: [e.g. iOS8.1]
+- Browser [e.g. stock browser, safari]
+- Version [e.g. 22]

 **Additional context**
 Add any other context about the problem here.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
index bbcbbe7..2bc5d5f 100644
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -1,10 +1,9 @@
 ---
 name: Feature request
 about: Suggest an idea for this project
-title: ''
-labels: ''
-assignees: ''
-
+title: ""
+labels: ""
+assignees: ""
 ---

 **Is your feature request related to a problem? Please describe.**
diff --git a/README.md b/README.md
index 6df92e2..0bab605 100644
--- a/README.md
+++ b/README.md
@@ -2,23 +2,26 @@

 ![Header](https://s9.gifyu.com/images/SViLp.png)

-[![Integration tests](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml/badge.svg)](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml) ![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg) ![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg) ![Python 3.11](https://img.shields.io/badge/python-3.12-blue.svg)
+[![Integration tests](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml/badge.svg)](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml) ![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg) ![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg) ![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)

-
-CIF Bond Analyzer (CBA) is an interactive, command-line-based application designed for the high-throughput extraction of bonding information from CIF (Crystallographic Information File) files. CBA offers (1) Site Analysis (2) System Analysis for binary/ternary systems, and (3) Coordination Analysis. The outputs are saved in .json, xlsl, and .png formats.
+The CIF Bond Analyzer (CBA) is an interactive, command-line-based application designed for high-throughput extraction of bonding information from CIF (Crystallographic Information File) files. CBA offers Site Analysis, System Analysis for binary/ternary systems, and Coordination Analysis. The outputs are saved in `.json`, `.xlsx`, and `.png` formats.

 The current README.md serves as a tutorial and documentation.

+## Value
+
+CBA simplifies crystal structure analysis by automating the extraction of minimum bond lengths, which are crucial for understanding geometric configurations and identifying irregularities. Histograms and figures assist in identifying distinct bond lengths and structural patterns.
+
+
 ## Demo

-The code is designed to be used interactively without writing any code.
+The code is designed for interactive use without the need to write any code.

 ![CBA-demo-gif](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/fad16f21-93d8-4954-8efe-c04fbc68a9b7)

-
 ## Installation and tutorial

-Copy each line to command-line applications.
+Copy each line into your command-line applications:

 ```text
 $ git clone https://github.com/bobleesj/cif-bond-analyzer.git
@@ -27,7 +30,7 @@ $ pip install -r requirements.txt
 $ python main.py
 ```

-Once the code is executed using `python main.py`, the following prompt will appear and ask to choose one of the three analysis options:
+Once the code is executed using `python main.py`, the following prompt will appear, asking you to choose one of the three analysis options:

 ```text
 Welcome! Please choose an option to proceed:
@@ -37,7 +40,7 @@ Welcome! Please choose an option to proceed:
 Enter your choice (1-3): 1
 ```

-For any option, CBA asks you to choose folder(s) containing .cif files.
+For any option, CBA will ask you to choose folders containing `.cif` files:

 ```text

@@ -49,32 +52,33 @@ Folders with .cif files:
 5. 20240611_binary_2_unique_elements, 4 files

 Would you like to process each folder above sequentially?
-(Default: Y) [Y/n]: 
+(Default: Y) [Y/n]:
 ```

-You may then choose to process folders either sequentially or choose specific folders by entering numbers associated with the folders prompted.
+You may then choose to process folders either sequentially or select specific folders by entering numbers associated with the folders prompted.
+For each folder, CBA generates site pair data saved in `site_pairs.json` or `site_pairs.xlsx`.

 ## Preprocess

-The following discusses formatting and supercell generation methods.
+The following discusses formatting, supercell generation, and atomic mixing information.

 ### 1. Format files

 CBA uses the `CifEnsemble` object from `cifkit` to conduct preprocessing automatically.

-CBA standardizes the site labels in `atom_site_label`. Some site labels may contain a comma or a symbol such as `M` due to atomic mixing. CBA reformats each `atom_site_label` so it can be parsed into an element type that matches `atom_site_type_symbol`.
+- CBA standardizes the site labels in `atom_site_label`. Some site labels may contain a comma or a symbol such as `M` due to atomic mixing. CBA reformats each `atom_site_label` so it can be parsed into an element type that matches `atom_site_type_symbol`.

-CBA removes the content of `publ_author_address`. This section often has an incorrect format which otherwise requires manual modifications.
+- CBA removes the content of `publ_author_address`. This section often has an incorrect format that otherwise requires manual modifications.

-CBA relocates any ill-formatted files such as duplicate labels in `atom_site_label`, missing fractional coordinates, and files that generate a supercell.
+- CBA relocates any ill-formatted files, such as those with duplicate labels in `atom_site_label`, missing fractional coordinates, or files that require supercell generation.

 ### 2. Supercell generation

-For each `.cif` file, a unit cell is generated by applying the symmetry operations. Supercell is generated by applying +-1, +-1, +-1 shifts from the unit cell.
+For each `.cif` file, a unit cell is generated by applying the symmetry operations. A supercell is generated by applying ±1 shifts from the unit cell.

 ### 3. Atomic mixing info

-Each bonding pair is defined with one of the four atomic mixing categories:
+Each bonding pair is defined with one of four atomic mixing categories:

 - **Full occupancy** is assigned when a single atomic site occupies the fractional coordinate with an occupancy value of 1.
 - **Full occupancy with mixing** is assigned when multiple atomic sites collectively occupy the fractional coordinate to a sum of 1.
@@ -83,88 +87,86 @@ Each bonding pair is defined with one of the four atomic mixing categories:

 ## Analysis Options

-CBA provides 3 options for analysis.
+CBA provides three options for analysis.

-### Option 1. Site Analysis
+### Option 1. Site Analysis

-Site Analysis determines the shortest distance and its nearest neighbor for each label in `atom_site_label`.
+- **Purpose:** Site Analysis determines the shortest distance and its nearest neighbor for each label in `atom_site_label`.

-For each atom in the unit cell, Euclidean distances are calculated from the atom to all atoms in the supercell. The position of the atom in the unit cell for each site label is determined based on the atom with the greatest number of shortest distances to its neighbors.
+- **Process:** For each atom in the unit cell, Euclidean distances are calculated from the atom to all atoms in the supercell. The position of the atom in the unit cell for each site label is determined based on the atom with the greatest number of shortest distances to its neighbors.

-Assume `.cif` contains four site labels: `Er1`, `Er2`, `Er3`, and `Er4`. The bonding pair from the site label `Er4` and its nearest neighbor `Er2` is unique and recorded. The bonding pair from `Er3` to `Er2` is also considered unique. However, the pairs `Er4-Er2` and `Er2-Er4` are considered identical. Out of the two pairs, the one with the shorter distance is recorded.
+- **Example:** Assume a `.cif` file contains four site labels under `atom_site_label`: `Er1`, `Er2`, `Er3`, and `Er4`. The bonding pair from the site label `Er4` and its nearest neighbor `Er2` is unique and recorded. The bonding pair from `Er3` to `Er2` is also considered unique. However, the pairs `Er4-Er2` and `Er2-Er4` are considered identical. Out of the two pairs, the pair with the shorter distance is recorded.

 #### Output 1.1 Excel and JSON

-For each folder, CBA generates `.xlsx` and `.json` files containing site data described above. It also determines the atomic mixing and occupacny information at the pair level. It extracts the tag from the .cif file if provided.
-
-`site_pairs.json` is produced shown below.
+Data for each folder is saved in `site_pairs.json` or `site_pairs.xlsx`. Below is an example of the JSON structure for bond pairs:

 ```json
 {
-  "Co-Co": {
-    "250361": [
-      {
-        "dist": 2.529,
-        "mixing": "full_occupancy",
-        "formula": "ErCo2",
-        "tag": "rt",
-        "structure": "MgCu2"
-      }
-    ],
-    "1955204": [
-      {
-        "dist": 2.46,
-        "mixing": "full_occupancy",
-        "formula": "Er2Co17",
-        "tag": "hex",
-        "structure": "Th2Ni17"
-      },
-      {
-        "dist": 2.274,
-        "mixing": "full_occupancy",
-        "formula": "Er2Co17",
-        "tag": "hex",
-        "structure": "Th2Ni17"
-      }
-    ]
-  }
+  "Co-Co": {
+    "250361": [
+      {
+        "dist": 2.529,
+        "mixing": "full_occupancy",
+        "formula": "ErCo2",
+        "tag": "rt",
+        "structure": "MgCu2"
+      }
+    ],
+    "1955204": [
+      {
+        "dist": 2.46,
+        "mixing": "full_occupancy",
+        "formula": "Er2Co17",
+        "tag": "hex",
+        "structure": "Th2Ni17"
+      },
+      {
+        "dist": 2.274,
+        "mixing": "full_occupancy",
+        "formula": "Er2Co17",
+        "tag": "hex",
+        "structure": "Th2Ni17"
+      }
+    ]
+  }
 }
 ```

-`element_pairs.json` is generated that it determines the shortest distance for each bond pair in a file.
+The minimum bond pair for each file is saved in `element_pairs.json` and `element_pairs.xlsx`.
 ```json
 {
-  "Co-Co": {
-    "250361": [
-      {
-        "dist": 2.529,
-        "mixing": "full_occupancy",
-        "formula": "ErCo2",
-        "tag": "rt",
-        "structure": "MgCu2"
-      }
-    ],
-    "1955204": [
-      {
-        "dist": 2.274,
-        "mixing": "full_occupancy",
-        "formula": "Er2Co17",
-        "tag": "hex",
-        "structure": "Th2Ni17"
-      }
-    ]
-  }
+  "Co-Co": {
+    "250361": [
+      {
+        "dist": 2.529,
+        "mixing": "full_occupancy",
+        "formula": "ErCo2",
+        "tag": "rt",
+        "structure": "MgCu2"
+      }
+    ],
+    "1955204": [
+      {
+        "dist": 2.274,
+        "mixing": "full_occupancy",
+        "formula": "Er2Co17",
+        "tag": "hex",
+        "structure": "Th2Ni17"
+      }
+    ]
+  }
 }
 ```

-An Excel file containing the information and each sheet having the bond pair.
+Here is a screenshot of `element_pairs.xlsx`.

 ![Excel screenshot](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/d6bed0df-b9ea-4922-967b-4656bb3ab3e0)

 #### Output 1.2 text summary

-A text file `output/summary_element.txt` is generated in the folder to provide an overview of the shortest bonding pairs and missing pairs in the selected folders across all files.
+A summary text file, `summary_element.txt`, lists the shortest bonding pairs and identifies missing pairs across selected folders:

 ```txt
 Summary:
@@ -192,90 +194,76 @@ Fe-Co

 #### Output 1.3 histograms

-In the `output` folder, histograms per shortest pair distance from each atom will be saved.
+`histogram_element_pair.png` and `histogram_site_pair.png` are used to visualize data, with colors indicating atomic mixing types.
+
+- To modify the x-axis, run `python plot-histogram.py`. This script allows you to interactively specify parameters such as the bin width and x-axis range:

 ![Histograms for label pair](https://s9.gifyu.com/images/SViMv.png)

-To modify the histograms, run `python plot-histogram.py`. This script allows you to interactively specify parameters, such as the bin width and x-axis range:
+### Option 2. System Analysis

+- **Purpose:** System Analysis provides an overview of bond fractions acquired from Option 1: Site Analysis, or bond fractions in coordination number geometries.
+- **Scope:** System Analysis is applicable for folders containing either 2 or 3 unique elements.

-### Option 2. System Analysis
+4 types of folders are applicable for System Analysis.

-System Analyiss is applicable for a folder containing either 2 or 3 unique elements. Four types are possible.
-
-```
-4 types of folders are processed:
-    Type 1. Binary files, 2 unique elements
-    Type 2. Binary files, 3 unique elements
-    Type 3. Ternary files, 3 unique elements
-    Type 4. Ternary and binary combined, 3 unique elements
-```
-Here is an example below.
-```
+
+Here is an example of CBA detecting folders containing 2 or 3 unique elements.
+
+`````
 Available folders containing 2 or 3 unique elements:
 1. 20240623_ErCoIn_nested, 3 elements (In, Er, Co), 152 files
 2. 20240612_ternary_only, 3 elements (In, Er, Co), 2 files
 3. 20240611_ternary_binary_combined, 3 elements (In, Er, Co), 5 files
 4. 20240623_teranry_3_unique_elements, 2 elements (Er, Co), 3 files
 5. 20240611_binary_2_unique_elements, 2 elements (Er, Co), 4 files````
-```
-
+`````

 #### Output 2.1 Binary/ternary figures

-By deafult, all of the nested folders containing .cif files are automatically added.
-
-For Type 1, the following is generated.
-
-For Type 2, 3, 4, the following is generated.
-
-How to customize:
-Customizaiton: You move the positino of the legend in the ternary diagram, you may modify the values of `X_SHIFT = 0.0` and `Y_SHIFT = 0.0` in `core/configs/ternary.py`.
-
-Individual hexagons are also produced.
-
-![composite_binary_1](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/2f0e7076-50cd-4356-8ca0-0714571d8944)
-
-![composite_ternary_1](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/5620581c-9764-4b27-bf99-14e15adbb73b)
-
-Ternary diagram
+For Types 2, 3, and 4:

 ![ternary](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/7496f433-c218-49ac-8372-cb75a369e409)

-Binary files
+To customize the legend position in the ternary diagram, you may modify the values of `X_SHIFT = 0.0` and `Y_SHIFT = 0.0` in `core/configs/ternary.py`.

-![binary_single](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/21f25fb3-79ea-4cd1-931d-ad5b3ea55189)
+For Type 1:

+![binary_single](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/21f25fb3-79ea-4cd1-931d-ad5b3ea55189)

 #### Output 2.2 Color map

-Color map for each bond type and the overall is generated for Type 2, 3, 4 above.
+For Types 2, 3, and 4, color maps for each bond type and overall are generated.
+
+![color_map_overall](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/f5ca3dd2-c6cb-40b8-aff9-af2be90c700f)
+
+![color_map_In-In](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/8f6bd208-9e4d-4dfe-a6a1-04b70af1aacc)

 #### Output 2.3 Excel

-`system_analysis_files.xlsx`
+The bond count for each `.cif` file is recorded in `system_analysis_files.xlsx`.

 SA_main

-
-`system_analysis_main.xlsx`
+Average bond lengths, counts, and statistical values are recorded in `system_analysis_main.xlsx`.

 SA_file

-
 ### Option 3. Coordination Analysis

-#### Ouput 3.1 JSON
-
-It determines the best cooridnation geometry using 4 methods provided in `cifkit`. Save Excel file and JSON on nearest neighbor info.
+- **Purpose:** This option determines the best coordination geometry using four methods provided in `cifkit`. Excel files and JSON are saved with nearest neighbor information.

-The Excel contains ∆ which is defined as the interactomic distance substracted by the sum of atomic radii. Note: For the CN methods, please refer to README.md. Note: ∆ is (interatomic distance - sum of atomic radii).
-You may provide your radii values by modifying the radii.xlsx file.
+- **Customization:** The Excel file contains `Δ`, which is defined as the interatomic distance minus the sum of atomic radii. You may provide your radii values by modifying the radii.xlsx file.

+#### Output 3.1 JSON
 ```python
 {
@@ -296,13 +284,6 @@ You may provide your radii values by modifying the radii.xlsx file.
       "neighbor": 2
     },
     ...
-    {
-      "connected_label": "Er",
-      "distance": 2.966,
-      "delta": -0.603,
-      "mixing": "full_occupancy",
-      "neighbor": 10
-    },
     {
       "connected_label": "Er",
       "distance": 2.966,
@@ -328,10 +309,9 @@ A screenshot is provided below. Each sheet contains the file name and the formul

 CN_excel

-
 ## Installation

-```bash
+```text
 git clone https://github.com/bobleesj/cif-bond-analyzer.git
 cd cif-bond-analyzer
 pip install -r requirements.txt
@@ -340,7 +320,7 @@ python main.py

 If you are interested in using `Conda` with a new environment run the following:

-```bash
+```text
 git clone https://github.com/bobleesj/cif-bond-analyzer.git
 cd cif-bond-analyzer
 conda create -n cif python=3.12
@@ -359,15 +339,13 @@ python main.py

 Please feel free to reach out via sl5400@columbia.edu for any questions.

-
 ## Changelog

-- 20240623 - Implement CN bond fractions, refactor code, nested code, add integration tests
-- 20240331 - Added integration test for JSON result verification.
-- 20240330 - Added sequential folder processing and customizable histogram generation. See [Pull #16](https://github.com/bobleesj/cif-bond-analyzer/pull/16).
-- 20240326 - Implemented automatic preprocessing and relocation of unsupported CIF files.
-- 20240311 - Integrated PEP8 linting with `black`. See [Pull #12](https://github.com/bobleesj/cif-bond-analyzer/pull/12).
-- 20240310 - Enhanced output options to include both element-based and label-based data for Excel, JSON, and histograms. See [Pull #11](https://github.com/bobleesj/cif-bond-analyzer/pull/11).
-- 20240301 - Provided translation options for unit cells with more than 100 atoms, either in all ±1 directions or just +1 in each.
-- 20240301 - Displayed atom counts and execution time per file in Terminal; added CSV logging.
-- 20240229 - Expanded file support to include all CIF files.
+- 20240623 - Implement CN bond fractions, add GitHub CI. See Pull #17.
+- 20240330 - Add sequential folder processing and customizable histogram generation. See [Pull #16](https://github.com/bobleesj/cif-bond-analyzer/pull/16).
+- 20240326 - Implement automatic preprocessing and relocation of unsupported CIF files.
+- 20240311 - Integrate PEP8 linting with `black`. See [Pull #12](https://github.com/bobleesj/cif-bond-analyzer/pull/12).
+- 20240310 - Enhance output options to include both element-based and label-based data for Excel, JSON, and histograms. See [Pull #11](https://github.com/bobleesj/cif-bond-analyzer/pull/11).
+- 20240301 - Provide translation options for unit cells with more than 100 atoms, either in all ±1 directions or just +1 in each.
+- 20240301 - Display atom counts and execution time per file in Terminal; add CSV logging.
+- 20240229 - Expand file support to include all CIF files.
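
Note on Output 1.1 in the README changes above: the two JSON examples differ only in that `element_pairs.json` keeps the shortest record per bond pair and file. The following is a minimal Python sketch of that reduction, not CBA's actual implementation. It assumes both files use the nested layout shown in the README example (bond pair, then file ID, then a list of records with a `dist` key). The function name `shortest_per_file` is hypothetical and is not part of CBA or `cifkit`.

```python
import json


def shortest_per_file(site_pairs):
    """Keep only the shortest-distance record per bond pair and file.

    Hypothetical helper: assumes the layout
    {bond_pair: {file_id: [record, ...]}} with a numeric "dist" per record.
    """
    element_pairs = {}
    for pair, files in site_pairs.items():
        element_pairs[pair] = {
            file_id: [min(records, key=lambda record: record["dist"])]
            for file_id, records in files.items()
            if records  # skip files with no recorded pairs
        }
    return element_pairs


# Sample data copied from the `site_pairs.json` example in the README.
site_pairs = {
    "Co-Co": {
        "250361": [
            {"dist": 2.529, "mixing": "full_occupancy", "formula": "ErCo2",
             "tag": "rt", "structure": "MgCu2"},
        ],
        "1955204": [
            {"dist": 2.46, "mixing": "full_occupancy", "formula": "Er2Co17",
             "tag": "hex", "structure": "Th2Ni17"},
            {"dist": 2.274, "mixing": "full_occupancy", "formula": "Er2Co17",
             "tag": "hex", "structure": "Th2Ni17"},
        ],
    }
}

element_pairs = shortest_per_file(site_pairs)
print(json.dumps(element_pairs, indent=4))
```

With the sample data, file `1955204` keeps only the 2.274 record, which matches the `element_pairs.json` example in the README.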