bobleesj committed Jun 25, 2024 (commit ff69ec1, parent bfd70f9)

[![Integration tests](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml/badge.svg)](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml) ![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg) ![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg) ![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)

## Description

CIF Bond Analyzer (CBA) is an interactive, command-line application designed for the high-throughput extraction of bonding information from CIF (Crystallographic Information File) files. CBA offers (1) Site Analysis, (2) System Analysis for binary/ternary systems, and (3) Coordination Analysis. The outputs are saved in `.json`, `.xlsx`, and `.png` formats.

## Demo

CBA is designed to be used interactively, without writing any code.

![CBA-demo-gif](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/fad16f21-93d8-4954-8efe-c04fbc68a9b7)


## Installation and tutorial

Run the following commands in a terminal:

```text
$ git clone https://github.com/bobleesj/cif-bond-analyzer.git
$ cd cif-bond-analyzer
$ pip install -r requirements.txt
$ python main.py
```

After you run `python main.py`, CBA prompts you to choose one of the three analysis options:

```text
Welcome! Please choose an option to proceed:
[1] Conduct site analysis.
[2] Conduct system analysis.
[3] Conduct coordination analysis.
Enter your choice (1-3): 1
```

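For any option, CBA then lists the folders at the project level that contain `.cif` files, with the number of files in each (including files in nested folders), and asks which ones to process: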

```text
Folders with .cif files:
1. 20240623_ErCoIn_nested, 16 files, 136 nested files
2. 20240612_ternary_only, 2 files
3. 20240611_ternary_binary_combined, 5 files
4. 20240623_teranry_3_unique_elements, 3 files
5. 20240611_binary_2_unique_elements, 4 files
Would you like to process each folder above sequentially?
(Default: Y) [Y/n]:
```

You can then process all folders sequentially, or select specific folders by entering the numbers shown next to them.

## Preprocess

This section describes how CBA formats `.cif` files, generates supercells, and classifies atomic mixing.

### 1. Format files

CBA uses the `CifEnsemble` object from `cifkit` to conduct preprocessing automatically.

CBA standardizes the site labels in `atom_site_label`. Due to atomic mixing, some site labels contain a comma or a symbol such as `M`. CBA reformats each `atom_site_label` so that it can be parsed into an element matching `atom_site_type_symbol`.

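
A minimal sketch of the label clean-up, assuming a simple rule that is not necessarily the one `cifkit` implements: keep the part of the label before any comma, and if its leading letters do not match `atom_site_type_symbol`, replace them with the type symbol. The functions `leading_symbol` and `standardize_label` are hypothetical.

```python
import re


def leading_symbol(label: str) -> str:
    """Return the element-like prefix of a label, e.g. 'Er1' -> 'Er', 'M1' -> 'M'."""
    match = re.match(r"[A-Z][a-z]?", label)
    return match.group(0) if match else ""


def standardize_label(label: str, type_symbol: str) -> str:
    """Hypothetical rule: keep the part before any comma, then make sure the
    label starts with its atom_site_type_symbol, keeping whatever follows."""
    first_part = label.split(",")[0].strip()
    if leading_symbol(first_part) == type_symbol:
        return first_part
    # Drop the leading letters (e.g. 'M' in 'M1') and prepend the type symbol.
    suffix = re.sub(r"^[A-Za-z]+", "", first_part)
    return type_symbol + suffix
```

Under this assumed rule, a label `M1` on a `Co` row becomes `Co1`, and a mixed label `Ge1,Sb1` becomes `Ge1` on the `Ge` row and `Sb1` on the `Sb` row.
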
CBA removes the content of `publ_author_address`, since this section is often ill-formatted and would otherwise require manual edits.

CBA relocates ill-formatted files, such as files with duplicate labels in `atom_site_label`, files with missing fractional coordinates, and files for which a supercell cannot be generated.
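
The relocation step can be pictured with the sketch below. It is illustrative rather than CBA's code: it checks one of the listed problems (duplicate site labels) and moves a failing file into a sibling subfolder. The function names and the `error_<reason>` folder naming are hypothetical.

```python
import shutil
from collections import Counter
from pathlib import Path


def duplicate_labels(labels: list[str]) -> list[str]:
    """Return site labels that occur more than once, in first-seen order."""
    counts = Counter(labels)
    return [label for label, n in counts.items() if n > 1]


def relocate(cif_path: Path, reason: str) -> Path:
    """Move a problematic file into a sibling folder named error_<reason>."""
    target_dir = cif_path.parent / f"error_{reason}"
    target_dir.mkdir(exist_ok=True)
    return Path(shutil.move(str(cif_path), str(target_dir / cif_path.name)))
```

Moving failing files aside, rather than skipping them silently, lets the remaining files in the folder be processed while keeping the problematic ones easy to inspect.
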

### 2. Supercell generation

For each `.cif` file, CBA generates a unit cell by applying the symmetry operations. It then generates a supercell by applying ±1 shifts in each fractional coordinate (x, y, z) to the unit cell.
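
A sketch of the supercell step, assuming the unit cell has already been expanded with the symmetry operations and that the zero shift is included so the original cell is kept. Every combination of -1, 0, and +1 along a, b, and c (27 in total) translates each unit-cell atom. `generate_supercell` and the `Site` tuple layout are hypothetical.

```python
from itertools import product

# (label, x, y, z) with x, y, z in fractional coordinates
Site = tuple[str, float, float, float]


def generate_supercell(unit_cell: list[Site]) -> list[Site]:
    """Translate every unit-cell atom by each combination of -1, 0, +1 along a, b, c."""
    supercell: list[Site] = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for label, x, y, z in unit_cell:
            supercell.append((label, x + dx, y + dy, z + dz))
    return supercell
```
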

### 3. Atomic mixing info

Each bonding pair is assigned one of four atomic mixing categories:

- **Full occupancy** is assigned when a single atomic site occupies the fractional coordinate with an occupancy of 1.
- **Full occupancy with mixing** is assigned when multiple atomic sites collectively occupy the fractional coordinate with occupancies summing to 1.
- **Deficiency without mixing** is assigned when a single atomic site occupies the fractional coordinate with an occupancy less than 1.
- **Deficiency with atomic mixing** is assigned when multiple atomic sites occupy the fractional coordinate with occupancies summing to less than 1.

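
The four categories reduce to two questions: does more than one site share the coordinate, and do the occupancies sum to 1? The sketch below classifies a single fractional coordinate under that reading, with a hypothetical tolerance for rounded occupancy values. How CBA combines the two sites of a pair into one category is not shown, and only `full_occupancy` is confirmed by the `site_pairs.json` output in this README; the other return strings are placeholders.

```python
def mixing_category(occupancies: list[float], tol: float = 1e-3) -> str:
    """Classify one fractional coordinate from the occupancies of the sites on it."""
    mixed = len(occupancies) > 1  # more than one site shares the coordinate
    full = abs(sum(occupancies) - 1.0) <= tol  # occupancies add up to 1
    if full:
        return "full_occupancy_atomic_mixing" if mixed else "full_occupancy"
    return "deficiency_atomic_mixing" if mixed else "deficiency_without_atomic_mixing"
```
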
## Analysis Options

CBA provides 3 options for analysis.

### Option 1. Site Analysis

For each label in `atom_site_label`, Site Analysis determines the nearest neighbor and the shortest distance to it.

For each atom in the unit cell, CBA calculates the Euclidean distances from that atom to all atoms in the supercell. For each site label, the reference atom in the unit cell is the one with the greatest number of shortest distances to its neighbors.
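
The distance calculation can be sketched as follows, assuming fractional coordinates are converted to Cartesian using the lattice vectors a, b, c given as the rows of a 3 x 3 matrix. The selection of the reference atom per site label is not reproduced; the sketch only finds the nearest neighbor of a given reference atom. `frac_to_cart` and `shortest_neighbor` are hypothetical names.

```python
import math

Vec = tuple[float, float, float]


def frac_to_cart(frac: Vec, lattice: list[Vec]) -> Vec:
    """Convert fractional (x, y, z) to Cartesian: x*a + y*b + z*c."""
    return tuple(sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3))


def shortest_neighbor(
    ref: Vec, supercell: list[tuple[str, Vec]], lattice: list[Vec]
) -> tuple[str, float]:
    """Return (label, distance) of the closest supercell atom to `ref`."""
    ref_cart = frac_to_cart(ref, lattice)
    best_label, best_dist = "", math.inf
    for label, frac in supercell:
        dist = math.dist(ref_cart, frac_to_cart(frac, lattice))
        # A (near-)zero distance is the reference atom itself, so skip it.
        if 1e-6 < dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```

Converting to Cartesian first matters for non-orthogonal cells, where Euclidean distances in fractional coordinates would not correspond to physical bond lengths.
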

Suppose a `.cif` file contains four site labels, `Er1`, `Er2`, `Er3`, and `Er4`, and the nearest neighbor of both `Er4` and `Er3` is `Er2`. The pairs `Er4-Er2` and `Er3-Er2` are unique, so both are recorded. However, `Er4-Er2` and `Er2-Er4` are considered identical; of the two, only the one with the shorter distance is recorded.
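
The de-duplication rule in this example can be expressed by keying each pair on its sorted labels and keeping the shorter distance. The input layout (site label mapped to its nearest neighbor and distance) and the name `unique_pairs` are assumptions for illustration.

```python
def unique_pairs(nearest: dict[str, tuple[str, float]]) -> dict[tuple[str, str], float]:
    """Collapse A-B and B-A into one unordered pair, keeping the shorter distance."""
    pairs: dict[tuple[str, str], float] = {}
    for site, (neighbor, dist) in nearest.items():
        key = tuple(sorted((site, neighbor)))  # ('Er2', 'Er4') for both directions
        if key not in pairs or dist < pairs[key]:
            pairs[key] = dist
    return pairs
```
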

#### Output 1.1 Excel and JSON

For each folder, CBA generates `.xlsx` and `.json` files containing the site data described above. It also determines the atomic mixing and occupancy information at the pair level, and extracts the tag from the `.cif` file if one is provided.

`site_pairs.json` is produced as shown below:

```json
...
}
],
"1955204": [
{
"dist": 2.46,
"mixing": "full_occupancy",
...
"tag": "hex",
"structure": "Th2Ni17"
}
]
}
}
```

```json
...
"tag": "hex",
"structure": "Th2Ni17"
}
]
}
}
```

An Excel file is also generated, with one sheet per bond pair.

![Excel screenshot](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/d6bed0df-b9ea-4922-967b-4656bb3ab3e0)

#### Output 1.2 text summary

CBA generates `output/summary_element.txt`, which summarizes the shortest bonding pairs and the missing pairs across all files in the selected folders.

```txt
Summary:
Pair: In-In, Count: 4, Distances: 2.736, 2.782, 2.785, 2.793
Pair: Pd-Ge, Count: 4, Distances: 2.449, 2.455, 2.489, 2.672
Pair: Pd-Sb, Count: 4, Distances: 2.505, 2.700, 2.737, 2.793
Pair: Si-Si, Count: 4, Distances: 1.975, 2.289, 2.325, 2.533
Pair: Rh-Ge, Count: 2, Distances: 2.484, 2.495
Pair: Ru-Si, Count: 2, Distances: 2.394, 2.519
Pair: Sb-Sb, Count: 2, Distances: 2.573, 2.793
Pair: Co-Ga, Count: 1, Distances: 2.485
Pair: Co-Sb, Count: 1, Distances: 2.594
Pair: Co-Sn, Count: 1, Distances: 2.737
Missing pairs:
Co-In
Co-Ir
Co-Ni
Co-Pd
Co-Pt
Co-Rh
Co-Si
Fe-Co
```
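
The summary appears to list pairs by count in descending order, then alphabetically, with each pair's distances in ascending order. The sketch below reproduces that layout; the ordering rule is inferred from the example rather than documented, and since how CBA determines the missing pairs is not described here, they are passed in directly. `format_summary` is hypothetical.

```python
def format_summary(pair_distances: dict[str, list[float]], missing: list[str]) -> str:
    """Render pairs sorted by count (descending), then by name, plus missing pairs."""
    lines = ["Summary:"]
    ordered = sorted(pair_distances.items(), key=lambda item: (-len(item[1]), item[0]))
    for pair, dists in ordered:
        joined = ", ".join(f"{d:.3f}" for d in sorted(dists))
        lines.append(f"Pair: {pair}, Count: {len(dists)}, Distances: {joined}")
    lines.append("Missing pairs:")
    lines.extend(sorted(missing))
    return "\n".join(lines)
```
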

#### Output 1.3 histograms

CBA saves histograms of the shortest pair distances from each atom in the `output` folder.

![Histograms for label pair](https://s9.gifyu.com/images/SViMv.png)

To modify the histograms, run `python plot-histogram.py`. This script lets you interactively specify parameters such as the bin width and the x-axis range.
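
To illustrate what the bin width and x-axis range control, here is a small binning sketch. It is not the code in `plot-histogram.py`; the name `bin_counts` and the choice to drop values outside `[x_min, x_max)` are assumptions.

```python
import math


def bin_counts(distances: list[float], bin_width: float, x_min: float, x_max: float) -> list[int]:
    """Count distances falling in [x_min, x_max) using equal-width bins."""
    n_bins = math.ceil((x_max - x_min) / bin_width)
    counts = [0] * n_bins
    for d in distances:
        if x_min <= d < x_max:
            # Clamp guards against floating-point round-off at the upper edge.
            index = min(int((d - x_min) // bin_width), n_bins - 1)
            counts[index] += 1
    return counts
```

A narrower `bin_width` yields more, thinner bars, while tightening `x_min` and `x_max` excludes distances outside the range of interest.
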



### Option 2. System Analysis

