This repository is designed to post-process and analyze the classification and internal-representation data produced by the LLaVA-Based Wildfire Detection pipeline. The code is provided in a single Jupyter notebook, `Analyze Data.ipynb`, which demonstrates how to:
- Load saved checkpoints (`checkpoint.txt`) and internal embeddings (`internal_rep.pt`); a loading sketch follows this list
- Reduce the dimensionality of embeddings using t-SNE
- Plot images on a scatter plot to visualize clusters and identify nearest neighbors
- Display a grid of the nearest images for a chosen coordinate in the t-SNE space
- Compute performance metrics (such as the F-measure)
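As a rough illustration of the first item, here is a minimal loading sketch. It assumes `checkpoint.txt` stores one entry per line and `internal_rep.pt` holds a single tensor; adjust it to the actual formats in your results folder.

```python
import torch

# checkpoint.txt tracks which images have been processed (assumed format: one entry per line)
with open("HPWREN_RESULTS/checkpoint.txt") as f:
    processed = [line.strip() for line in f if line.strip()]

# internal_rep.pt holds the internal feature representations (assumed: a single PyTorch tensor)
embeddings = torch.load("HPWREN_RESULTS/internal_rep.pt", map_location="cpu")

print(f"{len(processed)} processed images, embedding tensor shape {tuple(embeddings.shape)}")
```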
`Analyze Data.ipynb`:
- Reads classification results from `output.csv`
- Loads internal representations from `internal_rep.pt`
- Performs 2D t-SNE dimensionality reduction to visualize data clusters (sketched after this list)
- Plots the data with color-coding for the different classes
- Shows a 5×4 grid of the nearest-neighbor images for a chosen point in t-SNE space (see the sample image below)
- Computes the F-measure to evaluate classification performance
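A minimal sketch of the t-SNE and plotting steps, assuming `output.csv` has `file_path` and `predicted_label` columns (with 0/1 labels) and `internal_rep.pt` holds an (N, D) tensor with one row per CSV entry; the notebook's own cells may differ in details:

```python
import pandas as pd
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

results = pd.read_csv("HPWREN_RESULTS/output.csv")              # file_path, predicted_label
embeddings = torch.load("HPWREN_RESULTS/internal_rep.pt", map_location="cpu")

# Reduce the high-dimensional representations to 2D for plotting
coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings.float().numpy())

# Color-code each point by predicted class (black = 0, red = 1)
labels = results["predicted_label"].astype(int).to_numpy()
plt.scatter(coords[:, 0], coords[:, 1],
            c=["black" if y == 0 else "red" for y in labels], s=8)
plt.title("t-SNE of internal representations")
plt.show()
```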
`HPWREN_RESULTS/` (mentioned in the notebook):
- `checkpoint.txt`: tracks which images have been processed
- `internal_rep.pt`: a PyTorch tensor of the internal feature representations used for analysis
- `output.csv`: classification results (e.g., `file_path, predicted_label`); a quick consistency check across these three files is sketched below
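Since all three files describe the same set of images, a quick count comparison can catch mismatches early. This is only a sketch; the one-entry-per-line checkpoint format is an assumption.

```python
from pathlib import Path
import pandas as pd
import torch

results_dir = Path("HPWREN_RESULTS")
results = pd.read_csv(results_dir / "output.csv")                         # file_path, predicted_label
embeddings = torch.load(results_dir / "internal_rep.pt", map_location="cpu")
with open(results_dir / "checkpoint.txt") as f:
    processed = [line.strip() for line in f if line.strip()]

print(f"{len(processed)} images in checkpoint.txt, "
      f"{len(results)} rows in output.csv, "
      f"{embeddings.shape[0]} embeddings in internal_rep.pt")
```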
`HPWREN/`:
- A folder (or symlink) containing the original images used for visualization. The exact paths are read from `output.csv` (see the example below).
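To confirm that the image paths resolve, you can open one of the referenced images directly. A sketch using Pillow, with the column name assumed as above:

```python
import pandas as pd
from PIL import Image

results = pd.read_csv("HPWREN_RESULTS/output.csv")
first_path = results.loc[0, "file_path"]   # expected to point into HPWREN/ (or your configured folder)
Image.open(first_path).show()
```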
You will need:
- Python 3.8+
- Jupyter Notebook or JupyterLab
- Python libraries:
  - `numpy`
  - `pandas`
  - `matplotlib`
  - `scikit-learn` (for t-SNE)
  - `torch` (for PyTorch tensor handling)
  - `Pillow` (for image handling)
If you used a conda environment or the `requirements.txt` from the main LLaVA-based repository, you likely have most dependencies in place already.
A typical folder structure might look like this:
```
HPWREN_Experiments/
├── Analyze Data.ipynb
├── HPWREN_RESULTS/
│   ├── checkpoint.txt
│   ├── internal_rep.pt
│   ├── output.csv
│   └── ...
├── HPWREN/
│   ├── ...
│   └── (images or symbolic links to them)
└── README.md
```
Make sure:
- `checkpoint.txt`, `internal_rep.pt`, and `output.csv` reside in `HPWREN_RESULTS/`.
- The images referred to in `output.csv` match paths in the `HPWREN/` folder (or whichever folder you configure); the sanity-check sketch below verifies this.
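A small, optional sanity-check script for the layout above. The path names are the defaults shown in the folder structure; adapt them if you use a different layout.

```python
from pathlib import Path
import pandas as pd

results_dir = Path("HPWREN_RESULTS")

# Check that the three result files are present
for name in ("checkpoint.txt", "internal_rep.pt", "output.csv"):
    assert (results_dir / name).is_file(), f"missing {results_dir / name}"

# Check that every image path listed in output.csv resolves on disk
df = pd.read_csv(results_dir / "output.csv")
missing = [p for p in df["file_path"] if not Path(p).is_file()]
print(f"{len(missing)} of {len(df)} image paths in output.csv do not resolve")
```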
- Clone or download this repository:
  ```bash
  git clone https://github.com/your-username/HPWREN_Experiments.git
  cd HPWREN_Experiments
  ```
- Install dependencies (if not already installed):
  ```bash
  pip install -r requirements.txt
  ```
  or install them manually.
- Launch Jupyter Notebook:
  ```bash
  jupyter notebook
  ```
  Then open `Analyze Data.ipynb` in your browser.
- Run the notebook cells in sequence:
  - The notebook loads the embeddings from `internal_rep.pt`.
  - It reads classification results from `output.csv`.
  - It performs t-SNE and plots the data points, color-coding each class.
  - It can generate a 5×4 grid of images showing the nearest neighbors to a chosen point in t-SNE space (sketched right after this list).
  - Finally, it computes the F-measure based on ground-truth labels encoded in the filenames.
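For reference, here is a self-contained sketch of the nearest-neighbor grid step. The query coordinate is arbitrary, and the column names, tensor layout, and default t-SNE settings are assumptions; the notebook's own cells may differ.

```python
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.manifold import TSNE

results = pd.read_csv("HPWREN_RESULTS/output.csv")              # file_path, predicted_label
embeddings = torch.load("HPWREN_RESULTS/internal_rep.pt", map_location="cpu")
coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings.float().numpy())

query = np.array([0.0, 0.0])                                    # chosen point in t-SNE space
nearest = np.argsort(np.linalg.norm(coords - query, axis=1))[:20]   # 20 closest samples

# Display the 20 nearest images in a 5×4 grid, titled with their predicted labels
fig, axes = plt.subplots(5, 4, figsize=(10, 12))
for ax, idx in zip(axes.flat, nearest):
    ax.imshow(Image.open(results.loc[idx, "file_path"]))
    ax.set_title(str(results.loc[idx, "predicted_label"]), fontsize=8)
    ax.axis("off")
plt.tight_layout()
plt.show()
```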
- **t-SNE Scatter Plot:** shows each sample’s 2D embedding; for example, black dots could be class “0” and red dots class “1.”
- **Nearest-Neighbor Image Grid:** the notebook lets you pick a 2D coordinate in the t-SNE space and retrieve the closest 20 samples, displayed in a 5×4 grid (as shown in the sample image below). This provides a quick way to visually check classification consistency.
- **Metrics:** the F-measure is computed from true positives, false positives, and false negatives (see the sketch after this list).
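A hedged sketch of the F-measure computation, assuming `predicted_label` is 0/1. How the ground truth is encoded in the filenames is dataset-specific, so `true_label_from_path` below is a hypothetical placeholder you would replace with the actual rule used by your data.

```python
import pandas as pd

def true_label_from_path(path: str) -> int:
    # Hypothetical rule: assumes filenames of positive samples contain a "+" marker.
    # Replace with the actual encoding used in your dataset.
    return 1 if "+" in path else 0

results = pd.read_csv("HPWREN_RESULTS/output.csv")
y_true = results["file_path"].map(true_label_from_path)
y_pred = results["predicted_label"].astype(int)

tp = int(((y_pred == 1) & (y_true == 1)).sum())   # true positives
fp = int(((y_pred == 1) & (y_true == 0)).sum())   # false positives
fn = int(((y_pred == 0) & (y_true == 1)).sum())   # false negatives

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"Precision={precision:.3f}  Recall={recall:.3f}  F-measure={f_measure:.3f}")
```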
- Fork this repository.
- Create a new branch for your feature or bugfix.
- Make changes and test thoroughly.
- Submit a pull request describing your modifications.