Installation
System Requirements
Operating Systems:
- Windows 11
- macOS
Recommended Hardware:
- Intel Core i5 or greater
- 16 GB RAM or more
- 5 GB hard drive space
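A machine can be compared against part of this list with a few lines of Python. This is only a sketch using the standard library: the function name `check_system` is hypothetical and not part of DeepMASS. It checks the operating system family and free disk space, but not the Windows version, CPU model, or RAM.

```python
import platform
import shutil

# Thresholds taken from the requirements listed above.
SUPPORTED_SYSTEMS = {"Windows", "Darwin"}  # "Darwin" is what Python reports on macOS
MIN_FREE_BYTES = 5 * 1024**3               # 5 GB of disk space


def check_system(path=".", system=None, free_bytes=None):
    """Return a list of human-readable problems; an empty list means no problem found.

    `system` and `free_bytes` can be passed explicitly (useful for testing);
    by default they are read from the current machine.
    """
    if system is None:
        system = platform.system()
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unlisted operating system: {system}")
    if free_bytes < MIN_FREE_BYTES:
        problems.append(f"only {free_bytes / 1024**3:.1f} GB free, 5 GB recommended")
    return problems


if __name__ == "__main__":
    for problem in check_system():
        print("Warning:", problem)
```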
Installation Steps
- Install Anaconda or Miniconda.
- Create a new conda environment and activate it:
    conda create -n deepmass python=3.8.13
    conda activate deepmass
- Clone the repository and enter it:
    git clone https://github.com/hcji/DeepMASS2_GUI.git
    cd DeepMASS2_GUI
- Install the dependencies (on macOS, some dependencies may need to be installed manually with conda):
    pip install -r requirements.txt
- Download the dependent data.
  - Put the following files into the data folder:
      DeepMassStructureDB-v1.0.csv
      references_index_negative_spec2vec.bin
      references_index_positive_spec2vec.bin
      references_spectrums_negative.pickle
      references_spectrums_positive.pickle
  - Put the following files into the model folder:
      Ms2Vec_allGNPSnegative.hdf5
      Ms2Vec_allGNPSnegative.hdf5.syn1neg.npy
      Ms2Vec_allGNPSnegative.hdf5.wv.vectors.npy
      Ms2Vec_allGNPSpositive.hdf5
      Ms2Vec_allGNPSpositive.hdf5.syn1neg.npy
      Ms2Vec_allGNPSpositive.hdf5.wv.vectors.npy
  Please note that these dependent data are based on the GNPS dataset only. If you have a licence for the NIST software, please refer to the next section: Training models with NIST data.
- Run DeepMASS:
    python DeepMASS2.py
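If DeepMASS fails to start, a missing data or model file is one thing worth ruling out. The sketch below lists any file from the download step that is not in place. It assumes the data and model folders sit directly inside the cloned repository. The helper name `missing_files` is hypothetical and not part of DeepMASS.

```python
from pathlib import Path

# File names copied from the lists in the "Download the dependent data" step.
DATA_FILES = [
    "DeepMassStructureDB-v1.0.csv",
    "references_index_negative_spec2vec.bin",
    "references_index_positive_spec2vec.bin",
    "references_spectrums_negative.pickle",
    "references_spectrums_positive.pickle",
]
MODEL_FILES = [
    "Ms2Vec_allGNPSnegative.hdf5",
    "Ms2Vec_allGNPSnegative.hdf5.syn1neg.npy",
    "Ms2Vec_allGNPSnegative.hdf5.wv.vectors.npy",
    "Ms2Vec_allGNPSpositive.hdf5",
    "Ms2Vec_allGNPSpositive.hdf5.syn1neg.npy",
    "Ms2Vec_allGNPSpositive.hdf5.wv.vectors.npy",
]


def missing_files(root="."):
    """Return the expected files that are not present under `root`.

    Assumption: `root` is the repository directory and contains the
    `data` and `model` folders.
    """
    root = Path(root)
    expected = [root / "data" / name for name in DATA_FILES]
    expected += [root / "model" / name for name in MODEL_FILES]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All dependent data files are in place.")
```

Run it from inside the DeepMASS2_GUI directory, or pass the repository path to `missing_files`.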
Training models with NIST data
The NIST MS/MS Library is not available for download directly, but can be purchased from an authorized distributor and exported using the instructions below.
Exporting NIST data
Note: this step requires a Windows System or Virtual Machine.
The spectra and associated compounds can be exported to MSP/MOL format using the free lib2nist software. The resulting export will contain a single MSP file with all of the mass spectra, and multiple MOL files which include the molecular structure information (linked to the spectra by ID). The screenshot below indicates appropriate lib2nist export settings.
After exporting the files, create a directory on your primary computer (called "nist_20" here) and save them there.
If done correctly, "nist_20" should contain a single .MSP file with all the spectra, hr_nist_msms.MSP, and a directory of .MOL files, hr_nist_msms.MOL.
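The expected layout can be checked with a short script. This is a minimal sketch that assumes the directory and file names given above. The function name `check_nist_export` is hypothetical.

```python
from pathlib import Path


def check_nist_export(export_dir="nist_20"):
    """Return (msp_found, mol_count) for an exported NIST library.

    msp_found: True if hr_nist_msms.MSP exists inside `export_dir`.
    mol_count: number of files ending in .MOL (any letter case) inside
               the hr_nist_msms.MOL directory, or 0 if it does not exist.
    """
    export_dir = Path(export_dir)
    msp_found = (export_dir / "hr_nist_msms.MSP").is_file()
    mol_dir = export_dir / "hr_nist_msms.MOL"
    mol_count = 0
    if mol_dir.is_dir():
        mol_count = sum(1 for path in mol_dir.iterdir()
                        if path.suffix.lower() == ".mol")
    return msp_found, mol_count


if __name__ == "__main__":
    found, count = check_nist_export()
    print(f"MSP file found: {found}; MOL files found: {count}")
```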
Preprocessing NIST data
Refer to the Python script, which processes mass spectrometry (MS/MS) data and the corresponding chemical structures to extract and clean metadata, filter the data by adduct type, and organize the data for further analysis.
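To give a rough idea of what such preprocessing involves, the sketch below reads MSP records and keeps only selected adduct types. It is not the project's script. The parser is a simplified standard-library stand-in. The `Precursor_type` field name and the adducts in `KEEP_ADDUCTS` are assumptions chosen for illustration.

```python
# Hypothetical selection of adduct types to keep; adjust as needed.
KEEP_ADDUCTS = {"[M+H]+", "[M-H]-"}


def parse_msp(lines):
    """Yield one dict per MSP record: lower-cased metadata keys plus 'peaks'.

    Records are assumed to be separated by blank lines, with "Key: value"
    metadata lines followed by "Num Peaks: N" and "m/z intensity" lines.
    """
    record, peaks = {}, []
    for raw in lines:
        line = raw.strip()
        if not line:  # a blank line ends the current record
            if record:
                record["peaks"] = peaks
                yield record
            record, peaks = {}, []
        elif "num peaks" in record and line[0].isdigit():
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
        elif ":" in line:
            key, value = line.split(":", 1)
            record[key.strip().lower()] = value.strip()
    if record:  # the last record may not be followed by a blank line
        record["peaks"] = peaks
        yield record


def filter_by_adduct(records, keep=KEEP_ADDUCTS):
    """Keep only records whose precursor type is in `keep`."""
    return [r for r in records if r.get("precursor_type") in keep]
```

With an open file handle `fh` for the exported MSP file, `filter_by_adduct(parse_msp(fh))` would return the retained records.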
Training the Word2Vec model
Refer to the Python script, which processes mass spectrometry (MS/MS) data to train word2vec models using the spec2vec approach. It ultimately creates two spec2vec models: one for positive ion mode spectra and one for negative ion mode spectra.
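In the spec2vec approach, each spectrum is treated as a "document" whose "words" are its peaks, with m/z values rounded to a fixed number of decimals. The sketch below shows only this tokenization step and the split into two corpora, one per ion mode. The model training itself is left to the script referred to above. The dictionary keys `ionmode` and `peaks` are hypothetical, and neutral-loss words are omitted for brevity.

```python
def spectrum_to_words(peaks, n_decimals=2):
    """Turn (m/z, intensity) pairs into spec2vec-style words such as 'peak@100.00'."""
    return [f"peak@{mz:.{n_decimals}f}" for mz, _intensity in peaks]


def build_corpora(spectra, n_decimals=2):
    """Group tokenized spectra by ion mode.

    `spectra` is an iterable of dicts with the hypothetical keys
    'ionmode' ('positive' or 'negative') and 'peaks'.
    Spectra with any other ion mode are ignored.
    """
    corpora = {"positive": [], "negative": []}
    for spectrum in spectra:
        mode = spectrum.get("ionmode")
        if mode in corpora:
            corpora[mode].append(spectrum_to_words(spectrum["peaks"], n_decimals))
    return corpora
```

Each of the two corpora would then be used to train its own word2vec model.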
Generating spectral representation index
Refer to the Python script, which computes vector representations of MS/MS spectra using a pre-trained word2vec model and then indexes these vectors for fast similarity searching with the HNSW algorithm. It handles both positive and negative ion mode spectra, and creates and saves one HNSW index for each.
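An HNSW index returns approximate nearest neighbours quickly. The exact search it approximates can be sketched in plain Python. Note that the code below is a brute-force cosine search, not HNSW. The weighted-sum embedding is likewise an assumed simplification of how a spectrum vector is built from word vectors. All function names are hypothetical.

```python
import math


def embed(words, weights, word_vectors):
    """Weighted sum of the vectors of the words found in `word_vectors`.

    `word_vectors` maps a word to a list of floats (a stand-in for a
    trained word2vec model); `weights` holds one weight per word,
    for example derived from peak intensity.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word, weight in zip(words, weights):
        vector = word_vectors.get(word)
        if vector is None:  # word unseen during training: skip it
            continue
        for i, value in enumerate(vector):
            total[i] += weight * value
    return total


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def nearest(query, references, k=1):
    """Return the indices of the k reference vectors most similar to `query`."""
    ranked = sorted(range(len(references)),
                    key=lambda i: cosine(query, references[i]),
                    reverse=True)
    return ranked[:k]
```

A brute-force search like this compares the query with every reference vector, which is why an index is used for large libraries.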
Replacing the data
Copy all the generated files into the corresponding folders of DeepMASS; refer to installation step 5 for the expected locations.
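The copy step can also be scripted. The sketch below assumes that the generated files sit in a single directory and that you pass in the file names destined for each folder. The function name `install_generated_files` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def install_generated_files(generated_dir, deepmass_dir, data_names, model_names):
    """Copy the named files from `generated_dir` into the data and model
    folders under `deepmass_dir`, creating the folders if needed.

    Names that do not exist in `generated_dir` are skipped.
    Returns the destination paths of the files that were copied.
    """
    generated_dir, deepmass_dir = Path(generated_dir), Path(deepmass_dir)
    copied = []
    for folder, names in (("data", data_names), ("model", model_names)):
        target = deepmass_dir / folder
        target.mkdir(parents=True, exist_ok=True)
        for name in names:
            source = generated_dir / name
            if source.is_file():
                shutil.copy2(source, target / name)
                copied.append(str(target / name))
    return copied
```

Comparing the returned list with the names passed in shows whether any generated file was missing.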