QuantumPioneer/database

QuantumPioneer Database

Data availability

The QuantumPioneer datasets are available for download on Zenodo.

Data provenance

Log files generated by Gaussian and ORCA were parsed using generator.py, which relies on the FastLogfileParser package. The resulting Parquet files were further processed using the scripts in scripts/qm_results, which also match the atom-mapped SMILES to their respective data points.

Files generated by COSMOtherm were parsed and filtered separately to produce two master CSV files: one for transition states and the other for ground-state species. These CSV files were then split by solvent using scripts in scripts/solvation.
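The split-by-solvent step can be pictured with the minimal sketch below. It is not the code in scripts/solvation: the `solvent` column name and the sample rows are assumptions made purely for illustration.

```python
import csv
import io
from collections import defaultdict


def split_rows_by_solvent(master_csv_text, solvent_column="solvent"):
    """Group the rows of a master CSV by the value in `solvent_column`.

    The column name is hypothetical; the real master files may differ.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(master_csv_text)):
        solvent = row.pop(solvent_column)  # drop the key used for grouping
        groups[solvent].append(row)
    return dict(groups)


# Placeholder master file; the values are not dataset entries.
master = (
    "smiles,solvent,Gsolv,Hsolv\n"
    "CCO,water,-1.0,-2.0\n"
    "CCO,hexane,-3.0,-4.0\n"
    "CC,water,-5.0,-6.0\n"
)
per_solvent = split_rows_by_solvent(master)
```

Each value in `per_solvent` holds the rows that would go into one `<solvent_name>.csv` file.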

QuantumPioneer Property Datasets

QuantumPioneer Kinetics Dataset

quantumpioneer_kinetics_dataset.csv (39.11 MB)

| Column | Type | Units | Description |
| --- | --- | --- | --- |
| rxn_smi | string | - | Reaction SMILES (r1.r2>>p1.p2) |
| k_298 | number | m³/(mol·s) | Bimolecular rate coefficient at 298 K |
| A_low | number | m³/(mol·s) | Arrhenius pre-exponential factor, 300–1000 K |
| Ea_low | number | kcal/mol | Activation energy, 300–1000 K |
| A_high | number | m³/(mol·s) | Arrhenius pre-exponential factor, 1000–2000 K |
| Ea_high | number | kcal/mol | Activation energy, 1000–2000 K |
| barrier | number | kcal/mol | Forward barrier (DLPNO + scaled DFT ZPE) |
| Hrxn | number | kcal/mol | Forward reaction enthalpy (DLPNO + scaled DFT ZPE) |
| deltaHrxn298 | number | kcal/mol | Forward reaction enthalpy at 298 K |
| deltaGrxn298 | number | kcal/mol | Forward reaction Gibbs energy at 298 K |
| P2M | number | kcal/mol | Petersson-to-Melius energy difference at 298 K |
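As a rough illustration of how the two Arrhenius fits might be used, the sketch below evaluates k(T) = A·exp(−Ea/(RT)) with the parameter pair that covers the requested temperature. It assumes a plain two-parameter Arrhenius form with no temperature exponent and assigns exactly 1000 K to the low-temperature fit; neither choice is confirmed by this README. The parameter values are placeholders.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def arrhenius_rate(row, temperature):
    """Evaluate k(T) = A * exp(-Ea / (R*T)) with the fit that covers T.

    `row` maps the column names A_low, Ea_low, A_high, Ea_high to numbers.
    The result has the units of A, i.e. m^3/(mol*s).
    """
    if 300.0 <= temperature <= 1000.0:
        a, ea = row["A_low"], row["Ea_low"]
    elif 1000.0 < temperature <= 2000.0:
        a, ea = row["A_high"], row["Ea_high"]
    else:
        raise ValueError("temperature outside the fitted 300-2000 K range")
    return a * math.exp(-ea / (R_KCAL * temperature))


# Placeholder parameters, not taken from the dataset.
example = {"A_low": 1.0e6, "Ea_low": 10.0, "A_high": 5.0e6, "Ea_high": 12.0}
k_500 = arrhenius_rate(example, 500.0)
k_1500 = arrhenius_rate(example, 1500.0)
```

For 298 K, which lies just below the fitted range, the k_298 column can be read directly.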

The thermodynamic properties deltaHrxn298 and deltaGrxn298 are derived from calculations using Petersson-type bond additivity corrections (BACs). Add P2M to these to obtain their Melius-type BAC-corrected versions.
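The conversion described above is a single addition per column. A minimal sketch, with placeholder values in kcal/mol and a hypothetical helper name `to_melius`:

```python
def to_melius(row, columns=("deltaHrxn298", "deltaGrxn298")):
    """Return a copy of `row` with P2M added to the Petersson-type columns."""
    converted = dict(row)
    for name in columns:
        converted[name] = row[name] + row["P2M"]
    return converted


# Placeholder values in kcal/mol, not taken from the dataset.
petersson = {"deltaHrxn298": -10.0, "deltaGrxn298": -8.5, "P2M": 0.25}
melius = to_melius(petersson)
```

The input row is left unchanged, so both BAC variants remain available side by side.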

QuantumPioneer Solvation Dataset

Computed solvation free energies and enthalpies at 298.15 K for solute–solvent pairs, generated by the COSMO-RS-based workflow described in the QuantumPioneer project paper. Each CSV file corresponds to a single solvent (295 solvents total) and contains solvation properties for every solute evaluated in that solvent.

The full list of solvents is available in solvents.md.

Closed-Shell Species, Open-Shell Species, and Transition States

quantumpioneer_solvation_dataset_closed_shell_species.zip (759.5 MB)

quantumpioneer_solvation_dataset_open_shell_species.zip (994.9 MB)

quantumpioneer_solvation_dataset_transition_states.zip (1.1 GB)

| Column | Type | Units | Description |
| --- | --- | --- | --- |
| smiles | string | - | Canonical SMILES of the solute |
| Gsolv | number | kcal/mol | Solvation free energy of the solute in this solvent at 298.15 K |
| Hsolv | number | kcal/mol | Solvation enthalpy of the solute in this solvent at 298.15 K |

Note: The transition states are represented as reaction SMILES (r1.r2>>p1.p2).
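A reaction SMILES of the form r1.r2>>p1.p2 can be separated into its species with plain string operations. The helper below is a hypothetical convenience function, and the example reaction is illustrative rather than a confirmed dataset entry. Splitting on "." treats every dot-separated fragment as its own species.

```python
def split_reaction_smiles(rxn_smiles):
    """Split 'r1.r2>>p1.p2' into (reactants, products) lists of SMILES."""
    reactant_side, separator, product_side = rxn_smiles.partition(">>")
    if not separator:
        raise ValueError(f"not a reaction SMILES: {rxn_smiles!r}")
    return reactant_side.split("."), product_side.split(".")


# Illustrative hydrogen abstraction from methane by a hydroxyl radical.
reactants, products = split_reaction_smiles("C.[OH]>>[CH3].O")
```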

Reactions

quantumpioneer_solvation_dataset_reactions.zip (1.6 GB)

| Column | Type | Units | Description |
| --- | --- | --- | --- |
| rxn_smiles | string | - | Reaction SMILES (r1.r2>>p1.p2) |
| DDGsolv_forward | number | kcal/mol | Solvation free energy of activation in the forward direction (r1.r2>>ts) |
| DDGsolv_reverse | number | kcal/mol | Solvation free energy of activation in the reverse direction (p1.p2>>ts) |
| DDHsolv_forward | number | kcal/mol | Solvation enthalpy of activation in the forward direction (r1.r2>>ts) |
| DDHsolv_reverse | number | kcal/mol | Solvation enthalpy of activation in the reverse direction (p1.p2>>ts) |

All energies are in kcal/mol.
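One plausible reading of these columns, assumed here rather than stated in this README, is the usual definition of a solvation energy of activation: the value for the transition state minus the sum over the reactants (forward) or products (reverse). The sketch below applies that definition to placeholder Gsolv values.

```python
def activation_solvation_energy(ts_value, species_values):
    """Transition-state solvation property minus the sum over the species.

    Assumed definition; pass reactant values for the forward direction
    and product values for the reverse direction.
    """
    return ts_value - sum(species_values)


# Placeholder Gsolv values in kcal/mol, not taken from the dataset.
gsolv = {"r1": -2.0, "r2": -3.5, "p1": -1.0, "p2": -4.0, "ts": -6.0}
ddg_forward = activation_solvation_energy(gsolv["ts"], [gsolv["r1"], gsolv["r2"]])
ddg_reverse = activation_solvation_energy(gsolv["ts"], [gsolv["p1"], gsolv["p2"]])
```

The same function applies to Hsolv values for the DDHsolv columns under the same assumption.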

Directory Structure

.
├── quantumpioneer_solvation_dataset_closed_shell_species/
│   ├── a/
│   │   ├── 2-(2-aminoethoxy)ethanol.csv
│   │   └── ...
│   ├── b/
│   └── ...
├── quantumpioneer_solvation_dataset_open_shell_species/
│   └── ...
├── quantumpioneer_solvation_dataset_reactions/
│   └── ...
└── quantumpioneer_solvation_dataset_transition_states/
    └── ...

Within each top-level category, the files are organized into subdirectories named after the first alphabetical character of the solvent name (e.g., a/, b/, …). Each CSV file is named <solvent_name>.csv, where <solvent_name> is the COSMO-RS solvent identifier.
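The layout rule can be turned into a small path helper. This is a sketch with a hypothetical function name. It takes the first alphabetical character of the solvent name, so 2-(2-aminoethoxy)ethanol lands in a/ as in the tree above, and it assumes the subdirectory letter is lowercase.

```python
from pathlib import PurePosixPath


def solvent_csv_path(category, solvent_name):
    """Build '<dataset dir>/<first letter>/<solvent_name>.csv' for one solvent.

    `category` is one of closed_shell_species, open_shell_species,
    reactions, or transition_states.
    """
    # Skip leading digits and punctuation; lowercasing is an assumption.
    first_letter = next(ch for ch in solvent_name if ch.isalpha()).lower()
    dataset_dir = f"quantumpioneer_solvation_dataset_{category}"
    return PurePosixPath(dataset_dir) / first_letter / f"{solvent_name}.csv"


path = solvent_csv_path("closed_shell_species", "2-(2-aminoethoxy)ethanol")
```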

QuantumPioneer Thermodynamics Dataset

Closed-Shell and Open-Shell Species

quantumpioneer_thermo_dataset_closed_shell_species.csv (31.8 MB)

quantumpioneer_thermo_dataset_open_shell_species.csv (41.5 MB)

| Column | Type | Units | Description |
| --- | --- | --- | --- |
| smiles | string | - | Canonical SMILES representation of the species |
| H298 | number | J/mol | Standard enthalpy of formation at 298 K |
| S298 | number | J/(mol·K) | Standard entropy of formation at 298 K |
| Cp300 | number | J/(mol·K) | Constant pressure heat capacity at 300 K |
| dlpno_sp_hartree | number | Hartree | DLPNO-CCSD(T)-F12d single-point energy |
| dft_zpe_scaled_hartree | number | Hartree | Scaled DFT zero-point energy (factor: 0.972387) |
| CpInf | number | J/(mol·K) | Heat capacity at infinite temperature |
| a0 | number | - | Zeroth-order Wilhoit polynomial coefficient |
| a1 | number | - | First-order Wilhoit polynomial coefficient |
| a2 | number | - | Second-order Wilhoit polynomial coefficient |
| a3 | number | - | Third-order Wilhoit polynomial coefficient |
| H0 | number | J/mol | Wilhoit integration constant for enthalpy |
| S0 | number | J/(mol·K) | Wilhoit integration constant for entropy |
| B | number | K | Wilhoit scaled temperature coefficient |
| P2M | number | J/mol | Petersson-to-Melius energy difference |

Transition States

quantumpioneer_thermo_dataset_transition_states.csv (14.1 MB)

| Column | Type | Units | Description |
| --- | --- | --- | --- |
| rxn_smi | string | - | Canonical reaction SMILES (r1.r2>>p1.p2) |
| dlpno_sp_hartree | number | Hartree | DLPNO-CCSD(T)-F12d single-point energy |
| dft_zpe_scaled_hartree | number | Hartree | Scaled DFT zero-point energy (factor: 0.972387) |
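The two Hartree columns can be combined into a zero-point-corrected electronic energy, and the difference between a transition state and its reactants gives a forward barrier. The sketch below shows that arithmetic with placeholder energies. Whether the published barrier values were assembled in exactly this way is an assumption, and the function names are hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095


def zpe_corrected_energy(row):
    """DLPNO single-point energy plus scaled DFT zero-point energy, in Hartree."""
    return row["dlpno_sp_hartree"] + row["dft_zpe_scaled_hartree"]


def forward_barrier_kcal(ts_row, reactant_rows):
    """E0(TS) minus the summed E0 of the reactants, converted to kcal/mol."""
    e_ts = zpe_corrected_energy(ts_row)
    e_reactants = sum(zpe_corrected_energy(row) for row in reactant_rows)
    return (e_ts - e_reactants) * HARTREE_TO_KCAL_PER_MOL


# Placeholder energies in Hartree, not taken from the dataset.
ts = {"dlpno_sp_hartree": -100.00, "dft_zpe_scaled_hartree": 0.05}
r1 = {"dlpno_sp_hartree": -60.00, "dft_zpe_scaled_hartree": 0.03}
r2 = {"dlpno_sp_hartree": -40.02, "dft_zpe_scaled_hartree": 0.02}
barrier = forward_barrier_kcal(ts, [r1, r2])
```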

Notes

  • All molecular structures are represented using canonical SMILES without atom map numbers

  • Thermodynamic properties (H298, S298, Cp300) are calculated from DFT-optimized geometries with DLPNO-CCSD(T)-F12d single-point calculations

  • The standard enthalpy of formation (H298) and Wilhoit integration constant for enthalpy (H0) derive from calculations using Petersson-type bond additivity corrections (BACs). Add P2M to either of these in order to obtain their Melius-type BAC-corrected versions.

  • The Wilhoit model is described in the Wilhoit Model section below.

Wilhoit Model

The Wilhoit model provides a physically meaningful representation of temperature-dependent heat capacity, guaranteeing correct limits at zero and infinite temperature. The model is defined by the following equations:

Heat Capacity

$$ C_\mathrm{p}(T) = C_\mathrm{p}(0) + \left[ C_\mathrm{p}(\infty) - C_\mathrm{p}(0) \right] y^2 \left[ 1 + (y - 1) \sum_{i=0}^3 a_i y^i \right] $$

where $y \equiv T/(T + B)$ is a scaled temperature ranging from zero to one.

$C_\mathrm{p}(0)$ is the heat capacity at zero temperature, whose value is equal to 33.2579 J/(mol·K) for all species in the dataset.

Enthalpy

$$ \begin{aligned} H(T) &= H_0 + C_\mathrm{p}(0) T - \Bigg\lbrace \left(2 + \sum_{i=0}^3 a_i\right) \left[ \frac{y}{2} - 1 + \left( \frac{1}{y} - 1 \right) \ln \frac{T}{y} \right] \\ &+ y^2 \sum_{i=0}^3 \frac{y^i}{(i+2)(i+3)} \sum_{j=0}^3 f_{ij} a_j \Bigg\rbrace \left[ C_\mathrm{p}(\infty) - C_\mathrm{p}(0) \right] T \end{aligned} $$

where

$$ f_{ij} = \begin{cases} 1 & \text{if } i < j \\ 3 + j & \text{if } i = j \\ 0 & \text{if } i > j \end{cases} $$

Entropy

$$ S(T) = S_0 + C_\mathrm{p}(\infty) \ln T - \left[ C_\mathrm{p}(\infty) - C_\mathrm{p}(0) \right] \left[ \ln y + \left( 1 + y \sum_{i=0}^3 \frac{a_i y^i}{2+i} \right) y \right] $$
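The three expressions can be evaluated directly from the dataset columns CpInf, a0 to a3, B, H0, and S0. The sketch below is an independent transcription of the equations, not code from this repository. It takes $f_{ij}$ as 1 for $i < j$, $3 + j$ for $i = j$, and 0 for $i > j$, which is the assignment for which $\mathrm{d}H/\mathrm{d}T$ reproduces $C_\mathrm{p}(T)$. The numerical parameters are placeholders.

```python
import math

CP0 = 33.2579  # C_p(0) in J/(mol*K), as stated above


def _y(T, B):
    """Scaled temperature y = T / (T + B)."""
    return T / (T + B)


def wilhoit_cp(T, cp_inf, a, B):
    """Heat capacity in J/(mol*K); `a` is the tuple (a0, a1, a2, a3)."""
    y = _y(T, B)
    poly = sum(a_i * y**i for i, a_i in enumerate(a))
    return CP0 + (cp_inf - CP0) * y**2 * (1.0 + (y - 1.0) * poly)


def wilhoit_enthalpy(T, cp_inf, a, B, H0):
    """Enthalpy in J/mol."""
    y = _y(T, B)
    log_term = (2.0 + sum(a)) * (
        y / 2.0 - 1.0 + (1.0 / y - 1.0) * math.log(T / y)
    )
    series = 0.0
    for i in range(4):
        # sum_j f_ij * a_j with f_ij = 3 + j (i == j), 1 (i < j), 0 (i > j)
        inner = (3 + i) * a[i] + sum(a[i + 1:])
        series += y**i / ((i + 2) * (i + 3)) * inner
    return H0 + CP0 * T - (log_term + y**2 * series) * (cp_inf - CP0) * T


def wilhoit_entropy(T, cp_inf, a, B, S0):
    """Entropy in J/(mol*K)."""
    y = _y(T, B)
    series = sum(a_i * y**i / (2 + i) for i, a_i in enumerate(a))
    return (
        S0
        + cp_inf * math.log(T)
        - (cp_inf - CP0) * (math.log(y) + (1.0 + y * series) * y)
    )


# Placeholder parameters, not taken from the dataset.
params = {"cp_inf": 150.0, "a": (0.5, -1.0, 2.0, -0.3), "B": 500.0}
cp_600 = wilhoit_cp(600.0, **params)
```

A quick consistency check is to differentiate `wilhoit_enthalpy` numerically and compare the result with `wilhoit_cp` at the same temperature; `T` times the derivative of `wilhoit_entropy` should match as well.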

QuantumPioneer Quantum Mechanics Dataset

DLPNO Results

species_dlpno_results.parquet (193.3 MB)

transition_state_dlpno_results.parquet (116.6 MB)

| Column | Type | Units | Description |
| --- | --- | --- | --- |
| smiles | string | - | SMILES with atom mapping indicating xyz and force order |
| route_section | string | - | Level of theory |
| charge | integer | - | Molecular formal charge |
| multiplicity | integer | - | Electron multiplicity |
| energy | double | Hartree | Final single-point energy |
| run_time | integer | s | Total run time |
| input_coordinates | list[list[double]] | Å | Input XYZ coordinates |
| dipole_au | float | a.u. | Molecular dipole |
| t1_diagnostic | float | - | T1 diagnostic |

DFT Results

species_dft_results.parquet (2.3 GB)

transition_state_dft_results.parquet (1.4 GB)

| Column | Type | Units | Description |
| --- | --- | --- | --- |
| smiles | string | - | SMILES with atom mapping indicating xyz and force order |
| route_section | string | - | Level of theory |
| charge | integer | e | Molecular formal charge |
| multiplicity | integer | - | Electron multiplicity |
| max_steps | integer | - | Maximum allowed optimization steps |
| cpu_time | integer | s | CPU time |
| wall_time | integer | s | Wall time |
| e0_h | double | Hartree | Enthalpy at 298 K |
| hf | double | Hartree | E0 for non-wavefunction methods |
| zpe_per_atom | double | Hartree | Per-atom zero-point energy |
| e0_zpe | double | Hartree | Gibbs free energy at 0 K |
| gibbs | double | Hartree | Gibbs free energy at 298 K |
| dipole_au | double | a.u. | Molecular dipole |
| homo_lumo_gap | double | Hartree | HOMO-LUMO energy gap |
| beta_homo_lumo_gap | double | Hartree | HOMO-LUMO energy gap for beta orbitals |
| dipole_moment_debye | list[float] | Debye | X, Y, and Z components of dipole moment |
| aniso_polarizability_au | double | a.u. | Anisotropic polarizability |
| iso_polarizability_au | double | a.u. | Isotropic polarizability |
| scf | double | Hartree | SCF energy after optimization |
| mulliken_charges_summed | list[list[double]] | e | Mulliken charges with protons summed into heavy atoms |
| frequencies | list[double] | cm⁻¹ | Vibrational frequencies |
| frequency_modes | list[list[list[double]]] | - | Vibrational normal modes |
| initial_xyz | list[list[float]] | Å | Input XYZ coordinates |
| std_xyz | list[list[double]] | Å | Standardized XYZ coordinates after optimization |
| std_forces | list[list[double]] | Hartree/Bohr | Standardized forces after optimization |

Note: The beta orbital HOMO-LUMO energy gap is None when multiplicity is 1 (i.e., closed-shell species).
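Code that consumes the gap columns therefore has to tolerate a missing beta value. A minimal sketch, using a hypothetical helper that also converts the gaps from Hartree to eV (the rows shown are placeholders):

```python
HARTREE_TO_EV = 27.211386


def gaps_in_ev(row):
    """Return (alpha_gap, beta_gap) in eV; beta_gap stays None if absent."""
    alpha = row["homo_lumo_gap"] * HARTREE_TO_EV
    beta = row["beta_homo_lumo_gap"]
    return alpha, None if beta is None else beta * HARTREE_TO_EV


# Placeholder rows, not taken from the dataset.
closed_shell = {"multiplicity": 1, "homo_lumo_gap": 0.25, "beta_homo_lumo_gap": None}
open_shell = {"multiplicity": 2, "homo_lumo_gap": 0.20, "beta_homo_lumo_gap": 0.30}
closed_gaps = gaps_in_ev(closed_shell)
open_gaps = gaps_in_ev(open_shell)
```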

Semiempirical Results

species_semiempirical_results.parquet (2.1 GB)

transition_state_semiempirical_results.parquet (1.4 GB)

| Column | Type | Units | Description |
| --- | --- | --- | --- |
| smiles | string | - | SMILES with atom mapping indicating xyz and force order |
| route_section | string | - | Level of theory |
| charge | integer | e | Molecular formal charge |
| multiplicity | integer | - | Electron multiplicity |
| max_steps | integer | - | Maximum allowed optimization steps |
| cpu_time | integer | s | CPU time |
| wall_time | integer | s | Wall time |
| e0_h | double | Hartree | Enthalpy at 298 K |
| hf | double | Hartree | E0 for non-wavefunction methods |
| zpe_per_atom | double | Hartree | Per-atom zero-point energy |
| e0_zpe | double | Hartree | Gibbs free energy at 0 K |
| gibbs | double | Hartree | Gibbs free energy at 298 K |
| frequencies | list[double] | cm⁻¹ | Vibrational frequencies |
| frequency_modes | list[list[list[double]]] | - | Vibrational normal modes |
| initial_xyz | list[list[float]] | Å | Input XYZ coordinates |
| std_xyz | list[list[double]] | Å | Standardized XYZ coordinates after optimization |
| std_forces | list[list[double]] | Hartree/Bohr | Standardized forces after optimization |

Note: Standardized forces from semiempirical calculations are only provided for transition states. In the species dataset, this column is included for consistency, but all values are set to None.
