The QuantumPioneer datasets are available for download on Zenodo.
Log files generated by Gaussian and ORCA were parsed using `generator.py`, which relies on the FastLogfileParser package. The resulting parquet files were further processed with the scripts in `scripts/qm_results`; importantly, these scripts match the atom-mapped SMILES to the respective data points.
Files generated by COSMOtherm were parsed and filtered separately to produce two master CSV files: one for transition states and one for ground-state species. These CSV files were then split by solvent using the scripts in `scripts/solvation`.
quantumpioneer_kinetics_dataset.csv (39.11 MB)
| Column | Type | Units | Description |
|---|---|---|---|
| rxn_smi | string | – | Reaction SMILES (r1.r2>>p1.p2) |
| k_298 | number | m³/(mol·s) | Bimolecular rate coefficient at 298 K |
| A_low | number | m³/(mol·s) | Arrhenius pre-exponential factor, 300–1000 K |
| Ea_low | number | kcal/mol | Activation energy, 300–1000 K |
| A_high | number | m³/(mol·s) | Arrhenius pre-exponential factor, 1000–2000 K |
| Ea_high | number | kcal/mol | Activation energy, 1000–2000 K |
| barrier | number | kcal/mol | Forward barrier (DLPNO + scaled DFT ZPE) |
| Hrxn | number | kcal/mol | Forward reaction enthalpy (DLPNO + scaled DFT ZPE) |
| deltaHrxn298 | number | kcal/mol | Forward reaction enthalpy at 298 K |
| deltaGrxn298 | number | kcal/mol | Forward reaction Gibbs energy at 298 K |
| P2M | number | kcal/mol | Petersson-to-Melius energy difference at 298 K |
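The `A_low`/`Ea_low` and `A_high`/`Ea_high` pairs are two separate Arrhenius fits. A minimal sketch for evaluating them, assuming the plain two-parameter form k(T) = A·exp(−Ea/RT) (the table lists no temperature exponent) and selecting the fit by temperature; the function name and the strict range check are illustrative choices, not part of the QuantumPioneer scripts.

```python
import math

# Gas constant in kcal/(mol*K), matching the kcal/mol units of Ea_low/Ea_high.
R_KCAL = 8.314462618 / 4184.0


def arrhenius_k(row: dict, temperature: float) -> float:
    """Evaluate k(T) = A * exp(-Ea / (R T)) in m^3/(mol*s).

    Assumes the two-parameter Arrhenius form; uses the 300-1000 K fit up to
    1000 K and the 1000-2000 K fit above it. Values may be strings (csv) or floats.
    """
    if not 300.0 <= temperature <= 2000.0:
        raise ValueError("the fits only cover 300-2000 K")
    suffix = "low" if temperature <= 1000.0 else "high"
    a = float(row[f"A_{suffix}"])
    ea = float(row[f"Ea_{suffix}"])
    return a * math.exp(-ea / (R_KCAL * temperature))
```

Rows read with `csv.DictReader` can be passed directly. For 298 K itself, the tabulated `k_298` should be used instead, since 298 K lies just below the fitted range.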
The thermodynamic properties deltaHrxn298 and deltaGrxn298 are derived from calculations using
Petersson-type bond additivity corrections (BACs). Add P2M to these to obtain their Melius-type
BAC-corrected versions.
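A minimal sketch of applying `P2M` and, as one possible use of `deltaGrxn298`, computing an equilibrium constant. It assumes that K = exp(−ΔG/RT) is dimensionless for r1.r2>>p1.p2 reactions (no change in the number of moles) and uses 298.15 K as the default temperature; the function names are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K), matching the kcal/mol energy columns.
R_KCAL = 8.314462618 / 4184.0


def melius_corrected(row: dict) -> tuple[float, float]:
    """Return (deltaHrxn298, deltaGrxn298) with Melius-type BACs, in kcal/mol.

    The stored values use Petersson-type BACs; adding P2M converts them.
    """
    p2m = float(row["P2M"])
    return float(row["deltaHrxn298"]) + p2m, float(row["deltaGrxn298"]) + p2m


def equilibrium_constant(delta_g_kcal: float, temperature: float = 298.15) -> float:
    """K = exp(-dG / (R T)); dimensionless when the number of moles is unchanged."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature))
```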
Computed solvation free energies and enthalpies at 298.15 K for solute–solvent pairs, generated by the COSMO-RS-based workflow described in the QuantumPioneer project paper. Each CSV file corresponds to a single solvent (295 solvents total) and contains the solvation properties of every solute evaluated in that solvent.
The full list of solvents is available in solvents.md.
quantumpioneer_solvation_dataset_closed_shell_species.zip (759.5 MB)
quantumpioneer_solvation_dataset_open_shell_species.zip (994.9 MB)
quantumpioneer_solvation_dataset_transition_states.zip (1.1 GB)
| Column | Type | Units | Description |
|---|---|---|---|
| smiles | string | – | Canonical SMILES of the solute |
| Gsolv | number | kcal/mol | Solvation free energy of the solute in this solvent at 298.15 K |
| Hsolv | number | kcal/mol | Solvation enthalpy of the solute in this solvent at 298.15 K |
Note: In the transition-state files, the `smiles` column holds reaction SMILES (r1.r2>>p1.p2) rather than single-molecule SMILES.
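Differences in `Gsolv` between two solvents give a transfer free energy and hence a partition coefficient, log10 K = −[Gsolv(A) − Gsolv(B)] / (RT ln 10). A minimal sketch, assuming both `Gsolv` values refer to the same standard state so that it cancels in the difference; which solvent files correspond to, e.g., 1-octanol and water (see `solvents.md`) is left to the reader, and the function name is hypothetical.

```python
import math

R_KCAL = 8.314462618 / 4184.0  # kcal/(mol*K)


def log10_partition(gsolv_a: float, gsolv_b: float, temperature: float = 298.15) -> float:
    """log10 of the partition coefficient [solute]_A / [solute]_B.

    The transfer free energy from solvent B to solvent A is Gsolv(A) - Gsolv(B);
    a more negative Gsolv in A favors A. Both values must share a standard state.
    """
    delta_g_transfer = gsolv_a - gsolv_b
    return -delta_g_transfer / (math.log(10) * R_KCAL * temperature)
```

With A = octanol and B = water, this gives a rough estimate of log P for a solute present in both files.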
quantumpioneer_solvation_dataset_reactions.zip (1.6 GB)
| Column | Type | Units | Description |
|---|---|---|---|
| rxn_smiles | string | – | Reaction SMILES (r1.r2>>p1.p2) |
| DDGsolv_forward | number | kcal/mol | Solvation free energy of activation in the forward direction (r1.r2>>ts) |
| DDGsolv_reverse | number | kcal/mol | Solvation free energy of activation in the reverse direction (p1.p2>>ts) |
| DDHsolv_forward | number | kcal/mol | Solvation enthalpy of activation in the forward direction (r1.r2>>ts) |
| DDHsolv_reverse | number | kcal/mol | Solvation enthalpy of activation in the reverse direction (p1.p2>>ts) |
All energies are in kcal/mol.
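One common way to use `DDGsolv_forward` is to shift a gas-phase rate coefficient into solution, k_liq ≈ k_gas · exp(−ΔΔG‡solv/RT). A minimal sketch, assuming that relation holds with consistent concentration-based standard states and no diffusion limitation, and that `k_298` from the kinetics dataset is paired with the row whose reaction SMILES matches (whether `rxn_smi` and `rxn_smiles` strings are identical is an assumption to verify). The function names are hypothetical.

```python
import math

R_KCAL = 8.314462618 / 4184.0  # kcal/(mol*K)


def liquid_phase_k(k_gas: float, ddg_solv_forward: float, temperature: float = 298.15) -> float:
    """k_liq = k_gas * exp(-DDGsolv_forward / (R T)), in the same units as k_gas.

    Assumes consistent concentration-based standard states and no diffusion limit.
    """
    return k_gas * math.exp(-ddg_solv_forward / (R_KCAL * temperature))


def solvation_shift_of_reaction_g(ddg_forward: float, ddg_reverse: float) -> float:
    """Solvent-induced change in the reaction free energy, in kcal/mol.

    [Gsolv(ts) - Gsolv(r1.r2)] - [Gsolv(ts) - Gsolv(p1.p2)] = Gsolv(p1.p2) - Gsolv(r1.r2).
    """
    return ddg_forward - ddg_reverse
```

The second helper follows directly from the column definitions: the transition-state term cancels, leaving the difference in solvation free energy between products and reactants.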
.
├── quantumpioneer_solvation_dataset_closed_shell_species/
│   ├── a/
│   │   ├── 2-(2-aminoethoxy)ethanol.csv
│   │   └── ...
│   ├── b/
│   └── ...
├── quantumpioneer_solvation_dataset_open_shell_species/
│   └── ...
├── quantumpioneer_solvation_dataset_reactions/
│   └── ...
└── quantumpioneer_solvation_dataset_transition_states/
    └── ...
Within each top-level category, the files are organized into subdirectories named after
the first alphabetical character of the solvent name (e.g., `a/`, `b/`, …). Each CSV
file is named `<solvent_name>.csv`, where `<solvent_name>` is the COSMO-RS solvent
identifier.
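Given this naming scheme, a file path can be built from the category and the solvent name. A minimal sketch that takes "first alphabetical character" literally (digits and punctuation are skipped, so `2-(2-aminoethoxy)ethanol` lands in `a/`) and lower-cases it, which is an assumption based on the `a/`, `b/` examples; the helper names are hypothetical.

```python
import csv
from pathlib import Path


def solvent_csv_path(root: Path, category: str, solvent_name: str) -> Path:
    """Build <root>/<category>/<first letter>/<solvent_name>.csv."""
    # First alphabetical character, skipping leading digits and punctuation.
    first = next((c for c in solvent_name if c.isalpha()), None)
    if first is None:
        raise ValueError(f"no alphabetical character in {solvent_name!r}")
    return root / category / first.lower() / f"{solvent_name}.csv"


def load_gsolv(path: Path) -> dict[str, float]:
    """Map each solute SMILES in one solvent file to its Gsolv (kcal/mol)."""
    with path.open(newline="") as handle:
        return {row["smiles"]: float(row["Gsolv"]) for row in csv.DictReader(handle)}
```

`load_gsolv` applies to the species and transition-state files; the reaction files use `rxn_smiles` and the `DDGsolv_*`/`DDHsolv_*` columns instead.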
quantumpioneer_thermo_dataset_closed_shell_species.csv (31.8 MB)
quantumpioneer_thermo_dataset_open_shell_species.csv (41.5 MB)
| Column | Type | Units | Description |
|---|---|---|---|
| smiles | string | – | Canonical SMILES representation of the species |
| H298 | number | J/mol | Standard enthalpy of formation at 298 K |
| S298 | number | J/(mol·K) | Standard entropy at 298 K |
| Cp300 | number | J/(mol·K) | Constant-pressure heat capacity at 300 K |
| dlpno_sp_hartree | number | Hartree | DLPNO-CCSD(T)-F12d single-point energy |
| dft_zpe_scaled_hartree | number | Hartree | Scaled DFT zero-point energy (factor: 0.972387) |
| CpInf | number | J/(mol·K) | Heat capacity at infinite temperature |
| a0 | number | – | Zeroth-order Wilhoit polynomial coefficient |
| a1 | number | – | First-order Wilhoit polynomial coefficient |
| a2 | number | – | Second-order Wilhoit polynomial coefficient |
| a3 | number | – | Third-order Wilhoit polynomial coefficient |
| H0 | number | J/mol | Wilhoit integration constant for enthalpy |
| S0 | number | J/(mol·K) | Wilhoit integration constant for entropy |
| B | number | K | Wilhoit scaled temperature coefficient |
| P2M | number | J/mol | Petersson-to-Melius energy difference |
quantumpioneer_thermo_dataset_transition_states.csv (14.1 MB)
| Column | Type | Units | Description |
|---|---|---|---|
| rxn_smi | string | – | Canonical reaction SMILES (r1.r2>>p1.p2) |
| dlpno_sp_hartree | number | Hartree | DLPNO-CCSD(T)-F12d single-point energy |
| dft_zpe_scaled_hartree | number | Hartree | Scaled DFT zero-point energy (factor: 0.972387) |
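Since both thermo files provide `dlpno_sp_hartree` and `dft_zpe_scaled_hartree`, a forward barrier of the "DLPNO + scaled DFT ZPE" kind can in principle be recomputed. A minimal sketch, assuming the reactant species can be found (in the closed- or open-shell file) by the canonical SMILES components of `rxn_smi`; it is not guaranteed to reproduce the kinetics `barrier` column exactly, and the helper names are hypothetical.

```python
HARTREE_TO_KCAL = 627.509474  # kcal/mol per Hartree


def reactant_smiles(rxn_smi: str) -> list[str]:
    """Split "r1.r2>>p1.p2" into the reactant SMILES ["r1", "r2"]."""
    return rxn_smi.split(">>")[0].split(".")


def zpe_corrected_energy(row: dict) -> float:
    """DLPNO single-point energy plus scaled DFT ZPE, in Hartree."""
    return float(row["dlpno_sp_hartree"]) + float(row["dft_zpe_scaled_hartree"])


def forward_barrier_kcal(ts_row: dict, reactant_rows: list[dict]) -> float:
    """[E0(TS) - sum of E0(reactants)] in kcal/mol, with E0 = DLPNO + scaled ZPE."""
    e_ts = zpe_corrected_energy(ts_row)
    e_reactants = sum(zpe_corrected_energy(row) for row in reactant_rows)
    return (e_ts - e_reactants) * HARTREE_TO_KCAL
```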
- All molecular structures are represented using canonical SMILES without atom map numbers.
- Thermodynamic properties (`H298`, `S298`, `Cp300`) are calculated from DFT-optimized geometries with DLPNO-CCSD(T)-F12d single-point calculations.
- The standard enthalpy of formation (`H298`) and the Wilhoit integration constant for enthalpy (`H0`) derive from calculations using Petersson-type bond additivity corrections (BACs). Add `P2M` to either of these to obtain their Melius-type BAC-corrected versions.
- The Wilhoit model is described [here].
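Because `H298` and `P2M` are stored in J/mol, whereas the kinetics dataset reports kcal/mol, combining species values requires a unit conversion. A minimal sketch of a reaction enthalpy from species rows, assuming the rows were looked up by canonical SMILES matching the components of a reaction SMILES; it is not guaranteed to reproduce `deltaHrxn298` in the kinetics file exactly, and the function name is hypothetical.

```python
J_PER_KCAL = 4184.0  # thermochemical calorie


def reaction_enthalpy_kcal(reactants: list[dict], products: list[dict],
                           melius: bool = False) -> float:
    """Sum of product H298 minus sum of reactant H298, converted to kcal/mol.

    With melius=True, each species' P2M (J/mol) is added first, giving
    Melius-type BAC values instead of Petersson-type ones.
    """
    def h298(row: dict) -> float:
        value = float(row["H298"])
        if melius:
            value += float(row["P2M"])
        return value

    delta_j = sum(h298(p) for p in products) - sum(h298(r) for r in reactants)
    return delta_j / J_PER_KCAL
```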
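The Wilhoit model provides a physically meaningful representation of temperature-dependent heat capacity, guaranteeing correct limits at zero and infinite temperature. The heat capacity is defined as

$$
C_p(T) = C_p(0) + \left[C_p(\infty) - C_p(0)\right] y^2 \left[1 + (y - 1)\left(a_0 + a_1 y + a_2 y^2 + a_3 y^3\right)\right]
$$

where $y = T/(T + B)$, $C_p(0)$ is the zero-temperature limit, $C_p(\infty)$ is `CpInf`, and $a_0, \dots, a_3$ and $B$ are the corresponding table columns. The enthalpy and entropy follow from integrating $C_p(T)$ and $C_p(T)/T$, with `H0` and `S0` as the respective integration constants.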
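The Wilhoit heat capacity can be evaluated directly from the table columns. $C_p(0)$ is not a column, so this sketch assumes the usual zero-temperature limit with only translations and rotations active (2.5R for atoms, 3.5R for linear and 4R for nonlinear molecules); the fits may have used a different value. Units follow the table (J/(mol·K) and K), and the function name is hypothetical.

```python
R = 8.314462618  # J/(mol*K)

# Assumed zero-temperature limits: translation + rotation, with Cp = Cv + R.
CP0_ATOM = 2.5 * R
CP0_LINEAR = 3.5 * R
CP0_NONLINEAR = 4.0 * R


def wilhoit_cp(temperature: float, cp0: float, cp_inf: float,
               a0: float, a1: float, a2: float, a3: float, b: float) -> float:
    """Wilhoit heat capacity in J/(mol*K).

    Cp(T) = Cp0 + (CpInf - Cp0) * y^2 * [1 + (y - 1)(a0 + a1 y + a2 y^2 + a3 y^3)],
    with y = T / (T + B).
    """
    y = temperature / (temperature + b)
    poly = a0 + a1 * y + a2 * y ** 2 + a3 * y ** 3
    return cp0 + (cp_inf - cp0) * y ** 2 * (1.0 + (y - 1.0) * poly)
```

At T = 0 the function returns `cp0` exactly, and as T grows, y approaches 1 and the result approaches `CpInf`, which are the two limits the model is built to respect.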
species_dlpno_results.parquet (193.3 MB)
transition_state_dlpno_results.parquet (116.6 MB)
| Column | Type | Units | Description |
|---|---|---|---|
| smiles | string | – | Atom-mapped SMILES indicating the atom order of the XYZ coordinates and forces |
| route_section | string | – | Level of theory |
| charge | integer | e | Molecular formal charge |
| multiplicity | integer | – | Spin multiplicity |
| energy | double | Hartree | Final single-point energy |
| run_time | integer | s | Total run time |
| input_coordinates | list[list[double]] | Å | Input XYZ coordinates |
| dipole_au | float | a.u. | Molecular dipole |
| t1_diagnostic | float | – | T1 diagnostic |
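The `t1_diagnostic` column can be used to flag entries whose single-reference treatment may be questionable. The cutoffs below (0.02 for closed-shell and around 0.045 for open-shell species) are commonly quoted rules of thumb for canonical CCSD, not values defined by this dataset, and may not transfer directly to DLPNO. Rows are assumed to be dicts, for example from `pandas.read_parquet(path, columns=["smiles", "multiplicity", "t1_diagnostic"]).to_dict("records")`; the function name is hypothetical.

```python
def flag_multireference(rows: list[dict], closed_shell_cutoff: float = 0.02,
                        open_shell_cutoff: float = 0.045) -> list[dict]:
    """Return the rows whose T1 diagnostic exceeds a multiplicity-dependent cutoff."""
    flagged = []
    for row in rows:
        t1 = row.get("t1_diagnostic")
        if t1 is None:
            continue  # no diagnostic recorded; nothing to judge
        cutoff = closed_shell_cutoff if row["multiplicity"] == 1 else open_shell_cutoff
        if t1 > cutoff:
            flagged.append(row)
    return flagged
```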
species_dft_results.parquet (2.3 GB)
transition_state_dft_results.parquet (1.4 GB)
| Column | Type | Units | Description |
|---|---|---|---|
| smiles | string | – | Atom-mapped SMILES indicating the atom order of the XYZ coordinates and forces |
| route_section | string | – | Level of theory |
| charge | integer | e | Molecular formal charge |
| multiplicity | integer | – | Spin multiplicity |
| max_steps | integer | – | Maximum allowed optimization steps |
| cpu_time | integer | s | CPU time |
| wall_time | integer | s | Wall time |
| e0_h | double | Hartree | Enthalpy at 298 K |
| hf | double | Hartree | E0 for non-wavefunction methods |
| zpe_per_atom | double | Hartree | Per-atom zero-point energy |
| e0_zpe | double | Hartree | Electronic energy plus ZPE (Gibbs free energy at 0 K) |
| gibbs | double | Hartree | Gibbs free energy at 298 K |
| dipole_au | double | a.u. | Molecular dipole |
| homo_lumo_gap | double | Hartree | HOMO-LUMO energy gap |
| beta_homo_lumo_gap | double | Hartree | HOMO-LUMO energy gap of the beta orbitals |
| dipole_moment_debye | list[float] | Debye | X, Y, and Z components of the dipole moment |
| aniso_polarizability_au | double | a.u. | Anisotropic polarizability |
| iso_polarizability_au | double | a.u. | Isotropic polarizability |
| scf | double | Hartree | SCF energy after optimization |
| mulliken_charges_summed | list[list[double]] | e | Mulliken charges with hydrogens summed into heavy atoms |
| frequencies | list[double] | cm⁻¹ | Vibrational frequencies |
| frequency_modes | list[list[list[double]]] | – | Vibrational normal modes |
| initial_xyz | list[list[float]] | Å | Input XYZ coordinates |
| std_xyz | list[list[double]] | Å | Standardized XYZ coordinates after optimization |
| std_forces | list[list[double]] | Hartree/Bohr | Standardized forces after optimization |
Note: The beta orbital HOMO-LUMO energy gap is None when multiplicity is 1 (i.e., closed-shell species).
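The `smiles` column carries atom maps that give the atom order of the coordinate arrays. A minimal sketch for turning a row into an XYZ block, assuming that every atom (including hydrogens) is written as a bracket atom with a map number and that map number k corresponds to row k − 1 of `std_xyz` (or `initial_xyz`); both are assumptions worth checking on a few entries, or with a full SMILES parser such as RDKit. For reaction SMILES, only the reactant side is read, since the same map numbers recur on the product side. The helper names are hypothetical.

```python
import re

# Bracket atoms with an atom-map number, e.g. "[CH3:1]", "[O-:4]", "[c:7]", "[13C:2]".
_MAPPED_ATOM = re.compile(r"\[(\d*)([A-Z][a-z]?|[a-z]+)[^\]:]*:(\d+)\]")


def mapped_symbols(smiles: str) -> list[str]:
    """Element symbols ordered by atom-map number (map 1 first)."""
    reactant_side = smiles.split(">>")[0]
    atoms = {int(m.group(3)): m.group(2).capitalize()
             for m in _MAPPED_ATOM.finditer(reactant_side)}
    return [atoms[i] for i in sorted(atoms)]


def to_xyz(smiles: str, coords: list[list[float]], comment: str = "") -> str:
    """Format an XYZ block, pairing map-ordered symbols with coordinate rows (Å)."""
    symbols = mapped_symbols(smiles)
    if len(symbols) != len(coords):
        raise ValueError("atom count in SMILES does not match coordinates")
    lines = [str(len(symbols)), comment]
    lines += [f"{s:<2} {x:12.6f} {y:12.6f} {z:12.6f}"
              for s, (x, y, z) in zip(symbols, coords)]
    return "\n".join(lines)
```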
species_semiempirical_results.parquet (2.1 GB)
transition_state_semiempirical_results.parquet (1.4 GB)
| Column | Type | Units | Description |
|---|---|---|---|
| smiles | string | – | Atom-mapped SMILES indicating the atom order of the XYZ coordinates and forces |
| route_section | string | – | Level of theory |
| charge | integer | e | Molecular formal charge |
| multiplicity | integer | – | Spin multiplicity |
| max_steps | integer | – | Maximum allowed optimization steps |
| cpu_time | integer | s | CPU time |
| wall_time | integer | s | Wall time |
| e0_h | double | Hartree | Enthalpy at 298 K |
| hf | double | Hartree | E0 for non-wavefunction methods |
| zpe_per_atom | double | Hartree | Per-atom zero-point energy |
| e0_zpe | double | Hartree | Electronic energy plus ZPE (Gibbs free energy at 0 K) |
| gibbs | double | Hartree | Gibbs free energy at 298 K |
| frequencies | list[double] | cm⁻¹ | Vibrational frequencies |
| frequency_modes | list[list[list[double]]] | – | Vibrational normal modes |
| initial_xyz | list[list[float]] | Å | Input XYZ coordinates |
| std_xyz | list[list[double]] | Å | Standardized XYZ coordinates after optimization |
| std_forces | list[list[double]] | Hartree/Bohr | Standardized forces after optimization |
Note: Standardized forces (`std_forces`) from semiempirical calculations are provided only for
transition states. The species dataset includes this column for consistency, but
all of its values are set to `None`.
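A quick sanity check on the `frequencies` column: minima should have no imaginary modes and transition states exactly one. This sketch assumes the parser keeps Gaussian's convention of reporting imaginary frequencies as negative wavenumbers; the `tolerance` for small spurious negative values and the function names are hypothetical.

```python
def count_imaginary(frequencies: list[float], tolerance: float = 0.0) -> int:
    """Count modes reported as imaginary, i.e. wavenumbers below -tolerance (cm^-1)."""
    return sum(1 for f in frequencies if f < -tolerance)


def has_expected_imaginary_count(frequencies: list[float], is_transition_state: bool,
                                 tolerance: float = 0.0) -> bool:
    """Minima should have no imaginary modes; first-order saddle points exactly one."""
    expected = 1 if is_transition_state else 0
    return count_imaginary(frequencies, tolerance) == expected
```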