Description
I've been running some long repex (25-50ns) FEP calculations and noticed that while they start off fast, they tend to slow down as the calculation progresses. For example, looking at a log file from one of the simulations, running a molecule in the vacuum leg:
2026-02-24 16:15:47.882 | INFO | somd2.runner._repex:run:959 - Running dynamics for cycle 1 of 12500
2026-02-24 16:15:49.811 | INFO | somd2.runner._repex:run:959 - Running dynamics for cycle 2 of 12500

We can see that it took approx 1.93s to complete this repex cycle. When the full calculation (25ns) is finishing up, we can see:
2026-02-25 03:22:08.605 | INFO | somd2.runner._repex:run:959 - Running dynamics for cycle 12499 of 12500
2026-02-25 03:22:14.038 | INFO | somd2.runner._repex:run:959 - Running dynamics for cycle 12500 of 12500

Here it took 5.43s to complete this cycle, almost 3x longer than the initial cycle. To investigate this further, I wrote some code (provided below) to plot the timings between consecutive cycles, yielding the following plot:
This essentially illustrates the observation above: as the SOMD2 calculation progresses, the time between consecutive cycles increases in a linear/O(n) fashion. We can additionally see the following:
- The orange line, representing the mixing times (also obtained from the log file), stays consistent throughout the simulation: mixing times are constant and independent of the number of cycles carried out.
- There are blue-dot outliers that sit significantly above the average blue line. I believe these most likely represent checkpoint blocks that occur every N cycles (in this simulation there are 250 repex cycles per checkpoint cycle). This tells us that the progressive slowdown is not caused by the much less frequent I/O operations interleaved with the dynamics.
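To put a number on the linear trend, one could fit a straight line to the per-cycle durations. A minimal sketch, using synthetic data shaped like the run above purely for illustration (the real fit would use the timings extracted by the plotting code below):

```python
import numpy as np

def slowdown_rate(cycles, durations):
    """Least-squares fit duration = a * cycle + b; returns the slope a (s/cycle)."""
    slope, _intercept = np.polyfit(cycles, durations, 1)
    return slope

# Synthetic illustration only: durations growing linearly from ~1.93 s
# at cycle 1 to ~5.43 s at cycle 12500, as in the log excerpts above.
cycles = np.arange(1, 12501)
durations = 1.93 + (5.43 - 1.93) * (cycles - 1) / 12499
print(slowdown_rate(cycles, durations))  # ~2.8e-4 s per cycle
```

A slope that stays roughly constant across independent runs would support an O(n) cost growing with cycle count rather than a one-off startup effect.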
This progressive slowdown has not always been part of SOMD2; it was introduced by some change within the last year or so. For example, taking a log file from a simulation I ran in March 2025:
We can see there isn't any increase in consecutive cycle timings here.
In other calculations, the increase isn't as dramatic, but is still very much there:
These calculations follow a fairly standard production FEP setup, with an exchange frequency of approx 1-2fs, a frame frequency of around 100ps, and a checkpoint frequency of 500ps.
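For reference, the cadence implied by the numbers above checks out with some quick arithmetic (25 ns and 12500 cycles are from the log excerpt, 500 ps is the checkpoint frequency):

```python
# Derive per-cycle dynamics length and checkpoint cadence from the run above.
total_ps = 25.0 * 1000       # 25 ns total runtime, in ps
n_cycles = 12500             # cycles per the log excerpt
ps_per_cycle = total_ps / n_cycles
checkpoint_every = 500.0 / ps_per_cycle  # 500 ps checkpoint frequency

print(ps_per_cycle)      # 2.0 ps of dynamics per repex cycle
print(checkpoint_every)  # 250.0 cycles per checkpoint, matching the outlier spacing
```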
Reproducibility
Unfortunately, I can't easily provide the affected SOMD2 versions, as these calculations were run from my own fork with additional commits of my own, meaning that the logged git hashes are not comparable to the OBS git repo. I can see that recent SOMD2 versions are affected by this, but versions from last year are not. I thought about using git bisect to identify the commit that might have introduced this, but the standalone SOMD2 pixi build doesn't work for me because of some kind of OpenCL driver package issue. I'm happy to provide further information if needed.
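If the pixi build issue gets sorted out, the bisect itself would be mechanical. A sketch, where GOOD_SHA, BAD_SHA, and check_timings.py are placeholders (the script would run a short repex job and exit non-zero when the last-cycle/first-cycle time ratio exceeds some threshold, e.g. 1.5x):

```shell
git bisect start
git bisect bad BAD_SHA        # a recent commit showing the slowdown
git bisect good GOOD_SHA      # a ~March 2025 commit that does not
# git runs this script at each step; exit 0 = good, exit 1-124 = bad
git bisect run python check_timings.py
git bisect reset
```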
Plotting code:
```python
import re
from datetime import datetime

import pandas as pd
import matplotlib.pyplot as plt


def plot_cumulative_split_timings(log_file_path):
    cycle_pattern = re.compile(
        r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*\|.*?Running dynamics for cycle (\d+)"
    )
    matrix_pattern = re.compile(
        r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*\|.*?Assembling energy matrix"
    )

    data = []

    # Trackers for cumulative cycle counting
    cycle_offset = 0
    last_logged_cycle = 0
    current_actual_cycle = None
    cycle_start_time = None
    matrix_time = None

    with open(log_file_path, "r") as f:
        for line in f:
            # 1. Look for the start of a cycle
            cycle_match = cycle_pattern.search(line)
            if cycle_match:
                dt_str, cycle_str = cycle_match.groups()
                dt = datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S.%f")
                logged_cycle = int(cycle_str)

                # Detect if a simulation extension restarted the counter
                if logged_cycle < last_logged_cycle:
                    cycle_offset += last_logged_cycle
                actual_cycle = logged_cycle + cycle_offset
                last_logged_cycle = logged_cycle

                # Close out the previous cycle if data exists
                if current_actual_cycle is not None and cycle_start_time and matrix_time:
                    mixing_duration = (dt - matrix_time).total_seconds()
                    if mixing_duration > 0:
                        data[-1]["mixing_duration"] = mixing_duration

                current_actual_cycle = actual_cycle
                cycle_start_time = dt
                matrix_time = None
                data.append({
                    "actual_cycle": current_actual_cycle,
                    "dynamics_duration": None,
                    "mixing_duration": None,
                })

            # 2. Look for the start of the mixing phase
            matrix_match = matrix_pattern.search(line)
            if matrix_match and current_actual_cycle is not None:
                dt_str = matrix_match.group(1)
                dt = datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S.%f")
                matrix_time = dt
                dynamics_duration = (dt - cycle_start_time).total_seconds()
                data[-1]["dynamics_duration"] = dynamics_duration

    # Convert to DataFrame
    df = pd.DataFrame(data).dropna()

    # Filter out initial startup spikes (adjust threshold if necessary)
    df = df[(df["dynamics_duration"] < 20.0) & (df["mixing_duration"] < 20.0)].copy()

    # Calculate rolling averages
    window_size = 50
    df["dyn_rolling"] = df["dynamics_duration"].rolling(window=window_size).mean()
    df["mix_rolling"] = df["mixing_duration"].rolling(window=window_size).mean()

    # Plotting
    plt.figure(figsize=(12, 6))
    plt.scatter(df["actual_cycle"], df["dynamics_duration"], alpha=0.3, s=10, color="blue", label="Dynamics (Raw)")
    plt.plot(df["actual_cycle"], df["dyn_rolling"], color="darkblue", linewidth=2, label="Dynamics (50-avg)")
    plt.scatter(df["actual_cycle"], df["mixing_duration"], alpha=0.3, s=10, color="orange", label="Mixing (Raw)")
    plt.plot(df["actual_cycle"], df["mix_rolling"], color="darkorange", linewidth=2, label="Mixing (50-avg)")
    plt.xlabel("Cumulative Cycle Number")
    plt.ylabel("Duration (seconds)")
    plt.legend()
    plt.grid(True, linestyle="--", alpha=0.6)
    plt.tight_layout()
    plt.show()


# Run it on your log file
plot_cumulative_split_timings("log.txt")
```