Skip to content

Added a Linear TS search job adapter#695

Open
alongd wants to merge 38 commits intomainfrom
linear_ts
Open

Added a Linear TS search job adapter#695
alongd wants to merge 38 commits intomainfrom
linear_ts

Conversation

@alongd
Copy link
Copy Markdown
Member

@alongd alongd commented Aug 21, 2023

Adds a new Linear TS search adapter that builds TS guesses from atom-mapped reactants and products via Z-matrix chimera construction with Hammond-biased weighting. The adapter is incore-only, plugs into ARC's scheduler like heuristics/autotst_ts, and delegates heavy geometry work to a new linear_utils/ subpackage (5 modules). Currently implemented for isomerization/unimolecular reactions.

The PR also carries supporting additions to arc/species/zmat.py (anchors, smart-anchor detection, zmat re-indexing helpers), arc/species/converter.py (atom-map reordering), arc/reaction/reaction.py (is_unimolecular, refined is_isomerization), and a biradical-preservation fix in arc/species/species.py. CI was hardened: pinned action, -n 4 --dist worksteal for stability, and obabel test made self-contained.

Comment thread arc/job/adapters/ts/linear.py Fixed
Comment thread arc/job/adapters/ts/linear.py Fixed
Comment thread arc/job/adapters/ts/linear_test.py Fixed
Comment thread arc/job/adapters/ts/linear_test.py Fixed
Comment thread arc/job/adapters/ts/linear_test.py Fixed
Comment thread arc/job/adapters/ts/linear_test.py Fixed
Comment thread arc/job/adapters/ts/linear_test.py Fixed
@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 22, 2023

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.41%. Comparing base (f30f9cc) to head (ee61d31).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #695      +/-   ##
==========================================
+ Coverage   60.42%   62.41%   +1.98%     
==========================================
  Files         102      112      +10     
  Lines       31096    38217    +7121     
  Branches     8103    10018    +1915     
==========================================
+ Hits        18791    23853    +5062     
- Misses       9961    11463    +1502     
- Partials     2344     2901     +557     
Flag Coverage Δ
functionaltests 62.41% <ø> (+1.98%) ⬆️
unittests 62.41% <ø> (+1.98%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread arc/species/species.py Fixed
Comment thread arc/rmgdb.py Fixed
@alongd alongd force-pushed the linear_ts branch 5 times, most recently from 8825e9f to cbec920 Compare May 15, 2024 07:02
@alongd alongd marked this pull request as ready for review May 15, 2024 09:26
Comment thread arc/job/adapters/ts/linear.py Fixed
Comment thread arc/job/adapters/ts/linear_test.py Fixed
Comment thread arc/job/adapters/ts/linear_test.py Fixed
Comment thread arc/job/adapters/ts/linear.py Fixed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment thread arc/job/adapters/ts/linear_test.py Fixed
Comment thread arc/species/zmat.py Fixed
Comment thread arc/species/zmat.py Fixed
Comment thread arc/species/zmat.py Fixed
@alongd alongd force-pushed the linear_ts branch 2 times, most recently from 908773e to 0acc5d4 Compare March 23, 2026 21:26
alongd added 11 commits April 28, 2026 22:40
so scissors / atom mapping / tests can use fewer confs
To allow another SMILES string.
Actually, the test isn't great, the multiplicity cannot be 1, it should probably be 2. However, multiplicity 2 gives an even worse result with many radicals: 'OCCC(C(COO)(O[CH2])C)(C[C]1[CH][CH][C]([CH][CH]1)NO)SC' (all carbon atoms on the benzene ring are marked as having a radical).
This is out of scope for the current linear TS adapter PR. The fix should be easy (detect a possibly aromatic ring with all  radicals, at least for the C-only case, then make it aromatic). Leaving it as is for now since this is indeed a crazy molecule.
In arc/family/family.py:

  1. Caches ReactionFamily instances per process. Adds a _cache: Dict[(label, consider_arc_families), ReactionFamily] class dict, plus a __new__ that  returns the cached instance for a repeat key. __init__ early-returns on cache hits via a _initialized flag, so the parsing work runs at most once per
   family per process.
  2. Caches groups.py file reads. Extracts the file-loading logic out of ReactionFamily.get_groups_file_as_lines into a module-level _read_groups_file_lines(label, consider_arc_families) decorated with @functools.lru_cache(maxsize=None). Returns a tuple (immutable) so callers can't poison the cache. The method now just delegates.
  3. Caches recommended family sets. Decorates get_rmg_recommended_family_sets with @functools.lru_cache(maxsize=1) since its result is also process-stable.
  4. Moves the label is None guard from __init__ to __new__ (it has to run before cache lookup).

Net effect: repeated ReactionFamily(label) calls — common in mapping/heuristics — stop re-reading and re-parsing the same groups.py.
Comment thread arc/common.py Dismissed
Comment thread arc/common.py Dismissed
# Calibrated TS target distances for [1,3]-sigmatropic shifts.
sbl_break = get_single_bond_length(symbols[origin], symbols[migrating]) or 1.5
sbl_form = get_single_bond_length(symbols[target], symbols[migrating]) or 1.5
_SIGMA_BREAK_STRETCH = 0.77
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these (incl. _SIGMA_FORM_STRETCH) should become module level

if symbols[ib] == 'H':
continue
d = float(np.linalg.norm(coords[ia] - coords[ib]))
if d < 0.9:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there also a situation where it can be wildly above?

Given two spheres of radii ``r1`` (around ``center1``) and ``r2``
(around ``center2``), this returns the intersection point that lies
on the same side of the inter-center axis as ``ref_pos``. When the
spheres do not overlap, the point is placed at distance ``r1`` from
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe its just worth mentionig in the doctstring but if there is a case when th circle is inside the other circle completely. I guess it depends if you consider that as a 'no-intersection' and if that situation is every possible.

o_hydroxyl = nbr_idx
break

_BV_CO_SHORTEN_TARGET = 1.28 # Å, calibrated from DFT TS (1.279)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module level

sbl_cc = get_single_bond_length('C', 'C') or 1.54
sbl_co = get_single_bond_length('C', 'O') or 1.43

_BV_OO_STRETCH = 0.27 # Å per side (total ≈ 2 × 0.27 above sbl)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module level

sbl_co = get_single_bond_length('C', 'O') or 1.43

_BV_OO_STRETCH = 0.27 # Å per side (total ≈ 2 × 0.27 above sbl)
_BV_CC_STRETCH = 0.76 # Å above sbl for migrating-group departure
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module level


_BV_OO_STRETCH = 0.27 # Å per side (total ≈ 2 × 0.27 above sbl)
_BV_CC_STRETCH = 0.76 # Å above sbl for migrating-group departure
_BV_CO_STRETCH = 0.73 # Å above sbl(C,O) for migrating-group approach
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module level


# 4d. Concerted H migration from O_hydroxyl to the carbonyl O
# (~1.40 Å from each, calibrated).
_BV_H_TRANSFER_TARGET = 1.40
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module level

h_idx = h_on_oh[0]
p1 = coords[o_hydroxyl]
p2 = coords[o_carbonyl_dbl]
ax = p2 - p1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not something already kinda done in _two_sphere_intersection function?

frag_mig = bfs_fragment(adj, c_migrating, block={c_parent})
p1 = coords[c_parent]
p2 = coords[o_other_side]
axis = p2 - p1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not something similar in _two_sphere_intersection?

for c_migrating in mig_candidates:
coords = np.array(uni_xyz['coords'], dtype=float).copy()

# 4a. Stretch the O-O peroxide bond.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's under the comment of step 5 but then here its 4a (then 4b etc.)

d_oo = float(np.linalg.norm(coords[oo_bond[0]] - coords[oo_bond[1]]))
d_cc = float(np.linalg.norm(coords[c_parent] - coords[c_migrating]))
d_co = float(np.linalg.norm(coords[c_migrating] - coords[o_other_side]))
score = ((d_oo - d_oo_target) ** 2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the function also does C_parent–O_hydroxyl shortening and H transfer, the score ignores those by the looks of it. Maybe do add optionl score terms when those atoms exist?
Cause if two candidates tie on the 3 main distances, one could, I guess, have a muc better or worse H migration geom

# Step 1: Find the O-O peroxide bond.
oo_bond = None
for a, b in split_bonds:
if symbols[a] == 'O' and symbols[b] == 'O':
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will there ever be a chance when a/b can be not valid indexes?

Comment thread arc/family/family.py

Instances are cached per ``(label, consider_arc_families)`` so the
family ``groups.py`` file is read and parsed at most once per process.
The cached object is treated as immutable; do not mutate its public
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check, are any of self.reactants, self.entries, or self.actions ever mutated downstream? If so, then the cache can leak state between calls

Comment thread arc/family/family.py


@functools.lru_cache(maxsize=None)
def _read_groups_file_lines(label: str, consider_arc_families: bool) -> tuple[str, ...]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this feeds into get_groups_file_as_lines right, which method says def get_groups_file_as_lines(...) -> list[str]:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since we have this function, it seems redundant to then have a self function inside the class thats sole purpose is to call this function? But okay with leaving it as is, just thought to point it out

Comment thread arc/family/family.py
return tuple(f.readlines())


class ReactionFamily(object):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ReactionFamily is now cached, there is a line that appears in nested loops

Group().from_adjacency_list(get_group_adjlist(...))

We should consider caching it too?

self.groups_by_label = {
    label: Group().from_adjacency_list(adjlist)
    for label, adjlist in self.entries.items()
}

@calvinp0
Copy link
Copy Markdown
Member

Not in this PR, but it does touch this file significantly family.py

if not isinstance(rmg_families, list) and rmg_family_set not in list(family_sets) + ['all']:

I believe that two lines above, we just initialised rmg_families, arc_families = list(), list(), so it is always False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants