Added a `Linear` TS search job adapter by alongd · Pull Request #695 · ReactionMechanismGenerator/ARC

alongd · 2023-08-21T23:01:09Z

Adds a new Linear TS search adapter that builds TS guesses from atom-mapped reactants and products via Z-matrix chimera construction with Hammond-biased weighting. The adapter is incore-only, plugs into ARC's scheduler like heuristics/autotst_ts, and delegates heavy geometry work to a new linear_utils/ subpackage (5 modules). Currently implemented for isomerization/unimolecular reactions.

The PR also carries supporting additions to arc/species/zmat.py (anchors, smart-anchor detection, zmat re-indexing helpers), arc/species/converter.py (atom-map reordering), arc/reaction/reaction.py (is_unimolecular, refined is_isomerization), and a biradical-preservation fix in arc/species/species.py. CI was hardened: pinned action, -n 4 --dist worksteal for stability, and obabel test made self-contained.

codecov · 2023-08-22T00:09:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.41%. Comparing base (f30f9cc) to head (ee61d31).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #695      +/-   ##
==========================================
+ Coverage   60.42%   62.41%   +1.98%     
==========================================
  Files         102      112      +10     
  Lines       31096    38217    +7121     
  Branches     8103    10018    +1915     
==========================================
+ Hits        18791    23853    +5062     
- Misses       9961    11463    +1502     
- Partials     2344     2901     +557

Flag	Coverage Δ
functionaltests	`62.41% <ø> (+1.98%)`	⬆️
unittests	`62.41% <ø> (+1.98%)`	⬆️


Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

so scissors / atom mapping / tests can use fewer confs

To allow another SMILES string. Actually, the test isn't great: the multiplicity cannot be 1 and should probably be 2. However, multiplicity 2 gives an even worse result with many radicals: 'OCCC(C(COO)(O[CH2])C)(C[C]1[CH][CH][C]([CH][CH]1)NO)SC' (all carbon atoms on the benzene ring are marked as having a radical). This is out of scope for the current linear TS adapter PR. The fix should be easy (detect a possibly aromatic ring in which every atom carries a radical, at least for the C-only case, then make it aromatic). Leaving it as is for now since this is indeed a crazy molecule.

In arc/family/family.py:

1. Caches ReactionFamily instances per process. Adds a _cache: Dict[(label, consider_arc_families), ReactionFamily] class dict, plus a __new__ that returns the cached instance for a repeat key. __init__ early-returns on cache hits via an _initialized flag, so the parsing work runs at most once per family per process.
2. Caches groups.py file reads. Extracts the file-loading logic out of ReactionFamily.get_groups_file_as_lines into a module-level _read_groups_file_lines(label, consider_arc_families) decorated with @functools.lru_cache(maxsize=None). Returns a tuple (immutable) so callers can't poison the cache. The method now just delegates.
3. Caches recommended family sets. Decorates get_rmg_recommended_family_sets with @functools.lru_cache(maxsize=1) since its result is also process-stable.
4. Moves the label is None guard from __init__ to __new__ (it has to run before cache lookup).

Net effect: repeated ReactionFamily(label) calls (common in mapping/heuristics) stop re-reading and re-parsing the same groups.py.
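The caching pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `CachedFamily`, `_load_lines`, `LOAD_CALLS`, and the `ValueError` raised for a missing label are hypothetical stand-ins, and the "file read" is faked so the example is self-contained.

```python
import functools
from typing import Dict, Tuple

LOAD_CALLS = 0  # illustration only: counts how often the fake "file" is read


@functools.lru_cache(maxsize=None)
def _load_lines(label: str, consider_arc_families: bool) -> Tuple[str, ...]:
    """Hypothetical stand-in for reading groups.py; returns an immutable tuple."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return (f'# groups for {label}\n', f'# arc families: {consider_arc_families}\n')


class CachedFamily:
    """Hypothetical stand-in for ReactionFamily, showing only the caching skeleton."""
    _cache: Dict[Tuple[str, bool], 'CachedFamily'] = {}

    def __new__(cls, label=None, consider_arc_families=True):
        # The guard must live here because __new__ runs before the cache lookup.
        if label is None:
            raise ValueError('A family label must be specified.')
        key = (label, consider_arc_families)
        instance = cls._cache.get(key)
        if instance is None:
            instance = super().__new__(cls)
            instance._initialized = False
            cls._cache[key] = instance
        return instance

    def __init__(self, label=None, consider_arc_families=True):
        # Python calls __init__ on every construction, so skip work on cache hits.
        if self._initialized:
            return
        self.label = label
        self.lines = _load_lines(label, consider_arc_families)
        self._initialized = True


first = CachedFamily('H_Abstraction')
second = CachedFamily('H_Abstraction')
```

Here `first is second` holds and the loader runs once. A different `consider_arc_families` value yields a separate instance, since it is part of the cache key.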

calvinp0 · 2026-04-29T07:56:26Z

+    # Calibrated TS target distances for [1,3]-sigmatropic shifts.
+    sbl_break = get_single_bond_length(symbols[origin], symbols[migrating]) or 1.5
+    sbl_form = get_single_bond_length(symbols[target], symbols[migrating]) or 1.5
+    _SIGMA_BREAK_STRETCH = 0.77


I think these (incl. _SIGMA_FORM_STRETCH) should become module level

calvinp0 · 2026-04-29T08:59:32Z

+            if symbols[ib] == 'H':
+                continue
+            d = float(np.linalg.norm(coords[ia] - coords[ib]))
+            if d < 0.9:


Is there also a situation where it can be wildly above?
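One way to cover both directions is a check that takes an optional upper bound alongside the 0.9 Å lower bound. The sketch below is an assumption about how such a check might look: the function name, the `pairs` argument, the `d_max` parameter, and the choice to skip any pair involving hydrogen are all hypothetical.

```python
import numpy as np


def out_of_range_pairs(coords, symbols, pairs, d_min=0.9, d_max=None):
    """Return (ia, ib, distance) for heavy-atom pairs outside [d_min, d_max].

    d_max is a hypothetical parameter; pass None to check the lower bound only.
    """
    coords = np.asarray(coords, dtype=float)
    flagged = []
    for ia, ib in pairs:
        # Assumption: pairs involving hydrogen are skipped.
        if symbols[ia] == 'H' or symbols[ib] == 'H':
            continue
        d = float(np.linalg.norm(coords[ia] - coords[ib]))
        if d < d_min or (d_max is not None and d > d_max):
            flagged.append((ia, ib, d))
    return flagged


coords = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [5.0, 0.0, 0.0]]
symbols = ['C', 'C', 'O']
too_short_only = out_of_range_pairs(coords, symbols, [(0, 1), (0, 2)])
both_bounds = out_of_range_pairs(coords, symbols, [(0, 1), (0, 2)], d_max=3.0)
```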

calvinp0 · 2026-04-29T09:29:11Z

+    Given two spheres of radii ``r1`` (around ``center1``) and ``r2``
+    (around ``center2``), this returns the intersection point that lies
+    on the same side of the inter-center axis as ``ref_pos``. When the
+    spheres do not overlap, the point is placed at distance ``r1`` from


Maybe it's just worth mentioning in the docstring, but is there a case where one sphere lies completely inside the other? I guess it depends on whether you consider that a 'no-intersection' case and whether that situation is ever possible.
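For reference, the geometric cases can be told apart from the center distance d alone: the spheres are separate when d > r1 + r2, one is contained in the other when d < |r1 - r2|, and otherwise they intersect in a circle located a = (d² + r1² - r2²) / (2d) from center1 along the axis, with radius h = sqrt(r1² - a²). The sketch below is a standalone classifier built on those textbook formulas; it is not the PR's `_two_sphere_intersection`, and the function name and return convention are hypothetical.

```python
import math

import numpy as np


def classify_two_spheres(center1, r1, center2, r2, tol=1e-9):
    """Classify two spheres and, if they intersect, return (a, h).

    a: distance from center1 to the plane of the intersection circle.
    h: radius of the intersection circle.
    """
    c1 = np.asarray(center1, dtype=float)
    c2 = np.asarray(center2, dtype=float)
    d = float(np.linalg.norm(c2 - c1))
    if d > r1 + r2 + tol:
        return 'separate', None
    if d < abs(r1 - r2) - tol:
        return 'contained', None  # one sphere lies entirely inside the other
    if d < tol:
        return 'coincident', None  # same center and same radius
    a = (d ** 2 + r1 ** 2 - r2 ** 2) / (2.0 * d)
    h = math.sqrt(max(r1 ** 2 - a ** 2, 0.0))
    return 'intersecting', (a, h)


overlap = classify_two_spheres([0, 0, 0], 1.0, [1, 0, 0], 1.0)
apart = classify_two_spheres([0, 0, 0], 1.0, [3, 0, 0], 1.0)
inside = classify_two_spheres([0, 0, 0], 3.0, [1, 0, 0], 1.0)
```

Both 'separate' and 'contained' have no intersection points, but they likely call for different fallbacks, which is why a docstring might want to name both.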

calvinp0 · 2026-04-29T09:56:57Z

+            o_hydroxyl = nbr_idx
+            break
+
+    _BV_CO_SHORTEN_TARGET = 1.28  # Å, calibrated from DFT TS (1.279)


Module level

calvinp0 · 2026-04-29T09:57:07Z

+    sbl_cc = get_single_bond_length('C', 'C') or 1.54
+    sbl_co = get_single_bond_length('C', 'O') or 1.43
+
+    _BV_OO_STRETCH = 0.27    # Å per side (total ≈ 2 × 0.27 above sbl)


Module level

calvinp0 · 2026-04-29T09:57:11Z

+    sbl_co = get_single_bond_length('C', 'O') or 1.43
+
+    _BV_OO_STRETCH = 0.27    # Å per side (total ≈ 2 × 0.27 above sbl)
+    _BV_CC_STRETCH = 0.76    # Å above sbl for migrating-group departure


Module level

calvinp0 · 2026-04-29T09:57:15Z

+
+    _BV_OO_STRETCH = 0.27    # Å per side (total ≈ 2 × 0.27 above sbl)
+    _BV_CC_STRETCH = 0.76    # Å above sbl for migrating-group departure
+    _BV_CO_STRETCH = 0.73    # Å above sbl(C,O) for migrating-group approach


Module level

calvinp0 · 2026-04-29T09:57:31Z

+
+        # 4d. Concerted H migration from O_hydroxyl to the carbonyl O
+        # (~1.40 Å from each, calibrated).
+        _BV_H_TRANSFER_TARGET = 1.40


Module level

calvinp0 · 2026-04-29T09:59:05Z

+                h_idx = h_on_oh[0]
+                p1 = coords[o_hydroxyl]
+                p2 = coords[o_carbonyl_dbl]
+                ax = p2 - p1


Is this not something already kinda done in _two_sphere_intersection function?

calvinp0 · 2026-04-29T10:00:29Z

+        frag_mig = bfs_fragment(adj, c_migrating, block={c_parent})
+        p1 = coords[c_parent]
+        p2 = coords[o_other_side]
+        axis = p2 - p1


Is this not something similar in _two_sphere_intersection?

calvinp0 · 2026-04-29T10:01:11Z

+    for c_migrating in mig_candidates:
+        coords = np.array(uni_xyz['coords'], dtype=float).copy()
+
+        # 4a. Stretch the O-O peroxide bond.


It's under the comment for step 5, but here it's 4a (then 4b, etc.).

calvinp0 · 2026-04-29T10:03:55Z

+        d_oo = float(np.linalg.norm(coords[oo_bond[0]] - coords[oo_bond[1]]))
+        d_cc = float(np.linalg.norm(coords[c_parent] - coords[c_migrating]))
+        d_co = float(np.linalg.norm(coords[c_migrating] - coords[o_other_side]))
+        score = ((d_oo - d_oo_target) ** 2


If the function also does the C_parent–O_hydroxyl shortening and the H transfer, the score ignores those by the looks of it. Maybe add optional score terms when those atoms exist?
Because if two candidates tie on the 3 main distances, one could, I guess, have a much better or worse H-migration geometry.
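A minimal sketch of the suggestion, assuming the score is a plain sum of squared deviations (only its first term is visible in the hunk). The function name and the optional arguments `d_co_hydroxyl` and `d_h_transfer` are hypothetical; the two target constants are the ones from the diff.

```python
_BV_CO_SHORTEN_TARGET = 1.28  # Å, from the diff
_BV_H_TRANSFER_TARGET = 1.40  # Å, from the diff


def score_candidate(d_oo, d_cc, d_co, d_oo_target, d_cc_target, d_co_target,
                    d_co_hydroxyl=None, d_h_transfer=None):
    """Sum of squared deviations; optional terms are added only when given.

    d_co_hydroxyl: C_parent-O_hydroxyl distance, or None if there is no hydroxyl.
    d_h_transfer: (H-O_hydroxyl, H-O_carbonyl) distances, or None.
    """
    score = ((d_oo - d_oo_target) ** 2
             + (d_cc - d_cc_target) ** 2
             + (d_co - d_co_target) ** 2)
    if d_co_hydroxyl is not None:
        score += (d_co_hydroxyl - _BV_CO_SHORTEN_TARGET) ** 2
    if d_h_transfer is not None:
        score += sum((d - _BV_H_TRANSFER_TARGET) ** 2 for d in d_h_transfer)
    return score


# Two candidates that tie on the three main distances but differ in H placement.
main = dict(d_oo=2.0, d_cc=2.3, d_co=2.2, d_oo_target=2.0, d_cc_target=2.3, d_co_target=2.2)
good_h = score_candidate(**main, d_h_transfer=(1.40, 1.40))
poor_h = score_candidate(**main, d_h_transfer=(1.00, 1.90))
```

With the optional term included, the tie is broken in favor of the candidate whose hydrogen sits near 1.40 Å from both oxygens.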

calvinp0 · 2026-04-29T10:06:19Z

+    # Step 1: Find the O-O peroxide bond.
+    oo_bond = None
+    for a, b in split_bonds:
+        if symbols[a] == 'O' and symbols[b] == 'O':


Will there ever be a chance that a/b are not valid indices?
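If that can happen, a defensive version of the loop could validate the indices before using them. The sketch below is hypothetical: `find_oo_bond` is an illustrative name, and raising `IndexError` (rather than skipping the pair) is one possible design choice, not what the PR does.

```python
def find_oo_bond(split_bonds, symbols):
    """Return the first (a, b) pair where both atoms are oxygen, or None."""
    n_atoms = len(symbols)
    for a, b in split_bonds:
        # Negative indices would silently wrap around in Python, so reject them too.
        if not (0 <= a < n_atoms and 0 <= b < n_atoms):
            raise IndexError(f'Bond ({a}, {b}) is out of range for {n_atoms} atoms.')
        if symbols[a] == 'O' and symbols[b] == 'O':
            return a, b
    return None


oo_bond = find_oo_bond([(0, 1), (2, 3)], ['C', 'O', 'O', 'O'])
```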

calvinp0 · 2026-04-29T11:17:32Z


+    Instances are cached per ``(label, consider_arc_families)`` so the
+    family ``groups.py`` file is read and parsed at most once per process.
+    The cached object is treated as immutable; do not mutate its public


Just to double check, are any of self.reactants, self.entries, or self.actions ever mutated downstream? If so, then the cache can leak state between calls
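To make the concern concrete, here is a small self-contained demonstration of how a mutation on a cached instance becomes visible to every later caller, plus one possible mitigation using a read-only view. `CachedRecord` and its contents are hypothetical stand-ins, not the PR's `ReactionFamily`.

```python
from types import MappingProxyType


class CachedRecord:
    """Hypothetical cached class with a mutable public attribute."""
    _cache = {}

    def __new__(cls, label):
        if label not in cls._cache:
            instance = super().__new__(cls)
            instance.entries = {'group_a': 'adjacency list text'}
            cls._cache[label] = instance
        return cls._cache[label]


first = CachedRecord('family_x')
first.entries['injected'] = 'added by one caller'

second = CachedRecord('family_x')
leaked = 'injected' in second.entries  # the mutation is visible to a later caller

# One possible mitigation: expose a read-only view so writes fail loudly.
frozen = MappingProxyType(dict(second.entries))
try:
    frozen['another'] = 'value'
    write_blocked = False
except TypeError:
    write_blocked = True
```

A read-only mapping only protects the top level; nested mutable values would still need copying or freezing.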

calvinp0 · 2026-04-29T11:19:38Z



+@functools.lru_cache(maxsize=None)
+def _read_groups_file_lines(label: str, consider_arc_families: bool) -> tuple[str, ...]:


So this feeds into get_groups_file_as_lines, right? That method is annotated as def get_groups_file_as_lines(...) -> list[str], while this function returns a tuple.

Also, since we have this function, it seems redundant to keep a method inside the class whose sole purpose is to call it. But I'm okay with leaving it as is, just thought I'd point it out.

calvinp0 · 2026-04-29T11:45:19Z

+        return tuple(f.readlines())
+
+
 class ReactionFamily(object):


Since ReactionFamily is now cached, note that the following call still appears inside nested loops:

Group().from_adjacency_list(get_group_adjlist(...))

Should we consider caching it too? For example:

self.groups_by_label = { label: Group().from_adjacency_list(adjlist) for label, adjlist in self.entries.items() }
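The effect of caching a parse that sits in nested loops can be illustrated without RMG. In the sketch below, `parse_group` is a hypothetical stand-in for `Group().from_adjacency_list(...)`, and the adjacency strings are illustrative text only.

```python
import functools

PARSE_CALLS = 0  # illustration only: counts real parses


@functools.lru_cache(maxsize=None)
def parse_group(adjlist: str) -> tuple:
    """Hypothetical stand-in parser; returns nested tuples so the result is immutable."""
    global PARSE_CALLS
    PARSE_CALLS += 1
    return tuple(tuple(line.split()) for line in adjlist.strip().splitlines())


entries = {
    'group_1': '1 C u0\n2 H u0',
    'group_2': '1 O u0\n2 H u0',
}

# Nested loops that would otherwise re-parse the same text many times.
for _ in range(10):
    for label_a in entries:
        for label_b in entries:
            parsed_a = parse_group(entries[label_a])
            parsed_b = parse_group(entries[label_b])
```

If the parsed objects are mutable, as RMG `Group` instances presumably are, sharing them through a cache carries the same state-leak risk raised for the cached family itself.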

calvinp0 · 2026-04-29T11:50:16Z

Not part of this PR, but since the PR touches family.py significantly:

ARC/arc/family/family.py

Line 713 in 5877052

    if not isinstance(rmg_families, list) and rmg_family_set not in list(family_sets) + ['all']:

I believe that two lines above this we just initialised rmg_families, arc_families = list(), list(), so `not isinstance(rmg_families, list)` is always False and the whole condition can never be True.
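The dead branch is easy to reproduce in isolation. In this sketch the contents of `family_sets` and the value of `rmg_family_set` are made up for illustration; only the condition itself is copied from the quoted line.

```python
rmg_families, arc_families = list(), list()

family_sets = {'default': ['H_Abstraction']}  # illustrative content only
rmg_family_set = 'no_such_set'                # a value the guard presumably should catch

# The first operand is always False because rmg_families was just created as a list,
# so the `and` short-circuits and the second operand is never evaluated.
guard = (not isinstance(rmg_families, list)
         and rmg_family_set not in list(family_sets) + ['all'])

second_operand_alone = rmg_family_set not in list(family_sets) + ['all']
```

Here `guard` is False even though `second_operand_alone` is True, so an unknown family set name would pass through unnoticed.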

github-actions Bot added Module: Mapping Module: Reaction labels Aug 21, 2023

github-advanced-security AI found potential problems Aug 21, 2023


alongd force-pushed the linear_ts branch from fa030cb to a841d2c Compare October 2, 2023 11:37

github-actions Bot added the Module: Species label Oct 2, 2023

github-advanced-security AI found potential problems Oct 2, 2023


Comment thread arc/species/species.py Fixed

alongd force-pushed the linear_ts branch from a841d2c to 7aeadea Compare December 4, 2023 05:11

alongd force-pushed the linear_ts branch from 7aeadea to af103f3 Compare January 28, 2024 07:58

github-actions Bot added the Module: rmgdb label Jan 28, 2024

github-advanced-security AI found potential problems Jan 28, 2024


Comment thread arc/rmgdb.py Fixed

alongd force-pushed the linear_ts branch 5 times, most recently from 8825e9f to cbec920 Compare May 15, 2024 07:02

alongd marked this pull request as ready for review May 15, 2024 09:26

github-actions Bot added the Module: Converter label May 28, 2024

github-advanced-security AI found potential problems May 28, 2024


Comment thread arc/job/adapters/ts/linear.py Fixed

alongd force-pushed the linear_ts branch from 69759f8 to 2afb15d Compare March 24, 2025 05:24

github-advanced-security AI found potential problems Mar 24, 2025


Comment thread arc/job/adapters/ts/linear_test.py Fixed

Comment thread arc/job/adapters/ts/linear_test.py Fixed

alongd force-pushed the linear_ts branch from 2afb15d to 6eccc9a Compare August 25, 2025 11:14

alongd requested a review from Copilot August 25, 2025 11:14

github-advanced-security AI found potential problems Aug 25, 2025


Comment thread arc/job/adapters/ts/linear.py Fixed

Copilot AI reviewed Aug 25, 2025


alongd force-pushed the linear_ts branch from 6eccc9a to c7ebdff Compare December 31, 2025 17:22

alongd force-pushed the linear_ts branch from 22dd43c to f1b79b7 Compare March 17, 2026 08:09

github-advanced-security AI found potential problems Mar 17, 2026


Comment thread arc/job/adapters/ts/linear_test.py Fixed

Comment thread arc/species/zmat.py Fixed

Comment thread arc/species/zmat.py Fixed

Comment thread arc/species/zmat.py Fixed

alongd force-pushed the linear_ts branch 2 times, most recently from 908773e to 0acc5d4 Compare March 23, 2026 21:26

alongd added 11 commits April 28, 2026 22:40

Implement ts_adapters_for_unknown_unimolecular() in Scheduler

c9258a4

Added economic_generation to conformers

d290687

so scissors / atom mapping / tests can use fewer confs

Tests: economic conformer generation

971d9c2

Use economic_generation in scissors

2d058ba

Minor: Added exist_ok to project directory os.makedirs in main

ac53159

Type hints leftover fixes

035d6f3

f! ci

ad4dab2

f common tst

14b2ea3

f common 14

5877052

alongd force-pushed the linear_ts branch from b979932 to 5877052 Compare April 28, 2026 19:40

github-advanced-security AI found potential problems Apr 28, 2026


Comment thread arc/common.py Dismissed

Comment thread arc/common.py Dismissed

calvinp0 reviewed Apr 29, 2026




4 participants
