Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #695 +/- ##
==========================================
+ Coverage 60.42% 62.41% +1.98%
==========================================
Files 102 112 +10
Lines 31096 38217 +7121
Branches 8103 10018 +1915
==========================================
+ Hits 18791 23853 +5062
- Misses 9961 11463 +1502
- Partials 2344 2901 +557
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
8825e9f to
cbec920
Compare
908773e to
0acc5d4
Compare
so scissors / atom mapping / tests can use fewer confs
To allow another SMILES string. Actually, the test isn't great, the multiplicity cannot be 1, it should probably be 2. However, multiplicity 2 gives an even worse result with many radicals: 'OCCC(C(COO)(O[CH2])C)(C[C]1[CH][CH][C]([CH][CH]1)NO)SC' (all carbon atoms on the benzene ring are marked as having a radical). This is out of scope for the current linear TS adapter PR. The fix should be easy (detect a possibly aromatic ring with all radicals, at least for the C-only case, then make it aromatic). Leaving it as is for now since this is indeed a crazy molecule.
In arc/family/family.py: 1. Caches ReactionFamily instances per process. Adds a _cache: Dict[(label, consider_arc_families), ReactionFamily] class dict, plus a __new__ that returns the cached instance for a repeat key. __init__ early-returns on cache hits via a _initialized flag, so the parsing work runs at most once per family per process. 2. Caches groups.py file reads. Extracts the file-loading logic out of ReactionFamily.get_groups_file_as_lines into a module-level _read_groups_file_lines(label, consider_arc_families) decorated with @functools.lru_cache(maxsize=None). Returns a tuple (immutable) so callers can't poison the cache. The method now just delegates. 3. Caches recommended family sets. Decorates get_rmg_recommended_family_sets with @functools.lru_cache(maxsize=1) since its result is also process-stable. 4. Moves the label is None guard from __init__ to __new__ (it has to run before cache lookup). Net effect: repeated ReactionFamily(label) calls — common in mapping/heuristics — stop re-reading and re-parsing the same groups.py.
| # Calibrated TS target distances for [1,3]-sigmatropic shifts. | ||
| sbl_break = get_single_bond_length(symbols[origin], symbols[migrating]) or 1.5 | ||
| sbl_form = get_single_bond_length(symbols[target], symbols[migrating]) or 1.5 | ||
| _SIGMA_BREAK_STRETCH = 0.77 |
There was a problem hiding this comment.
I think these (incl. _SIGMA_FORM_STRETCH) should become module level
| if symbols[ib] == 'H': | ||
| continue | ||
| d = float(np.linalg.norm(coords[ia] - coords[ib])) | ||
| if d < 0.9: |
There was a problem hiding this comment.
Is there also a situation where it can be wildly above?
| Given two spheres of radii ``r1`` (around ``center1``) and ``r2`` | ||
| (around ``center2``), this returns the intersection point that lies | ||
| on the same side of the inter-center axis as ``ref_pos``. When the | ||
| spheres do not overlap, the point is placed at distance ``r1`` from |
There was a problem hiding this comment.
Maybe its just worth mentionig in the doctstring but if there is a case when th circle is inside the other circle completely. I guess it depends if you consider that as a 'no-intersection' and if that situation is every possible.
| o_hydroxyl = nbr_idx | ||
| break | ||
|
|
||
| _BV_CO_SHORTEN_TARGET = 1.28 # Å, calibrated from DFT TS (1.279) |
| sbl_cc = get_single_bond_length('C', 'C') or 1.54 | ||
| sbl_co = get_single_bond_length('C', 'O') or 1.43 | ||
|
|
||
| _BV_OO_STRETCH = 0.27 # Å per side (total ≈ 2 × 0.27 above sbl) |
| sbl_co = get_single_bond_length('C', 'O') or 1.43 | ||
|
|
||
| _BV_OO_STRETCH = 0.27 # Å per side (total ≈ 2 × 0.27 above sbl) | ||
| _BV_CC_STRETCH = 0.76 # Å above sbl for migrating-group departure |
|
|
||
| _BV_OO_STRETCH = 0.27 # Å per side (total ≈ 2 × 0.27 above sbl) | ||
| _BV_CC_STRETCH = 0.76 # Å above sbl for migrating-group departure | ||
| _BV_CO_STRETCH = 0.73 # Å above sbl(C,O) for migrating-group approach |
|
|
||
| # 4d. Concerted H migration from O_hydroxyl to the carbonyl O | ||
| # (~1.40 Å from each, calibrated). | ||
| _BV_H_TRANSFER_TARGET = 1.40 |
| h_idx = h_on_oh[0] | ||
| p1 = coords[o_hydroxyl] | ||
| p2 = coords[o_carbonyl_dbl] | ||
| ax = p2 - p1 |
There was a problem hiding this comment.
Is this not something already kinda done in _two_sphere_intersection function?
| frag_mig = bfs_fragment(adj, c_migrating, block={c_parent}) | ||
| p1 = coords[c_parent] | ||
| p2 = coords[o_other_side] | ||
| axis = p2 - p1 |
There was a problem hiding this comment.
Is this not something similar in _two_sphere_intersection?
| for c_migrating in mig_candidates: | ||
| coords = np.array(uni_xyz['coords'], dtype=float).copy() | ||
|
|
||
| # 4a. Stretch the O-O peroxide bond. |
There was a problem hiding this comment.
It's under the comment of step 5 but then here its 4a (then 4b etc.)
| d_oo = float(np.linalg.norm(coords[oo_bond[0]] - coords[oo_bond[1]])) | ||
| d_cc = float(np.linalg.norm(coords[c_parent] - coords[c_migrating])) | ||
| d_co = float(np.linalg.norm(coords[c_migrating] - coords[o_other_side])) | ||
| score = ((d_oo - d_oo_target) ** 2 |
There was a problem hiding this comment.
if the function also does C_parent–O_hydroxyl shortening and H transfer, the score ignores those by the looks of it. Maybe do add optionl score terms when those atoms exist?
Cause if two candidates tie on the 3 main distances, one could, I guess, have a muc better or worse H migration geom
| # Step 1: Find the O-O peroxide bond. | ||
| oo_bond = None | ||
| for a, b in split_bonds: | ||
| if symbols[a] == 'O' and symbols[b] == 'O': |
There was a problem hiding this comment.
will there ever be a chance when a/b can be not valid indexes?
|
|
||
| Instances are cached per ``(label, consider_arc_families)`` so the | ||
| family ``groups.py`` file is read and parsed at most once per process. | ||
| The cached object is treated as immutable; do not mutate its public |
There was a problem hiding this comment.
Just to double check, are any of self.reactants, self.entries, or self.actions ever mutated downstream? If so, then the cache can leak state between calls
|
|
||
|
|
||
| @functools.lru_cache(maxsize=None) | ||
| def _read_groups_file_lines(label: str, consider_arc_families: bool) -> tuple[str, ...]: |
There was a problem hiding this comment.
So this feeds into get_groups_file_as_lines right, which method says def get_groups_file_as_lines(...) -> list[str]:
There was a problem hiding this comment.
Also since we have this function, it seems redundant to then have a self function inside the class thats sole purpose is to call this function? But okay with leaving it as is, just thought to point it out
| return tuple(f.readlines()) | ||
|
|
||
|
|
||
| class ReactionFamily(object): |
There was a problem hiding this comment.
Since ReactionFamily is now cached, there is a line that appears in nested loops
Group().from_adjacency_list(get_group_adjlist(...))
We should consider caching it too?
self.groups_by_label = {
label: Group().from_adjacency_list(adjlist)
for label, adjlist in self.entries.items()
}
|
Not in this PR, but it does touch this file significantly Line 713 in 5877052 I believe that two lines above, we just initialised rmg_families, arc_families = list(), list(), so it is always False
|
Adds a new Linear TS search adapter that builds TS guesses from atom-mapped reactants and products via Z-matrix chimera construction with Hammond-biased weighting. The adapter is incore-only, plugs into ARC's scheduler like heuristics/autotst_ts, and delegates heavy geometry work to a new linear_utils/ subpackage (5 modules). Currently implemented for isomerization/unimolecular reactions.
The PR also carries supporting additions to arc/species/zmat.py (anchors, smart-anchor detection, zmat re-indexing helpers), arc/species/converter.py (atom-map reordering), arc/reaction/reaction.py (is_unimolecular, refined is_isomerization), and a biradical-preservation fix in arc/species/species.py. CI was hardened: pinned action,
-n 4 --dist workstealfor stability, and obabel test made self-contained.