SkillHub

pharmaclaw-chemistry-query

v2.0.1

Chemistry agent skill for PubChem API queries (compound info/properties, structures/SMILES/images, synthesis routes/references) + RDKit cheminformatics (SMILES to molecule props/logP/TPSA, 2D PNG/SVG viz, Morgan fingerprints, retrosynthesis/BRICS disconnects, multi-step synth planning). Use for chem...

Sourced from ClawHub, Authored by Cheminem

Installation

Please help me install the skill `pharmaclaw-chemistry-query` from SkillHub official store. npx skills add Cheminem/pharmaclaw-chemistry-query

Chemistry Query Agent v1.4.1

Overview

Full-stack chemistry toolkit combining PubChem data retrieval with RDKit molecule processing, visualization, analysis, retrosynthesis, and synthesis planning. All outputs are structured JSON for easy downstream chaining. Generates PNG/SVG images on demand.

Key capabilities: - PubChem compound lookup (info, structure, synthesis refs, similarity search) - RDKit molecular properties (MW, logP, TPSA, HBD/HBA, rotatable bonds, aromatic rings) - 2D molecule visualization (PNG/SVG) - BRICS retrosynthesis with recursive depth control - Multi-step synthesis route planning - Forward reaction simulation with SMARTS templates - Morgan fingerprints and similarity/substructure search - 21 named reaction templates (Suzuki, Heck, Grignard, Wittig, Diels-Alder, etc.)

Quick Start

# PubChem compound info
exec python scripts/query_pubchem.py --compound "aspirin" --type info

# Molecular properties from SMILES
exec python scripts/rdkit_mol.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action props

# Retrosynthesis
exec python scripts/rdkit_mol.py --target "CC(=O)Oc1ccccc1C(=O)O" --action retro --depth 2

# Full chain (name → props + draw + retro)
exec python scripts/chain_entry.py --input-json '{"name": "caffeine", "context": "user"}'

Scripts

scripts/query_pubchem.py

PubChem REST API queries with automatic name→CID resolution and timeout handling.

--compound <name|CID> --type <info|structure|synthesis|similar> [--format smiles|inchi|image|json] [--threshold 80]
  • info: Formula, MW, IUPAC name, InChIKey (JSON)
  • structure: SMILES, InChI, image URL, or full JSON
  • synthesis: Synonyms/references for a compound
  • similar: Similar compounds by 2D fingerprint (top 20)

scripts/rdkit_mol.py

RDKit cheminformatics engine. Resolves names via PubChem automatically.

--smiles <SMILES> --action <props|draw|fingerprint|similarity|substruct|xyz|react|retro|plan>
Action Description Key Args
props MW, logP, TPSA, HBD, HBA, rotB, aromRings --smiles
draw 2D PNG/SVG (300×300) --smiles --output file.png --format png|svg
retro BRICS recursive retrosynthesis --target <SMILES|name> --depth N
plan Multi-step retro route --target <SMILES|name> --steps N
react Forward reaction via SMARTS --reactants "smi1 smi2" --smarts "<SMARTS>"
fingerprint Morgan fingerprint bitvector --smiles --radius 2
similarity Tanimoto similarity scoring --query_smiles --target_smiles "smi1,smi2"
substruct Substructure matching --query_smiles --target_smiles "smi1,smi2"
xyz 3D coordinates (MMFF optimized) --smiles

scripts/chain_entry.py

Standard agent chain interface. Accepts {"smiles": "...", "context": "..."} or {"name": "...", "context": "..."}. Returns unified JSON with props, visualization, and retrosynthesis.

python scripts/chain_entry.py --input-json '{"name": "sotorasib", "context": "user"}'

Output schema:

{
  "agent": "chemistry-query",
  "version": "1.4.0",
  "smiles": "<canonical>",
  "status": "success|error",
  "report": {"props": {...}, "draw": {...}, "retro": {...}},
  "risks": [],
  "viz": ["path/to/image.png"],
  "recommend_next": ["pharmacology", "toxicology"],
  "confidence": 0.95,
  "warnings": [],
  "timestamp": "ISO8601"
}

scripts/templates.json

21 named reaction templates with SMARTS, expected yields, conditions, and references. Includes: Suzuki, Heck, Buchwald-Hartwig, Grignard, Wittig, Diels-Alder, Click, Sonogashira, Negishi, and more.

Chaining

  1. Name → Full Profile: chain_entry.py with {"name": "ibuprofen"} → props + draw + retro
  2. Chemistry → Pharmacology: Output feeds directly into pharma-pharmacology-agent
  3. Retro + Viz: Get precursors, then draw each one
  4. Suzuki Test: --action react --reactants "c1ccccc1Br c1ccccc1B(O)O" --smarts "[c:1][Br:2].[c:3]B(O)O>>[c:1][c:3]"

Tested With

All features verified end-to-end with RDKit 2024.03+:

Molecule SMILES Tests Passed
Caffeine CN1C=NC2=C1C(=O)N(C(=O)N2C)C info, structure, props, draw, retro, plan, chain
Aspirin CC(=O)Oc1ccccc1C(=O)O info, structure, props, draw, retro, plan, chain
Sotorasib PubChem name lookup info, structure, props, draw, retro, chain
Ibuprofen PubChem name lookup info, structure, props, chain
Invalid SMILES XXXINVALID Graceful JSON error
Empty input {} Graceful JSON error

Resources

  • references/api_endpoints.md — PubChem API endpoint reference and rate limits
  • scripts/rdkit_reaction.py — Legacy reaction module
  • scripts/chembl_query.py, scripts/pubmed_search.py, scripts/admet_predict.py — Additional query modules

scripts/advanced_chem.py

Advanced cheminformatics engine with 6 Tier 1 capabilities.

--action <standardize|descriptors|scaffold|mcs|mmpa|chemspace> --smiles <SMILES> [options]
Action Description Key Args
standardize Salt stripping, charge normalization, tautomer enumeration --smiles
descriptors 217+ molecular descriptors (RDKit full set), QED, SA Score, Lipinski/Veber rules --smiles --descriptor_set all|druglike|physical|topological
scaffold Murcko scaffold extraction, generic scaffolds, diversity analysis, R-group decomposition --smiles or --target_smiles "smi1,smi2,..." --rgroup_core <SMARTS>
mcs Maximum Common Substructure across 2+ molecules --target_smiles "smi1,smi2,..."
mmpa Matched Molecular Pair Analysis — find single-point transformations --target_smiles "smi1,smi2,..."
chemspace Chemical space visualization (PCA/t-SNE/UMAP scatter plot PNG) --target_smiles "smi1,smi2,..." --method pca|tsne|umap --output plot.png

Examples:

# Standardize a salt form
python scripts/advanced_chem.py --action standardize --smiles "[Na+].CC(=O)[O-]"

# Full descriptors (217+)
python scripts/advanced_chem.py --action descriptors --smiles "CC(=O)Oc1ccccc1C(=O)O" --descriptor_set all

# Scaffold diversity of a set
python scripts/advanced_chem.py --action scaffold --target_smiles "CC(=O)Oc1ccccc1C(=O)O,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,CC(C)Cc1ccc(cc1)C(C)C(=O)O"

# MCS of aspirin and salicylic acid
python scripts/advanced_chem.py --action mcs --target_smiles "CC(=O)Oc1ccccc1C(=O)O,c1ccccc1C(=O)O"

# Matched molecular pairs
python scripts/advanced_chem.py --action mmpa --target_smiles "c1ccc(CC(=O)O)cc1,c1ccc(CCC(=O)O)cc1"

# Chemical space PCA plot
python scripts/advanced_chem.py --action chemspace --target_smiles "CC(=O)Oc1ccccc1C(=O)O,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,c1ccccc1" --method pca --output space.png

Changelog

v2.0.0 (2026-02-28) - NEW: advanced_chem.py with 6 Tier 1 cheminformatics capabilities - Molecular Standardization & Tautomer Enumeration (salt stripping, charge normalization, canonical tautomers) - Extended Descriptors (217+ RDKit descriptors, QED, SA Score, Lipinski, Veber) - Scaffold Analysis (Murcko, generic scaffolds, diversity ratio, R-group decomposition) - Maximum Common Substructure (rdFMCS with coverage per molecule) - Matched Molecular Pair Analysis (rdMMPA fragmentation, transformation detection) - Chemical Space Visualization (PCA/t-SNE/UMAP with matplotlib scatter plots) - Dependencies: scikit-learn, matplotlib (added)

v1.4.1 (2026-02-25) - Security hardening: input sanitization for all subprocess calls (SMILES, compound names, output paths) - Added _sanitize_input() — length limits, null-byte rejection for all user inputs - Added _sanitize_output_path() — prevents path traversal, restricts extensions, blocks arbitrary file writes - Added shell metacharacter rejection in resolve_target() - Added SMILES validation via RDKit in chem_ui.py before subprocess calls - Added compound input validation in query_pubchem.py (length/null-byte checks) - Added timeout to resolve_target() PubChem subprocess call - Addresses VirusTotal "suspicious" classification for argument injection vectors

v1.4.0 (2026-02-14) - Fixed PubChem SMILES/InChI endpoint (property/CanonicalSMILES/TXT) - Fixed chain_entry.py HTML entity corruption - Fixed brics_retro to handle BRICSDecompose string output correctly - Added request timeouts (15s) to all PubChem calls - Graceful error handling for invalid SMILES and empty input - Updated chain output version and schema - Comprehensive end-to-end testing

v1.3.0 - RDKit props NoneType fixes, invalid SMILES graceful errors - React fix: ReactionFromSmarts import - Name resolution via PubChem for all RDKit actions

v1.2.0 - BRICS retrosynthesis + 21 reaction templates library - Multi-step synthesis planning