Getting Started with SynRFP

Welcome to the SynRFP documentation.

SynRFP is a mapping-free, graph-based toolkit for converting chemical transformations (reaction SMILES) into compact, reproducible fingerprints. It separates the pipeline into three modular operators: tokenizers (graph → tokens), combination/Δ aggregation, and randomized sketchers (multiset → fixed-size sketch). The high-level convenience wrapper and engine are implemented in the synrfp module.

Figure 1. Overview of the SynRFP pipeline.
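To make the three stages concrete, here is a toy, standard-library-only sketch of the tokenize → Δ → sketch flow. It is not SynRFP's implementation: the character-level tokenizer, the helper names (toy_tokens, delta, parity_fold, toy_fingerprint), and the hashing scheme are all assumptions chosen for illustration, and a real tokenizer would emit graph-derived tokens rather than SMILES characters.

```python
from collections import Counter
import hashlib


def toy_tokens(smiles_side: str) -> Counter:
    """Hypothetical tokenizer: count the characters of one reaction side."""
    return Counter(ch for ch in smiles_side if ch != ".")


def delta(reactants: Counter, products: Counter) -> Counter:
    """Signed difference of token counts (products minus reactants)."""
    diff = Counter(products)
    diff.subtract(reactants)
    # Keep only tokens whose count actually changed.
    return Counter({tok: n for tok, n in diff.items() if n != 0})


def parity_fold(signed_counts: Counter, bits: int = 16, seed: int = 0) -> list:
    """Fold a signed multiset into a fixed-size bit vector by XOR parity."""
    out = [0] * bits
    for tok, n in signed_counts.items():
        if abs(n) % 2 == 1:  # only odd multiplicities flip a bit
            digest = hashlib.sha256(f"{seed}:{tok}".encode()).digest()
            out[int.from_bytes(digest[:8], "big") % bits] ^= 1
    return out


def toy_fingerprint(rsmi: str, bits: int = 16, seed: int = 0) -> list:
    """Run the three toy stages on a 'reactants>>products' string."""
    left, right = rsmi.split(">>")
    return parity_fold(delta(toy_tokens(left), toy_tokens(right)), bits, seed)


fp = toy_fingerprint("CCO>>C=C.O")
print(fp)
```

Because each stage is a plain function, any one of them can be swapped without touching the others, which is the modularity the pipeline description refers to.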

Installation

Python requirements

SynRFP targets Python 3.11 or later.
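If you want to check the interpreter before installing, a small script along these lines may help. The helper name meets_minimum is hypothetical; the only fact taken from this page is the 3.11 minimum.

```python
import sys

MINIMUM = (3, 11)  # minimum version stated in this guide


def meets_minimum(version, minimum=MINIMUM) -> bool:
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


if not meets_minimum(sys.version_info):
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than "
          f"{MINIMUM[0]}.{MINIMUM[1]}; consider upgrading before installing.")
```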

Install from PyPI

Install the stable release:

pip install synrfp

Install the package for development

git clone https://github.com/TieuLongPhan/SynRFP.git
cd SynRFP
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

This installs SynRFP in editable mode so local edits are immediately importable.

Quick sanity checks

Check the installed version

python -c "import importlib.metadata as m; print(m.version('synrfp'))"
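The one-liner above raises an exception if the package is not installed. A slightly more defensive variant, sketched here with a hypothetical helper named installed_version, reports a missing installation instead:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("synrfp")
if found is None:
    print("synrfp is not installed in this environment")
else:
    print(f"synrfp {found}")
```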

Core concepts & where to find them

  • The convenience wrapper synrfp(…) converts a reaction SMILES (RSMI) string to a binary bit vector; see the top-level implementation.

  • The SynRFP engine composes a tokenizer with either an unweighted sketcher (e.g., ParityFold, MinHashSketch) or a weighted sketcher (e.g., CWSketch). See the ParityFold and sketcher implementations.

  • Reactions are parsed into lightweight GraphData containers and Reaction objects; these helpers are used by SynRFP for robust graph construction.
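For readers unfamiliar with unweighted sketchers, the following is a minimal, generic MinHash sketch over a set of string tokens. It illustrates the general technique only; it is not the code behind MinHashSketch, and the function names, the blake2b-based hashing, and the seeding scheme are assumptions.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Seeded 64-bit hash of a token (illustrative choice of hash)."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(tokens: set, num_hashes: int = 64, seed: int = 0) -> list:
    """Keep the minimum hash value per seeded hash function."""
    if not tokens:
        raise ValueError("cannot sketch an empty token set")
    return [min(_hash(t, seed + i) for t in tokens) for i in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of matching positions estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = minhash({"tok1", "tok2", "tok3"})
b = minhash({"tok1", "tok2", "tok4"})
print(estimate_jaccard(a, b))  # an estimate of the true Jaccard similarity, 2/4
```

The estimate becomes more accurate as num_hashes grows, at the cost of a longer signature.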

Minimal examples

  1. Verify a single reaction fingerprint (one-liner)

from synrfp.synrfp import synrfp

bits = synrfp(
    "CCO>>C=C.O",
    tokenizer="wl",     # "wl" or "nauty"
    radius=1,
    sketch="parity",    # "parity", "minhash", or "cw"
    bits=1024,
    seed=42,
)
print(len(bits), bits[:16])  # the length (1024) and the first 16 entries of the binary vector
# The synrfp(...) convenience wrapper is implemented in synrfp.py.

  2. Build an engine programmatically (tokenizer + sketcher)

from synrfp.tokenizers.wl import WLTokenizer
from synrfp.sketchers.parity_fold import ParityFold
from synrfp.synrfp import SynRFP, build_graph_from_printout, tanimoto_bits

# Tokenizer and sketcher classes (examples)
tok = WLTokenizer()
sk = ParityFold(bits=1024, seed=0)

# Create a SynRFP engine
fp_engine = SynRFP(tokenizer=tok, radius=1, sketch=sk)
# fp_engine.fingerprint(...) expects GraphData instances; the GraphData helpers live in graph_data.py.

  3. Encode a batch of reactions (parallel-friendly)

from synrfp.encoder import SynRFPEncoder

rxn_smiles = ["CCO>>C=C.O", "CO>>O=C=O"]  # O=C=O is the SMILES for carbon dioxide
fps = SynRFPEncoder.encode(
    rxn_smiles,
    tokenizer="wl",
    radius=1,
    sketch="parity",
    bits=1024,
    seed=0,
)
print(fps.shape)  # (2, 1024)
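Once fingerprints are available as binary vectors, they are typically compared with Tanimoto similarity. The engine example imports a tanimoto_bits helper whose signature is not shown here, so the function below is a hypothetical stand-in written from the standard definition, not the library's own code. Returning 1.0 for two all-zero vectors is an assumed convention.

```python
from typing import Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity |a AND b| / |a OR b| for equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two empty fingerprints are treated as identical.
    return both / either if either else 1.0


fp_a = [1, 1, 0, 0]
fp_b = [1, 0, 1, 0]
print(tanimoto(fp_a, fp_b))  # one shared bit out of three set bits overall
```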

For more extensive examples and tutorials, visit the Tutorials and Examples section.