Splito is a Python library for drug discovery that provides methods for parsing and splitting datasets. It helps researchers and chemists process data efficiently for their ML projects.
Splito is part of the Datamol ecosystem: https://datamol.io.
You can install splito using pip:

    pip install splito
Alternatively, you can install splito with conda or mamba. Ask @maclandrol for credentials to the conda forge or for a token.
Quick API Tour
    import datamol as dm
    from splito import ScaffoldSplit

    # Load some data
    data = dm.data.chembl_drugs()

    # Initialize a splitter
    splitter = ScaffoldSplit(
        smiles=data["smiles"].tolist(),
        n_jobs=-1,
        test_size=0.2,
        random_state=111,
    )

    # Generate indices for training set and test set
    train_idx, test_idx = next(splitter.split(X=data.smiles.values))
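To build intuition for what a scaffold-style split returns, here is a minimal, dependency-free sketch of a group-wise split. It is not Splito's implementation: `group_split` is a hypothetical helper, and the string labels only stand in for molecular scaffolds, which a real splitter would compute from the SMILES. The point it illustrates is that whole groups go to either the training set or the test set, so the two sets never share a group.

```python
import random
from collections import defaultdict


def group_split(groups, test_size=0.2, random_state=111):
    """Split sample indices so that no group appears in both train and test.

    `groups` holds one string label per sample (a stand-in for a scaffold).
    Hypothetical helper for illustration only, not part of Splito.
    """
    # Collect the sample indices that belong to each group.
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)

    # Shuffle the group labels reproducibly.
    labels = sorted(members)
    random.Random(random_state).shuffle(labels)

    # Move whole groups into the test set until it reaches the target size;
    # every remaining group goes to the training set.
    target = test_size * len(groups)
    train_idx, test_idx = [], []
    for label in labels:
        if len(test_idx) < target:
            test_idx.extend(members[label])
        else:
            train_idx.extend(members[label])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for the scaffolds of ten molecules.
scaffolds = ["benzene", "benzene", "pyridine", "indole", "indole",
             "indole", "furan", "pyridine", "benzene", "furan"]
train_idx, test_idx = group_split(scaffolds, test_size=0.2)

# No scaffold is shared between the two sets.
train_groups = {scaffolds[i] for i in train_idx}
test_groups = {scaffolds[i] for i in test_idx}
assert train_groups.isdisjoint(test_groups)
```

Because groups are moved as a unit, the test set can end up slightly larger than the requested fraction. That is the usual trade-off of group-aware splitting: exact set sizes are given up in exchange for no overlap between groups.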
Check out the tutorials to get started.