

Splito is a Python library that supports drug discovery by providing methods for parsing and splitting datasets. It helps researchers and chemists prepare data efficiently for their machine learning projects.

Splito is part of the Datamol ecosystem.


You can install splito using pip:

pip install splito

Alternatively, you can install it with conda or mamba. Ask @maclandrol for credentials to conda-forge or for a token:

mamba install -c conda-forge splito

Quick API Tour

import datamol as dm
from splito import ScaffoldSplit

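# Load some data (datamol's bundled FreeSolv dataset is used here as an example;
# any DataFrame with a "smiles" column should work)
data = dm.data.freesolv()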

# Initialize a splitter
splitter = ScaffoldSplit(smiles=data["smiles"].tolist(), n_jobs=-1, test_size=0.2, random_state=111)

# Generate indices for training set and test set
train_idx, test_idx = next(splitter.split(X=data.smiles.values))
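
To illustrate the idea behind a scaffold split, here is a minimal, standard-library sketch of a group-aware split: every sample sharing a group label (for molecules, a scaffold) ends up on the same side. It is not splito's implementation. The `group_split` helper, the toy SMILES, and the scaffold labels are all made up for illustration; a real scaffold split would compute the scaffolds from the molecules themselves.

```python
import random
from collections import defaultdict


def group_split(groups, test_size=0.2, random_state=111):
    """Split sample indices so that no group label spans both sets.

    `groups` holds one hashable, sortable label per sample (for a
    scaffold split, each molecule's scaffold). Hypothetical helper,
    not part of splito.
    """
    # Collect the sample indices that belong to each group.
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)

    # Shuffle the group labels reproducibly.
    labels = sorted(members)
    random.Random(random_state).shuffle(labels)

    # Move whole groups into the test set until it reaches the target size.
    # Because groups are never broken up, the test set can overshoot it.
    n_test = round(test_size * len(groups))
    train_idx, test_idx = [], []
    for label in labels:
        if len(test_idx) < n_test:
            test_idx.extend(members[label])
        else:
            train_idx.extend(members[label])
    return sorted(train_idx), sorted(test_idx)


# Toy data: SMILES strings paired with made-up scaffold labels (assumption).
smiles = [
    "c1ccccc1C", "c1ccccc1O",
    "C1CCCCC1N", "C1CCCCC1Cl",
    "CCO", "CCN",
    "c1ccncc1C", "c1ccncc1O",
    "C1CCOC1", "C1CCOC1C",
]
scaffolds = [
    "benzene", "benzene",
    "cyclohexane", "cyclohexane",
    "none", "none",
    "pyridine", "pyridine",
    "thf", "thf",
]

train_idx, test_idx = group_split(scaffolds, test_size=0.2, random_state=111)

# Index arrays like these are then used to select the actual samples.
train_smiles = [smiles[i] for i in train_idx]
test_smiles = [smiles[i] for i in test_idx]
```

Splitting by group rather than by row means the test set contains only scaffolds the model has never seen during training, which tends to give a more realistic estimate of how it generalizes to new chemistry.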


Check out the tutorials to get started.