Multi-fidelity optimization

Some experiments are very expensive. They can be supplemented by simpler alternatives or by high-throughput calculations, which yield measurements of lower fidelity. The planner can take advantage of these cheaper measurements to guide the high-fidelity optimization.

The same idea applies in a virtual screening setting. Expensive quantum chemistry calculations can be supplemented by faster semi-empirical methods. Another example is the virtual screening of compounds for drug activity, where high-fidelity free-energy perturbation calculations are approximated by faster, lower-fidelity docking calculations.

[ ]:

import json
import pickle
import numpy as np
import pandas as pd
from copy import deepcopy

from olympus.datasets import Dataset
from olympus.objects import (
    ParameterContinuous,
    ParameterDiscrete,
    ParameterCategorical,
    ParameterVector
)
from olympus.campaigns import ParameterSpace, Campaign

from atlas.planners.multi_fidelity.planner import MultiFidelityPlanner      # specially designed planner for multi-fidelity optimization



For this example, we will screen perovskites for their bandgap. There are two measurement fidelities, one using GGA (low) and one using HSE06 (high). You can set the cost associated with each fidelity, but here we treat a GGA query as 10 times cheaper than an HSE06 query.

[ ]:

COST_BUDGET = 50            # the budget is an accumulated cost, not a number of iterations
NUM_INIT_DESIGN = 10
NUM_CHEAP = 8               # one high-fidelity query every NUM_CHEAP iterations (i.e. 7 low-fidelity queries per high-fidelity query)
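
To get a feel for how far such a budget stretches, the sketch below counts the queries that fit into a cost budget. It is not part of ATLAS. It assumes that every `num_cheap`-th iteration is a high-fidelity query costing 1.0, that all other queries cost 0.1, and that nothing else is charged to the budget. The helper `simulate_schedule` is a hypothetical name, and exact fractions are used so the running total is free of floating-point drift.

```python
from fractions import Fraction


def simulate_schedule(cost_budget, num_cheap,
                      cost_low=Fraction(1, 10), cost_high=Fraction(1)):
    """Count low/high-fidelity queries issued before the budget is spent.

    Assumption (for illustration only): iterations whose index is a
    multiple of num_cheap are high fidelity, all others are low fidelity.
    """
    cost = Fraction(0)
    n_low = 0
    n_high = 0
    iter_ = 0
    # same stopping rule as a "while COST < COST_BUDGET" loop
    while cost < cost_budget:
        if iter_ % num_cheap == 0:
            cost += cost_high
            n_high += 1
        else:
            cost += cost_low
            n_low += 1
        iter_ += 1
    return n_low, n_high, cost


n_low, n_high, total = simulate_schedule(50, 8)
print(f"low: {n_low}, high: {n_high}, total cost: {float(total)}")
```

Because the loop checks the budget before each query, the last query can push the total slightly above the budget.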

Here we create an additional fidelity parameter s, which can only take one of the permitted fidelity values. The MultiFidelityPlanner is allowed to vary this parameter, so the optimization runs over the original parameters plus this constrained fidelity parameter.

[25]:

dataset = Dataset(kind='perovskites')

# build parameter space
param_space = ParameterSpace()

# fidelity param
param_space.add(ParameterDiscrete(name='s', options=[0.1, 1.0], low=0.1, high=1.0))
for param in dataset.param_space: # add perovskite component parameters ('organic', 'cation', and 'anion')
    param_space.add(param)
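
Conceptually, adding `s` turns every candidate into a (fidelity, composition) pair. The sketch below illustrates that idea in plain Python. It is only a mental model, not how MultiFidelityPlanner works internally. The option names are hypothetical placeholders rather than the real perovskite components, and `augmented_space` and `at_fidelity` are made-up helpers.

```python
from itertools import product

FIDELITIES = (0.1, 1.0)

# Hypothetical placeholder options, NOT the real perovskite components.
OPTIONS = {
    "organic": ("org_a", "org_b"),
    "cation": ("cat_a", "cat_b"),
    "anion": ("an_a", "an_b", "an_c"),
}


def augmented_space(fidelities, options):
    """Enumerate every candidate with the fidelity 's' as first dimension."""
    names = ["s", *options]
    return [
        dict(zip(names, combo))
        for combo in product(fidelities, *options.values())
    ]


def at_fidelity(candidates, s):
    """Keep only the candidates whose fidelity parameter equals s."""
    return [c for c in candidates if c["s"] == s]


space = augmented_space(FIDELITIES, OPTIONS)
high_only = at_fidelity(space, 1.0)
print(len(space), len(high_only))
```

Fixing the fidelity before asking for a recommendation corresponds to searching only one of these slices.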

[ ]:

# lower-fidelity data calculated using GGA is available in the examples folder,
# so we load it here to create a new function for measurements
# fill in the ATLAS_PATH
ATLAS_PATH = '.'
with open(f'{ATLAS_PATH}/examples/multi_fidelity/perovskites/lookup/lookup_table.pkl', 'rb') as f:
    LOOKUP = pickle.load(f)

def measure(params, s):
    # high-fidelity is hse06, low-fidelity is gga
    if s == 1.0:
        key = 'bandgap_hse06'
    elif s == 0.1:
        key = 'bandgap_gga'
    else:
        # fail loudly instead of returning an undefined measurement
        raise ValueError(f'unknown fidelity: {s}')
    # take the smallest bandgap stored for this composition
    return np.amin(
        LOOKUP[params.organic.capitalize()][params.cation][params.anion][key]
    )
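
The lookup logic can be tried without the pickle file. The sketch below mirrors the nested `organic -> cation -> anion` layout that `measure` indexes into, using a toy table. The bandgap numbers and option names are made up for illustration and are not real perovskite data. `Params` is a hypothetical stand-in for the sample objects returned by the planner.

```python
from collections import namedtuple

# Hypothetical stand-in for a recommended sample with attribute access.
Params = namedtuple("Params", ["organic", "cation", "anion"])

# Toy table with made-up numbers, NOT real bandgaps.
TOY_LOOKUP = {
    "Org_a": {
        "cat_a": {
            "an_a": {"bandgap_hse06": [2.5, 2.1], "bandgap_gga": [1.9, 1.6]},
        },
    },
}

KEY_BY_FIDELITY = {1.0: "bandgap_hse06", 0.1: "bandgap_gga"}


def toy_measure(params, s, lookup=TOY_LOOKUP):
    """Return the smallest stored bandgap at the requested fidelity."""
    if s not in KEY_BY_FIDELITY:
        raise ValueError(f"unknown fidelity: {s}")
    entry = lookup[params.organic.capitalize()][params.cation][params.anion]
    return min(entry[KEY_BY_FIDELITY[s]])


sample = Params(organic="org_a", cation="cat_a", anion="an_a")
print(toy_measure(sample, 1.0), toy_measure(sample, 0.1))
```

Mapping fidelities to keys with a dictionary keeps the two branches in one place and makes an unsupported fidelity an explicit error.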

[ ]:

campaign = Campaign()
campaign.set_param_space(param_space)

planner = MultiFidelityPlanner(
    goal='minimize',
    init_design_strategy='random',
    num_init_design=NUM_INIT_DESIGN,
    use_descriptors=True,
    batch_size=1,
    acquisition_optimizer_kind='pymoo',     # this is required
    fidelity_params=0,                      # this dimension is the fidelity parameter (we use the first one)
    fidelities=[0.1, 1.],                   # these are the possible fidelities (GGA = 0.1, and HSE = 1.0)
)

planner.set_param_space(param_space)

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

                                                                                                                   
                                                 Welcome to ATLAS!

                                                Made with 💕 in 🇨🇦

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────── Initial design phase ─────────────────────────────

[ ]:

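# accumulated cost, the budget is also cost
COST = 0.

# lowest hse06 bandgap in the lookup table, used below as the stopping criterion
# (assumes every entry in LOOKUP follows the organic -> cation -> anion layout
# and belongs to the search space)
min_hse06_bandgap = min(
    np.amin(entry['bandgap_hse06'])
    for cations in LOOKUP.values()
    for anions in cations.values()
    for entry in anions.values()
)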

target_rec_measurements = []
iter_ = 0
while COST < COST_BUDGET:
    print(f'\nITER : {iter_+1}\tCOST : {COST}\n')

    # choose the fidelity of the next query: every NUM_CHEAP-th iteration is high fidelity
    if iter_ % NUM_CHEAP == 0:
        planner.set_ask_fidelity(1.0)
    else:
        planner.set_ask_fidelity(0.1)

    samples = planner.recommend(campaign.observations)
    for sample in samples:
        measurement = measure(sample, sample.s)
        campaign.add_observation(sample, measurement)

        print('SAMPLE : ', sample)
        print('MEASUREMENT : ', measurement)

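        # charge this query to the budget
        # (assumes the cost of a query equals its fidelity value s: 0.1 for GGA, 1.0 for HSE06)
        COST += float(sample.s)
        iter_ += 1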

    # once the initial design is complete, check whether the model already identifies the optimum
    if campaign.num_obs > NUM_INIT_DESIGN:
        # make greedy recommendation on the target fidelity
        # use this to make a high-fidelity measurement
        rec_sample = planner.recommend_target_fidelity(batch_size=1)[0]
        rec_measurement = measure(rec_sample, rec_sample.s)
        print('REC SAMPLE : ', rec_sample)
        print('REC MEASUREMENT : ', rec_measurement)

        target_rec_measurements.append(rec_measurement)
        # kill the run if we have found the lowest hse06 bandgap
        # on the most recent high-fidelity measurement
        if rec_measurement == min_hse06_bandgap:
            print('found the min hse06 bandgap!')
            break
    else:
        target_rec_measurements.append(measurement)
        if measurement == min_hse06_bandgap and samples[0].s == 1.:
            print('found the min hse06 bandgap!')
            break
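
Since the goal is to minimize the bandgap, a common way to summarize a list such as `target_rec_measurements` is its running minimum (the best value seen so far at each iteration). The sketch below shows this with made-up numbers, not results from this campaign. `best_so_far` is a hypothetical helper name.

```python
from itertools import accumulate


def best_so_far(measurements):
    """Running minimum: element i is the smallest of measurements[:i + 1]."""
    return list(accumulate(measurements, min))


# Made-up measurements for illustration only.
trace = best_so_far([3.2, 2.8, 3.0, 2.1, 2.4])
print(trace)
```

Plotting such a trace against the accumulated cost rather than the iteration count keeps runs with different fidelity schedules comparable.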