Developing Custom Asimmodules

This section will guide you through taking your own in-house simulation and integrating it into asimtools. The process is designed to be as straight-forward as reasonably possible and with continued user-feedback we hope to make it more seamless.

As an example, we will ASIMplify the calculation of equations of state as shown in ASE. To understand the code, visit the ASE tutorials page. Ultimately, the ASIMplified code will work with any chemical species, any supported calculator and any configured computing environment with fully parallelizable job submission on HPCs in just a few steps!

The code is given below:

from ase.build import bulk
from ase.calculators.emt import EMT
from ase.eos import calculate_eos
from ase.db import connect

db = connect('bulk.db')
for symb in ['Al', 'Ni', 'Cu', 'Pd', 'Ag', 'Pt', 'Au']:
    atoms = bulk(symb, 'fcc')
    atoms.calc = EMT()
    eos = calculate_eos(atoms)
    v, e, B = eos.fit()  # find minimum
    # Do one more calculation at the minimum and write to database:
    atoms.cell *= (v / atoms.get_volume())**(1 / 3)
    atoms.get_potential_energy()
    db.write(atoms, bm=B)

There are a couple of issues here that make it difficult to make this a high-throughput calculation. First, because we directly setup the calculator in the code, if we want to change the calculator we have to change the code. The same applies for the structures and chosen chemical species. In addition, the current code runs independent calculations one after the other. If each of these calculations was an expensive calculation, it would be much less efficient than running them in separate jobs on a HPC with a job scheduler. To fix any of these issues, we would have to modify the code, job scripts and perhaps even file structure. We will show how ASIMTools can help with all of these issues. The end-result will be that any of these changes can made without touching code or a job script.

1. Build and test with a cheap calculator

To make things easier, it is best to first replace the expensive parts of your workflow with toy examples e.g. use an empirical calculator like EMT instead of DFT, use a smaller/simpler structure, loop over fewer cases etc.

The ASE example already uses a simple EMT calculator and simple structures.

2. Wrap in a function

The workhorse of ASIMTools is the asimmodule, any code wrapped in a function that returns a dictionary can be run within the ASIMTools framework. The easiest thing to do would be to take the code and copy and paste it in a function which is defined inside an asimmodule with the same name

from ase.build import bulk
from ase.calculators.emt import EMT
from ase.eos import calculate_eos
from ase.db import connect

def ase_eos():
    db = connect('bulk.db')
    for symb in ['Al', 'Ni', 'Cu', 'Pd', 'Ag', 'Pt', 'Au']:
        atoms = bulk(symb, 'fcc')
        atoms.calc = EMT()
        eos = calculate_eos(atoms)
        v, e, B = eos.fit()  # find minimum
        # Do one more calculation at the minimu and write to database:
        atoms.cell *= (v / atoms.get_volume())**(1 / 3)
        atoms.get_potential_energy()
        db.write(atoms, bm=B)

    return {}

Immediately as it is, this is an asimmodule can be run in ASIMTools. You can run it with the following sim_input.yaml

asimmodule: /path/to/ase_eos.py
env_id: inline
workdir: results

then call

asim-execute sim_input.yaml

Only a little bit more complicated than calling python ase_eos.py

3. Make the asimmodule use any calculator

This asimmodule however still depends on specific structures and a specific calculator. Let’s do the easy thing first, let’s make the asimmodule work with any calculator using a simple change.

from ase.build import bulk
from ase.eos import calculate_eos
from ase.db import connect
from asimtools.calculators import load_calc

def ase_eos(
    calc_id,
):
    calc = load_calc(calc_id)
    db = connect('bulk.db')
    for symb in ['Al', 'Ni', 'Cu', 'Pd', 'Ag', 'Pt', 'Au']:
        atoms = bulk(symb, 'fcc')
        atoms.calc = calc
        eos = calculate_eos(atoms)
        v, e, B = eos.fit()  # find minimum
        # Do one more calculation at the minimu and write to database:
        atoms.cell *= (v / atoms.get_volume())**(1 / 3)
        atoms.get_potential_energy()
        db.write(atoms, bm=B)

    return {}

Just like that we can now run the asimmodule with any correctly configured calculator for all the structures! We can even now run calc_array to iterate getting the results using different calculators.

4. Make the asimmodule use any structure

The final change we will make is to parallelize over structures as below

from ase.build import bulk
from ase.eos import calculate_eos
from ase.db import connect
from asimtools.calculators import load_calc

def ase_eos(
    image,
    calc_id,
):
    calc = load_calc(calc_id)
    db = connect('bulk.db')
    atoms = get_atoms(**image)
    atoms.calc = calc
    eos = calculate_eos(atoms)
    v, e, B = eos.fit()  # find minimum
    # Do one more calculation at the minimu and write to database:
    atoms.cell *= (v / atoms.get_volume())**(1 / 3)
    atoms.get_potential_energy()
    db.write(atoms, bm=B)

    return {}

Easy-peasy. We now have an asimmodule that works with arbitrary environment, arbitrary calculator and arbitrary input structure (Of course the simulation will fail if we give a bad structure/calculator for example)

5. Final cleanup

We can do some final cleanup of the asimmodule so that it sends outputs to output.yaml and logs some checkpoints. Additionally, any asimmodules added to the repository will need clear syntax highlighting and documentation.

from typing import Dict
import logging
from ase.eos import calculate_eos
from ase.db import connect
from asimtools.calculators import load_calc
from asimtools.utils import get_atoms

def ase_eos(
    image: Dict,
    calc_id: str,
    db_file: 'bulk.db'
) -> Dict:
    calc = load_calc(calc_id)
    db = connect(db_file)
    atoms = get_atoms(**image)
    atoms.calc = calc
    eos = calculate_eos(atoms)
    v, e, B = eos.fit()  # find minimum
    logging.info('Successfully fit EOS')
    # Do one more calculation at the minimu and write to database:
    atoms.cell *= (v / atoms.get_volume())**(1 / 3)
    atoms.get_potential_energy()
    db.write(atoms, bm=B)

    results = {'v': float(v), 'e': float(e), 'B': float(B)}
    return results

To run this asimmodule on an arbitrary structure say Argon with say the LennardJones calculator, in a slurm job we can now use the following input files.

sim_input.yaml:

asimmodule: /path/to/ase_eos.py
env_id: batch
workdir: results
args:
    image:
        builder: bulk
        name: Ar
    calc_id: lj_Ar

calc_input.yaml:

lj_Ar:
    name: LennardJones
    module: ase.calculators.lj
    args:
        sigma: 3.54
        epsilon: 0.00802236
emt: #This is not used if an LJ calculator is chosen
    name: EMT
    module: ase.calculators.emt
    args: {}

env_input.yaml:

batch:
    mode:
        use_slurm: true
        interactive: false
    slurm:
        flags:
            - -n 2
        precommands:
            - source ~/.bashrc
            - conda activate asimtools
inline: # This is not used if env_id is batch
    mode:
        use_slurm: false
        interactive: true

6. Running multiple simulations in a workflow

Going back to the original problem, we wanted to run the simulation of multiple different elements with the EMT calculator. To achieve that in parallel, we can nest the ase_eos asimmodule in a asimtools.asimmodules.workflows.sim_array.sim_array() asimmodule as follows

sim_input.yaml:

asimmodule: workflows.sim_array
workdir: results
args:
    key_sequence: ['args', 'image', 'name']
    array_values: ['Al', 'Ni', 'Cu', 'Pd', 'Ag', 'Pt', 'Au']
    env_ids: 'batch'
    template_sim_input:
        asimmodule: /path/to/ase_eos.py
        args:
            calc_id: emt
            image:
                builder: bulk
                crystalstructure: 'fcc'

To make the asimmodule easier to access without having to use the full path, you can set the environment variable

export ASIMTOOLS_ASIMMODULE_DIR=/path/to/my/asimmodule/dir/

You can then move the ase_eos.py asimmodule to /path/to/my/asimmodule/dir/ i.e. the asimmodule directory. This allows you to refer to asimmodules prepended with the asimmodule dir as below

asimmodule: workflows.sim_array
workdir: results
args:
    key_sequence: ['args', 'image', 'name']
    array_values: ['Al', 'Ni', 'Cu', 'Pd', 'Ag', 'Pt', 'Au']
    env_ids: 'batch'
    template_sim_input:
        asimmodule: ase_eos/ase_eos.py
        args:
            calc_id: emt
            image:
                builder: bulk
                crystalstructure: 'fcc'

The above example loops over crystals for which ASE already has FCC lattice parameters, but what if we want to loop over the species and corresponding lattice parameters? We can either specify a list of images dictionaries as array_values or use secondary_array_values. We can also explicitly tell ASIMTools to include the array_values in the directory names in the standard format (e.g. id-0000__Al__, id-0001__Ni__ etc.).

asimmodule: workflows.sim_array
workdir: results
args:
    key_sequence: ['args', 'image', 'name']
    array_values: ['Al', 'Ni', 'Cu', 'Pd', 'Ag', 'Pt', 'Au']
    labels: values
    secondary_key_sequences:
    - ['args', 'image', 'a']
    secondary_array_values:
    - [4.0479, 3.524, 3.6149, 3.8907, 4.0853, 3.9242, 4.0782]
    env_ids: 'batch'
    template_sim_input:
        asimmodule: ase_eos/ase_eos.py
        args:
            calc_id: emt
            image:
                builder: bulk
                crystalstructure: 'fcc'

This will perform the EOS calculation for each species with the corresponding lattice parameter. That’s 7x5=35 calculations in parallel without touching code or a job script!