Running an Existing Asimmodule

This section will guide you through running an already existing asimmodule whether it be one of the asimmodules provided in the main repository, from the asimmodule repository or a custom asimmodule you have written yourself.

To run a simulation, all you need to do is run either of the following commands:

asim-execute sim_input.yaml -c calc_input.yaml -e env_input.yaml

Providing calc_input.yaml or env_input.yaml is optional. If not provided, the globally configured files will be used. See env_input.yaml. This command will automatically run the specified simulation in the correct directory and environment.

In rare cases, you can use the following command:

asim-run sim_input.yaml -c calc_input.yaml

This command will run the simulation in the current directory and environment. For most cases, you will only ever use asim-execute. The differences between asim-execute and asim-run are explained in Usage of asim-execute and asim-run.

In summary, the steps you need to take to start using any of the in-built asimmodules, described in detail below, are the following:

Install ASIMTools in you python environment or add it to your PYTHONPATH.
Setup your global env_input.yaml and calc_input.yaml and set the environment variables pointing to them.
Write a sim_input.yaml based on the examples provided in the repository. Do not hesitate to submit an issue if you are confused as we are still in a testing phase
asim-execute!

Input files

env_input.yaml

One can provide an env_input.yaml file that details the kind of environments in which asimmodules can be run. This file can be provided with the asim-execute command using the -e flag or set globally. An example of an env_input.yaml file is given below

# template
env_id:
  mode:
    use_slurm: true
    interactive: false
    run_prefix: ...
    run_suffix: ...
  slurm:
    flags: [flag1, flag2, ...]
    precommands: [precommand1, precommand2, ...]
    postcommands: [postcommand1, postcommand2, ...]

# Concrete examples below
inline: # Run the asimmodule directly in the console
  mode:
    use_slurm: false
    interactive: true

batch_job: # Submit a batch job using slurm with 2 tasks
  mode:
    use_slurm: true
    interactive: false
  slurm:
    flags:
        - -n 2
        - --mem-per-cpu=2G
    precommands:
        - source ~/.bashrc
        - conda activate asimtools
    postcommands:
        - conda deactivate asimtools
# Submit an interactive job using slurm, you can use a dictionary
# for the flags
interactive_job:
  mode:
    use_slurm: true
    interactive: true
  slurm:
    flags:
      -n: 2
      --gres: gpu:2
    precommands:
      - module load lammps

The highest level key is the env_id which is the one specified in the sim_input.yaml. An env_input.yaml can have any number of env_id s. That way you can specify one global file if you use the same environments repeatedly. You can configure a global config file by setting the environment variable.

export ASIMTOOLS_ENV_INPUT=/path/to/my/global/env_input.yaml

If you do not provide an env_input.yaml and there is no file called env_input.yaml in the work directory, ASIMTools will look for the env_id in the global file.

The parameters, required, shown in the template section are described below

env_id: (str) unique key for identifying the environment, env_id in sim_input.yaml must match one of the env_id s defined in the env_input.yaml being used.
env_id.mode.use_slurm: (bool) whether or not to request a slurm allocation to run the asimmodule
env_id.mode.interactive: (bool) whether or not to run the asimmodule directly in the terminal (using salloc) or to submit a batch job (using sbatch).
env_id.mode.run_prefix: (str) string to append before running the asimmodule e.g. if run_prefix=mpirun the asimmodule will be invoked with the equivalent of mpirun python my_asimmodule.py. run_prefix in env_input.yaml is always prepended before the one provided by calc_input.yaml.
env_id.mode.run_suffix: (str) string to append after running the asimmodule e.g. if run_suffix: ' &> out.txt' is provided, the asimmodule will be invoked with the equivalent of python my_asimmodule.py &> out.txt. run_suffix in env_input.yaml is always appended after the one provided by calc_input.yaml.
env_id.slurm.flags: (list/dict, optional) The slurm flags for the allocation as a list of flags e.g. [-n 4, -N 1]. One can also specify a dictionary e.g. '{-n': 4, '-N': 1, '--mem':2G}
env_id.slurm.precommands: (list, optional) Commands to be run/added to the job script before running the asimmodule. A common use case is loading a module or activating an environment.
env_id.slurm.postcommands: (list, optional) Commands to be run/added to the job asimmodule after running the asimmodule. e.g. for file cleanup or moving files after the job is complete.

calc_input.yaml

The calc_input.yaml is used to configure an ASE calculator. As above, a global configuration file can be set using

export ASIMTOOLS_CALC_INPUT=/path/to/my/global/calc_input.yaml

or provided to asim-execute at run time. Note that if you launch a chained workflow with asim-run instead of asim-execute, asimmodules farther down the chain will use the global calc_input.yaml, so always use asim-execute

# Template
calc_id:
  name: ...
  module: ...
  precommands: [precommand1, ...]
  postcommands: [postcommand1, ...]
  run_prefix: ...
  run_suffix: ...
  args:
    arg1: value_1
    ...

# Concrete examples
# Here is a simple LJ potential from ASE
lj:
  name: LennardJones
  module: ase.calculators.lj
  args:
    sigma: 3.54
    epsilon: 0.00802236

# GPAW needs a run_prefix to work in parallel using mpirun
gpaw:
  name: GPAW
  module: gpaw.calculator
  run_prefix: mpirun
  args:
    kpts: [2,2,2]
    h: 0.1
    xc: PBE
    txt: gpaw_output.txt

# You can install a universal potential like MACE and define it as well, see
# asimtools/calculators.py for implemented external calculators. Submit an
# issue if you want one to be implemented.
MACE32-medium:
  name: MACE
  args:
    model: medium
    use_device: cuda

The parameters for the calculators provided directly in ASE are specified under the assumption that the calculator will be initiated as follows:

from module import name
calc = name(**args)

This works for all calculators defined in ASE v3.22 and below, for newer versions of ASE, you might need to use the versions that use profiles e.g. use name: EspressoProfile not name: Espresso until those become stable in ASE. For externally defined calculators, you can submit an issue and we will implement it. For example, calculators for NequIP, Deep Potential, MACE, CHGNet and M3GNet force fields are implemented.

calc_id: (str) unique key for identifying the calculator, calc_id in sim_input.yaml must match one of the calc_id s defined in the provided calc_input.yaml
calc_id.name: (str) Either the name of the class or the reference to one of the provided external calculators.
calc_id.module: (str) The module from which the calculator class is imported. e.g. if name=LennardJones and module=ase.calculators.lj, then the calculator object is imported as from ase.calculators.lj import LennardJones. This works if the calculator is available in ASE or follows ASE format for initialization such as GPAW. Any other ASE calculator will need to have the instantiation defined in :ref:calculators.py
calc_id.mode.run_prefix: (str) string to append before running the asimmodule e.g. if run_prefix=mpirun the asimmodule will be invoked with the equivalent of mpirun python my_asimmodule.py. run_prefix in env_input.yaml is always prepended before the one provided by calc_input.yaml.
calc_id.mode.run_suffix: (str) string to append after running the asimmodule e.g. if run_postfix=' &> out.txt' the asimmodule will be invoked with the equivalent of python my_asimmodule.py &> out.txt. run_postfix in env_input.yaml is always appended after the one provided by calc_input.yaml.
calc_id.precommands: (list, optional) Commands to be run/added to the job asimmodule before running the asimmodule. A common use case is loading a module or activating an environment
calc_id.postcommands: (list, optional) Commands to be run/added to the job asimmodule after running the asimmodule. e.g. cleaning up bulky tmp or wavefunction files
calc_id.args: (dict) key-value pairs to be passed as arguments for the initialization of the calculator class. e.g. if the class is LennardJones, the arguments are passed as calc = LennardJones(**{'sigma':3.2, 'epsilon':3})

sim_input.yaml

The minimal requirement to run an asimmodule is to provide a sim_input.yaml file. An example of a sim_input.yaml is shown below:

asimmodule: singlepoint
env_id: inline
overwrite: false
submit: true
workdir: results
precommands:
    - export MY_ENV_VAR=3
args:
    arg1: value_1
    arg2: value_2
    ...

The parameters are:

asimmodule: (str) name of core asimmodule or /path/to/my/asimmodule.py. Core asimmodules defined in the asimmodules directory can be simply referred to using Python dot notation. E.g. to specify the asimtools.asimmodules.workflows.sim_array() asimmodule, you would specify workflows.sim_array. Any other asimmodule should be specified as either a full path or a path relative to ASIMTOOLS_ASIMMODULE_DIR variable to a python file. E.g. my_asimmodules/asim_ple.py
env_id: (str, optional) Environment/context in which to run asimmodule configured in env_input.yaml, defaults to running in the current console
overwrite: (bool, optional) (bool) whether or not to overwrite work directories if they exist, defaults to false
submit: (bool, optional) whether to run the asimmodule. If set to false it will just write the input files which is very useful for testing before submitting large workflows. You can go in and test one example before resubmitting with submit=True, defaults to true
workdir: (str, optional) The directory in which the asimmodule will be run, asim-execute will create the directory whereas asim-run ignores this parameter entirely, defaults to ‘./results’
precommands: (list, optional) a list of commands to run in the console before running the asimmodule, defaults to empty list
postcommands: (list, optional) a list of commands to run in the console after running the asimmodule, defaults to empty list
args: (dict) The arguments of the function being called in the asimmodule as key-value pairs. These are specific to the asimmodule being run.

All ASIMTools generated files are named sim_input.yaml but you can name user defined files as whatever you like

Specifying Images/Atoms

One of the most useful applications of ASIMTools is the unification of methods for setting up ASE atoms objects using the same interface. If an asimmodule requires a single or multiple atoms objects as input, they are provided as either an image dictionary for a single Atoms object or images for a list of Atoms objects as part of the args section. Below are the different ways to get an atoms object. You can also download images from The Materials Project and for some cases generate them using Pymatgen.

For a detailed description of the API and examples, see asimtools.utils.get_atoms()

# Reading a specific image from a structure file using ase.io.read
image:
  image_file: /path/to/my/ASE-readable/image/file.xyz
  # Optional keyword argument passed to ase.io.read
  index: 3

# Building a bulk crystal using ase.build.bulk
image:
  builder: bulk
  # Optional keyword arguments passed to the builder, must match ASE exactly
  name: Li
  crystalstructure: bcc
  a: 4.3
  cubic: True

# Building a surface using ase.build.fcc100
image:
  builder: fcc100
  # Optional keyword arguments passed to the builder, must match ASE exactly
  symbol: Fe
  vacuum: 8
  periodic: False

# Building a 3x3x3 supercell of Ar using ase.build.bulk then
# Atoms.repeat(repeat) and then applying Atoms.rattle(stdev=rattle_stdev)
image:
  name: Ar
  repeat: [3,3,3]
  rattle_stdev: 0.01

# You can even supply an atoms object directly so that the interface is
# universal. This is most useful in the asimmodule code itself.
image:
  atoms: Atoms

# An example downloading a structure from Materials Project using your own
# USER_API_KEY
image:
  mp_id: 'mp-14'
  interface: pymatgen
  user_api_key: "USER_API_KEY"
  conventional_unit_cell': true

Similarly, if the asimmodule requires multiple image inputs, there exists a universal interface. The keyword is usually specified as images. This is especially useful for distributing simulations across multiple structures or reading structures from multiple previous simulations, even in different directories.

For a detailed description of the API, see asimtools.utils.get_images()

# Reading specific images from a structure file using ase.io.read
images:
  image_file: /path/to/my/ASE-readable/image/file.xyz
  # Optional keyword arguments passed to ase.io.read
  index: '3:8'
  format: extxyz

# You can read all files matching a certain pattern using a wildcard
images:
  pattern: /path/to/my/structure/files/*.cif
  # Optional keyword argument passed to ase.io.read
  index: -1

# You can read all files matching certain patterns using a wildcard
images:
  patterns:
  - /path/to/my/structure/files/*.cif
  - /path/to/my/other/structure/files/*.cfg

# You can even supply a list of atoms objects directly so that the interface
# is universal. This is most useful in the asimmodule code itself.
images:
  images: [Atoms1, Atoms2, ...]

Usage of asim-execute and asim-run

The major difference between asim-execute and asim-run is that, asim-execute takes into account the workdir and the env_id. asim-run will run the asimmodule in the current directory and in the current console. In fact, asim-execute will create the workdir and then run asim-run in the correct environment/batch job. You can always for example, request a slurm allocation, go to the directory where you want the asimmodule to be run and call asim-run from there if you would like more control or to debug. If you want verbose logs for debugging, you can run with the -d or --debug flag.

Output files

A job or asimmodule run through ASIMTools will always produce a standard set of output files in addition to whatever outputs the asimmodule produces. In particular the most important outputs are the output.yaml and the job.log file.

``output.yaml`` contains the status of the job being run in the current directory which can be one of clean, started, complete, failed, discard. The statuses are self-explanatory, the discard status is never written by ASIMTools but a user can edit an output.yaml file and change it’s status to discard to tell ASIMTools to ignore that job in any workflows. This is common for example if you launch multiple jobs and one of them fails irredemably. Deleting the directory for that job is also ok if nothing depends on it downstream. Importantly, any results returned by the function defined in the asimmodule are found in output.yaml. Asimmodule functions should always return a dictionary of only primitive types for this purpose.

An example of an output.yaml file is shown below.

# Successful output for singlepoint asimmodule
end_time: 2023-08-28 21:50:51.368025
energy: 13.77302319846367  #This was added by the scinglepoint asimmodule
files:
  image: image_output.xyz
job_ids: '372919'
start_time: 2023-08-28 21:50:46.188300
status: complete

# Failed output
start_time: 14:29:55, 10/06/23
status: failed

job.log captures the logged output of asim-run or asimmodules that use logging. It is extremely useful for debugging as following the logs starting from the base directory will usually lead you to the correct traceback that caused the failure.
stderr.txt captures errors and backtraces from running asimmodules. This is usually the most informative file for debugging. You can be directed to the correct one by noting errors in job.log files.
stdout.txt captures any stdout from running asimmodules. It is mostly a safety measure for catching anything that prints to stdout and rarely has useful information unless you write an asimmodule that uses print statements. In batch jobs, this output this goes to the slurm job output.
input_image.xyz and input_images.xyz capture the images input into the asimmodule. This makes sure there is a concrete artifact for the structure used by the asimmodule for the purposes of visualization and debugging. They are always in extxyz format as a flexible standard format
slurm* are slurm job files which can be named according to flags specified in env_input.yaml otherwise are named slurm_stdout.id-%a_j%A or slurm_stderr.id-%a_j%A after job and array IDs

Checking job status and Restarting failed jobs

To check the status of jobs, even complicated chains and distributed jobs, we provide the asim-check utility which can be run using:

asim-check /path/to/sim_input.yaml

This will print the job tree, including statuses and work directories of the jobs whose root directory is specified as workdir in sim_input.yaml.

In many cases, there may be mistakes in one of your configuration files leading to a failed workflow. In these cases there are a few ways you could resolve the issue:

Delete the work directory and restart the workflow. This is why it is recommended that the base sim_input.yaml has workdir set to a new directory that only has the results of the workflow.
Modify the ASIMTools generated sim_input.yaml to fix the problem. If there are downstream sim_input.yaml files in a chain, they will have to be deleted or set overwrite=True. Deleting is recommended for safety purposes.
Simply rerun asim-execute. This will rerun the jobs, skipping any jobs with a status of complete or discard. Note that error files are not deleted so you will have to clear those manually. Use this with caution!

Importing functions from asimmodules and the API

Because asimmodules contain what are merely Python functions, you can always import them and use them in any other code for example, you can import asimtools.asimmodules.singlepoint() and use it as below.

from asimtools.asimmodules.singlepoint import singlepoint

results = singlepoint(image={'name': 'Ar'}, calc_id='lj')
print(results)

You can also use the utils and tools e.g. to load a calculator using just a calc_id

from asimtools.calculators import load_calc

calc = load_calc('lj')