asimtools.asimmodules.data package

asimtools.asimmodules.data.collect_images module

Collects images into a single file

author: mkphuthi@github.com

asimtools.asimmodules.data.collect_images.collect_images(images: Dict, out_format: str = 'extxyz', fnames: Sequence[str] = ['output_images.xyz'], splits: Sequence[float] | None = (1,), shuffle: bool = True, sort_by_energy_per_atom: bool = False, remove_duplicates: bool | None = False, rename_keys: Dict | None = None, energy_per_atom_limits: Sequence[float] | None = None, force_max: float | None = None, stress_limits: Sequence[float] | None = None, properties: tuple | None = ('energy', 'forces', 'stress')) → Dict

Collects images into one file/database and can split them into multiple files/databases for ML tasks
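To illustrate what splitting by ratio with an optional shuffle means, here is a minimal standalone sketch. It is not the asimtools implementation: the helper name split_by_ratios, the seed argument, and the rounding rule are all assumptions made for this example.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_ratios(
    items: Sequence[T],
    splits: Sequence[float] = (1,),
    shuffle: bool = True,
    seed: int = 0,
) -> List[List[T]]:
    """Hypothetical helper: partition items into len(splits) chunks by ratio."""
    pool = list(items)
    if shuffle:
        # A seeded generator keeps the example reproducible (assumption).
        random.Random(seed).shuffle(pool)

    total = sum(splits)
    chunks: List[List[T]] = []
    start = 0
    cumulative = 0.0
    for i, ratio in enumerate(splits):
        cumulative += ratio
        if i == len(splits) - 1:
            # The last chunk takes whatever is left so no item is dropped.
            end = len(pool)
        else:
            end = round(len(pool) * cumulative / total)
        chunks.append(pool[start:end])
        start = end
    return chunks


# Example: an 80/20 train/test split of ten placeholder items.
train, test = split_by_ratios(range(10), splits=(0.8, 0.2))
```

With the default splits=(1,), every item ends up in a single chunk, which corresponds to collecting everything into one output.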

Parameters:
  • images (Dict) – Images specification, see asimtools.utils.get_images()

  • out_format (str, optional) – output file format, defaults to ‘extxyz’

  • fnames (Sequence[str], optional) – output file names, defaults to [‘output_images.xyz’]

  • splits (Optional[Sequence[float]], optional) – Ratios to split data into, defaults to (1,)

  • shuffle (bool, optional) – shuffle images before splitting, defaults to True

  • sort_by_energy_per_atom (bool, optional) – sort images by energy per atom before splitting, defaults to False

  • remove_duplicates (Optional[bool], optional) – Whether to search for and remove duplicates with pymatgen.analysis.structure_matcher.StructureMatcher. This is quite slow, defaults to False

  • rename_keys (Optional[Dict], optional) – keys to rename on writing to the output file, defaults to None

  • energy_per_atom_limits (Optional[Sequence[float]], optional) – energy per atom limits for filtering images, defaults to None

  • force_max (Optional[float], optional) – maximum force for filtering images, defaults to None

  • stress_limits (Optional[Sequence[float]], optional) – stress limits for filtering images, defaults to None

  • properties (Sequence, optional) – which of energy, forces and stress to consider, defaults to (‘energy’, ‘forces’, ‘stress’)

Returns:
  results

Return type:
  Dict
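The filtering parameters can be pictured with a small standalone sketch. It makes several assumptions that the documentation above does not confirm: that energy_per_atom_limits is a (lower, upper) pair, that force_max is compared against the largest per-atom force norm, and that an image can be stood in for by a plain dictionary. The helper keep_image is hypothetical and is not part of asimtools.

```python
import math
from typing import Dict, Optional, Sequence


def keep_image(
    image: Dict,
    energy_per_atom_limits: Optional[Sequence[float]] = None,
    force_max: Optional[float] = None,
) -> bool:
    """Hypothetical filter: True if the image passes every active limit."""
    natoms = len(image["forces"])

    if energy_per_atom_limits is not None:
        # Assumption: limits are given as (lower, upper), both inclusive.
        lower, upper = energy_per_atom_limits
        energy_per_atom = image["energy"] / natoms
        if not lower <= energy_per_atom <= upper:
            return False

    if force_max is not None:
        # Assumption: compare the largest per-atom force norm to force_max.
        largest = max(
            math.sqrt(fx * fx + fy * fy + fz * fz)
            for fx, fy, fz in image["forces"]
        )
        if largest > force_max:
            return False

    return True


# Placeholder images: total energy plus one force vector per atom.
images = [
    {"energy": -8.0, "forces": [(0.0, 0.0, 0.1), (0.0, 0.0, -0.1)]},
    {"energy": -2.0, "forces": [(0.0, 0.0, 0.1), (0.0, 0.0, -0.1)]},
    {"energy": -8.2, "forces": [(3.0, 4.0, 0.0), (-3.0, -4.0, 0.0)]},
]

kept = [
    im for im in images
    if keep_image(im, energy_per_atom_limits=(-5.0, -3.0), force_max=1.0)
]
```

In this sketch the second image is rejected on energy per atom and the third on force, so only the first survives. Leaving a limit at None disables that check, mirroring the None defaults in the signature.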

Module contents