asimtools.asimmodules.data package
asimtools.asimmodules.data.collect_images module
Collects images into a single file
author: mkphuthi@github.com
- asimtools.asimmodules.data.collect_images.collect_images(images: Dict, out_format: str = 'extxyz', fnames: Sequence[str] = ['output_images.xyz'], splits: Sequence[float] | None = (1,), shuffle: bool = True, sort_by_energy_per_atom: bool = False, remove_duplicates: bool | None = False, rename_keys: Dict | None = None, energy_per_atom_limits: Sequence[float] | None = None, force_max: float | None = None, stress_limits: Sequence[float] | None = None, properties: tuple | None = ('energy', 'forces', 'stress')) → Dict
Collects images into one file or database and can optionally split them into multiple files or databases, e.g. to build separate datasets for ML tasks.
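This page does not say exactly how `splits` and `shuffle` partition the images. The sketch below is a minimal, hypothetical illustration (`split_by_ratios` is not part of asimtools). It assumes the ratios are normalized by their sum and that chunks are taken contiguously after an optional shuffle. The library's actual normalization and rounding may differ.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_by_ratios(
    items: Sequence[T],
    splits: Sequence[float] = (1,),
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> List[List[T]]:
    """Hypothetical helper: partition items into contiguous chunks whose
    sizes follow the given ratios (assumed to be normalized by their sum)."""
    pool = list(items)
    if shuffle:
        # Shuffle a copy so the caller's sequence is left untouched
        random.Random(seed).shuffle(pool)
    total = sum(splits)
    n = len(pool)
    # Round cumulative boundaries (not individual sizes) so that every
    # item lands in exactly one chunk
    bounds = [0]
    acc = 0.0
    for ratio in splits:
        acc += ratio
        bounds.append(round(n * acc / total))
    bounds[-1] = n  # guard against floating-point drift at the end
    return [pool[bounds[i]:bounds[i + 1]] for i in range(len(splits))]
```

For example, `split_by_ratios(range(10), splits=(0.8, 0.1, 0.1), shuffle=False)` yields chunks of 8, 1 and 1 items, which could be written to three files such as train, validation and test sets.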
- Parameters:
images (Dict) – Images specification, see asimtools.utils.get_images()
out_format (str, optional) – output file format, defaults to ‘extxyz’
fnames (Sequence[str], optional) – output file name(s), defaults to [‘output_images.xyz’]
splits (Optional[Sequence[float]], optional) – ratios for splitting the data into subsets, defaults to (1,)
shuffle (bool, optional) – shuffle images before splitting, defaults to True
sort_by_energy_per_atom (bool, optional) – sort images by energy per atom before splitting, defaults to False
remove_duplicates (Optional[bool], optional) – whether to search for and remove duplicates with pymatgen.analysis.structure_matcher.StructureMatcher. This is quite slow, defaults to False
rename_keys (Optional[Dict], optional) – keys to rename on writing to the output file, defaults to None
energy_per_atom_limits (Optional[Sequence[float]], optional) – energy limits for filtering images, defaults to None
force_max (Optional[float], optional) – maximum force for filtering images, defaults to None
stress_limits (Optional[Sequence[float]], optional) – stress limits for filtering images, defaults to None
properties (Optional[Sequence], optional) – which of energy, forces and stress to consider, defaults to (‘energy’, ‘forces’, ‘stress’)
- Returns:
results
- Return type:
Dict
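The filtering parameters can be read as per-image acceptance tests applied before the images are written out. The sketch below is a minimal, hypothetical illustration, not the asimtools implementation. It assumes a stand-in `Frame` record instead of real images, that `force_max` bounds the largest per-atom force norm, that `stress_limits` bounds every stress component, and that sorting by energy per atom is ascending. `Frame`, `keep_frame` and `filter_and_sort` are invented names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Frame:
    """Hypothetical stand-in for a labelled image (e.g. an ASE Atoms object)."""
    energy: float
    forces: List[Vec3]        # one (fx, fy, fz) per atom
    stress: Sequence[float]   # e.g. 6 Voigt components


def keep_frame(
    frame: Frame,
    energy_per_atom_limits: Optional[Sequence[float]] = None,
    force_max: Optional[float] = None,
    stress_limits: Optional[Sequence[float]] = None,
) -> bool:
    """Return True if the frame passes every filter that is set (None disables it)."""
    natoms = len(frame.forces)  # assumption: one force vector per atom
    if energy_per_atom_limits is not None:
        lo, hi = energy_per_atom_limits
        if not lo <= frame.energy / natoms <= hi:
            return False
    if force_max is not None:
        # Assumption: compare the largest per-atom force norm with force_max
        largest = max(math.sqrt(fx * fx + fy * fy + fz * fz)
                      for fx, fy, fz in frame.forces)
        if largest > force_max:
            return False
    if stress_limits is not None:
        # Assumption: every stress component must lie within [lo, hi]
        lo, hi = stress_limits
        if any(not lo <= s <= hi for s in frame.stress):
            return False
    return True


def filter_and_sort(
    frames: Sequence[Frame],
    sort_by_energy_per_atom: bool = False,
    **limits,
) -> List[Frame]:
    """Keep frames passing all filters, optionally sorted by ascending energy per atom."""
    kept = [f for f in frames if keep_frame(f, **limits)]
    if sort_by_energy_per_atom:
        kept.sort(key=lambda f: f.energy / len(f.forces))
    return kept
```

Because a limit left at `None` simply skips its test, the default call keeps every image, consistent with the defaults listed above.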