gnrs.deduplication


gnrs.deduplication.deduplication_task

This module provides the DuplicateRemovalTask class for removing duplicate crystal structures from the pool.

This source code is licensed under the BSD-3-Clause license found in the LICENSE file in the root directory of this source tree.

class gnrs.deduplication.deduplication_task.DuplicateRemovalTask[source]

Bases: TaskABC

Task for removing duplicate crystal structures from the pool.

TASK_NAME = 'duplicate_removal'
__init__(comm, config, gnrs_info)[source]

Initialize the duplicate removal task.

Parameters:
  • comm (mpi4py.MPI.Comm) – MPI communicator

  • config (dict) – Config dictionary

  • gnrs_info (dict) – Genarris info dictionary

Return type:

None

initialize()[source]

Initialize the duplicate removal task.

Return type:

None

pack_settings()[source]

Pack settings needed for duplicate removal.

Returns:

Task settings dictionary

Return type:

dict

print_settings(task_set)[source]

Print task settings in a formatted table.

Parameters:

task_set (dict) – Task settings dictionary

Return type:

None

create_folders()[source]

Create output folders.

Return type:

None

perform_task(task_set)[source]

Execute the duplicate removal task.

This method:

  1. Groups structures by space group.

  2. Removes duplicates from each space group in parallel.

  3. Scatters the deduplicated pool back across ranks.

Parameters:

task_set (dict) – Task settings dictionary

Return type:

None
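
The final step, redistributing the deduplicated pool across ranks, can be illustrated with a small serial sketch. The helper name partition_pool and the round-robin assignment are assumptions made for illustration only; the source does not say how perform_task splits the pool.

```python
def partition_pool(pool, n_ranks):
    """Split {name: struct} into n_ranks dicts, assigning entries round-robin.

    Hypothetical helper: the real task may distribute structures differently.
    """
    chunks = [{} for _ in range(n_ranks)]
    for i, (name, struct) in enumerate(pool.items()):
        chunks[i % n_ranks][name] = struct
    return chunks


# Toy pool with integers standing in for ase.Atoms objects.
chunks = partition_pool({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}, n_ranks=2)

# With mpi4py, the root rank could hand one chunk to each rank via
# comm.scatter(chunks, root=0); that call is omitted so the sketch runs serially.
```

The list returned here has one entry per rank, which is the shape mpi4py's scatter expects on the root.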

collect_results()[source]

Write surviving structures to disk.

Return type:

None

analyze()[source]

Analyze the results of the task.

Return type:

None

finalize()[source]

Finalize the task and update runtime settings.

Return type:

None

gnrs.deduplication.dedup

Duplicate structure removal using pymatgen StructureMatcher.

Structures are grouped by space group for computational efficiency, then within each space group a reference structure is broadcast to all MPI ranks and compared against the remaining candidates in parallel.

gnrs.deduplication.dedup.group_by_spg(structs)[source]

Group structures by space group.

Parameters:

structs (dict[str, ase.atoms.Atoms]) – {name: Atoms}.

Returns:

{spg: {name: Atoms, …}}.

Return type:

dict
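
The nested mapping returned by group_by_spg can be pictured with the minimal sketch below. It is not the library's implementation: the helper group_by_key and the get_spg callback are hypothetical, and plain dicts stand in for ase.atoms.Atoms objects so that the example runs without ASE.

```python
from collections import defaultdict


def group_by_key(structs, get_spg):
    """Group {name: struct} into {spg: {name: struct}}.

    get_spg is a hypothetical callback returning the space group of a
    structure; how Genarris determines the space group is not shown here.
    """
    groups = defaultdict(dict)
    for name, struct in structs.items():
        groups[get_spg(struct)][name] = struct
    return dict(groups)


# Toy stand-ins: plain dicts with an "spg" entry instead of Atoms objects.
toy_structs = {
    "s1": {"spg": 14},
    "s2": {"spg": 2},
    "s3": {"spg": 14},
}
grouped = group_by_key(toy_structs, get_spg=lambda s: s["spg"])
```

Grouping first means that only structures sharing a space group are ever compared with each other.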

gnrs.deduplication.dedup.dedup_group(pool, matcher, spg, energy_key)[source]

Remove duplicates from a space group in parallel.

  1. Master picks one candidate from the pool and broadcasts its pymatgen Structure to all ranks.

  2. The remaining structures are scattered across ranks; each rank tests matcher.fit(candidate, local_struct) in parallel.

  3. Match results are gathered. Master collects the duplicate cluster, selects the best structure, and removes the duplicates from the pool. This repeats until the pool is empty.

Parameters:
  • pool (dict[str, ase.atoms.Atoms]) – {name: Atoms} — all structures in this space group (only meaningful on master; ignored on workers).

  • matcher (pymatgen.analysis.structure_matcher.StructureMatcher) – Configured StructureMatcher instance.

  • spg (int | None) – Space group.

  • energy_key (str | None) – Key in Atoms.info for energy, or None.

Returns:

{name: Atoms}, the unique structures in the space group.

Return type:

dict[str, ase.atoms.Atoms]
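
The candidate-and-cluster loop described for dedup_group can be sketched serially as follows. This is a simplified illustration under stated assumptions, not the library's code: the MPI broadcast, scatter, and gather steps are replaced by a plain loop, is_match is a hypothetical stand-in for matcher.fit, and "best" is assumed to mean lowest energy when an energy accessor is given.

```python
def dedup_serial(pool, is_match, energy_of=None):
    """Greedy clustering: keep one survivor per cluster of matching structures.

    pool      -- {name: struct}
    is_match  -- hypothetical stand-in for matcher.fit(candidate, struct)
    energy_of -- optional callable returning a structure's energy (assumption:
                 the lowest-energy member of a cluster is the "best" one)
    """
    remaining = dict(pool)  # copy so the caller's pool is left untouched
    unique = {}
    while remaining:
        # Pick one candidate (here: the first remaining entry).
        cand_name, cand = next(iter(remaining.items()))
        # Collect the candidate plus every structure that matches it.
        cluster = {cand_name: cand}
        for name, struct in remaining.items():
            if name != cand_name and is_match(cand, struct):
                cluster[name] = struct
        # Choose the survivor: lowest energy if available, else the candidate.
        if energy_of is not None:
            best = min(cluster, key=lambda n: energy_of(cluster[n]))
        else:
            best = cand_name
        unique[best] = cluster[best]
        # Remove the whole cluster, then repeat until the pool is empty.
        for name in cluster:
            del remaining[name]
    return unique


# Toy pool: floats stand in for structures and double as their energies.
toy_energies = {"a": 1.00, "b": 1.01, "c": 2.00, "d": 0.99}
survivors = dedup_serial(
    toy_energies,
    is_match=lambda x, y: abs(x - y) < 0.05,
    energy_of=lambda x: x,
)
```

In the parallel version described above, the inner comparison loop is the part that is spread across ranks, while candidate selection and cluster removal stay on the master.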