mispr.gaussian.utilities package¶
Submodules¶
mispr.gaussian.utilities.db_utilities module¶
Define db utility functions.
- mispr.gaussian.utilities.db_utilities.get_db(input_db=None)[source]¶
Helper function to create a GaussianCalcDb instance from a file or a dict.
mispr.gaussian.utilities.dbdoc module¶
Define functions for cleaning up JSON documents.
- mispr.gaussian.utilities.dbdoc.add_solvent_to_prop_dict(prop_dict, solvent_gaussian_inputs, solvent_properties)[source]¶
Add solvent properties to a property dictionary (e.g. BDE, BE, etc.).
- Parameters:¶
- prop_dict : dict¶
Property dictionary.
- solvent_gaussian_inputs : str¶
Gaussian input parameters corresponding to the implicit solvent model used in the Gaussian calculations, e.g. “(Solvent=TetraHydroFuran)”.
- solvent_properties : dict¶
Additional solvent input parameters used in the Gaussian calculations; e.g., {“EPS”:12}.
- Returns:¶
Property dictionary with solvent properties added.
- Return type:¶
dict
mispr.gaussian.utilities.files module¶
Define utility functions for handling files and paths.
- mispr.gaussian.utilities.files.bibtex_parser(bib_file, working_dir)[source]¶
Parse a BibTeX file and return a dictionary of the entries.
- mispr.gaussian.utilities.files.recursive_relative_to_absolute_path(operand, working_dir)[source]¶
Recursively convert relative paths to absolute paths.
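The recursion can be illustrated with a minimal sketch. This is not the mispr implementation: the name `to_absolute_paths` is hypothetical, and the sketch assumes that a string is treated as a path only when it names an existing file under `working_dir`; every other value is returned unchanged.

```python
import os
import tempfile


def to_absolute_paths(operand, working_dir):
    """Hypothetical helper: make relative file paths in nested data absolute."""
    if isinstance(operand, str):
        candidate = os.path.join(working_dir, operand)
        # Assumption: a string counts as a path only if it names an existing file.
        if not os.path.isabs(operand) and os.path.isfile(candidate):
            return os.path.abspath(candidate)
        return operand
    if isinstance(operand, dict):
        return {k: to_absolute_paths(v, working_dir) for k, v in operand.items()}
    if isinstance(operand, list):
        return [to_absolute_paths(v, working_dir) for v in operand]
    return operand  # numbers, None, etc. pass through untouched


with tempfile.TemporaryDirectory() as tmp:
    # Create an empty file so that "mol.xyz" resolves to a real path.
    open(os.path.join(tmp, "mol.xyz"), "w").close()
    settings = {"mol": "mol.xyz", "tags": ["opt", "mol.xyz"], "charge": 0}
    resolved = to_absolute_paths(settings, tmp)
    expected_path = os.path.abspath(os.path.join(tmp, "mol.xyz"))
```

In this sketch, strings that do not correspond to a file (such as "opt") are left as they are, so keyword values are not mistaken for paths.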
mispr.gaussian.utilities.fw_utilities module¶
Define utility functions for modifying workflow settings. Based on atomate powerups.
- mispr.gaussian.utilities.fw_utilities.add_common_mods(workflow, fw_mods=None)[source]¶
Wrapper function to add common modifications to a workflow.
- Parameters:¶
- workflow : Workflow¶
The workflow to modify.
- fw_mods : dict, optional¶
A dictionary of modifications to be applied to the workflow; supported ones are CONTROL_WORKER, MODIFY_QUEUE_PARAMETERS, REPLACE_RUNTASK, and RUN_FAKE_GAUSSIAN (see the docstring of each function for more details); values of the dictionary are the inputs to the corresponding function.
- Returns:¶
The modified workflow.
- Return type:¶
Workflow
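One way to picture the wrapper is as a dispatch table from modification keys to functions. The sketch below is an illustration only: the workflow is a plain dict, the `toy_*` functions are hypothetical stand-ins with simplified signatures, only two of the four supported keys are modeled, and it assumes each value in `fw_mods` is a dict of keyword arguments for the corresponding function.

```python
def toy_control_worker(workflow, fworker=None, category=None):
    """Stand-in: record the requested worker settings on a toy workflow."""
    return {**workflow, "fworker": fworker, "category": category}


def toy_modify_queue_parameters(workflow, walltime=None, queue=None):
    """Stand-in: record the requested queue settings on a toy workflow."""
    return {**workflow, "walltime": walltime, "queue": queue}


# Map each supported key to the function that handles it.
MODIFIERS = {
    "CONTROL_WORKER": toy_control_worker,
    "MODIFY_QUEUE_PARAMETERS": toy_modify_queue_parameters,
}


def apply_common_mods(workflow, fw_mods=None):
    """Apply every requested modification, passing its dict as keyword arguments."""
    for key, kwargs in (fw_mods or {}).items():
        if key not in MODIFIERS:
            raise KeyError(f"Unsupported modification: {key}")
        workflow = MODIFIERS[key](workflow, **kwargs)
    return workflow


fw_mods = {
    "CONTROL_WORKER": {"fworker": "cluster_a"},
    "MODIFY_QUEUE_PARAMETERS": {"walltime": "24:00:00", "queue": "long"},
}
modified = apply_common_mods({"name": "opt_freq"}, fw_mods)
```

The worker name "cluster_a" and the queue name "long" are made-up values; in practice they would need to match the local FireWorks configuration.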
- mispr.gaussian.utilities.fw_utilities.control_worker(workflow, firework_substring=None, task_substring=None, fworker=None, category=None)[source]¶
Modify the Firework’s fworker name and category in a workflow. Can be used when running workflows on multiple workers at the same time to specify which worker/machine to use.
- Parameters:¶
- workflow : Workflow¶
The workflow to control.
- firework_substring : str, optional¶
A substring to search for in the Firework names to exclude certain fireworks.
- task_substring : str, optional¶
A substring to search for in the Firetask names to exclude certain Firetasks.
- fworker : str, optional¶
The name of the fworker to use for the Firework; should be consistent with the one specified in the FireWorker (my_fworker.yaml file).
- category : str, optional¶
The category to be assigned for the Firework; should be consistent with the one specified in the FireWorker (my_fworker.yaml file).
- Returns:¶
The modified workflow with the specified fworker and/or category.
- Return type:¶
Workflow
- mispr.gaussian.utilities.fw_utilities.get_list_fireworks_and_tasks(workflow, firework_substring=None, task_substring=None)[source]¶
Return a list of (firework_index, task_index) tuples for all fireworks and tasks in a workflow.
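The shape of the returned list can be sketched with plain dictionaries standing in for Fireworks. This is a hedged illustration, not the mispr code: the function name is hypothetical, the task names "WriteInput" and "ProcessRun" are invented, and the sketch assumes the substrings act as "keep only matching names" filters in the style of atomate powerups. The exact filtering semantics should be checked against the source.

```python
def list_fireworks_and_tasks(fireworks, firework_substring=None, task_substring=None):
    """Return (firework_index, task_index) pairs that pass the name filters."""
    pairs = []
    for fw_index, fw in enumerate(fireworks):
        # Assumption: a substring, when given, selects fireworks containing it.
        if firework_substring and firework_substring not in fw["name"]:
            continue
        for task_index, task_name in enumerate(fw["tasks"]):
            if task_substring and task_substring not in task_name:
                continue
            pairs.append((fw_index, task_index))
    return pairs


fireworks = [
    {"name": "optimization", "tasks": ["WriteInput", "RunGaussianCustodian", "ProcessRun"]},
    {"name": "frequency", "tasks": ["WriteInput", "RunGaussianCustodian"]},
]
all_pairs = list_fireworks_and_tasks(fireworks)
run_pairs = list_fireworks_and_tasks(fireworks, task_substring="RunGaussian")
freq_pairs = list_fireworks_and_tasks(fireworks, firework_substring="freq")
```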
- mispr.gaussian.utilities.fw_utilities.modify_queue_parameters(workflow, ntasks_per_node=None, walltime=None, queue=None, pre_rocket=None, other_parameters=None, firework_substring=None, task_substring=None)[source]¶
Modify the default Firework’s queue parameters in a workflow. Default ones are specified in the my_qadapter.yaml file. Helpful when different workflows require different computational resources (e.g. number of CPUs or memory).
- Parameters:¶
- workflow : Workflow¶
The workflow to modify.
- ntasks_per_node : int, optional¶
The number of tasks to run on each node.
- walltime : str, optional¶
The walltime for the job.
- queue : str, optional¶
The queue/partition to run the job on.
- pre_rocket : str, optional¶
The pre-rocket command to run before the job.
- other_parameters : dict, optional¶
Other parameters to be added to the queueadapter.
- firework_substring : str, optional¶
A substring to search for in the Firework names to exclude certain fireworks.
- task_substring : str, optional¶
A substring to search for in the Firetask names to exclude certain Firetasks.
- Returns:¶
The modified workflow with the specified queue parameters.
- Return type:¶
Workflow
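As a rough sketch of what overriding queue parameters amounts to, the example below merges overrides into the `_queueadapter` entry of a Firework-like spec, which is the spec key FireWorks reserves for per-Firework queue settings. The function name is hypothetical, the spec is a plain dict, and the "mem" entry is an invented example of an extra parameter; the names that are actually valid depend on the queue adapter template in use.

```python
def override_queue_parameters(spec, ntasks_per_node=None, walltime=None,
                              queue=None, pre_rocket=None, other_parameters=None):
    """Return a copy of a Firework-like spec with queue overrides merged in."""
    overrides = {
        "ntasks_per_node": ntasks_per_node,
        "walltime": walltime,
        "queue": queue,
        "pre_rocket": pre_rocket,
    }
    # Keep only the settings the caller actually provided.
    overrides = {k: v for k, v in overrides.items() if v is not None}
    overrides.update(other_parameters or {})
    new_spec = dict(spec)
    # Existing queue settings survive unless explicitly overridden.
    new_spec["_queueadapter"] = {**spec.get("_queueadapter", {}), **overrides}
    return new_spec


spec = {"_queueadapter": {"walltime": "01:00:00", "queue": "debug"}}
updated = override_queue_parameters(
    spec, ntasks_per_node=24, walltime="48:00:00", other_parameters={"mem": "64G"}
)
```

Settings left as None do not touch the defaults, which mirrors the idea that only explicitly requested resources deviate from my_qadapter.yaml.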
- mispr.gaussian.utilities.fw_utilities.replace_runtask(workflow, firework_substring=None, operation='remove_custodian', additional_params=None)[source]¶
Replace all tasks whose names contain RunGaussian (e.g. RunGaussianDirect) with RunGaussianCustodian, or vice versa.
- Parameters:¶
- workflow : Workflow¶
The workflow to modify.
- firework_substring : str, optional¶
A substring to search for in the Firework names to exclude certain fireworks.
- operation : str, optional¶
The operation to perform on the Firetask; supported ones are remove_custodian and use_custodian.
- additional_params : dict, optional¶
Additional parameters to be added to the new Firetask that are not included in the original Firetask; refer to the corresponding Firetask documentation for supported parameters.
- Returns:¶
The workflow with the replaced run Firetasks.
- Return type:¶
Workflow
- mispr.gaussian.utilities.fw_utilities.run_fake_gaussian(workflow, ref_dirs, input_files=None, tolerance=None)[source]¶
Replace all tasks whose names contain RunGaussian (i.e. RunGaussianDirect, RunGaussianCustodian) with RunGaussianFake, which runs a fake Gaussian job: Gaussian is not actually run; existing inputs and outputs are copied instead. Useful for testing purposes.
- Parameters:¶
- workflow : Workflow¶
The workflow to modify.
- ref_dirs : list¶
A list of directories containing the reference calculations for the fake Gaussian job (e.g. [‘home/opt’, ‘home/freq’]).
- input_files : list, optional¶
A list of input files for the fake Gaussian job; order should match that in ref_dirs; e.g. [“opt.com”, “freq.com”].
- tolerance : float, optional¶
The tolerance for the comparison of the provided input file with the existing one.
- Returns:¶
The workflow with the replaced run Firetasks.
- Return type:¶
Workflow
mispr.gaussian.utilities.gout module¶
Define functions for processing different Gaussian output formats.
- mispr.gaussian.utilities.gout.process_run(operation_type, run, input_file=None, **kwargs)[source]¶
Process a Gaussian run and return a dictionary of the results. Used for creating db documents and/or JSON files.
- Parameters:¶
- operation_type : str¶
Type of operation to be performed; supported ones are:
  - get_from_gout: Get data from a GaussianOutput object as defined in pymatgen.io.gaussian.
  - get_from_gout_file: Get data from a Gaussian output file.
  - get_from_run_dict: Get data from a Gaussian output dictionary.
  - get_from_run_id: Retrieve data from the database using a run id, e.g. “5e3737d9da0b1cbbd5d556f7”.
  - get_from_run_query: Retrieve data from the database using query criteria, e.g. {"smiles": "COCCOC", "type": "freq", "functional": "B3LYP", "basis": "6-31+G*", "phase": "gas", ...}
- run : GaussianOutput, str, dict¶
The actual Gaussian run; type depends on the operation_type.
- input_file : str, optional¶
The input file for the run; used for adding Gaussian input parameters to the final Gaussian dictionary. If not specified, these parameters are taken from the run itself, but in that case the input_parameters usually specified at the end of the Gaussian input file will not be saved, since they are not easily retrieved from the Gaussian output file.
- kwargs : keyword arguments¶
Additional keyword arguments for the operation, namely working_dir and db.
- Returns:¶
Cleaned up Gaussian output dictionary.
- Return type:¶
dict
mispr.gaussian.utilities.inputs module¶
Define functions for handling Gaussian inputs.
- mispr.gaussian.utilities.inputs.handle_gaussian_inputs(gaussian_inputs, solvent_gaussian_inputs=None, solvent_properties=None)[source]¶
Wrapper function to clean up/modify the Gaussian input parameters for one or more jobs in a workflow. Checks for implicit solvent parameters and adds missing keywords for a given job.
- Parameters:¶
- gaussian_inputs : dict¶
Dictionary of dictionaries of Gaussian inputs, e.g.
{"opt": {opt_gaussian_inputs}, "freq": {freq_gaussian_inputs}}
- solvent_gaussian_inputs : str, optional¶
String of Gaussian inputs for the solvent, e.g.
"(Solvent=Generic, Read)"
- solvent_properties : dict, optional¶
Dictionary of solvent properties, e.g.
{"Eps": 4.33, "EpsInf": 1.69}
- Returns:¶
Dictionary of dictionaries of reformatted Gaussian inputs.
- Return type:¶
dict
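To make the solvent handling more concrete, here is a minimal sketch of how implicit solvent settings might be merged into each job's inputs. It is an assumption-laden illustration rather than the mispr logic: the function name is hypothetical, the per-job dictionaries are assumed to use the `route_parameters` and `input_parameters` keys familiar from pymatgen's GaussianInput, and the solvent string is assumed to be stored under the `SCRF` route keyword.

```python
import copy


def add_solvent_keywords(gaussian_inputs, solvent_gaussian_inputs=None,
                         solvent_properties=None):
    """Return a copy of the per-job inputs with solvent settings merged in."""
    updated = copy.deepcopy(gaussian_inputs)  # leave the caller's dict untouched
    for job_inputs in updated.values():
        if solvent_gaussian_inputs:
            # Assumption: the solvent model is requested via the SCRF keyword.
            job_inputs.setdefault("route_parameters", {})["SCRF"] = solvent_gaussian_inputs
        if solvent_properties:
            # Assumption: extra solvent values go at the end of the input file.
            job_inputs.setdefault("input_parameters", {}).update(solvent_properties)
    return updated


gaussian_inputs = {
    "opt": {"functional": "B3LYP", "basis_set": "6-31+G*",
            "route_parameters": {"Opt": None}},
    "freq": {"functional": "B3LYP", "basis_set": "6-31+G*",
             "route_parameters": {"Freq": None}},
}
with_solvent = add_solvent_keywords(
    gaussian_inputs, "(Solvent=Generic, Read)", {"Eps": 4.33, "EpsInf": 1.69}
)
```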
mispr.gaussian.utilities.metadata module¶
Define functions for creating db schema.
- mispr.gaussian.utilities.metadata.get_chem_schema(mol)[source]¶
Return a dictionary of chemical schema for a given molecule to use in building db documents or json file.
- mispr.gaussian.utilities.metadata.get_job_name(mol, name)[source]¶
Append a molecule label to the name of a workflow for easy monitoring and identification.
- mispr.gaussian.utilities.metadata.get_mol_formula(mol)[source]¶
Get the alphabetical molecular formula for a molecule.
mispr.gaussian.utilities.misc module¶
Define miscellaneous functions useful in many of the mispr levels.
- mispr.gaussian.utilities.misc.pass_gout_dict(fw_spec, key)[source]¶
Helper function used in the Gaussian Fireworks to pass Gaussian output dictionaries from one task to the other, while checking that the criteria for starting the following task are met (e.g. normal termination of the previous job, lack of imaginary frequencies, etc.).
- mispr.gaussian.utilities.misc.recursive_compare_dicts(dict1, dict2, dict1_name, dict2_name, path='')[source]¶
Recursively compare two dictionaries and return the differences.
- Parameters:¶
- dict1 : dict¶
First dictionary to compare.
- dict2 : dict¶
Second dictionary to compare.
- dict1_name : str¶
Name of the first dictionary (for messages on the differences).
- dict2_name : str¶
Name of the second dictionary (for messages on the differences).
- path : str, optional¶
Used internally to keep track of the keys in nested dicts; should be “” at the top level.
- Returns:¶
Differences between the two dictionaries (if any).
- Return type:¶
str
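A recursive comparison of this kind can be sketched as follows. The function name `compare_dicts` and the wording of the messages are hypothetical; the sketch only mirrors the documented signature (two dictionaries, their display names, and an internal `path`) and the string return type.

```python
def compare_dicts(dict1, dict2, dict1_name, dict2_name, path=""):
    """Return a text report of the differences between two nested dicts."""
    report = ""
    for key in dict1:
        key_path = f"{path}[{key}]"
        if key not in dict2:
            report += f"Key {dict1_name}{key_path} not in {dict2_name}\n"
        elif isinstance(dict1[key], dict) and isinstance(dict2[key], dict):
            # Descend into nested dicts, extending the path with the current key.
            report += compare_dicts(dict1[key], dict2[key], dict1_name, dict2_name, key_path)
        elif dict1[key] != dict2[key]:
            report += (
                f"Value of {dict1_name}{key_path} ({dict1[key]}) differs from "
                f"{dict2_name}{key_path} ({dict2[key]})\n"
            )
    for key in dict2:
        if key not in dict1:
            report += f"Key {dict2_name}{path}[{key}] not in {dict1_name}\n"
    return report


reference = {"functional": "B3LYP", "route_parameters": {"Opt": None, "SCF": "tight"}}
candidate = {"functional": "PBE", "route_parameters": {"Opt": None}, "charge": 0}
differences = compare_dicts(reference, candidate, "reference", "candidate")
```

An empty string signals that no differences were found, which makes the result convenient to use in a truthiness check.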
- mispr.gaussian.utilities.misc.recursive_signature_remove(d)[source]¶
Recursively remove the signature “@” from a dictionary (e.g. those in the name of a module). Used when processing Gaussian runs before saving them to the db.
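The idea can be sketched under one assumption: that the "signature" refers to dictionary keys beginning with "@", such as the `@module` and `@class` entries that serialized pymatgen objects carry. The function name below is hypothetical, and handling of lists is an addition of this sketch.

```python
def strip_signature_keys(d):
    """Drop every key that starts with '@' from a nested structure."""
    if isinstance(d, dict):
        return {
            k: strip_signature_keys(v)
            for k, v in d.items()
            if not str(k).startswith("@")
        }
    if isinstance(d, list):
        # Assumption of this sketch: lists of dicts are cleaned as well.
        return [strip_signature_keys(v) for v in d]
    return d


run = {
    "@module": "pymatgen.core.structure",
    "@class": "Molecule",
    "charge": 0,
    "sites": [{"@module": "pymatgen.core.sites", "name": "C"}],
}
cleaned = strip_signature_keys(run)
```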
mispr.gaussian.utilities.mol module¶
Define functions for processing molecules.
- mispr.gaussian.utilities.mol.get_bond_order_str(mol)[source]¶
Find bond order as a string (“U”: unspecified, “S”: single, “D”: double, “T”: triple, “A”: aromatic) by iterating over the bonds of a molecule. The pymatgen mol is first converted to an OpenBabel mol so that OpenBabel can be used to find the bond order.
- mispr.gaussian.utilities.mol.label_atoms(mol)[source]¶
Get the SMILES representation of a molecule and label the atoms that appear in the SMILES string with the atom indexes as they appear in the molecule.
Helpful to know the atom indexes in the molecule without having to visualize it.
- mispr.gaussian.utilities.mol.perform_local_opt(mol, force_field='uff', steps=200)[source]¶
Perform a local optimization on the molecule using OpenBabel.
- mispr.gaussian.utilities.mol.process_mol(operation_type, mol, local_opt=False, **kwargs)[source]¶
Process a molecule. Used for handling different molecule formats provided to Gaussian workflows.
- Parameters:¶
- operation_type : str¶
Operation to perform for the molecule to process the input structure format. Supported commands:
  - get_from_mol: If the input is a pymatgen Molecule object.
  - get_from_file: If the input is any file format supported by OpenBabel and pymatgen.
  - get_from_gout_file: If the input is a Gaussian output file.
  - get_from_str: If the input is a string.
  - get_from_mol_db: If the input is an InChI representation of the molecule to be used to query the database.
  - get_from_gout: If the input is a pymatgen GaussianOutput object.
  - get_from_run_dict: If the input is a GaussianOutput dictionary.
  - get_from_run_id: If the input is a MongoDB document ID to be used to query the database.
  - get_from_run_query: If the input is a dictionary with criteria to search the database, e.g. {'inchi': inchi, 'type': type, 'functional': func, ...}
  - get_from_pubchem: If the input is a common name for the molecule to be used in searching the PubChem database.
  - derive_molecule: Used for deriving a molecule by attaching a functional group at a site; the corresponding mol input should be a dictionary, e.g. {'operation_type': <mol_operation_type for the base structure>, 'mol': <base_mol>, 'func_grp': func_group_name, ...}
  - link_molecules: Used for linking two structures by forming a bond at specific sites; the corresponding mol input should be a dictionary, e.g. {'operation_type': ['get_from_file', 'get_from_mol_db'], 'mol': ['mol1.xyz', 'mol_inchi'], 'index': [3, 5], 'bond_order': 1}
- mol : Molecule, str, GaussianOutput, dict¶
Source of the structure, e.g. file path if mol_operation_type is specified as get_from_file, InChI string if mol_operation_type is specified as get_from_mol_db, etc.
- local_opt : bool, optional¶
Whether to perform local optimization on the input structure using OpenBabel; defaults to False.
- **kwargs¶
Keyword arguments:
  - working_dir.
  - db.
  - str_type (format of the string if operation_type = get_from_str, e.g. smi or any other format supported by OpenBabel).
  - force_field (force field to use for local optimization if local_opt is True): gaff, ghemical, mmff94, mmff94s, and uff.
  - steps (number of steps for local optimization if local_opt is True).
  - charge.
  - abbreviation (abbreviation to be used for the molecule when downloading it from the PubChem database; defaults to mol).
- Returns:¶
pymatgen Molecule object.
- Return type:¶
Molecule
mispr.gaussian.utilities.rdkit module¶
Define functions for processing RDKit molecules.
- mispr.gaussian.utilities.rdkit.calc_energy(rdkit_mol, maxIters=200)[source]¶
Perform local optimization on an RDKit Mol object and calculate its energy using UFF.
- mispr.gaussian.utilities.rdkit.draw_rdkit_mol(rdkit_mol, filename='mol.png', working_dir=None)[source]¶
Draw the 2D structure of a molecule and save it to a file.
- mispr.gaussian.utilities.rdkit.draw_rdkit_mol_with_highlighted_bonds(rdkit_mol, bonds, filename='mol.png', colors=None, working_dir=None)[source]¶
Draw the 2D structure of a molecule and highlight the bonds specified by the user.
- Parameters:¶
- rdkit_mol : Mol¶
RDKit Mol object.
- bonds : list¶
List of tuples of indexes of atoms forming a bond to highlight; e.g. [(3, 11), (5, 13)] to highlight the bonds between sites 3 and 11 and sites 5 and 13.
- filename : str, optional¶
Name of the file to save the image to; defaults to “mol.png”.
- colors : list, optional¶
List of colors to use for highlighting the bonds; colors should be provided in rgb format, e.g. (0.0, 0.0, 0.0) for black; if not provided or number of colors provided is less than number of bonds to highlight, will randomly generate colors.
- working_dir : str, optional¶
Directory to save the image to; defaults to current working directory.
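The fallback for the colors argument can be sketched without RDKit. The function name and the `seed` parameter are hypothetical, and the sketch follows a literal reading of the description above: when no colors are given, or fewer colors than bonds, a full set of random RGB tuples is generated. Whether the actual function keeps any of the supplied colors in that case is not stated here.

```python
import random


def choose_highlight_colors(bonds, colors=None, seed=None):
    """Pair each bond with an RGB tuple, generating random colors if needed."""
    rng = random.Random(seed)  # seed is only for reproducible illustrations
    if not colors or len(colors) < len(bonds):
        # Too few colors supplied: generate one random RGB tuple per bond.
        colors = [(rng.random(), rng.random(), rng.random()) for _ in bonds]
    return dict(zip(bonds, colors))


bonds = [(3, 11), (5, 13)]
explicit = choose_highlight_colors(bonds, colors=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
generated = choose_highlight_colors(bonds, colors=[(0.0, 0.0, 0.0)], seed=0)
```

Each RGB component produced this way lies between 0 and 1, matching the format described for the colors parameter.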
- mispr.gaussian.utilities.rdkit.get_rdkit_mol(mol, sanitize=True, remove_h=False)[source]¶
Convert a pymatgen mol object to an RDKit rdmol object. Uses RDKit to perform the conversion <http://rdkit.org>. Accounts for aromaticity.