Tutorials¶
qmpy is a package containing many tools for computational materials science.
The qmpy package comes bundled with two executable scripts, qmpy and oqmd. qmpy is a simple bash script that starts an interactive python environment and imports qmpy:
$ qmpy
>>>
To write your own python script that utilizes qmpy functionality, simply start with an import like:
from qmpy import *
and all of the commands shown below should work.
Database entries¶
Once the database is installed, you can query it very flexibly and easily. In this section we will explore the data structure of entries in the OQMD and provide several examples of how to make queries. For deeper understanding of how django models work, you should check out the (excellent) django documentation.
First, lets look at how to access an entry from the database. As an example,
lets pull up an entry for an Element
:
>>> fe = Element.objects.get(symbol='Fe')
>>> fe
<Element: Fe>
Django models have a number of fields that can be accessed directly once the database entry has been loaded. For example, with an element you can:
>>> fe.symbol
u'Fe'
>>> fe.z
26L
For a complete list of the model attributes that are stored in the database,
refer to the documentation for the model you are interested in, in this case
Element
.
Note
When strings are returned, they are returned as unicode strings, (indicated by the “u” preceding the string) integers are returned as long integers (indicated by the trailing “L”). For most purposes this makes no difference, as these data types will generally behave exactly as expected, i.e.:
>>> fe.z == 26
True
>>> fe.symbol == "Fe"
True
In addition to data attributes like these, django models have relationships to
other models. In qmpy there are two flavors of relationships: one-to-many and
many-to-many. An example of a one-to-many relationship would be the
relationship between an Element
and an Atom
.
There are many atoms which are a given element, but each atom is only
one element. In the case of Fe:
>>> fe.atom_set
<django.db.models.fields.related.RelatedManager object at 0x7f0997fa2690>
>>> fe.atom_set.count()
127585
>>> atom = fe.atom_set.first()
>>> print atom
<Atom: Fe @ 0.000000 0.000000 0.000000>
>>> atom.element
<Element: Fe>
A RelatedManager
is an object that deals with obtaining other django models
that are related to the main object. We can use the objects.count() method to
fidn the number of Atom
objects that are Fe, and find ~125,000.
To obtain one of these atoms, we use the objects.first() method, which
simply returns the first Atom
which is Fe. Much more functionality of Managers
and RelatedManagers will be shown throughout this tutorial and in the examples,
but for a proper understanding you should refer to the django docs.
An example of a many-to-many relationship would be the relationship between an
Element
and a Composition
. A composition (e.g. Fe2O3)
can contain many elements (Fe and O), and an element can be a part of many
compositions (Fe3O4 and FeO as well). This is the nature of a many-to-many
relationship. In the case of Fe:
>>> fe.composition_set.count()
10882
>>> comp = fe.composition_set.filter(ntypes=2)[0]
>>> comp
<Composition: AcFe>
>>> comp.element_set.all()
[<Element: Ac>, <Element: Fe>]
In this example we have taken our base object (the Element
) and filtered its
composition_set for Composition
objects which meet the condition ntypes=2 (i.e.
there are two elements in the composition), and taken the first such
Composition
(index 0 in the QuerySet
that is returned).
Creating a structure¶
There are several ways to create a structure, but we will start with reading in a POSCAR:
>>> s = io.read(INSTALL_PATH+'/io/files/POSCAR_BCC')
Once you have the Structure
object, the important features of a crystal
structure can be accessed readily.:
>>> s.cell
array([[ 3., 0., 0.],
[ 0., 3., 0.],
[ 0., 0., 3.]])
>>> s.lat_params
[3.0, 3.0, 3.0, 90.0, 90.0, 90.0]
>>> s.atoms
[<Atom: Cu @ 0.000000 0.000000 0.000000>, <Atom: Cu @ 0.500000 0.500000
0.500000>]
>>> s.composition
<Composition: Cu>
>>> s.volume
27.0
You can also readily construct a Structure
from scratch, from the lattice
vectors and the atom positions.:
>>> s2 = Structure.create([3,3,3], [('Cu', [0,0,0]),
('Cu', [0.5,0.5,0.5])])
>>> s2 == s
True
First Principles Calculations¶
At this time qmpy only supports automation of calculations using the Vienna Ab
Initio Simulation Package (VASP). The reading and creation of these
calculations are handled by the Calculation
model.
To read in an existing calculation:
>>> path = '/analysis/vasp/files/normal/fine_relax/'
>>> calc = Calculation.read(INSTALL_PATH+path)
qmpy will search the directory for an OUTCAR or OUTCAR.gz file. If it is able to find an OUTCAR, it will attempt to read the file. Next, we will demonstrate several of the key attributes you may wish to access:
>>> calc.energy # the final total energy
-12.416926999999999
>>> calc.energies # the total energies of each step
array([-12.415236 -12.416596, -12.416927])
>>> calc.volume # the output volume
77.787375068172508
>>> calc.input
<Structure: SrGe2>
>>> calc.output
<Structure: SrGe2>
>>> from pprint import pprint
>>> pprint(calc.settings)
{'algo': 'fast',
'ediff': 0.0001,
'encut': 373.0,
'epsilon': 1.0,
'ibrion': 1,
'idipol': 0,
'isif': 3,
'ismear': 1,
'ispin': 1,
'istart': 0,
'lcharg': True,
'ldipol': False,
'lorbit': 0,
'lreal': False,
'lvtot': False,
'lwave': False,
'nbands': 24,
'nelm': 60,
'nelmin': 5,
'nsw': 40,
'potentials': [{'name': 'Ge_d', 'paw': True, 'us': False, 'xc': 'PBE'},
{'name': 'Sr_sv', 'paw': True, 'us': False, 'xc': 'PBE'}],
'potim': 0.5,
'prec': 'med',
'pstress': 0.0,
'sigma': 0.2}
Searching for models¶
The documentation for Django for searching for models is ver complete, and should be taken as the ultimate reference for searching for models in qmpy, but a basic overview is provided here. Specific methods used in this section are the ‘filter’, ‘exclude’ and ‘get’ methods.
Searches using filter will, as the name suggests, filter for database entries that have the specified properties. Similarly, exclude will return entries that do NOT have the specified properties. Both of these methods will return a QuerySet containing any objects that meet the requirements of your search. Filter and exclude calls can be chained together to create relatively complex queries. Get is slightly different in that it returns ONLY ONE object, and returns the object itself, rather than a QuerySet of such objects.
A much more complete documentation of all things query related can be found at the django docs
Searching for entries based on stability¶
Formation energies are stored as FormationEnergy instances, which are associated with an :mod:~qmpy.Entry and a :mod:~qmpy.Calculation. Knowing this, we can search for stable Entries using:
>>> stable = Entry.objects.filter(formationenergy__stability__lt=0)
>>> stable.count()
18150
The same concept can be applied to searching for other quantities, as long as you can relate them to a FormationEnergy by “__” constructions:
>>> stable_comps = Composition.objects.filter(formationenergy__stability__lt=0)
>>> stable_comps.count()
18150
>>> s = Structure.objects.filter(calculated__formationenergy__stability__lt=0)
>>> s.count()
18150
Adding other search criteria lets you explore a little more:
>>> stable = FormationEnergy.objects.filter(stability__lt=0)
>>> # Find the number of stable compounds containing O
>>> stable.filter(composition__element_set='O').count()
4017
>>> # or Fe. Is it surprising that this is smaller?
>>> stable.filter(composition__element_set='Fe').count()
653
>>> # Meta data is also a possiblity. How many stable compounds were found
>>> # in the course of calculations for a particular project?
>>> stable.filter(entry__project_set='prototypes').count()
3119
Searching for entries based on composition¶
You can find compositions in a few ways using filters and excludes. If you want a specific region of phase space (including related subspaces):
>>> elts = [ 'Fe', 'Li', 'O' ]
>>> others = Element.objects.exclude(symbol__in=elts)
>>> comps = Composition.objects.exclude(element_set=others)
This searchs finds every composition that doesn’t have any elements that aren’t in the region of phase space requested. For binary or ternary phase spaces it can be more efficient to search permutations of sub-spaces:
>>> comps = Composition.objects.filter(ntypes=3)
>>> for e in elts:
>>> comps = comps.filter(element_set=e)
>>> for e in elts:
>>> e_comps = Composition.objects.filter(element_set=e, ntypes=1)
>>> comps |= e_comps
>>> for e1, e2 in itertools.combinations(elts, r=2):
>>> bin_comps = Composition.objects.filter(element_set=e1)
>>> bin_comps = bin_comps.filter(element_set=e2, ntypes=2)
>>> comps |= bin_comps
>>> comps.distinct().count()
However, for larger regions of phase space (4 or 5 or more) the number of subqueries of the second approach rapidly becomes more expensive than the single, more complicated query of the first.
Advanced searching¶
The previous section covered some pretty basic methods for searching for database entries. In this section we will look at some more advanced concepts, specifically: aggregation, Q() and F().
Aggregation and Annotation¶
Django also provides methods for adding searchable and retrievable fields to
database entries within the SQL command. For example, suppose we want to search
for Entry
objects that have more than 5 structures associated with
them. We can accomplish this using aggregation to add a temporary field to
Entries that is the number of structures. This temporary field can then be
filtered on and returned just like a normal field:
>>> from django.db.models import Count, Average
>>> entries = Entry.objects.annotate(n_structures=Count('structure'))
>>> many_structs = entries.filter(n_structures__gt=5)
>>> data = many_structs.values('id', 'path', 'n_structures')
>>> print data[0]
{'id': 1562L,
'n_structures': 6,
'path': u'/home/oqmd/libraries/prototypes/elements/C19/Ne'}
We can also do a variety of aggregation methods, also in SQL. Say we want to know the average number of elements (averaged over all compositions in the OQMD, or over ICSD structures):
>>> Composition.objects.aggregate(Avg('ntypes'))
{'ntypes__avg': 2.9965}
>>> Composition.objects.filter(entry__meta_data__value='icsd').\
aggregate(Avg('ntypes'))
{'ntypes__avg': 2.9811}
Complex searches¶
When using sequential filter and exclude commands, these commands are related by AND operators. In order to execute more complicated queries which use a series of AND and OR operators, we must use the Q() method.:
>>> from django.db.models import Q
>>> query = Q(meta_data__value='prototype') | Q(meta_data__value='icsd')
>>> Entry.objects.filter(query)
The Q() query method also supports negation, by calling ~Q():
>>> Composition.objects.filter(Q(ntypes=2) &
~Q(element_set='O'))
Using qmpy to manage a high-throughput calculation project¶
qmpy
can manage almost every aspect of a high-throughput materials screening
project. In this section we will walk through all of the necessary steps to
get a new project off the ground in a new installation. This tutorial assumes
that you have a functional installation of qmpy
, as well as a working
database (does not need to have a copy of the OQMD included, a blank slate
database will work fine).
In this tutorial we will undertake to explore a wide range of (CH3NH3)PbI3 perovskites, which have recently recieved much attention as high-efficiency photovoltaic cells. We will run through the process of setting up compute resources, constructing simple lattice decorations of this structure, as well as some not-so-simple decorations, running the calculations, and finally evaluating the relative stability of the resulting energies.
Setting up computational resources¶
We will begin with configuring qmpy to find the right computational resources
to be able to perform calculations using qmpy. First, you need to establish the
computational resources that are available to it: a Account
, which is a
Host
paired with a User
. A Account
is then granted
access to at least one Allocation
.
Warning
In the current implementation, qmpy is only able to run calculations on clusters that utilize PBS/torque and Maui.
Lets start by configuring a host. Lets assume that we are running qmpy from one cluster, but we want to do our calculations on another machine. Lets edit the resources/hosts.yml file (inside the qmpy installation):
# hosts.yml
bigcluster:
binaries: {vasp_53: /usr/local/bin/vasp_53}
check_queue: /usr/local/maui/bin/showq
hostname: big.cluster.edu
ip_address: XXX.XX.XX.XXX
nodes: 100
ppn: 12
sub_script: /usr/local/bin/qsub
sub_text: bigcluster.q
walltime: 691200
In this configuration file, we are creating a list of dictionaries. Each outer
loop entry creates a Host
.
Important attributes to be aware of:
binaries is a dictionary of binary name -> binary path pairs. By default, qmpy calculations try to run vasp_53, and expects a path to a vasp 5.3.3 binary, but should be reliable for most vasp 5.X versions.
sub_script is the path on the cluster to pbs qsub command
check_queue is the path on the cluster to maui’s showq command
sub_text is the name of a file in qmpy/configuration/qfiles. An example qfile template is shown below.
ppn = # of processors / node
nodes = # of nodes you want qmpy to be able to use (does not need to match the number of nodes on the cluster)
Note
Note that these files must be parseable YAML format files. See this guide <http://www.yaml.org/start.html> _ for an introduction to the YAML format.
The queue files that qmpy submits must be tailored both to the job being submitted and the cluster being submitted to. To that end, qmpy uses a simple template system, with the most basic template (that should work in many cases) is:
#!/bin/bash
#MSUB -l nodes={nodes}:ppn={ppn}
#MSUB -l walltime={walltime}
#MSUB -N {name}
#MSUB -o jobout.txt
#MSUB -e joberr.txt
#MSUB -A {key}
cd $PBS_O_WORKDIR
NPROCS=`wc -l < $PBS_NODEFILE`
#running on {host}
{header}
{mpi} {binary} {pipes}
{footer}
The “{variable}” construction is used to automatically replace strings based on
the calculation requirements. Some variables (nodes, ppn, name, walltime, host)
are fairly constant for the host. Others, like “key”, specify which allocation
to charge the hours to, which is defined by the Allocation
associated with the calculation. Finally, the rest of the variables are set
based on the requirements of the calculation. For general calculations, the
header variable is used to unzip any zipped files in the folder (e.g., CHGCAR),
the for parallel calculations the mpi variable contains the mpirun + argumnts
command and for serial calculations it is left blank. The binary variable will
be replaced with the path to the binary, as defined in the hosts.yml file. The
pipes variable will pipe stdout and stderr, which by default is always to
stdout.txt and stderr.txt. Finally, footer zips the CHGCAR, OUTCAR, PROCAR and
ELFCAR, if they exist.
and resources/hosts.yml:
# users.yml
oqmdrunner:
bigcluster: {run_path: /home/oqmdrunner, username: oqmdrunner}
oqmduser:
bigcluster: {run_path: /home/oqmduser/rundir, username: oqmduser}
smallcluster: {run_path: /home/oqmduser/rundir, username: oqmduser}
loop entry creates a User
with that name, and each cluster listed
then creates an Account
for that User
, with username
(given by username) and configured to run calculations in run_path. Here we are
assuming that a second non-compute cluster, smallcluster, was also defined in
hosts.yml.
Warning
Passwordless ssh must be configured to each account (either as a user:host
pair, or host-based authentication) from the account you are running qmpy
on. The Account
class has a create_passwordless_ssh method
that can set this up for you, however, this process can be unreliable, so if
it fails you will need to sort those problems out for yourself.
Next, we configure our allocations, using the allocations.yml file:
# allocations.yml
bigcluster:
host: bigcluster
users: [oqmdrunner, oqmduser]
key: alloc1234
An allocation takes a host, a list of users and an optional key. The host and list of users are used to determine who is allowed to run calculations on the allocation, while the key is used to identify the allocation to moab, if that feature is implemented.
Finally, we can create a Project
, defined in projects.yml:
# projects.yml
example:
allocations: [bigcluster]
priority: 0
users: [oqmdrunner]
We title the project “example”, (since it is just an example) and then define the lists of allocations that this project is authorized to use, and the users that are associated with the project. In order to apply these changes, run:
>>> from qmpy import *
>>> sync_resources()
Working with Structures¶
If, for example, we say that we want to calculate i) a range of defects in a host matrix or ii) a wide range of compositions in a particular structure. It is possible to do this in the framework of qmpy, with minimal effort. Sadly, our starting point, the CH3NH3PbI3 (since CH3NH3 is methylammonium, let us call the compound MaPbI3 from now on) is not fully resolved in XRD. The best structure I can find only has the Pb, C, N and I sites determined. So, lets take this structure (reproduced here from Constantinos C. Stoumpos; et. al., Inorganic Chemistry 52 (2013) 9019-9038):
I C Pb N
1.0
8.849000 0.000000 0.000000
0.000000 8.849000 0.000000
0.000000 0.000000 12.642000
C I N Pb
4 12 4 4
direct
0.5000000000 0.0000000000 0.3520000000
0.0000000000 0.5000000000 0.3520000000
0.5000000000 0.0000000000 0.8520000000
0.0000000000 0.5000000000 0.8520000000
0.2858300000 0.2141700000 0.0046000000
0.7858300000 0.2858300000 0.0046000000
0.2141700000 0.7141700000 0.0046000000
0.7141700000 0.7858300000 0.0046000000
0.0000000000 0.0000000000 0.2472000000
0.5000000000 0.5000000000 0.2472000000
0.7141700000 0.2141700000 0.5046000000
0.2141700000 0.2858300000 0.5046000000
0.7858300000 0.7141700000 0.5046000000
0.2858300000 0.7858300000 0.5046000000
0.0000000000 0.0000000000 0.7472000000
0.5000000000 0.5000000000 0.7472000000
0.5000000000 0.0000000000 0.2420000000
0.0000000000 0.5000000000 0.2420000000
0.5000000000 0.0000000000 0.7420000000
0.0000000000 0.5000000000 0.7420000000
0.0000000000 0.0000000000 0.0000000000
0.5000000000 0.5000000000 0.0000000000
0.0000000000 0.0000000000 0.5000000000
0.5000000000 0.5000000000 0.5000000000
Now, to populate this structure with hydrogen atoms, lets write a small script to add atoms to the C and N sites. We can see from the POSCAR that the C-N pairs are always oriented along the z-direction, so we will add 3 hydrogen atoms “above” each C and “below” each N.:
>>> n_h_bond = 1.01 #A
>>> c_h_bond = 1.09 #A
>>>
>>> s = io.read('POSCAR') # just reading in original structure
>>> for atom in s:
if atom.element.symbol == 'C':
x = np.sin(np.pi/4)*c_h_bond
x2 = np.sin(np.pi/4)*x
ref = atom.cart_coord
s.add_atom(Atom.create('H', s.get_coord(ref+[x, 0.0, x2])))
s.add_atom(Atom.create('H', s.get_coord(ref+[-x2, -x2, x2])))
s.add_atom(Atom.create('H', s.get_coord(ref+[-x2, x2, x2])))
elif atom.element.symbol == 'N':
x = np.sin(np.pi/4)*c_h_bond
x2 = np.sin(np.pi/4)*x
# Note the angles of the N's attached hydrogens are rotated
# relative to the C's attached hydrogens.
s.add_atom(Atom.create('H', s.get_coord(ref+[-x, 0.0, -x2])))
s.add_atom(Atom.create('H', s.get_coord(ref+[x2, x2, -x2])))
s.add_atom(Atom.create('H', s.get_coord(ref+[x2, -x2, -x2])))
>>> io.poscar.write(s, 'POSCAR_mod')
You can verify that the created structure has the right bond lengths:
>>> s = io.read('POSCAR_mod')
>>> for a1, a2 in itertools.combinations(s.atoms, r=2):
d = s.get_distance(a1, a2)
elts = set([ a1.element.symbol, a2.element.symbol ])
if elts in [ set(['H', 'N']), set(['H', 'C']) ]:
if d < 1.5:
print elts, d
set([u'H', u'C']) 1.08999999999
set([u'H', u'C']) 1.09000000017
set([u'H', u'C']) 1.09000000017
set([u'H', u'C']) 1.09000000017
set([u'H', u'C']) 1.08999999999
set([u'H', u'C']) 1.09000000017
set([u'H', u'C']) 1.08999999999
set([u'H', u'C']) 1.09000000017
set([u'H', u'C']) 1.09000000017
set([u'H', u'C']) 1.09000000017
set([u'H', u'C']) 1.08999999999
set([u'H', u'C']) 1.09000000017
set([u'H', u'N']) 1.00999999987
set([u'H', u'N']) 1.00999999961
set([u'H', u'N']) 1.00999999961
set([u'H', u'N']) 1.00999999987
set([u'H', u'N']) 1.00999999961
set([u'H', u'N']) 1.00999999961
set([u'H', u'N']) 1.00999999987
set([u'H', u'N']) 1.00999999961
set([u'H', u'N']) 1.00999999961
set([u'H', u'N']) 1.00999999987
set([u'H', u'N']) 1.00999999961
set([u'H', u'N']) 1.00999999961
As you can see, all N-H and C-H bond lengths are correct.
Combinatorial site replacements¶
Now that we have a good structure, lets start replacing atoms! First, we need to specify the substitutions we will make. Lets start with simply isovalent substitutions for Pb and I. In MePbI3, Methylammonium is a +1 ion, Pb is +2 and I is -1. We will specify replacements in the form of lists of substitutions.:
>>> pb_sub = [ 'Pb', 'Sn',
'Be', 'Mg', 'Ca', 'Sr', 'Ba',
'V', 'Mn', 'Ni', 'Co', 'Fe', 'Zn',
'Cd', 'Eu' ,'Ru', 'Pd', 'Pt', 'As']
>>> i_sub = [ 'I', 'Br', 'Cl', 'F', 'H',
'N', 'S' ]
Next, we need to create a directory structure that is reasonable for
understanding where the structures are. This is primarily to make it easier
for people to find the calculations by hand; qmpy would be find with randomly
generated strings for folder names. We organize the structures into nested
directories of the form {anion}/{cation}. The substitutions are implemented by
the substitute()
method.:
>>> # first we write a little helper function for making folders
>>> def mkdir(path):
if not os.path.exists(path):
os.mkdir(path)
>>> # now we loop through antion/cation pairs and for each we:
>>> # create the POSCAR
>>> # create an Entry
>>> project = Project.get('example')
>>> for anion in i_sub:
mkdir(anion)
for cation in pb_sub:
new_dir = '%s/%s' % (anion, cation)
mkdir(new_dir)
new_struct = s.replace({'Pb':cation,
'I':anion})
io.poscar.write(s, new_dir+'/POSCAR')
entry = Entry.create(new_dir+'/POSCAR', projects=[project])
entry.save()
task = Task.create(entry, 'static')
Running the calculations¶
In order to run calculations with qmpy, we utilize a JobManager and TaskManager. The role of the TaskManager is to look at the calculations that have been requested (in the form of Tasks), and will attempt to fill the available resources with those calculations. The calculations are stored as Jobs, which tracks where the calculation is being run, and where it came from. This is where the JobManager takes over, checking all running Jobs, and if it is found to be done, it is collected. These managers can be accessed through the oqmd script (qmpy/bin/oqmd) either as a daemon process or, more safely, in a screen.:
$ oqmd jobserver -T run
and:
$ oqmd taskserver -T run
As Tasks and Jobs are processed, both of these methods will continuously report job submissions, task completions, as well as errors encountered.
Other examples¶
Now, we will run through a variety of problems, and demonstrate solutions which leverage different functionalities with qmpy.
Identification of FCC decortations¶
First, we will find all binary entries:
>>> binaries = Entry.objects.filter(ntypes=2)
>>> fcc = Composition.get('Cu').ground_state.structure
Then we run through every structure, and see if replacing all atoms with Cu results in a structure that is equivalent (on volume scaling) with FCC Cu.:
>>> fccs = []
>>> for entry in binaries[:100]:
>>> struct = entry.structure
>>> ## Construct a dictionary of elt:replacement_elt pairs
>>> ## where every replacement is Cu
>>> rdict = dict((k, 'Cu') for k in entry.comp)
>>> test = struct.substitute(rdict, rescale=False,
>>> in_place=False)
>>> if fcc == test: # simple equality testing will work
>>> fccs.append(entry)
Warning
If you actually try to run this on the entire database, understand that it will take a pretty long time! Each entry tested takes between 0.1 and 1 second, so it would take most of 24 hours to run through all 80,000+ binary database entries.
Deviation from Vagard’s Law¶
Use the element_groups dictionary to look get a list of all simple metals:
>>> elts = element_groups['simple-metals']
Then, for each pair of metals get all of the entries, and their volumes.:
>>> vols = {}
>>> for e1, e2 in itertools.combinations(elts, r=2):
>>> entries = Composition.get_list([e1, e2])
>>> for entry in entries:
>>> vol = entry.structure.volume_pa
>>> vols[entry.name] = vols.get(entry.name, []) + [vol]
Then, for every composition get the Vagard’s law volume.:
>>> vagards = {}
>>> for comp in vols:
>>> comp = parse_comp(comp) # returns a elt:amt dictionary
>>> uc = unit_comp(comp) # reduces to a total of 1 atom
>>> vvol = 0
>>> for elt, amt in uc.items():
>>> vvol += elements[elt]['volume']*amt
More things you can do: * Calculate an average error for each system * Make a scatter plot for a few binaries show in volume vs x * Look for cases where some are above and some are below * Get relaxed volume of all stable compounds * What about including the “nearly stable”
Compute all A-B bond lengths¶
This script loops over pairs of elements, gets the binary PhaseSpace, and then loops over structures on the convex hull.:
>>> for e1, e2 in itertools.combinations(elts, r=2):
>>> # do logic
>>> if e1 == e2:
>>> break
>>> ps = PhaseSpace([e1,e2])
>>> k = frozenset([e1,e2])
>>> bonds = []
>>> for p in ps.stable:
>>> s = p.calculation.input
>>> if s.ntypes < 2:
>>> continue
>>> dists = get_pair_distances(s, 10)
>>> bonds.append(min(dists[k]))
>>> print e1, e2, np.average(bonds), np.std(bonds)
Integrating with Sci-kit Learn¶
First, the necessary imports:
>>> from sklearn.svm import SVR
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> from sklearn import cross_validation
>>> from sklearn.decomposition import PCA
>>> from sklearn import linear_model
>>> from sklearn import grid_search
>>> from qmpy import *
As an example problem, we will build a very simple model that predicts the volume of a compound at a given composition based only on the composition:
>>> elts = Element.objects.filter(symbol__in=element_groups['simple-metals'])
>>> out_elts = Element.objects.exclude(symbol__in=element_groups['simple-metals'])
>>> models = Calculation.objects.filter(path__contains='icsd')
>>> models = models.filter(converged=True, label__in=['static', 'standard'])
>>> models = models.exclude(composition__element_set=out_elts)
>>> data = models.values_list('composition_id', 'output__volume_pa')
Now we will build a fit set and test set:
>>> y = []
>>> X = []
>>> for c, v in data:
>>> y.append(v)
>>> X.append(get_basic_composition_descriptors(c).values())
>>> X = np.array(X)
>>> y = np.array(y)
>>> x1, x2, y1, y2 = cross_validation.train_test_split(X, y, train_size=0.5)
Now to actually implement the model:
>>> clf = linear_model.LinearRegression()
>>> clf.fit(x1, y1)
>>> clf.score(x2, y2)
More examples¶
For more examples check in qmpy/examples for a variety of scripts that demonstrate these and other tasks.
Script |
Description |
---|---|
sklearn/build_model.py |
Build a volume model using sklearn to predict structure volume based only on composition. |
structures/modify_CNPbI3.py |
Take an incomplete structure CNPbI3, that is supposed to be (CH3NH3)PbI3, and add missing H atoms. |
structures/make_protos.py |
Take a base structure and create a variety of decorations. |
structures/find_layered.py |
Search through the database of structures for cases where the structure is not fully connected, i.e. layered structures. |
structures/bond_lengths.py |
Compute average A-B bond lengths for all A-B pairs, using all stable structures. |
database/discovery_rate.py |
Using reference information from the ICSD and measures of structural uniqueness find the nominal year of ‘discovery’ for all ICSD structures. |
database/precipitates.py |
Screen for good precipitate strengtheners. Creates the results used in <insert ref> |
database/Li-M-O_screen.py |
Screen for (MO_x).(LiO_2) compounds for hybrid Li-ion/Li-O2 electrode materials. Reproduces the results used in <insert ref>. |
database/oqmd_vs_expt.py |
Compare OQMD formation energies with experimental formation energies. |
analysis/chem_pots.py |
Plot of all modified chemical potentials. |
analysis/pot_fitting.py |
Fit chemical potentials. |
analysis/get_formations.py |
Calculation formation energies based on new chemical potentials. |