To import a database stored as an xyz
file, use
data = IPFitting.Data.read_xyz(fname)
See ?read_xyz
for further options. This will return a Vector{Dat}
where
each Dat
is a container storing the atomistic configurion (JuLIP.Atoms
),
the configtype
as well as the "DFT observations".
A basis is defined by
- choice of bond-length or bond-angle PIPs.
- space transform
- choice of cut-off
- body-order
- polynomial degree
For example, using bond-angle PIPs with Morse coordinates and a cosine cut-off to model Si, we first define a descriptor
r0 = rnn(:Si)
rcut = 2.5 * r0
desc = BondAngleDesc("exp(- (r/$r0 - 1.0))", CosCut(rcut-1, rcut))
We can then generate basis functions using nbpolys
, e.g.,
# body-order, descriptor, degree
B4 = nbpolys(4, desc, 8)
In practise, one would normally specify different cut-offs and space transforms
for different body-orders. Suppose these give descriptors D2, D3, D4
, then
a 4-body basis can be constructed via
B = [ nbpolys(2, D2, 14); nbpolys(3, D3, 11); nbpolys(4, D4, 8) ]
For more details and more complex basis sets, see below.
Once the dft dataset and the basis functions have been specified, the least-squares system matrix can be assembled. This can be very time-consuming for high body-orders, large basis sets and large data sets. Therefore this matrix is stored in a block format that allows us to later re-use it in a variety of different ways. This is done via
db = LsqDB(fname, configs, basis)
- The db is stored in two files:
fname_info.jld2
andfname_kron.h5
. In particular,fname
is the path + name of the db, but without ending. E.g,"~/scratch/nbodyips/W5Bdeg12env"
. configs
is aVector{Dat}
basis
is aVector{<: AbstractCalculator}
- The command
db = LsqDB(fname, configs, basis)
evaluates the basis functions, e.g.,energy(basis[i], configs[j].at)
for alli, j
, and stores these values which make up the lsq matrix.
To reload a pre-computed lsq system, use LsqDB(fname)
. To compute a lsq
system without storing it on disk, use LsqDB("", configs, basis)
, i.e.,
pass an empty string as the filename.
The main function to call is
lsqfit(db; kwargs...) -> IP, fitinfo
The system is solved via (variants of) the QR factorisation. See ?lsqfit
for details.
The output IP
of lsqfit
is a JuLIP.AbstractCalculator
which supports
energy, forces, virial, site_energies
. (todo: write more here, in
particular mention fast
)
there are two functions filter_basis
and filter_configs
that can be
used to choose a subset of the data and a subset of the basis. For example,
to take only 2B:
Ib2 = filter_basis(db, b -> (bodyorder(b) < 2))
See inline documentation for more details.
Suppose configs::Vector{Dat}
is a list of configurations, then we can
add fitting error information by calling
add_fits!(myIP, configs, fitkey="myIP")
This will evaluate all observations stored in configs with the new IP and store
them in configs[n].info[fitkey]["okey"]
. These observation values can then
be used to compute RMSE, produce scatter plots, etc.
This calculation can take a while. If myIP
has just been fitted using lsqfit
then there is a quicker way to generate the fitting errors, but this is not
yet implemented. TODO: implement this!