- page 1/102

Today I'll talk about my work as a Ph.D. student at the University of Torino.

I will deal with aspects of phenomenology from the point of view of string theory, on which I worked at both the theoretical and computational level.

- page 2/102

The plan is to first introduce a semi-phenomenological description of physics from a theory of strings. These tools will be used throughout the talk as the basis for the remaining topics.

I will then deal mainly with open strings and analyse correlators in the presence of twist and spin fields, seen as singular points in the string propagation.

From particle physics, I will then move to a string theory description of cosmology in the presence of time-dependent singularities in time-dependent orbifolds.

Finally I will focus on computational aspects of string compactifications.

- page 3/102

I will therefore start with the most geometrical aspects of the discussion, reviewing some basics and introducing a framework to deal with open string amplitudes in the presence of twist and spin fields.

- page 4/102

As usual in string theory, the starting point is the Polyakov action. We start directly from its superstring extension, where gamma is the worldsheet metric (for the moment generic) and the rhos are the 2-dimensional "gamma matrices".

- page 5/102

This description has many symmetries which, at the end of the day, provide the key to its success. The invariances of the action are in fact such that the theory is conformal, with a vanishing, traceless stress tensor.

- page 6/102

Using standard field theory methods, the stress-energy tensor of the superstring in D dimensions generates the well-known Virasoro algebra, extending the classical Witt algebra. At the quantum level it presents a central charge whose value depends on the dimension of the target space.

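
As a reminder (a standard result, not specific to this talk), the quantum Virasoro algebra with central charge c reads:

```latex
[L_m, L_n] = (m - n)\, L_{m+n} + \frac{c}{12}\, m\,(m^2 - 1)\, \delta_{m+n,0}
```

The classical Witt algebra is recovered for c = 0; for the superstring, each target space boson contributes 1 and each worldsheet fermion 1/2, giving c = 3D/2 for the matter sector.
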
- page 7/102

In order for the stress-energy tensor to be a conformal primary field, we need to introduce sets of fields known as conformal ghosts. These are conformal fields specified by their weight lambda, and they are introduced through a first-order Lagrangian theory.

- page 8/102

The central charge arising from the algebra of the modes of the ghost sector can then be used to compensate the superstring counterpart. Consistency of the theory thus fixes the target space to be a 10-dimensional spacetime.

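
Schematically (standard counting, added here for completeness): the matter sector contributes c = 3D/2, while the bc and beta-gamma ghosts contribute -26 and +11 respectively, so that

```latex
c_{\mathrm{tot}} = \frac{3D}{2} - 26 + 11 = \frac{3D}{2} - 15 = 0 \quad \Longrightarrow \quad D = 10 .
```
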
- page 9/102

The fact that "everyday" physics in accelerators is 4-dimensional is recovered through compactification. In simple words, we recover 4-dimensional physics at low energy by considering the 10-dimensional spacetime as a product of the usual 4-dimensional Minkowski spacetime and a 6-(real)-dimensional internal space.

The internal space has to obey stringent restrictions: in particular it has to be a compact manifold (to "hide" the extra dimensions), it has to break most of the supersymmetry present at high energy, and the gauge algebra arising from it has to contain the Standard Model algebra.

- page 10/102

These manifolds were first postulated by Eugenio Calabi and later proved to exist by Shing-Tung Yau, hence the name Calabi-Yau manifolds. In the case at hand, they are a specific class of 3-dimensional complex manifolds with SU(3) holonomy. They must be Ricci-flat or, equivalently, have a vanishing first Chern class.

- page 11/102

In general it is not easy to classify these manifolds (nor to compute, for instance, their metric, which is generally not known explicitly). However, we will see that the dimensions of the complex cohomology groups, known as Hodge numbers, play a strategic role.

- page 12/102

We solve the equations of motion and the boundary conditions for the string propagation (focusing on the bosonic part for the moment). The action in fact naturally introduces Neumann boundary conditions for the strings. Due to the equations of motion, the solution factorises into its holomorphic components (where z is the usual coordinate on the complex plane).

- page 13/102

Now consider for a second the simplest toroidal compactification of closed strings: suppose we can "hide" the last direction of the string on a circle.

As in ordinary quantum mechanics, this leads to momentum quantization (the momentum is an integer n in units of the inverse compactification radius).

A closed string can moreover wind an integer number of times around the cycle, introducing a "winding number" m.

In turn this is reflected in the spectrum of the theory. Differently from field theory, shrinking the radius does not decouple the modes, and the compact dimension remains. In other words, exchanging R and 1/R does not modify the theory.

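
For the closed bosonic string on a circle of radius R, this is manifest in the standard mass formula (with N and Ñ the left- and right-moving oscillator numbers):

```latex
M^2 = \frac{n^2}{R^2} + \frac{m^2 R^2}{\alpha'^2} + \frac{2}{\alpha'}\left(N + \tilde{N} - 2\right),
```

which is invariant under R → α'/R together with the exchange n ↔ m.
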
- page 14/102

This so-called T-duality can also be applied to open strings, with a different outcome.

Since open strings cannot wind around the compact dimension, the behaviour of the spectrum is as in field theory: the compact dimension decouples and the open string is constrained to a lower-dimensional surface.

This is the result of introducing Dirichlet boundary conditions on the T-dual coordinate, meaning that the endpoints of the string have to reside on the same surface.

The procedure can be applied to several dimensions, thus introducing surfaces on which the endpoints of the open string live, called D-branes.

- page 15/102

D-branes naturally introduce preferred directions of motion, thus breaking the D-dimensional Poincaré symmetry.

It is in fact possible to show that in D dimensions the open string sector at the massless level contains an Abelian gauge field. D-branes split its components into a lower-dimensional U(1) field on the D-brane and a set of spacetime scalars.

- page 16/102

In general we can have strings whose endpoints lie on the same D-brane, leading to U(1) gauge theories, as well as strings stretched across different branes.

Chan-Paton factors can be used to label the endpoints of the strings by their position on the D-branes.

It is then possible to show that when the D-branes are coincident, the states can be rearranged into enhanced gauge groups (for instance, unitary ones), thus creating non-Abelian gauge theories.

- page 17/102

However, physical constraints on the possible constructions pose serious questions about the disposition of the D-branes.

A quark can be modelled with a string stretched across two distant branes, but such a quark would have a mass proportional to the distance between the branes; chirality in particular would not be possible to define, while being one of the defining features of the Standard Model.

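
Indeed, the lightest mode of a string stretched between two branes separated by a distance d has a mass set by the string tension T = 1/(2πα') (a standard estimate):

```latex
M \sim T\, d = \frac{d}{2\pi\alpha'} ,
```

so only strings at (or near) intersections can provide light states.
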

This is where the possibility of putting D-branes at an angle with respect to each other becomes crucial.

While most of the modes will indeed gain a mass, the massless spectrum can support chiral states localised at the intersections.

- page 18/102

In this framework we consider models built with D6-branes. We embed them in a 6-dimensional internal space, without worrying about compactification for now.

In this scenario we study correlators of fields in the presence of twist fields, a set of conformal fields arising at the intersections. These are particularly interesting for computing, for instance, Yukawa couplings in string theory.

- page 19/102

In the path integral approach, correlators involving twist fields are dominated by the instanton contribution of the classical Euclidean action. We focus specifically on its computation in the case of three D-branes, which is the one needed for Yukawa couplings.

- page 20/102

The literature already considers D-branes embedded as lines in a factorised version of the internal space, where the possible relative rotations are Abelian in each plane of the space.

In this case the contribution of the string is proportional to the area of the triangle formed on the plane by the three D-branes, since the string is completely constrained to move on the plane.

- page 21/102

In what follows we consider a not completely factorised space and focus on its 4-dimensional sector.

After filling the physical 4-dimensional space, we study the remaining directions of the D-branes in the 4-dimensional internal space using a well-adapted frame of reference, which is in general rotated with respect to the global coordinates.

The rotation is not directly an SO(4) element, but an element of a Grassmannian: separately rotating the Dirichlet and Neumann components does not change anything (we just need to relabel the coordinates), as long as no reflections are involved.

The rotation involved is therefore a representative of a left equivalence class of possible rotations.

- page 22/102

Introducing the usual conformal transformations mapping the strip to the complex plane, the intersecting branes are mapped to the real axis. D-branes are therefore real intervals, delimited by their intersection points.

- page 23/102

In this description, the boundary conditions of the strings become a set of discontinuities on the real axis, one for each D-brane, with the embedding equation specifying the intersections.

- page 24/102

Instead of dealing directly with the discontinuities, it is more convenient to introduce an auxiliary function defined over the entire complex plane, obtained by gluing functions defined on the upper and lower half-planes along an arbitrary interval.

This "doubling trick" transforms the discontinuity into a monodromy factor, picked up when looping a base point through the glued interval.

- page 25/102

The fact that the rotations are non-Abelian leads to two different monodromies, for a base point starting in the upper or in the lower half-plane.

This is a general feature, but the case of 3 D-branes simplifies it enough.

- page 26/102

Dealing with these 4 x 4 matrices is delicate.

Using the known isomorphism between SO(4) and two copies of SU(2) (up to a Z2), the monodromy matrix can be cast into a tensor product of two 2 x 2 matrices. In this form the solution is therefore given by a 2-dimensional basis of holomorphic functions with three regular singular points.

- page 27/102

The usual SL(2) invariance allows us to fix the monodromies at 0, 1 and infinity.

The overall solution will then be a superposition of all possible bases. Given the previous properties, we therefore look for a basis of Gauss hypergeometric functions.

- page 28/102

Since we deal with rotations, the parameters of the hypergeometric functions involved are indeed connected to the rotation vectors. However, the choice is not unique: it is labelled by the periodicity of the rotations.

- page 29/102

The reason is a huge redundancy in the description: using the free parameters of the rotations we should in fact be able to fix all degrees of freedom in the solution, which at the moment is an infinite sum involving an infinite number of free parameters.

In fact we only showed that the rotation matrix is equivalent to a monodromy matrix, from which we can build an overparametrised solution.

Using contiguity relations we can then restrict the sum to independent functions.

Finally, requiring the Euclidean action to be finite restricts the sum to only two terms (which particular terms survive depends on the rotation vectors, but they are never more than two).

Fixing the intersection points eventually determines the free constants in the solution.

- page 30/102

The physical interpretation of the solution is finally straightforward in the Abelian case, where the action can be reduced to the sum of the areas of the internal triangles (this is a general result, valid even for a generic number of D-branes).

- page 31/102

In the non-Abelian case we considered, there is no simple way to write the action using global data. However, the contribution to the Euclidean action is larger than in the Abelian case: the strings are in fact no longer constrained to a plane and, in order to stretch across the boundaries, they have to form a small bump while detaching from the D-brane.

The Yukawa coupling in this case is therefore suppressed with respect to the Abelian case.

- page 32/102

We then turn our attention to fermions and the computation of correlators involving spin fields. Though ideally extending some of the same ideas, we abandon the intersecting D-brane scenario and introduce point-like defects on one boundary of the worldsheet, in such a way that the superstring undergoes a change of its boundary conditions when meeting a defect.

- page 33/102

It is possible to show that in this case the Hamiltonian of the theory is only piecewise conserved.

- page 34/102

Suppose now that we could expand the field on a basis of solutions to the boundary conditions and work, as before, on the entire complex plane.

- page 35/102

Ideally we would like to extract the modes in order to perform any computation of amplitudes. The operation that does so is defined through a dual basis, whose form is completely fixed by the original field (which we know) and by the requirement of time independence.

- page 36/102

The resulting algebra of the operators is in fact defined through this operation and is therefore time independent.

- page 37/102

Differently from the bosonic case, we focus on U(1) boundary-changing operators. The resulting monodromy on the complex plane is therefore a phase factor.

- page 38/102

As in the previous case, we can write a basis of solutions which incorporates the behaviour when looping around the point-like defects. Consequently we can also define a dual basis.

Both fields are defined up to integer factors, since we are still dealing with rotations.

- page 39/102

In order to compute amplitudes we then need to define the space on which the representation of the algebra acts. We define an excited vacuum, annihilated by positive frequency modes, and the lowest energy vacuum (from the strip definition).

- page 40/102

The vacua need to be consistent, leading to conditions labelled by an integer factor relating the basis of solutions to its dual (and ultimately to the algebra of operators). The vacuum must always be correctly normalised, and the description of physics using any two of the vacuum definitions should be consistently equivalent.

- page 41/102

To avoid having overlapping in- and out-annihilators, the label L must vanish.

- page 42/102

In this framework, the stress-energy tensor displays, as expected, a time dependence due to the presence of the point-like defects. Specifically, it shows that at each defect we have a primary boundary-changing operator (whose weight depends on the monodromy) which creates the excited vacuum from the invariant vacuum. This is by all means an excited spin field.

Finally (and definitely fascinating), the stress-energy tensor obeys the canonical OPE, that is, the theory is still conformal (even though there is a time dependence).

- page 43/102

In formulae, the excited vacuum used in computations is thus created by a radially ordered product of the excited spin fields hidden in the defects.

- page 44/102

We are therefore in a position to compute correlators involving such spin fields (although, since we cannot compute the normalisation, we can only compute quantities not involving it). For instance, we reproduce the known results of bosonization.

Moreover, since we have complete control over the algebra of the fermionic fields, we can also compute any correlator involving both spin and matter fields.

- page 45/102

We therefore showed that semi-phenomenological models need the ability to compute correlators involving twist and spin fields.

We then introduced a framework to compute the instanton contribution to the correlators using intersecting D-branes, and we showed how to compute correlators in the fermionic case involving spin fields as point-like defects on the string worldsheet.

The question would now be how to extend this to non-Abelian spin fields and, most importantly, to twist fields, for which there is no framework such as bosonization.

- page 46/102

After considering defects and singular points in particle physics, I will deal with time-dependent singularities in cosmology.

- page 47/102

The reason is that, as string theory is considered a theory of everything, its phenomenological description should include both the strong and electroweak forces as well as gravity.

- page 48/102

In particular, on the gravity side, we would like to have a better view of the cosmological implications of string theory.

- page 49/102

For instance, we could try to study Big Bang models to gain some insight beyond what field theory offers.

- page 50/102

One way to do so is to build toy models of singularities in time, in which the singular point exists at one specific moment rather than place.

- page 51/102

A simple way to achieve this is to build toy models from time-dependent orbifolds, which can model singularities as their fixed points.

- page 52/102

In the past, people have already dealt with this problem, finding divergences in the computation of amplitudes. The presence of such divergences in N-point correlators is however usually attributed to a gravitational backreaction due to the exchange of gravitons.

- page 53/102

However, the 4-tachyon amplitude in string theory is divergent already in the open string sector at tree level: in other words, in genuine gauge theories.

The effective field theory interpretation would be a 4-point interaction of scalar fields (higher spins would only spoil the behaviour further).

- page 54/102

To investigate further, we consider the so-called Null Boost Orbifold. The construction starts from D-dimensional Minkowski spacetime through a change of coordinates.

- page 55/102

The orbifold is then built through the periodic identification of one coordinate along the direction of its Killing vector, which leads to momentum quantization.

- page 56/102

From these identifications we can build scalar wave functions obeying the standard equations of motion. Notice the behaviour in the time direction u, which already takes a peculiar form, and the presence of the quantized momentum in a strategic place.

- page 57/102

In order to introduce the divergence problem, we first consider a theory of scalar QED.

- page 58/102

When computing the interactions between the fields, the terms involved are entirely defined by two main integrals. It might not be immediately visible but, given the behaviour of the scalar functions, any vertex interaction with more than 3 fields diverges.

- page 59/102

The reason for the divergence is connected to the "strategically placed" quantized momentum. When all quantized momenta vanish, in the limit of small u (that is, near the singularity) the integrands develop isolated zeros preventing convergence. In fact, in this case even a distributional interpretation (not unlike the derivative of a delta function) fails.

- page 60/102

So far the situation is therefore somewhat troublesome.

Moreover, obvious ways to regularise the theory do not work: for instance, adding a Wilson line does not cure the problem, since the divergences also involve neutral strings, which would not feel the regularisation.

In fact the problems seem to arise from the vanishing volume in phase space along the compact direction: the issue looks geometrical rather than strictly gravitational.

- page 61/102

Since field theory fails to give a reasonable value for amplitudes involving time-dependent singularities, we can therefore ask whether string theory can shed some light.

- page 62/102

The relevant divergent integrals are in fact present also in string theory. They arise from interactions of massive vertices like the one shown here.

These vertices are usually overlooked, as they do not in general play a relevant role at low energy. However, it is possible that near the singularity they actually give a contribution. At low energy these vertices enter the definition of contact terms in the effective field theory, which therefore does not account for them.

- page 63/102

In this sense even string theory cannot solve the problem. In other words, since the effective theory does not even exist, its high energy completion is not capable of providing a better description.

- page 64/102

There is however one geometric way out. Since the issues are related to a vanishing phase space volume, it is sufficient to add a non-compact direction to the orbifold, into which the particle is "free to escape".

- page 65/102

While the Generalised Null Boost Orbifold has basically the same definition through one of its Killing vectors, the presence of the additional direction acts differently on the definition of the scalar functions.

As you can see, the new time behaviour ensures better convergence properties, and the presence of the continuous momentum ensures that no isolated zeros are present at any time. Even in the worst case scenario, the resulting amplitudes would still have a distributional interpretation.

- page 66/102

We therefore showed that divergences in the simplest theories are present both in field theory and in string theory, and that in the presence of singularities the massive string states start to play a role.

The nature of the divergences is however due to vanishing volumes in phase space and cannot be classified as simply a gravitational backreaction. In fact, the introduction of "escape routes" for the fields grants a distributional interpretation of the amplitudes.

It is also possible to show that this is not restricted to "null boost" orbifolds: other kinds of orbifolds present the same issues.

- page 67/102

In summary, we showed that the divergences cannot be regarded as simply gravitational: even gauge theories (that is, the open sector of the string theory) present issues. Their nature is however subtle and connected to the interaction of massive string modes (or contact terms in the low energy formulation), which are not usually taken into account.

- page 68/102

We finally move to the last section. After the analysis of semi-phenomenological analytical models, we now consider a computational task related to the compactification of extra dimensions using machine learning.

- page 69/102

We focus on Calabi-Yau manifolds in three complex dimensions.

Due to their properties and symmetries, the relevant topological invariants are two Hodge numbers. As the number of possible compact Calabi-Yau 3-folds is huge, we focus on a subset.

- page 70/102

Specifically, we focus on manifolds built as intersections of hypersurfaces in projective spaces, that is, intersections of several homogeneous equations in the complex coordinates of the manifold.

As we are interested in studying these manifolds as topological spaces, for each equation and projective space we do not care about the coefficients, but only about the exponents, or better, the degree of the equation in a given coordinate.

- page 71/102

The intersections can be generalised to multiple projective spaces and equations, and the manifold can be characterised by a matrix containing the powers of the coordinates in each equation.

The problem in which we are interested is therefore to take the so-called "configuration matrix" of a manifold and predict the value of the Hodge numbers.

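
A standard example (not on the slides) is the quintic, a single hypersurface of degree 5 in P^4, whose configuration matrix and Hodge numbers are:

```latex
\left[\,\mathbb{P}^4 \,\middle|\; 5\,\right], \qquad h^{1,1} = 1, \qquad h^{2,1} = 101 .
```
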
- page 72/102

The real issue is now how to treat the configuration matrix and how to build such a map.

- page 73/102

We use a machine learning approach.

In very simple words, this means that we want to find a new representation of the input (possibly parametrized by some weights which we can tune and control) such that the predicted Hodge numbers are as close as possible to the correct result.

In this sense the machine has to learn some way to transform the input to get a result close to what the computer science literature calls the "ground truth".

The measure of proximity to, or distance from, the true value is called the "loss function" or "Lagrangian function" (with a slight abuse of naming conventions). The machine then learns to minimise this function (for instance using gradient descent methods, updating the parameters).

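
As a minimal sketch of this learning loop (a toy illustration only, not the actual model of the thesis): a single weight w is tuned by gradient descent so that the prediction w * x matches the "ground truth" y = 2x under a mean squared error loss.

```python
# Toy gradient descent: fit one weight w so that w * x approximates y = 2 * x.

def loss(w, data):
    # mean squared error between prediction and ground truth
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    # analytic d(loss)/dw
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

data = [(x, 2.0 * x) for x in range(1, 6)]
w = 0.0                  # initial guess for the weight
lr = 0.01                # learning rate
for _ in range(200):     # move the parameter along minus the gradient
    w -= lr * grad(w, data)

print(round(w, 3))       # converges towards 2.0
```

Replacing the single weight with millions of network parameters, and the analytic gradient with automatic differentiation, gives the training loop used in practice.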
- page 74/102

We thus exchange the difficult problem of finding an analytical solution for an optimisation problem (which does not mean "easy", but it is at least doable).

- page 75/102

In order to learn the best way of doing this, we can rely on a vast computer science literature and use large physics datasets containing many samples from which to infer a structure.

- page 76/102

In this sense the approach can merge techniques from physics, mathematics and computer science, benefiting from advancements in all fields.

- page 77/102

The approach can furthermore provide a good way to analyse data, infer structure and advance hypotheses which could end up overlooked using traditional brute force algorithms.

In this case we focus on the prediction of two Hodge numbers with very different distributions and ranges. The data we consider were computed in the 80s using what was then top of the class computing power at CERN, with a huge effort by the string theory community.

In this sense, Complete Intersection Calabi-Yau manifolds are a good starting point for investigating the application of machine learning techniques, because they are well studied and characterised.

- page 78/102

The dataset we use contains fewer than 10000 manifolds (in machine learning terms it is still small).

From these we remove product spaces (recognisable by the block diagonal form of their configuration matrix) and very high values of the Hodge numbers, to avoid learning "extremal configurations". Mind that we only remove them from the training data, which the machine actually uses to learn.

In this sense we are simply not showing the machine "extremal" configurations, in an attempt to push the application as far as possible: should the machine learn a good representation, it will automatically be capable of handling those configurations as well, without a human manually feeding them.

We then define three separate folds: the largest contains the training data used by the machine to adjust the parametrisation, 10% of the data is used for intermediate evaluation of the process, and the last subset is used to give the final predictions. Differently from the validation set, the test set has never been seen by the machine and can therefore reliably test the generalisation ability of the algorithm.

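
The three folds can be sketched as follows (a generic illustration; the 80/10/10 proportions and the function name are assumptions of the example, not the exact numbers of the thesis):

```python
import random

def three_fold_split(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then cut into training / validation / test folds."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    test = shuffled[:n_test]                # never seen during training
    val = shuffled[n_test:n_test + n_val]   # intermediate evaluation
    train = shuffled[n_test + n_val:]       # used to fit the parameters
    return train, val, test

train, val, test = three_fold_split(list(range(1000)))
print(len(train), len(val), len(test))   # 800 100 100
```

The single up-front shuffle guarantees that the three folds are disjoint, which is what makes the test score a fair measure of generalisation.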

Differently from some previous approaches, we treat this as a regression task, in the attempt to let the machine learn a true map between the configuration matrix and the Hodge numbers (we can also discuss the classification approach, as it has some interesting applications itself).

- page 79/102

The pruned distribution of the Hodge numbers therefore presents fewer outliers than the initial dataset but, as you can see, we expect the result to be similar even without this procedure, since the number of outliers removed is small. This first analysis nevertheless proved helpful in obtaining higher results.

- page 80/102

The pipeline we adopt is the same used at the industrial level by companies and data scientists. We in fact rely heavily on data analysis to improve the output as much as possible.

- page 81/102

This can for instance be done by including additional information on top of the configuration matrix, that is, by feeding the machine variables which can be derived by hand: by definition they are redundant, but they can be used to learn a pattern more easily.

In fact, most of these features, such as the number of projective spaces or the number of equations in the matrix, are heavily correlated with the Hodge numbers.

Moreover, algorithms which produce a ranking of the variables, such as decision trees, show that such "engineered features" are much more important than the configuration matrix itself.

- page 82/102

Using the "engineered data", we now get to the choice of the algorithm. There is no general rule for this, even though there are good guidelines to follow.

- page 83/102

Though the approach is clearly "supervised", in the sense that the machine learns by approximating a known result, we also tried other approaches in an attempt to generate additional information which the machine could use.

The first is a clustering algorithm, intuitively used to look for a notion of "proximity" between the configuration matrices. This however did not play a role in the analysis.

The other is definitely more interesting and consists in finding a better representation of the configuration matrix using fewer components. The idea is therefore to "squeeze" or "concentrate" the information into a lower dimensional space (the matrices in our case have 180 components, so we aim for something less than that).

- page 84/102

For the predictions we first relied on traditional regression algorithms, such as linear models, support vector machines and boosted decision trees. I will not enter into the details and differences between the algorithms, but we can indeed discuss them.

- page 85/102

Let me however say a few words about a dimensionality reduction procedure
known as "principal component analysis" (PCA for short), since it proved
to be an important step in the analysis.

Suppose that we have a rectangular matrix (say, the number of samples in
the dataset times the number of components of the configuration matrix
once it has been flattened).

The idea of PCA is to project the data onto a lower-dimensional surface
where the variance is maximised, in order to retain as much information
as possible.

This is usually used to isolate a signal from a noisy background.
Thus, by keeping only the meaningful components of the matrix, we can
hope to help the algorithm.

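The projection step can be sketched in a few lines of pure Python; this is
a toy two-dimensional example (the real flattened matrices have 180
components), with the data chosen to lie exactly along one direction so
that a single principal component retains all the variance.

```python
# Minimal PCA sketch on 2-dimensional toy data: centre the samples,
# build the covariance matrix, and read off how much variance the top
# principal component captures.
from math import sqrt

# Toy samples lying exactly on a line.
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Centre the data.
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centered = [(x - mx, y - my) for x, y in data]

# 2x2 covariance matrix [[a, b], [b, c]].
a = sum(x * x for x, _ in centered) / n
b = sum(x * y for x, y in centered) / n
c = sum(y * y for _, y in centered) / n

# Eigenvalues of a symmetric 2x2 matrix in closed form.
mean = (a + c) / 2
disc = sqrt(((a - c) / 2) ** 2 + b ** 2)
lam_top, lam_bot = mean + disc, mean - disc

# Fraction of the total variance captured by the top component:
# this is the quantity PCA maximises when dropping dimensions.
explained = lam_top / (lam_top + lam_bot)
```

Here `explained` comes out to 1.0, since the toy points are collinear; on
real data one keeps as many components as needed to reach a target such
as the 99% of variance mentioned later.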
- page 86/102

Visually, PCA is used to isolate the eigenvalues and eigenvectors of the
covariance matrix (or the singular values of the data matrix) which do
not belong to the background.

From random matrix theory we know that the eigenvalues of the covariance
matrix of independently and identically distributed data (a Wishart
matrix) follow a Marchenko-Pastur distribution.

A matrix containing a signal is therefore recognised by the presence of
eigenvalues outside this probability distribution.
We can then simply keep the corresponding eigenvectors.

In our case this resulted in an improvement of the accuracy, obtained by
retaining fewer than half of the components of the matrix (corresponding
to 99% of the variance of the initial set).

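The filtering step can be sketched as follows, assuming unit-variance
noise; the spectrum and the sample/feature counts below are illustrative
placeholders, not values from the actual analysis.

```python
# Sketch of a Marchenko-Pastur filter: eigenvalues above the upper
# edge of the noise distribution are treated as signal.
from math import sqrt

def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    # Upper edge of the Marchenko-Pastur distribution for the
    # eigenvalues of the covariance matrix of i.i.d. data with
    # entry variance sigma2.
    q = n_features / n_samples
    return sigma2 * (1 + sqrt(q)) ** 2

def signal_eigenvalues(eigenvalues, n_samples, n_features):
    # Eigenvalues beyond the edge cannot be explained by pure noise:
    # we would keep the corresponding eigenvectors/components.
    edge = mp_upper_edge(n_samples, n_features)
    return [lam for lam in eigenvalues if lam > edge]

# Hypothetical spectrum and dataset shape (180 flattened components,
# a few thousand samples): the edge sits around 1.3 here, so only the
# two largest eigenvalues survive the cut.
spectrum = [0.5, 0.9, 1.1, 3.0, 12.0]
kept = signal_eigenvalues(spectrum, n_samples=7890, n_features=180)
```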
- page 87/102

As we can see, we used several algorithms to evaluate the procedure.

Previous approaches in the literature mainly relied on the direct
application of algorithms to the configuration matrix.
We extended this beyond the previously considered algorithms (mainly
support vector machines) to decision trees and linear models for
comparison.

- page 88/102

Techniques such as feature engineering and PCA provide a huge improvement
(even with less training data).
Let me for instance point out that even a simple linear regression
reaches the same level of accuracy previously attained by more complex
algorithms, with much less training data.

This ultimately cuts computational costs and complexity.

- page 89/102

This does not, however, exhaust the landscape of algorithms used in
machine learning.
In fact we also used neural network architectures.

They are a class of function approximators which use (some variant of)
gradient descent to optimise their weights.
Their layered structure is key to learning highly non-linear and
complicated functions.

We focused on two distinct architectures.

The older fully connected networks were employed in previous attempts at
predicting the Hodge numbers.
They rely on a series of matrix operations to create new outputs from the
previous layers.
In this sense the matrix W and the bias term b are the weights which need
to be updated.
Each node is connected to all the outputs of the previous layer, hence
the name fully connected (or densely connected).
Equivalently, this means that the matrix W has no vanishing entries.

This is however not sufficient to learn non-linear functions: an iterated
application of these linear maps would simply result in another linear
map.
We "break" linearity by introducing an "activation function" at each
layer.

The second architecture is called convolutional, from its iterated
application of "sliding window" functions (that is, convolutions) to the
layers.

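A fully connected layer with its activation can be written out in a few
lines; the weights and input below are arbitrary numbers for illustration,
and ReLU stands in for whichever activation the actual networks used.

```python
# One fully connected layer: y = act(W x + b), in pure Python.

def relu(v):
    # The activation "breaks" linearity between layers; without it,
    # stacking dense layers collapses to a single affine map.
    return [max(0.0, x) for x in v]

def dense(W, b, x):
    # A matrix-vector product plus a bias: the entries of W and b are
    # the weights updated by gradient descent.
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

# Arbitrary illustrative weights and input.
W1 = [[1.0, -1.0],
      [0.0, 1.0]]
b1 = [0.0, 0.0]
x = [2.0, 3.0]

hidden = relu(dense(W1, b1, x))  # dense output [-1, 3] -> [0.0, 3.0]
```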
- page 90/102

Convolutional networks have several advantages over a fully connected
approach.

Since the input in this case does not need to be flattened, convolutions
retain the notion of vicinity between cells in a grid (here we have an
example of a configuration matrix as seen by a convolutional neural
network).

Since they do not have one weight for each connection, they have a
smaller number of parameters to update (proportional to the size of the
window).
In our specific case we cut the number of parameters by more than one
order of magnitude with respect to fully connected networks.

Moreover, weights are shared by adjacent cells, meaning that if there is
a local structure to be inferred, convolutions are the natural way to
exploit it.

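The parameter-count argument can be made concrete with back-of-the-envelope
arithmetic; the layer sizes below (256 dense units, 32 filters of size
5x5) are hypothetical choices for illustration, not the exact architecture.

```python
# Parameter counts: dense layer vs convolutional layer.

def dense_params(n_inputs, n_units):
    # One weight per connection plus one bias per unit.
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, n_filters, in_channels=1):
    # Weights are shared across positions: only the sliding window
    # itself (plus one bias per filter) is learned, independently of
    # the input size.
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

flat_input = 180                             # 12 x 15 matrix, flattened
dense_layer = dense_params(flat_input, 256)  # 46336 parameters
conv_layer = conv_params(5, 5, 32)           # 832 parameters
ratio = dense_layer / conv_layer             # roughly 56x fewer weights
```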
- page 91/102

In this sense a convolutional architecture can isolate defining features
of the output and pass them to the following layer, as in the animation.

For instance, using a computer science analogy, this can be used to
classify objects given a picture: a convolutional neural network is
literally capable of isolating what makes a dog a dog and what
distinguishes it from a cat (even more specifically, it can separate a
Labrador from a Golden Retriever).

- page 92/102

This has in fact been used in computer vision tasks in recent years for
pattern recognition, object detection and spatial awareness tasks (for
instance, to isolate the foreground from the background).
In this sense it is the closest approximation of artificial intelligence
in supervised tasks.

- page 93/102

My contribution in this sense is inspired by deep learning research at
Google.
In recent years they were able to devise new architectures using
so-called "inception modules", in which different convolution operations
are used concurrently.
The architecture has better generalisation properties, since more
features can be detected and processed at the same time.

- page 94/102

In our case we decided to go for two concurrent convolutions: one
scanning each equation of the configuration matrix (the vertical kernel),
while a second scans the projective spaces (the horizontal kernel).

The layer structure is then concatenated until a single output is
produced (namely the Hodge number).

The idea is that this way the network can learn a relation between
projective spaces and equations and recombine them to find a new
representation.

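A pure-Python caricature of the two branches, assuming rows of the
configuration matrix correspond to projective spaces and columns to
equations; the kernel values here are fixed toy numbers, whereas in the
network they are learned, and each branch would be a full convolutional
stack rather than a single 1D pass.

```python
# Two concurrent scans of a configuration matrix: a horizontal kernel
# sliding along each row and a vertical kernel sliding along each
# column, with the resulting feature lists concatenated.

def conv1d(seq, kernel):
    # Valid-mode 1D convolution (really a sliding dot product).
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

matrix = [[1, 0, 2, 1],
          [0, 1, 1, 0],
          [2, 1, 0, 1]]
kernel = [1.0, -1.0]  # toy weights; learned in the real network

# Horizontal branch: scan along each row.
horizontal = [v for row in matrix for v in conv1d(row, kernel)]

# Vertical branch: scan along each column.
columns = [list(col) for col in zip(*matrix)]
vertical = [v for col in columns for v in conv1d(col, kernel)]

# Concatenated representation passed on to the next layers.
features = horizontal + vertical
```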
- page 95/102

As we can see, even the simple introduction of a traditional
convolutional kernel (the result shown was reached with a 5x5 kernel) is
sufficient to boost the accuracy of the predictions (the previous best
result, in 2018, reached only 77% accuracy on h^{1,1}).

- page 96/102

The introduction of the Inception architecture has major advantages: it
uses even fewer parameters than "traditional" convolutional networks, it
boosts the performance to near perfect accuracy, and it needs far less
data (even with just 30% of the data used for training, the accuracy is
already near perfect).

Moreover, with this architecture we were also able to predict h^{2,1}
with 50% accuracy: even if this does not look like a reliable method to
predict it (I agree, for now), note that previous attempts have usually
avoided computing it, or reached accuracies of only 8-9% (even feature
engineering could boost it only to around 35%).

The network is also solid enough to predict both Hodge numbers at the
same time: trading a bit of accuracy for a simpler model, it is in fact
possible to let the machine learn the existing relation between the Hodge
numbers without our explicitly inputting anything (for instance, the fact
that the difference of the Hodge numbers is proportional to the Euler
characteristic).

For more specific info I invite you to take a look at Harold's talk on
the subject at the recent "string_data" workshop.

- page 97/102

Deep learning can therefore be used as a predictive method, provided that
one is able to analyse the data (no black boxes should ever be admitted).

- page 98/102

It can also be used as a source of inspiration for inquiries and
investigations, always provided a good analysis is done beforehand.

- page 99/102

Deep learning can also be used to generalise patterns and relations.
As always, only after careful consideration.

- page 100/102

Moreover, convolutional networks look promising, with a lot of unexplored
potential.
This is in fact the first time they have been successfully used in
theoretical physics.

Finally, this is an interdisciplinary approach in which a lot is yet to
be learned from different perspectives.

- page 101/102

Several directions to investigate now remain.

In fact, one could in principle exploit the freedom in representing the
configuration matrices to learn the best possible representation.

Otherwise, one could move to a more mathematical setting and study what
happens for CICY 4-folds (almost one million complete intersections).

Moreover, as I was saying, this could be used as an attempt to study
formal aspects of deep learning, or even to dive directly into "real
artificial intelligence" and study the problem in a reinforcement
learning environment, where the machine automatically learns a task
without knowing the final result.

- page 102/102

I will therefore leave the open question as to whether this is actually
going to be the end or just the start of something else.

In the meantime, I thank you for your attention.