Update images and references

Signed-off-by: Riccardo Finotello <riccardo.finotello@gmail.com>
2020-10-20 19:29:13 +02:00
parent 06e27a3702
commit 1eb7136ead
16 changed files with 414 additions and 1301 deletions

@@ -10,7 +10,7 @@ For instance we could try to set up a map from any matrix to its favourable repr
This could be the basis for the use of adversarial networks~\cite{Goodfellow:2014:GenerativeAdversarialNets} capable of generating the favourable embedding from the original one.
Or, on the contrary, one could generate more matrices for the same manifold in order to increase the size of the training set.
Another possibility is to use the graph representation of the configuration matrix, which is automatically invariant under permutations~\cite{Hubsch:1992:CalabiyauManifoldsBestiary} (another graph representation has been decisive in~\cite{Krippendorf:2020:DetectingSymmetriesNeural} to reach a good accuracy).
-Techniques such as (variational) autoencoders~\cite{Kingma:2014:AutoEncodingVariationalBayes, Rezende:2014:StochasticBackpropagationApproximate, Salimans:2015:MarkovChainMonte}, cycle GAN~\cite{Zhu:2017:UnpairedImagetoimageTranslation}, invertible neural networks~\cite{Ardizzone:2019:AnalyzingInverseProblems}, graph neural networks~\cite{Gori:2005:NewModelLearning, Scarselli:2004:GraphicalbasedLearningEnvironments} or more generally techniques from geometric deep learning~\cite{Monti:2017:GeometricDeepLearning} could be helpful.
+Techniques such as (variational) autoencoders~\cite{Kingma:2014:AutoEncodingVariationalBayes, Rezende:2014:StochasticBackpropagationApproximate}, cycle GAN~\cite{Zhu:2017:UnpairedImagetoimageTranslation}, invertible neural networks~\cite{Ardizzone:2019:AnalyzingInverseProblems}, graph neural networks~\cite{Gori:2005:NewModelLearning, Scarselli:2004:GraphicalbasedLearningEnvironments} or techniques from geometric deep learning~\cite{Monti:2017:GeometricDeepLearning} could be helpful.
Finally our techniques apply directly to \cicy $4$-folds~\cite{Gray:2013:AllCompleteIntersection, Gray:2014:TopologicalInvariantsFibration}.
However there are many more manifolds in this case (around \num{e6}) and more Hodge numbers, so that one can expect to reach a better accuracy for the different Hodge numbers (the learning curves for the $3$-folds indicate that the model training would benefit from more data).
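As a minimal illustration of the augmentation idea (a sketch only: the helper names are ours and the matrices are assumed to be stored as NumPy arrays), one can exploit the fact that row and column permutations of a configuration matrix describe the same manifold, hence the same Hodge numbers, to produce extra training samples in Python:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def permute_configuration(matrix):
    # Shuffle rows (projective factors) and columns (polynomials):
    # the manifold, and hence its Hodge numbers, is unchanged.
    rows = rng.permutation(matrix.shape[0])
    cols = rng.permutation(matrix.shape[1])
    return matrix[np.ix_(rows, cols)]

# Illustrative integer matrix (not a genuine CICY configuration).
config = np.array([[1, 1, 0],
                   [0, 2, 2],
                   [3, 0, 1]])
augmented = [permute_configuration(config) for _ in range(4)]
\end{verbatim}
Each permuted copy keeps the same labels, so the training set can be enlarged at essentially no cost.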

@@ -2,24 +2,24 @@ In the previous parts we presented mathematical tools for the theoretical interp
The ultimate goal of the analysis is to provide some insight into the predictive capabilities of the string theory framework applied to phenomenological data.
As already argued in~\Cref{sec:CYmanifolds} the procedure is however quite challenging, as there are different ways to match string theory with experimental reality, that is, there are several different vacuum configurations arising from the compactification of the extra dimensions.
The investigation of feasible phenomenological models in a string framework must therefore also deal with computational aspects related to the exploration of the \emph{landscape}~\cite{Douglas:2003:StatisticsStringTheory} of possible vacua.
-Unfortunately the number of possibilities is huge (numbers as high as $\num{e272000}$ have been suggested for some models)~\cite{Lerche:1987:ChiralFourdimensionalHeterotic, Douglas:2003:StatisticsStringTheory, Ashok:2004:CountingFluxVacua, Douglas:2004:BasicResultsVacuum, Douglas:2007:FluxCompactification, Taylor:2015:FtheoryGeometryMost, Schellekens:2017:BigNumbersString, Halverson:2017:AlgorithmicUniversalityFtheory, Taylor:2018:ScanningSkeleton4D, Constantin:2019:CountingStringTheory}, the mathematical objects entering the compactifications are complex and typical problems are often NP-complete, NP-hard, or even undecidable~\cite{Denef:2007:ComputationalComplexityLandscape, Halverson:2019:ComputationalComplexityVacua, Ruehle:2020:DataScienceApplications}, making an exhaustive classification impossible.
+Unfortunately the number of possibilities is huge (numbers as high as $\num{e272000}$ have been suggested for some models)~\cite{Douglas:2003:StatisticsStringTheory, Ashok:2004:CountingFluxVacua, Taylor:2015:FtheoryGeometryMost, Taylor:2018:ScanningSkeleton4D, Constantin:2019:CountingStringTheory}, the mathematical objects entering the compactifications are complex and typical problems are often NP-complete, NP-hard, or even undecidable~\cite{Denef:2007:ComputationalComplexityLandscape, Halverson:2019:ComputationalComplexityVacua}, making an exhaustive classification impossible.
Additionally there is no single framework to describe all the possible (flux) compactifications.
As a consequence each class of models must be studied with different methods.
This has in general discouraged, or at least rendered challenging, precise connections to the existing and tested theories (in particular, the \sm of particle physics).
Until recently the string landscape has been studied using different methods such as analytic computations for simple examples, general statistics, random scans or algorithmic enumerations of possibilities.
-This has been a large endeavor of the string community~\cite{Grana:2006:FluxCompactificationsString, Lust:2009:SeeingStringLandscape, Ibanez:2012:StringTheoryParticle, Brennan:2018:StringLandscapeSwampland, Halverson:2018:TASILecturesRemnants, Ruehle:2020:DataScienceApplications}.
+This has been a large endeavor of the string community~\cite{Grana:2006:FluxCompactificationsString, Brennan:2018:StringLandscapeSwampland}.
The main objective of such studies is to understand what the generic predictions of string theory are.
-The first conclusion of these studies is that compactifications giving an effective theory close to the Standard Model are scarce~\cite{Dijkstra:2005:ChiralSupersymmetricStandard, Dijkstra:2005:SupersymmetricStandardModel, Blumenhagen:2005:StatisticsSupersymmetricDbrane, Gmeiner:2006:OneBillionMSSMlike, Douglas:2007:LandscapeIntersectingBrane, Anderson:2014:ComprehensiveScanHeterotic}.
+The first conclusion of these studies is that compactifications giving an effective theory close to the Standard Model are scarce~\cite{Dijkstra:2005:ChiralSupersymmetricStandard, Blumenhagen:2005:StatisticsSupersymmetricDbrane, Douglas:2007:LandscapeIntersectingBrane, Anderson:2014:ComprehensiveScanHeterotic}.
The approach however has limitations, mainly due to the lack of a general understanding or to the high computational power required to run the algorithms.
-In reaction to these difficulties and starting with the seminal paper~\cite{Abel:2014:GeneticAlgorithmsSearch} new investigations based on Machine Learning (\ml) appeared in the recent years, focusing on different aspects of the string landscape and of the geometries used in compactifications~\cite{Krefl:2017:MachineLearningCalabiYau, Ruehle:2017:EvolvingNeuralNetworks, He:2017:MachinelearningStringLandscape, Carifio:2017:MachineLearningString, Altman:2019:EstimatingCalabiYauHypersurface, Bull:2018:MachineLearningCICY, Cole:2019:TopologicalDataAnalysis, Klaewer:2019:MachineLearningLine, Mutter:2019:DeepLearningHeterotic, Wang:2018:LearningNonHiggsableGauge, Ashmore:2019:MachineLearningCalabiYau, Brodie:2020:MachineLearningLine, Bull:2019:GettingCICYHigh, Cole:2019:SearchingLandscapeFlux, Faraggi:2020:MachineLearningClassification, Halverson:2019:BranesBrainsExploring, He:2019:DistinguishingEllipticFibrations, Bies:2020:MachineLearningAlgebraic, Bizet:2020:TestingSwamplandConjectures, Halverson:2020:StatisticalPredictionsString, Krippendorf:2020:DetectingSymmetriesNeural, Otsuka:2020:DeepLearningKmeans, Parr:2020:ContrastDataMining, Parr:2020:PredictingOrbifoldOrigin} (see also~\cite{Erbin:2018:GANsGeneratingEFT, Betzler:2020:ConnectingDualitiesMachine, Chen:2020:MachineLearningEtudes, Gan:2017:HolographyDeepLearning, Hashimoto:2018:DeepLearningAdS, Hashimoto:2018:DeepLearningHolographic, Hashimoto:2019:AdSCFTCorrespondence, Tan:2019:DeepLearningHolographic, Akutagawa:2020:DeepLearningAdS, Yan:2020:DeepLearningBlack, Comsa:2019:SupergravityMagicMachine, Krishnan:2020:MachineLearningGauged} for related works and~\cite{Ruehle:2020:DataScienceApplications} for a comprehensive summary of the state of the art).
+In reaction to these difficulties and starting with the seminal paper~\cite{Abel:2014:GeneticAlgorithmsSearch} new investigations based on Machine Learning (\ml) appeared in the recent years, focusing on different aspects of the string landscape and of the geometries used in compactifications~\cite{Krefl:2017:MachineLearningCalabiYau, Ruehle:2017:EvolvingNeuralNetworks, He:2017:MachinelearningStringLandscape, Carifio:2017:MachineLearningString, Altman:2019:EstimatingCalabiYauHypersurface, Bull:2018:MachineLearningCICY, Mutter:2019:DeepLearningHeterotic, Ashmore:2020:MachineLearningCalabiYau, Brodie:2020:MachineLearningLine, Bull:2019:GettingCICYHigh, Cole:2019:SearchingLandscapeFlux, Faraggi:2020:MachineLearningClassification, Halverson:2019:BranesBrainsExploring, Bizet:2020:TestingSwamplandConjectures, Halverson:2020:StatisticalPredictionsString, Krippendorf:2020:DetectingSymmetriesNeural, Otsuka:2020:DeepLearningKmeans, Parr:2020:ContrastDataMining, Parr:2020:PredictingOrbifoldOrigin} (see~\cite{Ruehle:2020:DataScienceApplications} for a comprehensive summary of the state of the art).
In fact \ml is definitely adequate when it comes to pattern search or statistical inference from large amounts of data.
This motivates two main applications to string theory: the systematic exploration of the space of possibilities (if they are not random then \ml should be able to find a pattern) and the deduction of mathematical formulas from the \ml approximation.
-The last few years have seen a major uprising of \ml, and more particularly of neural networks (\nn)~\cite{Bengio:2017:DeepLearning, Chollet:2018:DeepLearningPython, Geron:2019:HandsOnMachineLearning}.
+The last few years have seen a major uprising of \ml, and more particularly of neural networks (\nn)~\cite{Goodfellow:2017:DeepLearning, Chollet:2018:DeepLearningPython, Geron:2019:HandsOnMachineLearning}.
This technology is efficient at discovering and predicting patterns and now pervades most fields of applied science and industry.
One of the most critical places where progress can be expected is in understanding the geometries used to describe string compactifications, and this will be the object of study in the following analysis.
-We mainly refer to~\cite{Geron:2019:HandsOnMachineLearning, Chollet:2018:DeepLearningPython, Bengio:2017:DeepLearning} for reviews in \ml and deep learning techniques, and to~\cite{Ruehle:2020:DataScienceApplications, Skiena:2017:DataScienceDesign, Zheng:2018:FeatureEngineeringMachine} for applications of data science techniques.
+We mainly refer to~\cite{Geron:2019:HandsOnMachineLearning, Chollet:2018:DeepLearningPython, Goodfellow:2017:DeepLearning} for reviews in \ml and deep learning techniques, and to~\cite{Ruehle:2020:DataScienceApplications, Skiena:2017:DataScienceDesign, Zheng:2018:FeatureEngineeringMachine} for applications of data science techniques.
We address the question of computing the Hodge numbers $\hodge{1}{1} \in \N$ and $\hodge{2}{1} \in \N$ for \emph{complete intersection Calabi--Yau} (\cicy) $3$-folds~\cite{Green:1987:CalabiYauManifoldsComplete} using different \ml algorithms.
A \cicy is completely specified by its \emph{configuration matrix} (whose entries are non-negative integers), which is the basic input of the algorithms.
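Since the configuration matrices have different sizes, one natural preprocessing step is to zero-pad them to a common shape before feeding them to the algorithms. The following Python sketch is only illustrative (the function name is ours; the shape $12 \times 15$ corresponds to the maximal matrix size of the original dataset quoted below):
\begin{verbatim}
import numpy as np

def pad_configuration(matrix, shape=(12, 15)):
    # Embed a configuration matrix in the top-left corner of a fixed-size
    # zero matrix so that all inputs share the same shape.
    matrix = np.asarray(matrix, dtype=np.float32)
    padded = np.zeros(shape, dtype=np.float32)
    padded[:matrix.shape[0], :matrix.shape[1]] = matrix
    return padded

# The quintic hypersurface in P^4 corresponds to the 1x1 matrix [5].
quintic = pad_configuration([[5]])
\end{verbatim}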
@@ -68,7 +68,7 @@ Code is available on \href{https://thesfinox.github.io/ml-cicy/}{Github}.
As presented in~\Cref{sec:CYmanifolds}, a \cy $n$-fold is an $n$-dimensional complex manifold $X$ with \SU{n} holonomy (its real dimension is $2n$).
An equivalent definition is the vanishing of its first Chern class.
-A standard reference for the physicist is~\cite{Hubsch:1992:CalabiyauManifoldsBestiary} (see also~\cite{Anderson:2018:TASILecturesGeometric, He:2020:CalabiYauSpacesString} for useful references).
+A standard reference for the physicist is~\cite{Hubsch:1992:CalabiyauManifoldsBestiary} (see also~\cite{Anderson:2018:TASILecturesGeometric} for useful references).
The compactification on a \cy leads to the breaking of a large part of the supersymmetry, which is phenomenologically more realistic than the very high energy description with intact supersymmetry.
\cy manifolds are characterised by a certain number of topological properties (see~\Cref{sec:cohomology_hodge}), the most salient being the Hodge numbers \hodge{1}{1} and \hodge{2}{1}, counting respectively the Kähler and complex structure deformations, and the Euler characteristic:\footnotemark{}
@@ -79,14 +79,14 @@ The compactification on a \cy leads to the breaking of large part of the supersy
\chi = 2 \qty(\hodge{1}{1} - \hodge{2}{1}).
\label{eq:cy:euler}
\end{equation}
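For instance the quintic hypersurface in $\mathds{P}^4$ has $\hodge{1}{1} = 1$ and $\hodge{2}{1} = 101$, hence $\chi = 2 \qty(1 - 101) = -200$.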
-Interestingly topological properties of the manifold directly translate into features of the $4$-dimensional effective action (in particular the number of fields, the representations and the gauge symmetry)~\cite{Hubsch:1992:CalabiyauManifoldsBestiary, Becker:2006:StringTheoryMTheory}.\footnotemark{}
+Interestingly topological properties of the manifold directly translate into features of the $4$-dimensional effective action (in particular the number of fields, the representations and the gauge symmetry)~\cite{Hubsch:1992:CalabiyauManifoldsBestiary}.\footnotemark{}
\footnotetext{%
Another reason for sticking to topological properties is that there is no \cy manifold for which the metric is known.
Hence it is not possible to perform explicitly the Kaluza--Klein reduction in order to derive the $4$-dimensional theory.
}%
}
In particular the Hodge numbers count the number of chiral multiplets (in heterotic compactifications) and the number of hyper- and vector multiplets (in type II compactifications): these are related to the number of fermion generations ($3$ in the Standard Model) and are thus an important measure of the distance to the Standard Model.
-The simplest \cy manifolds are constructed by considering the complete intersection of hypersurfaces in a product $\cA$ of projective spaces $\mathds{P}^{n_i}$ (called the ambient space)~\cite{Green:1987:CalabiYauManifoldsComplete, Green:1987:PolynomialDeformationsCohomology, Candelas:1988:CompleteIntersectionCalabiYau, Green:1989:AllHodgeNumbers, Anderson:2017:FibrationsCICYThreefolds, Anderson:2018:TASILecturesGeometric}:
+The simplest \cy manifolds are constructed by considering the complete intersection of hypersurfaces in a product $\cA$ of projective spaces $\mathds{P}^{n_i}$ (called the ambient space)~\cite{Green:1987:CalabiYauManifoldsComplete, Green:1987:PolynomialDeformationsCohomology, Candelas:1988:CompleteIntersectionCalabiYau, Green:1989:AllHodgeNumbers, Anderson:2017:FibrationsCICYThreefolds}:
\begin{equation}
\cA = \mathds{P}^{n_1} \times \cdots \times \mathds{P}^{n_m}.
\end{equation}
@@ -173,7 +173,7 @@ Below we show a list of the \cicy properties and of their configuration matrices
\item unique Hodge number combinations: $266$
\end{itemize}
-\item ``original dataset''~\cite{Candelas:1988:CompleteIntersectionCalabiYau, Green:1989:AllHodgeNumbers}
+\item ``original dataset''~\cite{Candelas:1988:CompleteIntersectionCalabiYau, Green:1989:AllHodgeNumbers}:
\begin{itemize}
\item maximal size of the configuration matrices: $12 \times 15$
\item number of favourable matrices (excluding product spaces): $4874$ ($\num{61.8}\%$)
@@ -181,7 +181,7 @@ Below we show a list of the \cicy properties and of their configuration matrices
\item number of different ambient spaces: $235$
\end{itemize}
-\item ``favourable dataset''~\cite{Anderson:2017:FibrationsCICYThreefolds}
+\item ``favourable dataset''~\cite{Anderson:2017:FibrationsCICYThreefolds}:
\begin{itemize}
\item maximal size of the configuration matrices: $15 \times 18$
\item number of favourable matrices (excluding product spaces): $7820$ ($\num{99.1}\%$)
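Statistics of this kind can be recomputed directly from the data. Below is a short Python sketch (the file name and column names are hypothetical, as the datasets are distributed in several formats); it uses the characterisation that a configuration matrix is favourable when $\hodge{1}{1}$ equals the number of projective space factors, and it does not exclude product spaces:
\begin{verbatim}
import numpy as np
import pandas as pd

# Hypothetical flat file with one row per manifold: a serialised
# configuration matrix and the two Hodge numbers.
df = pd.read_json("cicy3_dataset.json")
matrices = [np.array(m) for m in df["matrix"]]

n_rows = np.array([m.shape[0] for m in matrices])   # projective factors
stats = {
    "manifolds": len(df),
    "max matrix size": (max(m.shape[0] for m in matrices),
                        max(m.shape[1] for m in matrices)),
    "favourable": int((df["h11"].to_numpy() == n_rows).sum()),
    "unique Hodge pairs": len(df[["h11", "h21"]].drop_duplicates()),
}
print(stats)
\end{verbatim}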

@@ -302,7 +302,7 @@ Obviously the very small percentage of outliers makes the effect of removing the
We compare the performance of different \ml algorithms: linear regression, support vector machines (\svm), random forests, gradient boosted trees and (deep) neural networks.
We obtain the best results using deep \emph{convolutional} neural networks.
In fact we present a new neural network architecture, inspired by the Inception model~\cite{Szegedy:2015:GoingDeeperConvolutions, Szegedy:2016:RethinkingInceptionArchitecture, Szegedy:2016:Inceptionv4InceptionresnetImpact}, which was developed in the field of computer vision.
-We provide some details on the different algorithms in~\Cref{app:ml-algo} and refer the reader to the literature~\cite{Bengio:2017:DeepLearning, Chollet:2018:DeepLearningPython, Geron:2019:HandsOnMachineLearning, Skiena:2017:DataScienceDesign, Mehta:2019:HighbiasLowvarianceIntroduction, Carleo:2019:MachineLearningPhysical, Ruehle:2020:DataScienceApplications} for more details.
+We provide some details on the different algorithms in~\Cref{app:ml-algo} and refer the reader to the literature~\cite{Goodfellow:2017:DeepLearning, Chollet:2018:DeepLearningPython, Geron:2019:HandsOnMachineLearning, Skiena:2017:DataScienceDesign, Ruehle:2020:DataScienceApplications} for more details.
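To give an idea of the structure, here is a rough Keras sketch of an Inception-style block (an illustration only, not the exact architecture used in the analysis): several convolutions with different kernel shapes act in parallel on the padded configuration matrix and their feature maps are concatenated.
\begin{verbatim}
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, filters=32):
    # Parallel convolutions over rows and over columns, concatenated
    # along the channel axis, as in an Inception module.
    rows = layers.Conv2D(filters, (12, 1), padding="same", activation="relu")(x)
    cols = layers.Conv2D(filters, (1, 15), padding="same", activation="relu")(x)
    return layers.Concatenate()([rows, cols])

inputs = tf.keras.Input(shape=(12, 15, 1))   # zero-padded configuration matrix
x = inception_block(inputs)
x = inception_block(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(1, name="h11")(x)     # regression of a single Hodge number
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
\end{verbatim}
The kernel shapes, depths and the treatment of the two Hodge numbers differ in the actual networks; the sketch only illustrates the parallel-convolution idea.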
\subsubsection{Feature Extraction}
@@ -394,7 +394,7 @@ For the same reason, the latter are not displayed for the favourable dataset.
\paragraph{Visualisation of the performance}
Complementary to the predictions and the accuracy results, we also provide different visualisations of the performance of the models in the form of univariate plots (histograms) and multivariate distributions (scatter plots).
-In fact the usual assumption behind the statistical inference of a distribution is that the difference between the observed data and the predicted values can be modelled by a random variable called \textit{residual}~\cite{Lista:2017:StatisticalMethodsData,Caffo::DataScienceSpecialization}.\footnotemark{}
+In fact the usual assumption behind the statistical inference of a distribution is that the difference between the observed data and the predicted values can be modelled by a random variable called \textit{residual}~\cite{Skiena:2017:DataScienceDesign,Caffo::DataScienceSpecialization}.\footnotemark{}
\footnotetext{%
The difference between the non-observable \textit{true} value of the model and the observed data is known as the \textit{statistical error}.
The difference between residuals and errors is subtle, but the two definitions have different interpretations in the context of regression analysis: in a sense, residuals are an estimate of the errors.
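As a minimal sketch of these diagnostics (in Python, with placeholder arrays standing in for the true and predicted Hodge numbers of a test set):
\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt

# Placeholder values standing in for true and predicted Hodge numbers.
y_true = np.array([1, 2, 3, 5, 7, 9], dtype=float)
y_pred = np.array([1.2, 1.8, 3.4, 4.9, 6.5, 9.3])
residuals = y_true - y_pred

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(residuals, bins=10)            # univariate view: histogram of residuals
ax1.set_xlabel("residual")
ax2.scatter(y_pred, residuals, s=10)    # multivariate view: residuals vs predictions
ax2.axhline(0.0, linestyle="--")
ax2.set_xlabel("predicted value")
ax2.set_ylabel("residual")
fig.tight_layout()
plt.show()
\end{verbatim}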
@@ -1232,7 +1232,7 @@ In fact this neural network is much more powerful than the previous networks we
When predicting only \hodge{1}{1} it surpasses \SI{97}{\percent} accuracy using only \SI{30}{\percent} of the data for training.
While it seems that the predictions suffer when using a single network for both Hodge numbers, this remains much better than any other algorithm.
It may seem counter-intuitive that convolutions work well on this data since it is invariant under permutations of the configuration matrix but not under translations or rotations.
-However convolution alone is not sufficient to ensure invariances under these transformations but it must be supplemented with pooling operations~\cite{Bengio:2017:DeepLearning} which we do not use.
+However convolution alone is not sufficient to ensure invariances under these transformations but it must be supplemented with pooling operations~\cite{Goodfellow:2017:DeepLearning} which we do not use.
Moreover convolution layers do more than just take translation properties into account: they allow for highly complicated combinations of the inputs and share weights among components, finding subtler patterns than standard fully connected layers.
This network is studied in more detail in~\cite{Erbin:2020:InceptionNeuralNetwork}.
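The remark about invariances can be checked numerically. In the following sketch (assuming TensorFlow; the toy matrix is purely illustrative) the same untrained convolution is applied to a matrix and to a row-permuted copy, and the outputs differ, confirming that a convolution layer alone does not enforce such invariances:
\begin{verbatim}
import numpy as np
import tensorflow as tf

# Toy "configuration matrix" with batch and channel axes added.
m = np.arange(12, dtype=np.float32).reshape(1, 3, 4, 1)
m_permuted = m[:, [2, 0, 1], :, :]       # permute the rows

conv = tf.keras.layers.Conv2D(4, kernel_size=2, padding="same")
out, out_permuted = conv(m), conv(m_permuted)

# With random (untrained) weights the two outputs essentially never agree:
# convolution is equivariant rather than invariant.
print(np.allclose(out.numpy(), out_permuted.numpy()))   # almost surely False
\end{verbatim}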