Update presentation and typos
Signed-off-by: Riccardo Finotello <riccardo.finotello@gmail.com>
@@ -196,7 +196,6 @@
|
|||||||
In the non-Abelian case we considered, there is no simple way to write the action using global data.
|
In the non-Abelian case we considered, there is no simple way to write the action using global data.
|
||||||
However, the contribution to the Euclidean action is larger than in the Abelian case: the strings are in fact no longer constrained to a plane and, in order to stretch across the boundaries, they have to form a small bump while detaching from the D-brane.
|
However, the contribution to the Euclidean action is larger than in the Abelian case: the strings are in fact no longer constrained to a plane and, in order to stretch across the boundaries, they have to form a small bump while detaching from the D-brane.
|
||||||
The Yukawa coupling in this case is therefore suppressed with respect to the Abelian case.
|
The Yukawa coupling in this case is therefore suppressed with respect to the Abelian case.
|
||||||
Phenomenologically speaking, since the couplings are proportional to the mass of the scalar involved, the non-Abelian case describes the coupling of lighter states.
|
|
||||||
|
|
||||||
- page 32/102
|
- page 32/102
|
||||||
|
|
||||||
@@ -298,7 +297,7 @@
|
|||||||
|
|
||||||
- page 52/102
|
- page 52/102
|
||||||
|
|
||||||
In the literature we can already find studies in the computation of amplitudes (mainly closed strings, since we are dealing with gravitational interactions).
|
In the literature we can already find computations of amplitudes (mainly of closed strings, since we are dealing with gravitational interactions).
|
||||||
The presence of divergences in N-point correlators is, however, usually associated with a gravitational backreaction due to the exchange of gravitons.
|
The presence of divergences in N-point correlators is, however, usually associated with a gravitational backreaction due to the exchange of gravitons.
|
||||||
|
|
||||||
- page 53/102
|
- page 53/102
|
||||||
@@ -367,7 +366,7 @@
|
|||||||
- page 63/102
|
- page 63/102
|
||||||
|
|
||||||
In this sense even string theory cannot give a solution to the problem.
|
In this sense even string theory cannot give a solution to the problem.
|
||||||
In other words since the effective theory does not even exist, its high energy completion cannot be capable of providing a better description.
|
In other words, since the effective theory does not even exist, its high-energy completion is not capable of providing a better description.
|
||||||
|
|
||||||
- page 64/102
|
- page 64/102
|
||||||
|
|
||||||
@@ -413,14 +412,14 @@
|
|||||||
|
|
||||||
Specifically, we focus on manifolds built as intersections of hypersurfaces in projective spaces, that is, the common zero locus of several homogeneous equations in the complex coordinates of the ambient projective spaces.
|
Specifically, we focus on manifolds built as intersections of hypersurfaces in projective spaces, that is, the common zero locus of several homogeneous equations in the complex coordinates of the ambient projective spaces.
|
||||||
|
|
||||||
As we are interested in studying these manifolds as topological spaces we do not care about the coefficients, but only the exponents.
|
As we are interested in studying these manifolds as topological spaces, for each equation and projective space we do not care about the coefficients but only about the exponents, or rather the degree of the equation in the coordinates of the given projective space.
|
||||||
The intersection is complete in the sense that it is non-degenerate.
|
The intersection is complete in the sense that it is non-degenerate.
|
||||||
|
|
||||||
- page 71/102
|
- page 71/102
|
||||||
|
|
||||||
The intersections can be generalised to multiple projective spaces and equations, and the manifold can be characterised by a matrix containing the powers of the coordinates in each equation.
|
The intersections can be generalised to multiple projective spaces and equations, and the manifold can be characterised by a matrix containing the powers of the coordinates in each equation.
|
||||||
|
|
||||||
The problem we are interested is therefore to be able to take the so called "configuration matrix" of the manifolds and predict the value of the Hodge numbers.
|
The problem in which we are interested is therefore to take the so-called "configuration matrix" of a manifold and predict the values of its Hodge numbers.
|
||||||
Formally this is a map from a matrix to a natural number.
|
Formally this is a map from a matrix to a natural number.
|
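To make this concrete, here is the simplest instance of such a map (the quintic threefold; its Hodge numbers are the standard values from the literature, quoted here purely as an illustration):

% An illustrative instance of the map "configuration matrix -> Hodge numbers":
% the quintic threefold, a single degree-5 equation in P^4.
$\left[\, \mathbb{P}^4 \;\big|\; 5 \,\right] \;\longmapsto\; \left( h^{1,\,1},\, h^{2,\,1} \right) = \left( 1,\, 101 \right)$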
||||||
|
|
||||||
- page 72/102
|
- page 72/102
|
||||||
@@ -462,7 +461,7 @@
|
|||||||
|
|
||||||
The dataset we use contains fewer than 10000 manifolds (in machine learning terms it is still a small dataset).
|
The dataset we use contains fewer than 10000 manifolds (in machine learning terms it is still a small dataset).
|
||||||
|
|
||||||
From these we remove product spaces (recognisable by their block diagonal form of the configuration matrix) and we remove very high values of the Hodge numbers to avoid learning "extremal configurations".
|
From these we remove product spaces (recognisable by the block-diagonal form of their configuration matrix) and we remove very high values of the Hodge numbers from training to avoid learning "extremal configurations".
|
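As a side note, one possible way to recognise product spaces automatically is sketched below (my own illustration of the block-diagonal criterion, not necessarily the procedure used in the actual pipeline): a configuration matrix that can be permuted into block-diagonal form corresponds to a disconnected bipartite graph between projective spaces and equations.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def is_product_space(conf):
    # Rows = projective spaces, columns = equations.  Link a row to every
    # column in which it has a non-zero degree: if the resulting bipartite
    # graph is disconnected, the matrix can be permuted into block-diagonal
    # form, i.e. the manifold is a product space.
    conf = np.asarray(conf)
    r, c = conf.shape
    adj = np.zeros((r + c, r + c), dtype=int)
    rows, cols = np.nonzero(conf)
    adj[rows, r + cols] = 1
    adj[r + cols, rows] = 1
    n_comp, _ = connected_components(csr_matrix(adj), directed=False)
    return n_comp > 1

print(is_product_space([[5]]))             # single block: False
print(is_product_space([[3, 0], [0, 3]]))  # two disjoint blocks: True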
||||||
|
|
||||||
In this sense we are simply not feeding the machine "extremal" configurations, in an attempt to push the application as far as possible: should the machine learn a good representation, it will automatically be capable of handling those configurations as well, without a human manually feeding them in.
|
In this sense we are simply not feeding the machine "extremal" configurations, in an attempt to push the application as far as possible: should the machine learn a good representation, it will automatically be capable of handling those configurations as well, without a human manually feeding them in.
|
||||||
|
|
||||||
@@ -489,7 +488,6 @@
|
|||||||
In fact, as we can see, most of the features, such as the number of projective spaces or the number of equations in the matrix, are heavily correlated with the Hodge numbers.
|
In fact, as we can see, most of the features, such as the number of projective spaces or the number of equations in the matrix, are heavily correlated with the Hodge numbers.
|
||||||
|
|
||||||
Moreover, even algorithms that produce a ranking of the variables, such as decision trees, show that such "engineered features" are much more important than the configuration matrix itself.
|
Moreover, even algorithms that produce a ranking of the variables, such as decision trees, show that such "engineered features" are much more important than the configuration matrix itself.
|
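As a sketch of how such a ranking can be obtained in practice (the feature names and data below are purely illustrative, not the exact engineered features used in the analysis), one can fit a tree-based model on the scalar features and read off its impurity-based importances:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy stand-ins for engineered scalar features of each configuration matrix.
rng = np.random.default_rng(0)
X = rng.integers(1, 13, size=(500, 3)).astype(float)   # 500 samples, 3 features
y = X[:, 0] + rng.normal(scale=0.5, size=500)          # toy target ("h11")

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip(["num_projective_spaces", "num_equations", "matrix_rank"],
                     model.feature_importances_):
    print(f"{name}: {imp:.3f}")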
||||||
Here we can see some of the scalar variables ranked across each other.
|
|
||||||
|
|
||||||
- page 82/102
|
- page 82/102
|
||||||
|
|
||||||
@@ -524,9 +522,9 @@
|
|||||||
|
|
||||||
- page 86/102
|
- page 86/102
|
||||||
|
|
||||||
Visually PCA is used to isolate the eigenvalues of the covariance matrix (or the singular values of the matrix) which do not belong to the background.
|
Visually, PCA is used to isolate the eigenvalues and eigenvectors of the covariance matrix (or the singular values of the data matrix) which do not belong to the background.
|
||||||
|
|
||||||
From random matrix theory we know that the eigenvalues of a independently and identically distributed matrix (a Wishart matrix) follow a Marchenko-Pastur distribution.
|
From random matrix theory we know that the eigenvalues of the covariance matrix of independently and identically distributed data (a Wishart matrix) follow the Marchenko-Pastur distribution.
|
||||||
|
|
||||||
Such a matrix containing a signal would therefore be recognised by the presence of eigenvalues outside the support of this distribution.
|
Such a matrix containing a signal would therefore be recognised by the presence of eigenvalues outside the support of this distribution.
|
||||||
We could therefore simply keep the corresponding eigenvectors.
|
We could therefore simply keep the corresponding eigenvectors.
|
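A minimal sketch of this selection (toy data, not the CICY dataset), assuming the features have been standardised so that pure noise has unit variance and using the asymptotic Marchenko-Pastur upper edge (1 + sqrt(p/n))^2:

import numpy as np

def signal_eigenvectors(X):
    # X: (n_samples, n_features) data matrix, standardised to zero mean and
    # unit variance so that pure noise follows the Marchenko-Pastur law.
    n, p = X.shape
    cov = (X.T @ X) / n
    eigval, eigvec = np.linalg.eigh(cov)           # ascending eigenvalues
    lambda_plus = (1.0 + np.sqrt(p / n)) ** 2      # MP upper edge for unit variance
    keep = eigval > lambda_plus
    return eigvec[:, keep], eigval[keep]

# Toy usage: i.i.d. noise plus one correlated (rank-one) "signal" direction.
rng = np.random.default_rng(0)
n, p = 1000, 50
X = rng.normal(size=(n, p)) + 0.5 * rng.normal(size=(n, 1)) @ rng.normal(size=(1, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
vecs, vals = signal_eigenvectors(X)
print(vals)   # essentially the injected signal direction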
||||||
@@ -537,12 +535,13 @@
|
|||||||
|
|
||||||
As we can see we used several algorithms to evaluate the procedure.
|
As we can see we used several algorithms to evaluate the procedure.
|
||||||
Previous approaches in the literature mainly relied on the direct application of algorithms to the configuration matrix.
|
Previous approaches in the literature mainly relied on the direct application of algorithms to the configuration matrix.
|
||||||
We extended this beyond the previously considered algorithms (mainly support vectors) to decision trees and linear models.
|
We extended this beyond the previously considered algorithms (mainly support vector machines) to decision trees and linear models for comparison.
|
||||||
|
|
||||||
- page 88/102
|
- page 88/102
|
||||||
|
|
||||||
Techniques such as feature engineering and PCA provide a huge improvement (even with less training data).
|
Techniques such as feature engineering and PCA provide a huge improvement (even with less training data).
|
||||||
Let me for instance point out that even a simple linear regression reaches the same level of accuracy previously obtained by more complex algorithms, even with much less training data.
|
Let me for instance point out that even a simple linear regression reaches the same level of accuracy previously obtained by more complex algorithms, even with much less training data.
|
||||||
|
This can ultimately cut computational costs and complexity.
|
||||||
|
|
||||||
- page 89/102
|
- page 89/102
|
||||||
|
|
||||||
@@ -554,10 +553,10 @@
|
|||||||
|
|
||||||
We focused on two distinct architectures.
|
We focused on two distinct architectures.
|
||||||
|
|
||||||
The older fully connected network were employed in previous attempts at predicting the Hodge numbers.
|
The older fully connected networks were employed in previous attempts at predicting the Hodge numbers.
|
||||||
They rely on a series of matrix operations to create new outputs from previous layers.
|
They rely on a series of matrix operations to create new outputs from previous layers.
|
||||||
In this sense the matrix W and the bias term b are the weights which need to be updated.
|
In this sense the matrix W and the bias term b are the weights which need to be updated.
|
||||||
Each node is connected to all the outputs, hence the name fully connected or densely connected.
|
Each node is connected to all the outputs of the previous layer, hence the name fully connected or densely connected (equivalently, the matrix W does not have vanishing entries).
|
||||||
To learn non-linear functions this is however not sufficient: an iterated application of these linear maps would simply result in learning another linear function.
|
To learn non-linear functions this is however not sufficient: an iterated application of these linear maps would simply result in learning another linear function.
|
||||||
We "break" linearity by introducing an "activation function" at each layer.
|
We "break" linearity by introducing an "activation function" at each layer.
|
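For concreteness, a minimal numpy sketch of a single fully connected layer followed by an activation (all sizes are arbitrary and only meant as an illustration):

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # The activation function which "breaks" linearity.
    return np.maximum(z, 0.0)

def dense_layer(x, W, b):
    # A fully connected layer: every input component contributes to every
    # output through the weight matrix W, shifted by the bias b.
    return relu(W @ x + b)

x = rng.normal(size=180)               # e.g. a flattened 12x15 input
W = 0.05 * rng.normal(size=(64, 180))  # weights to be learned
b = np.zeros(64)                       # biases to be learned
print(dense_layer(x, W, b).shape)      # (64,)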
||||||
|
|
||||||
@@ -569,7 +568,7 @@
|
|||||||
|
|
||||||
Since the input in this case does not need to be flattened, convolutions retain the notion of vicinity between cells in a grid (here we have an example of a configuration matrix as seen by a convolutional neural network).
|
Since the input in this case does not need to be flattened, convolutions retain the notion of vicinity between cells in a grid (here we have an example of a configuration matrix as seen by a convolutional neural network).
|
||||||
|
|
||||||
Since they do not have one weight for each connection, they have a smaller number of parameters (proportional to the size of the window) to be updated (in our specific case we cut more more than one order of magnitude the number of parameters used).
|
Since they do not have one weight for each connection, they have a smaller number of parameters to be updated (proportional to the size of the window): in our specific case we cut the number of parameters used by more than one order of magnitude.
|
||||||
|
|
||||||
Moreover, weights are shared by adjacent cells, meaning that, if there is a structure to be inferred, this is the right architecture to exploit the "artificial intelligence" underlying the operations involved.
|
Moreover, weights are shared by adjacent cells, meaning that, if there is a structure to be inferred, this is the right architecture to exploit the "artificial intelligence" underlying the operations involved.
|
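To make the reduction concrete, here is a back-of-the-envelope parameter count comparing a dense layer acting on the flattened matrix with a convolutional layer of shared 5x5 weights (the 12x15 input size and the layer widths are assumptions chosen for illustration, not the exact architecture used in the work):

# Fully connected layer on the flattened matrix vs. convolutional layer
# with a shared 5x5 kernel on a single input channel.
height, width = 12, 15        # assumed size of the input configuration matrix
dense_units = 512
conv_filters = 32
kernel = 5

dense_params = (height * width) * dense_units + dense_units   # weights + biases
conv_params = conv_filters * (kernel * kernel * 1) + conv_filters

print(f"dense layer: {dense_params} parameters")   # 92672
print(f"conv layer : {conv_params} parameters")    # 832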
||||||
|
|
||||||
@@ -577,7 +576,7 @@
|
|||||||
|
|
||||||
In this sense a convolutional architecture can isolate the defining features of the input and pass them on to the following layer, as in the animation.
|
In this sense a convolutional architecture can isolate the defining features of the input and pass them on to the following layer, as in the animation.
|
||||||
|
|
||||||
Using a computer science analogy, this is used to classify objects given a picture: a convolutional neural network is literally capable of isolating what makes a dog a dog and what distinguishes it from a cat (even more specific it can separate a Labrador from a Golden Retriever).
|
For instance, using a computer science analogy, this can be used to classify objects given a picture: a convolutional neural network is literally capable of isolating what makes a dog a dog and what distinguishes it from a cat (even more specifically, it can separate a Labrador from a Golden Retriever).
|
||||||
|
|
||||||
- page 92/102
|
- page 92/102
|
||||||
|
|
||||||
@@ -600,7 +599,7 @@
|
|||||||
|
|
||||||
- page 95/102
|
- page 95/102
|
||||||
|
|
||||||
As we can see even the simple introduction of a traditional convolutional kernel (it was a 5x5 kernel in this case) is sufficient to boost the accuracy of the predictions (results by Bull et al. in 2018 reached only 77% of accuracy on h^{1,1}).
|
As we can see, even the simple introduction of a traditional convolutional kernel (a 5x5 kernel in this case) is sufficient to boost the accuracy of the predictions (previous best results in 2018 reached only 77% accuracy on h^{1,1}).
|
||||||
|
|
||||||
- page 96/102
|
- page 96/102
|
||||||
|
|
||||||
@@ -610,6 +609,8 @@
|
|||||||
|
|
||||||
The network is also robust enough to predict both Hodge numbers at the same time: trading a bit of accuracy for a simpler model, it is in fact possible to let the machine learn the existing relation between the Hodge numbers without specifically inputting anything by hand (for instance the fact that the difference of the Hodge numbers is proportional to the Euler characteristic).
|
The network is also robust enough to predict both Hodge numbers at the same time: trading a bit of accuracy for a simpler model, it is in fact possible to let the machine learn the existing relation between the Hodge numbers without specifically inputting anything by hand (for instance the fact that the difference of the Hodge numbers is proportional to the Euler characteristic).
|
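To illustrate what predicting both Hodge numbers at the same time means in practice, here is a minimal PyTorch sketch of a network with shared convolutional features and two regression heads (the layer sizes are arbitrary and this is not the actual architecture used in the work):

import torch
import torch.nn as nn

class TwoHodgeNet(nn.Module):
    # Shared convolutional features feeding two regression heads (h11, h21).
    def __init__(self, height=12, width=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * height * width, 64),
            nn.ReLU(),
        )
        self.h11_head = nn.Linear(64, 1)
        self.h21_head = nn.Linear(64, 1)

    def forward(self, x):
        z = self.features(x)
        return self.h11_head(z), self.h21_head(z)

model = TwoHodgeNet()
batch = torch.zeros(8, 1, 12, 15)   # 8 configuration matrices, 1 channel
h11, h21 = model(batch)
print(h11.shape, h21.shape)         # torch.Size([8, 1]) twice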
||||||
|
|
||||||
|
For more specific information I invite you to take a look at Harold Erbin's talk on the subject at the recent "string_data" workshop.
|
||||||
|
|
||||||
- page 97/102
|
- page 97/102
|
||||||
|
|
||||||
Deep learning can therefore be used conscientiously (and I cannot stress this enough) as a predictive method, provided that one is able to analyse the data (no black boxes should ever be admitted).
|
Deep learning can therefore be used conscientiously (and I cannot stress this enough) as a predictive method, provided that one is able to analyse the data (no black boxes should ever be admitted).
|
||||||
@@ -629,7 +630,6 @@
|
|||||||
This is in fact the first time they have been successfully used in theoretical physics.
|
This is in fact the first time they have been successfully used in theoretical physics.
|
||||||
|
|
||||||
Finally, this is an interdisciplinary approach in which a lot is yet to be learned from different perspectives.
|
Finally, this is an interdisciplinary approach in which a lot is yet to be learned from different perspectives.
|
||||||
Just think of the entire domain of geometric deep learning in computer science where the underlying structures of the process are investigated: surely mathematics and theoretical physics could provide a framework for it.
|
|
||||||
|
|
||||||
- page 101/102
|
- page 101/102
|
||||||
|
|
||||||
@@ -637,7 +637,7 @@
|
|||||||
|
|
||||||
In fact one could in principle exploit the freedom in representing the configuration matrices to learn the best possible representation.
|
In fact one could in principle exploit the freedom in representing the configuration matrices to learn the best possible representation.
|
||||||
|
|
||||||
Otherwise one could start to think about this in a mathematical embedding and study what happens in higher dimensions (where the number of manifolds is larger: almost one million complete intersections).
|
Otherwise one could start to think about this in a mathematical embedding and study what happens for CICY 4-folds (almost one million complete intersections).
|
||||||
|
|
||||||
Moreover, as I was saying, this could be used as an attempt to study formal aspects of deep learning, or even to dive directly into "real artificial intelligence" and start studying the problem in a reinforcement learning environment where the machine automatically learns a task without knowing the final result.
|
Moreover, as I was saying, this could be used as an attempt to study formal aspects of deep learning, or even to dive directly into "real artificial intelligence" and start studying the problem in a reinforcement learning environment where the machine automatically learns a task without knowing the final result.
|
||||||
|
|
||||||
|
|||||||
@@ -1507,8 +1507,6 @@
|
|||||||
|
|
||||||
\item \textbf{dataset pruning}: no product spaces, no ``very far'' outliers (reduction of $0.49\%$)
|
\item \textbf{dataset pruning}: no product spaces, no ``very far'' outliers (reduction of $0.49\%$)
|
||||||
|
|
||||||
\item $h^{1,\, 1} \in \qty[ 1,\, 16 ]$ and $h^{2,\, 1} \in \qty[ 15,\, 86 ]$
|
|
||||||
|
|
||||||
\item $80\%$ training, $10\%$ validation, $10\%$ test
|
\item $80\%$ training, $10\%$ validation, $10\%$ test
|
||||||
|
|
||||||
\item choose \textbf{regression}, but evaluate using \textbf{accuracy} (round the result)
|
\item choose \textbf{regression}, but evaluate using \textbf{accuracy} (round the result)
|
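As a sketch of the bookkeeping behind the last two items (placeholder data; only the 80/10/10 split and the round-then-score evaluation are being illustrated):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))                        # placeholder features
y = np.rint(3 * X[:, 0] + 5 + rng.normal(size=1000))   # placeholder integer target

# 80% training, 10% validation, 10% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Train a regression model, but evaluate it with accuracy by rounding.
model = LinearRegression().fit(X_train, y_train)
accuracy = np.mean(np.rint(model.predict(X_test)) == y_test)
print(f"rounded accuracy: {accuracy:.3f}")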
||||||
|
|||||||