FoodKG is unique software in its type and purpose; no other systems or tools offer the same set of features. Our main work falls under graph embedding techniques. Embedding vectors capture the distributional semantics of words and can be used in applications such as Named Entity Recognition (NER), question answering, document classification, information retrieval, and other machine learning tasks \citep{nadeau2007survey}. These vectors mainly rely on the cosine of the angle between pairs of word vectors to measure semantic similarity and to perform word analogy tasks such as the common example \textit{king – queen = man – woman}. The two main families of methods for learning word vectors are matrix factorization methods such as Latent Semantic Analysis (LSA) \citep{deerwester1990indexing} and Local Context Window (LCW) methods such as skip-gram (Word2vec) \citep{mikolov2013distributed}.
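For concreteness, the following minimal Python sketch (using made-up toy vectors for illustration only, not values from any trained model) shows how the cosine of the angle between difference vectors can be used to check such an analogy:

\begin{verbatim}
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two word vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-dimensional vectors, made up purely for illustration.
vectors = {
    "king":  np.array([0.80, 0.30, 0.10]),
    "queen": np.array([0.75, 0.90, 0.12]),
    "man":   np.array([0.60, 0.25, 0.50]),
    "woman": np.array([0.55, 0.85, 0.52]),
}

# The analogy king - queen = man - woman holds when the two
# difference vectors point in roughly the same direction.
diff_royal  = vectors["king"] - vectors["queen"]
diff_common = vectors["man"]  - vectors["woman"]
print(cosine_similarity(diff_royal, diff_common))  # close to 1.0
\end{verbatim}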
Matrix factorization methods generate low-dimensional word representations that capture the statistical information of a corpus by decomposing large matrices using low-rank approximations. In LSA, each row corresponds to a word or concept, whereas each column corresponds to a document in the corpus. However, while methods such as LSA leverage global statistical information, they perform relatively poorly on the word analogy task, indicating a sub-optimal vector space structure. The second family of methods makes predictions within a local context window, as in the Continuous Bag-of-Words (CBOW) model \citep{mikolov2013efficient}. The CBOW architecture predicts the focus word from its context words, whereas skip-gram predicts the context words, one by one, from a single given focus word.
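As an illustration, the sketch below (assuming the gensim library, version 4 or later, and a tiny toy corpus of our own) shows how a single flag switches between the CBOW and skip-gram architectures:

\begin{verbatim}
from gensim.models import Word2Vec

# Tiny toy corpus: each sentence is a list of tokens.
corpus = [
    ["apple", "is", "a", "fruit"],
    ["banana", "is", "a", "fruit"],
    ["carrot", "is", "a", "vegetable"],
]

# sg=0 -> CBOW: predict the focus word from its context words.
cbow_model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 -> skip-gram: predict each context word from the focus word.
sg_model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

print(sg_model.wv.most_similar("apple", topn=2))
\end{verbatim}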
A few techniques have been proposed to optimize such predictions, such as hierarchical softmax, which builds a binary tree over all the words and then predicts the path to a specific node. Recently, GloVe \citep{glove} was introduced as an unsupervised learning algorithm that generates embeddings by aggregating global word-word co-occurrence counts, tabulating the number of times word $j$ appears in the context of word $i$. FastText is another embedding model, created by the Facebook AI Research (FAIR) group for efficient learning of word representations and sentence classification \citep{bojanowski2017enriching}. FastText treats each word as a combination of character \textit{n}-grams, where \textit{n} can range from 1 to the length of the word. Therefore, fastText has some advantages over Word2vec and GloVe, such as providing vector representations for rare words that do not appear in the Word2vec or GloVe vocabularies; moreover, \textit{n}-gram embeddings tend to perform better on smaller datasets. A knowledge graph embedding is a type of embedding whose input is a knowledge graph; it is triple-based and leverages the relations between vertices.
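To make the co-occurrence counting concrete, the following sketch (our own simplification; the actual GloVe implementation additionally applies distance weighting and fits the counts with a weighted least-squares objective) builds a symmetric window-based co-occurrence matrix:

\begin{verbatim}
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    # Count how often word j appears within `window` tokens of word i.
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word_i in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word_i, tokens[j])] += 1.0
    return counts

corpus = [["the", "cat", "sat", "on", "the", "mat"]]
for (w_i, w_j), c in sorted(cooccurrence_counts(corpus).items()):
    print(w_i, w_j, c)
\end{verbatim}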
We consider Holographic Embeddings of Knowledge Graphs (HolE) to be the state-of-the-art knowledge graph embedding model \citep{nickel2016holographic}. When the input dataset is a graph instead of a text corpus, we apply different embedding algorithms such as {LINE} \citep{tang2015line}, {node2vec} \citep{grover2016node2vec}, {M-NMF} \citep{wang2017community}, and {DANMF} \citep{ye2018deep}. DeepWalk is one of the most common models for graph embedding \citep{perozzi2014deepwalk}. DeepWalk leverages language modeling and deep learning to learn latent representations of the vertices in a graph by generating and analyzing random walks. A random walk over a graph is the analogue of a sentence in a text corpus: within a specific window size, sequences of nodes that frequently appear together are treated as co-occurring words.
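A minimal DeepWalk-style sketch (our own simplification, assuming the networkx and gensim libraries; the original implementation differs in its sampling strategy and hyperparameters) could look as follows:

\begin{verbatim}
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(graph, start, length=10):
    # Generate one truncated random walk starting at `start`.
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]

# Toy graph used for illustration.
graph = nx.karate_club_graph()

# Treat each walk as a "sentence" of node tokens.
walks = [random_walk(graph, node) for node in graph.nodes() for _ in range(10)]

# Skip-gram (sg=1) over the walks yields the node embeddings.
model = Word2Vec(walks, vector_size=64, window=5, min_count=1, sg=1)
print(model.wv.most_similar("0", topn=3))
\end{verbatim}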
This technique also uses skip-gram to minimize the negative log-likelihood of the observed neighborhood samples. \hl{GEMSEC is another graph embedding algorithm that learns node clustering while computing the embeddings, whereas the other models do not utilize clustering}. It relies on sequence-based embedding combined with clustering so that the embedded nodes are clustered simultaneously. The algorithm places the nodes in an abstract feature space so as to minimize the negative log-likelihood of preserving the neighborhood nodes while clustering the nodes into a specific number of clusters. Graph embeddings preserve the semantics between concepts better than word embeddings, which is why FoodKG uses a graph embedding model to exploit graph semantics.
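To illustrate the idea of learning the clustering jointly with the embedding, the sketch below (a simplified rendering with made-up values; the variable names and the weight gamma are ours, and the real algorithm minimizes such an objective with gradient descent over sampled walks) adds a cluster-distance penalty to the skip-gram loss:

\begin{verbatim}
import numpy as np

def clustering_cost(embeddings, centers):
    # Sum over nodes of the distance to the nearest cluster center.
    # embeddings: (num_nodes, dim), centers: (num_clusters, dim)
    dists = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=-1)
    return dists.min(axis=1).sum()

def joint_objective(skipgram_nll, embeddings, centers, gamma=0.1):
    # Skip-gram negative log-likelihood plus a weighted clustering penalty.
    return skipgram_nll + gamma * clustering_cost(embeddings, centers)

# Toy example: 5 nodes embedded in 2 dimensions, 2 cluster centers.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 2))   # node embeddings
M = rng.normal(size=(2, 2))   # cluster centers
print(joint_objective(skipgram_nll=3.2, embeddings=Z, centers=M))
\end{verbatim}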