We ended Part ONE on our quest to find a robust graph-based embedding solution. Let’s continue our search hereon:

**Finding our match in Cleora**

In our quest, we stumbled upon an AI framework called Cleora^{1}, which claimed to be ~197x faster than DeepWalk or Node2Vec. It checked all the boxes and qualified for our next iteration. Let’s see why:

- Cleora’s idea was simple: nodes close to each other should have high similarity scores but not at the cost of driving a not-connected node far away
- Cleora was purely unsupervised
- No explicit learning objective was formulated and no sampling of positive or negative examples was performed

So in Cleora, a node embedding is calculated by taking the average weights of its neighbours and then normalising it. While the process is iterative, the number of iterations is a hyperparameter along with the embedding size. However in Cleora’s case, node attributes are not required, thus it is purely based on graph structure and the local neighbourhood.

We used Cleora for customer-restaurants graph data in the National Capital Region (NCR) area. And to our delight, the embedding generation was superfast (i.e <5 minutes). For context, do remember that GraphSAGE took ~20hours for the same data in the NCR region.

Since we had a bipartite graph with customers and restaurants as nodes, the notion of similarity in the embedding space meant that customers interacting with similar restaurants would have had high similarity and vice-versa.

**Let’s find the results**

Following are the examples of restaurant-restaurant similarity (out of the top 10 similar restaurants of Gurugram) being captured.

Here, the theme is around restaurants offering healthy food, for which Caterspoint is a vetted example basis customer behaviour.

Here we notice that customers ordering from a healthy restaurant are likely to be closely connected to other healthy restaurants. No cuisine-related, price-related or any other metadata is passed in the embedding creation process; only customer nodes and restaurant nodes with edges are considered.

Here, the theme is around restaurants offering gourmet food. In this case, customers ordering from high-end restaurants are closely connected to other high-end restaurants.

Here the theme is restaurants offering regional cuisine. Again, customers ordering from these restaurants will be more closely connected to other similarly placed regional restaurants.

Hence, there is strong evidence to use these embeddings downstream as both customer similarity and restaurant similarity capture the graph related properties robustly.

What we saw with Cleora was that it preserved entity type info, which was important for heterogeneous graphs. However, since both customer and restaurant embeddings are separated (as per the figure above), restaurant recommendations cannot be made by directly taking the dot product between both entities and sorting them based on similarity. To do that, we need to **represent customers in terms of ordered restaurants**.

- One way would be to use different iteration parameters to calculate customer and restaurant embedding. This way connected restaurants will have high similarity scores, but overall it will be an ‘overfit’ solution and will not generalise well.
- We can also average out the ordered restaurant’s embeddings. Pretty simple right, but it loses crucial information on the restaurant embedding space. Essentially, this might recommend all the restaurants near the centroid of the ordered from restaurants.

More details can be found on this^{2} thread.

**Limitations**

- When we wanted to use edge weight into embedding and raised this question
^{3}in cleora github repo, we found that this feature was not implemented yet. - Additionally, unlike graphSAGE, node features could not be used in Cleora.

**Next stop: EMDE (Efficient Manifold Density Estimator)**

To solve this problem, we decided to explore EMDE^{4} (Efficient Manifold Density Estimator)** ** exploits locality-sensitive hashing to create ‘*sketches’*, i.e. histogram-like structures which represent density on multidimensional manifolds. One can assume that these items are in a continuous shape and not as discrete points in the embedding space. Here, customer preference is the probability density function, so items liked by the customers can act as samples drawn using probability density. EMDE estimates this probability density function.

EMDE recommender architecture *(Source: **An efficient manifold density estimator for all recommendation systems**)*

Essentially, this process **compresses multiple vectors into a single one**. It takes item embeddings (from multiple sources) and a list of items per customer as input and returns sparse customer embedding as output. Once customer embedding is calculated, the recommendation task is then straightforward. There are three parts in the algorithm.

**Part 1 : Vector Encoding and Aggregation**

This includes generating the item and customer sketch. There are two hyperparameters:

- N – number of independent space partitions
- K – total number of hyperplanes in single space partitioning

** Item sketch**: the item embedding space is divided into K partitions and use LSH to create an encoding vector. This process is repeated N times and each of the generated vectors are concatenated to get an item sketch. LSH

^{5}helps to ensure that semantically, only similar items share their assigned buckets in the sketch, thus preserving the geometric prior from upstream embedding.

*Properties of item sketch* –

- Sketch are additive and of constant size (N*K).
- sketch(t
_{1}, .. , t_{n}) = sketch(t_{1}) + …. + sketch(t_{n}) *Dimension*(sketch(t_{1}, .. , t_{n}) =*Dimension*(sketch(t_{1})) =*Dimension*(sketch(t_{n}))

- sketch(t
- Similar items would have similar sketches:
*sketch*(roti)_{~}*sketch*(chapati) - Easily Incorporate time decay. For e.g., customer’s recently ordered food items vs items ordered a few months back, both items have different relevance in customer’s recommendation feed.
- sketch(item) = w(t
_{2}– t1)sketch(item), where t_{2}-t_{1}is time difference, and w are decay force

- sketch(item) = w(t

** Customer sketch**: Here, a sketch of each item (from customer interaction) is created independently and all sketches are added along the axis.

*Properties of customer sketch* –

- Assume that customer have interacted with food item (i_1, i_2, …., i_n) previously,
- sketch(customer) = sketch(i_
_{1}, …, i__{n}), where |x| is k = 1n|x_{k}|^{2}

- sketch(customer) = sketch(i_
*Dimension*(customersketch) =*Dimension*(item sketch)

**Part 2 – Feed Forward NN**

Since we needed an ML framework to transform a mapping from customer space to item space, we used the Feedforward Neural Network Model, which is used as a conditional density estimator. In this case,

- customer embedding (customer sketch and customer features) serves as our input
- the target serves as theitem sketch (customer’s next ordered item)

Here, the input is a collection of previously ordered items and simultaneously, the task predicts which item(s) will be clicked on or ordered next.** ****This is valid for any customer-item interaction data where we have embedding for items.**

**Part 3 – Prediction/Item Ranking**

In this case, we can compute customer embedding from the model, where the input is customer-sketch and the output is customer embedding.

We get the dot product score with item set sketches and sort the items by score in decreasing order to get the ranking.

**Experiment Setup**

- We used K = 128 and N = 20, and computed item embeddings from m = 3 different sources. Therefore, customer
*Dimension*(customersketch) =*Dimension*(item sketch) = K*N*m = 7680 - Input embedding dimension =
*Dimension*(customersketch) +*Dimension*(customerfeatures)

= 7680 + 4 = 7684 and Output embedding dimension = 7680

**Major benefits of MDME**

MDME offers flexibility to combine multiple embeddings captured from different modalities. For example, we can get item embedding based on interactions (click/order/add to fav), item names or item images, etc.

### Advantages of MDME include:

- The ability to append contextual attributes such as customer price range, time of the day etc.
- It does not require heavy computation and resources. Hence, it is easier to scale and train as compared to other sequential-based methods like recurrent neural networks (RNN) and graph neural networks (GNN).
- Do note that the common issue with GNN based methods is neighbourhood explosion which impacts the model performance.
- Further, unlike RNN which takes input sequentially and makes the learning slow, this sequential behaviour can easily be captured by sketches by using the additive property. It can also incorporate time decay.

**Where do we stand now?**

We are currently using EMDE for

- Generating candidates to facilitate downstream recommendation systems.
- It generates recommendations using density-based rich customer representation.
- It allows us to trace customer look-alikes (
*‘People Like You*’) to find similar users with similar cuisine/taste preferences as well as price affinity.

**Where do we go from here?**

- Recommendation depends on time of the day and location. These two signals need to be incorporated within the EMDE solution. Currently, we are experimenting with the first part of EMDE architecture to provide these inputs and their representation.
- Recall@Top10 = 35%. Seems low but we need to measure this in conjunction with other metrics such as diversity and fairness of our system. Increasing Recall is easy. However, we want to refrain from simply recommending user’s past ordered restaurants and improve the discovery through our platform.
- More experimentation is needed to incorporate multiple embedding sources and customer feature space to further improve the recommendations.
- Cleora + EMDE gives us a generalised framework for recommendation. So we are exploring ways to use it in other applications such as search ranking, dish recommendations, etc.

The aim remains to make recommendations more accurate and efficient. More on our tech and product innovations and learnings on our blog.

*This is a two-part series on improving the recommendation search for our customers. If you are interested to work with us on such innovative problem solving, then connect with Manav Gupta on LinkedIn. We are always looking for enterprising data scientists at Zomato.*

This blog was written by Saurabh Gupta, Sonal Garg and Manav Gupta.

-x-

Sources:

- Cleora: A Simple, Strong and Scalable Graph Embedding Scheme, arxiv.org
- User-Item Embedding, github.com
- Edges with weight, github.com
- An efficient manifold density estimator for all recommendation systems, arxiv.org
- Locality-sensitive hashing, wikipedia.org