Mining Structured Data From Recipes

API for extracting ingredients, quantities and other entities from unstructured text.


A food-tech startup based in India wanted to offer food recommendations to its users, but had too little information about their preferences. It did have a large database of user-generated English-language recipes, which could supply a wealth of previously unavailable data if that data could be extracted automatically, accurately and at sufficient scale.


We used machine learning tools to build an API that takes raw, plain-text recipes and extracts structured data: ingredients, quantities and units, step-by-step instructions, cooking temperatures, and required appliances and utensils.


System Architecture


  • To separate the ingredients list from the instructions, we clustered the lines of each recipe with K-Means from the scikit-learn library. Each sentence or line was represented by a “bag of POS tags” that captures its grammatical profile, combined with extra features such as “starts with number”, “starts with verb” and “ends with period”.
  • To extract ingredients, quantities and units, we used the New York Times Ingredient Phrase Tagger, a specialized conditional random fields (CRF) model trained on 130,000 documents from the NYT Cooking section, plus our own dataset featuring European units, such as grams.
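The line-separation idea from the first bullet can be illustrated with a minimal sketch. It is not the production code: to stay dependency-free it replaces scikit-learn's K-Means with a tiny hand-written 2-means loop, and it replaces the bag of POS tags with a hypothetical verb list (COOKING_VERBS). The function names line_features and two_means are also illustrative assumptions.

```python
import re

# Hypothetical stand-in for a POS tagger: a tiny list of imperative verbs.
COOKING_VERBS = {"preheat", "set", "remove", "bake", "grease", "combine", "whisk"}

def line_features(line):
    """Encode one line as [starts with number, starts with verb, ends with period]."""
    text = line.strip()
    tokens = text.lower().split()
    first = tokens[0].strip(".,") if tokens else ""
    return [
        1.0 if re.match(r"\d", text) else 0.0,
        1.0 if first in COOKING_VERBS else 0.0,
        1.0 if text.endswith(".") else 0.0,
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, iterations=10):
    """Minimal 2-cluster K-Means with a deterministic initialisation."""
    # Start from the first point and the point farthest away from it.
    centroids = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min((0, 1), key=lambda k: sq_dist(p, centroids[k])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

lines = [
    "3 tablespoons olive oil",
    "2 garlic cloves, finely chopped",
    "kale chips",
    "Preheat the oven to 355 F and line a baking sheet with baking paper.",
    "Set aside to cool for 5 minutes.",
    "Remove and allow to cool for 5 minutes.",
]
labels = two_means([line_features(line) for line in lines])
for label, line in zip(labels, lines):
    print(label, line)
```

On these six lines the ingredient lines and the instruction sentences end up in different clusters. With a real POS tagger, the verb list would be replaced by per-line counts of each tag, and the hand-written loop by the library's K-Means implementation.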

Instructions are parsed using the Stanford CoreNLP pipeline, which combines multiple approaches:

  • A CRF NER model trained on the CoNLL-2003 dataset.
  • Custom regex rules (e.g. for extracting temperature-related data).
  • Domain-specific gazetteers for new entity types such as utensils and appliances. These can be extended at any time without re-training any model.
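The rule-based and gazetteer-based steps can be sketched in a few lines of Python. This is only an analogue of the idea, not the CoreNLP configuration used in the project: the temperature pattern, the gazetteer entries, the entity type names and the extract_entities function are all assumptions made for illustration.

```python
import re

# Hypothetical gazetteer: entity type -> known phrases. Adding an entry here
# extends the extractor without re-training any model.
GAZETTEER = {
    "APPLIANCE": ["oven"],
    "UTENSIL": ["baking sheet", "bowl", "frypan", "muffin tray"],
}

# Assumed temperature rule: 2-3 digits, optional degree sign, then F or C.
TEMPERATURE = re.compile(r"\b(\d{2,3})\s*(?:°\s*)?(F|C)\b")

def extract_entities(sentence):
    """Return regex and gazetteer matches found in one instruction sentence."""
    entities = []
    for match in TEMPERATURE.finditer(sentence):
        entities.append({"type": "TEMPERATURE", "text": match.group(0)})
    lowered = sentence.lower()
    for entity_type, phrases in GAZETTEER.items():
        for phrase in phrases:
            # Whole-word match so that "bowl" does not fire inside "bowls".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                entities.append({"type": entity_type, "text": phrase})
    return entities

sentence = "Preheat the oven to 355 F and line a baking sheet with baking paper."
entities = extract_entities(sentence)
for entity in entities:
    print(entity["type"], "->", entity["text"])
```

Keeping the gazetteer as plain data is what makes new utensils or appliances cheap to add: only the dictionary changes, while the statistical NER model stays untouched.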

Example Input


3 tablespoons olive oil  
2 garlic cloves, finely chopped  
1 red onion, finely chopped  
1 zucchini, grated and squeezed to remove excess moisture  
1⁄4 cup chives, finely chopped  
1 cup cottage cheese  
1⁄2 cup gluten-free flour  
1 teaspoon baking powder  
1 pinch salt and pepper  
12 eggs  
1 avocado, cut into quarters and fanned  
kale chips  
2 tablespoons honey  
1⁄2lemon, juice and zest  
2 tablespoons sriracha sauce

Preheat the oven to 355 F and line a baking sheet with baking paper. In a bowl combine 2 Tbsp of Olive oil and the kale with salt and pepper, spread evenly on to tray and cook for 12-15 minutes or until crispy. Set aside to cool for 5 minutes. In a frypan on medium-high heat add remaining olive oil and caramelize the red onion for 4 minutes before adding garlic and cooking for a further 3 minutes, set aside. In a bowl whisk your eggs then add in your onion, zucchini, chives, cheese, flour, baking powder and season with salt & pepper. Grease a 6 tin muffin tray and evenly distribute egg mix into each. Bake in the oven for 8-10 minutes or until egg is cooked. Remove and allow to cool for 5 minutes. Meanwhile in a bowl combine the siracha, honey and lemon juice. To finish your muffin top with the avocado 
drizzle with the Sriracha honey and top with a kale chip.

API Response

         "name":"olive oil",
         "input":"3 tablespoons olive oil"
         "comment":"finely chopped",
         "input":"2 garlic cloves, finely chopped"
         "name":"red onion",
         "comment":"finely chopped",
         "input":"1 red onion, finely chopped"
         "text":"Preheat the oven to 355 F and line a baking sheet with baking paper.",
               "text":"355 F",
