Cooking AI-Generated Indian Flatbread

My wife and I tried cooking Indian naan flatbread according to a recipe generated by Zephyr-7B-β.

I wrote an article about not using ChatGPT a while ago, and I am still not interested in funneling semi-personal information into such a service, especially when there is at least the theoretical possibility of running an open large language model on my own hardware.

And for a while, that possibility remained theoretical for me. I had tried to run generative text AI locally in the past and was not able to figure it out. It didn’t feel much like I was missing out, either. The ChatGPT-ish AI stuff still smells like hype marketing to me, although it perhaps offers a somewhat hassle-free alternative to the modern-day search engine experience.

Anyway, I came across a package called “ollama”, which I installed from my distro’s repository. This tool lets you download LLMs, run them on your own machine and interface with them through the terminal.
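For reference, a session with it looks roughly like this. The model tag is my assumption of how the ollama library names the Zephyr model, so check the library listing before copying:

    # download a model from the ollama library
    ollama pull zephyr
    # show which models are installed locally
    ollama list
    # open an interactive chat session in the terminal
    ollama run zephyr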

I installed the tool, downloaded some models and had the idea of generating a recipe for Indian naan flatbread. Not just any recipe, but one aimed at preparation in a cast iron pan. That sounds like a fun benchmark for a chatbot, doesn’t it? I once had fantastic flatbread at an Indian restaurant and would love to have the same at home.

One of the models I prompted for this was Mistral-7B-OpenOrca. It was quite a funny experience, because this model tried to give me German output even though it is only intended for English and French. It must have gotten that idea from my first interactions with it and/or from me telling it that I wanted the recipe in units as used in Germany.

After failing to convince OpenOrca to give me a recipe in uncorrupted English without ambiguous units like tablespoons, I tried Zephyr-7B-β. This model generated multiple interesting recipes that went into text files on my system. Since we did not have yogurt, it was fine-tuned out of the equation. The recipe we went with resulted from the following prompt:

“you are a hobbyist cook who loves making great meals with simple tools. you share your best recipes. write a recipe for making naan bread in a cast iron pan. use metric units. do not use the units fahrenheit, tablespoon, tsp and teaspoon. use celsius for temperature. do not use self-raising flour, yogurt or baking powder in the recipe. use butter and milk in the recipe.”

This way of talking to a machine feels like being a teacher and giving a cooking exercise to your class. The recipe was turned into a proper asciidoc document with the help of vim, printed out and taken into the kitchen. We put the ingredients into our kitchen machine to make the dough and added some sugar to feed our dry yeast (it works like that, right?).
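In case you are wondering what that asciidoc document looked like: roughly something like the sketch below. The amounts and steps here are placeholders, not the actual generated recipe.

    = Naan Bread in a Cast Iron Pan

    // amounts and steps below are placeholders
    == Ingredients

    * 500 g flour
    * 250 ml milk
    * ...

    == Preparation

    . Knead everything into a smooth dough and let it rise.
    . ...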

The result was really satisfying! It was not quite on the level of the flatbread I ate at the restaurant, but still impressive for such an easy home dish. So far, this is the best pan flatbread I have cooked. Here is what it looked like:

Following the recipe word for word seems to leave the breads raw at the edges, so I waited until they looked done. Even though the outcome did not blow my mind, this recipe is definitely a good base to build upon…

… and that brings us to the conclusion. I still don’t like ChatGPT and think that AI is a net negative for society, especially when people see it as a substitute for using their own brains. But as much as one side of me wishes that such things wouldn’t exist, they are here to stay and we need to make the best of the situation.

While trusting an AI-generated recipe blindly might result in food poisoning, you can use your brain and separate the good from the bad. I currently don’t see AI models writing major parts of my application code.

By the time I’m finishing this post, I’ve had some more experience with text generation and image generation models. In fact, the image at the top of the article was generated by AI. What seems to work well is asking the models to suggest and compare, for example, different pieces of software. I also prompted different models with some fairly simple code/config questions, and the results seem roughly 50% accurate.

Generally speaking, there still seem to be a lot of factual hallucinations in the text output, so we shouldn’t rely too much on text generation models for our research. A 70B model might perform differently than a 7B model, but they aren’t necessarily far away from each other in the benchmarks, and a 7B model can even deliver superior results.