Letters from a Zeneca

Letter 103: How to Vibe Code a Prediction Model

A step-by-step guide to building a prediction model with AI, no machine learning background required

Mar 10, 2026

While not strictly crypto-related, this topic is tangential to prediction markets, which I have written about in the past and which I believe are of interest to readers here. In addition, a few readers have explicitly asked me to write a post like this, so here we are!

I’ve been building a prediction model for the past few weeks to determine who will win in competitive matches for Dota 2 (an esports video game). I’ve done it all through vibe coding with Claude Code (with some help from Yoshi via openclaw, but it’s all possible via CC directly). I have no degree in machine learning, and no data science background.

It’s still early, but the results are looking very promising. I backtested the model and the numbers look fantastic. Honestly, they look too good to be true, so take them with a healthy dose of skepticism.

I’ve been tracking real results for a few weeks with ~100 actual bets logged, and the model is performing well so far (~7.5% ROI), so I have some hope that things will continue to go well. But I know it’s still early days.

There are a lot of people out there selling claims of their Polymarket bots printing 6 figures a week and making it seem like it’s all so easy. It is not easy. It takes time, dedication, motivation, and hard work. You have to be willing to learn; it’s not as simple as “hey claude, build me a prediction model that makes money”. Even when you have a model like this, you have to test and test and test to make sure it actually works. You have to maintain it and update it, and even then you will run into liquidity issues when betting/predicting. It’s never as simple as printing 6 figures.

I have been working daily on my model for almost 2 months, putting in 5+ hours a day on average. There’s a lot of painstaking work and there are moments of frustration. But I believe the potential is there for anyone to do what I’ve done and build a (hopefully) profitable model.

Today I’m going to walk you through how prediction models work, how I created mine, and how you can create your own models with the power of vibe coding.

I’ll break down the core components that every successful prediction model needs and some additional suggestions for how to build and develop them in a practical sense:

  1. Start with a clear and well defined question

  2. Ask the AI to help you every step of the way

  3. You need reliable, clean data

  4. Your features are everything

  5. Choosing the right model

  6. Hyperparameter tuning

  7. Eliminate data leakage

  8. Proper train and test splits

  9. Evaluation metrics

  10. Good calibration

  11. Fast iteration cycles

  12. A retraining pipeline

  13. Testing and monitoring in the real world

  14. Putting it all together

Let’s get into it.

Sidenote: I am in the process of launching a new educational community for those who want to learn about AI. I have two co-founders who have been building and teaching people about AI for years, and among other things, we’re going to be running 8 live video workshops a week.

We’re in the early days, but we’re accepting some new members. There’s a special offer for premium subscribers that I’ll share at the end of this newsletter, which is a 70% discount from what our eventual price is going to be.

Public launch will be around the end of the month, so keep your eyes peeled for that. Super excited about this!


1. Start with a clear and well defined question

The biggest mistake people make when they hear “prediction model” is they start thinking about algorithms and frameworks and profit. Don’t do that. Instead, think about what question you’re trying to answer.

Who wins this Dota 2 match? is a good question. It’s binary and measurable. You know when you got it right and when you got it wrong.

What will happen in crypto this week? is a bad question. It’s vague, and there’s no clear success or failure criteria. You’d struggle to even know what data to collect.

The quality of your question determines the quality of everything that follows.

A good place to start is to ask yourself:

  • What am I predicting?

  • What are the possible outcomes?

  • When do I make the prediction?

  • When do I find out the result?

If you have clear answers to all four, you should be able to come up with a good question that you’re going to use a prediction model to answer.

If you’re trying to build a model that you’ll use to make bets and make money, then I think it’s best to start with something you have existing domain knowledge and expertise in.

I picked Dota 2 because it’s a video game I have been playing for 20+ years, have bet on for fun before (a lot; I have lost so much money betting on this damn game over the years lol, but now I might get my revenge!!), have been watching people play for 10+ years, and know inside and out. I know more about the game than the vast majority of people, even more than most who watch it regularly. Some of that knowledge might come in handy when finding an edge later on.

How AI helps

You can describe your general area of interest to any AI and ask it to help you formulate a well defined question. Ask it to push you toward specificity. Tell it your domain and what decisions you want to make, and it will help you find the question you want to answer.

Honestly, just copy and paste this whole section into the AI and say something like “I want help coming up with a good question, I want to make a prediction model, can you help me?” and go from there. A common theme you’ll notice throughout this post is that you can and should…

2. Ask the AI to help you every step of the way

There’s no glory in doing all of this yourself. AI is the most powerful tool in the world. Use it, and use it well.

So once you’ve picked the question you want to answer, it’s time to load up your vibe coding platform of choice and start asking the AI for more help. I wrote about Claude Code a few weeks ago and that’s where I built my model. I would recommend either using Claude Code with Opus 4.6 or Codex with GPT 5.4 as they’re currently the two frontier models when it comes to coding.

Of course you can give this a go with lesser models and experiment (and it’s a great way to learn), but if you’re trying to make money, I really do think you’re gonna want the top tier models.

Once in Claude Code/Codex, create a new project and just start telling it what you want, based on the question you came up with. Something along the lines of:

I want to build a Dota 2 prediction model to help predict which team will win a match. I want you to help me with this. Start by doing some deep research to discover all you can about building prediction models; specifically, Dota 2 and esports models. Look at research papers and any evidence of other successful models out there that we can learn from. Share the sources and evidence with me. Take all of that information and come up with a step by step plan for us and let me know what we need to get started.

The AI is going to do a pretty damn good job of coming up with a plan from here, but one thing I found very helpful and important is actually reading the sources and research papers yourself (or at least a couple of them). I know we’re all training ourselves to rely on AI summaries and bullet points, but you really do want to have a bit of an understanding of how everything works under the hood; it’ll be super helpful as you move forward.

The rest of this letter will hopefully give you some of that context and help you understand these things too.

3. You need reliable, clean data

Your model learns from data. If the data is wrong, incomplete, or inconsistent, the model will learn the wrong things.

For my Dota 2 model, I get the majority of my data from official APIs. I always recommend trying to find good APIs for your data rather than scraping it from the web. For Dota, the API I use has comprehensive match data going back years: team compositions, player stats, match outcomes, patch information, and much more. The data is structured, well-documented, and updated regularly.

Unfortunately (or maybe fortunately, since it might present an opportunity), not every domain has a nice API waiting for you. Sometimes you have to scrape websites, parse PDFs, or work with messy spreadsheets.

Usually, you’ll still have to do a bit of both (I scrape some stuff, even though 95% comes from APIs).

Ultimately, the format doesn’t matter as much as the reliability. You need to trust that the data accurately represents what happened. APIs are easier, but not the only way to get to this point.

Aside from reliable data, clean data goes a long way. This means: no duplicate records, consistent formatting, no missing values in critical fields, and clear documentation of what each field represents.

How AI helps

Ask it to write data quality checks. Something like: “write a script that loads my match data, checks for duplicates, flags any matches with missing team IDs, and shows me the distribution of matches per month.” You could even go more basic, and say “I want to ensure our data is clean and reliable, how can we do that?” and it’ll come up with some suggestions and a plan, and you can go from there.
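To give you a feel for what the AI might hand back, here is a minimal sketch of those three checks in plain Python. The field names (`match_id`, `radiant_team_id`, `dire_team_id`, `start_date`) and the sample rows are made up for illustration; they are not the schema of any real Dota 2 API, so swap in whatever your data source actually uses.

```python
from collections import Counter

# Hypothetical match records; field names are assumptions, not a real API schema.
matches = [
    {"match_id": 1, "radiant_team_id": 10, "dire_team_id": 20, "start_date": "2026-01-05"},
    {"match_id": 2, "radiant_team_id": 10, "dire_team_id": None, "start_date": "2026-01-19"},
    {"match_id": 2, "radiant_team_id": 10, "dire_team_id": None, "start_date": "2026-01-19"},
    {"match_id": 3, "radiant_team_id": 30, "dire_team_id": 20, "start_date": "2026-02-02"},
]


def check_quality(rows):
    """Return duplicate match IDs, IDs of rows missing a team, and matches per month."""
    id_counts = Counter(row["match_id"] for row in rows)
    duplicates = sorted(mid for mid, n in id_counts.items() if n > 1)

    # A set avoids listing a duplicated record twice.
    missing_team = sorted({
        row["match_id"]
        for row in rows
        if row["radiant_team_id"] is None or row["dire_team_id"] is None
    })

    # Count each match once per month, even if its record is duplicated.
    seen = set()
    per_month = Counter()
    for row in rows:
        if row["match_id"] in seen:
            continue
        seen.add(row["match_id"])
        per_month[row["start_date"][:7]] += 1  # "YYYY-MM" prefix of an ISO date

    return {
        "duplicates": duplicates,
        "missing_team": missing_team,
        "per_month": dict(per_month),
    }


report = check_quality(matches)
print(report)
```

A sudden dip or spike in the matches-per-month count is often the first hint that an import silently failed or ran twice, which is why it’s worth eyeballing alongside the duplicate and missing-value checks.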

4. Your features are everything

Features are one of the most important things to understand when it comes to building a predictive model. In a nutshell, features are the inputs your model uses to make predictions. Raw data is rarely useful on its own. You need the raw data because it’s what you use to create the features, but it’s the features that the model actually uses to predict things.

For Dota 2, a raw stat like “team A has played 200 matches” tells you almost nothing about who wins the next one. But “team A has won 65% of their last 20 matches on the current patch” tells you something useful about recent form in the current meta.
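Here is a rough sketch of how a “recent form on the current patch” feature could be computed from raw match records. The record layout (`date`, `patch`, `winner`, `loser`), the team names, and the function name `recent_win_rate` are all hypothetical, and this is not necessarily how my own pipeline does it. Note the `before` argument: the feature only looks at matches that started before the moment you make the prediction.

```python
def recent_win_rate(history, team, patch, before, n=20):
    """Win rate over the team's last n matches on `patch` that started before `before`.

    Returns None when the team has no qualifying matches.
    """
    relevant = [
        m for m in history
        if team in (m["winner"], m["loser"])
        and m["patch"] == patch
        and m["date"] < before  # ISO dates compare correctly as strings
    ]
    relevant.sort(key=lambda m: m["date"])
    last_n = relevant[-n:]
    if not last_n:
        return None
    wins = sum(1 for m in last_n if m["winner"] == team)
    return wins / len(last_n)


# Made-up sample data for illustration only.
history = [
    {"date": "2026-01-03", "patch": "A", "winner": "Team A", "loser": "Team B"},
    {"date": "2026-01-10", "patch": "B", "winner": "Team B", "loser": "Team A"},
    {"date": "2026-01-17", "patch": "B", "winner": "Team A", "loser": "Team C"},
    {"date": "2026-01-24", "patch": "B", "winner": "Team A", "loser": "Team B"},
    {"date": "2026-01-31", "patch": "B", "winner": "Team C", "loser": "Team A"},
]

form = recent_win_rate(history, "Team A", patch="B", before="2026-02-01")
print(form)  # 2 wins in 4 matches on patch "B" -> 0.5
```

Returning `None` for a team with no matches on the patch is a deliberate choice in this sketch: it forces you to decide how to handle “no information” instead of quietly treating it as a 0% win rate.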

This is where your domain knowledge comes in handy. You understand your domain. You know what factors influence outcomes. If you’re an avid golf fan, you know that the weather has an impact, that the type of grass makes a difference, that a morning vs afternoon start can change how likely a player is to score well, that long hitters perform better on some courses, and so on.

Your model doesn’t start out knowing any of this. It only knows what you tell it through features.

Good features capture information that is available before the prediction, relevant to the outcome, and not redundant with other features.
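The “not redundant” part is easy to check mechanically. One simple approach (an assumption on my part, there are others) is to compute the correlation between every pair of features and flag pairs that move almost in lockstep. The sketch below does that in plain Python; the feature names, the values, and the 0.95 threshold are all made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no variation to correlate
    return cov / sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.95):
    """Return pairs of feature names whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(features, 2)
        if abs(pearson(features[a], features[b])) >= threshold
    ]


# Hypothetical feature columns, one value per match.
features = {
    "win_rate_last_20": [0.65, 0.40, 0.55, 0.70, 0.30],
    "loss_rate_last_20": [0.35, 0.60, 0.45, 0.30, 0.70],  # just 1 - win rate
    "days_since_last_match": [3, 14, 7, 2, 9],
}

flagged = redundant_pairs(features)
print(flagged)  # only the win-rate / loss-rate pair is flagged
```

A flagged pair isn’t automatically a problem, but it’s a prompt to ask whether the second feature tells the model anything the first one doesn’t.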

How AI helps

Ask it to suggest features as a starting point, so you get an idea of the types of things you can use. Then describe your domain expertise and the factors you think matter to brainstorm a list of additional things (features) you think might impact the outcome.

Then, ask it to engineer those features from your raw data. It will write the transformation code. You evaluate whether the features make sense. This back-and-forth is where vibe coding shines. You bring the thinking (at least some of it) and your domain knowledge. The AI does everything else.

5. Choosing the right model

You have a question, data, and features. Now you need something that takes those features and turns them into a prediction. That something is a model.

Think of a model as a function. You feed it inputs (your features) and it gives you an output (a prediction). Different types of models learn this function in different ways. Some are simple. Some are complex. The right choice depends on your problem, but for most prediction tasks with structured data, the answer is simpler than you’d think.
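To make the “model as a function” idea concrete, here is about the simplest version I can think of: a weighted sum of the features squashed into a probability between 0 and 1 (the shape used by logistic regression). This is purely illustrative and not a description of my actual model. The feature names and the weights are invented; in a real model the weights are learned from historical data rather than typed in by hand.

```python
from math import exp


def predict_win_probability(features, weights, bias=0.0):
    """Map feature values to a probability using a weighted sum and a sigmoid."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-score))


# Hand-picked weights for illustration; training would normally find these.
weights = {"win_rate_diff": 3.0, "head_to_head_diff": 1.0}

# Hypothetical inputs: team A's recent win rate is 15 points higher than
# team B's, and A has a small edge in head-to-head results.
features = {"win_rate_diff": 0.15, "head_to_head_diff": 0.2}

p = predict_win_probability(features, weights)
print(round(p, 2))  # about 0.66
```

When every feature is zero (no difference between the teams) this function returns exactly 0.5, which is what you’d want from a model that has no reason to favor either side.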
