Alright, buckle up, because I’m about to spill the beans on my Fils vs. Tsitsipas project. It wasn’t exactly a walk in the park, but hey, who learns anything from easy stuff, right?

The Idea Sparked
It all started when I was casually watching some tennis the other day and the Fils vs. Tsitsipas match came up. I got this crazy idea to dive deep into analyzing their match, figuring I could use it as a test case to level up my data skills.
Getting My Hands Dirty with Data
- Data Gathering: First things first, I needed data. Loads of it. I started by scouring the web for any publicly available stats. Things like serve percentages, winners, unforced errors – the whole shebang. It was a bit of a treasure hunt, digging through different tennis websites and stat aggregators.
- Data Cleaning: Oh boy, this was a doozy. The data I got wasn’t exactly squeaky clean. Different formats, missing values, you name it. I spent a good chunk of time wrangling it into shape with Python and Pandas, my go-to for data manipulation. In the end I dealt with the missing values and got everything into one consistent format.
- Feature Engineering: Once the data was clean, I started thinking about what features could be interesting. Instead of just looking at raw numbers, I calculated things like the ratio of winners to unforced errors, serve dominance, and return effectiveness. Basically trying to find hidden patterns that raw numbers might miss.
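For a rough idea of what the cleaning and feature steps above might look like in Pandas, here's a minimal sketch. Everything in it is an assumption for illustration: the column names, the made-up numbers, the median fill for missing values, and the formulas for the two ratios. "Serve dominance" is left out because the post doesn't say how it was defined.

```python
import pandas as pd

# Hypothetical raw stats, one row per player per match. The column names and
# values are invented for illustration and are NOT real match data.
raw = pd.DataFrame({
    "player": ["Player A", "Player B", "Player A", "Player B"],
    "first_serve_pct": ["62%", "58%", None, "65%"],  # mixed format + a gap
    "winners": [30, 25, 28, None],                   # a missing value
    "unforced_errors": [20, 25, 14, 30],
    "return_points_won": [35, 30, 40, 28],
    "return_points_played": [90, 85, 95, 88],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize formats and fill gaps (median fill is an assumed choice)."""
    out = df.copy()
    # Turn strings like "62%" into fractions like 0.62; missing stays NaN.
    out["first_serve_pct"] = (
        pd.to_numeric(out["first_serve_pct"].str.rstrip("%"), errors="coerce") / 100
    )
    # Fill missing numeric values with each column's median.
    num_cols = ["first_serve_pct", "winners"]
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    return out


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add ratio features (assumed definitions, not necessarily the post's)."""
    out = df.copy()
    out["winner_ue_ratio"] = out["winners"] / out["unforced_errors"]
    out["return_effectiveness"] = (
        out["return_points_won"] / out["return_points_played"]
    )
    return out


features = add_features(clean(raw))
print(features[["player", "winner_ue_ratio", "return_effectiveness"]])
```

A median fill is only one option. Dropping incomplete rows is just as defensible when there's plenty of data to spare.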
Building the Model (Or Trying To)
This is where things got a bit dicey. I figured I’d use some kind of machine learning model to predict things like who was more likely to win the next point or even the whole match based on the stats. Sounded cool in my head, at least.

I considered a few options, like:
- Logistic Regression: Seemed like a good starting point for predicting binary outcomes (win/lose).
- Random Forest: Heard good things about its ability to handle complex relationships and resist overfitting better than a single decision tree.
I ended up going with Random Forest because I wanted to see how it handled the complexity of tennis stats. I split the data into training and testing sets, fiddled with the hyperparameters (number of trees, max depth, etc.), and crossed my fingers.
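Here's a minimal sketch of that train/test split and Random Forest setup using scikit-learn. The library choice, the synthetic data, the feature meanings, and the specific hyperparameter values are all assumptions, since the post doesn't give the real ones.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 400 fake "matches", 3 features each.
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.normal(1.0, 0.3, n),    # pretend winner-to-unforced-error ratio
    rng.normal(0.62, 0.05, n),  # pretend first-serve percentage
    rng.normal(0.38, 0.05, n),  # pretend return effectiveness
])
# Fake label: a noisy function of the features, 1 = win, 0 = loss.
signal = 2.0 * (X[:, 0] - 1.0) + 8.0 * (X[:, 2] - 0.38)
y = (signal + rng.normal(0, 0.5, n) > 0).astype(int)

# Hold out 25% of the rows for testing; stratify keeps the win/loss mix similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Number of trees and max depth are the hyperparameters mentioned above;
# the values here are placeholders, not tuned results.
model = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Fixing `random_state` makes the split and the forest reproducible, which helps when fiddling with hyperparameters one at a time.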
The Results (And The Reality Check)
Okay, so the model wasn’t exactly Nostradamus. It did alright, but the accuracy wasn’t as mind-blowing as I’d hoped. It was better than just flipping a coin, sure, but it wasn’t predicting the future.

I realized a few things:
- Data Limitations: Tennis is more than just stats. There’s momentum, mental toughness, court surface, weather conditions – a whole bunch of stuff my model didn’t account for.
- Model Complexity: Maybe Random Forest was overkill. A simpler model might have been just as good, or even better, given the limitations of the data.
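One way to check both the "better than a coin flip" claim and the "maybe a simpler model would do" hunch is to score a trivial baseline, logistic regression, and Random Forest on the same cross-validation folds. This is a sketch on synthetic data with scikit-learn, so the dataset, the models' settings, and the 5-fold setup are assumptions, and the numbers it prints say nothing about the real project.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption): 300 rows, 3 features, noisy label.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, n) > 0).astype(int)

candidates = {
    # Always predicts the most common class: the bar any real model must clear.
    "baseline": DummyClassifier(strategy="most_frequent"),
    # Scaling first is a common habit for logistic regression.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "random forest": RandomForestClassifier(
        n_estimators=100, max_depth=5, random_state=0
    ),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(est, X, y, cv=5, scoring="accuracy").mean()
    for name, est in candidates.items()
}

for name, score in scores.items():
    print(f"{name:>20}: {score:.3f}")
```

If the simpler model lands within noise of the forest, that's a decent argument for keeping the simpler one.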
What I Learned
Even though my model didn’t become the next tennis oracle, I learned a ton:
- Data Cleaning is Key: Seriously, you can’t overestimate the importance of clean data. Garbage in, garbage out.
- Feature Engineering Matters: Thinking creatively about features can uncover hidden insights.
- Models Aren’t Magic: They’re only as good as the data you feed them. And they don’t replace actual knowledge of the subject matter.
Next Steps

I’m not giving up on this idea just yet. Here’s what I’m thinking for round two:
- More Data: I need to find a way to incorporate more factors beyond just basic stats. Maybe some sentiment analysis of news articles or social media posts?
- Simpler Models: I’ll try a simpler model like logistic regression and see if it performs better.
- Focus on Specific Scenarios: Instead of predicting the whole match, maybe focus on predicting individual game outcomes or even point outcomes.
So yeah, that’s the story of my Fils vs. Tsitsipas adventure. It was a bit of a rollercoaster, but I came out of it with a bunch of new skills and a better understanding of data analysis. Stay tuned for the rematch!