Stream 1 (Starting out in ML)
This stream is for participants who have no ML experience and want a curriculum for self-learning. The following curriculum, together with a support group of peers on the same journey, will prepare participants for Stream 2.
You don’t need to know everything about everything. At the beginning, focus on learning a few things really well.
- Machine learning basics: how to formulate a machine learning problem
  - Introduction and Chapter 5 of the Deep Learning book
    - Note: if you need a refresher on linear algebra, probability theory, and numerical computation, Chapters 2-4 of the Deep Learning book are a great resource.
  - [optional] Chapter 1 of the Hands-On Machine Learning book
- Learn the theory behind five basic algorithms, how to evaluate them, and how to use them in practice (scikit-learn):
  - Regression:
    - Linear regression
  - Clustering:
    - K-means clustering
  - Classification:
    - Logistic regression
    - SVMs
    - Random forests
- Model evaluation
  - Cross-validation and overfitting
  - Accuracy, recall, precision, F1 score, ROC curve, and loss functions
- Project 1 [Kaggle]
  - Study how the problem is formulated
  - Follow others’ solutions that use various algorithms
  - Replicate existing solutions and understand the various aspects of data preparation and modeling
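The algorithms and evaluation topics listed above can be practiced together in a few lines of scikit-learn. The sketch below is a minimal illustration, not a prescribed solution. It assumes scikit-learn is installed and uses datasets bundled with the library (`load_diabetes`, `make_blobs`, `load_breast_cancer`) as stand-ins for real project data. The train/test split, the model settings, and the use of the silhouette score for k-means are illustrative assumptions.

```python
# Minimal sketch: the five curriculum algorithms on small datasets bundled with
# scikit-learn (stand-ins for real data), each evaluated with common metrics.
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer, load_diabetes, make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, log_loss, precision_score,
                             recall_score, roc_auc_score, silhouette_score)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Regression: linear regression on the diabetes data, scored with 5-fold CV (R^2).
X_reg, y_reg = load_diabetes(return_X_y=True)
reg_cv_r2 = cross_val_score(LinearRegression(), X_reg, y_reg, cv=5, scoring="r2")

# Clustering: k-means on synthetic blobs. There are no labels to compare against,
# so the silhouette score (range -1 to 1) is used as an internal quality measure.
X_blobs, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_blobs)
kmeans_silhouette = silhouette_score(X_blobs, kmeans.labels_)

# Classification: breast cancer data (label 1 is "benign" in this dataset),
# with a held-out test set that is only used for the final metrics.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

classifiers = {
    # Scaling matters for logistic regression and SVMs, less so for trees.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, clf in classifiers.items():
    # Cross-validation on the training set only estimates generalization.
    cv_acc = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy").mean()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    y_proba = clf.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "cv_accuracy": cv_acc,
        "test_accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_proba),  # area under the ROC curve
        "log_loss": log_loss(y_test, y_proba),      # the loss logistic regression minimizes
    }

print(f"Linear regression, mean 5-fold R^2: {reg_cv_r2.mean():.3f}")
print(f"K-means silhouette score: {kmeans_silhouette:.3f}")
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Comparing the cross-validated accuracy with the test accuracy for each classifier is a first, informal check that the model is not overfitting its training data.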
Guidance on picking projects
- Project 1 (Stream 1):
  - Should be doable within one week (an estimated 40-60 hours).
  - The focus is on learning by example: how a data problem is formulated and how others solved it.
    - Test: the main question you should be able to answer at every step is “Why?”
  - What business need does solving this problem meet? What is the value of the project if it is solved?
    - Test: Think about the business. Is this problem worth solving?
  - Understand the data: how to explore, pre-process, and normalize similar datasets, why each step is needed, and which tools are used.
    - Test: Can you do the data preparation on a similar dataset?
  - Why was a specific algorithm used to model the dataset? How were the hyperparameters chosen?
    - Test: Can you compare the algorithms? What are their pros and cons?
    - Test: Can you apply the algorithms you studied to a different but similar problem?
  - How is each solution evaluated? What is the evaluation metric?
    - Test: Why does that metric make sense? Can you justify it? What are the alternatives?
    - Test: How did you make sure the model didn’t overfit? Can you justify why the solution is correct?
  - Can you write a blog post about the problem, the solutions you studied, and the tools you used?
    - Test: Find gaps in your knowledge while writing your blog post.
    - Test: What did you learn? What did you find very useful? What are the caveats?
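Several of the questions above (how the data was prepared, how the hyperparameters were chosen, which metric was used, and whether the model overfits) can be checked directly in code. The following is a minimal sketch assuming a tabular binary-classification task. The breast cancer dataset bundled with scikit-learn stands in for a Kaggle dataset, and the SVM, the parameter grid, and F1 as the metric are illustrative choices, not recommendations for any particular problem.

```python
# Minimal sketch: data preparation, hyperparameter search, metric justification,
# and an overfitting check, on a bundled dataset standing in for a Kaggle one.
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# Hold out a test set that is never touched while choosing the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Data preparation lives inside the pipeline, so during cross-validation the
# scaler is fit only on the training folds (no leakage from validation folds).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC()),
])

# Hyperparameters are chosen by cross-validated grid search, not by hand.
param_grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(
    pipe, param_grid, scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_train, y_train)

# A trivial baseline shows whether the model adds value under the chosen metric.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_f1 = f1_score(y_test, baseline.predict(X_test))

# A large gap between train and test scores suggests overfitting.
train_f1 = f1_score(y_train, search.predict(X_train))
test_f1 = f1_score(y_test, search.predict(X_test))

print("Best hyperparameters:", search.best_params_)
print(f"Mean CV F1 of best model: {search.best_score_:.3f}")
print(f"Train F1: {train_f1:.3f}, test F1: {test_f1:.3f}")
print(f"Baseline (most frequent class) test F1: {baseline_f1:.3f}")
```

In this dataset the positive class (label 1, "benign") is the majority, so even the trivial baseline reaches a fairly high F1. That is exactly the kind of observation that should prompt the question of why a given metric makes sense and what the alternatives are.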