Stream 1 program

Stream 1 (Starting out in ML)

This stream is suitable for participants who have no ML experience and are interested in a curriculum for self-learning. The following curriculum, along with a support group of peers going through the same journey, will prepare participants for Stream 2.

You don’t need to know everything about everything. At the beginning, focus on learning a few things really well.

  • Machine learning basics: how to formulate a machine learning problem
  • Learn the theory of 5 basic algorithms, how to evaluate them, and how to use them in practice (sklearn):
    • Regression:
      • Linear regression
    • Clustering:
      • K-means clustering
    • Classification:
      • Logistic regression
      • SVMs
      • Random forests
  • Model evaluation
    • Cross-validation, overfitting
    • Accuracy, recall, precision, F1 score, ROC curve, loss functions
  • Project 1 [Kaggle]
    • study problem formulation
    • follow others’ solutions with various algorithms
    • replicate existing solutions and understand various aspects of data preparation and modeling
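The algorithms and evaluation ideas listed above can be tried out in a few lines of sklearn. This is only a minimal sketch, assuming scikit-learn is installed: the built-in datasets (`load_breast_cancer`, `load_diabetes`), the hyper-parameter values, and the scoring choices are illustrative assumptions and not part of the curriculum.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Classification: a built-in binary dataset (illustrative choice).
X_clf, y_clf = load_breast_cancer(return_X_y=True)

classifiers = {
    # Scaling lives inside the pipeline so it is re-fit on each training fold.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

clf_scores = {}
for name, model in classifiers.items():
    # 5-fold cross-validation, scored with F1.
    scores = cross_val_score(model, X_clf, y_clf, cv=5, scoring="f1")
    clf_scores[name] = scores.mean()
    print(f"{name}: mean F1 = {scores.mean():.3f}")

# Regression: linear regression scored with R^2 on a built-in dataset.
X_reg, y_reg = load_diabetes(return_X_y=True)
r2_scores = cross_val_score(LinearRegression(), X_reg, y_reg, cv=5, scoring="r2")
print(f"linear regression: mean R^2 = {r2_scores.mean():.3f}")

# Clustering: K-means ignores the labels and groups the scaled features.
X_scaled = StandardScaler().fit_transform(X_clf)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_scaled)
print(f"k-means inertia = {kmeans.inertia_:.1f}")
```

Comparing the printed scores across the three classifiers is a small-scale version of the "compare the algorithms" exercise in Project 1.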

Guidance on picking projects
  • Project 1 (Stream 1):
    • Should be doable within a week (an estimated 40-60 hours).
    • The focus is on learning by example: how a data problem is formulated and how others solved it.
      • Test: The main question you should be able to answer at every step is “WHY?”
    • What business need does solving this problem meet? What is the value of the project if it is solved?
      • Test: Think about the business. Is this problem worth solving?
    • Understand the data: how are similar datasets pre-processed, explored, and normalized? Why? What tools are used?
      • Test: Can you do the data preparation on a similar dataset?
    • Why has a specific algorithm been used to model the dataset? How were the hyper-parameters chosen?
      • Test: Can you compare the algorithms? What are their pros and cons?
      • Test: Can you apply the algorithms you studied to a different but similar problem?
    • How is each solution evaluated? What is the evaluation metric?
      • Test: Why does that metric make sense? Can you justify it? What are the alternatives?
      • Test: How did you make sure the model didn’t overfit? Can you justify why the solution is correct?
    • Can you write a blog post about the problem, the solutions you studied, and the tools you used?
      • Test: Find gaps in your knowledge while writing your blog post.
      • Test: What did you learn? What did you find very useful? What are the caveats?
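
One hedged way to approach the evaluation and overfitting questions above is to hold out a test set, compute several metrics on it, and compare training accuracy with held-out accuracy. The sketch below assumes scikit-learn and uses a built-in dataset, a random forest, and a 75/25 split purely for illustration. A large gap between the two accuracies is a common warning sign of overfitting, but it is not a proof either way.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Illustrative dataset; replace with the data from the solution you study.
X, y = load_breast_cancer(return_X_y=True)

# Stratified split keeps the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class predictions
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),  # needs scores, not hard labels
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Gap between training and held-out accuracy as a rough overfitting check.
train_accuracy = model.score(X_train, y_train)
gap = train_accuracy - metrics["accuracy"]
print(f"train accuracy = {train_accuracy:.3f}, gap = {gap:.3f}")
```

Which of these metrics matters most depends on the business problem, which is why the "Why does that metric make sense?" test comes before any model comparison.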

Sign up sheet