Stream 2 program

Stream 2 (Getting your first ML job)

In Stream 2, the most important thing for job seekers is gaining experience with projects similar to what a data scientist/ML engineer does on the job. This is achieved through multiple individual projects that build experience with various data modalities and algorithms. The goal of the capstone group project is to simulate the work you will actually do as a data scientist/ML engineer.

  • Intermediate ML algorithms
    • more sophisticated classifiers and clustering algorithms
      • Random Forests, Gradient Boosting, LightGBM
      • Neural Networks (ConvNets, LSTMs, Autoencoders)
  • Project 2 [e.g. Kaggle]
    • replicate others’ existing solutions
    • provide one additional solution to the problem
  • Project 3 [e.g. Kaggle]
    • replicate others’ existing solutions
    • provide one additional solution to the problem
  • Project 4 [Capstone group project]
    • ML problem formulation
    • data collection / preparation
    • modeling
    • teamwork
  • Interview prep

Guidance on picking projects
  • Project 1 (Stream 1):
    • Should be doable within one week (an estimated 40-60 hours).
    • Focus is on learning by example: how a data problem is formulated and how others solved it.
      • Test: the main question you should be able to answer at every step is “Why?”
    • What business need does solving this problem meet? What is the value of the project if it is solved?
      • Test: Thinking about the business, is this problem worth solving?
    • Understand the data: how to preprocess, explore, and normalize similar datasets, why each step is needed, and what tools are used.
      • Test: Can you do the data preparation on a similar dataset?
    • Why was a specific algorithm used to model the dataset? How were the hyperparameters chosen?
      • Test: Can you compare the algorithms? What are their pros and cons?
      • Test: Can you apply the algorithms you studied to a different but similar problem?
    • How is each solution evaluated? What is the evaluation metric?
      • Test: Why does that metric make sense? Can you justify it? What are the alternatives?
      • Test: How did you make sure the model didn’t overfit? Can you justify why the solution is correct?
    • Can you write a blog post about the problem, the solutions you studied, and the tools you used?
      • Test: Find gaps in your knowledge while writing your blog post.
      • Test: What did you learn? What did you find very useful? What are the caveats?
  • Projects 2 and 3 (Stream 2):
    • Should be doable within 10 days of full-time work (an estimated 80-100 hours).
      • If it is the same project as Project 1, then only an estimated 40-60 hours.
    • Repeat the process from Project 1. Can you answer all the tests?
    • Focus is on adding a new solution. What other algorithms can be used to solve this problem? Why?
      • Test: Apply the new algorithm to the problem and evaluate it.
      • Test: Why is your new solution giving better or worse results? Can you discuss it?
      • Test: Putting all solutions together, can you discuss the merits of each solution and put all the results in context?
      • Test: Can you write a blog post about your solution? Where are the gaps in your knowledge?
  • Project 4 (Stream 2):
    • Should be doable within 5-10 days by a team of 3-4 people (an estimated 200-300 person-hours).
    • Focus is on demonstrating the ability to formulate a new problem as a machine learning problem, and to source, clean, and model the data.
      • Test: Why did you choose that problem? Why does it make sense to use machine learning there?
      • Test: Why and how did you source your data? What is the data size? Why does that data make sense for the problem? How did you clean and prepare the data?
      • Test: What algorithms did you pick for modeling the problem? Why do they make sense for the dataset?
      • Test: Can you discuss the various solutions? What are the pros and cons? Can you recommend one solution?
      • Test: Which part of the project did each individual contribute to? How did the team work together? What tools did you use to stay productive? What were the problems?
      • Test: Can you write a blog post about your work? What problems did you face? How did you solve them? What did you learn?

Sign-up sheet