Week 2 - ML in Production

This week we will be learning about modeling and the cases in which deploying a model is acceptable or not.

Model-centric AI vs. data-centric AI: this week you will learn about both approaches, including ways to improve the data you already have instead of collecting more and more of it.

Key Challenges

Let’s start with the view that an AI system = Code + Data.

For many problems, there are models on GitHub that are already good enough, so improving the data often pays off more than changing the code.

There are three inputs for developing an ML model: the code (algorithm/model), the hyperparameters, and the data. Most of the time we are focused on the algorithm and the data, since hyperparameters are comparatively quick to tune.

General Process - Life Cycle

Model development is a highly iterative process: most of the time you start with a model + hyperparameters + data, train, do error analysis, and then improve. It is an empirical process, so it is recommended to loop quickly, save the state of each model + hyperparameters + data combination, and select the best configuration you have achieved.
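Below is a minimal sketch of that loop in Python. The scikit-learn model, the synthetic dataset, the hyperparameter grid, and the `data_version` tag are all illustrative assumptions, not a prescribed workflow.

```python
# Minimal sketch of the iterative loop: pick a configuration, train, evaluate,
# record the state (model + hyperparameters + data version + metrics), repeat,
# then select the best configuration achieved so far.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)   # stand-in dataset
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=0)

experiments = []
for C in [0.01, 0.1, 1.0, 10.0]:                              # hyperparameter being varied
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    dev_acc = accuracy_score(y_dev, model.predict(X_dev))     # simplified "error analysis"
    experiments.append({"hyperparams": {"C": C}, "data_version": "v1",
                        "dev_acc": dev_acc, "model": model})

best = max(experiments, key=lambda e: e["dev_acc"])
print("Best configuration:", best["hyperparams"], "dev accuracy:", round(best["dev_acc"], 3))
```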

After you achieve a good model, do a final error analysis and audit its performance before deployment.

Challenges in Model Development

  1. Check that it does well on the training set.

  2. Check that it does well on the dev/test sets.

  3. Check that it does well on the business metrics/project goals.

Step 2 depends on step 1. Unfortunately, step 3 is not guaranteed by step 2, and this gap is a common source of disagreement between the machine learning team and the business stakeholders who care about business metrics.
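As a rough illustration, the three milestones can be written down as explicit checks. The accuracy thresholds and the business-metric target below are made-up assumptions; in a real project they would be agreed with stakeholders.

```python
# Sketch: the three milestones as explicit pass/fail checks.
def check_milestones(train_acc, dev_acc, test_acc, business_metric, business_target):
    checks = {
        "1. does well on the training set": train_acc >= 0.95,                # assumed target
        "2. does well on the dev/test sets": min(dev_acc, test_acc) >= 0.90,  # assumed target
        "3. meets the business metric/goal": business_metric >= business_target,
    }
    for name, ok in checks.items():
        print(f"{name}: {'PASS' if ok else 'FAIL'}")
    return all(checks.values())

# Example: strong test-set accuracy, yet the business goal is still missed.
check_milestones(train_acc=0.97, dev_acc=0.93, test_acc=0.92,
                 business_metric=0.78, business_target=0.85)
```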

Why Low Average Error is not Good Enough

It is possible for a machine learning model to perform well on the test set and still not be good enough to deploy.

For example, web searches include informational and transactional queries, such as looking up an apple pie recipe. If the search engine does not return the very best result for such a query, the user will usually forgive it.

In other cases the user has a very clear intent and will not forgive the engine for returning a different result. Navigational queries are like this, for example searches for Stanford, Reddit, and YouTube.

To summarize: somewhat lower average accuracy on informational queries may be acceptable, but even a small error rate on navigational queries is not acceptable for production. Always check the users' tolerance level.
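A small sketch of what checking user tolerance per query type might look like; the toy query log below is made up for illustration.

```python
# Sketch: measure the error rate per query type instead of a single average.
from collections import defaultdict

results = [  # (query_type, model_returned_the_right_result)
    ("informational", True), ("informational", False), ("informational", True),
    ("navigational", True), ("navigational", True), ("navigational", False),
]

by_type = defaultdict(list)
for query_type, correct in results:
    by_type[query_type].append(correct)

for query_type, outcomes in by_type.items():
    accuracy = sum(outcomes) / len(outcomes)
    print(f"{query_type}: accuracy {accuracy:.2f} over {len(outcomes)} queries")

# Even if the overall average looks fine, any miss on a navigational query
# (e.g. Stanford, Reddit, YouTube) may be unacceptable to users.
```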

Performance on Key Slices of the Dataset

Example: ML for Loan Approval

The decision depends on applicant attributes, so make sure the model does not discriminate by ethnicity, gender, location, language, or other protected attributes.

Many countries have regulations about which attributes may be used for loan approval. A loan-approval learning algorithm is not acceptable for deployment if it causes discrimination or bias.
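One possible way to audit this is to compare approval rates across groups of a protected attribute. The records and the 0.8 disparity threshold below are illustrative assumptions only, not legal or regulatory guidance.

```python
# Sketch: compare approval rates across groups and flag large disparities.
from collections import defaultdict

decisions = [  # (group of a protected attribute, approved?)
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]

grouped = defaultdict(list)
for group, approved in decisions:
    grouped[group].append(approved)

rates = {group: sum(v) / len(v) for group, v in grouped.items()}
for group, rate in rates.items():
    print(f"{group}: approval rate {rate:.2f}")

disparity = min(rates.values()) / max(rates.values())
if disparity < 0.8:   # assumed review threshold
    print(f"Disparity ratio {disparity:.2f} -- flag this slice for review")
```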

The AI community has an ongoing discussion about fairness.

Another example: product recommendations from retailers.

The following behaviors are unacceptable:

  1. It gives irrelevant recommendations to users of a specific ethnicity.

  2. It favors large retailers and ignores small ones.

  3. It never recommends certain kinds of products, such as electronics.

How do you carry out analysis on key slices of the data?

You will learn how soon; stay tuned.

Rare Classes

Skewed data distributions are common in medical applications: for example, only 1% of patients may have the disease. You need to choose an appropriate metric and look specifically at accuracy on the rare classes.

When we have many specific conditions, each with its own performance, it is not acceptable to ignore a rare one such as Hernia. Average test-set accuracy will not reveal the problem: if only 1% of cases are positive, an algorithm that always predicts "no" still achieves 99% accuracy.

Average accuracy gives equal weight to every example, which is not acceptable when rare classes matter.
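A small sketch of why average accuracy misleads here, using scikit-learn metrics on a made-up dataset with 1% positives: the degenerate "always predict negative" model scores 99% accuracy but catches zero true cases.

```python
# Sketch: accuracy vs. precision/recall on a rare class.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1] * 10 + [0] * 990   # 1% of 1000 patients have the disease
y_pred = [0] * 1000             # degenerate model: always predicts "no disease"

print("accuracy :", accuracy_score(y_true, y_pred))                    # 0.99
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("recall   :", recall_score(y_true, y_pred, zero_division=0))     # 0.0
print("f1       :", f1_score(y_true, y_pred, zero_division=0))         # 0.0
```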

Unfortunate Conversation in Many Companies

The conversation often goes something like this (don't take it personally): the ML engineer says the model does well on the test set, and the product owner replies that it still does not meet the actual business need. Doing well on the test set is not enough; we need to do real error analysis against the business goals. In the upcoming discussions we will see how to overcome these challenges.