# Data Scientist Interview Questions For Freshers (2020)

**What is Data Science?**

Data science is often described as one of the most sought-after careers of the 21st century. It is a field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from both structured and unstructured data. It is a multi-disciplinary field, closely related to data mining and big data.

**Who is a Data Scientist?**

Data scientists are a new breed of analytical data experts who use technical skills to solve complex problems. They are part mathematicians, part computer scientists and part trend-spotters, and they are always curious about which problems can be solved with the help of data analysis. Many started their careers as statisticians or data analysts and gradually evolved into data scientists. They extract key information through analysis and creative curiosity, and they have a knack for translating technical ideas into results that a business can turn into profit.

**How to become a Data Science professional?**

From businesses to non-profit organizations to government institutions, there is a huge amount of data that can be sorted, interpreted, and applied to a range of projects. Data scientists are trained to gather, organize, and analyze this data, and their analysis helps people make everyday decisions. To become a data scientist, one can pursue a Data Science degree, which typically combines computer-related majors with mathematics and statistics at the core. However, one should not confuse Data Science with Statistics. Though the two areas draw on similar skills and share common goals, they are distinct disciplines.

**Top Skills in Data Science**

Listed below are some of the top skills in demand among recruiters looking for data science professionals. Every data scientist should master these skills to become a leader in their field.

- Programming in a database querying language like SQL and a statistical programming language like Python or R.
- Statistics – A working knowledge of distributions, statistical tests, maximum likelihood estimators, etc. is a must.
- Machine Learning methods such as ensemble methods, random forests and k-nearest neighbors, which can be applied using Python and R libraries.
- Linear Algebra and Calculus knowledge helps you reason about models and answer fundamental linear algebra or multi-variable calculus questions.
- Data Visualization basics to represent complex data in a visually engaging manner. If you can master data visualization tools, you will be all the more in demand.
- Communication is a key skill. A data scientist’s data-driven story should sound credible and convincing. Clear communication is how you persuade a manager and get the message across to more senior stakeholders.
- Data Wrangling or Data Munging is a key skill with which a Data Scientist can map and transform data from a raw form into another format that is more suitable for analysis.
- Software Engineering background is always a plus for data scientists.
- Data Intuition develops by working with data hands-on and testing whether the results hold up.

**Data Science Certifications & Courses**

There are several recognized data science certifications available in the market. One can either complete a part-time diploma or enroll in a full-fledged course to become an expert.

- IBM Data Science Professional Certificate
- Professional Certificate in Data Science from Harvard University
- DASCA
- Applied AI with DeepLearning, IBM Watson IoT Data Science Certificate
- Certified Analytics Professional (CAP)
- Cloudera Certified Associate: Data Analyst
- Dell Technologies Data Scientist Associate (DCA-DS)
- HDP Data Science
- Microsoft MCSE: Data Management and Analytics
- SAS Certified Advanced Analytics Professional

These are only a handful. If you are all set to become a data scientist, feel free to train and practice at home with R programming, and then complete one of the courses listed above or any other relevant one on the internet. Remember, it is not just the course content but your intent that will push you to become a good Data Scientist.

**Salary of a Data Scientist**

The salary of a data scientist can range anywhere from 30k to 2 lakh rupees per month, depending on the company you apply to and the college you graduate from.

**Data Scientist Jobs in India**

Apply to Data Scientist Jobs on Firstnaukri.com now.

In a Data Scientist job, you are typically expected to do the following:

- Design and build new data set processes
- Model data
- Carry out data mining and production
- Determine new ways to improve data and search quality
- Understand predictive capabilities
- Perform and interpret data studies
- Perform product experiments concerning new data sources or new uses for existing data sources
- Develop prototypes
- Develop proof of concepts
- Write algorithms
- Create predictive models
- Carry out customized analysis

**Data Science Interview Questions**

### 1. How do you create a taxonomy to identify key customer trends in unstructured data?

The best way to approach this is to check with the business owner and understand their objectives before categorizing the data. After that, follow an iterative approach: pull in new data samples, validate the model for accuracy, and improve it by integrating feedback from stakeholders or the business. This helps ensure that the model produces actionable results.

### 2. Python or R – Which is preferred for text analytics?

Both are open-source programming languages. Python is often the better option for text analytics, as its Pandas library provides easy-to-use data structures and high-performance data analysis tools. If the task is primarily a statistical analysis exercise, R is a good choice.

### 3. Which technique is used to predict categorical responses?

Classification techniques are used to predict categorical responses. They are applied to data sets with binary or multi-class target variables.
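As a rough illustration, the sketch below predicts a categorical label with a tiny k-nearest-neighbors classifier (one of the methods named in the skills list). The function name `knn_predict`, the feature columns and the toy data are all hypothetical, and the code uses only the Python standard library.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # train is a list of (features, label) pairs; distance is plain Euclidean
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (monthly_visits, avg_basket_value) -> customer segment
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 7.0), "high"),
    ((7.0, 6.5), "high"),
    ((6.5, 8.0), "high"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.8, 7.2)))
```

The target here is binary, but the same voting logic works unchanged for a multi-class target.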

### 4. What are Recommendation Systems?

A recommendation system is a subclass of information filtering systems that predicts the “rating” or “preference” a user would give to an item. Recommender systems are widely used in commercial applications covering movies, news, research articles, products, social tags, music, etc.

### 5. What is power analysis?

Power analysis is an experimental design technique used to determine the sample size required to detect an effect of a given size with a given degree of confidence.

Four quantities are closely related:

- sample size
- effect size
- significance level = P(Type I error) = probability of finding an effect that is not there
- power = 1 – P(Type II error) = probability of finding an effect that is there

Given any three of them, we can determine the fourth.
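The relationship between the four quantities can be sketched in code. The example below solves for sample size given the other three, assuming a two-sided, two-sample comparison of means under the normal approximation and a standardized effect size. The function name `sample_size_per_group` and the default values are illustrative choices, not a prescribed method.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample z-test.

    effect_size is the standardized difference in means (an assumption of this sketch).
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the significance level
    z_power = z.inv_cdf(power)            # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n = sample_size_per_group(effect_size=0.5)
print(n)
```

Smaller effects need larger samples: calling the function with `effect_size=0.2` returns a much larger number than with `effect_size=0.5`.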

### 6. What is Collaborative filtering?

Collaborative filtering makes automatic predictions about a user’s interests (filtering) by collecting preference or taste information from many users (collaborating). In recommender systems, the filtering process mainly finds patterns by combining viewpoints from multiple users, data sources and agents.
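A minimal sketch of user-based collaborative filtering follows. It assumes explicit 1 to 5 ratings, cosine similarity computed over the items two users have both rated, and a similarity-weighted average as the prediction. The user names, items and ratings are made up for illustration.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "asha":  {"film_a": 5, "film_b": 4, "film_c": 1},
    "bilal": {"film_a": 4, "film_b": 5, "film_d": 5},
    "chen":  {"film_a": 1, "film_c": 5, "film_d": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

predicted = predict_rating("asha", "film_d")
print(round(predicted, 2))
```

Because asha's tastes resemble bilal's more than chen's, the predicted rating for `film_d` lands closer to bilal's 5 than to chen's 2.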

### 7. What is Machine Learning?

Machine Learning is an application of Artificial Intelligence. One simple definition is: “A computer program is said to learn from experience E with respect to some class of tasks T and a performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” (Source: Tech Republic)

### 8. During analysis, how do you treat missing values?

First identify the variables with missing values, then determine the extent of the missing values in each. If any patterns are identified, the analyst should concentrate on them, as they could lead to interesting and meaningful business insights. If no patterns are identified, the missing values can be substituted with the mean or median (imputation), or they can simply be ignored. There are various factors to consider when answering this question:

- Understand the problem statement and the data before giving an answer. Getting into the data is important.
- A missing value can be assigned a default value, which can be the mean, minimum or maximum value.
- If it is a categorical variable, the missing value is assigned a default category.
- If the data follows a normal distribution, use the mean value.
- Whether missing values should be treated at all is another important point. If 80% of the values for a variable are missing, you can answer that you would drop the variable instead of treating the missing values.
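The rules of thumb above can be sketched as a small helper. This is an illustration only: missing entries are assumed to be `None`, the most frequent category stands in for the "default" categorical value, and the names `treat_missing` and `drop_threshold` are hypothetical.

```python
from statistics import mean, median, mode

def treat_missing(values, kind="numeric", strategy="mean", drop_threshold=0.8):
    """Return an imputed copy of `values`, or None if the variable should be dropped.

    Missing entries are represented as None (an assumption of this sketch).
    """
    present = [v for v in values if v is not None]
    missing_share = 1 - len(present) / len(values)
    if not present or missing_share >= drop_threshold:
        return None                      # too sparse: drop the variable instead
    if kind == "categorical":
        fill = mode(present)             # most frequent category as the default
    elif strategy == "median":
        fill = median(present)
    else:
        fill = mean(present)
    return [fill if v is None else v for v in values]

# Hypothetical columns
ages = [25, None, 35, 30, None]
cities = ["Pune", "Delhi", None, "Pune"]
mostly_empty = [None] * 9 + [7]

print(treat_missing(ages))
print(treat_missing(cities, kind="categorical"))
print(treat_missing(mostly_empty))
```

In practice the choice between mean and median depends on the shape of the data; the median is less sensitive to extreme values.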

### 9. How can outlier values be treated?

Outlier values can be identified by using univariate or any other graphical analysis method. If there are only a few outliers, they can be assessed individually, but for a large number of outliers the values can be substituted with either the 99th or the 1st percentile values. Not all extreme values are outliers. The most common ways to treat outlier values are:

- Change the value to bring it within a range.
- Simply remove the value.
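Both treatments can be sketched in a few lines. The example assumes a simple linear-interpolation percentile and uses the 1st and 99th percentiles as the limits; the helper names and the data are hypothetical.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of an already sorted list."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def cap_outliers(values, low_pct=1, high_pct=99):
    """Bring values within a range by clipping them to the chosen percentiles."""
    ordered = sorted(values)
    low, high = percentile(ordered, low_pct), percentile(ordered, high_pct)
    return [min(max(v, low), high) for v in values]

def drop_outliers(values, low_pct=1, high_pct=99):
    """Remove values that fall outside the chosen percentiles."""
    ordered = sorted(values)
    low, high = percentile(ordered, low_pct), percentile(ordered, high_pct)
    return [v for v in values if low <= v <= high]

# Hypothetical data: the integers 1..100 plus one extreme reading
data = list(range(1, 101)) + [10_000]
capped = cap_outliers(data)
kept = drop_outliers(data)

print(max(data), max(capped), len(kept))
```

Note that percentile-based limits also touch the smallest value here even though it is not unusual, which is one reason to inspect extreme values before treating them.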

### 10. What is the goal of A/B Testing?

A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B testing is to identify changes to a web page that maximize or increase an outcome of interest. An example could be measuring the click-through rate for a banner ad.
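For the banner-ad example, one common way to compare the two variants is a two-proportion z-test. The sketch below assumes large samples (so the normal approximation holds) and uses made-up click and view counts; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rate between A and B."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that A and B perform the same
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts for two versions of a banner ad
z, p_value = two_proportion_z_test(clicks_a=200, views_a=10_000,
                                   clicks_b=260, views_b=10_000)
print(round(z, 2), round(p_value, 4))
```

A small p-value suggests the difference in click-through rate is unlikely to be due to chance alone; whether it matters commercially is a separate question.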

### 11. Why does data cleaning play a vital role in analysis?

Data from multiple sources must be cleaned and transformed into a format that data analysts or data scientists can work with. The number of data sources increases with time, and cleaning alone might take up to 80% of the time, which makes it a critical part of the analysis.

### 12. Differentiate between univariate, bivariate and multivariate analysis.

Univariate analysis – It is the simplest form of data analysis. “Uni” means “one”, meaning the data has just one variable. It does not deal with causes or relationships (unlike regression). Its main purpose is to describe the data, summarize it and find patterns in it.

Bivariate analysis – It is the simultaneous analysis of two variables (attributes). It explores the relationship between the two variables: whether an association exists and how strong it is, or whether there are differences between them and how significant those differences are.

Multivariate analysis – Also called MVA, it is based on the principles of multivariate statistics, which involve observing and analyzing more than one statistical variable at a time.
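The three levels can be illustrated on one small data set. The sketch below uses summary statistics for the univariate view, a hand-written Pearson correlation for the bivariate view, and a pairwise correlation matrix as a simple first look at several variables together. The column names and numbers are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical data for five students (illustrative numbers only)
columns = {
    "hours": [2, 4, 6, 8, 10],
    "sleep": [8, 7, 7, 6, 5],
    "exam":  [50, 58, 65, 71, 80],
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Univariate: summarize a single variable
exam = columns["exam"]
summary = {"mean": mean(exam), "median": median(exam), "stdev": stdev(exam)}

# Bivariate: strength of association between two variables
r_hours_exam = pearson(columns["hours"], columns["exam"])

# Multivariate (first look): correlations across all variables at once
corr = {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}

print(summary)
print(round(r_hours_exam, 3))
print({a: {b: round(v, 2) for b, v in row.items()} for a, row in corr.items()})
```

A correlation matrix is only a starting point; full multivariate methods model several variables jointly rather than pair by pair.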

### 13. What do you understand by the term Normal Distribution?

It is also called the Gaussian distribution. In this probability distribution, the graph is symmetric about the mean, and data near the mean occur more frequently than data far from the mean. In other words, the data is distributed around a central value without any bias to the left or right, and takes the form of a bell-shaped curve.
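These properties can be checked numerically with `statistics.NormalDist` from the Python standard library. The mean of 170 and standard deviation of 10 below are arbitrary illustrative values.

```python
from statistics import NormalDist

# Hypothetical example: heights with mean 170 cm and standard deviation 10 cm
heights = NormalDist(mu=170, sigma=10)

# Symmetry: the density is the same at equal distances on either side of the mean
left = heights.pdf(160)
right = heights.pdf(180)

# Values near the mean are more likely than values far from it
peak = heights.pdf(170)

def share_within(k):
    """Probability of falling within k standard deviations of the mean."""
    return heights.cdf(170 + k * 10) - heights.cdf(170 - k * 10)

print(left == right, peak > left)
print([round(share_within(k), 4) for k in (1, 2, 3)])
```

The last line reproduces the familiar rule that roughly 68%, 95% and 99.7% of values lie within one, two and three standard deviations of the mean.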

We hope the above Data Science interview questions and answers help you prepare confidently for an upcoming Data Scientist interview.

**Quick tip:** Prepare your resume like a pro and handle important HR interview questions such as ‘What is teamwork?’ and ‘Describe yourself’ confidently. If you face a group discussion round, be ready with talking points on current affairs and social issues.

All the best.