Top Business Analyst Interview Questions and Answers - Frequently Asked Business Analyst Interview Questions - Avatto

What is Clustering?

Clustering is the process of grouping objects so that objects in the same group (cluster) are more similar to each other than to objects in other groups. The data set is first partitioned into groups based on data similarity, and a label is then assigned to each group.
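To make "more similar within a group than across groups" concrete, the sketch below assumes similarity is measured by Euclidean distance (other measures are possible) and compares the average distance between points in the same cluster with the average distance between points in different clusters. The points and the helper names `mean_pairwise_distance` and `within_and_between` are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(pairs):
    """Average Euclidean distance over an iterable of (point, point) pairs."""
    dists = [math.dist(a, b) for a, b in pairs]
    return sum(dists) / len(dists)


def within_and_between(clusters):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    # All pairs of points that share a cluster
    within = [pair for c in clusters for pair in combinations(c, 2)]
    # All pairs of points taken from two different clusters
    between = [(a, b)
               for c1, c2 in combinations(clusters, 2)
               for a in c1 for b in c2]
    return mean_pairwise_distance(within), mean_pairwise_distance(between)


# Hypothetical 2-D points already split into two groups
clusters = [
    [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8)],
    [(8.0, 8.0), (8.5, 9.0), (7.8, 8.2)],
]
w, b = within_and_between(clusters)
print(f"within: {w:.2f}, between: {b:.2f}")
```

A good clustering keeps the first number small relative to the second.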

What are the requirements for a data clustering method?

The main requirements are:

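1. Scalability – The method should scale to handle large data sets.
2. Ability to handle different data types – It should work with numeric, categorical, and other attribute types.
3. Ability to deal with outliers and noise – Noisy or anomalous records should not distort the clusters.
4. Interpretability and usability – The resulting clusters should be understandable and useful.
5. Insensitivity to the order of input records – The same data presented in a different order should produce the same clustering.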

What is the k-means algorithm?

K-means is a simple algorithm used for data clustering. It partitions a given data set into a fixed number of clusters, k, chosen in advance (a priori).

The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different starting locations lead to different results; a good choice is to place them as far away from each other as possible. The next step is to take each point in the data set and assign it to the nearest centroid.
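The initialization and assignment steps can be sketched in plain Python, assuming Euclidean distance. "As far away from each other as possible" is read here as the farthest-first heuristic: start from one point, then repeatedly pick the point whose distance to its nearest chosen centroid is largest. This is only one interpretation; k-means++ is a widely used randomized alternative. The function names and data are hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Pick k spread-out initial centroids (farthest-first heuristic).

    Starts from the first point, then repeatedly adds the point whose
    distance to its nearest already-chosen centroid is largest.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(points,
                         key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(next_point)
    return centroids


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0), (1.2, 0.8), (7.8, 8.2)]
centroids = farthest_point_init(points, k=2)
labels = assign(points, centroids)
print(centroids)  # [(1.0, 1.0), (8.5, 9.0)]
print(labels)     # [0, 0, 1, 1, 0, 1]
```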

When every point has been assigned, the first step is complete and an initial grouping exists. Next, k new centroids are recalculated as the barycenters (means) of the clusters produced by the previous step. With these k new centroids, each data point is again assigned to its nearest new centroid. Repeating these two steps forms a loop.

As this loop runs, the k centroids change their location step by step until they stop changing. In other words, the centroids no longer move, and the algorithm has converged.
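The whole loop can be sketched as a plain-Python version of the standard (Lloyd-style) k-means iteration under Euclidean distance. The initial centroids are passed in, so any initialization scheme can be plugged in. Keeping an empty cluster's centroid in place and capping the loop at `max_iter` are conventions chosen for this sketch, not part of the description above; names and data are hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until no centroid moves."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: bind each point to its nearest centroid
        labels = [min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        # Update step: move each centroid to the barycenter (mean) of its points
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # empty cluster: keep centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0), (1.2, 0.8), (7.8, 8.2)]
centroids, labels = kmeans(points, initial_centroids=[(1.0, 1.0), (1.5, 2.0)])
print(labels)  # [0, 0, 1, 1, 0, 1]
print(centroids)
```

The deliberately poor starting centroids both begin in the lower-left group. After a couple of iterations the second centroid has moved to the upper-right group, and the loop stops once an update leaves every centroid where it was.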

What other algorithms are used for data clustering?

Clustering algorithms are commonly grouped into the following families:

1. Distribution-based clustering (e.g., Gaussian mixture models)
2. Density-based clustering (e.g., DBSCAN)
3. Connectivity-based clustering (e.g., hierarchical clustering)
4. Centroid-based clustering (e.g., k-means)
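As an illustration of the density-based family, here is a minimal sketch of DBSCAN, a well-known density-based algorithm. A point with at least `min_pts` neighbors within distance `eps` is a core point; clusters grow outward from core points, and points not reachable from any core point are labelled noise (-1), which is one way a method can deal with outliers. This O(n²) version, its helper names, and the sample data are for illustration only; practical implementations typically use spatial indexes to find neighbors faster.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point, or -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue  # already in a cluster (or just claimed as border)
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbors)
        cluster += 1
    return labels


points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3),
          (8.0, 8.0), (8.1, 8.3), (7.9, 8.1),
          (4.5, 15.0)]  # the last point is an isolated outlier
labels = dbscan(points, eps=0.6, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not fixed in advance; it follows from `eps` and `min_pts`.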

What is OLAP?

Online Analytical Processing (OLAP) is a technology used to organize and analyze large business databases in support of business intelligence. It enables multidimensional analysis of business data and supports complex calculations, trend analysis, and sophisticated data modeling.
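The multidimensional idea can be illustrated with a toy, in-memory example: sales facts described by three dimensions (region, product, quarter), a roll-up that aggregates away some dimensions, and a slice that fixes one dimension to a single value. This is only a sketch of the concept with hypothetical data and function names; real OLAP servers work with stored, often pre-aggregated cubes and are queried with languages such as MDX or SQL.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales)
facts = [
    ("North", "Laptop", "Q1", 120.0),
    ("North", "Phone", "Q1", 80.0),
    ("South", "Laptop", "Q1", 60.0),
    ("North", "Laptop", "Q2", 150.0),
    ("South", "Phone", "Q2", 90.0),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum sales over every dimension NOT listed in `keep` (a roll-up)."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for *dims, sales in facts:
        totals[tuple(dims[i] for i in idx)] += sales
    return dict(totals)


def slice_cube(facts, dim, value):
    """Keep only facts whose `dim` equals `value` (a slice)."""
    i = DIMENSIONS.index(dim)
    return [f for f in facts if f[i] == value]


print(roll_up(facts, keep=("region",)))
# {('North',): 350.0, ('South',): 150.0}
print(roll_up(slice_cube(facts, "quarter", "Q1"), keep=("product",)))
# {('Laptop',): 180.0, ('Phone',): 80.0}
```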