Data Mining Discussion 2 b

What do we understand by a similarity measure and what is its importance?

A similarity measure quantifies how alike two data objects are. In data mining, it is usually described in terms of a distance, with the dimensions representing features of the objects. A small distance indicates a high degree of similarity, while a large distance indicates a low degree of similarity. Which measure is appropriate depends heavily on the domain and the application, and also on how the features are scaled. For example, suppose two people are described by their heights and by how far apart they live from each other. If both features are measured in centimeters, the distance between their dwellings will dominate the difference in their heights, so the computed distance says almost nothing about how similar their heights are.
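
The height and dwelling example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three people and their values are made up, Euclidean distance is chosen as the distance, and min-max scaling is one possible way (not the only one) to put both features on a comparable range.

```python
import math

# Hypothetical data: (height in cm, position of home along a road in cm).
people = {
    "ann": (160.0, 0.0),
    "ben": (190.0, 100_000.0),   # lives 1 km from ann, 30 cm taller
    "cat": (161.0, 200_000.0),   # lives 2 km from ann, 1 cm taller
}

def min_max_scale(points):
    """Rescale every feature to [0, 1]; assumes each feature varies."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

names = list(people)
raw = dict(zip(names, people.values()))
scaled = dict(zip(names, min_max_scale(list(people.values()))))

# Euclidean distance on the raw features: the dwelling distance dominates.
raw_ann_ben = math.dist(raw["ann"], raw["ben"])
raw_ann_cat = math.dist(raw["ann"], raw["cat"])

# Euclidean distance after scaling: height now contributes as well.
scaled_ann_ben = math.dist(scaled["ann"], scaled["ben"])
scaled_ann_cat = math.dist(scaled["ann"], scaled["cat"])

print("raw:   ", raw_ann_ben, raw_ann_cat)
print("scaled:", scaled_ann_ben, scaled_ann_cat)
```

On the raw values ann appears closer to ben simply because ben lives nearer, even though their heights differ the most. After scaling, ann is closer to cat, whose height is almost the same. Neither answer is the correct one in general; the point is that the choice of scale changes which objects count as similar.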

What do we understand by a dissimilarity measure and what is its importance?

A dissimilarity measure quantifies how different two data objects are. It is common to define dissimilarity measures so that they are non-negative. However, this is not essential: it is not a problem for class consistency if the dissimilarity of an object to itself is negative, as long as it is the smallest dissimilarity possible for that object. Not all dissimilarity measures are symmetric, so the dissimilarity measured from object A to object B may differ from the one measured from B to A. Another commonly examined property is whether the triangle inequality is satisfied. In other words, for any two objects, the dissimilarity measured directly between them is no greater than the total dissimilarity found for a detour taken over a third object.
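
The three properties above can be checked mechanically on a small dissimilarity matrix. The sketch below is only illustrative: the function names and the three example matrices are hypothetical, and d[i][j] is assumed to hold the dissimilarity from object i to object j.

```python
def is_symmetric(d):
    """True if d[i][j] == d[j][i] for every pair of objects."""
    n = len(d)
    return all(d[i][j] == d[j][i] for i in range(n) for j in range(n))

def self_is_minimal(d):
    """True if each object's dissimilarity to itself is its smallest value."""
    return all(d[i][i] <= min(d[i]) for i in range(len(d)))

def satisfies_triangle_inequality(d):
    """True if d[i][j] <= d[i][k] + d[k][j] for every detour over k."""
    n = len(d)
    return all(
        d[i][j] <= d[i][k] + d[k][j]
        for i in range(n) for j in range(n) for k in range(n)
    )

# Symmetric, and every direct value is no greater than any detour.
metric_like = [
    [0, 3, 4],
    [3, 0, 5],
    [4, 5, 0],
]

# Asymmetric (0 -> 1 is 2, but 1 -> 0 is 3), and the direct value
# 0 -> 2 (5) is larger than the detour 0 -> 1 -> 2 (2 + 2 = 4).
detour_cheaper = [
    [0, 2, 5],
    [3, 0, 2],
    [5, 2, 0],
]

# Negative self-dissimilarity that is still the smallest value in each row.
negative_self = [
    [-1, 2],
    [2, -1],
]

for name, d in [("metric_like", metric_like),
                ("detour_cheaper", detour_cheaper),
                ("negative_self", negative_self)]:
    print(name, is_symmetric(d), self_is_minimal(d),
          satisfies_triangle_inequality(d))
```

The third matrix shows the case described above: the self-dissimilarity is negative, yet it remains the smallest value for each object.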