In data analytics, it is important to get some insight into the data before settling on an analysis plan. Either a visual tool or a numerical tool can help you understand the dataset. The most common choice is a bar chart or a histogram, which shows the shape of the data. A researcher or data analyst can then see whether the data is symmetric or skewed, and by how much. When the data is right or left skewed (asymmetric), the analysis becomes more challenging.
Skewed data means that the observations are not balanced around the centre: one tail, either left or right, stretches out much farther than the other. In right-skewed data, most observations fall below the average while a few very large values form a long right tail; in left-skewed data the pattern is reversed. Observations of this kind are common in hydrology, meteorology, insurance and finance. One of the best first-hand examples is the Chennai floods of 2015.
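To make the idea of skewness concrete, here is a minimal sketch in Python using only the standard library. The claim amounts are hypothetical: they are simulated from a lognormal distribution, chosen purely as an assumption because it produces many small values and a few very large ones. The helper `sample_skewness` is an illustrative name, not a library function.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness: mean of the cubed standardized deviations."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population standard deviation (divides by n) keeps the formula simple.
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


# Hypothetical claim amounts (assumption): lognormal draws, fixed seed for repeatability.
rng = random.Random(2015)
claims = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]

mean_claim = statistics.fmean(claims)
median_claim = statistics.median(claims)
skew = sample_skewness(claims)
# Fraction of observations that fall below the average.
share_below_mean = sum(c < mean_claim for c in claims) / len(claims)

print(f"mean={mean_claim:.2f} median={median_claim:.2f}")
print(f"skewness={skew:.2f} share below mean={share_below_mean:.0%}")
```

In a right-skewed sample like this one, the mean is pulled above the median by the few large values, the skewness is positive, and well over half of the observations sit below the average.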
People usually insure their vehicles (cars or motorcycles) at the time of purchase. According to insurance-sector statistics, clients do not file claims very often in a year, apart from damage due to accidents or theft. In 2015, the unexpected floods became a nightmare for many insurance companies, because an extremely large number of vehicles were damaged by stagnant water. This is a classic example of extreme observations.
The impact of those extreme observations on insurers was a sudden surge of claims. Insurers in India estimated they might receive claims totalling over ₹1,000 crore (US$140 million) for losses to property, cargo and inventory, mostly from auto companies (Economic Times). New India Assurance received claims amounting to ₹425 crore (US$59 million) from 1,700 claim submissions up to mid-December. By January 2016, various insurers reported they had received roughly 50,000 damage claims totalling ₹4,800 crore (US$670 million).
It is worth pointing out that extreme observations should never be confused with outliers (I am not going to discuss the difference between outliers and extreme values in this blog). Tails can be classified in different ways.
Tails
Distribution tails are commonly categorized as fat-tailed (heavy-tailed) or thin-tailed. A fat-tailed distribution has a high kurtosis value: more of the probability mass sits in the tails, so extreme observations occur with relatively high probability. A thin-tailed distribution is one whose upper tail declines to zero exponentially or faster. It either has a finite upper limit, or the probability of extremely high values is small enough that the expected value of marginal utility remains finite.
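The link between kurtosis and tail weight can be checked numerically. The sketch below is an illustration under stated assumptions: the normal distribution stands in for a thin tail and a Pareto distribution with shape 3 stands in for a fat tail. Both choices, and the helper name `excess_kurtosis`, are mine and not taken from any dataset discussed here. For a Pareto tail this heavy the theoretical fourth moment is not finite, so the sample kurtosis is large and changes a lot from one sample to the next.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis: fourth standardized moment minus 3 (about 0 for normal data)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / n - 3.0


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 20_000

# Assumed stand-ins: normal for a thin tail, Pareto(shape=3) for a fat tail.
thin_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
fat_sample = [rng.paretovariate(3.0) for _ in range(n)]

k_thin = excess_kurtosis(thin_sample)
k_fat = excess_kurtosis(fat_sample)

print(f"excess kurtosis, thin-tailed sample: {k_thin:.2f}")
print(f"excess kurtosis, fat-tailed sample:  {k_fat:.2f}")
```

The thin-tailed sample should give a value close to zero, while the fat-tailed sample gives a much larger one, because a handful of extreme draws dominate the fourth moment.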
Human height is an example of a thin-tailed distribution. The tallest person on record stood about 8 feet 11 inches, and the probability of anyone being taller than that is very low. The Chennai floods and Cyclone Gaja, by contrast, are fat-tailed: until 2018 the 2015 Chennai floods had the highest recovery expenditure, but Cyclone Gaja later exceeded it. Hence, there is a real chance that a future event will cost more than Cyclone Gaja.
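The difference between the two examples can be expressed as a question: given that an observation has already exceeded the current record, how likely is it to be at least twice the record? The sketch below compares an exponential tail (thin) with a Pareto tail (fat). The distributions, their parameters and the function names are illustrative assumptions, not models fitted to height or flood data.

```python
import math


def exp_survival(x, rate=1.0):
    """P(X > x) for an exponential distribution (thin tail)."""
    return math.exp(-rate * x)


def pareto_survival(x, alpha=2.0, x_min=1.0):
    """P(X > x) for a Pareto distribution (fat tail), valid for x >= x_min."""
    return (x_min / x) ** alpha


def prob_doubling_record(survival, record):
    """P(X > 2 * record | X > record): chance that a new extreme doubles the old record."""
    return survival(2 * record) / survival(record)


for record in (5.0, 10.0, 20.0):
    thin = prob_doubling_record(exp_survival, record)
    fat = prob_doubling_record(pareto_survival, record)
    print(f"record={record:>4}: thin={thin:.2e} fat={fat:.2f}")
```

For the exponential tail this conditional probability shrinks rapidly as the record grows, which matches the height example. For the Pareto tail it stays at 2 to the power of minus alpha (0.25 with the assumed alpha of 2) no matter how large the record already is, which is why a costlier event than the current worst case remains plausible.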