December 30, 2016

Telling lies with statistics?

The purpose of this blog post is not to teach you how to lie or cheat with statistics but to help you spot when other people are lying or, to put it politely, committing "statistical abuse".


In fact, you cannot work in statistics for long before someone comes along and demonstrates how to play unethical tricks with statistics. We have often seen people misuse statistics when trying to persuade others to their point of view, even when the raw data they collected were truthful and reliable.

The famous quote "There are three kinds of lies: lies, damned lies, and statistics." is often attributed to the 19th-century British Prime Minister Benjamin Disraeli, but it was Mark Twain who popularized it in his autobiography, where he credited it to Disraeli. Twain used the phrase to describe the persuasive power of numbers and how individuals can manipulate them to push a public agenda.

There is even a small but very popular book, How to Lie with Statistics, by Darrell Huff, a former editor at Look magazine and Better Homes and Gardens. Published more than 60 years ago, in 1954, it is still in print and has been translated into many languages. Some say it is the most widely read statistics book in the world. Huff's many examples of "lies" are misleading statistical tricks and presentations used in politics, commercial disclosures and the media as ammunition to win support.

Huff was not a trained statistician; his presentation of the topic can charitably be described as informal, and some of the book's illustrations would be considered quite offensive in a contemporary statistics book. But what was true in 1954 is just as true today.

Huff gives seven common tactics used to knead statistical data into “dough”. Some of them are:

  1. Analysis of a biased sample that does not represent the intended population, leading to distorted findings. Biased sampling can occur either intentionally or unintentionally; when it is intentional, the sample is deliberately chosen to produce results consistent with the desired outcome;
  2. Drawing conclusions from small samples, particularly from a non-homogeneous population, which yields misleading results because such samples are often not representative of the population;
  3. When we hear someone say, "The average is …", we had better make sure we know which type of average (mean, median or mode) they are talking about. The raw data can be so skewed that the three measures have very different values. Also be careful when you read about differences between numbers that come from rankings. Rankings are ordinal figures: they indicate order only and are not measured quantities, so it is not correct to calculate a mean from them!
  4. Visual impressions from graphs can be totally deceptive. One can make differences seem greater than they actually are by presenting the data graphically in a misleading way. Can you tell the difference between Figures 1 and 2 below, which show the monthly number of samples processed in a testing laboratory?

Figure 1: This graph shows the actual difference between November and December


Figure 2: This graph exaggerates the difference between November and December


The data show that 4982 samples were processed in November and that the number rose to 5125 in the following month. Figure 1 shows that this increase of 143 samples, or roughly 3%, was nothing to write home about. But if the Figure 2 graph, which plots the same two months on a different scale, were presented to management, the lab manager might receive a commendation letter from top management for good performance, because it makes his laboratory appear to have made great progress in increasing sample throughput.
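The arithmetic behind Figures 1 and 2 can be sketched in a few lines. This is a minimal illustration, not the method used to draw the figures: the axis starting value of 4950 is a hypothetical choice for a truncated vertical axis, since the actual scale of Figure 2 is not stated. The helper names `percent_change` and `apparent_ratio` are likewise illustrative.

```python
def percent_change(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


def apparent_ratio(smaller: float, larger: float, axis_start: float) -> float:
    """Ratio of two bar heights as drawn when the y-axis starts at axis_start.

    With axis_start = 0 the drawn ratio equals the true ratio; raising
    axis_start cuts off the part both bars share and inflates the ratio.
    """
    return (larger - axis_start) / (smaller - axis_start)


nov, dec = 4982, 5125  # samples processed, taken from the text

print(f"Actual increase: {percent_change(nov, dec):.1f}%")
print(f"Bar-height ratio, axis from 0:    {apparent_ratio(nov, dec, 0):.2f}")
# 4950 is a hypothetical starting value for a truncated axis.
print(f"Bar-height ratio, axis from 4950: {apparent_ratio(nov, dec, 4950):.2f}")
```

This prints an actual increase of 2.9%, a bar-height ratio of 1.03 when the axis starts at zero, and 5.47 when it starts at 4950: the same data, but the December bar suddenly looks more than five times taller.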

  5. Incorrectly asserting that there is a direct relationship between two findings in research papers. We often hear such statements about clinical trials of drugs for curing certain diseases, without knowing how reliably the controlled factor or factors were actually controlled in the trials.
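To make item 3 concrete, here is a small sketch using Python's standard `statistics` module on a hypothetical, right-skewed data set (the numbers are invented purely for illustration, not taken from any survey). A single large value is enough to pull the mean far away from the median and the mode.

```python
from statistics import mean, median, mode

# Hypothetical monthly incomes (in thousands); one large value skews the data.
incomes = [2, 3, 3, 3, 4, 5, 6, 50]

print("mean:  ", mean(incomes))    # pulled upwards by the single value of 50
print("median:", median(incomes))  # middle of the sorted values
print("mode:  ", mode(incomes))    # most frequent value
```

Here the mean is 9.5, the median is 3.5 and the mode is 3, so someone quoting "the average income" could honestly report any of these three figures while giving very different impressions.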

The moral of the story is that we are all consumers of statistics, and we are constantly surrounded by information from people, some of them unethical, who are trying to influence us or gain our support. In my opinion, this practice often occurs in political and media circles, although some scientists also engage in it, intentionally or unintentionally. With a basic understanding of statistics, we are better able to resist those who attempt to distort the truth and mislead us.

In short, it is hoped that all of us practice statistics responsibly.


December 28, 2016
Is rolling a die completely random in casinos?



With a standard, fair six-sided die, we expect the probability of rolling any particular number to be 1/6, as each of the numbers 1 to 6 is equally likely. But will the number 1 (and each of the other five numbers) really come up one-sixth of the time as predicted? We know that when someone rolls a die, the initial force on the die, the surface over which it travels and the laws of physics all affect the final result. Is it possible, at least in theory, for us to predict the outcome and take advantage of it?
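What "one-sixth of the time" means in practice can be sketched with a simulation. This is a minimal sketch under a strong assumption: it models an ideal fair die with Python's pseudo-random generator and says nothing about the physics of a real throw. The function name `face_frequencies` and the seed are illustrative choices.

```python
import random
from collections import Counter


def face_frequencies(n_rolls: int, seed: int = 0) -> dict[int, float]:
    """Relative frequency of each face in n_rolls of an idealised fair die."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# Small runs wander away from 1/6 (about 0.167); long runs settle close to it.
for n in (60, 600, 60_000):
    freqs = face_frequencies(n)
    print(n, {face: round(p, 3) for face, p in freqs.items()})
```

With only 60 rolls some faces typically come up noticeably more or less often than 1/6, while with 60,000 rolls every face lands within about a percentage point of 1/6. A real die thrown by a skilled hand need not behave like this model, which is exactly why casinos impose rules on how the dice are thrown.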

The potential monetary gains have led gamblers in the world's casinos to devise all sorts of dice-throwing methods in an attempt to crack this tempting problem.

In casinos around the world, there is a popular game called "Casino Craps" or "Bank Craps". It is played with two dice on a purpose-built table. These dice are manufactured to very high quality standards and are routinely inspected for any damage caused by throwing. As a matter of course, the dice are replaced with new ones after about eight hours of use to maintain their fairness. Casinos have also introduced rules on how a player handles them. Why would they do this?

The story goes that in the mid-20th century, a gambler spent a great deal of time developing a way of throwing the dice so that they spun frantically but did not tumble. The method worked so well, and proved so profitable, that the gambler was eventually banned from casinos.

Today, the rules of Craps require the shooter (the player) to throw the dice with one hand only, and the dice must hit the wall at the opposite end of the table. That wall is covered with numerous bumps that presumably randomize the outcome of the throw!

Hence, gamblers are advised to leave their chances of winning to the random behavior of the dice on a craps table and accept the odds gracefully.


December 25, 2016
How to evaluate outliers in regression?



December 17, 2016
Common mistakes in application of linear regression



December 11, 2016
December 07, 2016
R techniques in generating random numbers



December 06, 2016
Techniques for generating random numbers

series-of-notes-on-randomization-part-ii


December 04, 2016
Randomization – Part I
