Sampling from a Population and Its Influence on Estimates


A data distribution is a function or a listing which shows all the possible values (or intervals) of the data. It also (and this is important) tells you how often each value occurs. Often, the data in a distribution will be ordered from smallest to largest, and graphs and charts allow you to easily see both the values and the frequency with which they appear.

Published on May 01, 2021


    Let’s say you are one of the researchers at WhY Stats High School and have been assigned the task of studying the selection behavior of its students.

    So you conduct an experiment in which each individual picks n balls from a bowl full of 50 colored balls.

    Is this experiment correct? If not, why? How can we improve it?

    In the next few minutes we are going to answer these questions.

Population Parameter vs Sample Statistic

    The field of inferential statistics enables you to make educated guesses about the numerical characteristics of large groups. The logic of sampling gives you a way to test conclusions about such groups using only a small portion of their members.

    These large groups are called the Population of Interest and the small portions are the Sample of Study.

    The following table illustrates the idea of taking a sample from a population.

Population | Sample
Advertisements for IT jobs in Hyderabad | The top 50 Google searches for advertisements for IT jobs in Hyderabad in the past decade
Songs from a popular Indian song contest | List of the winning songs from the Shankar Mahadev Contest each year
Undergraduate students in Delhi | 330 undergraduate students from the University of Delhi who volunteer for the study
All the countries of the world | Recently updated data on all the countries listed on the official government website

    We can estimate various measures of the population, such as the mean, median, and standard deviation, using the sample we have in hand.

Terminology Alert!

A Parameter is any measured quantity of a statistical population that summarises or describes an aspect of the population.

Whereas

A Statistic is a quantity computed from a small sample drawn from the population, and it serves as an estimate of the population parameter.

 
    In other words, we can estimate population parameters using sample statistics.
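To make the distinction concrete, here is a minimal Python sketch. The population (10,000 student heights drawn from a normal distribution) and the sample size of 100 are assumptions made up for illustration; they do not come from the school experiment.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 10,000 students, invented for illustration.
population = [random.gauss(165, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a simple random sample of 100 members.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The two printed numbers should be close but not identical; that gap is what the rest of the post is about.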

    But how confident are we that the statistic is a good estimate of the parameter?

Story Time: Mom and Chicken

Your mom is cooking something in the kitchen and you can smell it all the way from your room.

You run to the kitchen, see a juicy, tender roasted chicken almost finished, and feel like eating.

Suddenly your mom takes a piece of it and asks you, "Do you wanna taste?"

With no second thoughts, you grab the piece and eat it... it tastes... DELICIOUS

 
    As we saw in the story, tasting a single piece of chicken (the statistic) let you estimate the overall taste of the whole roasted chicken (the parameter).

    But what made it a good estimate?

    Yup! Picking the piece at random from the whole chicken is what made it reasonable to infer from.

    There are many methods of sampling, such as simple random sampling, stratified sampling, cluster sampling, and systematic sampling, which are outside the scope of this post.

    Whatever the type of sample, it should be representative of the population that we are about to study.

Experiment: Pick 21 Balls

    Now let’s get back to our example… do you remember it?

    To ensure that the sample is a random one, you place the bowl in an empty room and make sure only one person at a time enters the room.

Wait!

    Soon you find that most people pick more green balls than red ones.

    Why is this bias happening? Are we doing anything wrong?

Story Time: Sample Bias

The time is World War II, in England. In a dimly lit Quonset hut, the Royal Air Force crews gather, having just returned from a bombing run over Germany.

The debriefing begins with a moment of silence for the flight crews who did not return. Then the lieutenant asks the returning flight crews, "From which direction did the fatal attacks come?"

The pilots respond to a man, "We were attacked from above and behind." It's unanimous. The lieutenant scribbles this on a piece of paper, hands it to one of them, and instructs, "Take this information to our department flight crew. This may save lives."

But as he is leaving the dimly lit Quonset hut, a hand reaches from the inky shadows and a voice says... "STOP! No, that info may cost lives."

 
    So what is wrong with it?

    All we know is that those who survived were attacked from above and behind. Those who died may have been attacked from a different direction.
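A small simulation can show how asking only the survivors distorts the picture. Every number here is an assumption invented for illustration, not historical data: two attack directions that are equally likely, with attacks from one of them assumed to be much more often fatal.

```python
import random
from collections import Counter

random.seed(7)

# Assumed survival probabilities per attack direction (illustrative only).
SURVIVAL_PROB = {
    "above and behind": 0.9,
    "below and ahead": 0.2,
}

def simulate(n_planes=10_000):
    """Return counts of all attacks and of attacks reported by survivors."""
    all_attacks = Counter()
    survivor_reports = Counter()
    for _ in range(n_planes):
        # Each direction is equally likely in this toy model.
        direction = random.choice(list(SURVIVAL_PROB))
        all_attacks[direction] += 1
        # Only crews that survive can report at the debriefing.
        if random.random() < SURVIVAL_PROB[direction]:
            survivor_reports[direction] += 1
    return all_attacks, survivor_reports

all_attacks, survivor_reports = simulate()
for direction in SURVIVAL_PROB:
    share_all = all_attacks[direction] / sum(all_attacks.values())
    share_survivors = survivor_reports[direction] / sum(survivor_reports.values())
    print(f"{direction}: {share_all:.0%} of all attacks, "
          f"{share_survivors:.0%} of survivor reports")
```

Under these assumptions, both directions account for about half of all attacks, yet the survivors' reports point overwhelmingly to "above and behind". The sample (survivors) is not representative of the population (all crews).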

Experiment Continues: Randomly Picking Balls

    To ensure that people pick the balls randomly, you blindfold each person before they enter the room and tell them where the bowl is.

    This way you find that the samples picked have much less bias than in the previous case.

    But how will you judge whether this holds for the population?
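The effect of the blindfold can be sketched in Python. The bowl composition is the one given later in the post (12 red, 8 green, 14 blue, 16 pink) and 21 picks per person comes from the experiment heading. The preference weights for a sighted picker are purely hypothetical: green is assumed to be three times as attractive as any other color.

```python
import random
from collections import Counter

random.seed(1)

# Bowl composition from the post: 50 balls in four colors.
BOWL = ["red"] * 12 + ["green"] * 8 + ["blue"] * 14 + ["pink"] * 16

# Hypothetical preference weights for a picker who can see the balls.
PREFERENCE = {"red": 1, "green": 3, "blue": 1, "pink": 1}

def pick_blindfolded(n):
    """Draw n balls uniformly at random, without replacement."""
    return random.sample(BOWL, n)

def pick_with_preference(n):
    """Draw n balls one at a time, weighting each remaining ball by color."""
    remaining = list(BOWL)
    picked = []
    for _ in range(n):
        weights = [PREFERENCE[ball] for ball in remaining]
        index = random.choices(range(len(remaining)), weights=weights, k=1)[0]
        picked.append(remaining.pop(index))
    return picked

def average_counts(picker, n=21, people=2_000):
    """Average number of balls of each color picked per person."""
    totals = Counter()
    for _ in range(people):
        totals.update(picker(n))
    return {color: totals[color] / people for color in PREFERENCE}

print("blindfolded:", average_counts(pick_blindfolded))
print("sighted:    ", average_counts(pick_with_preference))
```

With the blindfold, green should come out as the least-picked color, as expected for the rarest ball in the bowl. With the assumed preference, green overtakes red even though the bowl holds fewer green balls.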

Population Mean | Sample Mean
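For reference, the standard definitions of the two means are given here. The notation is an assumption and is not taken from the post: N is the population size, n is the sample size, and x_i are the individual values.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

The population mean \mu is a parameter; the sample mean \bar{x} is the statistic used to estimate it.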

 

Let's say there are 12 red balls, 8 green balls, 14 blue balls and 16 pink balls in our bowl.

Hence, if the average ratio of each category picked by a person is in the range of 6.25 - 12.25, then it is probably a true estimate.
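One way to check sample estimates against this bowl is to compute each color's population proportion and then see how the proportions from random draws scatter around it. This is a rough sketch, not the post's own calculation: the draw size of 21 is taken from the experiment heading, and the 5,000 repetitions (hypothetical students) are an assumption.

```python
import random
import statistics

random.seed(3)

COUNTS = {"red": 12, "green": 8, "blue": 14, "pink": 16}  # from the text
BOWL = [color for color, k in COUNTS.items() for _ in range(k)]
N = len(BOWL)  # 50 balls in total

# Population proportions (the parameters we want to estimate).
population_share = {color: k / N for color, k in COUNTS.items()}

def sample_share(n=21):
    """Proportion of each color in one random draw of n balls, no replacement."""
    draw = random.sample(BOWL, n)
    return {color: draw.count(color) / n for color in COUNTS}

# Repeat the draw for many hypothetical students and summarize the statistic.
shares = [sample_share() for _ in range(5_000)]
for color in COUNTS:
    values = [s[color] for s in shares]
    print(f"{color:>5}: population {population_share[color]:.2f}, "
          f"mean of samples {statistics.mean(values):.2f}, "
          f"spread (sd) {statistics.stdev(values):.2f}")
```

Any single student's proportions can be noticeably off, but the average over many random draws should sit close to the population proportion for each color.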

Note: Any