By: Kristin Hunter-Thomson
Do we have to?
This favorite question, which we hear so often when we ask students to do some of the work, may have a different meaning than we assume. Let’s take a step back and ask whether our students actually understand how to do what we are asking of them when it comes to setting up and populating their own data tables.
Data tables are the key to organized data collection, and they set you up to succeed in building data visualizations (graphs or maps) from the data. But how often do we teach the “how and why” of data tables to our students?
Why the Parts of Data Tables
Data tables are, after all, just a matrix of boxes to put information in. You can place that information willy-nilly, OR you can do it with some strategy to help you (and others) better understand your data.
So, there are rows and columns, right? Why do we have each?
- Columns – You can label the columns before you collect any data and then you get to be done with them 🙂 So what exactly should go in each column? These are your variables: what you are actually measuring. If you are running an experiment, then there is a column for your treatment, a column for your control, and a column for anything else that you are measuring.
- Rows – These are the data you collect. If you have one measurement for each variable, then you will have one row. If you have more than one measurement (in other words, replicates), then you have as many rows as you have measurements.
Another way to think about this is to remember that if you are calculating any sort of summary statistic of your data (e.g., mean, total count, standard deviation), then that goes in the LAST row for each column (preferably set in bold or italics so it is clear that it is not an actual measured data value). Therefore, having your students think about what they will want to take an average of can also help them think about organizing their data table.
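For classes that record data on a computer, the layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plant-height experiment, the column names, and the helper functions `add_summary_row` and `format_table` are all hypothetical and do not come from the article. The columns are the variables, each row is one replicate, and the mean goes in the last row.

```python
from statistics import mean

# Hypothetical experiment (not from the article): plant height in cm
# for a control group and a treatment group, with three replicates.
# Columns are the variables, labeled before any data are collected.
columns = ["replicate", "control_height_cm", "treatment_height_cm"]

# Rows are the measurements: one row per replicate.
rows = [
    [1, 10.0, 12.0],
    [2, 11.0, 14.0],
    [3, 12.0, 13.0],
]


def add_summary_row(columns, rows, label="MEAN"):
    """Return the rows plus a final row holding the mean of each measured column."""
    summary = [label]
    # Skip column 0, which only identifies the replicate.
    for i in range(1, len(columns)):
        summary.append(mean(row[i] for row in rows))
    return rows + [summary]


def format_table(columns, table):
    """Join the header and rows into tab-separated text for printing."""
    lines = ["\t".join(columns)]
    for row in table:
        lines.append("\t".join(str(value) for value in row))
    return "\n".join(lines)


table = add_summary_row(columns, rows)
print(format_table(columns, table))
```

The summary row is labeled with text ("MEAN") rather than a replicate number, which plays the same role as the bold or italics in a paper table: it marks the row as a calculated value, not a measurement.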
How for Different Kinds of Data
Great, we have the basics of rows and columns, but how does this vary based on the kind of data you are collecting?
Governor et al. (2017) in Big Data Small Devices: Investigating the Natural World Using Real-Time Data provide a great summary of how different data tables are set up (a component of Part 2 is available here).
- We have data that are collected by categories that have no particular order. For example, you can compare apples then oranges and then bananas just the same as comparing bananas then apples and then oranges. There is no order to these categories, so these data are often referred to as “Nominal Data.”
- We also have data that are collected by categories, but the categories do have a particular order. For example, you can compare the average air temperature from Winter then Spring then Summer then Fall to get a sense of how it changes over the year, but it would not make sense to compare Fall then Spring then Summer then Winter. There is an order to these categories, so these data are often referred to as “Ordinal Data.”
- We also have data that are not collected by category but rather just have measurements for the x and y values. For example, you can measure the running speed and heart rate of each person in a class. Both of these variables are measured values, so these data are often referred to as “Interval-Ratio Data.”
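To make the three cases concrete, here is a rough Python sketch of how the row order of a data table might be handled for each kind of data. Everything in it is an assumption made for illustration: the numbers are invented, the second measured value is assumed to be heart rate, and the helper names `nominal_rows`, `ordinal_rows`, and `interval_ratio_rows` are hypothetical. It is not the layout from Governor et al. (2017).

```python
# All values below are made up for illustration only.

# Nominal data: categories with no inherent order.
fruit_counts = {"apples": 4, "oranges": 6, "bananas": 5}

# Ordinal data: categories with a meaningful order.
SEASON_ORDER = ["Winter", "Spring", "Summer", "Fall"]
season_temps_c = {"Fall": 12.0, "Winter": 1.0, "Summer": 24.0, "Spring": 10.0}

# Interval-ratio data: two measured values per person,
# here (running speed in m/s, heart rate in beats per minute).
runners = [(3.5, 150), (2.8, 138), (4.1, 162)]


def nominal_rows(counts):
    """Any row order is valid; sort alphabetically only for tidy display."""
    return sorted(counts.items())


def ordinal_rows(values, order):
    """Rows must follow the categories' own order, not the order collected."""
    return [(category, values[category]) for category in order]


def interval_ratio_rows(pairs):
    """One row per individual; sorting by the x value makes graphing easier."""
    return sorted(pairs)


print(nominal_rows(fruit_counts))
print(ordinal_rows(season_temps_c, SEASON_ORDER))
print(interval_ratio_rows(runners))
```

In all three cases the columns are still the variables and the rows are still the measurements. Only the rule for ordering the rows changes with the kind of data.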
Hopefully, looking at the small differences between the data tables across these three kinds of data helps you to see:
- the similarities of how the columns and rows are set up regardless of the data type
- how the data type influences the data table setup
- how setting up the data table the right way puts you on the path to choosing and creating good data visualizations.
So, the next time you hear “do we have to?”, take a moment to step back and think about whether your students “know how to.” Have fun!