
In this article we report on middle school students’ understanding of concepts related to the second step of the statistical problem-solving process, study design: create a plan and collect data. Students participated in an open-ended statistics lesson involving nonexperimental study design. For this research we analyzed the students’ discourse and categorized their statements using Groth’s (2003) “Levels of Thinking Associated With Designing a Nonexperimental Study.” Consistent with studies by other researchers, we found that middle school students intuitively used a naïve conception of stratified random sampling as an appropriate method for collecting data and recognized some forms of bias. Based on these findings, we offer recommendations for modifying and implementing statistics lessons so that students develop an understanding of study design (create a plan and collect data). Finally, we put forward suggestions for future research.

The Association for Middle Level Education (AMLE) has advocated for schools to be developmentally responsive, challenging, empowering, and equitable for young adolescents (AMLE, 2012, p. xii). The Common Core State Standards for Mathematics (CCSS-M) and various state standards include a plethora of statistical topics for middle school students that can foster these attributes. The Statistical Education of Teachers (SET) document (Franklin et al., 2015) stated that “students are expected to begin thinking statistically at Grade 6, and topics introduced in the middle grades include data collection design, exploration of data, informal inference, and association” (p. 4). The SET document also supports the AMLE (2012) attributes, “developmentally responsive, challenging, empowering, and equitable” (p. xii), by recommending that “students should learn statistics in an activity-based learning environment” (p. 21) and that “statistical topics should be developed through meaningful experiences with the statistical problem-solving process” (p. 22).

The components of the statistical problem-solving process have been defined as (a) formulate a question; (b) create a plan and collect data; (c) analyze the data; and (d) interpret the results (Franklin et al., 2015; Marriott et al., 2009; Wild & Pfannkuch, 1999). Using this process as a guide, we updated a classroom lesson from Chapter 9, Alphabetical Statistics, of A Collection of Math Lessons (From Grades 3 through 6) by Marilyn Burns (1987), in which students explore how frequently each letter of the alphabet is used in print material. Although the Burns (1987) book describes the lesson as implemented in a fifth-grade classroom, we revised and updated it for use in Kim Given’s seventh-grade classroom over two class sessions. For more details about the lesson, refer to Harkness et al. (2019) and the brief description in The Lesson section.

In this article, we focus on the second step of the process, study design: create a plan and collect data. National education policy organizations such as the National Council of Teachers of Mathematics and the Australian Education Council, among others, recommend that students learn about study design throughout their schooling. Nevertheless, we strongly agree with Shaughnessy (2007) that study design is often given little attention in schools in the United States, which is why we focused this research on the study design step of the statistical problem-solving process.

The Guidelines for Assessment and Instruction in Statistics Education (GAISE) report focused solely on random sampling as a study design method (Franklin et al., 2007). The revised version of this report includes a slightly modified description of the study design step in the statistical problem-solving process: collect or consider [emphasis on new language] data (GAISE II; Bargagliotti et al., 2020). One reason the word consider was added was so that data could include those “acquired from another source” (p. 14), with the primary statistical concern being to “look for, account for and explain variability” (p. 14). Accordingly, it is important that students understand that “data collection design impacts the scope of generalizability and the possible limitations in analysis and interpretation” (p. 14). Compared with the first GAISE report, GAISE II substantially expands the descriptions of the three levels of understanding for the study design step. Specifically, some of the “essentials” for each level include:

  • Level A requires that students “have opportunities to … determine what data might be collected or retrieved to answer [statistical] questions” (p. 22).

  • Level B requires that students “develop a critical attitude in analyzing data collection methods” (p. 17) and “shift away from smaller data sets and [they must start using technology]” (p. 44).

  • Level C expects that students understand “the issues of bias and confounding variables in observational studies and their implications for interpretation” (p. 17) and understand the types of studies learned previously, such as “whole groups, samples, and simple experiments,” at a “deeper level” (p. 72).

In our experience as mathematics and statistics teachers, the textbooks we use typically include contrived statistical problems that provide small data sets or summaries of data and often give few details about how the data were collected. Perhaps allowing students to skip, rather than grapple with, the first two steps of the statistical problem-solving process gives teachers more time to focus on statistical concepts related to data analysis or interpretation of findings. However, an understanding of study design is critical because, as GAISE II emphasizes, data collection design determines the scope of generalizability and the limitations of any analysis and interpretation (Bargagliotti et al., 2020).

There are examples in the statistics education literature of middle school students collecting student-generated data from the class by hand (Bush et al., 2015; Groth et al., 2017; Kazak et al., 2018; Savard & Manuel, 2016; Zapata-Cardona, 2018) and of students entering student-generated data by hand into a spreadsheet application for analysis and communication of findings (Harkness et al., 2019; Lavigne & Lajoie, 2007; Prodromou, 2011). Two recent studies described middle school students’ involvement in electronic data collection (Peters et al., 2019; Reid & Carmichael, 2015).

Research on elementary, middle, and high school students’ understanding of study design has examined their understanding of sample variation, sample representativeness, and sampling bias (Watson & Moritz, 2000). Watson and Kelly (2005) also studied student understanding of sampling bias, and, as Groth (2003) noted, the 1996 National Assessment of Educational Progress measured it. Research conducted by Watson (2008) supports the finding that elementary-aged children can (and should) access quite complex statistical concepts, including sampling methods.

In Groth’s (2003) study of high school and first-year college students, he categorized students’ levels of understanding of study design, resulting in a “working framework for describing high school students’ thinking in regard to statistical study design” (p. 266). In the spirit of Groth’s suggestion to repeat his study with younger students, we used his framework’s a priori levels of understanding (prestructural, unistructural, multistructural, and relational) to categorize the comments that middle school students made about study design while participating in the revised and updated Alphabetical Statistics lesson. Our research question for this study was:

How do middle school students’ comments about study design align with the descriptions and categories (prestructural, unistructural, multistructural, and relational) as suggested by Groth (2003)?

We share our findings from seventh-grade students (12–13 years old) and compare them with the findings that Groth (2003) reported, as well as with findings about upper elementary and middle school students from other studies of students’ understanding of study design (Jacobs, 1999; Meletiou-Mavrotheris & Paparistodemou, 2015). We conclude with suggestions regarding the use of the findings in the classroom and suggestions for future research.

Groth’s (2003) framework is an adaptation of the Biggs and Collis (1982) structure of the observed learning outcome (SOLO) taxonomy. The SOLO taxonomy “identifies a hierarchy of responses to academic tasks, including five levels: prestructural, unistructural, multistructural, relational, and extended abstract” (Groth, 2003, p. 254). Groth conducted clinical interviews of 15 high school and early college students in which the students were given scenarios to prompt their thinking about experimental and nonexperimental study designs. In a nonexperimental study design, also called an observational study design, “the researcher observes values of the [variables of interest] for the sampled [participants], without anything being done to the [participants] (such as imposing a treatment)” (Agresti et al., 2017, p. 155). In an experimental study design, the researcher “[assigns participants] to certain experimental conditions [e.g., drug or placebo] and then observ[es] outcomes” (Agresti et al., 2017, p. 154). From the students’ responses, Groth aligned the first four levels of the SOLO taxonomy with levels of thinking for each type of study design. We used Groth’s descriptions of these levels to analyze seventh-grade students’ understanding, based on their comments, during a lesson in which they were presented with the revised Alphabetical Statistics scenario, for which a nonexperimental study was the most fitting design. Because we used the students’ verbatim comments as data, we regard this work as student “voice” research.

According to Cook-Sather (2006), “[No] clear and definite conception exists for ‘student voice’ research” (p. 359), although particular words (rights, respect, and listening) surface repeatedly when researchers describe its use. Voice can denote students merely expressing their point of view on a topic or students actively participating in the generation of knowledge and action, or praxis. For us, the use of student voice data provided the potential to reposition our middle school participants to shape classroom power dynamics, garner respect, and challenge us to listen (Cook-Sather, 2006). We agree with Bishop (1993), who acknowledged the dangers of using “student-vacant” research projects to inform instruction; as teacher educators, our ultimate goal was that this research would cause us to think deeply about middle school students’ statistical thinking and help us plan future statistics lessons.

Educational policy organizations such as the National Council of Teachers of Mathematics, as well as mathematics education researchers (Cooke & Adams, 1998; Jansen, 2006; Sherin, 2002; Vacc, 1993), have advocated the use of student discussion. Discussing mathematical concepts encourages students’ active sense making as they speak aloud, practice reasoning, and analyze the reasoning of classmates. These experiences promote mathematical understanding rather than mere memorization of procedures and algorithms presented by the teacher. Class discussions allow teachers to informally assess contributions, scaffold ideas through questioning, and direct the conversation toward significant mathematical ideas (Cooke & Adams, 1998; Jansen, 2006; Sherin, 2002; Vacc, 1993). Class discussions also allow researchers to listen to student “voice.”

Shelly had used an adaptation of the Marilyn Burns lesson several times with middle school students, high school students, and preservice teachers. Briefly, her implementation proceeded as follows: She showed a short video clip of the syndicated television show “Wheel of Fortune” and told students that her friend would be a contestant on the game show. This friend needed to know which letters are used most frequently so that she could choose those letters. Students were then asked to make and justify predictions. Next, she gave each group of students a page from a local newspaper and told them to count the frequencies of letters in one paragraph. After they tallied the frequencies, Shelly compiled the data from the entire class using a table format similar to the one in Figure 1. Based on the frequencies in the table, students were asked to revise their original predictions to help her friend. Shelly then revealed the list of letters from highest to lowest frequency: “ETAO NI SRHLDCUPFMWY BG VK QX JZ” (Burns, 1987, p. 112; in the original, bold letters indicate a tie in frequency). She then asked questions such as: Why do you think there are so many vowels among the six most frequent letters? What letters in this list surprise you and why? Why do you think your sample from the newspaper is or is not a good representation? How could you improve data collection so that it is more accurate and gives my friend a better chance to win on “Wheel of Fortune”?
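
The tallying, class-wide compiling, and ranking steps in this procedure are simple enough to automate, which is the kind of technology support the seventh-grade students later said they preferred to counting by hand. The following is a minimal Python sketch, not part of Burns’s (1987) lesson or of our study; the function names and the two sample passages are hypothetical, and it assumes only the letters A to Z are counted, ignoring case, digits, spaces, and punctuation.

```python
from collections import Counter
import string


def letter_counts(text):
    """Tally each letter A-Z in one passage, ignoring case.

    Digits, spaces, punctuation, and any other non-A-Z characters are skipped.
    """
    return Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)


def compile_class_data(group_passages):
    """Sum each group's tally into one class-wide frequency table."""
    total = Counter()
    for passage in group_passages:
        total += letter_counts(passage)
    return total


def ranked_letters(counts):
    """List letters from most to least frequent; ties are broken alphabetically."""
    return sorted(counts, key=lambda letter: (-counts[letter], letter))


# Hypothetical passages standing in for two groups' paragraphs.
class_counts = compile_class_data(["See the tree.", "Eat at ten."])
print(ranked_letters(class_counts))  # ['E', 'T', 'A', 'H', 'N', 'R', 'S']
```

Breaking ties alphabetically is an arbitrary choice made for this sketch; Burns (1987) instead marks tied letters in bold.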

Based on the recommendations from the SET document (Franklin et al., 2015), we decided that rather than impose the study design on the seventh-grade students (giving them a page from a local newspaper and telling them to count the frequencies of letters in one paragraph), we would allow them to suggest, discuss, and create the study design. We also thought that a new scenario involving cell phone text messages would make the lesson more relevant to middle school students. As noted previously, when Shelly used this lesson in the past, her students did not formulate the questions, and the study design was specified for them. Although we used two class sessions over 2 days for implementation, time was still an issue; therefore, we wrote the problem scenario ourselves. In this regard, we did not strictly adhere to the first step of the process described in the SET document, formulate a question; however, students would design the study, collect and analyze the data, and interpret the results in the way they deemed would best help the teacher’s friend. We created the following problem scenario:

My friend, Sasha, is going to be on the television show, “Wheel of Fortune.” There is a new category for the show called Pithy (synonyms: concise, succinct, terse, short) Text Messages. I told Sasha that the students in my class would help her figure out what letters she should select. What would be an appropriate study design in order to help Sasha?

Figure 1

Data Table Example

We expected that Kim Given’s middle school students would be motivated to explore this problem scenario because they could use data from a familiar technology: the text messages on their cell phones. More details about this lesson and how students collected, analyzed, and interpreted the results appear in Harkness et al. (2019).

A qualitative study design allows researchers to focus on the meanings that participants make. Our qualitative study using student voice data facilitated our interpretation of students’ understanding of concepts related to study design.

All 20 students in Kim’s seventh-grade classroom in the Midwestern United States participated in the study. This was not a single-subject class but an interdisciplinary class in which Kim used integrated lessons in all content areas. This classroom constituted a convenience sample. According to Creswell (2012), “In convenience sampling the researcher selects participants because they are willing and available to be studied” (p. 145). For this study, we analyzed the students’ discourse and matched their statements to Groth’s (2003) levels of thinking associated with designing a nonexperimental study: prestructural, unistructural, multistructural, and relational.

Two cameras were positioned in the classroom to record students’ and Kim’s comments. Other data included student work and field notes taken by Sarai and Shelly and reported in Harkness et al. (2019).

After transcribing students’ and the teacher’s comments, Sarai and Shelly independently coded the transcripts, identifying student comments that exhibited Groth’s (2003) levels. They then met to discuss the individually coded student comments. When they disagreed about a level, they returned to Groth’s (2003) paper for more nuanced descriptions of the SOLO levels to resolve discrepancies and reach agreement. Table 1 shows Groth’s levels based on SOLO levels, Groth’s pattern descriptors, Groth’s nuanced descriptions, and exemplars from the middle school students’ comments. We used the first three columns to code the students’ comments but relied most heavily on the third column, which contains the nuanced descriptions.

Based on students’ interactions with the teacher, Kim, and their classmates, we found multiple levels of thinking related to study design. The majority of students’ comments at the beginning of the lesson were at the unistructural level of thinking, and some students’ comments progressed to the multistructural and relational levels. In terms of raw counts, the transcripts contained five student comments coded as unistructural, four coded as multistructural, and one coded as relational. However, these numbers do not capture the students’ voices and the statistical discussion that transpired. Therefore, we include student comments below. All student names are pseudonyms.

Table 1

Groth’s (2003) Descriptions and Exemplars From Our Study

| Groth’s Levels Based on SOLO Levels | Groth’s Pattern Descriptor | Groth’s Nuanced Descriptions of Students’ Responses | Exemplars From Middle School Students’ Comments |
|---|---|---|---|
| Prestructural | No design strategies articulated, but is aware of the existence of studies and empirical data. | Use preexisting studies and information gathered from sources such as books, internet websites, and journals to determine answers; or reliance on others to already have gathered the data; or no identification of data gathering techniques; or redundant or unnecessary response | None |
| Unistructural | Data collection without concern for representativeness. | Acknowledge the need for empirical data; or initiate ideas for data gathering techniques; however, data gathering techniques not likely to be representative of the population from which drawn | “Um, after a large sample size, the results don’t vary extremely or that if there are outliers, they don’t affect the overall … test?” (Henry) |
| Multistructural | Data collection with concern for representativeness. | Ascertain data gathering techniques with concern for representativeness; however, no unification of these two aspects in order to develop a coherent strategy for data gathering | “Um, you kinda hafta figure out what defines a normal text message.” (Toby) |
| Relational | Data collection with concern for representativeness and one or more methods to ensure it. | Unify data gathering techniques and concern for representativeness; realize the importance of incorporating either random sampling, stratified random sampling, or both | “I have a new idea. Instead of, like starting a new conversation, we should just go back and look at our old ones and, like, just look at, like, what our parents sent us as opposed to what our friends sent us as opposed to as what our, like coaches sent us.” (Micah) |

Source: Groth (2003, p. 259).

At the beginning of the first day of the lesson, Kim introduced the problem scenario and asked the entire class how they might collect data. Note that suggesting that the students collect data may have prevented them from making any prestructural responses. Micah offered,

We could find some sort of, like, passage or anything on the internet and, like, just count out all the letters and figure out which ones have the most.

This response revealed a unistructural level of thinking because, as described in Groth’s (2003) framework, although Micah initiated an idea for a data gathering technique, it was not related to texting and thus the technique would not necessarily result in gathering data that represented the population of interest: all pithy text messages. Kim did not directly address Micah’s comment before she revealed the most common letters used in newspapers and books (Burns, 1987). She next reminded students of their first task:

I want you to put together a plan of how we could use the text messages we have access to in order to decide whether text messaging uses different letters in terms of frequency. OK, is everybody clear on what my question is? We’re trying to design what we would look for and how we would measure frequency of letter use in text messages.… You are trying to design a study.

She then assigned students to groups to design the study. During the group work, Micah shared another idea with his group,

I say we have a conversation about messaging and then count the letters.

Although his data gathering technique now used a more appropriate study design approach, this idea was still at the unistructural level of thinking because it did not show concern for representativeness, per Groth’s (2003) framework. It is also interesting to consider Micah’s idea that the conversation should have been about messaging; perhaps he confused the scenario with the statistical question.

After the group work, students came together for a whole-class discussion to share considerations they made in designing a study. Some groups volunteered responses and Kim asked for responses from groups who did not volunteer. Their responses revealed different levels of thinking about study design. Henry offered,

After a large enough sample size, the results don’t vary extremely or that if there are outliers, they don’t affect the overall … test?

This unistructural response revealed Henry’s intuitive or beginning understanding of statistical inference, familiarity with some statistical terminology, and confusion about how these ideas related to study design. In terms of the Groth (2003) framework, a coherent strategy for obtaining representativeness was missing from Henry’s comment. Alternatively, Toby verbalized grappling with the idea of representativeness by saying,

You kinda hafta figure out what defines a normal text message.

Although Toby’s multistructural suggestion referred to a “normal text message,” none of the students seemed concerned with the part of the scenario that described the category as pithy text messages and what the word pithy meant. Fiona said,

[You need to consider] a variety of people you texted … ’cause if you text adult—adults, you might text differently than you text your friends.

Fiona’s multistructural comment led us to believe she had connected representativeness to variation. This statement may have helped Micah develop his thinking and move from a unistructural to a multistructural level of thinking when he repeated his earlier statement and added:

I said that, like, we should have, like, a conversation on text messaging with somebody about, like, anything and then obviously count the letters. Then also we should do it to multiple people ‘cause some people may, like, use abbreviations and different abbreviations.

Micah now paired a data gathering technique with a concern for representativeness, as described by Groth (2003), but he had not yet unified the two into a coherent strategy for obtaining representativeness, the additional criterion necessary to reach the relational level of thinking. A few minutes later, he moved toward the relational level of thinking by suggesting a stratification of the sample to obtain representativeness, saying:

I have a new idea. Instead of, like, starting a new conversation, we should just go back and look at our old ones and, like, just look at, like, what our parents sent us as opposed to what our friends sent us as opposed to as to what our, like, coaches sent us.

Micah’s suggestion sparked a discussion about other important study design topics, such as response bias and participant consent. Henry commented:

The old conversations thing might be useful because that way the person does not—your results are not affected by the fact that you know it’s part of the test … [by using old conversations], our data is not contaminated by knowledge.… Is it legal for us, like if you’re using a group chat or something else, isn’t it illegal for us to use [other people’s] text messages in part of a study without consent?

Unfortunately, due to the time constraints of the class sessions, a discussion of research ethics and of how sample size and random sampling relate to representativeness did not occur. Some of these ideas emerged on the second day, at the end of the lesson, when students were asked about their confidence in being able to tell Sasha which letters to pick. Students commented that if they could have taken a bigger sample of text messages, they would have felt more confident that their results were representative. No student suggested taking a random sample. Nor did anyone express concern about using self-selected texts, all of which were sent either to or from these seventh-grade students. The students also perceived human fallibility, in the form of counting errors, as a significant obstacle to the representativeness of the sample. Most students’ comments focused on their preference for using technology for data gathering, tallying, and counting to reduce such errors.

Yet, even without time to discuss these study design concepts further, Micah progressed through the levels of understanding of study design. He began the lesson at the unistructural level of understanding (suggesting data collection without concern for representativeness) and, by considering Kim’s questions and collaborating with others, progressed level by level to the relational level of understanding. Ultimately, he suggested data collection with concern for representativeness and a method, albeit somewhat naïvely conceived, for ensuring it.

Our findings were similar to those of previous studies of students’ understanding of study design, with some noteworthy differences. Unlike in Groth’s (2003) study of high school and early college students, no students in the present study provided responses at the prestructural level, perhaps because Kim suggested they would collect data. The middle school students in our study used some of the same study design terminology as students in Groth’s study, such as sample and sample size. Not surprisingly, students in this study did not use the term stratify, which was used by some of Groth’s students, who would have been exposed to this term in a high school or college-level statistics course. However, we expected them, like some of Groth’s students, to use the term random sample because it appears in the CCSS-M seventh-grade standards (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2010), but they did not.

Similar to findings in Groth’s (2003) study and in studies of elementary students (Jacobs, 1999; Meletiou-Mavrotheris & Paparistodemou, 2015), students in this seventh-grade class, although they did not use the term stratify, proposed collecting data through stratification rather than through random selection. To draw a stratified random sample, one divides the population into separate groups (strata) based on shared characteristics and then randomly samples from each stratum, with the number selected from each stratum proportionate to that stratum’s share of the population. The students in Kim’s class collected data from messages sent by parents, friends, and coaches. Their rationale for stratification, like that of some students in Groth’s study, was to achieve representativeness. Their conception of stratified random sampling was naïve because they considered neither the proportion of the population in each stratum nor random selection of messages within strata.
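
The two steps the students omitted, proportional allocation across strata and random selection within each stratum, can be made concrete with a minimal Python sketch. It is illustrative only: the strata (friends, parents, coaches), their sizes, and the function names are hypothetical assumptions rather than data from Kim’s class, and the largest-remainder rounding rule is just one reasonable way to turn proportional quotas into whole numbers.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations sum exactly to n.
    """
    total = sum(stratum_sizes.values())
    exact = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in exact.items()}  # floor of each exact quota
    shortfall = n - sum(alloc.values())
    # Give the leftover slots to strata with the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


def stratified_random_sample(population, n, seed=None):
    """population maps each stratum name to a list of units (e.g., messages)."""
    rng = random.Random(seed)
    sizes = {s: len(units) for s, units in population.items()}
    alloc = proportional_allocation(sizes, n)
    # Simple random sample without replacement inside each stratum.
    return {s: rng.sample(population[s], alloc[s]) for s in population}


# Hypothetical inbox of 200 messages grouped by sender (not data from this study).
inbox = {
    "friends": [f"friend message {i}" for i in range(120)],
    "parents": [f"parent message {i}" for i in range(60)],
    "coaches": [f"coach message {i}" for i in range(20)],
}
sample = stratified_random_sample(inbox, n=20, seed=1)
print({s: len(msgs) for s, msgs in sample.items()})
# {'friends': 12, 'parents': 6, 'coaches': 2}
```

With these hypothetical counts, coaches’ messages make up 10% of the inbox and so receive 2 of the 20 slots; choosing, say, roughly equal numbers of messages from each sender group would over-represent them, and hand-picking messages within a group would replace random selection with self-selection.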

Additionally, like a few of Meletiou-Mavrotheris and Paparistodemou’s (2015) sixth-grade students and most of Watson and Moritz’s (2000) ninth-grade students, the seventh-grade students in the present study suggested selecting a larger sample to improve representativeness. Similar to students in Jacobs’s (1999) and Meletiou-Mavrotheris and Paparistodemou’s (2015) studies, students in this study did not express concern about using self-selected texts. Their comments did not reveal why they were unconcerned about this type of bias. The students were primarily concerned about human error in counting the occurrences of each letter, with comments such as “Do we need to double check [the counts], have a second person review the data?”, “People might have lost place while counting,” “Did everyone actually count 150 letters?” and “[It would be better to] use a computer to help with data collection.” The students also complained several times about the tedium of counting letters by hand.

The lesson revealed gaps in students’ knowledge regarding study design. A representative sample maintains the proportions of characteristics found in the population (Shaughnessy, 2006); nevertheless, students ignored the potential effects of bias and believed that a larger sample alone would improve representativeness. Students’ understanding of stratified random sampling was intuitive and naïve, as they did not acknowledge the importance of proportionality.

Further, the lesson aligned with the AMLE (2012) recommendation “for active learning … direct experience and interaction with the physical, intellectual and social environments” (p. 14). Additionally, Kim acted more as a facilitator of learning as students engaged in experiential learning, interdisciplinary problem solving, inquiry, and group discussions (Gutek, 2004; Oliva, 2009; Schiro, 2008, as cited in Edwards et al., 2014, p. 14).

As previously mentioned, no students in the present study provided responses at the prestructural level. This may be attributable to statements made by the teacher, a possible limitation of the study. When Kim introduced the scenario, she suggested collecting data and later asked students to use “texts we have access to” to design the study. Thus, it is unknown whether any students would have offered prestructural responses without these prompts. Another possible limitation is our use of student voice research, which privileges experience over theory. We must not assume that voice data from the students who participate in whole-group discussions are representative of the entire class (Hadfield & Haw, 2001). Yet, by using student voice data, we believe we acknowledged the role that the students played in shaping our research, reporting our findings, and designing future lessons.

At the time of this study, the mathematics standards used by public schools in Ohio were the CCSS-M. These standards may or may not have been strictly followed by the students’ mathematics teachers in previous years or in the current year; we collected no data about these middle school students’ mathematics experiences prior to the implementation of this lesson.

Curiously, the statistics standards in CCSS-M through Grade 8 do not include the statistical problem-solving cycle: (a) formulate a question; (b) create a plan and collect data; (c) analyze the data; and (d) interpret the results (Franklin et al., 2015; Marriott et al., 2009; Wild & Pfannkuch, 1999). However, the concepts of a statistical question, variability, and the use of random sampling to draw informal inferences about a population were included in these standards (National Governors Association Center for Best Practices & Council of Chief State School Officers, 2010). The statistical problem-solving process is a dimension of statistical thinking (Wild & Pfannkuch, 1999) and thus including this process in the standards would incorporate the aforementioned concepts in a manner that could facilitate statistical thinking. This consequential omission appears to have been recognized by the Ohio Department of Education because Ohio revised its mathematics standards in 2017 to include references to the GAISE (2007) report and the steps of the statistical problem-solving process therein.

The focus of this open-ended classroom lesson was to engage middle school students in learning about the study design step of the statistical problem-solving process, a step that is often overlooked or contrived. The lesson had the advantage of offering a real-world context for students in this age group, in part because it involved a popular technology, text messaging, and because it required collecting data to answer the research question. An added benefit of the teacher’s questioning and the dialogue among students was the insight it gave Kim into her students’ understanding of study design. In other words, the lesson provided Kim with an opportunity “to develop pedagogical content knowledge … knowledge about common student conceptions and thinking patterns” (Franklin et al., 2015, p. 2).

Recognizing where statistics diverges from mathematics is noted in the SET document as another important part of the statistical education of teachers (Franklin et al., 2015). Some statistical concepts in this lesson, such as bias and sampling methods, were outside of, but connected to, the mathematics realm. The statistical question is another nonmathematical concept, yet we defined it for the students in this lesson. Perhaps this was a flaw in the planning of the lesson. We should have used a scenario that allowed students to formulate the question themselves, such as:

What does Sasha need to know in order to be successful when she makes guesses in the Pithy Text Messages category on “Wheel of Fortune”?

There was additional room for improvement. First, although the term pithy was discussed at the beginning of the lesson as the type of text message of interest, the seventh-grade students did not discuss or decide on the length of text messages to include in the sample. Second, the lesson involved only categorical data. These issues could be easily addressed by modifying future lessons to include additional questions about text messages that require a numeric response, such as “What is the typical length of a text message?” Third, the time available was limited. In the future, three class sessions should be designated for this lesson so that the teacher can address, through rich discussion, other important aspects of the statistical problem-solving process such as formulating the statistical question, statistical vocabulary, sample size, sampling methods, and bias.

Kim had never taught a statistics lesson in her classroom before our collaborative effort. Based on the results of our data analysis and multiple conversations among the three of us about her students’ understanding of the statistical problem-solving process, she felt that, given the opportunity to implement this lesson again with another group of students, she would refrain from suggesting that students collect data, possibly eliciting a greater range of responses on Groth’s (2003) levels when considering study design. Kim also thought she would put more planning and thought into questions related to students’ naïve understandings as they generated their study design. Because of time constraints and Kim’s self-reported lack of experience in teaching this content, the lesson missed several opportunities to correct or clarify key elements of students’ current understanding of statistics and study design. Overall, however, she found the students’ comments highly valuable for her own understanding of their current levels of understanding. She noted that in future implementations of the lesson she would build in more “try it out” experiences between discussions, giving students more concrete exploration of concepts between theoretical discussions. In a middle school setting, alternating action and reflection in shorter cycles works best to engage all members of the class (McCoy, 2013). Ultimately, Kim believed that giving students more opportunities to test their theories might have led to a deeper understanding, based on Groth’s (2003) levels, of the study design step of the statistical problem-solving process.

Future research could include other iterations of this lesson with middle school students. Postlesson interviews could be included to follow up on and clarify student responses, such as Micah’s initial comments about text messaging. Teachers’ understanding of the statistical problem-solving process, and of how to orchestrate discussions about study design, should be researched further to enable teachers to achieve the goals for teacher preparation described in the SET document (Franklin et al., 2015). Specifically, the SET document recommends that middle school teachers of statistics know how to:

  • use random selection in the design of a sampling plan;

  • recognize the connections between study design and interpretation of results; consider issues such as bias, confounding, and scope of inference;

  • understand that one goal of statistical inference is to generalize results from a sample to some larger population; and

  • recognize that generalization from a sample requires random selection (pp. 22–23).

Beyond engaging students in learning about the study design step of the statistical problem-solving process, this study, using student voice data, illustrated how students make sense of ideas collaboratively with the help of others. This was particularly evident in how the discussion with classmates and the teacher during the lesson facilitated Micah’s progressive movement along Groth’s (2003) levels of understanding of study design. This study also brought to the fore the challenges of implementing a lesson that includes the study design step of the statistical problem-solving process. Enough time must be available for teachers to successfully orchestrate discussions that both reveal and improve students’ understanding of study design. These challenges can be met most successfully by a teacher educated in statistics.

Statistics education includes study design concepts, such as bias and sampling methods, that are not directly mathematical, which can pose a new challenge for mathematics teachers and mathematics teacher educators. One barrier to implementing the study design step of the statistical problem-solving process in the mathematics curriculum is that the CCSS-M do not include the statistical problem-solving process. If that process were included in the CCSS-M, and if lessons that engage students in study design were developed alongside research on students’ and teachers’ understanding of it, study design could cease to be the often contrived or overlooked step in the statistical problem-solving process.

Agresti, A., Franklin, C., & Klingenberg, B. (2017). Statistics: The art and science of learning from data (4th ed.). Pearson.
Association for Middle Level Education. (2012). This we believe in action: Implementing successful middle level schools.
Bargagliotti, A., Franklin, C., Arnold, P., Gould, R., Johnson, S., Perez, L., & Spangler, D. A. (2020). The pre-K–12 guidelines for assessment and instruction in statistics education II (GAISE II). American Statistical Association. www.amstat.org/asa/files/pdfs/GAISE/GAISEIIPreK-12_Full.pdf
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. Academic Press.
Bishop, W. (1993). Students’ stories and the variable gaze of composition research. In S. I. Fontaine & S. Hunter (Eds.), Writing ourselves into the story (pp. 197–14). Southern Illinois University Press.
Burns, M. (1987). A collection of math lessons from Grades 3 through 6. Math Solutions Publications.
Bush, S. B., Karp, K. S., Albanese, J., & Dillon, F. (2015). The oldest person you’ve known. Mathematics Teaching in the Middle School, 20(5), 278–285.
Cook-Sather, A. (2006). Sound, presence, and power: “Student voice” in educational research and reform. Curriculum Inquiry, 36, 359–390.
Cooke, L. B., & Adams, V. M. (1998). Encouraging “math talk” in the classroom. Middle School Journal, 29(5), 35–40.
Creswell, J. W. (2012). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (4th ed.). Pearson.
Edwards, S., Kemp, A. T., & Page, C. S. (2014). The middle school philosophy: Do we practice what we preach or do we preach something different? Current Issues in Middle Level Education, 19(1), 13–19.
Franklin, C., Kader, G., Bargagliotti, A., Scheaffer, R., Case, C., & Spangler, D. (2015). The statistical education of teachers. American Statistical Association.
Franklin, C., Kader, G., Bargagliotti, A., Mewborn, D., Moreno, J., Peck, R., Perry, M., & Scheaffer, R. (2007). Guidelines for assessment and instruction in statistics education (GAISE) report: A pre-K–12 curriculum framework. American Statistical Association. www.amstat.org/ASA/Education/Guidelines-for-Assessment-and-Instruction-in-Statistics-Education-Reports.aspx
Groth, R. (2003). High school students’ levels of thinking in regard to statistical study design. Mathematics Education Research Journal, 15(3), 252–269.
Groth, R. E., Jones, M., & Knaub, M. (2017). Working with noise in bivariate data. Mathematics Teaching in the Middle School, 23(2), 82–89.
Hadfield, M., & Haw, K. (2001). ‘Voice’, young people and action research. Educational Action Research, 9(3), 485–502.
Harkness, S. S., Hedges, S., & Given, K. (2019, January 2). A technology twist on a classic statistics lesson. Statistics Teacher. www.statisticsteacher.org/2019/01/02/tech-twist-on-lesson/
Jacobs, V. R. (1999). How do students think about statistical sampling before instruction? Mathematics Teaching in the Middle School, 5(4), 240–263.
Jansen, A. (2006). Seventh graders’ motivations for participating in two discussion-oriented mathematics classrooms. The Elementary School Journal, 106(5), 409–428.
Kazak, S., Pratt, D., & Gökce, R. (2018). Sixth grade students’ emerging practices of data modelling. ZDM - Mathematics Education, 50(7), 1151–1163.
Lavigne, N. C., & Lajoie, S. P. (2007). Statistical reasoning of middle school children engaging in survey inquiry. Contemporary Educational Psychology, 32, 630–666.
Marriott, J., Davies, N., & Gibson, L. (2009). Teaching, learning and assessing statistical problem solving. Journal of Statistics Education, 17(1). www.amstat.org/publications/jse/v17n1/marriott.html
McCoy, B. (2013). Active and reflective learning to engage all students. Universal Journal of Educational Research, 1(3), 146–153.
Meletiou-Mavrotheris, M., & Paparistodemou, E. (2015). Developing students’ reasoning about samples and sampling in the context of informal inferences. Educational Studies in Mathematics, 88, 385–404.
National Governors Association Center for Best Practices & Council of Chief State School Officers. (2010). Common core state standards.
Peters, S. A., Gross, M., & Stokes-Levine, A. (2019). Project-based statistics: Capitalizing on students’ interests. Mathematics Teaching in the Middle School, 24(6), 330–336.
Prodromou, T. (2015). Teaching statistics with technology. Australian Mathematics Teacher, 71(3), 32–40.
Reid, J., & Carmichael, C. (2015). A taste of Asia with statistics and technology. Australian Primary Mathematics Classroom, 20, 10–14.
Savard, A., & Manuel, D. (2016). Teaching statistics: Creating an intersection for intra and interdisciplinarity. Statistics Education Research Journal, 15(2), 239–256.
Shaughnessy, J. M. (2006). Research on students’ understanding of some big concepts in statistics. In G. F. Burrill & P. C. Elliott (Eds.), Thinking and reasoning with data and chance: Sixty-eighth yearbook (pp. 77–98). National Council of Teachers of Mathematics.
Shaughnessy, J. M. (2007). Research on statistics learning and reasoning. In F. K. Lester, Jr. (Ed.), Second handbook of research on mathematics teaching and learning (pp. 957–1009). Information Age.
Sherin, M. G. (2002). A balancing act: Developing a discourse community in a mathematics classroom. Journal of Mathematics Teacher Education, 5(3), 205–233.
Vacc, N. N. (1993). Implementing the “professional standards for teaching mathematics”: Teaching and learning mathematics through classroom discussion. The Arithmetic Teacher, 41(4), 225–227. www.jstor.org/stable/41195987
Watson, J. M. (2008). Exploring beginning inference with novice grade 7 students. Statistics Education Research Journal, 7(2), 59–82.
Watson, J., & Kelly, B. (2005). Cognition and instruction: Reasoning about bias in sampling. Mathematics Education Research Journal, 17(1), 24–57.
Watson, J., & Moritz, J. (2000). Developing concepts of sampling. Journal for Research in Mathematics Education, 31(1), 44–70.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical Review, 67(3), 223–265.
Zapata-Cardona, L. (2018). Students’ construction and use of statistical models: A socio-critical perspective. ZDM - Mathematics Education, 50(7), 1213–1222.