Differential effects of additional formative assessments on student learning

It is well-established that formative assessments with accompanying feedback can enhance learning. However, the degree to which additional formative assessments on the same material further improve learning outcomes remains an open research question. Moreover, it is unclear whether providing additional formative assessments impacts self-regulated learning behavior, and if the benefits of such assessments depend on students’ self-regulated learning behavior. The current study, conducted in a real-world blended learning setting and using a Learning Analytics approach, compares 154 students who completed additional formative assessments with 154 students who did not. The results indicate that the additional formative assessments led to an improvement in learning outcomes, but also had both positive and negative effects on students’ self-regulated learning behavior. Students who completed additional formative assessments performed better on the assessments but reported lower levels of subjective comprehension and devoted more time to completing exercises. Simultaneously, they devoted less effort to additional learning activities (additional investment), such as class preparation and post-processing. Furthermore, the impact of additional formative assessments on learning success depended on students’ self-regulated learning behavior. It was primarily the students who invested above-average time during formative assessments (time investment) who benefited from the additional exercises. Cluster analysis revealed that high-effort students (those with above-average time investment and above-average additional investment) gained the most from the extra exercises. In contrast, low-effort students and those who achieved high performance with relatively low effort (efficient students) did not benefit from additional formative assessments. 
In conclusion, providing students with additional formative assessments can enhance learning, but it should be done with caution as it can alter self-regulated learning behavior in both positive and negative ways, and not all students may benefit from it equally.


Introduction
The testing effect, a well-established learning technique (Jensen et al., 2020), denotes the enhanced learning success observed when students actively engage with learned material, rather than relying on passive repetition or memorization (Schwieren et al., 2017). The effect is typically quantified by comparing the post-learning performance of learners who participated in active retrieval during the learning phase to those who did not engage in such practices. An instance of active retrieval is solving formative assessments, wherein learners apply their knowledge to solve problems (Boston, 2002).
A significant testing effect was demonstrated in both experimental and applied settings (Lamotte et al., 2021; Schwieren et al., 2017). It is evident in tasks ranging from relatively simple ones, like vocabulary memorization, to more complex tasks that involve applying theoretical knowledge to novel situations (Schwieren et al., 2017). The effect occurs when exercises during the learning phase are identical to those measuring learning success (Carpenter, 2011; Eriksson et al., 2011; Karpicke & Roediger, 2007) and when non-identical exercises covering the same material are employed (Batsell et al., 2017; Foss & Pirozzolo, 2017; Francis et al., 2020; Jensen et al., 2020; McDaniel et al., 2013).
One open research question regarding the testing effect concerns whether multiple testing instances result in greater learning success than administering a single test. Multiple testing can include either the repeated use of the same test or the utilization of different tests covering the same content (Yang et al., 2021). A meta-analytic review conducted by Adesope et al. (2017) did not identify a significant difference between the testing effects of multiple tests versus a single test on the same material. However, a more recent review by Yang et al. (2021) discovered that, in applied contexts such as classrooms, the testing effect was more pronounced when the same test (or a similar test with the same content) was administered repeatedly. Additional empirical research is required to determine whether multiple tests on the same content yield a more pronounced testing effect (Yang et al., 2021).
Assessing the effectiveness of additional exercises on identical content is crucial for practical implementation, particularly given the laborious task of generating such exercises. Moreover, the provision of additional learning materials may not always lead to increased learning, and can even have a negative impact on learning in some cases by inducing cognitive overload or stress (Kossen & Ooi, 2021).
As additional formative assessments on the same content have the potential to influence self-regulated learning both positively and negatively, a learning analytics approach was employed to investigate the impact of additional formative assessments on self-regulated learning behavior. This approach involves the collection and analysis of data on students' learning behavior and progress, to enhance learning and teaching (Chatti et al., 2013; Ifenthaler, 2015; Leitner et al., 2017). The utilization of such data has attracted attention in the field of self-regulated learning, as it enables the monitoring of the learner's holistic action without interference in the process (Winne & Baker, 2013).
Prior research suggests that formative assessments have a positive effect on self-regulated study time allocation and monitoring (Clariana & Park, 2021; Fernandez & Jamet, 2017; Perry & Winne, 2006; Soderstrom & Bjork, 2014; Yang et al., 2017). Engagement in solving exercises enhanced students' monitoring of their knowledge, leading to a reduction in the overestimation of their knowledge and an increase in time allocated for studying (Soderstrom & Bjork, 2014). The positive impact of exercise solving on learning success was partially mediated by improved monitoring and learning behavior (Fernandez & Jamet, 2017). This is because additional formative assessments can provide feedback on learning status, aiding learners in metacognitive control, and adapting their self-regulated learning behavior (Clariana & Park, 2021; Perry & Winne, 2006).
Furthermore, the efficacy of additional formative assessments likely depends on individual differences in students' characteristics (Bertilsson et al., 2021). Prior knowledge and experience are also important factors, as students with more prior knowledge tend to benefit more from the testing effect than those with less prior knowledge (Cogliano et al., 2019; Francis et al., 2020). According to the elaborative retrieval hypothesis, mental effort during recall predicts the magnitude of the testing effect, and practicing with exercises that are challenging but within the learner's abilities can enhance the effect (Carpenter et al., 2009; Greving et al., 2020; Minear et al., 2018). In addition, students who already possess effective learning strategies may not benefit as much from formative assessments as those who lack such strategies (Robey, 2019). This is because they have already achieved high learning success even without formative assessments, and the effect of testing may not significantly enhance their learning outcomes.
Not only the number but also the timing of formative assessments can influence their effectiveness (Karpicke & Bauernschmidt, 2011). In line with current research on spaced learning (Greene, 2008; Jost et al., 2021), evenly distributed learning throughout the semester is more effective in promoting learning success than cramming right before a test (Adesope et al., 2017; Karpicke & Bauernschmidt, 2011).
In summary, the positive impact of testing on learning can be influenced by multiple factors, including individual student characteristics (e.g., hope of success, prior knowledge, investment, timing). It is plausible to assume that students with limited prior knowledge, low motivation, low investment, frequent incorrect responses, last-minute study habits, and shallow feedback processing may not benefit as much from extra exercises with feedback as other students. Therefore, various types of students may exist, and some may not experience the same benefits from additional formative assessments.
In this study, a cluster analysis based on the data collected through the learning analytics approach was employed to examine whether the impact of solving additional formative assessments on end-of-semester knowledge test performance is influenced by students' self-regulated learning behavior. Cluster analysis is a method that divides students into groups based on similarities within a cluster and dissimilarities between clusters (Dalmaijer et al., 2021; Shin & Shim, 2021).
To sum up, although there is significant evidence supporting the idea that formative assessments improve learning outcomes, it is still uncertain whether administering additional assessments on the same content leads to a more substantial effect and how this practice influences self-regulated learning behavior. Additionally, studies indicate that the benefits of additional formative assessments may differ based on students' self-regulated learning characteristics. As a result, this study aims to examine the following three hypotheses:
1. Administering additional formative assessments of the same content leads to higher performance on an end-of-course knowledge test.
2. Administering additional formative assessments of the same content influences self-regulated learning behavior.
3. The relationship between administering additional formative assessments and learning success depends on students' self-regulated learning behavior.

Method
Participants
To be included in the study, students had to take both the prior knowledge test at the beginning of the semester and the end-of-semester knowledge test. They also had to complete at least four of the five formative assessments (not including the additional formative assessments). In addition, students with very low performance in the formative assessments or in the end-of-semester knowledge test (clear outliers) were excluded (three students). Of the 276 students enrolled in the course in 2020, a total of 194 met the inclusion criteria, while in 2021 a total of 166 of the 234 students enrolled met the inclusion criteria. This represents approximately 70% of all enrolled students in both courses. Out of the total 360 students evaluated (194 + 166), 324 had no missing values and 36 students were missing one of the five formative assessments. To address this, missing values for the 36 students (21 from 2020's 194 and 15 from 2021's 166) were imputed using the mice function (van Buuren & Groothuis-Oudshoorn, 2011) employing a predictive mean matching approach.
The aim of the current study was to compare students solving most additional exercises with students not solving any additional exercises. Thus, students without access to any extra exercises were labeled "non-solvers" while those who finished at least four out of the five additional assessments were referred to as "solvers". From the 2021 cohort, 12 students who completed fewer than four additional formative assessments were excluded. Accordingly, 194 students were identified as "non-solvers" and 154 as "solvers". The study was approved by the local ethics committee.

Procedure
The mandatory "Psychological Diagnostics" course for master's students in psychology focused on complex methodological content such as equivalence analysis and item response theory. Students were permitted to choose the learning approach they found most suitable for the specific learning situation, as their learning behavior was not explicitly manipulated. Consequently, this study examined the impact of extra exercises compared to any other learning approach, including no learning at all.
Due to ethical concerns, students were not randomly assigned to groups. Instead, this study adopted a quasi-experimental approach, evaluating the same course across two successive years, 2020 and 2021. The only variation introduced between the two years was the inclusion of optional additional formative assessments in 2021. The additional formative assessment consisted of new exercises covering the same content as the initial assessment. All other elements of the course, including initial exercises, podcasts, literature, and instructions, were kept consistent across both years. Participation in the study was voluntary. All students had access to the standard learning materials. However, those who volunteered to participate in the study received additional feedback post-exam: z-standardized values of all variables specified in the method section, enabling them to compare their performance with that of the entire group. Non-participants did not receive this supplementary information. All data were pseudonymized using unique pseudonyms. Based on the pseudonyms, there were no students who attended the course in both years, 2020 and 2021.
A blended learning approach was employed in both years, allowing students to engage with course material at their own pace and participate in timely online discussions. The curriculum included 12 weekly podcast lectures, each 90 minutes long, a prior knowledge test, and five biweekly formative assessments covering two lectures each. Data collection occurred during the formative assessments.
Upon completing each exercise, students received immediate feedback and suggestions for supplementary resources, including relevant literature, lecture slides, podcast excerpts, and additional links or references. Students could ask further questions on the feedback page, which were addressed in a forum or, if necessary, through scheduled online discussions.
Students were advised to adhere to a one-week submission window for formative assessments, facilitating an even distribution of their learning throughout the semester. This structure was consistent over both years; however, the frequency of formative assessments varied. In 2020, formative assessments were assigned every two weeks, whereas in 2021, with the addition of supplementary formative assessments, they occurred weekly. Despite the change in frequency, the exercises, including the additional assessments, could be repeated and remained accessible to students until the final exam.
Two weeks before the final exam, a comprehensive end-of-semester knowledge test with new exercises covering the entire course content was administered. All exercises and self-reports were designed and hosted using Qualtrics (Qualtrics, Provo, UT). Response latency, accuracy, and time spent on feedback pages were assessed. Links to the Qualtrics questionnaires were embedded into the ILIAS learning management system, where all essential learning resources, such as podcasts and literature, were made available to students.
To identify clusters of response behavior, data from the initial formative assessments were utilized. Additional formative assessments were not included because they were only accessible to the solvers.
In the next section, the behavioral and self-reported learning analytics data gathered are presented. Self-reports were used to capture perceptions such as subjective knowledge, subjective investment, and subjective importance. Furthermore, self-reports were employed for variables over which I did not have full control, such as downloading or printing materials, because I intentionally permitted students to learn in ways they found most suitable for the specific learning situation. The drawbacks of self-reports were minimized by employing pseudonymization to reduce the impact of social desirability and by using precise questions to decrease the likelihood of recall errors. For the main emphasis, the formative assessments, offline solutions were not allowed, so behavioral data were utilized.

Prior knowledge
Prior knowledge was assessed with 19 multiple-choice exercises. The test contained mainly theoretical exercises and calculations concerning real-world applications of the knowledge acquired during the bachelor's program (e.g., reliability, validity). One sum score was built for the 13 theoretical exercises and one for the six calculation exercises.

Performance in the formative assessments
For each of the five formative assessments, covering different content such as item response theory, confirmatory factor analysis, equivalence analysis, and criterion-referenced testing, a sum score was built. The number of exercises per formative assessment ranged from 10 to 24. The exercises consisted primarily of multiple-choice exercises, in which the theoretical knowledge acquired from the podcast was applied to concrete situations. The sum scores of the five assessments were highly related, with a Cronbach's alpha of 0.76. The sum of all five formative assessments was used for further analyses.
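The internal-consistency figure reported above can be illustrated with a minimal Python sketch of Cronbach's alpha. The study itself was analyzed in R; the function name `cronbach_alpha` and the example scores below are hypothetical and serve only to show the formula, with each formative assessment treated as one "item".

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one per student),
    each row holding k component scores (here: k assessment sum scores)."""
    k = len(scores[0])
    # Sample variance of each component across students
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total score across students
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: three students, three perfectly consistent components
example_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
example_alpha = cronbach_alpha(example_scores)
```

With real data, `scores` would contain one row per student and five columns, one per initial formative assessment.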

End-of-semester knowledge test
The dependent variable of the current study was the performance in the end-of-semester knowledge test. It consisted of 22 exercises. In contrast to the exam, the knowledge test covered only the content of the formative assessments and was identical in 2020 and 2021. The correlation between the end-of-semester knowledge test and the final grades was comparable in both cohorts: 2020 (r = 0.52, p < 0.05) and 2021 (r = 0.57, p < 0.03).

Time investment
Response latency was recorded for each task and the feedback page. To reduce the effect of strong outliers, for each time measure, all values greater than the 95th percentile were trimmed to the 95th percentile. As the response latencies were still highly right-skewed, each time measure was logarithmized. Thereafter, all response latencies were z-standardized and the first strong principal component of the response latencies on the exercise page (explaining 43% of the variance), and the first strong principal component of the response latencies on the feedback page (explaining 46% of the variance) were extracted and used for further analyses.
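The first three preprocessing steps described above (capping at the 95th percentile, log transformation, z-standardization) can be sketched in Python as follows. This is an illustration only: the study used R, the function name `preprocess_latencies` and the example latencies are hypothetical, and the percentile definition (the "inclusive" method of `statistics.quantiles`) is an assumption, since the source does not state which definition was used. The subsequent principal component extraction is not shown.

```python
import math
from statistics import mean, stdev, quantiles

def preprocess_latencies(latencies):
    """Cap at the 95th percentile, take natural logs, then z-standardize."""
    # 19 cut points for n=20; the last one is the 95th percentile
    cap = quantiles(latencies, n=20, method="inclusive")[-1]
    capped = [min(x, cap) for x in latencies]
    logged = [math.log(x) for x in capped]
    m, s = mean(logged), stdev(logged)
    return [(x - m) / s for x in logged]

# Hypothetical response latencies in seconds, with one extreme value
example_latencies = [5, 8, 10, 12, 15, 20, 30, 45, 60, 600]
example_z = preprocess_latencies(example_latencies)
```

Capping before the log transformation limits the influence of a single very long latency (for instance, a student leaving the page open), while keeping that student in the data.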

Number of completions
The "completions initial exercises" variable was computed for the initial formative assessments, considering the count of completions for both identical and distinct formative assessments. For the variable "completions overall" the total number of completions (also including the additional formative assessments) was calculated. The number of completions overall was categorized into six groups: 1 = up to ten completions; 2 = 11-15 completions; 3 = 16-20 completions; 4 = 21-25 completions; 5 = 26-30 completions; 6 = more than 30 completions.
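The six-group coding of "completions overall" can be written as a small Python function. The category boundaries are taken from the text above; the function name `completion_category` is hypothetical.

```python
def completion_category(n_completions):
    """Map a total number of completions to the six categories in the text."""
    if n_completions <= 10:
        return 1  # up to ten completions
    if n_completions <= 15:
        return 2  # 11-15 completions
    if n_completions <= 20:
        return 3  # 16-20 completions
    if n_completions <= 25:
        return 4  # 21-25 completions
    if n_completions <= 30:
        return 5  # 26-30 completions
    return 6      # more than 30 completions
```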

Questions for the forum
Across all exercises of the formative assessments, the frequency with which questions were posed by students on the feedback page was recorded. This variable was highly right-skewed, and therefore the values were logarithmized.

On time / regularity
As a measure of regularity, it was counted how often students finished the formative assessments during the recommended one-week submission window.

Subjective knowledge
At the beginning of each formative assessment, students rated their subjective understanding of the content covered in the respective exercise session on a five-point scale (1 = I don't know this concept, 2 = I don't understand this concept well, 3 = I understand this concept less well, 4 = I understand this concept well, 5 = I understand this concept very well). First, the average of these ratings was taken for each formative assessment, and then the first strong principal component (explaining 65% of the variance) was extracted from the averaged ratings across all five formative assessments (excluding the additional formative assessment) and used for further analyses.

Subjective investment
After each formative assessment, students rated on a four-point scale their effort level in attempting to complete the exercises to the best of their ability (1 = I didn't try hard, 2 = I tried a little, 3 = I tried a lot, 4 = I tried hard). The average of these ratings was calculated and used for further analyses.

Lectures
At the beginning of each formative assessment, students indicated whether they had listened to the podcasts of the two lectures covered in the formative assessment (1 = I listened to neither of the two podcasts, 2 = I listened to parts of both podcasts, 3 = I listened to at least one of the podcasts completely, 4 = Yes, I listened to both podcasts completely). The mean value of this variable, computed across the five formative assessments, was utilized for subsequent analyses.

Reading forum
At the beginning of the end-of-semester knowledge test, students indicated on a three-point scale whether they had read the forum posts before (0 = I never read the forum, 1 = I read the forum only when I had questions, 2 = I read all forum posts at least once).

Compulsory literature
At the outset of each formative assessment, students specified their engagement with the mandatory literature, which, in combination with lectures, served as the foundational preparation for the assessment: (1) indicated they read at least some part of the mandatory literature, while (0) denoted they did not engage with it.
The mean value of this variable, computed across the five formative assessments, was utilized for subsequent analyses.

Relevance of content
On a four-point scale (false, somewhat false, somewhat true, true) students responded to the following questions about the content of the course:
• I find "Psychological Diagnostics" interesting.
• I think my knowledge of "Psychological Diagnostics" will be useful to me in the future.
• I think it is important to learn "Psychological Diagnostics" in psychology education.
The average of the three items was used for further analyses.

Learning hours during semester holidays
The students reported the number of hours they dedicated to studying for the exam following the final lecture of the semester. As data were highly right-skewed, they were logarithmized.

Results
All analyses were conducted in R version 3.6.1 (R Core Team, 2021).

Descriptive statistics
Given the quasi-experimental design of the study, it was crucial to establish that there were no initial differences between the students from 2020 and 2021 in terms of "prior knowledge" and "subjective relevance of the content" at the beginning of the course. To compare the means for these measures, an equivalence analysis was conducted (Bentler & Satorra, 2010). For "prior knowledge," a two-factor solution (theory and calculations) was compared to a one-factor solution. The significant Chi-square difference (∆χ²(1) = 67.70, p < 0.001) indicated that the two-factor model (χ²(151) = 178.43, p = 0.063, CFI = 0.934, RMSEA = 0.022, SRMR = 0.047) provided a better fit to the data than the one-factor model (χ²(152) = 237.07, p < 0.001, CFI = 0.796, RMSEA = 0.039, SRMR = 0.055). Consequently, prior knowledge is more accurately represented by a two-factor solution. The two factors, theory and calculations, were correlated (r = 0.53, p < 0.01).
In Table 1, mean (standard deviation), skewness and kurtosis of the variables considered in the study are provided for the entire sample (N = 308), for the solvers (N = 154) and for the non-solvers (N = 154). The skewness of all variables was between −3 and 3 and the kurtosis between −10 and 10. According to Kline (2011), this indicates approximately normally distributed variables. Parametric methods were applied in this study as they are generally robust to scale assumption violations, especially when Likert scales have seven or more categories (Norman, 2010; Dolan, 1994; Robitzsch, 2020). The majority of our ordinal variables had seven or more categories due to aggregation. The sole exception, "reading forum" with three categories, showed negligible differences between Pearson and Spearman correlations (maximum difference: 0.0165; average difference: < 0.0016). Hence, parametric methods were used.
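The screening rule described above (skewness within ±3, kurtosis within ±10) can be sketched in Python. The estimators below are simple moment-based versions and are an assumption; the source does not state which skewness and kurtosis estimators were used, and statistical packages differ in their small-sample corrections. All function names are hypothetical.

```python
def _central_moment(xs, order):
    """Population central moment of the given order."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** order for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2^2 - 3."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3

def within_thresholds(xs, max_abs_skew=3.0, max_abs_kurt=10.0):
    """True if both indices fall inside the thresholds used in the text."""
    return abs(skewness(xs)) < max_abs_skew and abs(excess_kurtosis(xs)) < max_abs_kurt

# Hypothetical symmetric data
example_values = [1, 2, 3, 4, 5]
example_ok = within_thresholds(example_values)
```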

Solving additional formative assessments, self-regulated learning behavior and learning success
With a t-test I investigated whether the solvers performed better in the end-of-semester knowledge test than the non-solvers. As shown in Table 1, solvers reached a higher performance in the knowledge test than non-solvers (t(305.06) = −2.92, p < 0.01, d = 0.33), confirming the first hypothesis.
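The group comparison can be illustrated with a minimal Python sketch of a t statistic and Cohen's d. The fractional degrees of freedom reported above (305.06) suggest a Welch-type test with unequal variances, but this is an inference rather than something the source states. The function names and the example scores are hypothetical, and the p-value computation (which requires the t distribution) is omitted.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Hypothetical knowledge test scores (not the study data)
non_solvers = [1, 2, 3, 4, 5]
solvers = [2, 3, 4, 5, 6]
example_t, example_df = welch_t(non_solvers, solvers)
example_d = cohens_d(non_solvers, solvers)
```

With the non-solvers entered first, a negative t indicates higher scores among solvers, which matches the sign of the statistic reported in the text.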
Consistent with the hypothesis, the findings indicate that engagement with additional formative assessments significantly influences self-regulated learning behavior (see Table 1).Specifically, it was observed that those who solved these assessments demonstrated enhanced performance, invested more time in the completion of exercises, and posed fewer questions about those exercises.
Although the differences were not statistically significant, solvers tended to show a lower level of subjective understanding and less dedication to reading the mandatory literature than non-solvers.
When analyzing the "total completions", which is the total number of completed exercises from both the initial and the additional formative assessments (where multiple attempts were possible), solvers completed significantly more exercises.This was expected since they had access to both initial and additional assessments.
However, when considering the "initial completions" (which both groups could attempt multiple times), solvers completed fewer exercises than non-solvers. This suggests that while having access to additional assessments led to more completions overall, it resulted in fewer completions of the initial assessments that were available to everyone.

Students' characteristics, solving additional formative assessments and learning success
To identify meaningful clusters of self-regulated learning behavior, understanding the interrelations of the learning variables detailed in the Method section was crucial. An exploratory factor analysis was conducted to reduce the variables to a few interpretable factors. By decreasing the number of variables in the model, the cluster analysis can more effectively detect clusters within the dataset (Dalmaijer et al., 2021). The z-standardized variables were entered into the fa.parallel function from the psych package (Revelle, 2022), resulting in a three-factor solution that best described the correlations between the thirteen manifest variables. The factor solution, following an oblimin rotation, is presented in Table 2. The three factors are described below based on the measures exhibiting the highest loadings (Table 2). The first factor is associated with performance, as evidenced by substantial loadings of performance in formative assessments, subjective understanding, and prior knowledge. The second factor is connected to time investment, which includes time spent on exercise pages and feedback pages. This factor is related to time investment in content learning, a critical self-regulation skill identified by Kim et al. (2018), as well as to effort regulation (Baker et al., 2020) and organization (Mega et al., 2014).
The third factor pertains to additional investment, as demonstrated by engagement in reading the literature, posing questions, reading the forum, dedicating learning hours during the semester break and timely completion of exercises. Accordingly, additional investment is a combination of help seeking (Kim et al., 2018), time management (Kim et al., 2018; Li et al., 2018) and investment in content learning (Kim et al., 2018).
To examine whether the effect of additional formative assessments depends on students' self-regulated learning behavior, two approaches were employed. First, each of the three extracted factors was divided into four equal groups (quartiles) and the dependency of the effect of solving additional formative assessments on that split variable was investigated for each factor. Second, a cluster analysis was conducted across all three factors, and the dependency of solving additional formative assessments on cluster membership was examined.
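The quartile split used in the first approach can be sketched in Python as follows. The function name `quartile_groups` and the example scores are hypothetical, and the quantile definition (the "inclusive" method of `statistics.quantiles`) is an assumption, since the source does not specify how ties and cut points were handled.

```python
from statistics import quantiles

def quartile_groups(factor_scores):
    """Assign each factor score to quartile group 1 (lowest) to 4 (highest)."""
    q1, q2, q3 = quantiles(factor_scores, n=4, method="inclusive")
    # Each comparison adds 1 when the score exceeds the corresponding cut point
    return [1 + (x > q1) + (x > q2) + (x > q3) for x in factor_scores]

# Hypothetical factor scores
example_groups = quartile_groups([1, 2, 3, 4, 5, 6, 7, 8])
```

The resulting group label can then serve as one between-subject factor, alongside solver status, in a two-way analysis of variance.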
In the first approach, a two-way ANOVA was conducted for each factor group (quartiles), with factor group membership and solving additional formative assessments as the between-subject factors and performance in the end-of-semester knowledge test as the dependent variable. A significant interaction would indicate that the effect of additional formative assessments on performance in the end-of-semester knowledge test depends on student characteristics. The interaction term was not significant for the performance factor (F(3, 300) = 0.41, p = 0.75, η² = 0.003) or the additional investment factor (F(3, 300) = 0.23, p = 0.87, η² = 0.002); however, it was significant for the time investment factor (F(3, 300) = 4.14, p < 0.01, η² = 0.04).

NATALIE BORTER
Figure 1 displays the interaction between the time investment group and solving additional formative assessments. In the lowest time investment quartile, solving additional formative assessments was associated with slightly lower performance in the end-of-semester knowledge test, whereas in all other quartiles, it was associated with higher performance. The performance difference in the end-of-semester knowledge test between non-solvers and solvers was −0.96 (p = 0.13) for the first quartile, 0.63 (p = 0.32) for the second quartile, 1.99 (p < 0.01) for the third quartile, and 1.42 (p < 0.05) for the fourth quartile. However, when applying a Bonferroni-corrected alpha level of 0.0125, the difference in the fourth quartile was no longer statistically significant. Overall, solving additional formative assessments appeared to be more beneficial for students who invested more time in solving the exercises.
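The corrected alpha level mentioned above follows from dividing the family-wise level by the number of comparisons. A minimal Python sketch, with hypothetical function names, shows the arithmetic for the four quartile-wise comparisons; only the two exactly reported p-values are used as examples.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_tests

def is_significant(p_value, alpha, n_tests):
    """True if p_value falls below the Bonferroni-corrected level."""
    return p_value < bonferroni_alpha(alpha, n_tests)

# Four quartile-wise comparisons at a family-wise level of 0.05
corrected_alpha = bonferroni_alpha(0.05, 4)

# Exact p-values reported for the first and second quartile
q1_significant = is_significant(0.13, 0.05, 4)
q2_significant = is_significant(0.32, 0.05, 4)
```

A result reported only as p < 0.05, as for the fourth quartile, is not sufficient on its own to pass the corrected threshold of 0.0125.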

Figure 1 Interaction between the completion of additional formative assessments and students' quartile ranking in time investment, in relation to performance on the end-of-semester knowledge test
It is important to note that solvers and non-solvers were not equally distributed across the four time investment groups (χ²(3) = 10.44, p < 0.05). Fewer solvers (n = 29) than non-solvers (n = 48) were in the first quartile, and more solvers (n = 49) than non-solvers (n = 28) were in the second quartile. In the other two groups, solvers and non-solvers were similarly distributed (either n = 38 or n = 39).
In the second approach, which is based on all three factors (performance, time investment, additional investment), a k-means cluster analysis was conducted to identify distinct student types. Initially, the number of clusters was determined using the NbClust function (Charrad et al., 2014), followed by the execution of the k-means cluster analysis using the stats package (R Core Team, 2021). The NbClust function helps determine the number of clusters in a dataset by evaluating 22 distinct fit indicators. Among these fit indicators, eight suggested a two-cluster solution and six recommended a three-cluster solution. Higher numbers of clusters were proposed by fewer than three fit indicators each. Consequently, both the two- and three-cluster solutions were further examined. To circumvent local minima, 1,000 random starting positions were utilized.
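The idea of running k-means from many random starting positions and keeping the best solution can be sketched in pure Python. This is not the study's implementation: the analysis was run with R's stats package, whereas the sketch below uses a plain Lloyd-style iteration, which may differ from the algorithm R applies. All function names, the seed, and the example points are hypothetical.

```python
import random

def _sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def _assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda c: _sq_dist(p, centers[c]))
            for p in points]

def _kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (inertia, labels, centers)."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = _assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = _assign(points, centers)
    inertia = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return inertia, labels, centers

def kmeans(points, k, n_starts=1000, seed=0):
    """Best of n_starts runs, judged by within-cluster sum of squares."""
    rng = random.Random(seed)
    runs = (_kmeans_once(points, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[0])

# Hypothetical two-dimensional factor scores forming two obvious groups
example_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
example_inertia, example_labels, example_centers = kmeans(example_points, 2)
```

Because a single run can stop at a local minimum that depends on the starting centers, taking the minimum over many starts makes the reported solution less sensitive to initialization.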
For both the two- and three-cluster solutions, an investigation was conducted to determine if the positive effect of additional formative assessments depended on cluster membership, or in other words, whether a significant interaction existed between cluster membership and the positive effect of solving additional formative assessments on performance on the end-of-semester knowledge test. To this end, a two-way ANOVA was performed for both the two- and three-cluster solutions, with cluster membership and solving additional formative assessments as between-subject factors, and performance in the end-of-semester knowledge test as the dependent variable. The interaction was not significant for the two-cluster solution (F(1, 304) = 1.09, p = 0.29, η² = 0.003) but it was for the three-cluster solution (F(2, 302) = 3.13, p < 0.05, η² = 0.02, see Figure 2). Therefore, the three-cluster solution was further investigated. In addition to the significant interaction, there was a main effect of cluster membership (F(2, 302) = 20.72, p < 0.001, η² = 0.11) and a significant main effect of completing additional formative assessments (F(1, 302) = 9.77, p < 0.01, η² = 0.03).
In the three-cluster solution (see Table 3), one cluster (n = 66) exhibited low performance, low time investment, and relatively low additional investment. For subsequent analyses, this cluster will be denoted as the "low effort cluster." Another cluster (n = 120) was characterized by high performance, moderate time investment, and low additional investment. Accordingly, this cluster achieved high performance with comparatively low investment and is therefore referred to as the "efficient cluster." The last cluster (n = 122) exhibited above-average performance and considerable effort in both time investment and additional investment. This cluster will be referred to as the "high effort cluster" in subsequent analyses and discussions.

Figure 2 Interaction between the completion of additional formative assessments and the students' cluster membership in relation to their performance on the end-of-semester knowledge test

The performance difference in the end-of-semester knowledge test between non-solvers and solvers was −1.57 (p < 0.01) for the high effort cluster, −0.62 (p = 0.21) for the efficient cluster, and 0.41 (p = 0.53) for the low effort cluster. This pattern of results persists when alpha is adjusted for multiple testing. Accordingly, solving additional formative assessments appeared to be most beneficial for high effort students.
Again, solvers and non-solvers were not equally distributed across the three clusters (χ²(2) = 9.26, p < 0.01). A smaller proportion of solvers (n = 52) relative to non-solvers (n = 70) was observed in the high effort cluster, while a greater proportion of solvers (n = 73) compared to non-solvers (n = 47) was present in the efficient cluster. In contrast, the low effort cluster exhibited a more evenly distributed composition of solvers (n = 29) and non-solvers (n = 37).
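As a worked check, the reported test statistic can be recomputed from the cell counts given above. The following Python sketch is illustrative only and is not the original analysis code; the function and variable names are hypothetical. It computes Pearson's chi-square statistic for the 3 × 2 table of cluster by solver status, and it uses the closed-form survival function exp(−x/2), which holds only for two degrees of freedom.

```python
import math

# Cell counts reported in the text: (non-solvers, solvers) per cluster.
observed = {
    "high effort": (70, 52),
    "efficient": (47, 73),
    "low effort": (37, 29),
}


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            # Expected count under independence of cluster and solver status.
            expected = row_total * col_total / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = chi_square_independence(observed)
# For df = 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-statistic / 2) if df == 2 else None
print(f"chi2({df}) = {statistic:.2f}, p = {p_value:.4f}")
# prints: chi2(2) = 9.26, p = 0.0098
```

The result agrees with the reported χ²(2) = 9.26, p < 0.01.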
Taken together, the effect of additional formative assessments depended on students' characteristics in both approaches. Both higher time investment alone and belonging to the high effort cluster resulted in a larger positive effect of additional formative assessment on the end-of-semester knowledge test. As shown in Table 4, the low effort cluster consisted mostly of students of the low time investment group (Q1), the efficient cluster consisted mostly of students with medium time investment (Q2, Q3) and the high effort cluster of high time investment students (Q3, Q4).
Note (Table 4). Cells marked light gray contained at least thirty students. In parentheses (the number of non-solvers, the number of solvers).

Discussion
This study aimed to investigate the impact of additional formative assessments on students' self-regulated learning behavior and learning success, while also considering the varying impacts on different student groups. The completion of additional formative assessments covering identical content led to improved performance on the end-of-semester knowledge test. Moreover, these assessments had a differential impact on self-regulated learning behaviors across various variables. Notably, solvers exhibited enhanced performance in the formative assessments, yet reported lower levels of subjective comprehension (albeit not significantly so). They dedicated more time to completing exercises within the assessments, asked fewer questions about the exercises, and tended to engage less with the compulsory literature (albeit not significantly so). Furthermore, the influence of additional formative assessments on learning success depended on students' self-regulated learning behaviors. Both increased time investment individually and membership in the high-effort cluster contributed to a more substantial positive effect of additional formative assessments on the end-of-semester knowledge test outcomes.

Influence of additional formative assessments on self-regulated learning behavior and learning success
The positive effect of additional formative assessments on learning success is consistent with the findings of Yang et al. (2021), who conducted a meta-analytic overview. The current study extends the existing literature by demonstrating this beneficial effect in an applied setting, with complex exercises and even when the learning phase and assessment phase exercises were not identical but covered the same content.
This study found that solvers exhibited differences from non-solvers in certain aspects of self-regulated learning behavior. This was based on the initial assessments that both groups completed. Given that each formative assessment introduced new material, solvers' enhanced performance can only be attributed to an indirect testing effect, since apart from the additional formative assessments, all other conditions were identical for both groups. The indirect testing effect occurs when testing not only enhances performance on the tested material but also on new, related material (Fernandez & Jamet, 2017; Szpunar et al., 2008; Wissman et al., 2011). Consequently, the additional formative assessments impacted the solvers' self-regulated learning behavior with this new content.
In this context, the differential impact of additional formative assessments on self-regulated learning behavior offers interesting insights. Even though solvers performed better than non-solvers in formative assessments, they reported lower estimates of their understanding (albeit not significantly so) compared to non-solvers. This pattern of results indicates that solvers were less prone to the "illusion of knowing" (Avhustiuk et al., 2018), a phenomenon in which learners overestimate their understanding relative to their actual performance.
This pattern of results, combined with the solvers' higher time investment in solving formative assessments, indicates that the provision of additional formative assessments promotes better monitoring of one's knowledge, which is consistent with the observation made by Fernandez and Jamet (2017), and more time allocation for studying, which is in line with Soderstrom and Bjork (2014). This can be attributed to the fact that the provision of additional formative assessments enables students to receive supplementary feedback on their learning status (Clariana & Park, 2021; Perry & Winne, 2006). This feedback assisted them in monitoring which behaviors in the initial assessments were most beneficial for their learning success in the additional formative assessments, prompting them to adjust their strategies and behaviors accordingly.
However, it is crucial to acknowledge the potential less beneficial effects of the additional learning material. The increased cognitive demands associated with the additional formative assessments, in terms of both the material's complexity and the volume of information, could lead to cognitive overload (Kossen & Ooi, 2021), and the extra exercises probably reduced the time available for students to fully engage with the material, causing them to adopt less elaborate learning strategies (e.g., less additional investment).

Student characteristics and the benefit of additional formative assessments
The impact of additional formative assessments on learning success depended on students' self-regulated learning behavior. It was primarily the students who invested above-average time during formative assessments that benefited from the additional exercises. Cluster analysis revealed that high-effort students (those with above-average time investment and above-average preparation/post-processing) gained the most from the extra exercises.
This outcome aligns with previous research by Greving et al. (2020), which demonstrated that the beneficial effect of solving exercises was most pronounced when retrieving information from memory was difficult but successful. In the high effort cluster, the retrieval of information from memory was generally successful, as overall performance in the investigated formative assessments was high. Furthermore, the retrieval of information from memory was difficult, as indicated by the above-average time investment (Dodonov & Dodonova, 2012; Dunst et al., 2014; Goldhammer, 2015) and the above-average additional effort (e.g., asking numerous questions in the forum).
The observed results align with the retrieval elaboration hypothesis (Carpenter et al., 2009). The high effort cluster demonstrated high time investment, additional investment, and above-average performance in formative assessments. Increased investment is typically linked with enhanced elaboration (Goldhammer et al., 2021). Likely due to their substantial investment, further elaboration or learning occurred during the initial formative assessments. The retrieval of this newly learned or elaborated content through additional formative assessments led to a more pronounced testing effect.
The high effort of this cluster may be correlated with high expectations of success, which is associated with a stronger positive impact of testing (Heitmann et al., 2022). Additionally, their regular learning behavior might also contribute to a more pronounced testing effect (Adesope et al., 2017; Karpicke & Bauernschmidt, 2011).
On the other hand, students in the low effort and efficient clusters did not show significant positive effects from additional formative assessments on their learning success. Low performers probably do not utilize the extra assessments effectively, while efficient learners do not require them, having already comprehended the material (Bjork et al., 2013).
For the low-effort cluster, this lack of effect might be attributed to the difficulty of the assessments, low motivation, or low elaboration of learning content (Carpenter et al., 2009; Heitmann et al., 2022; Minear et al., 2018). Exercises were probably too difficult for those students and retrieval of information was often unsuccessful, as indicated by the low performance in the formative assessments (Minear et al., 2018). According to the Yerkes-Dodson law (Yerkes & Dodson, 1908), when exercises become too difficult, motivation, response latencies and performance decrease (Borter et al., 2016; Dunst et al., 2014; Goldhammer, 2015). Their lack of prior knowledge may have posed challenges in integrating and elaborating on new but related content (Cogliano et al., 2019; Francis et al., 2020). In addition, especially for this group, the increased cognitive demands associated with the additional formative assessments, in terms of both the material's complexity and the volume of information, might have led to cognitive overload (Kossen & Ooi, 2021) or to hasty and unelaborated learning behavior due to the higher investment requirements imposed by the additional formative assessments.
In the efficient cluster, the absence of a significant positive effect could be due either to high ability and abstraction or to the assessments being too easy for these students (Goldhammer, 2015), so that no elaboration was needed. Even though retrieval from memory was quite successful in this cluster, as indicated by the high performance in the formative assessments, it was not difficult (average time investment and very low additional investment, e.g., asking questions). The exercises were probably not difficult enough for those students, and after the first formative assessments no additional exercises were needed, as the students had already grasped the content. Besides the possibility that formative assessments were too easy for students in this cluster, the high performance associated with rather low investment might be a sign of high ability or abstraction (Goldhammer, 2015). In this case, additional exercises are probably not necessary, as students understand the content on an abstract level and do not need different exercises from different contexts covering the same content. If low exercise difficulty is the reason for the missing effect of testing in this cluster, more difficult exercises would lead to a testing effect, whereas if high abstraction is the reason, more difficult exercises would probably not lead to a stronger testing effect. To differentiate between the two possibilities, further research is needed.
In addition, it has been shown that students with poorer learning strategies show a larger testing effect than students with good strategies (Minear et al., 2018; Robey, 2019). The efficient cluster might have particularly good learning strategies, as indicated by the high performance reached with rather low investment.

Solvers and non-solvers not equally distributed across time investment groups or clusters
The impact of solving additional formative assessments on self-regulated learning behavior led to an uneven distribution of students across time investment groups or clusters. Fewer solvers than non-solvers were found in the very low time investment group (Q1), while more solvers than non-solvers were present in the second time investment group (Q2). Furthermore, solvers more frequently belonged to the efficient cluster and less frequently to the high effort cluster.
On one hand, the additional formative assessments might have resulted in high effort students sacrificing additional investment (e.g., asking questions, reading literature) to invest more time in solving formative assessments (indirect testing effect, better monitoring, prioritizing different learning materials). Due to the positive effect of additional formative assessments, this resulted in higher performance. Higher performance in combination with lower additional investment is the behavioral pattern associated with the efficient cluster and led to a shift from the high effort to the efficient cluster (e.g., in Table 4, more solvers in the efficient cluster and higher time investment groups).
On the other hand, solving additional formative assessments prompted low investment students to invest more time in solving exercises and to achieve higher performance in the formative assessments (indirect testing effect). This combination of medium time investment, higher performance, and low additional investment is associated with the efficient cluster (e.g., in Table 4, there are more solvers in high time investment groups of the efficient cluster but fewer in the low effort low time investment group).
In conclusion, due to an indirect testing effect, solvers demonstrated improved monitoring associated with more efficient learning, and as a result, many solvers were part of the efficient cluster, which is linked to high performance on the end-of-semester knowledge test. Additionally, the availability of numerous formative assessments for solvers may have forced them to make decisions on where to allocate their time (Yang et al., 2017). As they spent more time on the exercises and solved a greater number of them, they reduced other activities (additional investment, fewer repetitions of the first formative assessments, but more repetitions when including additional formative assessments).

Practical relevance of the findings
As a lot of time is invested in solving additional formative assessments and not all students profit from them, it seems unethical to suggest additional assessments to all students. In the future, approaches from adaptive learning analytics (Mavroudi et al., 2018) should be implemented into the course. As indicated by the results of this study, additional formative assessments should be suggested to students with above-average time investment, as they probably improve these students' learning success. For students with below-average time investment, it is important to know whether below-average time investment is associated with low or high performance in the formative assessments. If it is associated with high performance, there is no need to suggest the additional formative assessments, as they probably would not lead to greater learning success. However, more difficult exercises might lead to even greater learning success in this cluster, but future research is needed to test those predictions. When low time investment is linked to low performance in formative assessments, interventions to increase content understanding and elaboration, improve learning strategies, enhance monitoring, or adjust time allocation should be suggested. Only after successfully making these improvements should additional assessments be recommended.
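The recommendation logic described above could be sketched as a simple rule. The following Python fragment is a hypothetical illustration, not a tested intervention: the function name, the use of standardized scores, and the cut-off at the sample mean (z = 0) are all assumptions made for the example.

```python
def suggest_next_step(time_investment_z: float, performance_z: float) -> str:
    """Map standardized time investment and formative-assessment performance
    to a suggestion. The cut-off at the sample mean (z = 0) is an assumption."""
    if time_investment_z > 0:
        # Above-average time investment: extra assessments probably help.
        return "suggest additional formative assessments"
    if performance_z > 0:
        # Below-average time investment but high performance: no need for more.
        return "no additional formative assessments needed"
    # Below-average time investment and low performance: support comes first.
    return "suggest learning-strategy support before additional assessments"
```

For example, `suggest_next_step(0.8, 0.3)` returns the first suggestion, whereas `suggest_next_step(-0.5, -0.7)` returns the last one.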
When deciding whether to create additional formative assessments for a course, it is essential to consider that although many students benefited from the extra assessments and nearly all students solved them when available, the effect sizes were relatively small, and providing additional formative assessments influenced students' behavior in both beneficial and less beneficial ways. The present study highlights the importance of considering individual differences in students' self-regulated learning behavior when implementing additional formative assessments.

Measurement considerations
To investigate learning as comprehensively as possible, a variety of variables were measured, some of which were highly related. Therefore, variables of the same type (e.g., response latencies for exercises) were reduced to a single score. Observations of the same type can be interpreted as a sampling of observations, and combining them leads to a more reliable measure (Goldhammer et al., 2021). For example, when combining 100 response latencies, the influence of measurement error (e.g., taking a coffee break while solving an exercise, leading to longer response latency) is reduced. Moreover, high correlations between similar measures, as indicated by a strong first principal component, suggest that the different variables measured the same construct. The summarized measures of the same type were combined in a factor analysis. First, this resulted in well-interpretable factors (performance, time investment, additional investment), and second, fewer but more reliable measures lead to a better performance in cluster analysis (Dalmaijer et al., 2021). Based on these three factors, three clusters were built. The clusters found were similar to those of previous studies, which identified clusters based on effort and/or processing depth (Jovanović et al., 2017; Kovanovic et al., 2015; Li et al., 2020; Ning & Downing, 2015; Parpala et al., 2021; Sun & Xie, 2020; van Alten et al., 2021; Vanslambrouck et al., 2019; Zheng et al., 2020), on regularity of learning (Kim et al., 2018; Parpala, 2021), on prior knowledge (Khayi & Rus, 2019), on the pace of learning (Munje et al., 2020), and on performance and learning behavior (Waspada et al., 2019). Accordingly, the three clusters of this study fit well into previous research.

Future work
Future research could investigate how cluster membership and learning behavior evolve throughout the semester and whether adaptive hints or instructions can help students find the learning behavior or strategy that maximizes their learning success. The consistency of these clusters across various courses needs to be investigated. Furthermore, the psychological traits associated with cluster membership should be understood. It has been suggested by a recent study (Heitmann et al., 2022) that quizzing might not be beneficial for learners exhibiting a low hope of success, an attribute that might be prevalent in some of the clusters identified.
Additionally, the behavior data of the extra formative assessments should be examined, and exercise difficulty should be considered. Future research could benefit from a deeper exploration of the potential impact of assessment length on learner engagement, to discern if longer formative assessments might introduce variability in self-regulated learning. Furthermore, integrating various theories of self-regulation into our understanding of self-regulated learning behavior warrants further investigation. In addition, determining whether the positive effect of additional formative assessments can be attributed to an indirect testing effect, a direct testing effect, or a combination of both would be of significant interest in future research.

Limitations
The study's limitations primarily stem from its quasi-experimental approach in a real-world setting. Consequently, it is challenging to determine the generalizability of the findings to other courses. Furthermore, not all students in the course participated or met the inclusion criteria, which may have affected the results. Additionally, principal component analysis, exploratory factor analysis, and cluster analysis are exploratory instruments bearing the risk of false discoveries (Moosbrugger & Kelava, 2012). As a result, it is necessary to confirm or disprove these exploratory and course-specific findings in future research.

Conclusion
In conclusion, additional formative assessments led to an overall better performance in the end-of-semester knowledge test. However, this effect depended on students' characteristics. Above-average time investment was associated with a more beneficial effect of solving additional formative assessments. As indicated by the results of the cluster analysis, solvers characterized by above-average time investment and additional investment (high effort cluster) benefited from additional formative assessments, while below-average time investment was associated either with low investment/understanding (low effort cluster) or high understanding with relatively low investment (efficient cluster). In both these clusters, no positive effect of additional formative assessments was identified. Furthermore, engaging in additional formative assessments led to changes in self-regulated learning behavior, both positive and negative, resulting in a higher proportion of solvers in the efficient cluster, which is associated with high performance on the end-of-semester knowledge test. Taken together, solving additional formative assessments is beneficial for some but not all students and is associated with both beneficial and less beneficial changes in self-regulated learning behavior.

Table 1
Descriptive statistics for entire sample, non-solvers and solvers as well as correlations with end-of-semester knowledge test (r).
were not significant (p > 0.10). For subjective understanding and the two time investment measures, scores on the first principal component are reported.

Table 2
Standardized loadings of the measures on the three factors extracted by exploratory factor analysis with oblimin rotation

Table 3
Characterization of the three clusters identified as well as size of the entire sample (N), the sample of solvers, and the sample of non-solvers
Note. N = sample size of the entire sample and in parentheses sample size of non-solvers and solvers.

Table 4
Number of students in the three clusters depending on solving additional formative assessments and time investment group