Machine learning-based competence profiles in physics didactics knowledge

The professional knowledge of teachers is considered a prerequisite for good teaching within the framework of common impact models of educational processes (e.g. Terhart 2012). Classic models of professional knowledge (Shulman 1986; Baumert & Kunter 2006; adapted for physics by Riese 2009) typically comprise three central domains: subject knowledge (FW), general pedagogical knowledge (PW) and subject didactic knowledge (FDW). FDW is roughly described as the knowledge required to prepare subject-matter content in a way that is appropriate for the target group.

In Germany, FDW is usually modelled in three dimensions (Tepner et al. 2012; Kröger 2019; Gramzow 2015). Studies in the context of quantitative scores show significant increases in FDW during studies and preparatory service (e.g. Riese & Reinhold 2012; Kirschner 2013; Kröger 2019) as well as correlations between the domains of professional knowledge (e.g. Sorge et al. 2019; Riese 2009) and performance in prototypical requirement situations (e.g. Schröder et al. 2020; Kulgemeyer et al. 2020).

Larger empirical studies on FDW in German-speaking countries have primarily focused on global quantitative statements at the construct level. However, in order to enable rich feedback and to clarify performance differences in terms of content, empirically grounded, criterion-oriented descriptions of FDW characteristics are also necessary. Initial analyses in the field of FDW use the scale-anchoring procedure (e.g. Mullis et al. 2015) to describe proficiency levels with the help of item response theory (IRT) models (Schiering et al. 2023; Zeller et al. 2022).

Objective of the project

Three steps are necessary to enable rich feedback on the FDW and make it usable for teaching practice:

- It must be possible to validly categorise the characteristics of learners' FDW in a model - one that is as empirically based as possible.

- The "probable" development and change of the FDW (e.g. as a reaction to certain courses) should be known.

- Suitable reactions must be developed and then selected.

This project primarily addresses the first and second desideratum and comprises the following work packages:

A cross-project analysis of the FDW was carried out (as of May 2023) using IRT models, building on the work of and in cooperation with the physics group of the KiL/KeiLa projects (IPN Kiel; Schiering et al. 2019, 2023). Cluster analyses based on machine learning (ML) methods are then carried out to uncover rich non-hierarchical content structures. To make the results usable in practice, the evaluation of a validated test instrument for the FDW (Gramzow 2015) will be automated. The second work package is supplemented by change analyses, e.g. based on the information on study progress already contained in the data set. The data set from the ProfiLe-P+ project (Vogelsang et al. 2019) is used.

Method and initial results

In the analyses of the first work package, the application of the scale-anchoring method revealed systematic similarities between the resulting level descriptions in the form of operators that can be interpreted in terms of learning psychology. Across projects, FDW at low levels is primarily limited to reproductive aspects, while creative and evaluative elements are added at higher levels. These results are consistent with cognitive-psychological findings on the knowledge acquisition process (e.g. Gagné & White 1978) and corresponding taxonomies (e.g. Anderson & Krathwohl 2001). However, the hierarchical approach of the scale-anchoring method does not allow a distinction between more creative and more evaluative characteristics.
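The scale-anchoring logic can be illustrated with the Rasch model, the simplest IRT model: an item "anchors" a level if a person at the corresponding ability masters it with sufficiently high probability. The following minimal Python sketch is for illustration only; the item names, difficulties and the 0.65 threshold are hypothetical, not taken from the project's data.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model
    for ability theta and item difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def anchor_items(item_difficulties: dict, theta: float, threshold: float = 0.65):
    """Items that a person at ability `theta` masters with probability
    >= `threshold` -- the core idea of scale anchoring."""
    return [item for item, b in item_difficulties.items()
            if rasch_p(theta, b) >= threshold]

# Hypothetical items ordered from reproductive to evaluative demands
items = {"reproduce_definition": -1.0,
         "design_teaching_sequence": 0.5,
         "evaluate_explanation": 1.5}

print(anchor_items(items, theta=0.0))  # → ['reproduce_definition']
```

At low ability only the reproductive item is anchored, matching the hierarchical level descriptions reported above; the model itself, however, cannot separate two equally difficult items of different cognitive type, which motivates the cluster analyses.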

Based on these results, cluster analyses of the test data are carried out; such analyses are rarely used in science didactics (Zhai et al. 2020a, 2020b) but are suitable for clarifying possibly existing non-hierarchical structures. In the sense of a computational grounded theory (Nelson 2020), computer-based analyses are linked with human expert knowledge (initially in the form of task analyses) in order to describe (not necessarily hierarchical) "competence profiles" of the FDW. Initial results on possible competence profiles were presented as a poster at the GDCP Annual Conference 2022 (Zeller & Riese in press). In a further step, these cluster analyses of the score data are to be underpinned by analyses of the test subjects' authentic language productions. Topic models (Blei et al. 2003; Blei 2012), structural topic models (Roberts et al. 2019) or neural topic models (Grootendorst 2022) will be used for this purpose. Such approaches make it possible to describe the typical language use of respondents with certain competence profiles more precisely and thus to further sharpen the description of the profiles.
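The idea of non-hierarchical competence profiles can be sketched with a plain k-means clustering of per-person sub-scores. Everything below is hypothetical: the score vectors, the three sub-dimensions and the fixed initial centroids are illustrative assumptions, not the project's actual data or method; a real analysis would use an established library and proper initialisation.

```python
def kmeans(points, centroids, iters=10):
    """Plain k-means: assign each point to its nearest centroid
    (squared Euclidean distance), then recompute centroids as
    cluster means; repeat for a fixed number of iterations."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical per-person sub-scores: (reproductive, creative, evaluative)
scores = [(0.9, 0.2, 0.1), (0.8, 0.3, 0.2),   # mainly reproductive profile
          (0.7, 0.9, 0.3), (0.6, 0.8, 0.2),   # additionally creative
          (0.7, 0.3, 0.9), (0.8, 0.2, 0.8)]   # additionally evaluative
cents, clus = kmeans(scores, centroids=[scores[0], scores[2], scores[4]])
print([len(c) for c in clus])  # → [2, 2, 2]
```

Unlike a one-dimensional level model, the resulting profiles can separate the "creative" and "evaluative" groups even though both lie above the reproductive baseline.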

Automating the evaluation of the validated test instrument is a non-trivial task, as the instrument largely consists of tasks with an open response format. Approaches to automated scoring at the task level already exist in educational research (e.g. Andersen & Zehner 2021), but automated assignment to the competence profiles from the second work package can also be considered. The latter approach in particular can be supported by pre-trained neural networks (e.g. "BERT", Devlin et al. 2019; applied in science didactics e.g. by Wulff et al. 2022) in the sense of transfer learning, owing to the larger amount of text available per test person. A typical benchmark for the quality of the developed system is human-machine agreement, e.g. in the form of Cohen's κ (e.g. Zhai et al. 2021).
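Cohen's κ, the human-machine agreement benchmark mentioned above, corrects the raw agreement rate for the agreement expected by chance. A minimal self-contained Python sketch, using made-up rating vectors rather than project data:

```python
from collections import Counter

def cohens_kappa(human, machine):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), i.e. observed agreement
    corrected for the agreement expected under independent raters."""
    assert len(human) == len(machine)
    n = len(human)
    p_o = sum(h == m for h, m in zip(human, machine)) / n   # observed agreement
    h_counts, m_counts = Counter(human), Counter(machine)
    labels = set(human) | set(machine)
    p_e = sum(h_counts[l] * m_counts[l] for l in labels) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical scores (0/1/2) from a human rater and an automated system
human   = [1, 1, 0, 2, 1, 0, 2, 1]
machine = [1, 1, 0, 2, 0, 0, 2, 2]
print(round(cohens_kappa(human, machine), 3))  # → 0.636
```

Here the raw agreement is 6/8 = 0.75, but κ ≈ 0.64 because some agreement would occur by chance; thresholds such as κ ≥ 0.6 ("substantial agreement") are a common acceptance criterion for automated scoring systems.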

Conclusion and outlook

The analyses described in the first work package suggest that the subsequent approaches are promising candidates for describing non-hierarchical structures of the FDW. However, since cluster analyses have rarely been used in science didactics so far, the development and selection of suitable methods remains a challenge.


Andersen, N., & Zehner, F. (2021). shinyReCoR: A Shiny Application for Automatically Coding Text Responses Using R. Psych, 3(3), 422-446.

Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives (4th ed.). New York: Longman.

Baumert, J., & Kunter, M. (2006). Keyword: Professional competence of teachers. Journal of Education Studies, 9(4), 469-520.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.

Blei, D. M. (2012). Probabilistic Topic Models. Communications of the ACM, 55(4), 77-84.

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805

Gagné, R. M., & White, R. T. (1978). Memory Structures and Learning Outcomes. Review of Educational Research, 48(2), 187-222.

Gramzow, Y. (2015). Didactic knowledge of student teachers in the subject area of physics: Modelling and test construction. In H. Niedderer, H. Fischler & E. Sumfleth (Eds.), Studies in Physics and Chemistry Learning (Vol. 181). Berlin: Logos Verlag.

Grootendorst, M. (2022). BERTopic: Neural topic modelling with a class-based TF-IDF procedure. arXiv:2203.05794

Kirschner, S. (2013). Modelling and analysing the professional knowledge of physics teachers. In H. Niedderer, H. Fischler & E. Sumfleth (Eds.), Studies in Physics and Chemistry Learning (Vol. 161). Berlin: Logos Verlag.

Kröger, J. (2019). Structure and development of the professional knowledge of prospective physics teachers [Diss., Christian-Albrechts Universität Kiel].

Kulgemeyer, C., Borowski, A., Buschhüter, D., Enkrott, P., Kempin, M., Reinhold, P., Riese, J., Schecker, H., Schröder, J., & Vogelsang, C. (2020). Professional knowledge affects action-related skills: The development of preservice physics teachers' explaining skills during a field experience. Journal of Research in Science Teaching, 52(10), 1554-1582.

Mullis, I. V. S., Cotter, K. E., Centurino, V. A. S., Fishbein, B. G., & Liu, J. (2015). Using scale anchoring to interpret the TIMSS 2015 achievement scales. In I. V. S. Mullis & M. Hooper (Eds.), Methods and Procedures in TIMSS (pp. 14.1-14.47).

Nelson, L. K. (2020). Computational grounded theory: A methodological framework. Sociological Methods & Research, 49(1), 3-42.

Riese, J. (2009). Professional knowledge and professional competence of (prospective) physics teachers. In H. Niedderer, H. Fischler & E. Sumfleth (Eds.), Studien zum Physik- und Chemielernen (Vol. 97). Berlin: Logos Verlag.

Riese, J., & Reinhold, P. (2012). The professional competence of prospective physics teachers in different forms of education. Journal of Education Studies, 15, 111-143. doi.org/10.1007/s11618-012-0259-y

Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R Package for Structural Topic Models. Journal of Statistical Software, 91(2), 1-40.

Schiering, D., Sorge, S., Petersen, S., & Neumann, K. (2019). Constructing a qualitative level model in the subject didactic knowledge of prospective physics teachers. Journal for Didactics of Natural Sciences, 25, 211-229.

Schiering, D., Sorge, S., Keller, M. M., & Neumann, K. (2023). A proficiency model for pre-service physics teachers' pedagogical content knowledge (PCK)-What constitutes high-level PCK? Journal of Research in Science Teaching, 60(1), 136-163.

Schröder, J., Riese, J., Vogelsang, C., Borowski, A., Buschhüter, D., Enkrott, P., Kempin, M., Kulgemeyer, C., Reinhold, P., & Schecker, H. (2020). Measuring the ability to plan lessons in the subject area of physics using a standardised performance test. Journal of Science Education, 26(1), 103-122.

Shulman, L. S. (1986). Those Who Understand: Knowledge Growth in Teaching. Educational Researcher, 15(2), 4-14.

Sorge, S., Kröger, J., Petersen, S., & Neumann, K. (2019). Structure and development of pre-service physics teachers' professional knowledge. International Journal of Science Education, 41(7), 862-889.

Tepner, O., Borowski, A., Dollny, S., Fischer, H. E., Jüttner, M., Kirschner, S., Leutner, D., Neuhaus, B. J., Sandmann, A., Sumfleth, E., Thillmann, H., & Wirth, J. (2012). Model for the development of test items to assess the professional knowledge of science teachers. Journal for Didactics of Natural Sciences, 18, 7-28.

Terhart, E. (2012). How does teacher training work? Research problems and design issues. Journal of Educational Research, 2(1), 3-21.

Vogelsang, C., Borowski, A., Buschhüter, D., Enkrott, P., Kempin, M., Kulgemeyer, C., Reinhold, P., Riese, J., Schecker, H., & Schröder, J. (2019). Development of professional knowledge and teaching performance in physics teacher training: Analyses of valid test score interpretation. Journal of Education, 65(4), 473-491.

Wulff, P., Mientus, L., Nowak, A., & Borowski, A. (2022). Utilising a Pretrained Language Model (BERT) to Classify Preservice Physics Teachers' Written Reflections. International Journal of Artificial Intelligence in Education.

Zeller, J., Jordans, M., & Riese, J. (2022). Approaches to determining competence levels in specialised didactic knowledge. In S. Habig (Ed.), Uncertainty as an element of science-related educational processes, Proceedings of the GDCP Annual Conference 2021. Essen: University of Duisburg-Essen.

Zeller, J., & Riese, J. (in press). Data-based ability profiles in physics didactic knowledge. In H. van Vorst (Ed.), Lernen, Lehren und Forschen in einer digital geprägten Welt, Proceedings of the GDCP Annual Conference 2022. Gesellschaft für Didaktik der Chemie und Physik.

Zhai, X., Haudek, K. C., Shi, L., Nehm, R. H., & Urban-Lurain, M. (2020a). From substitution to redefinition: A framework of machine learning-based science assessment. Journal of Research in Science Teaching, 57, 1430-1459.

Zhai, X., Shi, L., & Nehm, R. (2021). A Meta-Analysis of Machine Learning-Based Science Assessments: Factors Impacting Machine-Human Score Agreements. Journal of Science Education and Technology, 30, 361-379.

Zhai, X., Yin, Y., Pellegrino, J. W., Haudek, K. C., & Shi, L. (2020b). Applying machine learning in science assessment: a systematic review. Studies in Science Education, 56(1), 111-151.


Jannis Zeller

Didaktik der Physik

Paderborn University