
Development and validation of an instrument for measuring civic scientific literacy


Civic scientific literacy (CSL) is a fundamental indicator of social development and national literacy and is very important for personal well-being and national competitiveness. To keep pace with the technological advancements and demands of the current era, we focused on “science engagement”, which is now considered an important approach to enhance the CSL of the general public, and developed a four-dimensional CSL assessment framework based on the crucial elements for citizens to engage in science. The four dimensions are scientific knowledge (SK), scientific method (SM), problem-solving (PS), and scientific thought and spirit of science (STSS). With this framework, a CSL measurement instrument comprising 50 items was developed. Various validity evidence, including reliability close to or exceeding 0.8 for all four dimensions, supports the use of the instrument to evaluate CSL. Age, location type (urban or rural), educational background, and occupation are all significant factors associated with CSL.


Scientific literacy (SL) is a constantly evolving concept in both theory and practice (National Academies of Sciences, Engineering, and Medicine, 2016). Since its proposal in 1958, SL has been constantly redefined (Council of Canadian Academies, 2014; DeBoer, 2000; Miller, 1983; National Academies of Sciences, Engineering, and Medicine, 2016; Norris & Phillips, 2003; OECD, 2017; Rutherford & Ahlgren, 1991; Shen, 1975). Although SL has received widespread attention and recognition as an important goal in education (especially in science education) (Council of Canadian Academies, 2014; National Research Council, 1996; OECD, 2017; Rutherford & Ahlgren, 1991), there is currently no consensus on a definition of SL (DeBoer, 2000).

To assist in conceptualizing the variety of SL definitions, Roberts proposed two competing visions of SL. Vision I points to the inner field of science from the perspective of scientists, emphasizing the disciplinary nature of science itself, including its products, processes, and characteristics. Vision II places more emphasis on the role of science in human affairs, including scientific thinking and activities (Roberts, 2010). The two visions represent two types of cultivation objectives. From the perspective of personal and social development needs, what is needed is more people with SL rather than an effort to turn everyone into scientists. There is an undeniably large gap between the general public and scientists (X. Liu, 2009). Liu believes that bridging the gap between the scientific community and the general public requires every citizen to participate in issues related to science (Liu, 2013). He suggests incorporating “science engagement” (SE) into the conceptual framework of SL, viewing it as vision III, which transcends vision I and vision II. SE emphasizes the introduction of social, cultural, political, and environmental issues, aiming to cultivate the critical thinking, scientific communication, and consensus-building abilities of every citizen (Liu, 2013). The three visions are interdependent and mutually reinforcing, but vision III emphasizes active participation and seeking solutions to the urgent problems facing the world today (Liu, 2013). The three visions provide references for clarifying different SL definitions and for evaluating SL with different goal orientations.

Because SL is conceptualized in different ways, its measurement elements also differ (Fives et al., 2014; Gormally et al., 2012; Miller, 1998; Naganuma, 2017; OECD, 2017). Among these, Miller’s civic scientific literacy (CSL) assessment for adults has attracted wide attention and is used worldwide. Based on the three kinds of science literacy (practical, civic, and cultural) proposed by Shen (Shen, 1975), from a practical perspective, Miller focuses on CSL and believes that CSL is a level of knowledge that can support individuals in reading current scientific news and reports from different formal channels or in watching scientific television programs, rather than a scientific understanding of the field of science or engineering (Miller, 1983). He proposes three dimensions of CSL: scientific terminology and perspectives, scientific principles and methods, and the impact of science and society (Miller, 1983). The CSL assessment tool developed on this basis (Miller, 1998) is widely used in over 40 countries and organizations worldwide. SL assessment for school-age adolescents mainly involves two major international projects: Trends in International Mathematics and Science Study (TIMSS) and the Program for International Student Assessment (PISA). TIMSS does not specify a general concept of SL but emphasizes that students should be able to act on a solid scientific foundation when facing various problems, to meet the requirements of the technological society for people’s abilities and higher levels of learning (Jones et al., 2015). TIMSS is guided by the school science curriculum and evaluates science content, science significance, and science practices (Jones et al., 2015). Its items are all textbook questions without context. PISA has clarified and continuously iterated the definition and assessment framework of SL (OECD, 2017, 2019, 2023). According to PISA2015, SL refers to the reflective citizen’s capacity to engage in discussions about scientific issues and offer scientific insights, which is further interpreted as three competencies: explaining phenomena scientifically, evaluating and designing scientific inquiry, and interpreting data and evidence scientifically (OECD, 2017). PISA is oriented towards social needs and develops test items based on specific contexts. So far, PISA has drawn participation from an increasing number of nations and areas.

From the perspective of three visions, Miller’s CSL assessment, which mainly examines scientific knowledge, and TIMSS, which emphasizes the content of the scientific field, are consistent with vision I (Liu, 2013; Naganuma, 2017), while PISA, which focuses on different scales of contexts, is mainly vision II (Liu, 2013). The evaluation of SL is not centered around SE. In today’s era, with the rapid expansion of information, the quick development of technology, and the continuous prominence of socio-scientific issues such as global warming and genetically modified foods, higher demands have been placed on citizens’ SE. The newly included science identity in the recently released PISA2025 Science Framework (OECD, 2023) emphasizes a special focus on sustainable development and environmental education. It can be foreseen that in the future, there will be widespread attention to SL in the context of sustainability. Therefore, the conceptualization of CSL should be further developed. It is necessary to emphasize SE and sustainability in CSL.

Based on the above considerations, we adopt the definition of “to understand the necessary scientific knowledge, master the basic scientific method, establish scientific thought, advocate for the spirit of science, and have the ability to apply the above to practical problems and participate in public affairs” as defined by the State Council of PRC (The State Council of PRC, 2006). This definition focuses on SE and emphasizes sustainable development in scientific thought. The participation and resolution of issues is an important content and objective of SE. However, we realized that, in the process of SE, the scientific knowledge and methods widely concerned by existing SL assessments are necessary but insufficient, and it is also necessary to understand the nature of science. At the same time, the common good qualities in scientific research, such as curiosity and thirst for knowledge, pursuit of truth and innovation, play an important role in arousing public interest in SE and effective problem-solving. More significantly, more consideration needs to be given to whether the existing approaches to challenging issues involving the integration of several disciplines, including science, society, and the environment, satisfy the criteria of sustainable development. The meaning and efficacy of SE depend on whether it satisfies the demands of sustainable development. In the current complex socio-scientific context, all these elements that play an important role in SE mentioned above should be included in the framework of CSL.

However, only a portion of them (such as scientific knowledge and methods) has been given attention in former CSL assessments (Miller, 1998), while important content such as spirit of science and scientific thought of Scientific Outlook on Development are less present in existing CSL evaluation frameworks. Therefore, it is necessary to develop a CSL assessment instrument for adults based on vision III. This instrument fills in the key gaps left by the current assessments regarding effective SE by citizens. Further research is required in the development and design of instruments given the current emphasis on competency and situational literacy needs.

In research on the evaluation of CSL, researchers have extensively focused on the factors associated with CSL. Past research findings indicate that urbanization, gender, and education have significant effects on CSL (Miller, 1998; Ren et al., 2013). For instance, by examining data from a 2016 survey of CSL in the United States, Miller found that educational attainment and exposure to college science courses were the two best predictors of CSL. He also noted that men are more likely than women to have higher literacy levels (Miller, 2016). According to data collected on CSL among Chinese citizens, education, economy, and urbanization can all strongly influence the level of CSL (Tian & Dai, 2018). The data also revealed considerable gender (Zhang et al., 2016) and geographic (Zhang et al., 2013) disparities in CSL. According to Ren et al. (2013), the difference in education systems and curricula between the United States and China affects their citizens’ CSL. An international comparative study revealed differences in CSL among individuals of different ages and occupations (Wu et al., 2018). Because the content and dimensions of CSL assessment differ across previous research, it is necessary to further study the differences in CSL among different groups when developing a measurement instrument based on the reconceptualized CSL.

The goal, then, is to reconceptualize CSL, put SE at the core of CSL assessment, and put forth a framework that covers the necessary and important parts of evaluating effective SE. Based on this framework, we developed and validated a CSL assessment instrument and discussed differences in CSL among groups with different demographic characteristics based on the assessment results. Based on this research objective, the specific research questions of this study are as follows:

(1) What evidence is available for the validity of the CSL measurement instrument?

(2) Is there any significant difference in the CSL of citizens based on their demographic characteristics?

Conceptualization of CSL

Conceptualizing the CSL and developing a framework for assessment are the fundamental aspects of instrument development. We have conducted an extensive review of the literature on SL and CSL, paying particular attention to how it is defined, the different elements that have been found, and the available tools for assessing it. The international comparative analysis lays the foundation for defining the connotation and extension of CSL and provides a reference for the construction of an assessment system and the development of an assessment instrument.

The new era requires citizens to participate in various issues and public affairs through SE. Therefore, we follow the State Council’s definition of CSL (The State Council of PRC, 2006), emphasize SE, and point to SL’s vision III. We developed CSL with the idea that CSL is the cornerstone of public health and well-being, the basis of social civilization and progress, and the essential assurance for the creation of a community of shared future and the long-term sustainability of human society as a whole. We combined the definition of CSL in this study, analyzed, compared and grouped the components of SL and CSL in relevant policy documents and extant literature, and identified the elements that are considered crucial to CSL. We believe that the public’s understanding of scientific concepts and procedures, the demonstration of problem-solving abilities, and the spirit of science and thoughts demonstrated based on the aforementioned elements are all considered in the assessment of CSL. After experts’ review and multiple rounds of revisions, a four-dimensional assessment framework for CSL was developed, which incorporates scientific knowledge (SK), scientific method (SM), problem-solving (PS), and scientific thought and spirit of science (STSS) (Table 1).

Table 1 Framework for the assessment of CSL

From the process of citizens’ SE, the foundation and premise parts are SK and SM. SK includes five elements: engineering and technology, life and health, earth and environment, mathematics and information, and matter and energy. SM contains three elements: model representation, induction and deduction, and observation and experimentation. The center part of CSL is PS, which also includes the following three elements: necessary life and production skills, practical and reflective abilities, and SE. STSS is an indispensable part of enabling the general public to think about scientific issues like scientists from a broader perspective of sustainable development, which includes five elements, namely Scientific Outlook on Development, nature of science (NOS), exploration and persistence, rationality and questioning, and empirical and innovation.
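The four dimensions and their elements, as listed in the paragraph above, can be written down as a simple mapping. This is only an illustrative sketch: the variable name `CSL_FRAMEWORK` and the dictionary layout are assumptions made for illustration, not part of the instrument.

```python
# Illustrative representation of the four-dimensional CSL framework.
# The dimension and element names come from the text; the data layout
# and the name CSL_FRAMEWORK are assumptions made for this sketch.
CSL_FRAMEWORK = {
    "SK": [
        "engineering and technology",
        "life and health",
        "earth and environment",
        "mathematics and information",
        "matter and energy",
    ],
    "SM": [
        "model representation",
        "induction and deduction",
        "observation and experimentation",
    ],
    "PS": [
        "necessary life and production skills",
        "practical and reflective abilities",
        "science engagement",
    ],
    "STSS": [
        "Scientific Outlook on Development",
        "nature of science",
        "exploration and persistence",
        "rationality and questioning",
        "empirical and innovation",
    ],
}

# Count the elements per dimension and in total.
elements_per_dimension = {dim: len(elems) for dim, elems in CSL_FRAMEWORK.items()}
total_elements = sum(elements_per_dimension.values())
print(elements_per_dimension, total_elements)
```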

Overall, to reach effective SE, in addition to SK and SM, citizens should also possess critical aspects of the spirit of science and a wider range of scientific thought to better solve problems and participate in decision-making.

Scientific knowledge

SK appears in almost all SL visions (Carlson, 2008; Liu, 2013; Mc Eneaney, 2003). The SK discussed here is mostly content knowledge and is the foundation of SE. The first dimension in the framework, SK, is the fundamental dimension of CSL (Norris & Phillips, 2009). SK reflects the understanding of the main facts, concepts, and explanatory theories that form the basis of scientific knowledge, including the basic knowledge of mathematics and information, matter and energy, life and health, earth and environment, and engineering and technology that the public should possess to meet the needs of life and production.

SK, which is the static component of science, is the end product of scientific investigation. SK consists of facts, concepts, laws, hypotheses, theories, and models. It can also be separated into earth and space science, material science, and life science, depending on the research subject (E. Liu, 2009). The understanding of SK is the most basic and important part of CSL. Students cannot further develop science-related skills, practice science, utilize science to solve issues, or take part in decision-making without the necessary knowledge base and a proper and clear understanding of science.

Scientific method

While the first dimension emphasizes the knowledge of nature and technology, this second dimension refers to the way SK is obtained. Science is a body of knowledge based on rationality and criticism, and its essential feature is the method of constructing knowledge (Mönch & Markic, 2022). SM is a series of logical investigation processes adopted by those engaged in scientific activities to raise and answer specific questions about nature (Mönch & Markic, 2022). Science can be recognized (Galloway, 1992) by the questions it asks and the way it answers them in an attempt to produce a knowledge system (Mönch & Markic, 2022) that is not influenced by individual beliefs, perceptions, values, attitudes, and emotions. To study things in nature and find patterns in natural or life phenomena, scientists need to explain or discover them through observation, experimentation, and analytical reasoning. The process by which scientists acquire knowledge, that is, the scientific process, is a set of methods for studying and solving natural problems and is the dynamic dimension of science. For the layman, understanding how science works may be more useful than understanding its results (Galloway, 1992).

The basic methods and skills required to engage in scientific practice, such as the capacity to engage in inquiry, demonstration, scientific research, operation, etc., are included in the framework for SM. Understanding the methodologies and processes used in scientific research is a major component of SM assessment. The public should be able to apply scientific methods such as observation, experimentation, induction, deduction, modeling, and so on to assess and solve scientific problems. They should also be able to understand the relationships between these approaches.

Problem-solving


One important area of developing CSL is the ability to integrate science into daily life, particularly the application of science to address problems in daily life (DeBoer, 2000). Developing socio-scientific decision-making and scientific problem-solving skills is more important than an understanding of basic content knowledge (Holbrook & Rannikmae, 2009). Problem-solving is where improvements in CSL are finally applied. Citizens can use the contents above to solve problems encountered in life and take part in social decision-making when they possess the necessary SK, the corresponding SM, and the ability to connect science with various disciplines and society. For example, they can select the best self-rescue method in emergencies such as earthquakes. They can also make decisions on socio-scientific issues, an ability considered essential for scientifically literate citizens (Bell & Lederman, 2003; Kolstø, 2001; Zeidler et al., 2002). For example, they can engage in debates and dialogues on societal issues such as global warming and offer persuasive, science-based views.

Therefore, in this framework, the assessment of PS focuses primarily on the ability to analyze and resolve real-world problems as well as critical skills in life and production. In daily life and production, citizens are expected to master necessary skills related to health, first aid, emergency avoidance, travel, and safe production, and to be able to use common household electrical appliances safely. When analyzing and solving problems, scientifically literate individuals can utilize relevant SK and SM, constantly reflect on and adjust their thinking, collect and analyze evidence, make judgments based on the evidence, and propose scientific and reasonable solutions. When required, a scientifically literate citizen should be able to work with others to address difficulties, communicate clearly, and work toward agreement.

Scientific thought and spirit of science

STSS is a crucial part of citizen SE, although it is lacking in existing assessments. The STSS dimension mainly includes an understanding of NOS, Scientific Outlook on Development, and spirit of science.

Science education researchers have long argued that a key component of CSL is NOS. NOS refers to the inherent values and assumptions of science (Lederman, 1992). Understanding NOS, which entails being aware of what science is, what qualities it possesses, and how to tell science from pseudoscience, plays an important role in the development of CSL. When discussing CSL, it is important to address the creativity and potential consensus on the nature of science derived from scientific history and philosophy (Chakravartty, 2022), which can help citizens develop more favorable attitudes toward science. NOS here refers to:

(1) Recognizing that science is a dynamic field that is continually evolving; imagination and creativity can have an impact on how science develops.

(2) Recognizing how science, technology, society, and the environment are interconnected and influence one another.

Scientific Outlook on Development refers to using the concepts of ecological civilization and sustainable development to guide personal decision-making and emphasizes the thought of sustainability in a complex global environment (The State Council of PRC, 2008). It is an important prerequisite for citizens’ SE.

Spirit of science is a cluster of features, spanning values, personality traits, cognitive preferences, and habitual behaviors, that scientists demonstrate in their research efforts (Zhou et al., 2018). Its core attribute is the pursuit of truth. Specifically, possessing spirit of science means that an individual possesses the following qualities of good scientific research (Educational Policy Commission, 1970; Zhou et al., 2018):

(1) Exploration and persistence: having curiosity and a thirst for knowledge about scientific phenomena or things, and being able to face difficulties properly and persevere in the exploration process.

(2) Rationality and questioning: being able to analyze and solve problems based on logic, not blindly trusting authority, respecting facts, daring to question one’s own and others’ viewpoints, and being willing to accept questioning from others.

(3) Empirical and innovation: being able, in the process of analyzing and solving problems, to obtain evidence from observations or experiments to test viewpoints, and having the awareness to explore new ideas and methods and strive for excellence.

The emphasis of STSS encourages an inquiry-based mindset, nurtures critical thinking, acknowledges and appreciates the positive impact of scientific research on social advancement, and emphasizes the fact that scientific findings can have both positive and negative effects.

Based on the above conceptualization of CSL, the next step is to develop and validate the CSL assessment instrument and to explore differences in CSL across demographic groups.


Item pool development

Context has played an important role in the assessment of SL in recent years (Naganuma, 2017; OECD, 2017, 2019). Developing context-based tools for assessing abilities and literacy has become a new trend. Vision III of CSL emphasizes contexts with a scientific component (X. Liu, 2009, 2013). According to situated cognition theory, items aimed at evaluating people’s cognitive level should be developed on the basis of a specific context. CSL (especially SM, PS, and STSS) is largely implicit and needs to be activated in specific contexts to become explicit. In other words, CSL is well suited to context-based assessment. To allow participants to concentrate and reflect on SK, SM, and STSS via PS in context, we suggest a context-based item development approach (Fig. 1). This approach places specific items in an authentic context. Researchers develop research ideas based on scientific phenomena and problems, conduct research to provide results and explanations, and actively promote their applications. SK, SM, PS, and STSS are all required when proposing and carrying out research ideas, constructing conclusions and explanations, and promoting applications. There is a significant link between “doing science” and “using science,” even if not everyone is a scientist or can pursue a scientific career. People can learn to “use science” and think like scientists without having to “do science” by directly participating in scientists’ research processes. This is an essential way to develop CSL. From this perspective, the development approach for items is feasible.

Fig. 1 Context-based item development approach

Context is an important carrier for achieving test objectives and a vital factor for arousing participants’ interest and effectively activating their cognition. The contexts of items are based on citizens’ production and lives, pointing to authentic scientific phenomena and problems familiar to citizens, being unbiased in terms of gender, ethnicity, race, etc., and having universality and fairness. The breadth of SK is an integral part of CSL. However, in specific contexts, there are often fewer categories of knowledge that can be examined. Based on this, we have developed three types of items:

(1) Single items without contexts, in a true or false format, used to explore the breadth of SK.

(2) Single items presented with contexts and independent of one another, used to examine SM and PS. They are all multiple-choice questions.

(3) Item sets that are based on a specific context and arranged in a particular logical order (such as the course of scientific research). Item sets include multiple-choice questions and five-point Likert scale questions and are used to examine SM, PS, and STSS.

The first type of items without context all use a true or false question format, with scores assigned as 0 or 1. Given that the assessment of CSL with SK as the core has a long history and rich experience, we reviewed and learned from previous items on the SK assessment. To facilitate comparison with previous assessment data, some of the existing items are adopted in the instrument. For example, “Earth goes around the Sun once every year,” “Light travels faster than sound,” and “The continents on which we live have been moving their locations for millions of years and will continue to move in the future,” which continue to appear in the CSL assessment in the United States and China (Miller, 2000). These items are of good quality, and the SK in them is still at the basic level of the SK system, which is central to measuring the public’s understanding of science (Miller, 2000; Miller, 2022; Wu et al., 2018). Therefore, these items are referenced and retained. Simultaneously, we also keep track of current scientific research developments and have chosen recent scientific findings of general interest, such as clean energy, 5th Generation mobile communication technology (5G), and chips, to be covered by the study. Examples include “Solar energy is a type of sustainable and clean energy” and “The 5th Generation Mobile Communication Technology (5G) transports signals through electrical waves”.

For items with context, we drew inspiration from the design of large-scale international testing projects (OECD, 2019). Contextual single items serve primarily as anchors for specific links in the scientific research process and construct tasks based on crucial context information and the dimensions to be assessed. These items are all multiple-choice items. For example, Item 24 (Fig. 2) provides the context of sugar block dissolution, which citizens often encounter in their daily lives, and aims to examine the “observation and experimentation” element of the SM dimension. The task and options of the item were set according to this intention. The operations in the options are commonly seen or used by citizens when observing and solving problems related to the “sugar block dissolution” phenomenon. Only the first option is correct. Therefore, for Rasch scoring, the first option is assigned a value of 1 and the other options a value of 0. Although the incorrect options are distractors, they are also frequently occurring operations. It should be noted that we always chose contexts that are familiar, appropriate, and fair to citizens for item design.

Fig. 2 Sample of a single item (Item 24) with context

Contextual item sets often involve a hot topic that is heavily debated in society, the subject of in-depth scientific study, and well known to the general public. While the latter provides a foundation for writing items that match citizens’ level of cognition, the former offers rich materials and clear logical chains for organizing several items in item sets. According to the characteristics of the proposed evaluation dimension and context, the item set includes multiple-choice items and five-point Likert scale items. A portion of a 13-item item set is shown in Fig. 3. Global climate change and carbon dioxide emissions serve as the context for this item set. Global, regional, and personal concerns all revolve around the hot issue of global warming. It is anticipated that it will remain a subject of ongoing concern for many groups, including researchers, ordinary people, and government decision-makers, for a considerable period. Scientists have studied global climate change extensively over the last few decades, developing a variety of viewpoints and adaptive understandings. Given the universality and consistency of the research and conversation on this topic, the item set based on this context can be regarded as reasonably fair and stable. Furthermore, this item set can serve as an effective set of items for assessing CSL for a long time. A graph of the changes in the average global temperature and atmospheric carbon dioxide concentration since 1880 is shown in Item Q1 in Fig. 3. The task is to examine the relationship between the two variables. Only the second option is correct. Therefore, for Rasch scoring, the second option is assigned a value of 1 and the other options a value of 0. To respond to this question, participants must summarize and evaluate the relationship between the two curves based on the trends shown in the graph. This item thus examines the “induction and deduction” element of SM. Item Q2 presents two positions on global warming, and participants indicate whether they agree or disagree with them. Item Q2–1 requires participants to express their opinions on the credibility of the two viewpoints and evaluates the “rationality and questioning” element of the STSS dimension. Item Q2–2 requires participants to describe their desire to participate in scientific dialogue between the two perspectives and evaluates the “science engagement” element of the PS dimension. For Q2–1 and Q2–2, during Rasch scoring, “Agree” and “Strongly agree” were recoded as 1, while all other responses were recoded as 0.
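The dichotomous scoring rules described for Item 24, Q1, Q2–1, and Q2–2 can be sketched in a few lines of Python. The function names, the 1-based option numbering, and the “Neutral” response label are assumptions made for illustration; only the keyed options and the “Agree”/“Strongly agree” recoding come from the text.

```python
# Hypothetical scoring helpers: names and data layout are assumptions.
# Only the keyed options and the agreement recoding come from the text.
AGREE_LABELS = {"Agree", "Strongly agree"}


def score_multiple_choice(selected_option: int, correct_option: int) -> int:
    """Return 1 if the selected option is the keyed option, otherwise 0."""
    return 1 if selected_option == correct_option else 0


def score_likert(response: str) -> int:
    """Recode a Likert response: agreement becomes 1, anything else 0."""
    return 1 if response in AGREE_LABELS else 0


# Item 24 is keyed to the first option, Q1 to the second option.
item24_score = score_multiple_choice(selected_option=1, correct_option=1)
q1_score = score_multiple_choice(selected_option=3, correct_option=2)

# "Neutral" is an assumed label for a non-agreement response.
q2_1_score = score_likert("Strongly agree")
q2_2_score = score_likert("Neutral")

print(item24_score, q1_score, q2_1_score, q2_2_score)
```

Collapsing the five Likert categories into two keeps every item on the same 0/1 scale as the multiple-choice items, at the cost of discarding the distinction between the remaining response levels.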

Fig. 3 Part of the sample item set

We closely adhered to the evaluation indicator framework during the item development process to maintain consistency between the item and each dimension. All of the items underwent three rounds of discussion and improvement after being drafted, resulting in an item pool that included all dimensions and elements.

Pilot study

Fifty people aged between 18 and 70 were randomly chosen for the pilot test, to preliminarily explore the distribution of various options in the test questions and the required duration of the test (approximately 30 minutes). Then, 10 participants were selected for additional cognitive interviews to further explore their understanding of the items.

Data source

Data from a random sample were gathered via online questionnaires. Online surveys are a feasible method, as evidenced by the 74.4% Internet penetration rate (CNNIC, 2022) and the current practice of collecting nationwide CSL survey data online in China. The data were collected from April to May 2022 using a popular survey app in China that generated and randomly distributed 603 anonymous questionnaires, of which 578 valid responses were kept after data cleaning. The demographic details of the valid sample are shown in Table 2. The sample comprised 362 females and 216 males, aged 18 to 70.

Table 2 Demographic information of the valid sample

Data analysis

By placing item difficulty and person ability on a common unit (the logit), the Rasch model offers a reliable method for assessing latent human qualities (Boone et al., 2014). The Rasch model predicts that, for each item, a person with high ability is more likely to give the correct answer than a person with low ability and that, for any individual, performance on easy items should be better than performance on difficult ones. Rasch analysis is frequently employed in human research and education (Boone et al., 2014; Lamar, 2012). Since 2007, China has followed international standards and used the item response theory (IRT) method to calculate the CSL score, which is an effective way to gauge CSL (He, 2019; Miller, 1998). Rasch analysis within IRT was thus a logical choice for our study.
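The two predictions above follow directly from the form of the dichotomous Rasch model, in which the probability of a correct answer depends only on the difference between person ability and item difficulty in logits. A minimal sketch, using illustrative values rather than estimates from this study:

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """P(correct) under the dichotomous Rasch model; both arguments are in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Illustrative logit values only; they are not taken from the study.
high_ability, low_ability = 1.0, -1.0
easy_item, hard_item = -0.5, 1.5

for label, theta in (("high ability", high_ability), ("low ability", low_ability)):
    p_easy = rasch_probability(theta, easy_item)
    p_hard = rasch_probability(theta, hard_item)
    print(f"{label}: P(easy) = {p_easy:.3f}, P(hard) = {p_hard:.3f}")
```

For a fixed item the probability rises with ability, and for a fixed person it falls as the item gets harder, which is the ordering the model expects.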

ConQuest combines a variety of item response models into a single computer program that can fit item response and latent regression models. Its multidimensional item response model can be used to analyze items intended to generate measures on up to 10 latent dimensions (Wu et al., 2007). There are two multidimensional Rasch measurement designs: the between-item design and the within-item design. In the former, each item serves as an indicator of only one latent dimension, whereas in the latter an item can serve as an indicator of several latent dimensions (Wu et al., 2007). ConQuest can examine the fit of both designs. The instrument in this study follows a multidimensional between-item design, so each item is intended to evaluate only one latent dimension. Based on our conceptualization of CSL, we used the Rasch partial credit model to verify the instrument, examining the fit statistics and reliability of the instrument in particular. To better support instrument validation, we also examined the quality of each item with Winsteps, which can provide more information about individual items.

We optimized the items based on experts’ reviews and cognitive interviews and ultimately selected 54 items for the test. We first calculated the ability values of the participants in the four dimensions using ConQuest to compare the differences in CSL among people of different genders, ages, location types (urban or rural), educational backgrounds, and occupations. We then used SPSS 22.0 for analysis, confirmed that the variables did not follow a normal distribution, and proceeded to use nonparametric tests for further analysis.

Validation of the instrument

According to the Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014), five different types of evidence may support a validity argument: evidence based on test content, response process, internal structure, relations to other variables, and consequences of testing. We argue that data generated through the instrument for measuring CSL can be used to make valid inferences about participants’ CSL. Under this premise, we focused on the validity evidence based on test content, response process, and internal structure of the instrument for measuring CSL.

Evidence based on test content

To improve content validity, we conducted two rounds of expert reviews on the framework and one expert discussion on the tool.

The first expert review collected experts’ opinions on the description of the important components of the connotation and extension of CSL. Twenty-eight experts from China, the United States, Malaysia, Pakistan, Turkey, and other countries, working in science research, science education, science popularization, and science and technology, were invited to complete a questionnaire rating the importance of the meaning of CSL and of the elements and behaviors in its four dimensions. The questionnaire consisted of 51 items on a 5-point Likert scale, with a score of 5 representing the highest degree of importance. The Cronbach’s alpha coefficient was 0.824, indicating that the questionnaire has high reliability. The authoritative coefficient (Cr) of the experts was determined from the basis of their judgments (practical experience, theoretical knowledge, domestic and foreign peer research, and intuitive perception) and from their familiarity with CSL. The Cr values ranged from 0.802 to 0.901, all greater than 0.8 (Wu et al., 2022), indicating that the experts have a relatively high level of authority and that their responses to the questionnaire are credible. All items had a high average importance score, and the experts considered the vast majority of items very important. The coefficient of variation (Cv) was less than 0.2 for 45 of the 51 items (Wu et al., 2022), indicating relatively consistent agreement among the experts on the specific descriptions of the four dimensions. On this basis, a consensus was reached on the four important components of CSL and their descriptions.
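Two of the statistics reported for this expert questionnaire, the coefficient of variation per item and Cronbach’s alpha, can be computed as in the sketch below. The ratings are invented for illustration, and the use of the sample (n − 1) standard deviation and variance is an assumption, since the source does not state which variant was used.

```python
from statistics import mean, stdev, variance

def coefficient_of_variation(ratings):
    """Cv = sample standard deviation / mean of one item's expert ratings."""
    return stdev(ratings) / mean(ratings)

def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one respondent's scores on all items."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Invented ratings: five hypothetical experts rating three items on a 1-5 scale.
ratings = [
    [5, 4, 5],
    [4, 4, 5],
    [5, 5, 5],
    [4, 3, 4],
    [5, 4, 4],
]

cvs = [coefficient_of_variation([row[j] for row in ratings]) for j in range(3)]
for j, cv in enumerate(cvs, start=1):
    # The 0.2 cut-off is the consistency criterion cited in the text.
    print(f"item {j}: Cv = {cv:.3f}, consistent = {cv < 0.2}")
print(f"Cronbach's alpha = {cronbach_alpha(ratings):.3f}")
```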

In the second expert review, thirty experts, including specialists in science education, measurement and evaluation, and scientific literacy research, as well as scientists and engineering technicians, were invited to evaluate the structure and completeness of the assessment framework. The questionnaire consisted of 13 items on a 5-point Likert scale, with 5 representing the highest level of recognition. One item addressed the overall structural rationality of the CSL evaluation framework, and three items per dimension addressed the completeness, accuracy, and testability of the description of that dimension. The survey results show that the experts have a high level of recognition for the CSL framework as a whole and for each of its dimensions.

To ensure the content validity of the instrument, we invited 25 experts from 12 countries, including China, Israel, Singapore, and Australia, to review the items in online meetings. They suggested improvements concerning the effectiveness of the items, including their scientific soundness, their fairness, and their consistency with the corresponding indicators. Based on the experts’ opinions, we refined the wording of some items, rephrased content and expressions so that they are easier for the general public to understand, and revised or deleted items that might be influenced by public ideology during the investigation.

Evidence based on response processes

The response process refers to the cognitive process of survey participants, that is, whether participants’ interpretation of the items aligns with the interpretation intended by the test developers (American Educational Research Association et al., 2014). The cognitive interview is a widely used qualitative method in questionnaire design. Its purpose is to gain insight into whether respondents’ understanding of the survey items aligns with the intentions of the instrument developers (Willis, 2005), and it can provide validity evidence based on response processes. To ensure that the wording of the items is understandable to the target population and conforms to the intention of the instrument, we selected 3 people aged 18–30, 4 people aged 31–50, and 4 people aged 51–70 from the pilot study and conducted three focus-group cognitive interviews to identify any problems with the items in terms of language understanding, appropriateness, etc. We then revised any complex or incomprehensible words or phrases that the interviewees marked. According to the interviews, all of the items were understandable, and the participants’ answers reflected their real opinions.

Evidence based on internal structure

The internal structure is the extent to which the instrument conforms to the constructs and covers the instrument’s dimensionality and item functioning (American Educational Research Association et al., 2014). Item functioning refers to internal consistency, item-total and interitem correlations, etc. Here, we mainly present validity evidence based on internal structure by providing fit statistics, the Wright map, reliability, and the correlations between CSL dimensions (Rios & Wells, 2014).

Fit statistics

In Rasch analysis, item functioning is usually checked through fit statistics. The standardized residual, which is derived from the difference between the observed response and the response expected under the model, is the main input in Rasch goodness-of-fit analysis (Wright & Masters, 1982). Both unweighted and weighted MNSQ are important fit indicators, with an acceptable range of 0.6 to 1.4 (Bond & Fox, 2015). The former is more sensitive to outliers, whereas the latter is more sensitive to aberrant response patterns near item difficulty or person ability. Item fit was examined in ConQuest to check whether the items met these criteria. According to Table 3, the unweighted MNSQ of four items (DEL1–DEL4) greatly surpasses 1.4, and the vast majority of their T-values far exceed 2, which indicates poor fit. The change in Cronbach’s alpha if an item was deleted, the description and dimension of each item, and the information gathered from interviews were all taken into account when deciding whether to retain or delete these items. In the end, the four items were removed to improve model fit and reliability.
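To make the two mean-square statistics concrete, the sketch below computes them for a single dichotomous item using the textbook residual-based formulas: the unweighted MNSQ (outfit) is the mean of squared standardized residuals, and the weighted MNSQ (infit) is the sum of squared residuals divided by the sum of model variances. The abilities and responses are invented, and ConQuest and Winsteps compute their own versions of these statistics, so this is an illustration of the idea rather than a reproduction of either program.

```python
import math

def rasch_probability(ability, difficulty):
    """P(correct) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def item_fit(responses, abilities, difficulty):
    """Return (unweighted MNSQ, weighted MNSQ) for one dichotomous item."""
    squared_residuals, model_variances, squared_z = [], [], []
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        w = p * (1.0 - p)          # model variance of the response
        r2 = (x - p) ** 2          # squared raw residual
        squared_residuals.append(r2)
        model_variances.append(w)
        squared_z.append(r2 / w)   # squared standardized residual
    outfit = sum(squared_z) / len(squared_z)
    infit = sum(squared_residuals) / sum(model_variances)
    return outfit, infit

def within_range(mnsq, low=0.6, high=1.4):
    """Check the acceptance range cited in the text."""
    return low <= mnsq <= high

# Invented data: five persons, one item of difficulty 0 logits.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
orderly = [0, 0, 1, 1, 1]      # low-ability persons fail, high-ability persons pass
surprising = [1, 1, 0, 0, 0]   # the reverse pattern

for name, pattern in (("orderly", orderly), ("surprising", surprising)):
    outfit, infit = item_fit(pattern, abilities, 0.0)
    print(f"{name}: unweighted = {outfit:.2f}, weighted = {infit:.2f}, "
          f"acceptable = {within_range(outfit) and within_range(infit)}")
```

In the reversed pattern both statistics exceed 1.4, and the unweighted value is the larger of the two because the most unexpected responses come from persons far from the item difficulty.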

Table 3 Fit statistics of items with poor fit

Winsteps was used to examine each item, and Fig. 4 displays the Item Characteristic Curve (ICC) of Item 45. The horizontal axis represents the difference between person ability and item difficulty. Theoretically, the larger the difference, the more likely the person is to receive a higher score. The vertical axis shows the probability of receiving each score at that ability level. Where two curves intersect, participants are equally likely to receive either of the two scores. Each rating curve should have an identifiable peak, showing the most likely score for participants within that ability range. The peaks of the curves should ideally appear in numerical sequence, and each curve should suitably span a specific ability area. In panel A of Fig. 4, the “Level 1” curve lacks a notable peak and is out of order relative to the intersection of the “0” and “2” curves. Both the Infit and Outfit MNSQ at level 1 exceed the acceptable upper limit of 1.4 (Bond & Fox, 2015), indicating a problem with the division of score levels. We then combined the original level 1 with level 0 and recoded level 2 as level 1, based on the responses and the item descriptions. After this adjustment, panel B shows that the ICC and fit statistics perform well, suggesting that the revision is appropriate.

Fig. 4 The Item Characteristic Curve (ICC) of Item 45 before and after the revision of the rating level. (A is before the item revision, while B is after the revision)
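The pattern described for Item 45, a middle score level that is never the most likely outcome, can be reproduced with the category probabilities of the partial credit model. The sketch below uses hypothetical step difficulties, not the estimates for Item 45, and ends with the recoding that merges level 1 into level 0 and relabels level 2 as level 1.

```python
import math

def pcm_probabilities(ability, step_difficulties):
    """Category probabilities for one partial credit item (scores 0..number of steps)."""
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (ability - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def modal_scores(step_difficulties, grid):
    """Scores that are the single most likely outcome somewhere on the ability grid."""
    modes = set()
    for theta in grid:
        probs = pcm_probabilities(theta, step_difficulties)
        modes.add(probs.index(max(probs)))
    return modes

def collapse(score):
    """Recode 0/1/2 to 0/0/1, following the adjustment described in the text."""
    return {0: 0, 1: 0, 2: 1}[score]

grid = [x / 10 for x in range(-40, 41)]   # abilities from -4 to 4 logits
ordered_steps = [-1.0, 1.0]               # hypothetical, well-behaved item
disordered_steps = [1.0, -1.0]            # hypothetical, steps in reverse order

print("ordered steps, modal scores:", sorted(modal_scores(ordered_steps, grid)))
print("disordered steps, modal scores:", sorted(modal_scores(disordered_steps, grid)))
print("collapsed scores:", [collapse(s) for s in (0, 1, 2)])
```

With ordered steps every score has its own peak region. With reversed steps, score 1 is never the most probable outcome at any ability, which is the kind of curve that motivates merging score levels.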

After the aforementioned adjustments to five items, the final instrument consists of 50 items in total, with SK, SM, PS, and STSS comprising 23, 7, 8, and 12 items, respectively. The multidimensional Rasch model in ConQuest was then used to analyze the four-dimensional data. According to the findings (Table 4), both the weighted and unweighted MNSQ of each item fall within the permitted range of 0.6 to 1.4 (Bond & Fox, 2015). The great majority of items have T-values within the permissible range of −2 to 2 (Bond & Fox, 2015). We rely primarily on MNSQ because the T-value depends heavily on sample size, and this study has a large sample of 578 people. As long as the MNSQ value is within the acceptable range, we ignore the T-value (Boone et al., 2014). Overall, the instrument’s items fit the four-dimensional Rasch model well, which supports the validity of the instrument for measuring CSL.

Table 4 Estimate, Error, and Fit statistics of items in the revised instrument of CSL. (An asterisk next to a parameter estimate indicates it is constrained)

Wright map

The Wright map, also known as the person-item map, places item difficulty and person ability on the same Rasch scale and thus conveys intuitively how person ability estimates and item difficulties are distributed relative to each other (Boone et al., 2014). The Wright map of this study is shown in Fig. 5, where each “X” denotes 5.6 cases, and the four vertical lines represent the four dimensions of CSL. Person ability increases gradually from bottom to top. Figure 5 displays a good spread of persons across 5 to 6 logits in each dimension. On the right side of the vertical lines is the distribution of item difficulty, which ranges from −2 to 2 logits and largely covers persons of different ability. The findings demonstrate that, despite a lack of items targeting very high and very low levels of person ability, the instrument generally meets the requirements. The application of this instrument could help depict the distribution of a group’s ability in CSL.

Fig. 5 Wright map for the four-dimensional instrument. (Dimensions 1, 2, 3, and 4 are person ability estimations for SK, SM, PS, and STSS)


Reliability

Separation reliability is an indicator of reliability used to evaluate the instrument’s characterization ability on different dimensions. According to the ConQuest results, the instrument has a separation reliability of 0.985, which means that the instrument can successfully separate people’s ability across the dimensions. EAP/PV reliability indicates how well the scores on a particular dimension distinguish between persons of different ability. Table 5 displays the number of items and the EAP/PV reliability for each of the instrument’s four dimensions. According to Table 5, the EAP/PV reliabilities of the four dimensions range from 0.779 to 0.825, all higher than or close to 0.8, which indicates good reliability. Overall, the instrument satisfies the criteria for reliability in science research, both as a whole and in its separate dimensions. Additionally, this shows that the multidimensional Rasch analysis technique has been properly implemented.

Table 5 EAP/PV reliability for the four dimensions of CSL

Correlation between dimensions of CSL

By fitting a multidimensional item response model, ConQuest allows the correlations between latent variables to be evaluated, reflecting the relationships between the dimensions in Rasch measurement. Table 6 shows the correlations (values below the diagonal) between the CSL subdimensions generated by ConQuest. The correlations between dimensions range from 0.676 to 0.877. Specifically, the correlation coefficients between SK and SM and between SM and PS are 0.836 and 0.877, respectively, both greater than 0.8, indicating a high degree of correlation; the correlation coefficients between SK and PS, SK and STSS, SM and STSS, and PS and STSS are 0.782, 0.696, 0.676, and 0.799, respectively, all greater than 0.5, indicating a moderate correlation (Bond & Fox, 2015). The correlation between any two dimensions is statistically significant (df = 46, P < 0.05) and is within the acceptable range of 0.3 to 0.9 (Lin et al., 2019), indicating sufficient discriminant validity between dimensions.

Table 6 Correlation between dimensions of CSL

Impact of demographic factors on CSL

The nonparametric test results in Tables 7 and 8 show that, while there was no statistically significant difference in performance between males and females, the other demographic characteristics (age, location type, educational background, and occupation) are significantly associated with CSL (significance level 0.05).

Table 7 Nonparametric tests of demographic factors (Gender and Location Type)
Table 8 Nonparametric tests of demographic factors (Age, Educational Background, and Occupation)

Comparing distributions across groups provides more detail. In terms of age, citizens aged 60 and older perform significantly worse on the SK, SM, PS, and STSS dimensions than citizens in all other age categories. Urban citizens perform much better than rural citizens across all four CSL dimensions. Citizens’ CSL rises gradually with educational background. However, there is no significant difference in PS and STSS between citizens with a college degree and those with a master’s degree, nor is there any significant difference in STSS between citizens who only attended high school and citizens with a master’s degree. Citizens with a bachelor’s degree or above score significantly higher than those with lower levels of education on the SK, SM, PS, and STSS dimensions. Regarding occupation, the CSL of unemployed citizens is significantly lower than that of employed citizens and students waiting for further study.
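A two-group comparison of this kind can be sketched with a rank-based test. The source does not name the specific nonparametric tests used in SPSS, so the Mann-Whitney U test below is an assumption chosen for illustration (a Kruskal-Wallis test would be the usual counterpart for factors with more than two groups). The ability values are invented, and the p-value uses the normal approximation without a tie correction, which is rough for samples this small.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U statistic for group_a, two-sided p-value by normal approximation)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))       # two-sided
    return u1, p_value

# Invented ability estimates (logits) for two hypothetical groups.
urban = [0.9, 1.2, 0.4, 1.5, 0.8, 1.1]
rural = [-0.3, 0.2, -0.8, 0.1, 0.5, -0.1]

u, p = mann_whitney_u(urban, rural)
print(f"U = {u:.1f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```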


Discussion of the instrument

The four-dimensional instrument was developed to assess CSL. It can be used online or in a paper-and-pencil format. Fifty items make up the instrument, which takes participants about 30 minutes to complete. Following the Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014), we have provided validity evidence based on test content, response process, and internal structure. The validity evidence collected during instrument development includes content evidence based on expert review, response process evidence based on cognitive interviews, and internal structure evidence based on the fit of the four-dimensional model. The findings support the use of this instrument to assess CSL.

In some ways, the instrument’s high quality suggests that the elements we considered for the CSL framework and for the evaluation tool are reasonable. They are crucial components of CSL that every citizen should possess, since they aid in problem-solving and engagement in public affairs. However, additional research will be necessary to better understand which SK, SM, and STSS citizens need, and how citizens can address issues and participate in science in more practical situations.

Discussion of different groups’ CSL

In exploring the differences in CSL among citizens with different demographic characteristics, our findings are in agreement with several prior studies that have used other instruments to assess CSL.

According to earlier studies, numerous demographic variables have a significant impact on CSL. For example, citizens with higher educational backgrounds and younger ages have higher CSL (Miller, 1998, 2006; Ren et al., 2022), which is consistent with our research findings. This is in line with the general understanding that citizens who receive higher levels of education can learn more scientific knowledge and methods, have more opportunities for scientific participation, and have a deeper understanding of the nature of science. Therefore, they can demonstrate higher levels of CSL. Among senior adults, particularly Chinese citizens over 60, the general level of education is quite low. Ren et al. indicate that the education reform in China in 1977 had a long-term and significant impact on people born after 1960 (Ren et al., 2022). This partly explains why Chinese citizens under the age of 60 have a significantly higher CSL in the survey.

The performance of individuals from urban areas is considerably better than that of individuals from rural areas in the four dimensions of CSL when considering the characteristics of the location type. This is in line with common opinion. Urban and rural communities have unequal access to formal and informal education resources, including funding for education, qualified teachers, science and technology museums, planetariums, etc. This results in an uneven distribution of CSL between urban and rural areas.

However, our results also contradict some previously published findings on CSL. Previous national surveys from numerous countries found a statistically significant difference between males and females in CSL (Garner-O’Neale et al., 2013; Miller, 2000; Miller, 2016; Zhang et al., 2016). Our study’s findings, however, are in line with those of a survey conducted in Japan, showing that there is no appreciable distinction between men’s and women’s CSL (Naganuma, 2017). Given the lack of statistical differences between boys and girls in PISA scientific literacy tests (OECD, 2007), Naganuma believes this is a relatively new finding (Naganuma, 2017). Compared to the gender differences reflected in CSL assessments that focus on SK, Naganuma’s tool places making decisions at the center of CSL. It seems that there is no significant difference between men and women when CSL emphasizes “science engagement”.

Research results also reveal differences in CSL among citizens with different occupations. This is consistent with previous international comparative research results (Wu et al., 2018). Individuals with occupations exhibit higher levels of CSL than those who have not participated in social work, which seems to illustrate the important role of social engagement in improving CSL.

Nowadays, CSL plays an increasingly important role at the national and individual levels. How to issue practical policy documents and carry out effective CSL promotion actions at the national level is of great significance. Focusing on the differences in CSL among different groups with different demographic characteristics will help us have a clear target. We need to bridge the gap in CSL among different groups while promoting the overall improvement of CSL for all.

Limitations and implications

This study is an essential inquiry into the construction of the CSL assessment framework and instrument development. The findings presented above should be accepted with caution. The purpose of this study is to develop and validate an instrument. The ability values of the sample are evenly distributed and span a wide range, which serves the development of the instrument well. However, because China is a populous country, the sample in this study may not represent the entire population of Chinese citizens. Therefore, larger and more representative samples are needed for further instrument improvement and investigation. Although this instrument is already of good quality, it still lacks very difficult and very easy items. Future revisions and additions of items could raise the instrument’s quality.

From the perspective of science education research, this study has certain theoretical and practical significance. We incorporated STSS into the framework of CSL and obtained effective validation from tools, enriching the connotation and extension of CSL in theory. In practice, although the instrument was developed in Chinese, the formulation of the assessment framework and the substantial collaboration of outside experts throughout the instrument’s development might, in some ways, be seen as an international consensus on CSL and its assessment. Future translations of the instrument into additional languages for regional use in many nations will help to advance CSL horizontal comparability. Future studies may look more closely at the performance level of CSL and the application of the instrument to various samples. All or some of the items may be used during the assessment process when utilizing the instrument.

The development of the new CSL framework and the effective validation of the instrument have certain implications for education within and across science disciplines. Currently, global science education widely focuses on problem-solving ability and advocates for the participation of all citizens in scientific problem-solving in countries and regions (Kolstø, 2001). The importance of PS as one of the dimensions of CSL in this study has been validated. This responds to the focus of science education on PS. The STSS newly incorporated into the CSL framework is also particularly important for science education with SL as its objective. Therefore, when teaching science, teachers should pay attention to the development of students’ PS and their comprehension of STSS as well as the students’ mastery of SK and SM.

Meanwhile, the significant differences in CSL among groups with different educational backgrounds and occupations suggest the important role of formal and informal education in cultivating individual CSL. Although individuals spend a relatively small amount of time in formal education (Liu, 2013), informal education has a smaller effect on developing CSL than formal education. Therefore, how to make good use of informal education (such as science venues and media information) to promote CSL for all should receive broader attention.

Availability of data and material

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.



Abbreviations

CSL: Civic scientific literacy
SE: Science engagement
SL: Scientific literacy
TIMSS: Trends in International Mathematics and Science Study
PISA: Programme for International Student Assessment
OECD: Organization for Economic Co-operation and Development
SK: Scientific knowledge
SM: Scientific method
PS: Problem-solving
STSS: Scientific thought and spirit of science
NOS: Nature of science
5G: 5th Generation mobile communication technology
CNNIC: China Internet Network Information Center
ICC: Item Characteristic Curve
Cr: Authoritative coefficient
Cv: Coefficient of variation




The research was supported by the “Research on the Assessment System of International Civic Scientific Literacy in 2021” (grant 21EBR014). We would like to thank all the participants for their efforts in helping us understand their civic scientific literacy. We wish to express sincere gratitude to those who took the time to help improve our work and provide many helpful suggestions.


This research was made possible by the financial support of the China Research Institute for Science Popularization (Grant No. 21EBR014).

Author information

Authors and Affiliations



JW contributed to the study conception and design, the drafting and substantial review of the work; and has approved the submitted version. YML, ZMZ, JCW, TL, SL, JXL, and SMX contributed to the material preparation and data analysis. The first draft of the manuscript was written by YML. JW and TL substantively revised the work. The authors read and approved the submitted manuscript.

Corresponding authors

Correspondence to Jian Wang or Jingchun Wang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Liu, Y., Wang, J., Zhang, Z. et al. Development and validation of an instrument for measuring civic scientific literacy. Discip Interdscip Sci Educ Res 6, 6 (2024).
