When implemented in the EFL classroom, authentic assessment, according to Mueller (2016), involves four important procedures: identifying standards or learning objectives for students, selecting authentic tasks, identifying the criteria for the tasks, and creating rubrics. The following are the detailed phases of authentic assessment. 1. What should the students know and be able to do? English lecturers list the knowledge and skills that constitute the standards/learning objectives, for instance:
a. In reading skill, the students are able to identify information about genuine and fake auto spare parts.
b. In writing skill, the students are able to write a summary of the identified information about genuine and fake auto spare parts.
c. In vocabulary knowledge, the students are able to use the words, terms, expressions, and lexis most frequently found in texts/materials about genuine and fake auto spare parts.
d. In grammatical competence, the students are able to understand and correctly use the eight parts of speech (noun, verb, adjective, pronoun, article, etc.); degrees of comparison (positive, comparative, superlative, and comparative constructions, etc.); the simple present tense; expressions that show cause and effect; and expressions that introduce an additional idea (furthermore, in addition), an opposite idea (in contrast), an example (for instance), and a conclusion (in conclusion), etc.
e. In speaking skill, the students are able to compare (explain, describe, and elaborate on) the differences between genuine and fake auto spare parts.
f. In listening skill, the audience (the students listening to a classmate's presentation) are able to take notes on what they have heard, indicating that they truly engage with, participate in, and understand what the presenter has articulated.
2. What indicates that students have met these standards? To determine whether students have met these standards, the lecturer designs or selects relevant authentic tasks, for example:
a. The tasks the students perform are ill-defined activities connected to their real-world lives (lecturers and students jointly determine the tasks to be learned and presented). Each student has a different task.
3. What does good performance on this task look like? To determine whether students have performed well on the tasks, the teacher identifies the characteristics of a good performance, called criteria. For example, the student:
a. selects the auto spare parts,
b. shows the differences between the fake and the genuine auto spare parts to the client,
c. explains the characteristics of the fake and genuine ones,
d. explains the quality,
e. explains the endurance of the fake and genuine auto spare parts,
f. explains the advantage of using genuine and fake parts,
g. shows the price list, etc.
4. How well does the student perform? To discriminate among students' performances, the teacher creates scoring instruments such as rubrics, rating scales, checklists, anecdotal records, or the memory approach (Mueller, 2016).
After identifying standards, selecting authentic tasks, and identifying the criteria for the task or assignment, the final procedure is to create scoring criteria for assessing students' learning (Mueller, 2014). The suggested scoring instruments for authentic assessment are as follows. The first is the checklist. A checklist, according to Hills (1992), is a list on which the lecturer (or parent or other adult) checks off the knowledge, skills, or behaviours observed before, during, or after the behaviour occurs. The second is the rating scale, a list of behaviours arranged on a scale (by frequency of behaviour, level of mastery, etc.) which the observer checks before, during, or after the behaviour. The third is the anecdotal or narrative record (anecdote), a short note that narrates or recounts the most important events, incidents, or cases during learning; this record is observed, analysed, and written by the lecturer. The fourth is the memory approach, which is similar to the anecdotal record except that the lecturer does not take notes but instead relies on his or her own memory of the events (Brookhart, 1999, in Moskal, 2000).
The last is the rubric. Rubrics (scoring tools) are a way of describing evaluation criteria (grading standards) based on the expected outcomes and performances of students. Typically, rubrics are used in scoring or grading written assignments (essay exams, group work), homework (in-class activities, lab reports), participation (oral presentations, portfolios), and projects (self-assessments, term papers), or other forms of student performance (Stevens & Levi, 2005; May, 2005; Allen, 2004; Huba & Freed, 2000).
A grading rubric is typically designed as a grid-type structure whose elements are criteria, levels of performance, scores, and descriptors. The first element, criteria, identifies the traits, features, or dimensions to be measured and includes a definition and example to clarify the meaning of each trait being assessed. The assignment or performance determines the number of criteria to be scored. Criteria are derived from assignments, checklists, grading sheets, or colleagues (Stevens & Levi, 2005).
The second element, levels of performance, is usually labelled with adjectives that describe the degrees of performance. Levels of performance indicate the degree to which the criteria have been met, provide for consistent and objective assessment, and give better feedback to students. These levels tell students what they are expected to do. Levels of performance can be used without descriptors, but descriptors help achieve objectivity (Stevens & Levi, 2005). The words used for levels of performance (such as superior, moderate, poor, or above/below average) can influence a student's interpretation of his or her performance level.
Examples of levels of performance:
1. Excellent, Good, Fair, Poor
2. Master, Apprentice, Beginner
3. Exemplary, Accomplished, Developing, Beginning, Undeveloped
4. Unacceptable, Marginal, Proficient, Distinguished
5. Beginning, Developing, Competent, Exemplary
6. Novice, Intermediate, Proficient, Distinguished, Master
7. Needs Improvement, Satisfactory, Good, Accomplished
8. Poor, Minimal, Sufficient, Above Average, Excellent
9. Unacceptable, Emerging, Minimally (Jackson, 2001-2006)
The third element, scores, is the system of numbers or values used to rate each criterion; scores are often combined with levels of performance. Begin by asking how many points are needed to adequately describe the range of performance you expect to see in students' work, and consider the range of possible performance levels (Stevens & Levi, 2005). Score example: 1, 2, 3, 4, 5.
The last element, descriptors, consists of explicit descriptions of the performance that show how the score is derived and what is expected of the students. Descriptors spell out each level (gradation) of performance for each criterion and describe what performance at a particular level looks like. They describe how one student's work is distinguished from the work of peers and help the assessor differentiate between students' work. Finally, the same descriptors can be used for different criteria within one rubric. For example, the four levels of performance Excellent, Good, Fair, and Poor can be used for the separate criteria of Accuracy, Organization, Punctuation & Grammar, and Spelling. Descriptors should be detailed enough to differentiate between the levels and to increase the objectivity of the rater or assessor.
Subsequently, an important aspect of working with rubrics is developing a grading rubric. The steps to take into account are as follows: first, select a performance/assignment to be assessed; second, list the criteria; third, write the criteria descriptions; fourth, determine the level-of-performance adjectives; fifth, develop scores in numerical values; sixth, write the descriptors; and last, evaluate whether the rubric matches the instructional goals and objectives (Stevens & Levi, 2005).
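The grid structure described above (criteria, levels of performance, scores, and descriptors) can be sketched as a simple data structure. The sketch below is illustrative only: the criteria, abbreviated descriptors, level labels, and the names `LEVELS`, `RUBRIC`, and `grade` are hypothetical examples, not taken from the source.

```python
# Levels of performance mapped to numerical scores (cf. "Score example").
LEVELS = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}

# Each criterion carries one descriptor per level (abbreviated here).
RUBRIC = {
    "Accuracy": {
        "Excellent": "All information about the parts is correct.",
        "Good": "Minor factual slips that do not mislead the client.",
        "Fair": "Several inaccuracies; main points still recoverable.",
        "Poor": "Information is largely incorrect.",
    },
    "Organization": {
        "Excellent": "The comparison is clearly structured throughout.",
        "Good": "Mostly clear structure with small lapses.",
        "Fair": "Structure is inconsistent.",
        "Poor": "No discernible structure.",
    },
}

def grade(ratings):
    """Convert per-criterion level ratings into a total score."""
    return sum(LEVELS[level] for level in ratings.values())

# An analytical rubric gives one rating per criterion.
ratings = {"Accuracy": "Good", "Organization": "Excellent"}
total = grade(ratings)                        # 3 + 4 = 7
maximum = len(RUBRIC) * max(LEVELS.values())  # 2 criteria x 4 points = 8
```

A holistic rubric, by contrast, would collapse this grid into a single overall rating rather than summing per-criterion scores.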
The educational experts Stevens and Levi (2005) put forward four types of rubrics: holistic, analytical, generic, and task-specific. Each of them is described as follows. The first is the holistic rubric, which assesses all criteria together as a single score. Holistic rubrics are good for evaluating overall performance on a task. Because only one score is given, holistic rubrics tend to be easier to score. However, they do not provide detailed information on student performance for each criterion; the levels of performance are treated as a whole. Reasons to use a holistic rubric are:
1. simple tasks and performances such as reading fluency or response to an essay question
2. scoring quickly, when no detailed feedback needs to be provided
3. scoring the overall process/product as a whole without judging the component parts separately (Nitko, 2001)
4. getting a quick snapshot of overall quality or achievement
5. judging the impact of a product or performance (Arter & McTighe, 2001: 21).
The second is the analytical rubric. In this rubric, each criterion is assessed separately, using different descriptive ratings, and each criterion receives a separate score. Analytical rubrics take more time to score but provide more detailed feedback. Reasons to use an analytical rubric are:
1. judging complex performances involving several significant criteria
2. helping students and lecturers identify strengths and areas for improvement (feedback); note that an analytical rubric takes longer to prepare because it judges the component parts separately (Nitko, 2001), and assessing students' tasks with it is time-consuming
3. providing more specific information or feedback to students (Arter & McTighe, 2001: 22).
The third is the generic rubric, which contains criteria that are general across tasks and can be used for similar tasks or performances (or to evaluate/assess a process, such as problem solving). Criteria are assessed separately, as in an analytical rubric. Reasons to use a generic rubric are:
1. when students will not all be doing exactly the same task, or when students have a choice of what evidence to present to show competence in a particular skill or product
2. when instructors are trying to judge consistently across different course sections (Arter & McTighe, 2001: 30).