November 23 2017
This study aimed to evaluate and examine the appropriateness of vocabulary presentation in the Iranian English textbooks of Junior High school known as the Prospect Series through a corpus-based investigation.
To determine the characteristics of the frequency of the words contained in the Iranian Junior High School English Language Textbooks (Prospects 1-3)
To investigate the characteristics of vocabulary loading within and across the Iranian Junior High School English Language Textbooks (Prospects 1-3).
To identify the extent to which the words in the GSL and the AWL are covered in the Iranian Junior High School English Language Textbooks (Prospects 1-3).
To identify the recycling characteristics of the words from the GSL and the AWL in the Iranian Junior High School English Language Textbooks (Prospects 1-3).
To determine the distribution of the randomly selected words from the GSL and the AWL within and across the entire Iranian High School English Language textbooks (Prospects 1-3).
Research Question 1: What are the general characteristics of the frequency of the words contained in the Prospects Series?
Research Question 2: What are the characteristics of vocabulary loading in the Prospects Series?
Research Question 3: To what extent are the words in the GSL and the AWL covered in the Prospects Series?
Research Question 4: What are the recycling characteristics of the words from the GSL and the AWL in the Prospects Series?
Research Question 5: How is the distribution of the randomly selected words from the GSL and the AWL within and across the entire set of the Prospects Series?
3.5. Data Collection Procedure
The data collection procedure consisted of several phases. In the first phase, all the textbooks were downloaded from the website of Iranian School Books, accessible at http://www.chap.sch.ir. Then, the PDF files were converted into text files so that they could be processed by WordSmith Tools (Version 7.0), the RANGE and Frequency programs (Heatley, Nation & Coxhead, 2002), and R Software. After the data for the individual lessons had been collected, the lessons of each textbook were collected and saved as separate files to represent that textbook in comparisons between the textbooks. For data analysis, descriptive analyses were obtained, which are detailed in the next section. An overview of the data collection methods is shown in Figure 3.1.
The results of this study could be used as guidelines to provide recommendations on the teaching of English. As Biber, Conrad and Reppen (1994) have suggested, corpus-based research sheds new light on some of our most basic assumptions about English grammar, and as a result it offers the possibility of more effective and appropriate pedagogical applications.
Besides grammar books, internet sources, dictionaries and the textbooks, teachers may consider using a concordancer. A concordancer is a computer program that finds the occurrences of every single word or phrase in a text (Sinclair, 1991). Teachers could also retrieve concordance entries from the accessible website and prepare exercises for their students. Concordancing is one example where “the technology can be used to promote autonomous learning.” Such an approach may help in the “empowerment of students” (Butler, 1990).
In general, the results and findings discussed in this research could be useful, first hand, for the textbook compilers and designers in Iran. First, because there are two books Vision 2 and Vision 3 which have not been compiled and introduced yet, and knowing all the results reported here could help them in designing the textbooks so that the deficiencies and shortages would be compensated in the upcoming textbooks. The textbook compilers can take into account the loading patterns reported here and apply the right method when the lessons in each textbook are organized, as well as across the two upcoming textbooks. As for the coverage of the words from the GSL and AWL, they can take into account the introduced list of missing words so that such important words could be presented in the upcoming textbooks. By this the exposure to the high frequency words would be increased, hopefully resulting in a better learning especially when the students graduate from high school levels and intend to continue their studies by going to colleges and universities. As for the inadequate repetition and distribution of the words from the GSL and AWL, the textbook compilers could also endeavor to apply and implement a better repetition and distribution, at least for the upcoming textbooks, while taking into account the results of this study when they intend to revise the already introduced textbooks of Prospect Series and Vision 1.
In addition to textbook compilers and policy makers, the English teachers working for the MoE could benefit from the findings in diverse ways. First and foremost, they could balance their teaching once they know, from the findings of this research, that some lessons are more difficult in terms of vocabulary loading; they could plan to teach those difficult lessons over a longer period so that their students are not exhausted by exposure to so many new types, which should support better learning. In addition, teachers could use the lists of missing GSL and AWL words so that, whenever they think it appropriate, they can add some words to their lesson plans and thereby expose their learners to more vocabulary, compensating for the very low coverage of these words. For instance, this could be achieved by selecting reading excerpts or listening activities through which GSL and AWL words not introduced in the book could be taught. Knowing which words from the GSL and AWL have been repeated fewer than 7 times could enable teachers to provide more repetition and use of such words. This would bring more understanding of, and exposure to, the most important high frequency and academic words. Finally, when teachers are aware that some words are poorly distributed in certain textbooks, they can compensate by creating situations in which such words are taught at the levels where they are weakly distributed or absent.
Likewise, the students could benefit from the results by having a list of the words mentioned above so that they could do self-studies and this would help, especially the ones who are more enthusiastic, so that without the teachers’ effort, the learners themselves could plan for a better learning. Generally, researchers in the field of textbook analysis and corpus linguistics could benefit from the findings especially when they deal with Prospect Series and Vision Series and this enables them to have a clear and reliable perspective about the newly-designed textbooks in Iran. The next section elaborates on the recommendations for future research.
This study focused only on the presentation of types and tokens; other studies could analyze parts of speech, including adjectives, nouns, verbs, and adverbs.
Other research could replicate this study and instead of the GSL, other references such as BNC could be applied when they focus on coverage of high frequency words, their distribution, and recycling.
The future research could focus on other aspects apart from vocabulary, such as prepositions, modal auxiliary words, grammar, syntax, etc.
Researchers could also devise checklists specialized for vocabulary presentation and distribute among the teachers to find out if their ideas resemble the realities regarding the vocabularies as reported in this research.
WordSmith Tools Version 7.0
Afterwards, ten words were randomly selected using R Software (Version R-3.4.0 for Windows), downloadable from http://wbc.upm.edu.my/cran/bin/windows/base/. R is an open-source software environment and programming language for statistical computing and graphics. University Putra Malaysia is a pioneer in providing a Comprehensive R Archive Network (CRAN) mirror among higher education institutions in Malaysia, available at http://wbc.upm.edu.my/cran/. The software provides a wide variety of statistical and graphical techniques and is highly extensible (R Core Team, 2013).
Using R Software, with the precondition of randomly selecting 10 words from a range of 1-790, the software yielded a list of 10 numbers (Appendix 1). By referring to the lists of the GSL words presented in the Prospect Series and Vision 1, the numbers given by R Software were assigned to the words; the results are detailed in Table 4.30 and Table 4.31.
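The selection step described above can be sketched as follows. This is only an illustration in Python, not the R call used in the study; the function name `pick_word_indices`, the seed value, and the assumption that the numbers were drawn without replacement are all hypothetical.

```python
import random


def pick_word_indices(population_size=790, sample_size=10, seed=None):
    """Draw distinct 1-based positions from a numbered word list.

    Sampling without replacement is an assumption; the study only states
    that 10 numbers were selected from the range 1-790.
    """
    rng = random.Random(seed)
    # sample() never returns the same position twice
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# The seed is hypothetical and only makes the illustration repeatable.
indices = pick_word_indices(seed=2017)
print(indices)
```

Each number returned would then be looked up in the numbered list of GSL words to obtain the word at that position.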
RANGE (Heatley, Nation & Coxhead, 2002) yields the number of words and a text frequency figure. RANGE comprises three unique lists: list one and two (hereafter L1 and L2) represent the 2000 most frequent words in English, while L3 includes words that are not found in the first 2000 words but are frequent in secondary school and university texts. These lists are based on the GSL and the AWL.
RANGE differentiates between three unique classes: tokens, types and families. For some purposes, tokens and types were considered in the analyses of this study, while in other instances only the families (lemmas) were the categories of analysis. RANGE is downloadable and free (http://www.victoria.ac.nz/lals/about/staff/paul-nation).
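The kind of question RANGE helps to answer (which list words occur in a text, which are missing, and which are rarely recycled) can be sketched in a few lines. This is not the RANGE program itself: it matches exact word forms rather than word families, the function name and the tiny word list are hypothetical, and the threshold of 7 repetitions is taken from the recycling criterion mentioned in the recommendations.

```python
from collections import Counter


def coverage_and_recycling(tokens, reference_words, min_repeats=7):
    """Compare a token list against a reference word list.

    Exact word forms are matched; grouping into families is not attempted.
    """
    counts = Counter(tokens)
    reference = set(reference_words)
    present = set(counts)
    covered = sorted(reference & present)
    missing = sorted(reference - present)
    # Covered words that occur fewer than min_repeats times
    under_recycled = sorted(w for w in covered if counts[w] < min_repeats)
    return {
        "coverage_pct": round(100 * len(covered) / len(reference), 2),
        "covered": covered,
        "missing": missing,
        "under_recycled": under_recycled,
    }


# Hypothetical miniature data, for illustration only
tokens = ["ask"] * 7 + ["answer"] * 3 + ["teacher"]
report = coverage_and_recycling(tokens, ["ask", "answer", "analyse"])
print(report)
```

In this toy case two of the three reference words are covered, "analyse" is missing, and "answer" falls below the repetition threshold.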
With reference to the GSL, it is said that it contains 2000 headwords and was developed in the 1940s. The frequency figures for most items are based on a 5,000,000 word written corpus. Percentage figures are given for different meanings and parts of speech of the headword. In spite of its age, some errors, and its solely written base, it still remains the best of the available lists because of its information about the frequency of meanings, and West's careful application of criteria other than frequency and range. The classic list of high frequency words is Michael West's General Service List (1953). The 2000 word GSL is of practical use to teachers and curriculum planners as it contains words within the word family each with its own frequency. For example, excited, excites, exciting and excitement come under the headword excite. The GSL was written so that it could be used as a resource for compiling simplified reading texts into stages or steps. West and his colleagues produced vast numbers of simplified readers using this vocabulary. This is actually a very old list being based on frequency studies done in the early decades of this century. Doubts have been cast on its adequacy because of its age (Richards, 1974) and the relatively poor coverage provided by the words not in the first 1000 words of the list (Engels, 1968).
Vocabulary Loading in Prospect Series
Objective Two was established to address the general characteristics of vocabulary loading in the Prospect Series:
• To investigate the characteristics of vocabulary loading within and across the Iranian Junior High School English Language Textbooks (Prospects 1-3).
In line with this, the following research questions and subquestions were formulated:
• Research Question 2: What are the characteristics of vocabulary loading in the Prospects Series?
• Subquestion 1: What are the characteristics of vocabulary loading of each textbook of the Prospects Series?
• Subquestion 2: What are the general characteristics of vocabulary loading in the lessons in each textbook of the Prospects Series?
Table 4.8 presents the results yielded by the WordList tool of WordSmith Tools (Version 7) with reference to the Standardized Type/Token Ratio (STTR), density ratios (TTR), and consistency ratios. Several criteria were employed in this section: the STTR to measure the density level of the textbooks, together with density ratios (TTR) and consistency ratios, to address Research Questions Three and Four. A textbook with a higher STTR percentage introduces more types for every 1000 tokens (Mukundan & Aziz, 2009). Jin, Tong, Nor, Tarmizi, and Mahmad (2012) used both the density ratio and the consistency ratio to discuss vocabulary density and text difficulty. As for the density ratio (TTR), they note that the highest density ratio indicates passages crammed with many tokens and frequent introductions of new words. According to Nation (1990), the lexical density index (LDI or TTR) of a given text is calculated by dividing the number of different words (types) by the total number of words (tokens) in the text. This index has been used by many researchers to discuss vocabulary input in textbooks (e.g., Jin et al., 2012; Mármol, 2011). Moreover, Jin et al. (2012) state that the consistency ratio can be obtained with a simple formula, namely dividing the number of tokens by the number of types. The consistency ratio shows after how many words a new word is introduced in the textbook; the lowest ratio means that a particular textbook or lesson is difficult because new words are introduced frequently.
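The three measures defined above can be sketched as follows. The tokenizer is a simplifying assumption, and `standardized_ttr` only approximates the idea of averaging the type/token ratio over consecutive 1000-token chunks; it is not a reimplementation of WordSmith Tools, whose exact handling of the final partial chunk is assumed here.

```python
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic word forms (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def density_ratio(tokens):
    """Types divided by tokens, expressed as a percentage (TTR)."""
    return 100 * len(set(tokens)) / len(tokens)


def consistency_ratio(tokens):
    """Tokens divided by types: running words per new word."""
    return len(tokens) / len(set(tokens))


def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks; a trailing partial chunk is ignored."""
    full_chunks = len(tokens) // chunk_size
    if full_chunks == 0:
        raise ValueError("text is shorter than one chunk")
    ratios = [
        density_ratio(tokens[i * chunk_size:(i + 1) * chunk_size])
        for i in range(full_chunks)
    ]
    return sum(ratios) / len(ratios)


sample = tokenize("Listen to the conversation and answer the questions. Listen and check.")
print(len(sample), len(set(sample)))   # 11 tokens, 8 types
print(round(density_ratio(sample), 2))
print(consistency_ratio(sample))
```

Because a longer text repeats more function words, its plain TTR tends to fall as length grows, which is why a chunk-based figure such as STTR is preferred when texts of different sizes are compared.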
Mármol (2011) notes that the lexical density of a text may indicate its difficulty: texts with low density (less than 40-50%) are considered not dense and relatively easy to understand, whereas texts with an LDI over 60-70% are lexically dense and more complex to read.
As observed in Table 4.8 and Figure 4.2, there is a steady rise in the total number of tokens from Prospect 1 to Prospect 3. There is also an upward trend in the total number of types found in the Prospect 1-3 textbooks, with noteworthy growth in the number of types in Prospect 3. Again, for the purpose of these research objectives, the unit of analysis is the word, regardless of whether some types belong to the same headword or lemma.
Table 4.8: Lexical Density Index for the Prospect Series
As for this section, the current research aimed at determining and identifying the top fifty words (both grammatical and content word types) based on the following Subquestion:
• Subquestion 2: What are the most frequent words contained in the Prospects Series?
The top fifty words for each textbook in Prospect Series taught at Junior High Schools of Iran, alongside their occurrences and percentages with respect to the total number of tokens in the text are tabulated in descending order in Table 4.3, Table 4.4, and Table 4.5.
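A frequency list of this kind can be produced with a short script. The sketch below is an illustration rather than the WordList procedure used in the study; the function name `top_words`, the simple tokenizer, and the sample sentence are assumptions.

```python
import re
from collections import Counter


def top_words(text, n=50):
    """Return (word, occurrences, percentage of all tokens) in descending order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    return [
        (word, count, round(100 * count / total, 2))
        for word, count in Counter(tokens).most_common(n)
    ]


# Hypothetical sample text, for illustration only
demo = "Listen to the conversation. Answer the questions. Ask your teacher about the answer."
for word, count, pct in top_words(demo, n=2):
    print(word, count, pct)
```

Run on the full text of each textbook with n=50, the same function would give the occurrences and percentages tabulated for the top fifty words.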
One noticeable point is the predominance of grammatical words (function words) over content words in the three textbooks under study. The criteria for deciding on function words are based on Cook (2013), who introduced the fifty most frequent words in English based on BNC. In Prospect 1, the number of function words equals the number of content words, 25 each. In Prospect 2, there are 29 function words versus 21 content words. In Prospect 3, the balance favors function words, with 35 versus 15 content words.
In terms of shared function words, 20 words were shared by the three textbooks: a, about, and, are, can, do, he, how, I, in, is, it, she, the, then, to, what, with, you, and your. The number of words shared by only two textbooks was 9 words, including at, his, my, no, of, on, there, they, we, and where. Nonetheless, there were 9 non-shared words which occurred in only one textbook: did, does, for, from, her, if, not, that, and who.
As regards the content words, only 6 content words are shared by the three textbooks: answer, ask, conversation, listen, talking, and teacher. A further 10 words are shared by only two textbooks: check, examples, friend, like, practice, questions, say, student, work, and yes. Also, 23 words are unique to one of the books: address, age, Ali, below, card, city, classmates, doing, English, fill, have, health, job, letters, name, number, old, play, sentences, some, sounds, spell, and write.
Table 4.3: The top fifty words in Prospect 1
A closer look at the results presented in Table 4.8 and Figure 4.2 shows that Prospect 3 has the highest density level (STTR=30.72) of the three English language textbooks. This implies that at this grade (Grade 9 of Junior High School), students are expected to be ready to handle a larger number of words; therefore, more types are introduced. Prospect 1 has the lowest density level (STTR=22.53), followed by Prospect 2 with an STTR of 24.30; that is, the density level grows steadily from Prospect 1 to Prospect 3, meaning that the last book is relatively more difficult than the other two in terms of vocabulary load. To sum up, when STTR is considered, the first book of the junior high school series (Prospect 1) is the least difficult in terms of vocabulary load, the difficulty increases grade by grade, and the last book in the series (Prospect 3) is the most difficult, with the highest STTR of 30.72.
It can therefore be claimed that lexical variation differs across the three books. Overall, the STTR for the three books is 26.74, which denotes a rather low level of difficulty; that is, the three books have a reasonable loading for junior high school students if only STTR is taken into account. When the more detailed analyses of the density ratio are considered, however, Prospect 3 has the highest density ratio in the series, meaning that it is crammed with many tokens and frequent introductions of new words. The consistency ratio of the same textbook is 6.75, implying that a new word is introduced after roughly every 6 words. This ratio is lower than those of Prospect 1 and Prospect 2, with consistency ratios of 7.38 and 7.81 respectively, meaning that Prospect 3 is a difficult textbook because new words are introduced more frequently than in the other two textbooks, and this should be the case. On the other hand, a point of debate arises when both density and consistency ratios are taken into account, as they show a small variation between Prospect 1 and Prospect 2. Although the STTR values follow an upward trend from the first to the last textbook, the density and consistency ratios shown in Table 4.8 and Figure 4.3 indicate that Prospect 1 has a slightly higher density ratio and a correspondingly lower consistency ratio, meaning that Prospect 1 is a bit denser than the textbook that follows it, and this should not be the case. To conclude, vocabulary loading and density in the Prospect Series could be considered reasonable, as rather low indexes were expected given that these textbooks are mostly for elementary levels and gradually move toward advanced levels.
Figure 4.3: Density and Consistency Ratios of the Prospect Series as a Whole
To answer Subquestion One, the results yielded by WordSmith Tools Version 7 (WordList) have been tabulated in Table 4.2 and plotted in Figure 4.1, showing the total number of running words (tokens) and the total number of distinct words (types) observed in the textbooks. To fulfil this objective, only the quantities of tokens and types were taken into account, without lemmatization, because lemmatization would cluster the types into families and reduce the word count by showing each family only under its headword. Prospect 1 introduces 3,382 running words and 458 types. In Prospect 2, all the units together reach 3,891 tokens, of which 498 are types. Prospect 3 increases its tokens to 5,151, and there is an upward trend in the types presented (763).
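As a worked check, the token and type counts reported here can be fed into the two formulas given for vocabulary loading (types divided by tokens, and tokens divided by types). The consistency ratios obtained this way agree with the 7.38, 7.81 and 6.75 quoted for Prospects 1-3; the density percentages are derived here from the counts and are not copied from Table 4.8.

```python
# (tokens, types) as reported for each textbook
counts = {
    "Prospect 1": (3382, 458),
    "Prospect 2": (3891, 498),
    "Prospect 3": (5151, 763),
}

ratios = {}
for book, (tokens, types) in counts.items():
    density = round(100 * types / tokens, 2)   # types per 100 tokens
    consistency = round(tokens / types, 2)     # running words per new type
    ratios[book] = (density, consistency)
    print(f"{book}: density {density}%, consistency {consistency}")
```

The derived figures also reproduce the pattern noted in the discussion: Prospect 1 comes out slightly denser than Prospect 2, while Prospect 3 is the densest of the three.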
Table 4.2: The total number of tokens and types in Prospects 1-3
Characteristics of Vocabulary Presentation in Prospect Series
• Research Question 1: What are the general characteristics of the frequency of the words contained in the Prospects Series?
• Subquestion 1: How many words are introduced in the Prospects Series?
• Subquestion 2: What are the most frequent words contained in the Prospects Series?
Table 4.1: General Description of the Prospect Series (1-3)
Apart from the aforementioned variations among the textbooks in terms of function words and content words, and shared and non-shared content and function words, Cook (2013) lists the fifty most frequent words based on BNC, as shown in Table 4.6. As stated by Bartsch (2004), all of the top 50 words in BNC belong to the class of function words, which are typically contrasted with content words based on the presence or absence of lexical content. Cook (2013) also notes that the top 100 words in BNC account for 45% of all the words in BNC, suggesting that learning these 100 words helps learners to identify roughly half of the words they might encounter in English. Bearing in mind the prominence of the 50 most frequent words in BNC and the estimated value of knowing 100 words, this study compared the three textbooks of the Prospect Series to determine how these words are presented.
Indeed, a glance at the first ten words listed reveals discrepancies: 6 words are shared, but their rankings are very inconsistent both within the textbooks and in comparison with BNC. Although the first word in BNC is “the”, this word ranks first in Prospects 2 and 3 but third in Prospect 1, which is unusual. Another odd pattern with respect to ranking is that the words “of”, “in”, “it”, and “was” are absent from the Prospect Series’ top ten words although they are among the top ten words in BNC. In Prospects 1-3, “was” does not appear in the top 50 lists. As for “of”, it is absent in Prospect 1, while it is the 27th word in Prospect 2 and takes the 32nd rank in Prospect 3, as compared to the 2nd rank in BNC. As for “in”, it is 20th in Prospect 1, 14th in Prospect 2, and 21st in Prospect 3, as compared to the 5th rank in BNC; again an unwieldy and awkward presentation. With reference to “it”, the rank is 49th in Prospect 1, 12th in Prospect 2, and 13th in Prospect 3, while being 7th in BNC. As for “your”, which is the most frequent word in Prospect 1, it never appears among the top 50 words of BNC, meaning that this word has been recycled extraordinarily, even more than “that”. “Your” takes the 17th position in Prospect 2 and the 9th rank in Prospect 3, which means the three textbooks use it far more heavily than BNC does. Moreover, the word “and” is even more frequent than “the” in Prospect 1, which should not be the case; “and” is ranked appropriately only in Prospect 2 (3rd rank), similar to BNC, whereas it ranks 4th in Prospect 3.
There are many other discrepancies when the textbooks of the Prospect Series are compared with one another and with BNC. For example, “if” appears only in Prospect 1, where it ranks 42nd, even higher than in BNC (47th). Whereas “his” is absent in Prospect 2, it ranks 41st in the other two textbooks, lower than in BNC (27th). The word “can” is another example: in BNC it is the 48th most frequent word, while in the Iranian textbooks it ranks 13th, 6th, and 20th in Prospects 1-3, respectively, and is thus more frequent than in BNC. This discussion suffices to claim that the ranking and frequency of the most frequent words in none of the Prospect Series textbooks accord well with those of BNC: in the Iranian textbooks some words have been recycled even more than in BNC, while others are either absent or poorly presented.
The same awkward pattern can be observed in the presentation of content words: whereas the word “name” stands 9th in Prospect 1, it does not appear among the top 50 most frequent words in the other two textbooks. “Teacher” is the 10th most frequent word in textbook 1 and appears as the 15th and the 38th in the other two textbooks, respectively. “Student” is the 17th most frequent word among the top 50 of Prospect 1 and the 9th in the second textbook, but it is totally absent from the list of textbook three. There are many other discrepancies regarding the content words when only the shared content words in the list are taken into account. “Answer” in Prospects 1 and 3 is the 30th, while being the 25th in Prospect 2. Taking into account the content words shared by the three textbooks, again an incongruent pattern can be documented. “Answer” takes the rank of 31st in book 1, 25th in book 2, and 29th in book 3. Also, “listen” undergoes incongruent repetition in the three textbooks; in Prospects 1, 2, and 3, it takes the ranks of 15th, 18th, and 17th. The words “ask”, “conversation” and “talking” reveal diverse repetition and ranking within each textbook. Nonetheless, none of the abovementioned content words appears in the list of the twenty most frequent nouns, verbs, and adjectives in BNC proposed by Cook (2013) and presented in Table 4.7. To conclude, the textbooks under study suffer from poor presentation of both function words and content words, within and across the books, and particularly when compared and contrasted with BNC.
Figure 4.1: The Total Number of Tokens and Types in Prospects 1-3
Loads of Thanks for Your Attention and Patience