New literacy is built upon a change in text. Understanding what a text is, in this sense, is essential to our understanding of new literacy as well as its pedagogy. However, research so far, including theorisation, has only touched the surface of this shift, regardless of whether we are talking about semiotic texts or social texts, or whether pragmatics can be applied to the study of multimodal text. The latter is in fact quite useful if we extend its definition from language in use to modality/mode in use. But first of all, we have to rethink text at a micro-level rather than reiterate the now commonsensical perception that multiple modes can be put together to form a text. In other words, we need to understand how textual elements are organised in order to redefine many macro-level representations, specifically genre and discourse. In fact, through the lens of multimodality, we can and should rethink many textual conventions. For example, the recount in school settings is commonly taught as a written text type that usually involves certain textual features such as orientation, development, and coda. But with multimodality, it is possible for a student simply to record a recount, or to compose a document in which hyperlinks, images, and videos are embedded. In this sense, the text type is still a recount and the generic structure may stay the same. However, many textual features can be very different. In the case of a recorded audio or video text, it may be possible to notice many new features such as repetition, pauses, missing words/links, and more complex clause patterns. In the case of multimodal composition, it is possible to notice that dialogues that used to be set off with double quotation marks are now replaced by short, edited audio or video clips.
Such a change of text type challenges us to think even further and to question the definition of many terms. For example, when a teacher prepares a lesson on connectives between clauses or between paragraphs, what traditionally comes to mind are words and collocations such as and, but, however, yet, consequently, insofar, and nevertheless. But in a multimodal text, first, the functions of such connectives may be taken over by, say, images, sounds, beats, or emoticons; and second, there may be much greater flexibility in using connectives, since they are no longer limited to written forms. Similarly, we may have to rethink rhetorical or stylistic devices such as simile, metaphor, hyperbole, personification, and onomatopoeia. We may want to know whether an image can perform the function of metaphor in the expression that he is as strong as a horse, by juxtaposing images of a man’s muscles and a horse. In the case of onomatopoeia, we can probably do much better. For example, even though we would write something like "the dog barks", we know that in reality dogs never sound that way; in fact, very few written imitations of natural sounds are accurate. However, since it is now feasible to capture such sounds or locate them in a database, it would be much more desirable to embed authentic sounds.