Citation:
Abstract:
Judeo-Spanish differs from late 15th-century Spanish and from modern Spanish in several respects, including morphology, syntax, and semantics, but the most visible difference is the alphabet. Since the end of the 19th century, Judeo-Spanish has been written in various alphabets: Greek, Cyrillic, and especially Latin. The Hebrew alphabet, however, had been in use since ancient times and was finally abandoned only in the 1940s. As a result, the majority of Judeo-Spanish texts are written in Hebrew characters.
CoDiAJe is an annotated diachronic corpus, developed in TEITOK, that includes documents produced from the 16th century up to the present day. Its significance lies in the fact that the tool processes linguistic data in all of the alphabets mentioned above, allowing users to visualize each text in five orthographic forms: the original version in which it was written; its transcription in Latin characters; an expanded form that completes abbreviations or corrects defective writing; a version in modern Judeo-Spanish; and a version in modern Spanish orthography. CoDiAJe enables users to search not only for a specific word but also for all its linguistic and orthographic variants in the different alphabets. During the annotation process, some tags from the EAGLES tagset for Spanish were modified and others were created; these are merely steps towards the creation of an accurate tagset for Judeo-Spanish. The digitized texts are also enriched with semantic-conceptual information and with information on the affiliation of all non-Romance elements.