Digitisation of the Berlin Turfan texts (DFG project)   




Digital Turfan Archive


The Turfan collection of the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) is the result of four archaeological expeditions to Central Asia at the beginning of the 20th century. It comprises about 40,000 fragments of oriental manuscripts and prints, whose curatorial management was taken over in 1996 by the State Library Berlin - Prussian Cultural Heritage under a deposit agreement with the BBAW. To protect the originals and to provide easier access to these extraordinarily important resources, the Old Turkish, Middle Iranian and Mongolian parts of the collection were digitised and stored between October 1997 and June 2005 under the care of the Turfan Studies group and the Union Catalogue of Oriental Manuscripts, in a three-stage project funded by the German Research Foundation (DFG). The necessary filming and restoration work was carried out by the State Library Berlin - Prussian Cultural Heritage.

In a joint application also funded by the DFG, about 6,200 Chinese and Tibetan fragments were digitised in cooperation with the State Library Berlin. Under a cooperation agreement with the International Dunhuang Project (IDP) of the British Library, London, signed in June 2005, the digitised texts were linked with metadata and presented in database form on the Internet. In addition, a German version of the IDP database was provided; see IDP Berlin. Between 2008 and 2012, the Syriac, Tocharian and Sanskrit fragments were digitised and entered into the IDP database. The complete Turfan collection is now accessible on the Internet.


Aims of the digitisation:

- Digital archiving and preservation of the irreplaceable information contained on the fragments
- Making the fragments accessible to a wider circle of scholars through presentation on the Internet
- Reducing work with the originals to avoid damaging them


| Signature group | Language/Content | Script | Number of fragments* | Number of digital images |
|---|---|---|---|---|
| Ch/U | Chinese Uighur texts | Chinese | about 1,600 | 4,658 |
| n | Sogdian | Nestorian | 300 | 576 |
| KS | Khotan Saka | Brāhmī | 24 | 41 |
| TS | Tumšuq Saka | Brāhmī | 50 | 101 |
| U | Old Turkish | Uighur | about 6,000 | 14,160 |
| Mainz | diverse | diverse | about 1,500 | 3,854 |
| M | different Middle Iranian languages, especially Middle Persian and Parthian; Sogdian; 1 Bactrian; Old Turkish | Manichaean; partly Sogdian | about 3,500 | 3,606 |
| | | | about 1,000 | 1,486 |
| bs | Sogdian | Brāhmī | 8 | 18 |
| bi | different Middle Iranian languages | Brāhmī | 78 | 90 |
| h | Bactrian | Hephthalite | 7 | 10 |
| Ps | Middle Persian | Pahlavī | 12 | 22 |
| np | New Persian | Arabic | 1 | 2 |
| MongHT | Mongolian | Uighur Mongolian | about 100 | 557 |
| | Chinese | Chinese | 4,306 | 8,734 |
| | Chinese/Xixia | Chinese/Xixia | 1 | 2 |
| | Xixia | Xixia | 5 | 10 |
| | Manchurian | Manchurian | 1 | 2 |
| | Tibetan | Tibetan | 146 | 298 |
| | Syriac | Syriac | 395 | 628 |
| | Sanskrit/Prakrit | Brāhmī/Śāradā/Pāla/ | about 14,000 | 22,899 |
| | Tocharian A/B | Brāhmī | about 6,800 | 8,114 |
| | Middle Indic | Kharoṣṭhī | 9 | 22 |
| total | | | 40,000 | 69,890 |

* Because of the preservation method used at the beginning (collective records; separate glazing of fragments that can be joined; fragments attested by documents but lost as a result of World War II), it is difficult to determine an absolute number of fragments in individual signature groups.



Process of production (until 2005):


- Restoration: only where urgently needed, e.g. to repair damaged glass plates or to smooth, clean and rearrange the text fragments

(Reprography Department of the State Library)

- Photography: making colour slides of the fragments, which are protected between glass plates

- Scanning: scanning the slides at a resolution of 2,700 dpi and saving the TIFF files under file names derived from the fragment's signature: text group, number of the fragment, alphabetical labels of individual fragments glazed together, and identification of recto and verso

- Image editing: rotating the scans to the correct orientation, cropping the detail images, and adjusting brightness and contrast

- File storage: burning the scans to CD; two identical sets of CDs were created and stored in two different places

- Quality management: the progress of digitisation is recorded in an administrative database with slide number, CD number, and information on digital storage and war losses; fragments kept in the Museum of Asian Arts are represented by black-and-white photographs

- File compression: compressing the TIFF files into high-resolution JPEG files (2,700 dpi with a size of about 500 KB) and smaller JPEG files (1,350 dpi with a size of 100 KB, or 675 dpi with a size of 50 KB)

- Presentation on the Internet: the digital Turfan fragments are presented as thumbnail images in tabular form. On the homepage of the Digital Turfan Archive, the text groups are arranged by signature
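
The naming and compression steps above can be illustrated with a minimal sketch. The exact file-name pattern used by the project is not given here, so the pattern below (lower-case text group, zero-padded fragment number, optional letter label, `r`/`v` for recto/verso) and the names `FragmentImage`, `file_name` and `scaled_size` are assumptions made purely for illustration. Only the three resolutions (2,700, 1,350 and 675 dpi) come from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Resolutions named in the text: the master scan and the two reduced JPEG tiers.
MASTER_DPI = 2700
REDUCED_DPI = (1350, 675)


@dataclass(frozen=True)
class FragmentImage:
    """Components of a fragment signature, as listed in the scanning step."""
    text_group: str        # signature group, e.g. "U" or "M"
    number: int            # number of the fragment within the group
    label: Optional[str]   # letter for fragments glazed together, e.g. "a"
    side: str              # "recto" or "verso"

    def file_name(self, extension: str = "tif") -> str:
        # Hypothetical pattern, not the project's documented convention.
        if self.side not in ("recto", "verso"):
            raise ValueError("side must be 'recto' or 'verso'")
        label = self.label or ""
        return f"{self.text_group.lower()}{self.number:04d}{label}_{self.side[0]}.{extension}"


def scaled_size(width_px: int, height_px: int, target_dpi: int,
                master_dpi: int = MASTER_DPI) -> Tuple[int, int]:
    """Pixel dimensions of a derivative when a master scan is downsampled to target_dpi."""
    factor = target_dpi / master_dpi
    return round(width_px * factor), round(height_px * factor)
```

For example, `FragmentImage("U", 12, "a", "recto").file_name()` yields a name in the assumed pattern, and `scaled_size(4000, 2600, 1350)` halves both dimensions of a hypothetical 4000 x 2600 pixel master, since 1,350 dpi is half of 2,700 dpi.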


Future perspectives:

Besides presenting the digital images on the Internet, we will link the Digital Turfan Archive to a database that provides information on the script, language and content of each fragment. The TITUS project (Thesaurus Indogermanischer Text- und Sprachmaterialien) is an example of such an extensive database. Every fragment is linked with information on its transliteration, transcription, translation and the best possible interpretation of its content. VATEC (Vorislamische alttürkische Texte: Elektronisches Corpus), a database of pre-Islamic texts that are published or in progress, and MIRTEXT (Mitteliranische Texte), a database of published Parthian and Middle Persian Manichaean texts, are two examples of text databases that can be linked with the digital images.




for the use of manuscripts from the Berlin Turfan-Collection


1.      The user is urgently requested to inform the ORIENTAL DEPARTMENT in writing and in advance of any planned publication, edition or reproduction of manuscripts from the Turfan-Collection.


Microfilms or other reproductions of these materials must not be passed on to other persons without prior permission from the BERLIN-BRANDENBURGISCHE AKADEMIE DER WISSENSCHAFTEN.


2.      It is the user's responsibility to observe any existing copyright or other personal rights. Any commercial use requires special permission from the BERLIN-BRANDENBURGISCHE AKADEMIE DER WISSENSCHAFTEN.


In any publication, the manuscripts must be referred to as being part of the


                                    Depositum der 
                                    in der 
                                    STAATSBIBLIOTHEK ZU BERLIN - Preussischer Kulturbesitz


with the exact shelf number added.


Should the user wish to express his appreciation of the opportunity to publish, the BERLIN-BRANDENBURGISCHE AKADEMIE DER WISSENSCHAFTEN should be mentioned.


3.      In the interest of a continuous documentation and the information of later users, both the STAATSBIBLIOTHEK (ORIENTAL DEPARTMENT) and the BERLIN-BRANDENBURGISCHE AKADEMIE DER WISSENSCHAFTEN each request one copy of any publication. Should this prove impossible, we at least require bibliographical information about the publication.



Prof. Dr.
Abdurishid Yakup


Telephone: +49 (0)30 20370 472


Berlin-Brandenburg Academy of
Sciences and Humanities

Jägerstraße 22/23
10117 Berlin