This lesson shows how to use Python to transliterate automatically a list of words from a language with a non-Latin alphabet to a standardized format using American Standard Code for Information Interchange (ASCII) characters. It builds on the knowledge of Python from the lessons “Viewing HTML Files,” “Working with Web Pages,” “From HTML to List of Words (part 1)” and “Intro to Beautiful Soup.” At the end of the lesson, we will use the transliteration dictionary to convert the names from a database of the Russian organization Memorial from Cyrillic into Latin characters. Although the example uses Cyrillic characters, the technique can be reproduced with other alphabets using Unicode.

What Is Transliteration and for Whom Is It Useful?

Transliteration is something that most people do every day, knowingly or not. Many English speakers would have trouble recognizing the name Владимир Путин but know that Vladimir Putin is Russia’s current president. Transliteration is especially useful with names, because a standardized transliterated name is often the same as a translated name. (Exceptions arise when someone’s name is translated in a non-uniform way. Leon Trotsky’s Russian name would be transliterated in a standardized form as Lev Trotskii.)

But transliteration has other uses too, especially for scholars. In many fields, the publishing convention is to transliterate any evidence used. Moreover, citations from scholarly works need to be transliterated carefully so that readers can find and verify evidence. Finally, transliteration can be more practical for authors who can type more fluently with Latin letters than in the native alphabet of a language that does not use Latin characters.

This lesson will be particularly useful for research in fields that use a standardized transliteration format, such as Russian history, where the convention is to use a simplified version of the American Library Association-Library of Congress (ALA-LC) transliteration table. Researchers dealing with large databases of names can benefit considerably. However, this lesson will also allow practice with Unicode, character translation and using the parser Beautiful Soup in Python.

The goal of this lesson is to take a list of names from a Russian database and convert them from Cyrillic into ASCII characters. The list we will use is from the site of the Russian human rights organization Memorial. During Glasnost, professional and amateur historians in the Soviet Union gained the ability to conduct research on previously taboo subjects, such as repression under Stalin. Banding together, they founded Memorial to collect and publicize their findings. The NGO conducts research on a range of civil rights abuses in Russia, but collecting data about the victims of Stalinism remains one of its main functions. On the Memorial website researchers can find a database with some three million entries of people who were arrested or executed by Stalin’s regime. It is an important resource on a dark topic. However, because the database has many, many names, it lends itself nicely to automated transliteration. This lesson will use just the first page of the database, but using the lesson on “Automated Downloading with Wget,” it would be possible to go through the entire database as fast as your computer would allow.

We need to start by modifying the process found in the lesson “Working with Web Pages.” There we learned how to open and copy the HTML from a web page in Python. But what if we want to open a page in a language that does not use Latin characters? Python can do this, but we need to tell it how to read these letters using a codec, a library of codes that allows Python to represent non-ASCII characters. Working with web pages makes this easy because almost all web pages specify what kind of encoding they use in the page’s headers. In Python, opening a web page does not just give you the HTML; it creates an object with several useful attributes. One of these gives access to the page’s headers, which behave a lot like a Python dictionary holding information that is important to web programmers. For our purposes, what is important is that the encoding is stored under the ‘content-type’ key.

The ‘content-type’ is telling us that the file stored at the URL we accessed is in HTML and that its encoding (after ‘charset=’, meaning character set) is ‘windows-1251’, a common encoding for Cyrillic characters. You can visit the webpage and view the Page Source and see for yourself that the first line does in fact contain a ‘content-type’ variable with the value text/html; charset=windows-1251. It is not so hard to work with the ‘windows-1251’ encoding. However, ‘windows-1251’ is specifically for Cyrillic and will not handle all languages. For the sake of learning a standard method, what we want is Unicode, a coding set that handles not just Cyrillic but characters and symbols from virtually any language. (For more on Unicode, see the What Is Unicode page.) Converting into Unicode gives us the potential to create a transliteration table that could cover multiple languages.
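
The header-based approach described in this lesson can be sketched without touching the network. The example below is only an illustration under stated assumptions: the helper `charset_from_content_type`, the sample header string, the sample bytes, and the partial letter table are all hypothetical stand-ins, not the lesson's own code or the full ALA-LC table. It reads the charset out of a ‘content-type’ value, decodes ‘windows-1251’ bytes into a Unicode string, and then maps a handful of Cyrillic letters to ASCII.

```python
def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a content-type value, or a default.

    Hypothetical helper: it looks for a 'charset=' parameter after the
    first semicolon, e.g. 'text/html; charset=windows-1251'.
    """
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip("\"'").lower()
    return default


# Assumed sample data standing in for a real response: the header value
# quoted in the lesson, and a name encoded the way the server would send it.
header = "text/html; charset=windows-1251"
raw_bytes = "Владимир Путин".encode("windows-1251")

# Decode the bytes into a Unicode string using the advertised encoding.
encoding = charset_from_content_type(header)
text = raw_bytes.decode(encoding)

# A deliberately partial table covering only the letters in the sample name.
PARTIAL_TABLE = str.maketrans({
    "В": "V", "П": "P", "а": "a", "д": "d", "и": "i", "л": "l",
    "м": "m", "н": "n", "р": "r", "т": "t", "у": "u",
})

ascii_name = text.translate(PARTIAL_TABLE)
print(ascii_name)
```

With a live page opened through `urllib.request.urlopen` in Python 3, the response's `headers` object should offer the same information through `headers["content-type"]`; the manual parsing here simply keeps the sketch self-contained. Any letter missing from the partial table passes through `translate` unchanged, so a real table would need an entry for every character in the alphabet.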