Great quotes about language and translation:
"It's a strange world of language in which skating on thin ice can get you into hot water."
-Franklin P. Jones
"Translators live off the differences between languages, all the while working toward eliminating them."
-Edmond Cary
"Learn a new language and get a new soul"
-Czech Proverb
"A different language is a different vision of life"
-Federico Fellini
"Swearing was invented as a compromise between running away and fighting"
-Finley Peter Dunne
"In certain trying circumstances, urgent circumstances, desperate circumstances, profanity furnishes a relief denied even to prayer"
-Mark Twain
Many critics, no defenders,
translators have but two regrets:
when we hit, no one remembers,
when we miss, no one forgets.
-Anonymous
"Common European thought is the fruit of the immense toil of translators. Without translators, Europe would not exist; translators are more important than members of the European Parliament."
-Milan Kundera
"I hope to finish the book before I'm 90. It keeps you alive. The secret of being a translator is not to be in a hurry. Sometimes it takes hours to find a single word."
-Dr. Leonard Rosenman
"There are few efforts more conducive to humility than that of the translator trying to communicate an incommunicable beauty. Yet, unless we do try, something unique and never surpassed will cease to exist except in the libraries of a few inquisitive book lovers."
-Edith Hamilton
"The best translators slip into the glove of a text and then turn it inside out into another language, and the whole thing comes out looking like a brand-new glove again. I'm completely in awe of this skill, since I happen to be both bilingual and a writer, but nevertheless a lousy translator."
-Alma Guillermoprieto
"Language forces us to perceive the world as man presents it to us."
-Julia Penelope
"The quantity of consonants in the English language is constant. If omitted in one place, they turn up in another. When a Bostonian "pahks" his "cah," the lost r's migrate southwest, causing a Texan to "warsh" his car and invest in "erl wells.""
-Author Unknown
"No one means all he says, and yet very few say all they mean, for words are slippery and thought is viscous."
-Henry Brooks Adams
"I personally believe we developed language because of our deep inner need to complain."
-Jane Wagner
"At no time is freedom of speech more precious than when a man hits his thumb with a hammer"
-Marshall Lumsden
"Language is the blood of the soul into which thoughts run and out of which they grow"
-Oliver Wendell Holmes
"If you can speak three languages you're trilingual. If you can speak two languages you're bilingual. If you can speak only one language you're an American"
-Author Unknown
"Dictionaries are like watches; the worst is better than none, and the best cannot be expected to go quite true."
-Samuel Johnson
"Language is the most imperfect and expensive means yet discovered for communicating thought"
-William James
"Language is the means of getting an idea from my brain into yours without surgery"
-Mark Amidon
"He who does not know foreign languages does not know anything about his own"
-Johann Wolfgang von Goethe, Kunst und Alterthum
"I would never use a long word where a short one would answer the purpose. I know there are professors in this country who 'ligate' arteries. Other surgeons only tie them, and it stops the bleeding just as well"
-Oliver Wendell Holmes
"Our language is funny - a fat chance and slim chance are the same thing"
-J. Gustav White
"Whenever ideas fail, men invent words"
-Martin H. Fischer
"Learning preserves the errors of the past, as well as its wisdom. For this reason, dictionaries are public dangers, although they are necessities"
-Alfred North Whitehead
"Every American child should grow up knowing a second language, preferably English"
-Mignon McLaughlin
"But if thought corrupts language, language can also corrupt thought"
-George Orwell
Human Driven Automated Language Translation and Linguistic Solutions
High quality automated translation is Asia Online’s flagship service. In addition to reducing the cost and time of traditional human translation services dramatically, automated translation enables customers to undertake translation of new content that was not previously possible.
Asia Online’s Translation Platform can be tuned and optimized for many different types of content including:
- Highly repetitive technical content where productivity gains with MT can dramatically exceed what is possible with just using TM alone
- Knowledge content that facilitates and enhances the global spread of critical knowledge like patents, leading scientific publications and academic research in science and technology
- High volume localization projects like service manuals, customer self service support and knowledge base content
- Content that is created to enhance and accelerate communication with global customers who prefer a self-service model
- Content that would just not get translated otherwise like call center customer feedback
- High volume, high value real-time global customer communications conducted via email and instant messaging
- Content that cannot afford human translation because of sheer volume, cost or timeliness
- High value content that is changing every hour and every day like important social and business discourse / blogs / discussions that can affect business conditions and change business strategy
- Consumer opinions on social networking sites that discuss products, companies, buyer behavior
- Content that does not need to be perfect but just basically understandable
- Various kinds of user generated content, like product and user reviews, that are becoming the fastest growing content on the Internet
- SMS & MMS translation for mobile users: Asia Online’s machine translation can enable cross-cultural SMS and MMS messaging through participating mobile carriers.
Asia Online also has a comprehensive suite of tools to analyze, modify, clean and prepare linguistic assets for automated translation processes. These include:
- Alignment tools to ensure that parallel data is accurately paired and aligned
- Translation memory quality assessment and cleanup tools
- Translation memory normalization and scrubbing tools
- Data format conversion tools
- Dictionary and glossary management tools
- Optical Character Recognition – Asia Online is the first to deliver very high accuracy OCR services for a number of complex Asian language scripts.
- Word and phrase segmentation - Asia Online has software that automatically segments Asian language words and phrases, which speeds automated spell-checking and indexed text searches
Finally, Asia Online also has tools and a management environment to facilitate the gathering and processing of human feedback from internal experts and from crowdsourcing.
Asia Online’s flagship technology is its Statistical Machine Translation (SMT) platform. Years of research and development have gone into building a core platform for translation between hundreds of language pairs. European, Middle Eastern and Asian languages are currently supported, with many more in development. High quality machine translated output is possible at scales that human translators could never match. A single Asia Online translation server can translate more in one minute than a human can translate in an entire day.
Asia Online takes a unique approach to SMT by using only clean data. This requires more up-front work than the traditional approach that uses large volumes of dirty data in the hope that the good data will statistically rise to the top and the bad data will fall to the bottom. The clean data approach can deliver higher quality output with as little as 1/100th of the data used with dirty data. Research has shown that when a corpus contains more than 10% dirty data this has a measurable negative impact on SMT systems.
Our SMT sub-site provides details on what SMT is, how it compares to more traditional forms of translation technology, guides to preparing data for SMT, differences between clean and dirty data approaches and in-depth details on our SMT platform.
In order to build high quality SMT engines, alignment of sentences in paired languages is required. Asia Online has developed a unique set of tools for sentence alignment that are able to align mass volumes of data with high levels of accuracy. Support is provided for Romanized and Asian languages, including Chinese, Japanese, Korean, Hindi...
Many organizations have bilingual information available to them that could be used to help train an SMT engine. Often, however, the data is not in translation memory format; it consists of raw documents that have only a loose relationship in terms of sentence and paragraph positions.
Asia Online’s sentence alignment technology uses text analysis, sentence length, numeric content, punctuation, synonyms, inflections and many other matching criteria to determine which sentence in a source language document corresponds to which translated sentence in a target language document.
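To illustrate how a few of these matching criteria can be combined, here is a minimal Python sketch. It is not Asia Online's alignment engine: the function names (`pair_score`, `align`), the equal weighting of the three cues and the greedy best-match strategy are all assumptions made for illustration, and only the length, numeric and punctuation cues are covered.

```python
import re

# Hypothetical helpers for illustration only; not Asia Online's tooling.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
PUNCTUATION = set(".?!,;:")


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets; two empty sets count as a neutral match."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pair_score(source: str, target: str) -> float:
    """Score (0.0 to 1.0) how plausibly `target` translates `source`."""
    longest = max(len(source), len(target))
    # Length cue: translations tend to have roughly proportional lengths.
    length_score = min(len(source), len(target)) / longest if longest else 0.0
    # Numeric cue: figures usually survive translation unchanged.
    numeric_score = jaccard(set(NUMBER.findall(source)),
                            set(NUMBER.findall(target)))
    # Punctuation cue: compare which punctuation marks appear on each side.
    punctuation_score = jaccard(set(source) & PUNCTUATION,
                                set(target) & PUNCTUATION)
    # Equal weights are an arbitrary choice for this sketch.
    return (length_score + numeric_score + punctuation_score) / 3


def align(sources: list, targets: list) -> list:
    """Greedily pair each source sentence with its best-scoring target."""
    if not targets:
        return []
    return [
        (i, max(range(len(targets)), key=lambda j: pair_score(s, targets[j])))
        for i, s in enumerate(sources)
    ]


sources = ["The engine costs 250 dollars.", "Is it available in 2010?"]
targets = ["Est-il disponible en 2010 ?", "Le moteur coûte 250 dollars."]
print(align(sources, targets))
```

A production aligner would also need to handle one-to-many sentence matches and preserve document order, which this greedy sketch ignores.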
Asia Online currently offers alignment as a service to its customers and in the near future will offer these tools online via the Language Studio™ Enterprise.
Languages currently supported include:
Asian Languages
Arabic (AR), Bahasa Malay (MS), Bahasa Indonesia (ID), Bengali (BN), Chinese (ZH), Gujarati (GU), Hindi (HI), Japanese (JA), Korean (KO), Punjabi (PA), Tagalog (TL), Tamil (TA), Thai (TH), Vietnamese (VI).
European Languages
Bulgarian (BG), Czech (CS), Danish (DA), Dutch (NL), English (EN), Estonian (ET), Finnish (FI), French (FR), German (DE), Greek (EL), Hebrew (HE), Hungarian (HU), Irish (GA), Italian (IT), Latvian (LV), Lithuanian (LT), Maltese (MT), Polish (PL), Portuguese (PT), Romanian (RO), Russian (RU), Slovak (SK), Slovene (SL), Spanish (ES) and Swedish (SV).
Text in all supported languages can be aligned with text in any other supported language. The Asia Online sentence alignment tool has been designed with flexibility and productivity as key features. New languages can be added quickly within 5-10 days. If the language you require is not listed above, talk to us about adding support for your desired language.
While it is very difficult to obtain 100% alignment accuracy due to the nature of how humans translate from one language to another, our sentence alignment engine can be tuned to the specific linguistic characteristics of each language. This ensures that a very high level of accuracy is achieved. Human validation of a recent project which aligned 500,000 documents determined an error margin of just 1.8%.
Asia Online is working with many of its customers on alignment projects ranging from a few hundred marketing documents to 10 million translated patents.
Contact Asia Online now for a FREE data quality and SMT data suitability assessment
Our professional linguists will analyze your data and explain what work is required to make it suitable for use as training material to produce a high quality SMT engine.
SOCRO (Smart OCR Optimization) is an OCR technology designed to learn and correct patterns of OCR errors and constantly improve. Created to address the problem of poor OCR capability for Asian languages, SOCRO can deliver OCR quality in trained fonts on Asian languages with accuracy rates as high as 99.6%.
As Asia Online developed SMT tools for Asian languages, it became apparent that Asian language OCR left a lot to be desired when compared to OCR of European languages. Asia Online’s own need for Thai language OCR was a key driver. When we surveyed the available tools, the best Thai OCR we could find achieved around 70% accuracy. Because Thai is written with an alphabetic script, 70% accuracy meant that nearly every word contained a spelling error, making the results almost useless for SMT purposes.
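A rough calculation shows why character-level accuracy translates so badly into word-level accuracy. The sketch below assumes that OCR errors are independent from one character to the next and that a word is five characters long; both are simplifying assumptions for illustration, not figures from Asia Online.

```python
def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Probability that every character of a word is recognized correctly,
    assuming character errors are independent (a simplification)."""
    return char_accuracy ** word_length


# Hypothetical five-character word at the two accuracy levels mentioned here.
for accuracy in (0.70, 0.996):
    print(f"{accuracy:.1%} per character -> "
          f"{word_accuracy(accuracy, 5):.1%} of 5-character words correct")
```

Under these assumptions, 70% character accuracy leaves only about one word in six fully correct, while 99.6% character accuracy keeps roughly 98% of words intact.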
SOCRO was developed as a generic OCR solution that works with Asian language scripts and characters and allows for fine tuning and adjustments. SOCRO can be adapted to almost any language. With SOCRO, document management vendors can generate Thai language OCR text with accuracy that rivals, and sometimes surpasses, the world’s best English language OCR software.
Languages Supported:
Available Now: Thai (TH)
Under Development: Arabic (AR), Chinese (ZH), Hindi (HI), Japanese (JA) and Korean (KO)
Contact Asia Online to discuss adding SOCRO support for your language.
How It Works
SOCRO is a new font-specific OCR system from Asia Online. This means Asia Online trains a system to read each new font for a set of documents.
- To start a new font, Asia Online scans documents and generates OCR output of approximately 500,000 characters of text with flaws in their raw form.
- After a proprietary process of proofreading and statistical analysis, Asia Online’s computational linguists build a reusable Font Correction Model for the font.
- From this point on, the end user scans the documents and drops the image files into the Asia Online SOCRO web service.
- Asia Online converts the image to text and applies the Font Correction Model to create flawless editable text.
- Users receive the text and reintegrate it into their document management system.
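The actual Font Correction Model is built by a proprietary process, so the following Python sketch only illustrates the general idea of learning recurring OCR errors from proofread text and then correcting new output. The function names, the character-by-character substitution table and the ASCII sample data are all hypothetical; a real model for Thai would need context-sensitive rules.

```python
from collections import Counter, defaultdict


def build_correction_model(pairs):
    """Learn, for each character the OCR emits, the character that
    proofreaders most often say it should have been.

    `pairs` holds (raw_ocr_text, proofread_text) tuples. Pairs of unequal
    length are skipped because this sketch aligns position by position.
    """
    counts = defaultdict(Counter)
    for ocr_text, proofread_text in pairs:
        if len(ocr_text) != len(proofread_text):
            continue
        for seen, truth in zip(ocr_text, proofread_text):
            counts[seen][truth] += 1
    return {seen: c.most_common(1)[0][0] for seen, c in counts.items()}


def apply_correction_model(model, ocr_text):
    """Replace each character with its learned correction, if any."""
    return "".join(model.get(ch, ch) for ch in ocr_text)


# Toy training data: this imaginary font makes the OCR read "l" as "1".
training_pairs = [("he1lo", "hello"), ("wor1d", "world"), ("ye11ow", "yellow")]
model = build_correction_model(training_pairs)
print(apply_correction_model(model, "1ight"))
```

The sketch reflects why the approach is font-specific: the confusion patterns it learns belong to one typeface, so a new font needs its own proofread sample and its own model.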
Scanned Image Specification
- 300 DPI
- 256-grayscale
- Balanced contrast (samples available)
- Scan zones defined by our tool or absolute page XY coordinates
SOCRO-THAI
The Thai language’s complex script, combined with the lack of effective Thai spell checking software, has meant that the output from Thai OCR software has always been flawed. The high error rate of Thai OCR software has stopped Thailand’s imaging vendors and systems integrators from fully serving their customers’ large scale document conversion and management requirements.
In the absence of commercial grade quality Thai OCR software, enterprises throughout Thailand have not been able to effectively put in place many document capture, digitization, data conversion, data management, content management and workflow solutions. They are spending large sums of money on paper and labor-intensive processes, supporting very large image files on storage and network transfers and manually entering keywords and other critical data. They are also unable to use search technologies against digitized documents.
Challenges of Thai OCR
Like many Asian languages, Thai does not use spaces between words or characters. But there are many other challenges to Thai that make it one of the most difficult languages for computers to process:
- No punctuation
- No spaces between words
- No end of sentence markers
- No upper or lower case text
- No tense, gender or subject
- Alphabetic, not characters like Chinese or Japanese
The example below demonstrates the complexity of Thai as a language by applying Thai grammar rules to English.
Original English:
This night was young. I met John at my house (the one by the river, not the townhouse) and we talked and drank coffee.
English with Thai Grammar Rules:
thisnightwasyoungimetjohnatmyhousetheonebytherivernotthetownhouseandwetalkedanddrankcoffee
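The transformation in the example above can be reproduced with a few lines of Python. This is only a sketch of the surface rules shown (no punctuation, no spaces, no letter case); the function name is an assumption, and the sketch does not attempt the grammatical points such as tense or gender.

```python
def apply_thai_style_rules(text: str) -> str:
    """Drop punctuation and spaces and remove letter case, keeping letters only."""
    return "".join(ch.lower() for ch in text if ch.isalpha())


original = ("This night was young. I met John at my house (the one by the "
            "river, not the townhouse) and we talked and drank coffee.")
print(apply_thai_style_rules(original))
```

Reversing this step, that is, deciding where the word and sentence boundaries belong, is the hard part, and it is what OCR and segmentation software for Thai must solve.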
Another challenge is the subtle difference between many Thai characters: some sets of characters are very similar, which makes accurate OCR very difficult.
SOCRO delivers highly accurate Thai language OCR text to Thai systems integrators with no software to install and no compatibility problems. At last, Thai IT firms can offer their enterprise customers a range of services that can dramatically increase productivity and decrease the costs of operations.
Additional Tools Also Available
In addition to SOCRO, Asia Online can segment the words and phrases in the editable text – using unique and highly accurate Thai sentence and word segmentation software. For an additional fee, SOCRO will build an index list of words on each page. This can be very powerful for document search and other text processing technologies. See Asia Online’s Word Segmentation for more details.
Combined, SOCRO and Thai Word Segmentation open up many new possibilities for document processing and management.
Many Asian languages are written without spaces. Wrong segmentation can completely distort the output and change the meaning entirely. Asia Online has developed a series of tools that determine where best to segment words to give the correct meaning and are designed to work with complex Asian scripts and symbols.
The importance of word segmentation is paramount in Asian languages such as Japanese, Korean, Chinese and Thai where word boundaries are not clear in standard text. Before words can be processed, the first task of any information processing system is to segment the initial text into a sequence of words. Accuracy is paramount. The Thai examples below show how segmenting in different places can change the entire meaning of a sentence.
Many traditional approaches use dictionaries, but this does not go far enough. The examples above are all valid according to a dictionary. Asia Online takes the approach of evaluating all possible segmentation paths through the data and then using n-grams (patterns of word sequences) to score each path, with the most statistically likely path selected. This approach allows for spelling errors, unknown words and mixed language text to be incorporated and handled correctly.
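The path-scoring idea can be sketched in Python as follows. This is a minimal illustration and not Asia Online's software: it uses unspaced English as a stand-in for Thai, a toy vocabulary and training corpus, bigrams with add-one smoothing, and hypothetical function names. Unlike the approach described above, it does not handle unknown words or spelling errors.

```python
import math
from collections import Counter
from functools import lru_cache


def all_segmentations(text, vocabulary):
    """Enumerate every way to split `text` into words found in `vocabulary`."""
    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return [[]]
        paths = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocabulary:
                paths.extend([word] + rest for rest in split_from(end))
        return paths
    return split_from(0)


def train_bigrams(sentences):
    """Count words and adjacent word pairs in already-segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words)  # "<s>" marks the sentence start
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def path_score(words, unigrams, bigrams):
    """Log-probability of a path under a bigram model with add-one smoothing."""
    vocab_size = len(unigrams)
    score, previous = 0.0, "<s>"
    for word in words:
        score += math.log((bigrams[(previous, word)] + 1)
                          / (unigrams[previous] + vocab_size))
        previous = word
    return score


def best_segmentation(text, vocabulary, unigrams, bigrams):
    """Return the highest-scoring dictionary-valid path, or None if none exists."""
    paths = all_segmentations(text, vocabulary)
    if not paths:
        return None
    return max(paths, key=lambda p: path_score(p, unigrams, bigrams))


vocabulary = {"we", "are", "no", "now", "here", "where", "nowhere"}
corpus = [["we", "are", "now", "here"]] * 3 + [["now", "we", "are", "here"]]
unigrams, bigrams = train_bigrams(corpus)

for path in all_segmentations("wearenowhere", vocabulary):
    print(path, round(path_score(path, unigrams, bigrams), 3))
print(best_segmentation("wearenowhere", vocabulary, unigrams, bigrams))
```

All three candidate paths are valid according to the dictionary; only the n-gram statistics from the training corpus separate them, which is the point of scoring paths rather than relying on dictionary lookup alone.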
Languages Supported:
Available Now: Thai (TH), Chinese (ZH)
Under Development: Japanese (JA) and Korean (KO)