Created: Jun 12, 2008
Updated: Sep 19, 2008

3. Technical Platform


Return to Working Plan for Taking WiserEarth in Multiple Languages


Instruction: Discuss in the comment section and summarize here. Upon saving edits, copy the summary to the appropriate section of the main page linked above. This ensures that the main document contains the most up-to-date information.


 

This section aims to identify all the technical needs for this endeavor.

 

3.1 Data Center/Server

Currently the data center for WiserEarth is based in Texas, US. We may decide at first to host all the sites there and analyze the performance. It may later be necessary to set up another data center in Europe (Europe has data privacy rules that would make it difficult) or Asia. One option is to use a combination of Amazon EC2 (to augment processing power) and Amazon SimpleDB or even Freebase to track all user-generated translations.
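As a rough illustration of what "tracking user-generated translations" in a schemaless store such as Amazon SimpleDB might look like, the sketch below keeps one item per translation, keyed by content ID and language. A plain Python dict stands in for the store; the key format, attribute names, and helper functions are all assumptions made for illustration, not an actual WiserEarth or Amazon API.

```python
from datetime import datetime, timezone

# In-memory stand-in for a schemaless item/attribute store (hypothetical).
store = {}

# Translation kinds named in section 3.2: by author, by team, or automatic.
KINDS = ("author", "team", "automatic")


def item_name(content_id, lang):
    """Build a key like 'org-123:fr' (the format is an assumption)."""
    return f"{content_id}:{lang}"


def save_translation(content_id, lang, text, kind, translator):
    """Store one translation as a flat set of string attributes."""
    if kind not in KINDS:
        raise ValueError(f"unknown translation kind: {kind}")
    store[item_name(content_id, lang)] = {
        "content_id": content_id,
        "lang": lang,
        "text": text,
        "kind": kind,
        "translator": translator,
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }


def get_translation(content_id, lang):
    """Return the stored attributes, or None if no translation exists."""
    return store.get(item_name(content_id, lang))


save_translation("org-123", "fr", "Protéger les forêts", "team", "volunteer-7")
print(get_translation("org-123", "fr")["kind"])  # team
print(get_translation("org-123", "de"))          # None
```

Keeping every attribute a flat string mirrors how simple key/attribute stores tend to work, and recording the kind of translation lets the site show readers whether a text came from the author, a team, or a machine.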

 

 

3.2 Technical Features

 

- Ability to translate fields and dialog boxes from the English WiserEarth site into a local-language site, or vice versa.

- Ability for users to express a language preference through browser settings and/or their user profile

- Ability for users to toggle between different language versions of the same content

- Ability for users to select text-only or expanded graphics features.

- Ability for an administrator to send articles and key profiles that need translating to an automatic queue, so that any user can volunteer to translate them right away (see the wiki where users vote for translation into specific languages to gauge demand)

- Question of how the sites operate: separate instances are unlikely, as that makes updates too hard to manage; more probably the same UI with translated fields and a filter on which language's content to show.

- Data sharing between all languages

- Facilities for translation teams to provide translations.

- Human-and-machine translation server to build a WiserEarth community translation memory and assist further translation effort (see discussion for more details)

- The translation would be displayed instead of the original item according to the preferences of the reader.  The type of translation (by author, by team or automatic) would be clear and a button would be available to see the item in the original or in another language.

- Develop a feature that allows organizations to create their profile within the WiserEarth platform and to access it through their own domain name. That would be a powerful step towards Wiser Commons.
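The language-preference feature listed above (browser settings and/or user profile) could be sketched roughly as follows. The function names, the rule that a profile setting wins over the browser's Accept-Language header, and the default of English are all assumptions for illustration.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into language tags, best first."""
    weighted = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((tag.strip().lower(), quality))
    # The sort is stable, so equal weights keep their header order.
    weighted.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, quality in weighted if quality > 0]


def choose_language(profile_lang, accept_language, supported, default="en"):
    """Pick a site language: profile first, then browser, then default."""
    candidates = []
    if profile_lang:
        candidates.append(profile_lang.lower())
    candidates.extend(parse_accept_language(accept_language or ""))
    for tag in candidates:
        if tag in supported:
            return tag
        base = tag.split("-")[0]  # e.g. 'pt-br' falls back to 'pt'
        if base in supported:
            return base
    return default


supported = {"en", "fr", "es", "id"}
print(choose_language(None, "fr-CH, fr;q=0.9, en;q=0.8", supported))  # fr
print(choose_language("es", "fr", supported))                         # es
print(choose_language(None, "de;q=0.7, ja", supported))               # en
```

The same function could back a language toggle: passing the toggled language as `profile_lang` overrides the browser setting for that request.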

 

 

3.3 Localized homepage

 

- Ability to automatically queue / rotate content on the homepage based on areas of focus (AOFs) so that minimal manual management is required
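A minimal sketch of such automatic rotation, reading "AOFs" as areas of focus (an assumption): items are interleaved round-robin across areas so that no single area dominates, and the homepage shows a different slice each day. The function names, the daily schedule, and the three-slot homepage are hypothetical.

```python
from collections import deque


def build_rotation(items_by_aof):
    """Interleave items round-robin across areas of focus."""
    queues = deque(deque(items) for items in items_by_aof.values() if items)
    rotation = []
    while queues:
        current = queues.popleft()
        rotation.append(current.popleft())
        if current:
            queues.append(current)  # area still has items: back of the line
    return rotation


def featured_for_day(rotation, day_number, slots=3):
    """Return the items to show on a given day, wrapping around the list."""
    if not rotation:
        return []
    start = (day_number * slots) % len(rotation)
    return [rotation[(start + i) % len(rotation)] for i in range(slots)]


items_by_aof = {
    "Water": ["w1", "w2"],
    "Energy": ["e1"],
    "Forests": ["f1", "f2"],
}
rotation = build_rotation(items_by_aof)
print(rotation)                       # ['w1', 'e1', 'f1', 'w2', 'f2']
print(featured_for_day(rotation, 0))  # ['w1', 'e1', 'f1']
print(featured_for_day(rotation, 1))  # ['w2', 'f2', 'w1']
```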

 


3.4 Ongoing maintenance of code/debugging

 

Having a multilingual version of WiserEarth under a single platform, and thus hosted on the same server, enables maintenance to be done by the same team. There may be a need to provide specialized features that reflect cultural diversity, such as enabling more oral-oriented participation in African nations. The technical specifications/requirements for such a feature can be developed by the local/regional community of users, but the actual coding can be done by the core team of developers to ensure the quality and consistency of the code.


Comments (1 - 8 of 8)

Note: Have just copied JP's edits to the main document here and added some of my own.

@Roger: Have incorporated some of your suggestions regarding technical features in 4. Governance Model.

 

@Angus: The system I proposed seems to be working quite well for international firms that need to be present in multiple languages. Although I have no extensive experience using such tools myself, their proliferation in the professional translator community seems like a good indicator that they really do help with translation efforts. It should also be said that although such a system will be useful in the long run for translating content between any two languages, it won't be immediately useful for the initial localization effort of translating the UI and key pages into a different language. That will have to be manual labor.


@Rehan: Amazon SimpleDB and Freebase do sound interesting. Freebase seems especially relevant for expanding the reach and usefulness of WiserEarth's database.

 

"Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers.

Traditionally, this type of functionality has been accomplished with a clustered relational database that requires a sizable upfront investment, brings more complexity than is typically needed, and often requires a DBA to maintain and administer. In contrast, Amazon SimpleDB is easy to use and provides the core functionality of a database - real-time lookup and simple querying of structured data - without the operational complexity. Amazon SimpleDB requires no schema, automatically indexes your data and provides a simple API for storage and access. This eliminates the administrative burden of data modeling, index maintenance, and performance tuning. Developers gain access to this functionality within Amazon's proven computing environment, are able to scale instantly, and pay only for what they use."

 

 

"Freebase is an open database of the world’s information. It is built by the community and for the community—free for anyone to query, contribute to, build applications on top of, or integrate into their websites.


Already, Freebase covers millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC, it contains structured information on many popular topics, like movies, music, people and locations—all reconciled and freely available via an open API. This information is supplemented by the efforts of a passionate global community of users, who are working together to add structured information on everything from philosophy to European railway stations to the chemical properties of common food ingredients.


... while information in Freebase appears to be structured much like a conventional database, it’s actually built on a system that allows any user to contribute to the schemas—or frameworks—that hold the data. This wiki-like approach to structuring information lets many people organize the database without formal, centralized planning. And it lets subject experts who don’t have database expertise find one another, and then build and maintain the data in their domain of interest.

 

Here are three good reasons to contribute to Freebase:

1) You’ve got a bunch of data that you’d like to share with the world. Freebase gives you a place to do it. A related benefit: once your data is in Freebase, you or anyone else can run MQL (Metaweb Query Language) queries against it.

2) You’ve got a bunch of data that you’d like to share, and said data would benefit from the knowledge and refinement efforts of other people. Freebase gives you a place to share it and others a place to improve it.

3) You don’t have data, but you're an authority on something, and you like sharing your expertise. Freebase lets you dive into the details and improve or add to existing data."

@ Bowo: Great work on translating - need to really keep that in mind for localization. Presume a system such as the ones you describe would make it easier to do each new localization.
@ Roger: I think the idea for now is one site / one instance, with the platform handling multiple languages. However, Wikipedia has multiple distinct sites (i.e. one for each language), which bears remarking.

> Currently the data center for WiserEarth is based in the US, Texas.

> We may decide at first to put all the sites there

 

This phrase, "put all the sites" suggests to me that languages will be implemented as entirely separate WiserEarth sites.  If I have interpreted correctly, then this is a basic decision that needs to be discussed.   By having separate WiserEarth sites for each language, we lose the advantage of getting everyone on the same page.  Surely there is a way to handle separate languages on the same site. 


How about having a human-and-machine translation server for all translation efforts done in/for WiserEarth? This should help ease further translation work. Below is the basic concept, copied from a discussion on "What does it mean to Internationalize?"

 


Camilla asked:

- does the community automate the translation?

- does the community look for volunteer translators and do this manually?

 

Have done some serendipitous research in this regard, and the answer is both. The translation effort must start manually, but can be automated to a certain degree once a sufficient "translation database between languages" has been developed by the community.

 

Discovered that the translator community has software that aids their work in translating documents. The latest developments in this software enable the translation know-how of a great number of translators to be aggregated into one large software-assisted human-auto-translation engine. Much like Google Translate, but using a database of translations by real, professional translators!

 

WiserEarth could set up such translation software/database and integrate it into WiserPlatform somehow. Or, if that is technically too difficult or impossible, WE can enable the community of volunteer translators to access, add to, and extract translations from the software/database, and then easily copy and paste them into WiserEarth's wikipages/wikispaces for further editing and refinement.

 

Here's a breakdown of the concept, explained further below it:

1. Each translator can develop his/her own translation memory (TM).

2. A termbase (TB) for each area of focus can be developed together by all translators.

3. These TMs and TBs can then be integrated into the translation software/database.

4. Any new translation effort can benefit from this database, where the workload for each translation can be reduced significantly.

 

Now, a more detailed explanation of each:

 

1. Each translator can develop his/her own translation memory (TM).


From a leading Computer Aided Translation (CAT) software package:


A translation memory is a linguistic database that continually captures your translations as you work for future use. All previous translations are accumulated within the translation memory (in source and target language pairs called translation units) and reused so that you never have to translate the same sentence twice. The more you build up your translation memory, the faster you can translate subsequent translations, enabling you to take on more projects and increase your revenue.
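To make the idea concrete, here is a toy translation memory in Python. It stores translation units as source/target pairs and falls back to fuzzy matching when there is no exact hit. The class name, the 0.75 threshold, and the use of the standard library's `difflib.SequenceMatcher` for similarity are illustrative assumptions; commercial CAT tools use their own matching methods.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: source segments mapped to target segments."""

    def __init__(self):
        self.units = {}  # source segment -> target segment

    def add(self, source, target):
        """Record one translation unit."""
        self.units[source] = target

    def lookup(self, segment, threshold=0.75):
        """Return (target, score) for the best match, or None if too weak."""
        if segment in self.units:
            return self.units[segment], 1.0
        best_source, best_score = None, 0.0
        for source in self.units:
            score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
            if score > best_score:
                best_source, best_score = source, score
        if best_source is not None and best_score >= threshold:
            return self.units[best_source], best_score
        return None


tm = TranslationMemory()
tm.add("Join our community", "Rejoignez notre communauté")

print(tm.lookup("Join our community"))       # ('Rejoignez notre communauté', 1.0)
print(tm.lookup("Join our communities")[0])  # Rejoignez notre communauté
print(tm.lookup("Donate now"))               # None
```

A fuzzy hit is only a suggestion: the translator still adapts it, and the corrected pair can be added back so the memory keeps growing.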

 


2. A termbase (TB) for each area of focus can be developed together by all translators.

 

Again from a leading Computer Aided Translation (CAT) software package:

 

Terminology is the foundation of all communication. At its most basic level it is the study and ultimately usage of words or phrases that have a particular meaning, these words or phrases are referred to as terms. Terminology is growing in importance as terms are becoming increasingly adopted by organizations to describe a company, product, service or even a unique selling point.


A termbase is a central repository, similar to a database, which allows for the systematic management of approved terms. It provides definitions and indicates when a particular term should be used. Use of a termbase alongside your existing translation environment ensures that you produce more accurate and consistent translations.
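As a small illustration of how a termbase could support consistency, the sketch below flags source terms whose approved translation does not appear in a draft. The sample terms, the dictionary layout, and the `check_terms` helper are hypothetical, and the plain substring matching is deliberately naive (it ignores inflection and word boundaries).

```python
# Hypothetical termbase: source term -> approved target term, per language.
TERMBASE = {
    "fr": {
        "climate change": "changement climatique",
        "civil society": "société civile",
    },
}


def check_terms(source, translation, lang, termbase=TERMBASE):
    """Return (term, approved) pairs whose approved translation is missing."""
    problems = []
    source_lower = source.lower()
    translation_lower = translation.lower()
    for term, approved in termbase.get(lang, {}).items():
        if term in source_lower and approved.lower() not in translation_lower:
            problems.append((term, approved))
    return problems


source = "Civil society and climate change"
draft = "La société civile et le réchauffement"
print(check_terms(source, draft, "fr"))
# [('climate change', 'changement climatique')]
```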

 

3. These TMs and TBs can then be integrated into the translation software/database.

 

I found three examples of this, where the TMs of a large number of translators are connected via the web and thus accessible to all translators:

a. Lingotek's Language Search Engine (LSE). Commercial?

b. Wordfast's Very Large Translation Memories (VLTM). Partly-commercial. The client comes at a cost, the VLTM is free.

c. Across Language Server. Partly-commercial. Personal edition is free with access to some key features of the server.

a. Lingotek's Language Search Engine (LSE). Commercial?

 

Language Search Engines (LSE) work similarly to Internet Search Engines. Rather than searching the internet, however, an LSE searches TM's (Translation Memories) to find useful segments of previously translated documents. [Image]

The Lingotek Language Search Engine is unique in the world in its capability to conduct meaning based searches against millions of previously translated segments to find the most complete and accurate resources that closely match the actual meaning of the source document. Unlike any other tool, the Lingotek LSE finds sentence fragments, phrases, whole sentences, even complete paragraphs where terms are used in the same context, and with the same meanings as the source document.

Where other tools perform character searches seeking only exact sentence matches or rely on outdated fuzzy matching technologies, they often discard, or simply can't find the data most helpful to the translators. The Lingotek Language Search Engine leverages cutting edge, Google-like, meaning based search algorithms that instantly find, and prioritize all the data that will be most useful to the translators.

While industry leading TM tools often get bogged down, or even crash when dealing with large TM files, the Lingotek Language Search Engine operates much like Google and other Internet Search Engines, and operates faster and more accurately as the amount of searchable content increases. Thus, the translation process increases in both accuracy and speed as it is continually used.

The Lingotek LSE also supports 99% of all the world's languages and offers targeted glossaries, selected searches, spell checkers and other features that make the translation process as quick and easy as possible for a qualified linguist.

 

b. Wordfast's Very Large Translation Memories (VLTM). Partly-commercial. The client comes at a cost, the VLTM is free.

 

The VLTM project offers translators a set of Very Large Translation Memories, accessible with Wordfast through the web. VLTM use is free and anonymous. The translator works as usual, but can leverage valuable information from a very large public TM in addition to her/his local TM. The VLTM does not replace the local TM, it complements it. All languages are supported.


«Prime content for free? Too good to be true!» This is probably what you thought when first trying Google.

  • Confidentiality You only receive translation units from the VLTM. Your translations are not recorded in the VLTM (unless you specifically set up a sharing workgroup - see below). Your client's intellectual property is safe.
  • Gratuity Connection to the VLTM is free and anonymous. We do not charge money for the translation memory we serve. We don't even send advertizing. The VLTM is a pure give-give, pro bono project.
  • Goodwill VLTM users are welcome to donate translation memories to expand the available database, or to set up Wordfast to write to the VLTM when confidentiality is not at stake. Contact us if you wish to donate a TM. Donations are final and become the property of Wordfast, whatever their final destiny is in the future.

c. Across Language Server. Partly-commercial. Personal edition is free with access to some features of the server.

 

The Across Language Server contains the entire functionality for organizing, delegating, and efficiently processing translation projects.


It serves as a central platform for all corporate language resources and translation processes. It provides a uniform workspace in which all involved internal and external actors meet - from editors and project managers to service providers and freelance translators.


With its help it is possible to recycle content, control processes and integrate corresponding systems. As a result, qualitative foreign-language content is quicker available and translation costs are significantly reduced.


Among other things, the Across Language Server comprises the following:

  • crossLAN, crossWAN, and crossWeb access modes
  • Collaboration tool crossGrid
  • Translation memory crossTank
  • Terminology system crossTerm
  • Multiple-format editor crossDesk
  • Project control utility crossProject
  • Workflow control utility crossFlow
  • Quality management utility crossCheck

plus, optionally

  • Standard interface crossConnect for crossAuthor
  • Standard interface crossConnect for content systems
  • Standard interface crossConnect for software localization
  • Open crossAPI interfaces for user-specific system integration

 

4. Any new translation effort can benefit from this database, where the workload for each translation can be reduced significantly.

 

As a result of the above, and as the database of translation for sentences and terminology expands, each new translation effort takes less time to do and gains in accuracy.
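One rough way to see this effect is to measure how much of a new document is already covered by the translation database. The sketch below counts only exact segment matches; the sentence-splitting rule, the function names, and the sample sentences are assumptions for illustration.

```python
import re


def split_segments(text):
    """Split text into sentence-like segments after ., ! or ? (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def reuse_rate(document, memory):
    """Fraction of the document's segments already present in the memory."""
    segments = split_segments(document)
    if not segments:
        return 0.0
    reused = sum(1 for s in segments if s in memory)
    return reused / len(segments)


memory = {"Welcome to WiserEarth.": "Bienvenue sur WiserEarth."}
doc = "Welcome to WiserEarth. Find organizations near you."
print(reuse_rate(doc, memory))  # 0.5

# After a volunteer translates the second sentence, coverage rises.
memory["Find organizations near you."] = "Trouvez des organisations près de chez vous."
print(reuse_rate(doc, memory))  # 1.0
```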

 

Concluding this section, I'll restate my previous point:

 

WiserEarth could set up such translation software/database and integrate it into WiserPlatform somehow. Or, if that is technically too difficult or impossible, WE can enable the community of volunteer translators to access, add to, and extract translations from the software/database, and then easily copy and paste them into WiserEarth's wikipages/wikispaces for further editing and refinement.


Amazon EC2 is highly volatile and designed to augment processing power. It should not be used as a "server". The key technical issue for translations seems to be data storage.

 

It would be interesting to see how we could use Amazon Simple DB or even FreeBase to track all user-generated translations.
