ProZ.com Blog

Bad Data, Good Insights

Written by Gabriel Fairman | July 10, 2024

Translators and Executives Need to Talk.

Last week, Bureau Works published some serious data about our context-sensitive translation technology. It was a rigorous study that evaluated over 4 million translated segments. Our engineering team works closely with our data scientists to produce good data that will be helpful in advancing our industry. They are scientists through and through, dissecting the data we collect for deep and nuanced insights.

I, on the other hand, am more of a data backhoe. I roll into a mine of potential information, scoop out a rough chunk, and see what I find inside. I do this by asking questions on LinkedIn. They are opinion questions, but opinions and emotions are their own type of data. They are data on where we are as a community, and they tell us what we need to develop in the relational part of the localization business.

We often think about how to develop from a business perspective, a tech perspective, or a linguistic perspective, but asking people how they feel tells us where we need to develop from a human perspective. My type of data collection isn’t as refined as my team’s work, but it is useful. It is raw and straightforward, and it points straight at the conversations we need to have.

That is exactly what happened with my last LinkedIn poll.

What was the question?

Although the question is visible above, LinkedIn’s character limits on polls mean that context is lost in the pursuit of concision. But this question is loaded with context, so it is important to clarify.

Basically, since the advent of Machine Translation there have been discussions about the value of post-editing. Translators correctly say that their expertise is still required to arrive at a quality translation, and that they are tapping into this expertise to review MT output.

On the other hand, business leaders are also correct when they recognize that post-editing is often (not always) less time-consuming and cognitively intense than translation, and that much of the work that goes into post-editing is reading and confirming that the machine translation is correct. As machine translation improves across many languages, the number of edits a linguist is required to make continues to decrease. Business leaders also recognize, as any business leader would, that the price of MTPE is often lower than the price of translation. Right or wrong, that is what the market has done.

So, the question is “What do we do?” How do we compensate a job that relies on the same expertise as a different job, yet is arguably less intense and demonstrably worth less from a market perspective?

The way I asked this question within the limited format of a LinkedIn poll (character limits, max 4 options) was to assume a per-word compensation structure for translation “from scratch”, and then ask what percentage of the “from scratch” rate confirming a perfect match should be worth.

Many people in the comments point out that “per word” is not the only way to compensate translation and/or post-editing. They are correct! However, it is currently the most common. Whether it is the best way moving forward is an interesting question that will likely come up in a future poll! In addition, the comments offer a lot of color on multiple different perspectives. They are certainly worth a read.

But, for now, I will move on to the answers for this specific question.

Exact Numbers

Starting off with the numbers you can see in the graph, we see that the breakdown goes like this:

| Category (share of per-word rate) | Percent of votes |
| --- | --- |
| 10%-35% | 11% |
| 36%-60% | 19% |
| 61%-80% | 33% |
| 81%-100% | 37% |

That means that out of the 183 responses, the raw vote numbers look like this:

| Category (share of per-word rate) | Votes |
| --- | --- |
| 10%-35% | 20 |
| 36%-60% | 35 |
| 61%-80% | 60 |
| 81%-100% | 68 |

This data is interesting in itself. It shows that 70% of respondents think the compensation for confirming accurate machine translations should be over 60% of the per-word translation rate.
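For anyone who wants to check the arithmetic, here is a minimal sketch that derives the percentages from the raw vote counts listed above. The variable names are my own, and rounding to the nearest whole percent is an assumption about how the poll displays its results.

```python
# Raw vote counts per category, as listed in the table above.
votes = {
    "10%-35%": 20,
    "36%-60%": 35,
    "61%-80%": 60,
    "81%-100%": 68,
}

total = sum(votes.values())

# Share of each category, rounded to the nearest whole percent
# (assumed to match how the poll displays results).
shares = {category: round(100 * count / total) for category, count in votes.items()}

# Respondents who chose more than 60% of the per-word rate
# (the top two categories combined).
above_60_votes = votes["61%-80%"] + votes["81%-100%"]
above_60_share = round(100 * above_60_votes / total)

print(total)
print(shares)
print(above_60_share)
```

The top two categories account for 128 of the 183 votes, which is where the 70% figure comes from.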

And, the trend across the categories is that the higher the percentage, the more votes the category receives. This is a powerful insight, but it is not necessarily surprising.

As a whole, my LinkedIn audience has more translators than industry executives. Why this matters will come into play later on.

Rough Numbers

The numbers above are where the exact insights end, but as the “owner” of the poll I was able to see the profiles of all respondents. With that info, I did a quick calculation of “who” voted for which category.


Now, when I say that these numbers are rough, I mean it. The way I collected them was by copying the LinkedIn headlines of each respondent and then querying how often the keywords at the top of the chart appeared in each category.

As you can see, the total number of keyword hits exceeds the number of respondents. It was not uncommon for a single respondent to have multiple keywords in their headline, and for other respondents to have none. I counted each keyword, and did not count respondents whose headlines did not hit keywords. Some people are double or triple-counted, while others are not counted at all.
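To make the double-counting concrete, here is a rough sketch of this kind of keyword tally. It is not the exact query I ran: the sample headlines are invented for illustration, and the whole-word, case-insensitive matching is an assumption.

```python
import re
from collections import Counter

# Keywords from the hypotheses in this post.
KEYWORDS = ["translator", "localization", "freelance", "specialist",
            "manager", "director", "machine", "MT", "AI"]

# Hypothetical (headline, poll category) pairs, invented for illustration.
RESPONSES = [
    ("Freelance Translator | MT Post-Editing", "81%-100%"),
    ("Localization Manager", "61%-80%"),
    ("Director of Localization", "36%-60%"),
    ("Subtitler and Proofreader", "81%-100%"),
]

def count_keyword_hits(responses, keywords):
    """Count whole-word, case-insensitive keyword hits per (keyword, category)."""
    hits = Counter()
    for headline, category in responses:
        for keyword in keywords:
            pattern = rf"\b{re.escape(keyword)}\b"
            if re.search(pattern, headline, flags=re.IGNORECASE):
                hits[(keyword, category)] += 1
    return hits

hits = count_keyword_hits(RESPONSES, KEYWORDS)
total_hits = sum(hits.values())

# Respondents whose headline matched no keyword are not counted at all.
uncounted = sum(
    1 for headline, _ in RESPONSES
    if not any(re.search(rf"\b{re.escape(k)}\b", headline, flags=re.IGNORECASE)
               for k in KEYWORDS)
)

print(total_hits, len(RESPONSES), uncounted)
```

In this toy sample, four respondents produce seven keyword hits: the first headline is counted three times, and the last one is not counted at all. That is the same distortion described above.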

So, as I mentioned, these numbers are very rough. However, I still think there are some valuable takeaways.

Note: I “measured” this way because I had certain hypotheses about how certain keywords would correlate with opinions on the subject. The other reason I measured like this is that individually logging every respondent's title or description would have been very time-consuming.

The Hypotheses

Here are my hypotheses and conclusions for the experiment:

  1. The keyword “translator” would overwhelmingly correlate with the highest two percentages of compensation.
  2. “Localization” would concentrate in the middle percentages, as it represents a balance of the business end and the language end of the industry.
  3. “Freelance” would concentrate around the highest percentages.
  4. “Specialist” would concentrate around the highest percentages.
  5. “Manager” would concentrate around the middle percentages.
  6. “Director” would concentrate around the lowest percentages.
  7. “Machine”/“MT” would concentrate around the lowest percentages.
  8. “AI” would concentrate around the lowest percentages.

Note: I queried “machine” and “MT” separately but grouped them together, as “machine” always referred to machine translation.

The Results

  1. Correct
  2. Incorrect: “Localization” favored the higher percentages.
  3. Correct
  4. Correct
  5. Incorrect: “Manager” favored the higher percentages.
  6. Correct
  7. Incorrect: “Machine” favored the higher percentages. I was able to see that I was wrong here because I assumed the word would represent tech providers, but it mostly represented translators offering their services using “machine” or “MT” post-editing.
  8. Incorrect: same reason as hypothesis 7.

What does this mean?

While a lot of this “data” is murky and doesn’t show much, I think two things are very clear:

  1. The majority of respondents to the survey were translators. Recognizing an audience bias matters in any study (if you can call this a study), but it doesn't take away from the truth or power of their response.
  2. Translators and localization directors have polarized perspectives on this issue. 66% of director-level respondents think that MTPE confirmations should be compensated at less than 60% of the per-word rate. On the opposite side, 75% of translators believe that it should be compensated at greater than 60%.

This second finding is the one I want to dive into because I think it is indicative of the larger conversation our industry needs to have.

Ends of the Spectrum

One thing I found interesting in the very limited “director data” is that an equal number of directors voted for 10%-35% compensation (3) as voted for the top two percentage categories combined (2 and 1).

In other words, 25% of the directors chose the lowest option and a combined 25% chose the two highest options. In contrast, 50% of the directors chose the 2nd lowest category.

On the translator side, the largest number of translators voted for the highest category but the second highest was a close second (28 votes to 26 votes).

If we look at directors as the ones ultimately setting rates (even if indirectly through budget allocation) and translators as the ones ultimately accepting rates, these middle two categories are where we will see the most productive deals coming from.

And, if we set the percentage of directors in the middle two categories (62.5%) next to the percentage of translators in the middle two (56%), we see there is a lot of room to have a conversation and make a deal.

To put it plainly, 62.5% of directors are “neighbors” on the spectrum with 56% of translators. In my mind, this suggests that there are a lot of mutually beneficial deals out there to be made.

Reluctance and Bias

So, what is stopping us from making deals? Why can’t more rate-makers and rate-takers come together and agree on a number? And why is it so hard to talk about?

Real budget limitations on both sides are a huge concern, but I think we have difficulty having these conversations because of a natural human bias. We all think we work harder than others, and that our jobs are more complex, more important, and more demanding.

I can’t tell you how many times I have heard an executive complain about translators and a translator complain about executives all in the same day (or hour). If we took a few seconds to think that the person on the other side of the negotiation has an important job and works hard too, we may just be more willing to have the hard discussions that will lead to deals.

I am not saying that real issues don’t exist in the way our industry handles compensation, or that empathy and reflection will bring about some utopian kumbaya between buyers, agencies, and translators. What I am saying is that there are conversations that need to be had, and that there appears to be a middle ground of demands on which to base them. Lowering our self-importance and dropping dismissive attitudes towards the role of others in the industry can only help us come together.