Do not despair!
Despair is the feeling of not having any hope left. If you completely forgot to study for your final exam in math, you might feel despair when your teacher passes out the test.
This data is at the heart of the modern web: Google’s fortune, after all, is generated by a relevance pipeline built on clicks and impressions—that is, events.
impressions 展示曝光
You don’t hear much about data integration in all the breathless interest and hype around the idea of big data, but nonetheless, I believe this mundane problem of “making the data available” is one of the more valuable things an organization can focus on.
breathless interest 非常强烈的兴趣
hype “炒作”或“过度宣传” As a noun, hype means extravagant claims about a person or product. All the hype about a miraculous new kind of mop might influence you to buy one, but after trying it out you’ll realize it’s just a mop.
Mundane 的意思是“平凡的”或“普通的”
Real-time data processing—Computing derived data streams.
派生数据流 When something is derived from something else, it is made from that. Ham is derived from pork, and the active ingredient in aspirin is derived from the bark of the willow tree.
Surprisingly, at the core of these problems is the ability to have many machines playback history at their own rate in a deterministic manner.
令人惊讶的是,这些问题的核心在于能够让多台机器以各自的速度以确定性的方式回放历史记录。
I got to watch this data integration problem emerge in fast-forward as LinkedIn moved from a centralized relational database to a collection of distributed systems.
我亲眼目睹了随着 LinkedIn 从一个集中式的关系型数据库转向一组分布式系统,数据集成问题以极快的速度浮现出来。
stale data 过时数据
The consumer system need not concern itself with whether the data came from an RDBMS, a new-fangled key-value store, or was generated without a real-time query system of any kind. This seems like a minor point, but is in fact critical.
new-fangled Fangled 是一个较少使用的词,通常用在组合词中,比如 “new-fangled”,意思是“新奇的”或“新颖但未必实用的”。它常带有一点轻微的贬义,暗示某个新事物虽然新颖但可能不够可靠或过于花哨。在你提供的上下文中,new-fangled key-value store 意思是“新型的键值存储”,暗示这些新技术可能与传统方法有所不同,但未必被广泛接受或完全可靠。
mark 看到 这里了:Let’s talk a little bit about a side benefit of this architecture: it enables decoupled, event-driven systems.
It’s worth noting the obvious: without a reliable and complete data flow, a Hadoop cluster is little more than a very expensive and difficult to assemble space heater.
It’s worth noting the obvious 值得注意的是
“So began a long slog” 的意思是“从此开始了一段漫长而艰苦的努力”。“Slog” 通常用来形容一段艰难、枯燥且耗时的工作或过程。在这个句子中,它表达了在面对某个挑战(比如数据集成问题)时,必须付出长期且艰苦的努力才能取得进展。
Partitioning allows log appends to occur without co-ordination between shards and allows the throughput of the system to scale linearly with the Kafka cluster size.
“Shards” 的意思是“分片”。
So why has the traditional view of stream processing been as a niche application?
Niche 的意思是“小众的”或“针对特定群体的”。
This means ensuring the data is in a canonical form and doesn’t retain any hold-overs from the particular code that produced it or the storage system in which it may have been maintained.
“Canonical”的意思是“规范的”或“标准的”。
The US census provides a good example of batch data collection.
Census 的意思是“人口普查”或“统计调查”。