Twitter #RealTime

By Twitter Engineering

Date and time

Tuesday, September 20, 2016 · 6 - 9pm PDT

Location

Twitter HQ

1355 Market Street #900, San Francisco, CA 94103

Description

Twitter is all about real-time. Our platform generates billions of events and petabytes of event data every day, and analyzing these events in real-time presents a massive challenge. To address that challenge, we designed and deployed an end-to-end scalable real-time stack that analyzes these events the instant they arrive. Our real-time stack comprises several components - DistributedLog, EventBus, Heron (the next-generation streaming system), Manhattan (the key-value store for results) and NightHawk (the fast in-memory cache for output results). This stack has been in production for nearly two years and is widely used across several of our teams.

Sound interesting? Join us for an evening of tech talks, food, drinks and giveaways to learn from the teams working with this technology. We will share our operating experiences and the challenges of running the real-time stack at scale with several use cases.


Agenda:

6:00 PM // Doors Open

6:00 - 6:45 PM // Food, drinks and networking

6:45 - 6:50 PM // Welcome by Karthik Ramasamy

6:50 - 7:05 PM // EventBus: Building a PubSub system on top of DistributedLog by David Rusek

7:05 - 7:20 PM // Twitter Heron in Practice by Maosong Fu

7:20 - 7:35 PM // Scaling Twitter's $2B+ Real-time Ad Infrastructure by Sandy Strong

7:35 - 7:50 PM // Crest: A General Similarity-based Clustering System by Yimin Tan

7:50 - 9:00 PM // Dessert, drinks and networking

9:00 PM // Event End


Presenters and topics:

EventBus: Building a PubSub system on top of DistributedLog by Dave Rusek

EventBus is a pubsub messaging system built on top of our recently open-sourced replicated log, DistributedLog. On top of DistributedLog, EventBus presents a unified partitioned stream model by adding a client library, a self-service provisioning interface, and offset tracking. By layering our pubsub system on top of DistributedLog, we take advantage of the operational efficiency and delivery semantics already inherent in the base system while providing a simple but powerful new capability for our customers.
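
As a purely illustrative aside, the heart of the partitioned-stream model described above is per-partition offset tracking, which lets a consumer resume a stream where it left off. The minimal Java sketch below conveys that idea only; it is not the EventBus client API, and all names are hypothetical.

    // Hypothetical sketch of per-partition offset tracking for a pubsub consumer
    // layered on a replicated log. Not the real EventBus API; names are illustrative.
    import java.util.HashMap;
    import java.util.Map;

    public class OffsetTrackingConsumer {
        // Highest committed offset per partition; a real system would persist this
        // so a consumer can resume where it left off after a restart.
        private final Map<Integer, Long> committedOffsets = new HashMap<>();

        // Record that all messages up to and including `offset` were processed.
        public void commit(int partition, long offset) {
            committedOffsets.merge(partition, offset, Math::max);
        }

        // Offset to resume reading from for a partition (0 if never consumed).
        public long resumeOffset(int partition) {
            return committedOffsets.getOrDefault(partition, -1L) + 1;
        }

        public static void main(String[] args) {
            OffsetTrackingConsumer consumer = new OffsetTrackingConsumer();
            consumer.commit(0, 41);                       // processed through offset 41 on partition 0
            System.out.println(consumer.resumeOffset(0)); // 42
            System.out.println(consumer.resumeOffset(1)); // 0, a fresh partition
        }
    }

In the real system, durability and delivery semantics come from DistributedLog itself; the layer above mainly has to track and persist offsets like these.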

Dave Rusek is a senior software engineer on the Messaging team at Twitter. He builds and maintains streaming systems that push trillions of events and petabytes of data every day to our customers @Twitter.

Follow him @davidjrusek

---------------------------------------------------------------------------------------------------------------------------------

Twitter Heron in Practice by Maosong Fu

Twitter generates millions of events per second, and analyzing them in real-time is a huge challenge. To meet it, Twitter designed and deployed a new streaming system called Heron. Heron has been in production since mid-2014 and is widely used for diverse use cases across several teams. In this talk, I will describe the demand for real-time analytics through concrete use cases and share our experiences operating Heron at scale.

Maosong Fu is a senior software engineer and the technical lead for Heron at Twitter. He has worked on various components of Heron, including the Heron instance, metrics manager and scheduler. He has authored publications in distributed systems and holds a Master's degree from Carnegie Mellon University and a Bachelor's degree from Huazhong University of Science and Technology.

Follow him @Louis_Fumaosong

---------------------------------------------------------------------------------------------------------------------------

Scaling Twitter's $2B+ Real-time Ad Infrastructure by Sandy Strong

Twitter’s 100K+ advertisers and 300 million+ users generate tens of billions of interactions with promoted content every day. Analyzing all of these events in real-time, robustly enough to incorporate into our ad products, presents a massive scaling challenge. In this talk, I will provide a deep dive into the core real-time infrastructure that powers one of the world’s fastest-growing businesses to hit $2B a year in revenue. I will discuss how millions of users use our iconic service every single day, often in unpredictable ways, and how advertisers seize these opportunities and react quickly to reach their target audiences in real-time, resulting in demand surges in the marketplace. Throughout the talk, I will describe the challenges we faced in building the real-time infrastructure that powers all of our serving, analytics, prediction, billing, and modeling pipelines to meet advertiser ROI while providing the best possible user experience.

Sandy Strong is a senior site reliability engineer embedded on the Ads Serving team at Twitter. She and her team build, maintain, and operate Twitter’s Ad Server, Twitter's revenue engine, which performs ad matching, scoring, and serving at immense scale.

Follow her @st5are

---------------------------------------------------------------------------------------------------------------------------------

Crest: A General Similarity-based Clustering System by Yimin Tan

The popularity of Twitter has been attracting a large amount of spammy content and spammy engagement. We designed “Crest", a general similarity-based clustering system, to detect groups of similar spam accounts and near-duplicate content in spam campaigns. Crest is built on top of Heron, which performs the data clustering and aggregation tasks in real-time.
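
To give a flavor of what "similarity-based clustering" of near-duplicate content can mean, here is a minimal, purely illustrative Java sketch using Jaccard similarity over word sets with a greedy threshold assignment. This is not Crest's actual algorithm, and the threshold and names are assumptions for the example.

    // Hypothetical sketch: greedy near-duplicate clustering using Jaccard similarity
    // over word sets. Illustrative only; not Crest's actual algorithm or thresholds.
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class NearDuplicateClusterer {
        private static final double THRESHOLD = 0.7;              // assumed similarity cutoff
        private final List<Set<String>> representatives = new ArrayList<>();
        final List<List<String>> clusters = new ArrayList<>();

        private static Set<String> tokens(String text) {
            return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
        }

        private static double jaccard(Set<String> a, Set<String> b) {
            Set<String> inter = new HashSet<>(a);
            inter.retainAll(b);
            Set<String> union = new HashSet<>(a);
            union.addAll(b);
            return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
        }

        // Assign content to the first sufficiently similar cluster, else start a new one.
        public void add(String content) {
            Set<String> t = tokens(content);
            for (int i = 0; i < representatives.size(); i++) {
                if (jaccard(t, representatives.get(i)) >= THRESHOLD) {
                    clusters.get(i).add(content);
                    return;
                }
            }
            representatives.add(t);
            List<String> cluster = new ArrayList<>();
            cluster.add(content);
            clusters.add(cluster);
        }

        public static void main(String[] args) {
            NearDuplicateClusterer c = new NearDuplicateClusterer();
            c.add("win a free prize click here now");
            c.add("win a free prize click here now fast");     // near-duplicate of the first
            c.add("heron meetup tonight at twitter hq");
            System.out.println(c.clusters.size());             // 2
        }
    }

A production system at Twitter scale would typically run the clustering and aggregation steps as streaming components (for example, inside a Heron topology) and use techniques such as locality-sensitive hashing to avoid comparing every item against every cluster; the sketch above only conveys the core similarity idea.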

Yimin Tan is a senior software engineer on the Product Safety team, where he builds real-time detection systems that fight spam and fake or compromised accounts to help make Twitter a safe place.

Follow him @YiminTan_Kevin

---------------------------------------------------------------------------------------------------------------------------------

Karthik Ramasamy is the engineering manager and technical lead for Real-Time Compute at Twitter. He is the co-creator of Heron and has more than two decades of experience working in parallel databases, big data infrastructure and networking. He co-founded Locomatix, a company specializing in real-time stream processing on Hadoop and Cassandra using SQL, which was acquired by Twitter. Karthik also spent time at Juniper Networks, where he designed and delivered platforms, protocols, databases and high-availability solutions for network routers that are widely deployed across the Internet. At the University of Wisconsin, he worked extensively on parallel database systems, query processing, scale-out technologies, storage engines and online analytical systems; several of these research areas were later spun off into a company acquired by Teradata. He is the author of several publications and patents, and co-author of the best-selling book “Network Routing: Algorithms, Protocols and Architectures.” He has a Ph.D. in Computer Science from UW-Madison with a focus on data management.

Follow him @karthikz

Organized by

Twitter Engineering