talk-data.com
People (53 results)

Activities & events
Safeguarding Our Kafka Kingdom: A Journey into Authentication and Authorisation
Ilyas Toumlilt
– Site Reliability Engineer
@ Criteo
We will share Criteo's journey of integrating authentication and authorisation into our Kafka infrastructure, including how we incorporated OAuth and JWT authentication systems into Kafka to enhance the security of our data streams. The talk covers the obstacles we faced and the lessons we learned while transforming an open Kafka infrastructure into a safeguarded platform.
History and evolution of Apache Kafka
2024-03-21 · 18:45
Jay Kreps
– CEO and co-founder
@ Confluent
A discussion of why Kafka was created, why Confluent was founded, the evolution of Kafka and data streaming, and open source and communities.
Kafka meetup with Jay Kreps @ Criteo !
2024-03-21 · 17:30
*** IMPORTANT: Registration will be closed on this page. Please RSVP with the following link: https://www.meetup.com/paris-apache-kafka-meetup/events/299612020/ ***

We are pleased to offer an exceptional event for the March meetup! We are honoured to welcome Jay Kreps (CEO and co-founder of Confluent, co-creator of Kafka) to talk about the history and evolution of Kafka, followed by an experience report from Criteo on their journey to secure and authenticate their Kafka infrastructure. The event will take place on Thursday, 21 March at the offices of Criteo, which is sponsoring the event alongside Confluent. Many thanks to them for making this meetup possible!

⚠️ Seats are limited, so please register only if you are sure you can attend, and if you can no longer make it, please update your RSVP so someone else can join. Thank you!

🗓 Agenda:

Please note that Jay will not be able to attend the networking part of the event, but Confluent people will be there to answer any additional questions in the networking area: Gilles Philippart for technical questions and Anissa Lallemand for non-technical ones.

💡 Presentation 1: Jay Kreps – CEO and co-founder of Confluent, original co-creator of Apache Kafka®
Interactive Q&A with the Kafka co-creator and Confluent CEO (in English). A 30-minute Q&A session to ask questions of Jay. We will start with a discussion of why Kafka was created, why Confluent was founded, the evolution of Kafka and data streaming, and open source and communities, and then switch to questions from the audience.
Bio: Jay Kreps is CEO and co-founder of Confluent. Prior to Confluent he was the lead architect for data and infrastructure at LinkedIn. He is the initial developer of several open source projects, including Apache Kafka.

💡 Presentation 2: Ilyas Toumlilt – SRE @ Criteo
Safeguarding Our Kafka Kingdom: A Journey into Authentication and Authorisation
We will share Criteo's journey of integrating authentication and authorisation into our Kafka infrastructure, a significant leap we took after years of operating without them. We will delve into how we successfully incorporated Criteo's existing OAuth and JWT authentication systems into Kafka, enhancing the security of our data streams. While the subject might seem technical and complex, we promise an engaging narrative filled with both victories and challenges. We will recount our integration and deployment war stories, the obstacles we overcame, and the lessons we learned along the way. This talk is not just about the destination but the journey: the transformation that took place as we navigated the intricate maze of turning a "full open" Kafka infrastructure into a safeguarded platform. Our aim is to keep this presentation fun and easy to follow, avoiding deep technical jargon, and to share our journey with the community.
Bio: Ilyas works at Criteo as a Site Reliability Engineer in the Stream-Processing Platform team. He is interested in large-scale distributed systems design and efficiency; he previously built the Concordant and AntidoteDB open-source databases during his PhD. Ilyas is also curious about Linux kernel topics and, outside of work, enjoys reading books and running.

Important info:
1: ❗ For safety reasons, the venue's staff will check everyone's identity on site. 📝 Please remember to bring an ID with you and register for the event with your real first and family name. Thank you!
2: Attendees consent to be photographed, filmed, and sound recorded as members of the audience, which may be used for marketing or promotional purposes by Confluent, Criteo, and the Paris Kafka Meetup.
3: Please be on time. We can't guarantee a seat once the meetup has started.
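The OAuth/JWT integration described in this talk corresponds to Kafka's built-in SASL/OAUTHBEARER mechanism (KIP-768). As an illustration only, not Criteo's actual setup, a client could be configured along these lines; the token endpoint URL, client ID, and secret are placeholders, and the exact callback-handler class location varies by Kafka version:

```properties
# Illustrative client configuration for SASL/OAUTHBEARER (Kafka 3.1+).
# Endpoint and credentials below are placeholders, not real values.
security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.oauthbearer.token.endpoint.url=https://auth.example.com/oauth2/token
sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
    clientId="example-client" \
    clientSecret="example-secret";
```

On the broker side, the JWT signatures presented by clients can then be verified against the identity provider's JWKS endpoint (`sasl.oauthbearer.jwks.endpoint.url`), which is what turns a "full open" cluster into an authenticated one.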
The Alooma Data Pipeline With CTO Yair Weinberger - Episode 33
2018-05-28 · 01:00
Yair Weinberger
– CTO and co-founder
@ Alooma
,
Tobias Macey
– host
Summary
Building an ETL pipeline is a common need across businesses and industries. It's easy to get one started but difficult to manage as new requirements are added and greater scalability becomes necessary. Rather than duplicating the efforts of other engineers, it might be best to use a hosted service to handle the plumbing so that you can focus on the parts that actually matter for your business. In this episode, CTO and co-founder of Alooma, Yair Weinberger, explains how the platform addresses the common needs of data collection, manipulation, and storage while allowing for flexible processing. He describes the motivation for starting the company, how their infrastructure is architected, and the challenges of supporting multi-tenancy and a wide variety of integrations.

Preamble
Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand-new API, you've got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you'll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

Your host is Tobias Macey and today I'm interviewing Yair Weinberger about Alooma, a company providing data pipelines as a service.

Interview
- Introduction
- How did you get involved in the area of data management?
- What is Alooma and what is the origin story?
- How is the Alooma platform architected? I want to go into stream vs. batch here.
- What are the most challenging components to scale?
- How do you manage the underlying infrastructure to support your SLA of 5 nines?
- What are some of the complexities introduced by processing data from multiple customers with various compliance requirements?
- How do you sandbox users' processing code to avoid security exploits?
- What are some of the potential pitfalls for automatic schema management in the target database?
- Given the large number of integrations, how do you maintain the
- What are some challenges when creating integrations? Isn't it simply conforming with an external API?
- For someone getting started with Alooma, what does the workflow look like?
- What are some of the most challenging aspects of building and maintaining Alooma?
- What are your plans for the future of Alooma?

Contact Info
- LinkedIn
- @yairwein on Twitter

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links: Alooma, Convert Media, Data Integration, ESB (Enterprise Service Bus), Tibco, Mulesoft, ETL (Extract, Transform, Load), Informatica, Microsoft SSIS, OLAP Cube, S3, Azure Cloud Storage, Snowflake DB, Redshift, BigQuery, Salesforce, Hubspot, Zendesk, Spark, "The Log: What every software engineer should know about real-time data's unifying abstraction" by Jay Kreps, RDBMS (Relational Database Management System), SaaS (Software as a Service), Change Data Capture, Kafka, Storm, Google Cloud PubSub, Amazon Kinesis, Alooma Code Engine, Zookeeper, Idempotence, Kafka Streams, Kubernetes, SOC2, Jython, Docker, Python, Javascript, Ruby, Scala, PII (Personally Identifiable Information), GDPR (General Data Protection Regulation), Amazon EMR (Elastic Map Reduce), Sequoia Capital, Lightspeed (investors), Redis, Aerospike, Cassandra, MongoDB.

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

Support Data Engineering Podcast
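For readers new to the pattern this episode discusses, an ETL pipeline reduces to three stages: extract records from a source, transform them into the target schema, and load them into a store. The sketch below is a minimal illustration with invented records and schema, not Alooma's design; a hosted service replaces each stage with managed, scalable infrastructure:

```python
import sqlite3

def extract():
    # Stand-in for reading from an API, log stream, or SaaS source.
    return [
        {"user": "ada", "amount_cents": "1250"},
        {"user": "grace", "amount_cents": "900"},
    ]

def transform(rows):
    # Normalize types and derive fields; real pipelines also handle
    # schema drift, bad records, and idempotent re-processing here.
    return [(r["user"], int(r["amount_cents"]) / 100) for r in rows]

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS purchases (user TEXT, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
total = conn.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]
print(total)  # 21.5
```

The hard parts Weinberger describes in the interview (multi-tenancy, sandboxing user code, automatic schema management) are exactly what makes the transform and load stages difficult to run as a shared service.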
Data Engineering Weekly with Joe Crobak - Episode 27
2018-04-15 · 03:00
Joe Crobak
– Data Engineer
@ United States Digital Service (USDS)
,
Tobias Macey
– host
Summary
The rate of change in the data engineering industry is alternately exciting and exhausting. Joe Crobak found his way into the work of data management by accident, as so many of us do. After becoming engrossed in researching the details of distributed systems and big data management for his work, he began sharing his findings with friends. This led to his creation of the Hadoop Weekly newsletter, which he recently rebranded as the Data Engineering Weekly newsletter. In this episode he discusses his experiences working as a data engineer in industry and at the USDS, his motivations and methods for creating a newsletter, and the insights that he has gleaned from it.

Preamble
Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand-new API, you've got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

Your host is Tobias Macey and today I'm interviewing Joe Crobak about his work maintaining the Data Engineering Weekly newsletter and the challenges of keeping up with the data engineering industry.

Interview
- Introduction
- How did you get involved in the area of data management?
- What are some of the projects that you have been involved in that were most personally fulfilling?
- As an engineer at the USDS working on the healthcare.gov and Medicare systems, what were some of the approaches that you used to manage sensitive data?
- Healthcare.gov has a storied history; how did the systems for processing and managing the data get architected to handle the amount of load that it was subjected to?
- What was your motivation for starting a newsletter about the Hadoop space?
- Can you speak to your reasoning for the recent rebranding of the newsletter?
- How much of the content that you surface in your newsletter is found during your day-to-day work, versus explicitly searching for it?
- After over 5 years of following the trends in data analytics and data infrastructure, what are some of the most interesting or surprising developments?
- What have you found to be the fundamental skills or areas of experience that have maintained relevance as new technologies in data engineering have emerged?
- What is your workflow for finding and curating the content that goes into your newsletter?
- What is your personal algorithm for filtering which articles, tools, or commentary gets added to the final newsletter?
- How has your experience managing the newsletter influenced your areas of focus in your work, and vice versa?
- What are your plans going forward?

Contact Info
- Data Eng Weekly
- Email
- Twitter – @joecrobak
- Twitter – @dataengweekly

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links: USDS, National Labs, Cray, Amazon EMR (Elastic Map-Reduce), Recommendation Engine, Netflix Prize, Hadoop, Cloudera, Puppet, healthcare.gov, Medicare, Quality Payment Program, HIPAA, NIST (National Institute of Standards and Technology), PII (Personally Identifiable Information), Threat Modeling, Apache JBoss, Apache Web Server, MarkLogic, JMS (Java Message Service), Load Balancer, COBOL, Hadoop Weekly, Data Engineering Weekly, Foursquare, NiFi, Kubernetes, Spark, Flink, Stream Processing, DataStax, RSS, The Flavors of Data Science and Engineering, CQRS, Change Data Capture, Jay Kreps.

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

Support Data Engineering Podcast
I Heart Logs
2014-09-23
Jay Kreps
– author
Why a book about logs? That's easy: the humble log is an abstraction that lies at the heart of many systems, from NoSQL databases to cryptocurrencies. Even though most engineers don't think much about them, this short book shows you why logs are worthy of your attention. Based on his popular blog posts, LinkedIn principal engineer Jay Kreps shows you how logs work in distributed systems, and then delivers practical applications of these concepts in a variety of common uses: data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models. Go ahead and take the plunge with logs; you're going to love them.
- Learn how logs are used for programmatic access in databases and distributed systems
- Discover solutions to the huge data integration problem when more data of more varieties meet more systems
- Understand why logs are at the heart of real-time stream processing
- Learn the role of a log in the internals of online data systems
- Explore how Jay Kreps applies these ideas to his own work on data infrastructure systems at LinkedIn
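The abstraction the book is built around can be sketched in a few lines: an append-only sequence of records where each record receives a monotonically increasing offset, and independent readers track their own position. This toy class is purely illustrative and mirrors no specific API from the book:

```python
class Log:
    """A toy append-only log: records are ordered, immutable, and offset-addressed."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Records are only ever added at the end; the offset is the record's
        # permanent, totally ordered position in the log.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        # Every reader replays the same records in the same order,
        # starting from wherever it left off.
        return self._records[offset:]

log = Log()
for event in ["user_created", "email_changed", "user_deleted"]:
    log.append(event)

# Two consumers at different offsets see the same consistent, ordered history.
fast_consumer = log.read(0)  # all three events
slow_consumer = log.read(2)  # only the last event
```

In Kreps's framing, this single structure is what unifies database replication (followers replay the leader's log), change data capture, and stream processing.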
O'Reilly Data Engineering Books