ClickHouse® is famous for real-time response, cost-efficiency, and flexible Apache 2.0 licensing. In the first half of this talk, we’ll show how ClickHouse® implements sub-second analytics with columnar storage, vectorized query, and compression on datasets running to hundreds of terabytes. We’ll then pivot and discuss how Altinity is adapting open source ClickHouse® to deliver economical, real-time response over 100 Petabytes of data or more.
talk-data.com
Speaker
Robert Hodges
2
talks
Robert Hodges is the CEO of Altinity.
Bio from: Real-Time Analytics and AI at Scale Meetup
Filter by Event / Source
Talks & appearances
2 activities · Newest first
Summary The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented database engine built for interactive analytics with linear scalability. In this episode Robert Hodges and Alexander Zaitsev explain how it is architected to provide these features, the various unique capabilities that it provides, and how to run it in production. It was interesting to learn about some of the custom data types and performance optimizations that are included.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! Integrating data across the enterprise has been around for decades – so have the techniques to do it. But, a new way of integrating data and improving streams has evolved. By integrating each silo independently – data is able to integrate without any direct relation. At CluedIn they call it “eventual connectivity”. If you want to learn more on how to deliver fast access to your data across the enterprise leveraging this new method, and the technologies that make it possible, get a demo or presentation of the CluedIn Data Hub by visiting dataengineeringpodcast.com/cluedin. And don’t forget to thank them for supporting the show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management.For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat Your host is Tobias Macey and today I’m interviewing Robert Hodges and Alexander Zaitsev about Clickhouse, an open source, column-oriented database for fast and scalable OLAP queries
Interview
Introduction How did you get involved in the area of data management? Can you start by explaining what Clickhouse is and how you each got involved with it?
What are the primary use cases that Clickhouse is targeting? Where does it fit in the database market and how does it compare to other column stores, both open source and commercial?
Can you describe how Clickhouse is architected? Can you talk through the lifecycle of a given record or set of records from when they first get inserted into Clickhouse, through the engine an