talk-data.com
People (104 results)
See all 104 →Companies (1 result)
Activities & events
| Title & Speakers | Event |
|---|---|
|
Shift-Left: A DevOps and Database Story
2025-07-23 · 18:15
Kevin Kline
– Database expert and software industry veteran
Database code often works well in a test environment, but end up causing significant performance issues when rolled out to production. In turn, these performance issues lead to lost productivity, excess resource consumption, and lost time spent troubleshooting the issue. But what if we could catch those issues during development? Join Kevin Kline to learn how SolarWind's SQL Sentry can shift performance tuning left to the dev cycle so those issues never occur on production systems in the first place. |
Shift-Left: A DevOps and Database Story
|
|
Pandas Workout
2024-06-05
Reuven M. Lerner
– author
Practice makes perfect pandas! Work out your pandas skills against dozens of real-world challenges, each carefully designed to build an intuitive knowledge of essential pandas tasks. In Pandas Workout you’ll learn how to: Clean your data for accurate analysis Work with rows and columns for retrieving and assigning data Handle indexes, including hierarchical indexes Read and write data with a number of common formats, such as CSV and JSON Process and manipulate textual data from within pandas Work with dates and times in pandas Perform aggregate calculations on selected subsets of data Produce attractive and useful visualizations that make your data come alive Pandas Workout hones your pandas skills to a professional-level through two hundred exercises, each designed to strengthen your pandas skills. You’ll test your abilities against common pandas challenges such as importing and exporting, data cleaning, visualization, and performance optimization. Each exercise utilizes a real-world scenario based on real-world data, from tracking the parking tickets in New York City, to working out which country makes the best wines. You’ll soon find your pandas skills becoming second nature—no more trips to StackOverflow for what is now a natural part of your skillset. About the Technology Python’s pandas library can massively reduce the time you spend analyzing, cleaning, exploring, and manipulating data. And the only path to pandas mastery is practice, practice, and, you guessed it, more practice. In this book, Python guru Reuven Lerner is your personal trainer and guide through over 200 exercises guaranteed to boost your pandas skills. About the Book Pandas Workout is a thoughtful collection of practice problems, challenges, and mini-projects designed to build your data analysis skills using Python and pandas. The workouts use realistic data from many sources: the New York taxi fleet, Olympic athletes, SAT scores, oil prices, and more. Each can be completed in ten minutes or less. You’ll explore pandas’ rich functionality for string and date/time handling, complex indexing, and visualization, along with practical tips for every stage of a data analysis project. What's Inside Clean data with less manual labor Retrieving and assigning data Process and manipulate text Calculations on selected data subsets About the Reader For Python programmers and data analysts. About the Author Reuven M. Lerner teaches Python and data science around the world and publishes the “Bamboo Weekly” newsletter. He is the author of Manning’s Python Workout (2020). Quotes A carefully crafted tour through the pandas library, jam-packed with wisdom that will help you become a better pandas user and a better data scientist. - Kevin Markham, Founder of Data School, Creator of pandas in 30 days Will help you apply pandas to real problems and push you to the next level. - Michael Driscoll, RFA Engineering, creator of Teach Me Python The explanations, paired with Reuven’s storytelling and personal tone, make the concepts simple. I’ll never get them wrong again! - Rodrigo Girão Serrão, Python developer and educator The definitive source! - Kiran Anantha, Amazon |
O'Reilly Data Science Books
|
|
Real AI Solutions for Accessibility Challenges
2023-09-12 · 23:30
Important note Online attendance is open to everyone. We need to collect names for in-person attendees, so please answer the question when RSVPing. ***If you do not submit your name, you won't be able to get into the building.*** Thank you! Artificial intelligence (AI) is here today and can make it easier to do the right thing when building web and mobile applications. Large language models and the coding tools built around them (Github, Amazon Code Whisperer) have made it faster than ever to create code. However, since the majority of the code used to train these models is itself inaccessible, the code completions they offer are more often than not, inaccessible. The good news for accessibility is, there are some important techniques that get around this problem. Come learn how it's easier than ever to build web and mobile experiences the right way from the get-go. There will be time for your questions. This event will be online and in-person. Presenter bio Kevin Berg is the VP of Solution Engineering at Evinced, where he is dedicated to helping leading enterprises improve and accelerate their digital accessibility efforts. Kevin trained originally as a classical musician at the University of Cincinnati and the University of Texas, where he got his doctoral degree. He spent six years at Sauce Labs, a leading functional test automation platform, where he eventually led global solutions engineering before joining Evinced in a similar role. When he is not helping Evinced customers through challenges, Kevin is home in Montana making stuff (he is a hobby machinist), playing his oboe, and enjoying family life with his wife and two kids. Accessibility The presentation will have captions [CC]. Please let us know about accessibility requirements two weeks before the event. Livestream Provided by Internet Society Accessibility SIG. Accreditation All A11yNYC meetups are pre-approved for IAAP Continuing Accessibility Education Credits (CAEC). Sponsors Thanks to Asana, Evinced, Equal Entry, Fable, and Google for sponsoring. |
Real AI Solutions for Accessibility Challenges
|
|
Build Your Python Data Processing Your Way And Run It Anywhere With Fugue
2022-02-21 · 03:00
Kevin Kho
– core contributor
@ Fugue
,
Tobias Macey
– host
Summary Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning. Along with that growth has come an explosion of tools and engines that help power these workflows, which introduces a great deal of complexity when scaling from single machines and exploratory development to massively parallel distributed computation. In answer to that challenge the Fugue project offers an interface to automatically translate across Pandas, Spark, and Dask execution environments without having to modify your logic. In this episode core contributor Kevin Kho explains how the slight differences in the underlying engines can lead to big problems, how Fugue works to hide those differences from the developer, and how you can start using it in your own work today. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye let’s data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses. Every data project starts with collecting the information that will provide answers to your questions or inputs to your models. The web is the largest trove of information on the planet and Oxylabs helps you unlock its potential. With the Oxylabs scraper APIs you can extract data from even javascript heavy websites. Combined with their residential proxies you can be sure that you’ll have reliable and high quality data whenever you need it. Go to dataengineeringpodcast.com/oxylabs today and use code DEP25 to get your special discount on residential proxies. Your host is Tobias Macey and today I’m interviewing Kevin Kho about Fugue, a library that offers a unified interface for distributed computing that lets users execute Python, pandas, and SQL code on Spark and Dask without rewrites Interview Introduction How did you get involved in the area of data management? Can you describe what Fugue is and the story behind it? What are the core goals of the Fugue project? Who are the target users for Fugue and how does that influence the feature priorities and API design? How does Fugue compare to projects such as Modin, etc. for abst |
|
|
Data Observability Out Of The Box With Metaplane
2022-01-08 · 01:30
Kevin Hu
– Founder
@ Metaplane
,
Tobias Macey
– host
Summary Data observability is a set of technical and organizational capabilities related to understanding how your data is being processed and used so that you can proactively identify and fix errors in your workflows. In this episode Metaplane founder Kevin Hu shares his working definition of the term and explains the work that he and his team are doing to cut down on the time to adoption for this new set of practices. He discusses the factors that influenced his decision to start with the data warehouse, the potential shortcomings of that approach, and where he plans to go from there. This is a great exploration of what it means to treat your data platform as a living system and apply state of the art engineering to it. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Today’s episode is Sponsored by Prophecy.io – the low-code data engineering platform for the cloud. Prophecy provides an easy-to-use visual interface to design & deploy data pipelines on Apache Spark & Apache Airflow. Now all the data users can use software engineering best practices – git, tests and continuous deployment with a simple to use visual designer. How does it work? – You visually design the pipelines, and Prophecy generates clean Spark code with tests on git; then you visually schedule these pipelines on Airflow. You can observe your pipelines with built in metadata search and column level lineage. Finally, if you have existing workflows in AbInitio, Informatica or other ETL formats that you want to move to the cloud, you can import them automatically into Prophecy making them run productively on Spark. Create your free account today at dataengineeringpodcast.com/prophecy. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch. Your host is Tobias Macey and today I’m interviewing Kevin Hu about Metaplane, a platform aiming to provide observability for modern data stacks, from warehouses to BI dashboards and everything in between. Interview Introduction How did you get involved in the area of data management? Can you describe what Metaplane is and the story behind it? Data observability is an area that has seen a huge amount of activity over the past couple of years. What is your working definition of that term? What are the areas of differentiation that you see across vendors in the space? Can you describe how the Metaplane platform is architected? How have the design and goals of Metaplane changed or evolved since you started working on it? establishing seasonality in data metrics blind spots from operating at the level of the data warehouse What are the most interesting, innovative, or unexpected ways that you have seen Metaplane used? What are the most interesti |
|
|
The Grand Vision And Present Reality of DataOps
2021-05-04 · 01:00
Maxime Beauchemin
– guest
,
Kevin Stumpf
– guest
@ Tecton
,
Tobias Macey
– host
,
Lior Gavish
– co-founder
@ Monte Carlo
Summary The Data industry is changing rapidly, and one of the most active areas of growth is automation of data workflows. Taking cues from the DevOps movement of the past decade data professionals are orienting around the concept of DataOps. More than just a collection of tools, there are a number of organizational and conceptual changes that a proper DataOps approach depends on. In this episode Kevin Stumpf, CTO of Tecton, Maxime Beauchemin, CEO of Preset, and Lior Gavish, CTO of Monte Carlo, discuss the grand vision and present realities of DataOps. They explain how to think about your data systems in a holistic and maintainable fashion, the security challenges that threaten to derail your efforts, and the power of using metadata as the foundation of everything that you do. If you are wondering how to get control of your data platforms and bring all of your stakeholders onto the same page then this conversation is for you. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today. Your host is Tobias Macey and today I’m interviewing Max Beauchemin, Lior Gavish, and Kevin Stumpf about the real world challenges of embracing DataOps practices and systems, and how to keep things secure as you scale Interview Introduction How did you get involved in the area of data management? Before we get started, can you each give your definition of what "DataOps" means to you? How does this differ from "business as usual" in the data industry? What are some of the things that DataOps isn’t (despite what marketers might say)? What are the biggest difficulties that you have faced in going from concept to production with a workflow or system intended to power self-serve access to other membe |
|
|
Kevin Stumpf
– guest
@ Tecton
,
Tobias Macey
– host
Summary As more organizations are gaining experience with data management and incorporating analytics into their decision making, their next move is to adopt machine learning. In order to make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self service manner. As a result the feature store is becoming a required piece of the data platform. To fill that need Kevin Stumpf and the team at Tecton are building an enterprise feature store as a service. In this episode he explains how his experience building the Michelanagelo platform at Uber has informed the design and architecture of Tecton, how it integrates with your existing data systems, and the elements that are required for well engineered feature store. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory mangement, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s dataengineeringpodcast.com/talkpython, and don’t forget to thank them for supporting the show. You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 will receive a free, limited edition Monte Carlo hat! Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about Tecton and the role that the feature store plays in a modern MLOps platform Interview Introduction How did you get involved in the area of data management? Can you start by describing what you are building at Tecton and your motivation for starting the business? For anyone who isn’t familiar with the concept, what is an example of a feature? How do you define what a feature store is? What role does a feature store play in the overall lifecycle of a machine learning p |
|
|
Machine Learning In The Enterprise
2019-02-11 · 19:00
Kevin Dewalt
– founder
@ Prolego
,
Tobias Macey
– host
Summary Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways that it can be used in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first machine learning projects so that they can remain competitive in our landscape of constant change. In this episode he discusses why machine learning projects require a new set of capabilities, how to build a team from internal and external candidates, and how an example project progressed through each phase of maturity. This was a great conversation for anyone who wants to understand the benefits and tradeoffs of machine learning for their own projects and how to put it into practice. Introduction Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat Your host is Tobias Macey and today I’m interviewing Kevin Dewalt about his experiences at Prolego, building machine learning projects for Fortune 500 companies Interview Introduction How did you get involved in the area of data management? For the benefit of software engineers and team leaders who are new to machine learning, can you briefly describe what machine learning is and why is it relevant to them? What is your primary mission at Prolego and how did you identify, execute on, and establish a presence in your particular market? How much of your sales process is spent on educating your clients about what AI or ML are and the benefits that these technologies can provide? What have you found to be the technical skills and capacity necessary for being successful in building and deploying a machine learning project? When engaging with a client, what have you found to be the most common areas of technical capacity or knowledge that are needed? Everyone talks about a talent shortage in machine learning. Can you suggest a recruiting or skills development process for companies which need to build out their data engineering practice? What challenges will teams typically encounter when creating an efficient working relationship between data scientists and data engineers? Can you briefly describe a successful project of developing a first ML model and putting it into production? What is the breakdown of how much time was spent on different activities such as data wrangling, model development, and data engineering pipeline development? When releasing to production, can you share the types of metrics that you track to ensure the health and proper functioning of the models? What does a deployable artifact for a machine learning/deep learning application look like? What basic technology stack is necessary for putting the first ML models into production? How does the build vs. buy debate break down in this space and what products do you typically recommend to your clients? What are the major risks associated with deploying ML models and how can a team mitigate them? Suppose a software engineer wants to break into ML. What data engineering skills would you suggest they learn? How should they position themselves for the right opportunity? Contact Info Email: Kevin Dewalt [email protected] and Russ Rands [email protected] Connect on LinkedIn: Kevin Dewalt and Russ Rands Twitter: @kevindewalt Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Links Prolego Download our book: Become an AI Company in 90 Days Google Rules Of ML AI Winter Machine Learning Supervised Learning O’Reilly Strata Conference GE Rebranding Commercials Jez Humble: Stop Hiring Devops Experts (And Start Growing Them) SQL ORM Django RoR Tensorflow PyTorch Keras Data Engineering Podcast Episode About Data Teams DevOps For Data Teams – DevOps Days Boston Presentation by Tobias Jupyter Notebook Data Engineering Podcast: Notebooks at Netflix Pandas Podcast Interview Joel Grus JupyterCon Presentation Data Science From Scratch Expensify Airflow James Meickle Interview Git Jenkins Continuous Integration Practical Deep Learning For Coders Course by Jeremy Howard Data Carpentry The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast |
|