Since the dawn of analytics we have strived to improve both the quality and volume of our data, with no other ambition than to ensure the largest possible dataset – not because we need it, but because we might need it. GDPR has temporarily thrown a wrench into our original approach, but it takes more than the law to keep a good analyst away from his data, and with machine learning as an active part of the toolbox the value of data has grown exponentially. This session is a reflection on how we have done things so far and where we might end up if we don't stop doing business as usual and instead calibrate our efforts in a more strategic and ethical direction.
Top Events
The Google Analytics Suite of products is now part of the Google Marketing Platform. We will cover how key pieces of the Platform can be used, including the Salesforce connectors, Display & Video 360, the Google Optimize integration, and Google Cloud integrations. We will review how this data can be put to actionable use for advertising, e-mail, personalization, and surveys.
This thing called agile development started popping up more and more around me. Last year, I decided to investigate it: talking to scrum masters, discussing implementation issues with fellow analysts, and reading about the lean methodology. After that, something clicked into place. In this talk, I'll share my ideas on and experiences in applying lean to my analytics team.
Zorin shares personal experiences with expectation management, gathered from small family businesses all the way to large enterprises. Looking at how far analytics has come through the ages, the massive tooling around it, and the widespread ability to test, predict, and sustainably make mistakes (or better said, learn), one would think the expectations should be easy to manage. One would be somewhat wrong. Or would one?
These days there is a strong case for machine learning in analytics, and everywhere you turn you are presented with cases and examples of how ML has made analysis better, faster, and more consistent. But to a man with a new hammer everything looks like a nail, and sometimes we forget to draw the line between when machine learning is the right tool and when an analyst is actually required to do the job right. Join this session to dive into some big thoughts on how to distinguish what is needed when.
Mobile apps are not the same as the web, so why have we been measuring them as such? With the old GA Services SDK being turned down for some users, it's time to look into how to use Google Analytics for Firebase to measure and act on your mobile app's data. Krista will walk you through the benefits and power of the tool, explain the differences in the data model, cover implementation best practices, and share tips for how to migrate.
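To make the data-model difference concrete: Firebase Analytics is event-based rather than session/pageview-based, which is easiest to see in its BigQuery export. A minimal sketch, assuming the standard Firebase-to-BigQuery export schema (the project and dataset names below are hypothetical placeholders):

    -- One row per event; count events and distinct users per event type.
    SELECT
      event_name,
      COUNT(*) AS event_count,
      COUNT(DISTINCT user_pseudo_id) AS user_count
    FROM `my-project.analytics_123456.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20190131'
    GROUP BY event_name
    ORDER BY event_count DESC;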
We'll look ahead at the rapid changes hitting analytics as an industry, and all of us as people needing to protect our digital profiles, and review where we've come from, what needs to change, and where we're headed.
Frustrated with the present, Peter jumps ahead to describe the future of Digital Analytics within organisations. He will describe the Digital Analytics solution of the future across the people, technology, and process requirements. This won't just be the dream set-up for large organisations with big budgets (that's easy) but instead how Digital Analytics can be made useful, in a practical sense, for organisations of any size.
If someone asked you what first sparked your interest in web analytics, you'd probably say something like "solving problems using data". Most of us talk about finding insight in numbers for the benefit of whichever company is picking up the check and putting food on our tables, but how often does our work go beyond problems like "tag this" and "troubleshoot that", and really translate into solving a tangible, true, big problem for an organisation?
"The vast majority of commercial web data we analyse, even as professionals, is poor quality." A large part of my job involves auditing Google Analytics setups in order to establish the quality of the data collected. This story brings together some of the extraordinary findings of my work. Its a study of 75 enterprise websites using Google Analytics. The results are somewhat surprising (and depressing) in that they show the general poor quality of data that organisations are working with. For example: the Average Quality Index score is only 35.7 out of 100, and one in five websites have a PII issue i.e. were collecting personal information into Google Analytics.
Plug-and-play analytics doesn't work - we should know that by now. But even an amazing, super-charged analytics pipeline will fail in an organization that lacks the maturity to maintain it.
Discover the ultimate guide to Tableau 2019.x, offering over 115 practical recipes to tackle business intelligence and data analysis challenges. This book takes you from the basics to advanced techniques, empowering you to create insightful dashboards, leverage powerful analytics, and seamlessly integrate with modern cloud data platforms.
What this book will help me do
Master both basic and advanced functionality of Tableau Desktop to effectively analyze and visualize data.
Understand how to create impactful dashboards and compelling data stories to drive decision-making.
Deploy advanced analytical tools, including R-based forecasting and statistical techniques, with Tableau.
Set up and utilize Tableau Server in multi-node environments on Linux and Windows.
Utilize Tableau Prep to efficiently clean, shape, and transform data for seamless integration into Tableau workflows.
Author(s)
The authors of the Tableau 2019.x Cookbook are recognized industry professionals with rich expertise in business intelligence, data analytics, and Tableau's ecosystem. Dmitry Anoshin and his co-authors bring hands-on experience from various industries to provide actionable insights. They focus on delivering practical solutions through structured learning paths.
Who is it for?
This book is tailored for data analysts, BI developers, and professionals with some knowledge of Tableau who want to enhance their skills. If you're aiming to solve complex analytics challenges or want to fully utilize the capabilities of Tableau products, this book offers the guidance and knowledge you need.
Adam Weinstein is currently CEO and Co-Founder of Cursor, having worked at LinkedIn as a Senior Manager of Business Development and having founded enGreet, a print-on-demand greeting card company that merged crowd-sourcing with social expressions. In this episode, he describes his data analytics company and provides insight into creating a successful startup.
Shownotes
00:00 - Check us out on YouTube and SoundCloud!
00:10 - Connect with Producer Steve Moore on LinkedIn & Twitter
00:15 - Connect with Producer Liam Seston on LinkedIn & Twitter.
00:20 - Connect with Producer Rachit Sharma on LinkedIn.
00:25 - Connect with Host Al Martin on LinkedIn & Twitter.
00:55 - Connect with Adam Weinstein on LinkedIn.
03:55 - Find out more about Cursor.
06:45 - Learn more about Cursor's Co-Founder and CEO Adam Weinstein.
13:10 - Learn more about Big Data Analytics.
19:20 - What are Python/Jupyter Notebooks?
26:35 - Learn more about Data Fluency.
35:30 - What is a startup?
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.
Summary Building internal expertise around big data in a large organization is a major competitive advantage. However, it can be a difficult process due to compliance needs and the need to scale globally on day one. In this episode Jesper Søgaard and Keld Antonsen share the story of starting and growing the big data group at LEGO. They discuss the challenges of being at global scale from the start, hiring and training talented engineers, prototyping and deploying new systems in the cloud, and what they have learned in the process. This is a useful conversation for engineers, managers, and leadership who are interested in building enterprise big data systems.
Preamble
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you've got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they've got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
To help other people find the show please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Your host is Tobias Macey and today I'm interviewing Keld Antonsen and Jesper Soegaard about the data infrastructure and analytics that powers LEGO.
Interview
Introduction
How did you get involved in the area of data management?
My understanding is that the big data group at LEGO is a fairly recent development. Can you share the story of how it got started?
What kinds of data practices were in place prior to starting a dedicated group for managing the organization's data?
What was the transition process like, migrating data silos into a uniformly managed platform?
What are the biggest data challenges that you face at LEGO?
What are some of the most critical sources and types of data that you are managing?
What are the main components of the data infrastructure that you have built to support the organization's analytical needs?
What are some of the technologies that you have found to be most useful?
Which have been the most problematic?
What does the team structure look like for the data services at LEGO?
Does that reflect in the types/numbers of systems that you support?
What types of testing, monitoring, and metrics do you use to ensure the health of the systems you support?
What have been some of the most interesting, challenging, or useful lessons that you have learned while building and maintaining the data platforms at LEGO?
How have the data systems at LEGO evolved over recent years as new technologies and techniques have been developed?
How does the global nature of the LEGO business influence the design strategies and technology choices for your platform?
What are you most excited for in the coming year?
Contact Info
Jesper
Keld
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
LEGO Group
ERP (Enterprise Resource Planning)
Predictive Analytics
Prescriptive Analytics
Hadoop
Center of Excellence
Continuous Integration
Spark
Podcast Episode
Apache NiFi
Podcast Episode
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast
The road to AI adoption is far more complex than one can imagine. Building data science models and testing them is only one piece of the puzzle. To understand the roadblocks and best practices, Wayne Eckerson invited Nir Kaldero onto our latest episode to learn why organizations need to start paying more attention to people, culture, and processes to make data science projects a success, and how democratizing skills pays off in the long run.
Nir Kaldero is the Head of Data Science, Vice President at Galvanize Inc. and the creator of the GalvanizeU Master's of Science in Data Science program. A tireless advocate for transforming education and reshaping the field of data science, his vision and mission are to make an impact on a wide variety of communities through education, science, and technology. In addition to his work at some of the world's largest international corporations, Kaldero serves as a Google expert/mentor and has been named an IBM Analytics Champion for 2017 & 2018, a prestigious honor given to leaders in the fields of science, technology, engineering, and math (STEM).
WHERE were you the first time you listened to this podcast? Did you feel like you were JOINing a SELECT GROUP BY doing so? Can you COUNT the times you've thought to yourself, "Wow. These guys are sometimes really unFILTERed?" On this episode, Pawel Kapuscinski from Analytics Pros (and the Burnley Football Club) sits down with the group to shout at them in all caps. Or, at least, to talk about SQL: where it fits in the analyst's toolbox, how it is a powerful and necessary complement to Python and R, and who's to blame for the existence of so many different flavors of the language. Give it a listen. That's an ORDER (BY?)! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.
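For the uninitiated, the capitalized words above are all real SQL. As a hedged illustration of how they fit together in an analyst's day-to-day (the table and column names are hypothetical, and the FILTER clause is one of those dialect differences the group gripes about, supported in PostgreSQL but not everywhere):

    -- Sessions and conversions per channel; every punned keyword in one query.
    SELECT   c.channel_name,
             COUNT(*) AS sessions,
             COUNT(*) FILTER (WHERE s.converted) AS conversions
    FROM     web_sessions s
    JOIN     channels c ON c.channel_id = s.channel_id
    WHERE    s.session_date >= DATE '2019-01-01'
    GROUP BY c.channel_name
    ORDER BY sessions DESC;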
Summary
The past year has been an active one for the timeseries market. New products have been launched, more businesses have moved to streaming analytics, and the team at Timescale has been keeping busy. In this episode TimescaleDB CEO Ajay Kulkarni and CTO Michael Freedman stop by to talk about their 1.0 release, how the use cases for timeseries data have proliferated, and how they are continuing to simplify the task of processing your time-oriented events.
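For context on what that looks like in practice: TimescaleDB installs as a PostgreSQL extension and exposes its timeseries features through regular SQL. A minimal sketch, assuming the extension is available; the table and column names are illustrative:

    -- Convert a plain Postgres table into a time-partitioned hypertable.
    CREATE EXTENSION IF NOT EXISTS timescaledb;

    CREATE TABLE conditions (
      time        TIMESTAMPTZ       NOT NULL,
      device_id   TEXT              NOT NULL,
      temperature DOUBLE PRECISION
    );

    SELECT create_hypertable('conditions', 'time');

    -- Aggregate readings into 15-minute buckets per device.
    SELECT time_bucket('15 minutes', time) AS bucket,
           device_id,
           avg(temperature) AS avg_temp
    FROM conditions
    GROUP BY bucket, device_id
    ORDER BY bucket;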
Introduction
Your host is Tobias Macey and today I'm welcoming Ajay Kulkarni and Mike Freedman back to talk about how TimescaleDB has grown and changed over the past year.
Interview
Introduction
How did you get involved in the area of data management?
Can you refresh our memory about what TimescaleDB is?
How has the market for timeseries databases changed since we last spoke?
What has changed in the focus and features of the TimescaleDB project and company?
Toward the end of 2018 you launched the 1.0 release of Timescale. What were your criteria for establishing that milestone?
What were the most challenging aspects of reaching that goal?
In terms of timeseries workloads, what are some of the factors that differ across varying use cases?
How do those differences impact the ways in which Timescale is used by the end user, and built by your team?
What are some of the initial assumptions that you made while first launching Timescale that have held true, and which have been disproven?
How have the improvements and new features in the recent releases of PostgreSQL impacted the Timescale product?
Have you been able to leverage some of the native improvements to simplify your implementation?
Are there any use cases for Timescale that would have been previously impractical in vanilla Postgres that would now be reasonable without the help of Timescale?
What is in store for the future of the Timescale product and organization?
Contact Info
Ajay
@acoustik on Twitter
LinkedIn
Mike
LinkedIn
Website
@michaelfreedman on Twitter
Timescale
Website
Documentation
Careers
timescaledb on GitHub
@timescaledb on Twitter
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
TimescaleDB
Original Appearance on the Data Engineering Podcast
1.0 Release Blog Post
PostgreSQL
Podcast Interview
RDS
DB-Engines
MongoDB
IoT (Internet of Things)
AWS Timestream
Kafka
Pulsar
Podcast Episode
Spark
Podcast Episode
Flink
Podcast Episode
Hadoop
DevOps
PipelineDB
Podcast Interview
Grafana
Tableau
Prometheus
OLTP (Online Transaction Processing)
Oracle DB
Data Lake
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast
Seth Dobrin is back to kick off season 3 and reflect on data and tech in 2018. Seth Dobrin, vice president and Chief Data Officer of IBM Analytics, gives insight into leading the data science elite team, and he details the steps and strategies required to be successful in the field. Host Al Martin and Seth also make some data science predictions for 2019, letting you know what you should be looking out for in the year ahead.
Shownotes:
00:00 - Check us out on YouTube and SoundCloud.
00:10 - Connect with Producer Steve Moore on LinkedIn and Twitter.
00:15 - Connect with Producer Liam Seston on LinkedIn and Twitter.
00:20 - Connect with Producer Rachit Sharma on LinkedIn.
00:25 - Connect with Host Al Martin on LinkedIn and Twitter.
00:55 - Connect with Seth Dobrin on LinkedIn and Twitter.
02:00 - Seth Dobrin's first podcast from January 2018.
03:30 - What is data science?
04:25 - Seth Dobrin's blog: Don't let data science become a scam.
10:55 - IBM Data Science Elite Team: Kickstart, build, and accelerate.
31:55 - What is AI?
37:58 - What are data pipelines?
41:55 - What is Blockchain?
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.
Summary
The Hadoop platform is purpose-built for processing large, slow-moving data in long-running batch jobs. As the ecosystem around it has grown, so has the need for fast analytics on fast-moving data. To fill this need the Kudu project was created, with a column-oriented table format tuned for high volumes of writes and rapid query execution across those tables. For a perfect pairing, they made it easy to connect to the Impala SQL engine. In this episode Brock Noland and Jordan Birdsell from PhData explain how Kudu is architected, how it compares to other storage systems in the Hadoop orbit, and how to start integrating it into your analytics pipeline.
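As a taste of that Impala pairing (a minimal sketch, not from the episode; the table and column names are hypothetical, and it assumes an Impala deployment with Kudu integration enabled):

    -- Create a Kudu-backed table directly from Impala.
    CREATE TABLE metrics (
      host  STRING,
      ts    BIGINT,
      value DOUBLE,
      PRIMARY KEY (host, ts)
    )
    PARTITION BY HASH (host) PARTITIONS 16
    STORED AS KUDU;

    -- Kudu supports row-level upserts, which HDFS-backed tables do not.
    UPSERT INTO metrics VALUES ('web-01', 1546300800, 0.42);

    SELECT host, COUNT(*) AS readings
    FROM metrics
    GROUP BY host;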
Preamble
Your host is Tobias Macey and today I'm interviewing Brock Noland and Jordan Birdsell about Apache Kudu and how it is able to provide fast analytics on fast data in the Hadoop ecosystem.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by explaining what Kudu is and the motivation for building it?
How does it fit into the Hadoop ecosystem?
How does it compare to the work being done on the Iceberg table format?
What are some of the common application and system design patterns that Kudu supports?
How is Kudu architected and how has it evolved over the life of the project?
There are many projects in and around the Hadoop ecosystem that rely on ZooKeeper as a building block for consensus. What was the reasoning for using Raft in Kudu?
How does the storage layer in Kudu differ from what would be found in systems like Hive or HBase?
What are the implementation details in the Kudu storage interface that have had the greatest impact on its overall speed and performance?
A number of the projects built for large scale data processing were not initially built with a focus on operational simplicity. What are the features of Kudu that simplify deployment and management of production infrastructure?
What was the motivation for using C++ as the language target for Kudu?
If you were to start the project over today what would you do differently?
What are some situations where you would advise against using Kudu?
What have you found to be the most interesting/unexpected/challenging lessons learned in the process of building and maintaining Kudu?
What are you most excited about for the future of Kudu?
Contact Info
Brock
LinkedIn
@brocknoland on Twitter
Jordan
LinkedIn
@jordanbirdsell
jbirdsell on GitHub
PhData
Website
phdata on GitHub
@phdatainc on Twitter
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
Kudu
PhData
Getting Started with Apache Kudu
Thomson Reuters
Hadoop
Oracle Exadata
Slowly Changing Dimensions
HDFS
S3
Azure Blob Storage
State Farm
Stanley Black & Decker
ETL (Extract, Transform, Load)
Parquet
Podcast Episode
ORC
HBase
Spark
Podcast Episode
Take a deep dive into the many uses of dynamic SQL in Microsoft SQL Server. This edition has been updated to use the newest features in SQL Server 2016 and SQL Server 2017, as well as incorporating the changing landscape of analytics and database administration. Code examples have been updated with new system objects and functions to improve efficiency and maintainability. Executing dynamic SQL is key to large-scale searching based on user-entered criteria. Dynamic SQL can generate lists of values and even code with minimal impact on performance. Dynamic SQL enables dynamic pivoting of data for business intelligence solutions as well as customizing of database objects. Yet dynamic SQL is feared by many due to concerns over SQL injection or code maintainability. Dynamic SQL: Applications, Performance, and Security in Microsoft SQL Server helps you bring the productivity and user satisfaction of flexible and responsive applications to your organization safely and securely. Your organization's increased ability to respond to rapidly changing business scenarios will build competitive advantage in an increasingly crowded and competitive global marketplace. With a focus on new applications and modern database architecture, this edition illustrates that dynamic SQL continues to evolve and be a valuable tool for administration, performance optimization, and analytics.
What You'll Learn
Build flexible applications that respond to changing business needs
Take advantage of creative, innovative, and productive uses of dynamic SQL
Know about SQL injection and be confident in your defenses against it
Address performance concerns in stored procedures and dynamic SQL
Troubleshoot and debug dynamic SQL to ensure correct results
Automate your administration of features within SQL Server
Who This Book Is For
Developers and database administrators looking to hone and build their T-SQL coding skills. The book is ideal for developers wanting to plumb the depths of application flexibility and troubleshoot performance issues involving dynamic SQL. The book is also ideal for programmers wanting to learn what dynamic SQL is about and how it can help them deliver competitive advantage to their organizations.
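To give a flavor of the book's subject (a minimal sketch of the parameterized approach, not an excerpt from the book; the table and column names are hypothetical): binding user input through sp_executesql, rather than concatenating it into the query string, is the standard guard against the SQL injection concerns described above.

    -- A user-driven search built safely with sp_executesql (T-SQL).
    DECLARE @CustomerName NVARCHAR(100) = N'Contoso';

    DECLARE @Sql NVARCHAR(MAX) = N'
        SELECT CustomerId, CustomerName
        FROM dbo.Customers
        WHERE CustomerName LIKE @Name + N''%'';';

    -- The parameter is bound, never concatenated, so input cannot inject code.
    EXEC sys.sp_executesql
         @Sql,
         N'@Name NVARCHAR(100)',
         @Name = @CustomerName;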