talk-data.com

Topic

Talend

data_integration etl big_data

13 tagged

Activity Trend: peak of 4 activities per quarter, 2020-Q1 to 2026-Q1

Activities

13 activities · Newest first

A copilot is only useful if it relies on reliable, contextualized, and governed data. Come see how Qlik Talend Cloud lets you build a business-oriented Knowledge Mart, and how Qlik Answers, using RAG (Retrieval-Augmented Generation), draws on that Knowledge Mart to provide natural-language answers that are traceable, relevant, and actionable.

Want to learn more? The whole Qlik team will be at stand D38 for live demos, use cases, and expert advice.

Customer case: Aésio Mutuelle revolutionizes how it uses its data with Qlik Talend Data Integration and Qlik Cloud Analytics

Data use cases in insurance companies are numerous and sensitive: from simple operational case management to the fight against money laundering, by way of legal obligations such as LCB-FT (the French anti-money-laundering and counter-terrorist-financing rules). But managing them tends to create silos over time.

For this health and provident insurance specialist, deploying Qlik's modern data platform, which manages data end to end, has transformed the daily work of business teams by giving them full autonomy.


Your data is worthless if no one knows where to find it or whether it can be trusted. In 30 minutes, see how Qlik Talend Cloud – Data Products Catalog turns a raw dataset into a reusable data product: business glossary, metadata, profiling, lineage, access rules, and certification before publication in a catalog that data, BI, and AI teams can adopt easily. End-to-end demo: create → validate → publish → track adoption, with no overhead or copy-pasting. The result: identified, reliable, and auditable data products that accelerate your use cases and establish quality governance.

Discover how Qlik Cloud is changing the way users find, understand, and use their data with confidence. From the intuitive analysis of Qlik Cloud Analytics to full governance of data pipelines with Qlik Talend Cloud, explore a unified platform for data producers and consumers. Dive into the Data Products Catalog, the Trust Score, and generative AI with Qlik Answers. An inspiring session to watch your data come alive, from source to action. Don't miss this look at the future of data!

Sponsored by: Datafold | Breaking Free: How Evri is Modernizing SAP HANA Workflows to Databricks with AI and Datafold

With expensive contracts up for renewal, Evri faced the challenge of migrating 1,000 SAP HANA assets and 200+ Talend jobs to Databricks. This talk will cover how we transformed SAP HANA and Talend workflows into modern Databricks pipelines through AI-powered translation and validation, without months of manual coding. We'll cover:

- Techniques for handling SAP HANA's proprietary formats
- Approaches for refactoring incremental pipelines while ensuring dashboard stability
- The technology enabling automated translation of complex business logic
- Validation strategies that guarantee migration accuracy

We'll share real examples of SAP HANA stored procedures transformed into Databricks code and demonstrate how we maintained 100% uptime of critical dashboards during the transition. Join us to discover how AI is revolutionizing what's possible in enterprise migrations from GUI-based legacy systems to modern, code-first data platforms.
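The abstract does not say how the validation works, so the following is only a minimal sketch of one common approach: comparing a legacy table with its migrated copy by row count and per-key row hash. It uses two in-memory SQLite databases as stand-ins for the source and target systems. The `parcels` table, its columns, and the function names are hypothetical and are not taken from the talk or from Datafold.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key):
    """Return (row_count, {key_value: row_hash}) for one table.

    Assumes `key` is unique per row (for example a primary key).
    """
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    key_idx = columns.index(key)
    hashes = {}
    for row in cur:
        digest = hashlib.sha256(repr(row).encode("utf-8")).hexdigest()
        hashes[row[key_idx]] = digest
    return len(hashes), hashes


def diff_tables(legacy, migrated, table, key):
    """Compare one table across two connections and report the differences."""
    n_old, old = table_fingerprint(legacy, table, key)
    n_new, new = table_fingerprint(migrated, table, key)
    shared = old.keys() & new.keys()
    return {
        "row_counts": (n_old, n_new),
        "missing_in_target": sorted(old.keys() - new.keys()),
        "extra_in_target": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in shared if old[k] != new[k]),
    }


def demo():
    """Build two tiny hypothetical tables and diff them."""
    legacy = sqlite3.connect(":memory:")
    migrated = sqlite3.connect(":memory:")
    for conn in (legacy, migrated):
        conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, status TEXT)")
    legacy.executemany(
        "INSERT INTO parcels VALUES (?, ?)", [(1, "sent"), (2, "lost"), (3, "held")]
    )
    migrated.executemany(
        "INSERT INTO parcels VALUES (?, ?)", [(1, "sent"), (2, "found")]
    )
    return diff_tables(legacy, migrated, "parcels", "id")
```

Calling `demo()` reports one row missing from the target and one row whose contents changed. A real migration check would also need to handle type coercion and float rounding differences between engines, which this sketch ignores.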

Data Wrangling

Written and edited by some of the world’s top experts in the field, this volume presents state-of-the-art research and the latest technological breakthroughs in data wrangling, its theoretical concepts, practical applications, and tools for solving everyday problems. Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis. This process typically includes manually converting and mapping data from one raw form into another format to allow for more convenient consumption and organization of the data. Data wrangling is increasingly ubiquitous at today’s top firms. Data cleaning focuses on removing inaccurate data from your data set, whereas data wrangling focuses on transforming the data’s format, typically by converting “raw” data into another format more suitable for use. Data wrangling is a necessary component of any business. Data wrangling solutions are specifically designed and architected to handle diverse, complex data at any scale, and include applications such as Datameer, Infogix, Paxata, Talend, Tamr, TMMData, and Trifacta. This book synthesizes the processes of data wrangling into a comprehensive overview, with a strong focus on recent and rapidly evolving agile analytic processes in data-driven enterprises, for businesses and other enterprises to use to find solutions for their everyday problems and practical applications. Whether for the veteran engineer, scientist, or other industry professional, this book is a must-have for any library.

Summary

Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column oriented storage engines, to the current generation of cloud-native analytical engines. SnowflakeDB has been leading the charge to take advantage of cloud services that simplify the separation of compute and storage. In this episode Kent Graziano, chief technical evangelist for SnowflakeDB, explains how it is differentiated from other managed platforms and traditional data warehouse engines, the features that allow you to scale your usage dynamically, and how it allows for a shift in your workflow from ETL to ELT. If you are evaluating your options for building or migrating a data platform, then this is definitely worth a listen.
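To make the ETL versus ELT distinction concrete, here is a minimal sketch that runs both patterns against an in-memory SQLite database. SQLite only stands in for a warehouse engine here; nothing below is Snowflake-specific, and the `RAW_ORDERS` data and table names are hypothetical.

```python
import sqlite3

# Hypothetical raw extract: every field arrives as untyped text.
RAW_ORDERS = [("1", " 19.90 ", "eur"), ("2", "5.00", "EUR"), ("3", "n/a", "usd")]


def etl(conn):
    """ETL: clean rows in application code, then load only the clean result."""
    conn.execute("CREATE TABLE orders_etl (id INTEGER, amount REAL, currency TEXT)")
    clean = []
    for raw_id, raw_amount, raw_currency in RAW_ORDERS:
        try:
            amount = float(raw_amount.strip())
        except ValueError:
            continue  # bad rows are dropped before they reach the database
        clean.append((int(raw_id), amount, raw_currency.upper()))
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", clean)


def elt(conn):
    """ELT: load raw rows untouched, then transform with SQL inside the engine."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(id AS INTEGER)        AS id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(currency)            AS currency
        FROM orders_raw
        WHERE TRIM(amount) GLOB '[0-9]*'  -- keep rows whose amount starts with a digit
        """
    )


def run_both():
    """Run both patterns and return their resulting rows for comparison."""
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY id").fetchall()
    elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY id").fetchall()
    return etl_rows, elt_rows
```

Both paths end with the same two clean rows. The difference is where the work happens and what is retained: in the ELT path the raw rows stay in `orders_raw`, so the transformation can be changed and re-run later without re-extracting from the source.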

Announcements

- Hello and welcome to the Data Engineering Podcast, the show about modern data management.
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media and the Python Software Foundation. Upcoming events include the Software Architecture Conference in NYC and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host is Tobias Macey and today I’m interviewing Kent Graziano about SnowflakeDB, the cloud-native data warehouse.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you start by explaining what SnowflakeDB is for anyone who isn’t familiar with it?
- How does it compare to the other available platforms for data warehousing?
- How does it differ from traditional data warehouses?
- How does the performance and flexibility affect the data modeling requirements?
- Snowflake is one of the data stores that is enabling the shift from an ETL to an ELT workflow. What are the features that allow for that approach and what are some of the challenges that it introduces?
- Can you describe how the platform is architected and some of the ways that it has evolved as it has grown in popularity?
- What are some of the current limitations that you are struggling with?
- For someone getting started with Snowflake what is involved with loading data into the platform?
- What is their workflow for allocating and scaling compute capacity and running analyses?
- One of the interesting features enabled by your architecture is data sharing. What are some of the most interesting or unexpected uses of that capability that you have seen?
- What are some other features or use cases for Snowflake that are not as well known or publicized which you think users should know about?
- When is SnowflakeDB the wrong choice?
- What are some of the plans for the future of SnowflakeDB?

Contact Info

- LinkedIn
- Website
- @KentGraziano on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

- SnowflakeDB (Free Trial, Stack Overflow)
- Data Warehouse
- Oracle DB
- MPP == Massively Parallel Processing
- Shared Nothing Architecture
- Multi-Cluster Shared Data Architecture
- Google BigQuery
- AWS Redshift
- AWS Redshift Spectrum
- Presto (Podcast Episode)
- SnowflakeDB Semi-Structured Data Types
- Hive
- ACID == Atomicity, Consistency, Isolation, Durability
- 3rd Normal Form
- Data Vault Modeling
- Dimensional Modeling
- JSON
- AVRO
- Parquet
- SnowflakeDB Virtual Warehouses
- CRM == Customer Relationship Management
- Master Data Management (Podcast Episode)
- FoundationDB (Podcast Episode)
- Apache Spark (Podcast Episode)
- SSIS == SQL Server Integration Services
- Talend
- Informatica
- Fivetran (Podcast Episode)
- Matillion
- Apache Kafka
- Snowpipe
- Snowflake Data Exchange
- OLTP == Online Transaction Processing
- GeoJSON
- Snowflake Documentation
- SnowAlert
- Splunk
- Data Catalog

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA


Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies. Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book has extensive and detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.

What You’ll Learn

- Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples, and practical advice
- Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark
- Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing
- Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing
- Turbocharge Spark with Alluxio, a distributed in-memory storage platform
- Deploy big data in the cloud using Cloudera Director
- Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark
- Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks
- Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling
- Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard

Who This Book Is For

BI and big data warehouse professionals interested in gaining practical and real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark; and those who want to learn more about other advanced enterprise topics

Summary

Managing an analytics project can be difficult due to the number of systems involved and the need to ensure that new information can be delivered quickly and reliably. That challenge can be met by adopting practices and principles from lean manufacturing and agile software development, and the cross-functional collaboration, feedback loops, and focus on automation in the DevOps movement. In this episode Christopher Bergh discusses ways that you can start adding reliability and speed to your workflow to deliver results with confidence and consistency.

Preamble

- Hello and welcome to the Data Engineering Podcast, the show about modern data management.
- When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
- For complete visibility into the health of your pipeline, including deployment tracking, and powerful alerting driven by machine-learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial and get a sweet new T-Shirt.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
- Your host is Tobias Macey and today I’m interviewing Christopher Bergh about DataKitchen and the rise of DataOps.

Interview

- Introduction
- How did you get involved in the area of data management?
- How do you define DataOps?
- How does it compare to the practices encouraged by the DevOps movement?
- How does it relate to or influence the role of a data engineer?
- How does a DataOps oriented workflow differ from other existing approaches for building data platforms?
- One of the aspects of DataOps that you call out is the practice of providing multiple environments to provide a platform for testing the various aspects of the analytics workflow in a non-production context. What are some of the techniques that are available for managing data in appropriate volumes across those deployments?
- The practice of testing logic as code is fairly well understood and has a large set of existing tools. What have you found to be some of the most effective methods for testing data as it flows through a system?
- One of the practices of DevOps is to create feedback loops that can be used to ensure that business needs are being met. What are the metrics that you track in your platform to define the value that is being created and how the various steps in the workflow are proceeding toward that goal?
- In order to keep feedback loops fast it is necessary for tests to run quickly. How do you balance the need for larger quantities of data to be used for verifying scalability/performance against optimizing for cost and speed in non-production environments?
- How does the DataKitchen platform simplify the process of operationalizing a data analytics workflow?
- As the need for rapid iteration and deployment of systems to capture, store, process, and analyze data becomes more prevalent how do you foresee that feeding back into the ways that the landscape of data tools are designed and developed?
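The question about testing data as it flows through a system can be illustrated with a small sketch. The show notes do not describe how DataKitchen does this, so what follows is only a generic example of batch-level checks (row count, required values, key uniqueness) that a pipeline step could run before passing data on. The function name and parameters are hypothetical.

```python
def check_batch(rows, required, unique_key, min_rows=1):
    """Return a list of human-readable failures for one batch of dict rows.

    An empty list means the batch passed every check.
    """
    failures = []

    # Volume check: an unexpectedly small batch often signals an upstream failure.
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, got {len(rows)}")

    # Completeness check: required columns must not be None or empty strings.
    for column in required:
        missing = sum(1 for r in rows if r.get(column) in (None, ""))
        if missing:
            failures.append(f"{column}: {missing} missing value(s)")

    # Uniqueness check: the key column must not repeat within the batch.
    keys = [r.get(unique_key) for r in rows]
    duplicates = len(keys) - len(set(keys))
    if duplicates:
        failures.append(f"{unique_key}: {duplicates} duplicate key(s)")

    return failures
```

A pipeline step could call `check_batch(rows, required=["email"], unique_key="id")` and stop the run when the returned list is not empty. Because the checks are cheap, they can run on every batch, which keeps the feedback loop short.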

Contact Info

- LinkedIn
- @ChrisBergh on Twitter
- Email

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

- DataOps Manifesto
- DataKitchen
- 2017: The Year Of DataOps
- Air Traffic Control
- Chief Data Officer (CDO)
- Gartner
- W. Edwards Deming
- DevOps
- Total Quality Management (TQM)
- Informatica
- Talend
- Agile Development
- Cattle Not Pets
- IDE (Integrated Devel

Business Intelligence Tools for Small Companies: A Guide to Free and Low-Cost Solutions

Learn how to transition from Excel-based business intelligence (BI) analysis to enterprise stacks of open-source BI tools. Select and implement the best free and freemium open-source BI tools for your company's needs and design, implement, and integrate BI automation across the full stack using agile methodologies. Business Intelligence Tools for Small Companies provides hands-on demonstrations of open-source tools suitable for the BI requirements of small businesses. The authors draw on their deep experience as BI consultants, developers, and administrators to guide you through the extract-transform-load/data warehousing (ETL/DWH) sequence of extracting data from an enterprise resource planning (ERP) database freely available on the Internet, transforming the data, manipulating them, and loading them into a relational database. The authors demonstrate how to extract, report, and dashboard key performance indicators (KPIs) in a visually appealing format from the relational database management system (RDBMS). They model the selection and implementation of free and freemium tools such as Pentaho Data Integrator and Talend for ELT, Oracle XE and MySQL/MariaDB for RDBMS, and Qlik Sense, Power BI, and MicroStrategy Desktop for reporting. This richly illustrated guide models the deployment of a small company BI stack on an inexpensive cloud platform such as AWS.

What You'll Learn

You will learn how to manage, integrate, and automate the processes of BI by selecting and implementing tools to:

- Implement and manage the business intelligence/data warehousing (BI/DWH) infrastructure
- Extract data from any enterprise resource planning (ERP) tool
- Process and integrate BI data using open-source extract-transform-load (ETL) tools
- Query, report, and analyze BI data using open-source visualization and dashboard tools
- Use a MOLAP tool to define next year's budget, integrating real data with target scenarios
- Deploy BI solutions and big data experiments inexpensively on cloud platforms

Who This Book Is For

Engineers, DBAs, analysts, consultants, and managers at small companies with limited resources but whose BI requirements have outgrown the limitations of Excel spreadsheets; personnel in mid-sized companies with established BI systems who are exploring technological updates and more cost-efficient solutions

Self-Service Analytics

Organizations today are swimming in data, but most of them manage to analyze only a fraction of what they collect. To help build a stronger data-driven culture, many organizations are adopting a new approach called self-service analytics. This O’Reilly report examines how this approach provides data access to more people across a company, allowing business users to work with data themselves and create their own customized analyses. The result? More eyes looking at more data in more ways. Along with the perceived benefits, author Sandra Swanson also delves into the potential pitfalls of self-service analytics: balancing greater data access with concerns about security, data governance, and siloed data stores. Read this report and gain insights from enterprise tech (Yahoo), government (the City of Chicago), and disruptive retail (Warby Parker and Talend). Learn how these organizations are handling self-service analytics in practice. Sandra Swanson is a Chicago-based writer who’s covered technology, science, and business for dozens of publications, including ScientificAmerican.com. Connect with her on Twitter (@saswanson) or at www.saswanson.com.

Talend Open Studio Cookbook

Talend Open Studio Cookbook is a comprehensive guide for both beginners and intermediate users of Talend Open Studio, the leading open-source data integration software. Through practical recipes, this book covers all aspects of Talend development, from schemas and data mapping to advanced debugging and deployment techniques.

What this Book will help me do

- Master the use of schemas for forming solid data structures.
- Effectively utilize tMap for data transformation and integration.
- Develop skills to manage and manipulate various file formats.
- Understand how to test and debug Talend jobs to ensure robust solutions.
- Learn to deploy, schedule, and manage Talend integrations in production environments.

Author(s)

Rick Barton is an experienced developer and a passionate advocate for open-source data tools. With years of hands-on experience in data integration and Talend development, he brings a practical and results-driven perspective to his writing, aiming to empower developers with actionable insights and real-world expertise.

Who is it for?

Ideal readers for this book are beginner and intermediate developers seeking to enhance their understanding of Talend Open Studio. Whether you've used the software for basic tasks or are completely new to it, this cookbook format is structured to guide you through practical challenges and deeper concepts. If your goal is to build confidence and efficiency in data integration tasks, this book is designed for you.
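For readers unfamiliar with tMap, the sketch below shows in plain Python the general kind of operation it is used for: joining a main flow against a lookup flow, mapping fields to an output schema, and routing unmatched rows to a reject output. This is an analogy only, not Talend code or the Talend API (Talend jobs are designed graphically), and every name in it is hypothetical.

```python
def map_with_lookup(main_rows, lookup_rows, main_key, lookup_key, mapping):
    """Join main rows to lookup rows and build output rows from a mapping.

    `mapping` maps each output column name to a function (main_row, lookup_row)
    that computes its value. Rows with no lookup match go to the reject list.
    """
    # Index the lookup flow once so each main row is matched in constant time.
    index = {row[lookup_key]: row for row in lookup_rows}

    matched, rejected = [], []
    for row in main_rows:
        ref = index.get(row[main_key])
        if ref is None:
            rejected.append(row)  # inner-join reject: no matching lookup row
            continue
        matched.append({out: expr(row, ref) for out, expr in mapping.items()})
    return matched, rejected
```

For example, joining orders to customers on `customer_id` with a mapping such as `{"order": lambda o, c: o["id"], "name": lambda o, c: c["name"].title()}` yields one output row per matched order and leaves orders for unknown customers in the reject list.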

Getting Started with Talend Open Studio for Data Integration

Discover how to leverage Talend Open Studio for Data Integration to manage and optimize your data workflow. This book provides a hands-on introduction to creating integration jobs and automating data processes using Talend's drag-and-drop interface. Explore practical examples, and realize how powerful and approachable data integration can be.

What this Book will help me do

- Develop and deploy scalable data integration pipelines using Talend Open Studio.
- Master common data operations like filtering, sorting, transforming, and aggregating.
- Gain expertise in connecting various data sources, both relational and non-relational.
- Implement complex flow logic, including conditional processing and dependencies.
- Learn to package and manage production-ready integration jobs for real-world scenarios.

Author(s)

Jonathan Bowen is an experienced technologist and author specializing in data integration and software tools. With years of hands-on experience, Jonathan has guided many organizations in adopting efficient data workflows. He conveys technical concepts with clarity and provides practical, actionable content to help readers succeed.

Who is it for?

This book is perfect for developers, business analysts, and IT professionals tasked with integration projects. Whether you're a novice to data integration or looking to deepen your hands-on experience with Talend, this guide will support your journey. Some prior familiarity with SQL and a data management background are advantageous. Choose this book if you aim to become a proficient data integrator.
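As a rough illustration of the four common data operations named in the blurb (filtering, sorting, transforming, and aggregating), here is a minimal plain-Python sketch. It is not taken from the book and does not use Talend, where these steps would be built from components in the graphical designer. The row shape and function name are hypothetical.

```python
from collections import defaultdict


def summarize_sales(rows, min_amount=0.0):
    """Filter, sort, transform, and aggregate a list of sales rows.

    Each input row is a dict with a "region" string and an "amount" number.
    Returns (cleaned_rows, totals_by_region).
    """
    # Filter: keep only rows at or above the threshold.
    kept = [r for r in rows if r["amount"] >= min_amount]

    # Sort: largest amounts first.
    kept.sort(key=lambda r: r["amount"], reverse=True)

    # Transform: normalize the region label and round the amount.
    shaped = [
        {"region": r["region"].strip().upper(), "amount": round(r["amount"], 2)}
        for r in kept
    ]

    # Aggregate: total amount per normalized region.
    totals = defaultdict(float)
    for r in shaped:
        totals[r["region"]] += r["amount"]

    return shaped, dict(totals)
```

Note that the aggregation runs after the transform step on purpose: region labels such as " north" and "NORTH " only fall into the same group once they have been normalized.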