DevOps

Embracing a modern data stack in the water industry - Coalesce 2023

2023-10-27 · dbt Coalesce 2023 Watch

video

by Diego Morales (Watercare)

Analytics Azure Azure DevOps Data Modelling dbt Modern Data Stack Snowflake

Learn about Watercare's journey in implementing a modern data stack with a focus on self serving analytics in the water industry. The session covers the reasons behind Watercare's decision to implement a modern data stack, the problem of data conformity, and the tools they used to accelerate their data modeling process. Diego also discusses the benefits of using dbt, Snowflake, and Azure DevOps in data modeling. There is also a parallel drawn between analytics and Diego’s connection with jazz music.

Speaker: Diego Morales, Civil Industrial Engineer, Watercare

Register for Coalesce at https://coalesce.getdbt.com

Understand the implementation of DevOps culture and techniques in the AWS Cloud

2023-10-26 · Master Class: Getting Started with AWS DevOps

talk

AWS Cloud Computing

Understand the implementation of DevOps culture and techniques in the AWS Cloud

Comment gérer un parc de 400 clusters AKS ?

2023-10-12 · Meetup "Maturité et usages Cloud Public" @ Société Générale

talk

Azure Kubernetes Terraform aks

Comment gérer un parc de 400 clusters AKS ? Dans ce REX, nous vous présenterons comment nous pilotons un parc de 400 cluster AKS et comment nous accompagnons les équipes dans la gestion de leurs clusters.

SG APIM - La plateforme d'API Management de Société Générale dans AWS

2023-10-12 · Meetup "Maturité et usages Cloud Public" @ Société Générale

talk

AWS Terraform argocd cloudnative eks

SG APIM - La plateforme d'API Management de Société Générale dans AWS : en passant par les services utilisés pour garantir le fonctionnement de la plateforme jusqu'à l'application réelle sur un cas business, nous expliquerons comment SG APIM est déployée sur AWS.

1/ Présentation du projet OWASP "Top 10 CI/CD Risks"

2023-09-26 · Meetup SecOps #15 - 26 SEPTEMBRE 2023

talk

CI/CD Cloud Computing

Abstract : Les composants des chaines devops (ou pipelines CI/CD) sont désormais des composants critiques du développement des applications cloud native et contiennent des informations hyper sensibles et des secrets. Ils peuvent être compromis et être un vecteur d'attaque sur la software supply chain. Nous présenterons les enjeux de de sécurité relatifs à ces pipelines CI/CD et ferons un focus sur ce nouveau projet OWASP, né en 2022 de la collaboration d'experts Appsec, en l'illustrant par des exemples.

Cloud-native development practices and modernisation approaches

2023-09-26 · Accelerate Application Modernisation Cloud-Natively

talk

by Jacob Schwarz (IBM Automation Expert Labs DACH)

cloud-native continuous deployment gitops hybrid environments serverless

This talk shows how development and operations teams benefit from leveraging modern programming and deployment paradigms. Jacob Schwarz covers practices and patterns from cloud-native projects and shows modernisation approaches to improve existing systems. Cloud-native development practices enable DevOps teams to continuously deliver business value. GitOps and continuous deployments are crucial building blocks of modern platforms. Software is built to be managed in hybrid and serverless environments. Realising these benefits requires a mature cloud platform and experienced DevOps engineers. This session shows how existing solutions can adopt cloud-native thinking to modernise mission critical components. Jacob covers lessons learned from customer engagements to show how cloud-native thinking impacts teams efficiency, software quality, and production resiliency.

Pragmatic and Standardized MLOps - Maria Vechtomova

2023-09-08 · DataTalks.Club Listen

podcast_episode

by Maria Vechtomova (Marvelous MLOps)

AI/ML Data Engineering GitHub HTML LLM MLOps

We talked about:

Maria's background Marvelous MLOps Maria's definition of MLOps Alternate team setups without a central MLOps team Pragmatic vs non-pragmatic MLOps Must-have ML tools (categories) Maturity assessment What to start with in MLOps Standardized MLOps Convincing DevOps to implement Understanding what the tools are used for instead of knowing all the tools Maria's next project plans Is LLM Ops a thing? What Ahold Delhaize does Resource recommendations to learn more about MLOps The importance of data engineering knowledge for ML engineers

Links:

LinkedIn: https://www.linkedin.com/company/marvelous-mlops/

Website: https://marvelousmlops.substack.com/

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

The Attributes & Initiatives of a Great SaaS CTO

2023-09-04 · SaaS Scaled - Interviews about SaaS Startups, Analytics, & Operations Listen

podcast_episode

by Ben Johnson (Uptitude)

Data Science SaaS

On today’s episode, we’re joined by Ben Johnson Founder, CEO of Particle41, a provider of software and product development solutions crafted by world-class app development, DevOps, and data science teams. We talk about:

What components the CTO owns in a SaaS companyOptimizing the efficiency of dev teamsHow much of the CTO role is internal vs. externalHow to interview & identify a great CTO candidate

David Cardozo — Machine Learning Engineer, Updata

2023-08-24 · Certification Study Group - Professional ML Engineer - Office Hours 6

talk

by David Cardozo (Updata)

AI/ML Data Science Cyber Security TensorFlow

David is a Machine Learning Engineer and technologist focused on building embedded systems to use novel techniques, and state of the art technologies (Podman, Balena, TensorFlow, Flutter) in machine learning. Software developer with experience in software exploitation, information security, open-source development and DevOps practices. Community leader for the data science community in Colo…

David Cardozo - Machine Learning Engineer, Updata

2023-08-10 · Certification Study Group - Professional ML Engineer - Office Hours 5

talk

by David Cardozo (Updata)

AI/ML Data Science Cyber Security TensorFlow

David is a Machine Learning Engineer and technologist focused on building embedded systems to use novel techniques, and state of the art technologies (Podman, Balena, TensorFlow, Flutter) in machine learning. Software developer with experience in software exploitation, information security, open-source development and DevOps practices. Community leader for the data science community in Colo…

How We Made a Unified Talent Solution Using Databricks Machine Learning, Fine-Tuned LLM & Dolly 2.0

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Nitu Nivedita

AI/ML Analytics Databricks DataOps Delta LLM Matplotlib NLP Plotly Power BI React Data Streaming

Using Databricks, we built a “Unified Talent Solution” backed by a robust data and AI engine for analyzing skills of a combined pool of permanent employees, contractors, part-time employees and vendors, inferring skill gaps, future trends and recommended priority areas to bridge talent gaps, which ultimately greatly improved operational efficiency, transparency, commercial model, and talent experience of our client. We leveraged a variety of ML algorithms such as boosting, neural networks and NLP transformers to provide better AI-driven insights.

One inevitable part of developing these models within a typical DS workflow is iteration. Databricks' end-to-end ML/DS workflow service, MLflow, helped streamline this process by organizing them into experiments that tracked the data used for training/testing, model artifacts, lineage and the corresponding results/metrics. For checking the health of our models using drift detection, bias and explainability techniques, MLflow's deploying, and monitoring services were leveraged extensively.

Our solution built on Databricks platform, simplified ML by defining a data-centric workflow that unified best practices from DevOps, DataOps, and ModelOps. Databricks Feature Store allowed us to productionize our models and features jointly. Insights were done with visually appealing charts and graphs using PowerBI, plotly, matplotlib, that answer business questions most relevant to clients. We built our own advanced custom analytics platform on top of delta lake as Delta’s ACID guarantees allows us to build a real-time reporting app that displays consistent and reliable data - React (for front-end), Structured Streaming for ingesting data from Delta table with live query analytics on real time data ML predictions based on analytics data.

Talk by: Nitu Nivedita

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Airflow at The Home Depot Canada: Observable orchestration platform for data integration and ML

2023-07-01 · Airflow Summit 2023

session

by Jose Puertos

AI/ML Airflow Beam

The purpose of this session is to indicate how we leverage airflow in a federated way across all our business units to perform a cost-effective platform that accommodates different patterns of data integration, replication and ML tasks in a flexible way providing DevOps tunning of DAGs across environments that integrate to our open-source observability strategy that allows our SREs to have a consistent metrics, monitoring and alerting of data tasks. We will share the opinionated way we setup DAGs that include naming and folder structure conventions along coding expectation like the use of XCom specific entries to report processed elements and support for state for DAGs that require it as well as the expected configurable capabilities for tasks such as the type of runner for Apache Beam tasks. Along these ones we will also indicate the “DevOps DAGs” that we deploy in all our environments that take care of specific platform maintenance/support.

Migrating from Enterprise Scheduler like Autosys, TIDAL, Stonebranch to Airflow

2023-07-01 · Airflow Summit 2023

session

by Ramajayam Gopithirumal

Airflow

How we migrated from Autosys with 1000s of jobs with 800+ dependencies with SLA to be met every hour in a Canada Prominent Bank. Use case to migrate from enterprise scheduler $ spent for every license and renewal cost SLA,Monitoring,Auditing,Devops Integration Vendor lockin 4.Integration to multiple providers

Simplifying the Creation of Data Science Pipelines with Airflow

2023-07-01 · Airflow Summit 2023

session

by Soren Archibald , Jay Thomas

Airflow Cloud Computing Data Engineering Data Science

The ability to create DAGs programmatically opens up new possibilities for collaboration between Data Science and Data Engineering. Engineering and DevOPs are typically incentivized by stability whereas Data Science is typically incentivized by fast iteration and experimentation. With Airflow, it becomes possible for engineers to create tools that allow Data Scientists and Analysts to create robust no-code/low-code data pipelines for feature stores. We will discuss Airlow as a means of bridging the gap between data infrastructure and modeling iteration as well as examine how a Qbiz customer did just this by creating a tool which allows Data Scientists to build features, train models and measure performance, using cloud services, in parallel.

DevOps for Airflow

2023-06-21 · NYC Airflow Meet-Up @ Astronomer

talk

Airflow

Talk about Ramp's DevOps setup for their Airflow usage.

Data Developer Relations - Hugo Bowne-Anderson

2023-06-16 · DataTalks.Club Listen

podcast_episode

by hugo bowne-anderson (Outerbounds)

GitHub HTML Marketing MLOps

We talked about:

Hugo's background Why do tools and the companies that run them have wildly different names Hugo's other projects beside Metaflow Transitioning from educator to DevRel What is DevRel? DevRel vs Marketing How DevRel coordinates with developers How DevRel coordinates with marketers What skills a DevRel needs The challenges that come with being an educator Becoming a good writer: nature vs nurture Hugo's approach to writing and suggestions Establishing a goal for your content Choosing a form of media for your content Is DevRel intercompany or intracompany? The Vanishing Gradients podcast Finding Hugo online

Links:

Hugo Browne's github: http://hugobowne.github.io/ Vanishing Gradients: https://vanishinggradients.fireside.fm/ MLOps and DevOps: Why Data Makes It Differenthttps://www.oreilly.com/radar/mlops-and-devops-why-data-makes-it-different/ Evaluate Metaflow for free, right from your Browser: https://outerbounds.com/sandbox/

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service

2023-06-04 · Data Engineering Podcast Listen

podcast_episode

by Tevje Olin (Agile Data Engine) , Tobias Macey

Agile/Scrum AI/ML Airflow Analytics API Azure BigQuery CDP CI/CD Data Engineering Data Lake Data Management +9 more

Summary

A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation, as well as providing the insights that you need to manage the human side of the workflow. In this episode Tevje Olin explains how the platform is implemented, the features that it provides to reduce the amount of effort required to keep your pipelines running, and how you can start using it in your own team.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack Your host is Tobias Macey and today I'm interviewing Tevje Olin about Agile Data Engine, a platform that combines data modeling, transformations, continuous delivery and workload orchestration to help you manage your data products and the whole lifecycle of your warehouse

Interview

Introduction How did you get involved in the area of data management? Can you describe what Agile Data Engine is and the story behind it? What are some of the tools and architectures that an organization might be able to replace with Agile Data Engine?

How does the unified experience of Agile Data Engine change the way that teams think about the lifecycle of their data? What are some of the types of experiments that are enabled by reduced operational overhead?

What does CI/CD look like for a data warehouse?

How is it different from CI/CD for software applications?

Can you describe how Agile Data Engine is architected?

How have the design and goals of the system changed since you first started working on it? What are the components that you needed to develop in-house to enable your platform goals?

What are the changes in the broader data ecosystem that have had the most influence on your product goals and customer adoption? Can you describe the workflow for a team that is using Agile Data Engine to power their business analytics?

What are some of the insights that you generate to help your customers understand how to improve their processes or identify new opportunities?

In your "about" page it mentions the unique approaches that you take for warehouse automation. How do your practices differ from the rest of the industry? How have changes in the adoption/implementation of ML and AI impacted the ways that your customers exercise your platform? What are the most interesting, innovative, or unexpected ways that you have seen the Agile Data Engine platform used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Agile Data Engine? When is Agile Data Engine the wrong choice? What do you have planned for the future of Agile Data Engine?

Guest Contact Info

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

About Agile Data Engine

Agile Data Engine unlocks the potential of your data to drive business value - in a rapidly changing world. Agile Data Engine is a DataOps Management platform for designing, deploying, operating and managing data products, and managing the whole lifecycle of a data warehouse. It combines data modeling, transformations, continuous delivery and workload orchestration into the same platform.

Links

Agile Data Engine Bill Inmon Ralph Kimball Snowflake Redshift BigQuery Azure Synapse Airflow

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Sponsored By: Rudderstack:

RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.

RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team.

RudderStack also supports real-time use cases. You can Implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again.

Visit dataengineeringpodcast.com/rudderstack to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.Support Data Engineering Podcast

Incident Management for Data People | Bigeye

2023-05-11 · Data Council 2023 Watch

video

by Kyle Kirwan (Bigeye)

AI/ML Analytics BigEye Data Engineering Data Quality Datadog

ABOUT THE TALK: Incident management is a key practice used by DevOps and SRE teams to keep software reliable—but it's still uncommon among data teams! Datadog says incident management can "streamline their response procedures, reducing mean time to repair (MTTR) and minimizing any impact on end users."

In this talk, Kyle Kirwan, co-founder of data observability company Bigeye, will explain the basics of incident management and how data teams can use it to reduce disruptions to analytics and machine learning applications.

ABOUT THE SPEAKER: Kyle Kirwan is the co-founder and CEO of Bigeye. He began his career as a data scientist, went on to lead the development of Uber's internal data catalog/lineage/quality tools, and now helps data teams use data observability to improve pipeline reliability and data quality.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Streaming Data Mesh

2023-05-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Stephen Mooney , Hubert Dulay

Data Governance Kafka MLOps Data Streaming data data-engineering streaming-architecture streaming-messaging

Data lakes and warehouses have become increasingly fragile, costly, and difficult to maintain as data gets bigger and moves faster. Data meshes can help your organization decentralize data, giving ownership back to the engineers who produced it. This book provides a concise yet comprehensive overview of data mesh patterns for streaming and real-time data services. Authors Hubert Dulay and Stephen Mooney examine the vast differences between streaming and batch data meshes. Data engineers, architects, data product owners, and those in DevOps and MLOps roles will learn steps for implementing a streaming data mesh, from defining a data domain to building a good data product. Through the course of the book, you'll create a complete self-service data platform and devise a data governance system that enables your mesh to work seamlessly. With this book, you will: Design a streaming data mesh using Kafka Learn how to identify a domain Build your first data product using self-service tools Apply data governance to the data products you create Learn the differences between synchronous and asynchronous data services Implement self-services that support decentralized data

Cloud Infrastructure From Python Code: How Far Could We Go?

2023-04-19 · PyConDE & PyData Berlin 2023

talk

by Asher Sterkin , Etzik Bega

Cloud Computing Python

Discover how Infrastructure From Code (IfC) can revolutionize Cloud DevOps automation by generating cloud deployment templates directly from Python code. Learn how this technology empowers Python developers to easily deploy and operate cost-effective, secure, reliable, and sustainable cloud software. Join us to explore the strategic potential of IfC.

talk-data.com

Activity Trend

Top Events

Top Speakers

Embracing a modern data stack in the water industry - Coalesce 2023

Understand the implementation of DevOps culture and techniques in the AWS Cloud

Comment gérer un parc de 400 clusters AKS ?

SG APIM - La plateforme d'API Management de Société Générale dans AWS

1/ Présentation du projet OWASP "Top 10 CI/CD Risks"

Cloud-native development practices and modernisation approaches

Pragmatic and Standardized MLOps - Maria Vechtomova

The Attributes & Initiatives of a Great SaaS CTO

David Cardozo — Machine Learning Engineer, Updata

David Cardozo - Machine Learning Engineer, Updata

How We Made a Unified Talent Solution Using Databricks Machine Learning, Fine-Tuned LLM & Dolly 2.0

Airflow at The Home Depot Canada: Observable orchestration platform for data integration and ML

Migrating from Enterprise Scheduler like Autosys, TIDAL, Stonebranch to Airflow

Simplifying the Creation of Data Science Pipelines with Airflow

DevOps for Airflow

Data Developer Relations - Hugo Bowne-Anderson

Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service

Incident Management for Data People | Bigeye

Streaming Data Mesh

Cloud Infrastructure From Python Code: How Far Could We Go?