talk-data.com

Topic

Jenkins

ci_cd automation devops software_development

16 tagged activities

Activity Trend (chart): peaks of 2 activities per quarter, 2020-Q1 through 2026-Q1

Activities

16 activities · Newest first

Enabling Sleep Science Research With Databricks and Delta Sharing

Leveraging Databricks as a platform, we facilitate the sharing of anonymized datasets across various Databricks workspaces and accounts, spanning multiple cloud environments such as AWS, Azure, and Google Cloud. This capability, powered by Delta Sharing, extends both within and outside Sleep Number, enabling accelerated insights while ensuring compliance with data security and privacy standards. In this session, we will showcase our architecture and implementation strategy for data sharing, highlighting the use of Databricks’ Unity Catalog and Delta Sharing, along with integration with platforms like Jira, Jenkins, and Terraform to streamline project management and system orchestration.
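To make the sharing pattern concrete, here is a minimal sketch of consuming a Delta Sharing table from Python with the open-source delta-sharing client. The profile file path, share, schema, and table names are hypothetical placeholders, not Sleep Number's actual configuration.

```python
# pip install delta-sharing
import delta_sharing

# A recipient profile file issued by the data provider (path is a placeholder).
profile = "config/sleep_science.share"

# List the tables the provider has shared with this recipient.
client = delta_sharing.SharingClient(profile)
for table in client.list_all_tables():
    print(table)

# Load one shared table as a pandas DataFrame.
# URL format: <profile-path>#<share>.<schema>.<table>
df = delta_sharing.load_as_pandas(f"{profile}#sleep_share.research.anonymized_sessions")
print(df.head())
```

Because the client speaks the open Delta Sharing protocol, the same code works whether the provider's workspace runs on AWS, Azure, or Google Cloud.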

This talk is about using synthetic monitoring to significantly reduce MTTD and MTTR and achieve high DevOps maturity. Daniel is a big believer in synthetic monitoring as a concept for building reliable production services. If engineers are supposed to run what they build, they need monitoring tools that work for them. He has built his own custom solutions in the past using Jenkins or GitHub Actions and later used SaaS tools for this. He would like to share his experience getting frontend engineers to build monitoring and getting everyone on an engineering team to care about production system reliability. Daniel Paulus has taken a unique journey from military officer to tech leader, and he's now the VP of Engineering at Checkly. Along the way, he's worn many hats, from engineering lead to director, learning how to build strong teams and solve tough challenges. Outside of work, Daniel lives near Berlin with his family and four kids, while also finding time to maintain an open-source project. Whether it's scaling teams or debugging code, he's passionate about technology and enjoys sharing his knowledge with others.

I am a big believer in synthetic monitoring as a concept for building reliable production services. If engineers are supposed to run what they build, they need monitoring tools that work for them. I have built my own custom solutions in the past using Jenkins or GitHub Actions and later used SaaS tools for this. I want to share my experience of how I got frontend engineers to build monitoring and got everyone on an engineering team to care about production system reliability. Daniel Paulus is an accomplished technology leader, presently leading as the VP of Engineering at Checkly, building synthetic monitoring with Playwright.
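For readers new to the concept, here is a minimal sketch of the kind of HTTP synthetic check a CI scheduler such as Jenkins (cron trigger) or GitHub Actions (scheduled workflow) could run every few minutes; the endpoint URL and latency budget are hypothetical, and real setups, including Checkly's, add browser-level Playwright checks on top.

```python
# A minimal HTTP synthetic check: a nonzero exit code fails the CI job,
# which is what turns a scheduled pipeline into a monitor.
import sys
import time

import requests

URL = "https://example.com/health"  # hypothetical endpoint
LATENCY_BUDGET_S = 2.0              # hypothetical latency SLO

def main() -> int:
    start = time.monotonic()
    try:
        resp = requests.get(URL, timeout=10)
    except requests.RequestException as exc:
        print(f"CHECK FAILED: request error: {exc}")
        return 1
    elapsed = time.monotonic() - start
    if resp.status_code != 200:
        print(f"CHECK FAILED: status {resp.status_code}")
        return 1
    if elapsed > LATENCY_BUDGET_S:
        print(f"CHECK FAILED: {elapsed:.2f}s exceeds {LATENCY_BUDGET_S}s budget")
        return 1
    print(f"CHECK OK: 200 in {elapsed:.2f}s")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```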

Up until a few years ago, teams at Uber used multiple data workflow systems, with some based on open source projects such as Apache Oozie, Apache Airflow, and Jenkins, while others were custom-built solutions written in Python and Clojure. Every user who needed to move data around had to learn about and choose from these systems, depending on the specific task they needed to accomplish. Each system required additional maintenance and operational burdens to keep it running, troubleshoot issues, fix bugs, and educate users. After this evaluation, and with the goal in mind of converging on a single workflow system capable of supporting Uber's scale, we settled on an Airflow-based system. The Airflow-based DSL provided the best trade-off of flexibility, expressiveness, and ease of use while being accessible for our broad range of users, which includes data scientists, developers, machine learning experts, and operations employees. This talk will focus on scaling Airflow to Uber's scale and providing a no-code, seamless user experience.
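As context for the DSL discussion, here is what the vanilla Airflow Python DSL looks like; the DAG and task names are illustrative, and this is plain open-source Airflow 2.x, not Uber's internal no-code layer.

```python
# Minimal vanilla Airflow DSL: a daily two-step pipeline.
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_ingest",          # hypothetical pipeline name
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule="@daily",              # Airflow 2.4+ style schedule argument
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")
    extract >> load  # run extract before load
```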

As users of Airflow, we often use DagRun.conf attributes to control the content and flow of a DAG run. Previously, the Airflow UI only allowed launching a run by entering JSON in the UI. This was technically feasible but not user friendly: a user needed to model, check, and understand the JSON and enter parameters manually, with no option to validate them before triggering. Similar to Jenkins or GitHub/Azure pipelines, we wanted a UI option to trigger a run while specifying parameters through a form. With Airflow 2.6.0, DAG.params are now used to render a proper entry form, and with a few options a user-friendly trigger UI can be implemented. This session shows how the new feature works and provides some examples of how to use it for your own purposes.
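A minimal sketch of what the feature looks like in DAG code, assuming Airflow 2.6+; the DAG id and parameter names are illustrative. Each Param carries a JSON-schema type and constraints, which the trigger form uses to render widgets and validate input before the run starts:

```python
# Airflow 2.6+ renders DAG.params as a trigger form in the UI.
import pendulum
from airflow import DAG
from airflow.models.param import Param
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="trigger_form_demo",     # hypothetical DAG id
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,                  # manually triggered
    params={
        # Dropdown in the form, restricted to the enum values.
        "environment": Param("staging", type="string", enum=["staging", "prod"]),
        # Numeric field validated against the min/max bounds.
        "batch_size": Param(100, type="integer", minimum=1, maximum=1000),
    },
) as dag:
    def use_params(**context):
        # Validated values arrive merged into the run's params.
        print(context["params"]["environment"], context["params"]["batch_size"])

    PythonOperator(task_id="use_params", python_callable=use_params)
```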

Whenever Kris and I chat, there's an agenda, which is totally useless. Every single time we've talked, the conversation goes into different (I'll argue better) directions. In this episode, Kris and I delve into the art and craft of programming, finding your tribe as a developer advocate, and so much more. I hope you enjoy this great and meandering conversation.

Developer Voices podcast: https://open.spotify.com/show/2gXhwz0AQRv2cvw61kobE5

Kris's LinkedIn: https://www.linkedin.com/in/krisjenkins/

Kris's Twitter: https://twitter.com/krisajenkins


If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Subscribe to my Substack: https://joereis.substack.com/

We talked about:

Antonis' background
The pros and cons of working for a startup
Useful skills for working at a startup and the Lean way to work
How Antonis joined the DataTalks.Club community
Suggestions for students joining the MLOps course
Antonis contributing to Evidently AI
How Antonis started freelancing
Getting your first clients on Upwork
Pricing your work as a freelancer
The process after getting approved by a client
Wearing many hats as a freelancer and while working at a startup
Other suggestions for getting clients as a freelancer
Antonis' thoughts on the Data Engineering course
Antonis' resource recommendations

Links:

Lean Startup by Eric Ries: https://theleanstartup.com/
Lean Analytics: https://leananalyticsbook.com/
Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
Kafka Streaming with Python by Kris Jenkins (tutorial video): https://youtu.be/jItIQ-UvFI4

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Automating Model Lifecycle Orchestration with Jenkins

A key part of the lifecycle involves bringing a model to production. In regular software systems, this is accomplished via a CI/CD pipeline such as one built with Jenkins. However, integrating Jenkins into a typical DS/ML workflow is not straightforward for X, Y, Z reasons. In this hands-on talk, I will cover what Jenkins and CI/CD practices can bring to your ML workflows, demonstrate a few of these workflows, and share some best practices on how a bit of Jenkins can level up your MLOps processes.
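As an illustration of the general pattern (not the speaker's exact demo), a Jenkins pipeline stage can gate model promotion by running a script like the following and failing the build on a nonzero exit code; the metrics file layout and threshold are hypothetical.

```python
# validate_model.py: a CI gate a Jenkins stage might run, e.g.
#   sh 'python validate_model.py metrics.json'
# Jenkins marks the stage failed when the script exits nonzero.
import json
import sys

MIN_ACCURACY = 0.90  # hypothetical promotion threshold

def main(path: str) -> int:
    with open(path) as f:
        # e.g. {"accuracy": 0.93}, written earlier by the training job
        metrics = json.load(f)
    accuracy = metrics["accuracy"]
    if accuracy < MIN_ACCURACY:
        print(f"Blocking promotion: accuracy {accuracy:.3f} < {MIN_ACCURACY}")
        return 1
    print(f"Model passes gate: accuracy {accuracy:.3f}")
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

The same gate pattern extends to other checks a pipeline might enforce before registering a model: data schema validation, inference latency budgets, or comparison against the currently deployed model.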

Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/data...
Instagram: https://www.instagram.com/databricksinc/

We talked about:

CJ's background
Evolutionary biology
Learning machine learning
Learning on the job and being honest with what you don't know
Convincing that you will be useful
CJ's first interview
Transitioning to industry
Tailoring your CV
Data science courses
Moving to Berlin
Being selective vs 'spray and pray'
Moving on to new jobs
Plan for transitioning to industry
Requirements for getting hired
Publications, portfolios and pet projects
Adjusting to industry
Bad habits from academia
Topics with long-term value
CJ's textbook

Links:

CJ's LinkedIn: https://www.linkedin.com/in/christina-jenkins/
Positions for master students: one two

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Highlights: In Part 3 of the music "trigger cities" mini-series, we explore the music tastes of Mexico City, São Paulo, Buenos Aires, Rio de Janeiro, Bogotá, Lima and Santiago.

Mission: Good morning, it's Jason here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world. We're on the socials at "chartmetric", that's Chartmetric, no "s": follow us on Instagram, Twitter, Facebook or LinkedIn, and talk to us! We'd love to hear from you.

Date: This is your Data Dump for Wednesday, July 17th, 2019.

Latin America "Trigger" Cities: In case you missed them, we have been working on a written mini-series called "trigger cities", a concept that Chartmetric's Partner and Advisor, Chaz Jenkins, an international marketing guru, coined many years ago. It's the idea that in the streaming environment, the algorithms on YouTube, Spotify and all platforms are connected with the tastes of huge cities around the world that also love the same apps. Lauv, the uber-successful independent artist, first saw playlist success with his 2017 hit "I Like Me Better" in Southeast Asia! Lauv is not Asian, but Southeast Asians adore great pop love songs. Reggaeton from huge superstars like Colombia's J Balvin and Puerto Rico's Bad Bunny is now on top playlists like Spotify's Today's Top Hits, a primarily English-language playlist, but their come-up was based on Latin American listeners supporting them more than any other region.

So, in the interest of knowing what the local markets are like, we wrote about seven different metropolitan areas in Latin America: Mexico City, São Paulo, Buenos Aires, Rio de Janeiro, Bogotá, Lima and Santiago. Five speak Spanish, two speak Brazilian Portuguese, and all love YouTube. It's a known fact that Latin America turns to the Google platform more than anything else to listen to music, and the numbers are quite impressive: Bogotá, despite having less than half (10.7M) of Mexico City's population, took the #1 spot in YouTube views in one week last month with 26.5M views across 1.6M+ artists. The Mexican capital, however, was not far behind with 24.8M, and the two cities seem to be leading YouTube consumption in the region, with Lima a distant #3 with 17.1M views.

On Spotify, Mexico City, Spotify's proclaimed "World's Music-Streaming Mecca", took the top spot in the same week with 2.3B non-unique monthly listeners (an admittedly odd metric; check the show notes for a link to the explanation), far outstripping Santiago in the #2 spot with 1.5B non-unique monthly listeners (MLs).

When it comes to genres, we compiled genre tags on Shazam chart occurrences in these seven cities and found what sounds each city was most curious about when they flipped out their phones. "Urbano latino", which is primarily reggaeton and Latin trap and the most popular in Santiago, Lima and Bogotá, didn't show up at all in Brazil, with Brazilian-native genres such as "Sertanejo" (Brazilian country music) asserting their unique identity in the region, and Pop/Rock/Dance all showing strongly in the past month for both cities. This runs contrary to the idea that all of Latin America loves reggaeton; it's just not true.

On Instagram, who do you think are the ten most followed artists in the region? Well, there's Selena Gomez, Justin Bieber, Ariana Grande and Beyoncé... there's also Maluma and Daddy Yankee... But do you know pop queen Anitta, local icon Ivete Sangalo, comedian-entertainer Whindersson Nunes or the Beyoncé-inspired Ludmilla? They're all Brazilian, showing how much Brazilians love IG, and also how much they love their own country's artists.

So there's a taste of Part 3 of our trigger cities mini-series; please do check it out on Medium or LinkedIn and let us know what you think! If you're into Southeast Asia, we wrote about that too (Medium or LinkedIn). We hope they're useful insights as you target social media campaigns, forge international collaborations or plan out a tour!

Outro: That's it for your Daily Data Dump for Wednesday, July 17th, 2019. This is Jason from Chartmetric. Free accounts are at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. Happy Wednesday, and we'll see you Friday!

Summary Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways that it can be used in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first machine learning projects so that they can remain competitive in our landscape of constant change. In this episode he discusses why machine learning projects require a new set of capabilities, how to build a team from internal and external candidates, and how an example project progressed through each phase of maturity. This was a great conversation for anyone who wants to understand the benefits and tradeoffs of machine learning for their own projects and how to put it into practice.

Introduction

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you've got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they've got that covered too, with world-wide datacenters including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Your host is Tobias Macey, and today I'm interviewing Kevin Dewalt about his experiences at Prolego, building machine learning projects for Fortune 500 companies.

Interview

Introduction
How did you get involved in the area of data management?
For the benefit of software engineers and team leaders who are new to machine learning, can you briefly describe what machine learning is and why it is relevant to them?
What is your primary mission at Prolego, and how did you identify, execute on, and establish a presence in your particular market?

How much of your sales process is spent on educating your clients about what AI or ML are and the benefits that these technologies can provide?

What have you found to be the technical skills and capacity necessary for being successful in building and deploying a machine learning project?

When engaging with a client, what have you found to be the most common areas of technical capacity or knowledge that are needed?

Everyone talks about a talent shortage in machine learning. Can you suggest a recruiting or skills development process for companies which need to build out their data engineering practice?
What challenges will teams typically encounter when creating an efficient working relationship between data scientists and data engineers?
Can you briefly describe a successful project of developing a first ML model and putting it into production?

What is the breakdown of how much time was spent on different activities such as data wrangling, model development, and data engineering pipeline development?
When releasing to production, can you share the types of metrics that you track to ensure the health and proper functioning of the models?
What does a deployable artifact for a machine learning/deep learning application look like?

What basic technology stack is necessary for putting the first ML models into production?

How does the build vs. buy debate break down in this space and what products do you typically recommend to your clients?

What are the major risks associated with deploying ML models, and how can a team mitigate them?
Suppose a software engineer wants to break into ML. What data engineering skills would you suggest they learn? How should they position themselves for the right opportunity?

Contact Info

Email: Kevin Dewalt [email protected] and Russ Rands [email protected]
Connect on LinkedIn: Kevin Dewalt and Russ Rands
Twitter: @kevindewalt

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Prolego
Download our book: Become an AI Company in 90 Days
Google Rules Of ML
AI Winter
Machine Learning
Supervised Learning
O'Reilly Strata Conference
GE Rebranding Commercials
Jez Humble: Stop Hiring Devops Experts (And Start Growing Them)
SQL
ORM
Django
RoR
Tensorflow
PyTorch
Keras
Data Engineering Podcast Episode About Data Teams
DevOps For Data Teams – DevOps Days Boston Presentation by Tobias
Jupyter Notebook
Data Engineering Podcast: Notebooks at Netflix
Pandas (Podcast Interview)
Joel Grus (JupyterCon Presentation, Data Science From Scratch)
Expensify
Airflow (James Meickle Interview)
Git
Jenkins
Continuous Integration
Practical Deep Learning For Coders Course by Jeremy Howard
Data Carpentry

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

podcast_episode
by Kyle Polich, with Dan Kahan (Cultural Cognition Project, Yale University)

In this episode, our guest is Dan Kahan, who discusses his research into how people consume and interpret science news. In an era of fake news, motivated reasoning, and alternative facts, important questions need to be asked about how people understand new information. Dan is a member of the Cultural Cognition Project at Yale University, a group of scholars interested in studying how cultural values shape public risk perceptions and related policy beliefs. In a paper titled "Cultural cognition of scientific consensus", Dan and co-authors Hank Jenkins-Smith and Donald Braman discuss the "cultural cognition of risk" and establish experimentally that individuals tend to update their beliefs about scientific information through the context of their pre-existing cultural beliefs. In this way, topics such as climate change, nuclear power, and concealed-carry handgun permits often result in people interpreting the same evidence in polarized ways. The findings of this and other studies tell us that on topics such as these, even when people are given proper information about a scientific consensus, individuals still interpret those results through the lens of their pre-existing cultural beliefs. The "cultural cognition of risk" refers to the tendency of individuals to form risk perceptions that are congenial to their values. The study presents both correlational and experimental evidence confirming that cultural cognition shapes individuals' beliefs about the existence of scientific consensus, and the process by which they form such beliefs, relating to climate change, the disposal of nuclear wastes, and the effect of permitting concealed possession of handguns. The implications of this dynamic for science communication and public policy-making are discussed.

An Introduction to Discrete-Valued Time Series

A much-needed introduction to the field of discrete-valued time series, with a focus on count-data time series.

Time series analysis is an essential tool in a wide array of fields, including business, economics, computer science, epidemiology, finance, manufacturing and meteorology, to name just a few. Despite growing interest in discrete-valued time series, especially those arising from counting specific objects or events at specified times, most books on time series give short shrift to that increasingly important subject area. This book seeks to rectify that state of affairs by providing a much-needed introduction to discrete-valued time series, with particular focus on count-data time series.

The main focus of this book is on modeling. Throughout, numerous examples are provided illustrating models currently used in discrete-valued time series applications. Statistical process control, including various control charts (such as cumulative sum control charts), and performance evaluation are treated at length. Classic approaches like ARMA models and the Box-Jenkins program are also featured, with the basics of these approaches summarized in an Appendix. In addition, data examples, with all relevant R code, are available on a companion website.

The book:
Provides a balanced presentation of theory and practice, exploring both categorical and integer-valued series
Covers common models for time series of counts as well as for categorical time series, and works out their most important stochastic properties
Addresses statistical approaches for analyzing discrete-valued time series and illustrates their implementation with numerous data examples
Covers classical approaches such as ARMA models and the Box-Jenkins program, including generating functions
Includes dataset examples with all necessary R code provided on a companion website

An Introduction to Discrete-Valued Time Series is a valuable working resource for researchers and practitioners in a broad range of fields, including statistics, data science, machine learning, and engineering. It will also be of interest to postgraduate students in statistics, mathematics and economics.
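For readers new to count-data time series, the INAR(1) model is a standard example of the kind of integer-valued model this literature covers; the formula below is a general illustration, not a quotation from the book.

```latex
% INAR(1): an AR(1) analogue for counts, using binomial thinning.
% Given X_{t-1} = x, the thinning operator retains each of the x units
% independently with probability \alpha.
\[
  X_t = \alpha \circ X_{t-1} + \varepsilon_t,
  \qquad
  \alpha \circ X_{t-1} \mid X_{t-1} \sim \operatorname{Bin}(X_{t-1}, \alpha),
\]
where $\varepsilon_t$ are i.i.d.\ non-negative integer-valued innovations
(e.g.\ Poisson), independent of the thinning. With Poisson innovations the
stationary marginal distribution of $X_t$ is itself Poisson.
```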

Multiple Time Series Modeling Using the SAS VARMAX Procedure

Aimed at econometricians who have completed at least one course in time series modeling, Multiple Time Series Modeling Using the SAS VARMAX Procedure will teach you the time series analytical possibilities that SAS offers today. Estimations of model parameters are now performed in a split second. For this reason, working through the identification phase to find the correct model is unnecessary. Instead, several competing models can be estimated, and their fit can be compared instantaneously.

Consequently, for time series analysis, most of the Box and Jenkins analysis process for univariate series is now obsolete. The former days of looking at cross-correlations and pre-whitening are over, because distributed lag models are easily fitted by an automatic lag identification method. The same goes for bivariate and even multivariate models, for which PROC VARMAX models are automatically fitted. For these models, other interesting variations arise: Subjects like Granger causality testing, feedback, equilibrium, cointegration, and error correction are easily addressed by PROC VARMAX.
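As a reference point for the cointegration and error-correction discussion, the vector error-correction model (VECM) has the following standard form; this is the general textbook formulation, not a passage from the book.

```latex
% Vector error-correction model (VECM) for a k-dimensional series Y_t.
% \Pi = \alpha\beta' has reduced rank r < k when the series are cointegrated:
% \beta holds the long-run (equilibrium) relations, \alpha the adjustment speeds.
\[
  \Delta Y_t
  = \Pi Y_{t-1}
  + \sum_{i=1}^{p-1} \Gamma_i \, \Delta Y_{t-i}
  + \varepsilon_t,
  \qquad
  \Pi = \alpha \beta',
\]
where $\varepsilon_t$ is a $k$-dimensional white-noise error term.
```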

One problem with multivariate modeling is that it includes many parameters, making parameterizations unstable. This instability can be compensated for by application of Bayesian methods, which are also incorporated in PROC VARMAX. Volatility modeling has now become a standard part of time series modeling, because of the popularity of GARCH models. Both univariate and multivariate GARCH models are supported by PROC VARMAX. This feature is especially interesting for financial analytics in which risk is a focus.
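For context on the volatility modeling mentioned above, the univariate GARCH(1,1) recursion, the baseline of the GARCH family, is shown below; again, this is the standard general formula rather than a quotation from the book.

```latex
% GARCH(1,1): today's conditional variance responds to yesterday's
% squared shock and yesterday's variance.
\[
  \varepsilon_t = \sigma_t z_t, \quad z_t \sim \text{i.i.d.}(0, 1),
  \qquad
  \sigma_t^2 = \omega + \alpha\, \varepsilon_{t-1}^2 + \beta\, \sigma_{t-1}^2,
\]
with $\omega > 0$, $\alpha, \beta \ge 0$, and $\alpha + \beta < 1$ for
covariance stationarity.
```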

This book teaches with examples. Readers who are analyzing a time series for the first time will find PROC VARMAX easy to use; readers who know more advanced theoretical time series models will discover that PROC VARMAX is a useful tool for advanced model building.

Enterprise Integration: The Essential Guide to Integration Solutions

“The book’s use of real-world case study vignettes really does go to the heart of the subject matter. This stuff is real, it has real applicability to real problems, and, as with most things in life, it shows how it all comes down to real money in the final analysis. This book shows you what your peers are doing to drive costs out of integration projects and to build new applications without re-inventing the entire wheel—just a few new spokes and off you go. This is a good book. Read it.” — Peter Rhys Jenkins, Complex Systems Architect, Candle Corporation

“When you get two long-term, acknowledged experts on integration and interoperability together to lay out the current state of the IT universe you expect an immediate return on investment—and this book delivers. It’s common knowledge that 90% of total software lifecycle cost is in maintenance and integration, and that needs to drive IT decision-making. With comprehensive coverage of the integration technology landscape, and clear case studies presented at every turn, this book belongs on every IT manager’s, every system architect’s, and every software developer’s bookshelf.” — Richard Mark Soley, chairman and CEO, Object Management Group

“Today’s myriad of integration technologies and alternatives can be daunting. This book presents a framework and process for the evaluation, design, and selection of the appropriate integration technologies to meet your strategic business needs. You will find the templates a particularly useful mechanism to jump-start documentation and drive your decision-making process.” — Ron Zahavi, CIO, Global Business Transformation, Unisys Global Transformation Team; author of Enterprise Application Integration with CORBA

“It is refreshing to read a book that presents a good business approach to the integration challenge facing most business leaders today, while at the same time educating them about the major components of the required technologies and the management practice changes required. The narrative, examples, and templates establish a common reference point between the business and the technology organizations. A must-read for senior business leaders challenged with the complexities of business integration, as well as senior IT leaders challenged with shrinking budgets and lower tolerances for failures.” — Chuck Papageorgiou, managing partner, Ideasphere

“Integration has, and will continue to be, one of the success indicators of any enterprise project. Failing to understand the nuances of integration is a critical mistake managers cannot afford to make.” — Marcia Robinson, author of Services Blueprint: Roadmap for Execution

“A much-needed book; it ties together the business and technology aspects of information system implementation, emphasizing best practices for really getting things done. I believe that both the technical and business communities will benefit from the in-depth material provided in this book.” — Dr. Barry Horowitz, professor of systems and information engineering, University of Virginia (former CEO, Mitre Corporation)

Integration of applications, information, and business process has become today’s #1 IT investment priority. Most enterprise integration books simply explain the technology. This one shows exactly how to apply it. It’s a step-by-step roadmap for your entire project—from the earliest exploratory stages through analysis, design, architecture, and implementation.
Renowned enterprise integration experts Beth Gold-Bernstein and William Ruh present best practices and case studies that bring their methodology to life. They address every stage from the decision-maker’s and implementer’s point of view—showing how to align business requirements to specific solutions, systematically reduce risk, and maximize ROI throughout the entire lifecycle. Coverage includes:

Supporting strategies, tactics, and business planning: enterprise integration from the business perspective
Defining realistic project success indicators and metrics
Establishing integration architectures: supporting near-term needs while building reusable infrastructure services for the long term
Adopting metadata architecture and standards
Implementing four essential implementation patterns: application, information, composite, and process integration
Understanding service integration and implementing service-oriented architectures
Providing organizational structure and governance to support effective integration

The authors provide detailed plans and specification templates for application integration projects—both in the book and on the CD-ROM. These projects include identifying business drivers and requirements; establishing strategy; and integrating services, information, process, and applications. Enterprise Integration was written for every member of the integration team: business and IT leaders, strategists, architects, project managers, and technical staff. Regardless of your role, you’ll discover where you fit, what to do, and how to drive maximum business value from your next integration project.