As we look back at 2024, we're highlighting some of our favourite episodes of the year, and with 100 of them to choose from, it wasn't easy! The four guests we'll be recapping with are:

Lea Pica - A celebrity in the data storytelling and visualisation space. Richie and Lea cover the full picture of data presentation, how to understand your audience, how to leverage Hollywood storytelling and more. Out December 19.

Alex Banks - Founder of Sunday Signal. Adel and Alex cover Alex's journey into AI and what led him to create Sunday Signal, the potential of AI, prompt engineering at its most basic level, chain-of-thought prompting, the future of LLMs and more. Out December 23.

Don Chamberlin - The renowned co-inventor of SQL. Richie and Don explore the early development of SQL, how it became standardized, the future of SQL through NoSQL and SQL++ and more. Out December 26.

Tom Tunguz - General Partner at Theory Ventures, a $235m VC firm. Richie and Tom explore trends in generative AI, cloud+local hybrid workflows, data security, the future of business intelligence and data analytics, AI in the corporate sector and more. Out December 30.
Over the past 199 episodes of DataFramed, we've heard from people at the forefront of data and AI, and over the past year we've constantly looked ahead to the future AI might bring. But all of the technologies and ways of working we've witnessed have been built on foundations that were laid decades ago. For our 200th episode, we're bringing you a special guest and taking a walk down memory lane—to the creation and development of one of the most popular programming languages in the world.

Don Chamberlin is renowned as the co-inventor of SQL (Structured Query Language), the predominant database language globally, which he developed with Raymond Boyce in the mid-1970s. Chamberlin's professional career began at IBM Research in Yorktown Heights, New York, following a summer internship there during his academic years. His work on IBM's System R project led to the first SQL implementation and significantly advanced IBM's relational database technology. His contributions were recognized when he was made an IBM Fellow in 2003 and later a Fellow of the Computer History Museum in 2009 for his pioneering work on SQL and database architectures. Chamberlin also contributed to the development of XQuery, an XML query language, as part of the W3C, which became a W3C Recommendation in January 2007. Additionally, he holds fellowships with ACM and IEEE and is a member of the National Academy of Engineering.

In the episode, Richie and Don explore his early career at IBM and the development of his interest in databases alongside Ray Boyce, the Database Task Group (DBTG), the transition to relational databases and the early development of SQL, the commercialization and adoption of SQL, how it became standardized, how it evolved and spread via open source, the future of SQL through NoSQL and SQL++ and much more.

Links Mentioned in the Show:
The first-ever journal paper on SQL. SEQUEL: A Structured English Query Language
Don's Book: SQL++ for SQL Users: A Tutorial
System R: Relational approach to database management
SQL Courses
SQL Articles, Tutorials and Code-Alongs
Related Episode: Scaling Enterprise Analytics with Libby Duane Adams, Chief Advocacy Officer and Co-Founder of Alteryx
Rewatch sessions from RADAR: The Analytics Edition
New to DataCamp? Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business
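To make the episode's subject concrete: the SELECT ... FROM ... WHERE shape that Chamberlin and Boyce's 1974 SEQUEL paper introduced is still how SQL reads today. Below is a minimal, hypothetical sketch using Python's built-in sqlite3 module; the table and rows are invented purely for illustration.

```python
# A minimal sketch of the "structured English" query pattern from the 1974
# SEQUEL paper, run against an in-memory SQLite database. The employees
# table and its rows are hypothetical examples.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Research", 120000.0), ("Grace", "Systems", 115000.0)],
)

# The SELECT-FROM-WHERE structure is essentially unchanged after five decades.
for name, salary in conn.execute(
    "SELECT name, salary FROM employees WHERE dept = 'Research'"
):
    print(name, salary)
```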
Q&A Episode 25 - General Update, Minimum Effective Training Dose, Rate of Force Development, and More
Thanks for tuning in to the Data Driven Strength Podcast!
Timestamps:
00:00 Intro/general update
10:02 Home gym equipment and setting up weighted pushups
23:47 Training frequency and proximity to failure
36:14 Rate of force development discussion
01:09:00 Minimum effective dose for strength and hypertrophy
To learn more about 1 on 1 coaching: https://datadrivenstrength.typeform.com/to/JR3Gzm?typeform-source=linktr.ee
If you'd like to sign up to our email list, please visit the bottom section of our website via this link: https://www.data-drivenstrength.com
If you’d like to submit a question for a future episode please follow the link provided: https://forms.gle/c5aCswfCq6XUDTiAA
Link to Individualized Programming + Self Coaching Toolkit Product Page: https://www.data-drivenstrength.com/individualized-programming
Training-to-failure fatigue meta-analysis discussed:
- https://link.springer.com/article/10.1007/s40279-021-01602-x
Links to RFD papers discussed:
- https://onlinelibrary.wiley.com/doi/pdf/10.1111/sms.13775?casa_token=EkLP_ZxQKuEAAAAA:pZoYDR1zERHdyTDFJhdkxJByWY4POb2kilm1JQnhf2o4-K-wWGKwk_iPxKpYJPrIwXHfxUfC1eso4yI
- https://link.springer.com/content/pdf/10.1007/s00421-016-3439-2.pdf
- https://www.mdpi.com/2076-3417/11/1/45
- https://www.tandfonline.com/doi/pdf/10.1080/02640414.2015.1119299?needAccess=true
- https://europepmc.org/article/med/34100789
- https://pubmed.ncbi.nlm.nih.gov/29577974/
- https://www.researchgate.net/publication/325748706_Functional_and_physiological_adaptations_following_concurrent_training_using_sets_with_and_without_concentric_failure_in_elderly_men_A_randomized_clinical_trial
- https://pubmed.ncbi.nlm.nih.gov/32049887/
- https://journals.humankinetics.com/view/journals/ijspp/14/1/article-p46.xml
Links to MED papers discussed:
- https://pubmed.ncbi.nlm.nih.gov/21131862/
- https://www.frontiersin.org/articles/10.3389/fphys.2021.735932/full
- https://pubmed.ncbi.nlm.nih.gov/31373973/
Follow us on Instagram at: @datadrivenstrength @zac.datadrivenstrength @josh.datadrivenstrength @jake.datadrivenstrength @drake.datadrivenstrength
Summary
Data integration and routing is a constantly evolving problem and one that is fraught with edge cases and complicated requirements. The Apache NiFi project models this problem as a collection of data flows that are created through a self-service graphical interface. This framework provides a flexible platform for building a wide variety of integrations that can be managed and scaled easily to fit your particular needs. In this episode, project members Kevin Doran and Andy LoPresto discuss the ways that NiFi can be used, how to start using it in your environment, and plans for future development. They also explain how it fits in the broad landscape of data tools, the interesting and challenging aspects of the project, and how to build new extensions.
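For a rough mental model of that flow-based approach, here is a conceptual sketch in Python. It is not NiFi code, and every name in it is illustrative: two "processors" connected by a bounded queue, where a full queue blocks the producer, which is the essence of the backpressure NiFi applies between processors in a flow.

```python
# Conceptual sketch (not NiFi code): processors joined by a bounded
# connection, with backpressure when the downstream processor falls behind.
import queue
import threading

connection = queue.Queue(maxsize=10)  # bounded, like a NiFi connection

def generate_flowfiles():
    """Upstream processor: emits 100 flowfiles, then a stop sentinel."""
    for i in range(100):
        # put() blocks when the queue is full -- backpressure on the producer.
        connection.put({"id": i, "attributes": {"source": "demo"}})
    connection.put(None)

def route_flowfiles():
    """Downstream processor: routes each flowfile by a simple attribute test."""
    while (flowfile := connection.get()) is not None:
        destination = "even" if flowfile["id"] % 2 == 0 else "odd"
        print(f"flowfile {flowfile['id']} -> {destination}")

threading.Thread(target=generate_flowfiles).start()
route_flowfiles()
```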
Preamble
Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you've got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don't have time? DataKitchen's DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality. Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end-to-end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement and sign up for the newsletter at datakitchen.io/de today. After that, learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Your host is Tobias Macey and today I'm interviewing Kevin Doran and Andy LoPresto about Apache NiFi.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by explaining what NiFi is?
What is the motivation for building a GUI as the primary interface for the tool when the current trend is to represent everything as code?
How did you get involved with the project?
Where does it sit in the broader landscape of data tools?
Does the data that is processed by NiFi flow through the servers that it is running on (à la Spark/Flink/Kafka), or does it orchestrate actions on other systems (à la Airflow/Oozie)?
How do you manage versioning and backup of data flows, as well as promoting them between environments?
One of the advertised features is tracking provenance for data flows that are managed by NiFi. How is that data collected and managed?
What types of reporting are available across this information?
What are some of the use cases or requirements that lend themselves well to being solved by NiFi?
When is NiFi the wrong choice?
What is involved in deploying and scaling a NiFi installation?
What are some of the system/network parameters that should be considered?
What are the scaling limitations?
What have you found to be some of the most interesting, unexpected, and/or challenging aspects of building and maintaining the NiFi project and community?
What do you have planned for the future of NiFi?
Contact Info
Kevin Doran
@kevdoran on Twitter Email
Andy LoPresto
@yolopey on Twitter Email
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
NiFi HortonWorks DataFlow HortonWorks Apache Software Foundation Apple CSV XML JSON Perl Python Internet Scale Asset Management Documentum DataFlow NSA (National Security Agency) 24 (TV Show) Technology Transfer Program Agile Software Development Waterfall Spark Flink Kafka Oozie Luigi Airflow FluentD ETL (Extract, Transform, and Load) ESB (Enterprise Service Bus) MiNiFi Java C++ Provenance Kubernetes Apache Atlas Data Governance Kibana K-Nearest Neighbors DevOps DSL (Domain Specific Language) NiFi Registry Artifact Repository Nexus NiFi CLI Maven Archetype IoT Docker Backpressure NiFi Wiki TLS (Transport Layer Security) Mozilla TLS Observatory NiFi Flow Design System Data Lineage GDPR (General Data Protection Regulation)
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast
Summary
With the wealth of formats for sending and storing data, it can be difficult to determine which one to use. In this episode Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, what their strengths are, and how to choose one for your workload. They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats.
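To ground the row-versus-column trade-off the episode keeps returning to, here is a hedged sketch that writes the same two records as Avro and as Parquet. It assumes the third-party fastavro and pyarrow packages are installed, and all file, field, and record names are invented for illustration.

```python
# Hedged sketch: the same data as row-oriented Avro and column-oriented Parquet.
# Assumes `pip install fastavro pyarrow`; names are illustrative only.
import fastavro
import pyarrow as pa
import pyarrow.parquet as pq

records = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]

# Avro: row-oriented, and the schema travels with the file -- a natural fit
# for streaming whole records between systems.
schema = {
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "value", "type": "int"},
    ],
}
with open("example.avro", "wb") as out:
    fastavro.writer(out, schema, records)

# Parquet: column-oriented, so analytical reads can fetch only the columns
# they need instead of scanning whole rows.
pq.write_table(pa.table({"name": ["a", "b"], "value": [1, 2]}), "example.parquet")
print(pq.read_table("example.parquet", columns=["value"]))
```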
Preamble
Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you're ready to launch your next project you'll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production, and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today. Enterprise add-ons and professional support are available for added peace of mind. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. This is your host Tobias Macey and today I'm interviewing Julien Le Dem and Doug Cutting about data serialization formats and how to pick the right one for your systems.
Interview
Introduction
How did you first get involved in the area of data management?
What are the main serialization formats used for data storage and analysis?
What are the tradeoffs that are offered by the different formats?
How have the different storage and analysis tools influenced the types of storage formats that are available?
You've each developed a new on-disk data format, Avro and Parquet respectively. What were your motivations for investing that time and effort?
Why is it important for data engineers to carefully consider the format in which they transfer their data between systems?
What are the switching costs involved in moving from one format to another after you have started using it in a production system? (See the migration sketch after this question list.)
What are some of the new or upcoming formats that you are each excited about?
How do you anticipate the evolving hardware, patterns, and tools for processing data to influence the types of storage formats that maintain or grow their popularity?
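On the switching-cost question above: moving data between on-disk formats is typically a batch rewrite, and Arrow's in-memory representation makes a convenient bridge. A minimal sketch, reusing the hypothetical example.avro file from the earlier snippet:

```python
# Hedged sketch of a format migration: read Avro rows, pivot them into an
# Arrow table in memory, and write the result out as Parquet.
import fastavro
import pyarrow as pa
import pyarrow.parquet as pq

with open("example.avro", "rb") as src:
    rows = list(fastavro.reader(src))  # records come back as plain dicts

table = pa.Table.from_pylist(rows)     # rows -> Arrow's columnar layout
pq.write_table(table, "migrated.parquet")
```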
Contact Information
Doug:
cutting on GitHub Blog @cutting on Twitter
Julien
Email @J_ on Twitter Blog julienledem on GitHub
Links
Apache Avro Apache Parquet Apache Arrow Hadoop Apache Pig Xerox Parc Excite Nutch Vertica Dremel White Paper
Twitter Blog on Release of Parquet
CSV XML Hive Impala Presto Spark SQL Brotli ZStandard Apache Drill Trevni Apache Calcite
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast