By: Hubert Dulay, Mayank Shrivastava, Neha Pawar

What Makes a “1.0 Release?”

Apache Pinot has continuously evolved since the project’s inception within LinkedIn in 2013. Back then it was developed at a single company with a single use case in mind: to power “who viewed my profile?” Over the ensuing decade the Apache Pinot community expanded to be embraced by many other organizations, and those organizations have expanded its capabilities to address new use cases. Apache Pinot in 2023 is continuously evolving to address emerging needs in the real-time analytics community. Let’s look at how much innovation has gone into Apache Pinot over the years:

- Upserts - data-in-motion tends to stay in motion, and one of the cornerstone capabilities of Apache Pinot is upsert support to handle upsert mutations in real-time.
- Query-time Native JOINs - it was important to get this right, so that they were performant and scalable, allowing high QPS. This we will discuss in more detail below.
- Pluggable architecture - a broad user base requires the ability to extend the database with new customizable index types, routing strategies and storage options.
- Handling Semi-Structured/Unstructured Data - Pinot can easily index JSON and text data types at scale.
- Improving ANSI SQL Compliance - to that end, we’ve added better NULL handling, window functions, and as stated above, the capability for native JOINs.

With all of these features and capabilities, Apache Pinot moves farther and farther from mere database status, and becomes more of a complete platform that can tackle entire new classes of use cases that were beyond its capabilities in earlier days.

First let’s look at what Apache Pinot 1.0 itself is delivering. The first foundational pillar of what makes something worthy of a “1.0” release is software quality. Over the past year, since September 2022, engineers across the Apache Pinot community have closed over 300 issues to provide new features, optimize performance, expand test coverage, and squash bugs.

Features are also a key thing that makes a new release worthy of “1.0” status. The most critical part of the 1.0 release is undoubtedly the Multi-Stage Query Engine, which permits Apache Pinot users to do performant and scalable query-time JOINs. The original engine works very well for simpler filter-and-aggregate queries, but the broker could become a bottleneck for more complex queries. The new engine resolves this by introducing intermediary compute stages on the query servers, and brings Apache Pinot closer to full ANSI SQL semantics. While this query engine has been available within Apache Pinot already (since release 0.11.0), with the release of Apache Pinot 1.0 this feature is functionally complete. (While you can read more below, check out the accompanying blog by Apache Pinot PMC member Neha Pawar about using query-time JOINs here).

This post is a summary of the high points, but you can find a full list of everything included in the release notes. And if you’d like a video treatment of many of the main features in 1.0, including some helpful animations, watch here:

Otherwise, let’s have a look at some of the highlighted changes:

- Join Support - Part of the Multi-Stage Query Engine.
- Improved Upserts - Deletion and Compaction Support.
- Encode User-Specified Compressed Log Processor (CLP) During Ingestion.
- Improved Pinot-Spark Integration - Spark3 Compatibility

Apache Pinot 1.0 introduces native query-time JOIN support, equipping Pinot to handle a broad spectrum of JOIN scenarios, from user-facing analytics all the way up to ad hoc analytics. Underpinning this innovation is the multi-stage query engine, introduced a year ago, which efficiently manages complex analytical queries, including JOIN operations. This engine alleviates computational burdens by offloading tasks from brokers to a dedicated intermediate compute stage. Additionally, a new planner supporting full SQL semantics enhances Pinot's analytical capabilities.

JOIN optimization strategies play a pivotal role in Apache Pinot 1.0. These include predicate push-down to individual tables; using indexing and pruning to reduce scanning, which speeds up query processing; smart data layout considerations to minimize data shuffling; and query hints for fine-tuning JOIN operations.

With support for all JOIN types and three JOIN algorithms, including broadcast join, shuffle distributed hash join, and lookup join, Apache Pinot delivers versatility and scalability.
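To make the JOIN strategies named in this post more concrete, here is a minimal conceptual sketch in Python of a broadcast join, a shuffle distributed hash join, and predicate push-down. It is not Pinot's implementation and uses no Pinot API; the function names, the `orders`/`customers` tables, and the worker count are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Inner hash join: build a hash table on the right side, probe with the left."""
    table = defaultdict(list)
    for right in right_rows:
        table[right[key]].append(right)
    return [
        {**left, **right}
        for left in left_rows
        for right in table.get(left[key], [])
    ]


def broadcast_join(left_partitions, right_rows, key):
    """Broadcast: each worker keeps its left partition and receives a full
    copy of the (assumed small) right table, so the left side never moves."""
    joined = []
    for partition in left_partitions:
        joined.extend(hash_join(partition, right_rows, key))
    return joined


def shuffle_hash_join(left_rows, right_rows, key, num_workers):
    """Shuffle: both sides are repartitioned by hash(key) so that rows with
    the same key land on the same (simulated) worker, then joined locally."""
    left_parts = [[] for _ in range(num_workers)]
    right_parts = [[] for _ in range(num_workers)]
    for row in left_rows:
        left_parts[hash(row[key]) % num_workers].append(row)
    for row in right_rows:
        right_parts[hash(row[key]) % num_workers].append(row)
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        joined.extend(hash_join(left_part, right_part, key))
    return joined


# Hypothetical sample data (not from the post).
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25},
    {"order_id": 2, "customer_id": 11, "amount": 40},
    {"order_id": 3, "customer_id": 10, "amount": 15},
    {"order_id": 4, "customer_id": 12, "amount": 60},
]
customers = [
    {"customer_id": 10, "region": "EU"},
    {"customer_id": 11, "region": "US"},
]

via_broadcast = broadcast_join([orders[:2], orders[2:]], customers, "customer_id")
via_shuffle = shuffle_hash_join(orders, customers, "customer_id", num_workers=3)

# Predicate push-down: filter each table before the join, so fewer rows
# are scanned, shipped between workers, and probed.
large_orders = [o for o in orders if o["amount"] >= 25]
pushed_down = hash_join(large_orders, customers, "customer_id")
filtered_late = [r for r in via_broadcast if r["amount"] >= 25]
```

Both strategies produce the same rows; they differ in which data moves between workers. Broadcasting suits a small right-hand table, while shuffling keeps per-worker memory bounded when both sides are large. The push-down variant gives the same answer as filtering after the join but does less work along the way.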