Data Manipulation and Aggregation in KDB+ and Q

Introduction

KDB+ and Q excel in handling large datasets with unparalleled speed and efficiency. This chapter delves into the core functionalities of data manipulation and aggregation, showcasing how to transform and summarize data within the KDB+ environment.

Data Selection and Filtering

Q provides concise and powerful tools for extracting specific data subsets.

Basic Selection

Code snippet

// Sample data
trades:([] time:`time$til 10:00:01; price:100+til 10; symbol:`AAPL`GOOG`IBM)

// Select first 5 rows
first 5 trades

// Select last 3 rows
last 3 trades

Conditional Selection

Code snippet

// Select trades where price is greater than 105
select from trades where price > 105

// Select trades for AAPL
select from trades where symbol=`AAPL

Data Transformation

Q offers a rich set of functions for modifying data structures and values.

Adding and Removing Columns

Code snippet

// Add a new column
trades[`volume]:1000*til 10

// Remove a column
del volume from trades

Grouping and Sorting

Code snippet

// Group by symbol and calculate average price
avg price by symbol from trades

// Sort trades by time
trades: trades[asc time]

Joining Tables

Code snippet

// Create another table
quotes:([] time:`time$til 10:00:01; open:95+til 10; symbol:`AAPL`GOOG`IBM)

// Join trades and quotes on time and symbol
joinedTable: aj[`time`symbol; trades; quotes]

Aggregation Functions

Q provides a suite of built-in functions for summarizing data.

Basic Aggregations

Code snippet

// Calculate sum, average, min, max, and count of price
sum price from trades
avg price from trades
min price from trades
max price from trades
count price from trades

Grouped Aggregations

Code snippet

// Calculate average price by symbol
avg price by symbol from trades

Window Functions

Code snippet

// Calculate moving average of price
mavg: moving_avg[price; 3] // Calculate 3-period moving average

Advanced Data Manipulation

Q offers powerful tools for complex data transformations.

Functional Programming

Code snippet

// Calculate square of each price
sq price

// Apply a custom function to each row
f[x]: {x[`price] * x[`volume]}
trades[`turnover]: f each trades

Query and Update

Code snippet

// Update prices by 10% for AAPL
update price: price * 1.1 from trades where symbol=`AAPL

Performance Optimization

KDB+ excels in performance. Consider these tips:

  • Vectorized operations: Utilize vector operations for efficient computations.

  • Data types: Use appropriate data types for optimal memory usage.

  • Indexing: Create indexes on frequently queried columns.

  • Compression: Compress data to reduce memory footprint.

Summary

This chapter has introduced essential data manipulation and aggregation techniques in KDB+ and Q. Mastering these concepts will empower you to efficiently extract insights from your data. In the subsequent chapters, we will explore time series analysis, financial calculations, and advanced data engineering using KDB+.

Last updated