# Data Manipulation and Aggregation in KDB+ and Q

###

#### Introduction

KDB+ and Q excel in handling large datasets with unparalleled speed and efficiency. This chapter delves into the core functionalities of data manipulation and aggregation, showcasing how to transform and summarize data within the KDB+ environment.

#### Data Selection and Filtering

Q provides concise and powerful tools for extracting specific data subsets.

**Basic Selection**

Code snippet

```
// Sample data
trades:([] time:`time$til 10:00:01; price:100+til 10; symbol:`AAPL`GOOG`IBM)

// Select first 5 rows
first 5 trades

// Select last 3 rows
last 3 trades
```

**Conditional Selection**

Code snippet

```
// Select trades where price is greater than 105
select from trades where price > 105

// Select trades for AAPL
select from trades where symbol=`AAPL
```

#### Data Transformation

Q offers a rich set of functions for modifying data structures and values.

**Adding and Removing Columns**

Code snippet

```
// Add a new column
trades[`volume]:1000*til 10

// Remove a column
del volume from trades
```

**Grouping and Sorting**

Code snippet

```
// Group by symbol and calculate average price
avg price by symbol from trades

// Sort trades by time
trades: trades[asc time]
```

**Joining Tables**

Code snippet

```
// Create another table
quotes:([] time:`time$til 10:00:01; open:95+til 10; symbol:`AAPL`GOOG`IBM)

// Join trades and quotes on time and symbol
joinedTable: aj[`time`symbol; trades; quotes]
```

#### Aggregation Functions

Q provides a suite of built-in functions for summarizing data.

**Basic Aggregations**

Code snippet

```
// Calculate sum, average, min, max, and count of price
sum price from trades
avg price from trades
min price from trades
max price from trades
count price from trades
```

**Grouped Aggregations**

Code snippet

```
// Calculate average price by symbol
avg price by symbol from trades
```

**Window Functions**

Code snippet

```
// Calculate moving average of price
mavg: moving_avg[price; 3] // Calculate 3-period moving average
```

#### Advanced Data Manipulation

Q offers powerful tools for complex data transformations.

**Functional Programming**

Code snippet

```
// Calculate square of each price
sq price

// Apply a custom function to each row
f[x]: {x[`price] * x[`volume]}
trades[`turnover]: f each trades
```

**Query and Update**

Code snippet

```
// Update prices by 10% for AAPL
update price: price * 1.1 from trades where symbol=`AAPL
```

#### Performance Optimization

KDB+ excels in performance. Consider these tips:

* **Vectorized operations:** Utilize vector operations for efficient computations.
* **Data types:** Use appropriate data types for optimal memory usage.
* **Indexing:** Create indexes on frequently queried columns.
* **Compression:** Compress data to reduce memory footprint.

#### Summary

This chapter has introduced essential data manipulation and aggregation techniques in KDB+ and Q. Mastering these concepts will empower you to efficiently extract insights from your data. In the subsequent chapters, we will explore time series analysis, financial calculations, and advanced data engineering using KDB+.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://alex-semenov-ie.gitbook.io/book/master/data-manipulation-and-aggregation-in-kdb+-and-q.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
