Pipelines reference - Preparers Innovation Release

This documentation covers the current Innovation Release of EDB Postgres AI. You may also want the docs for the current LTS version.

This reference documentation for Pipelines preparers includes information on the functions and views available in the aidb extension related to preparers. See Usage and Examples for more details.

Views

`aidb.preparers`

Also referenceable as aidb.preps, the aidb.preparers view contains information about the configured data preparers. It includes the following columns:

Column	Type	Description
id	INTEGER
name	TEXT	Name of the preparer.
operation	aidb.DataPreparationOperation	The kind of processing step to perform.
destination_schema	TEXT	Schema of the destination table to store the output data.
destination_table	TEXT	Name of the destination table to store the output data.
destination_key_column	TEXT	Column of the destination table that references the key in source data.
destination_data_column	TEXT	Column of the destination table to store the output data.
options	JSONB	Configuration options for the data preparation operation. Uses the same API as the data preparation primitives.
source_type	TEXT	Type of source data the preparer is working with. Can be either 'Table' or 'Volume'.
source_schema	TEXT	Schema of the table with the source data the preparer will process. Applies only to preparers of `Table` source type.
source_table	TEXT	Name of the table with the source data the preparer will process. Applies only to preparers of `Table` source type.
source_data_column	TEXT	Column in the source table with the source data the preparer will process. Applies only to preparers of `Table` source type.
source_key_column	TEXT	Name of the key column in the source table for reference with the output processed data. Applies only to preparers of `Table` source type.
source_volume_name	TEXT	Name of the volume to use as a data source. Applies only to preparers of `Volume` source type.
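
As a quick illustration, the view can be queried like any other relation. The following is a minimal sketch that uses only columns listed in the table above; the choice of columns and the ordering are arbitrary.

```sql
-- List each configured preparer with its operation and output location.
SELECT name,
       operation,
       source_type,
       destination_schema,
       destination_table
FROM aidb.preparers
ORDER BY name;
```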

Types

`aidb.DataPreparationOperation`

The aidb.DataPreparationOperation type is an enum that represents the different types of preprocessing steps that can be performed:

ChunkText
SummarizeText
ParseHtml
ParsePdf
PerformOcr

Functions

`aidb.create_table_preparer`

Creates a preparer with a source data table.

Parameters

Parameter	Type	Default	Description
name	TEXT	Required	Name of the preparer
operation	aidb.DataPreparationOperation	Required	Type of data preparation operation
source_table	TEXT	Required	Name of the source data table
source_data_column	TEXT	Required	Column in the source table containing the raw data
destination_table	TEXT	Required	Name of the destination table
destination_data_column	TEXT	Required	Column in the destination table for processed data
source_key_column	TEXT	'id'	Unique column in the source table to use as key to reference the rows.
destination_key_column	TEXT	'id'	Key column in the destination table that references the `source_key_column`
options	JSONB	'{}'::JSONB	Configuration options for the data preparation operation. Uses the same API as the data preparation primitives.
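
The following sketch shows a call that sets every parameter from the table above explicitly. The preparer, table, and column names (`docs_chunker`, `docs`, `body`, `doc_id`, `docs_chunks`, `chunk`) are hypothetical and only illustrate how the arguments map to the parameters.

```sql
-- All object names here are placeholders, not part of the aidb extension.
SELECT aidb.create_table_preparer(
    name => 'docs_chunker',
    operation => 'ChunkText',
    source_table => 'docs',
    source_data_column => 'body',
    destination_table => 'docs_chunks',
    destination_data_column => 'chunk',
    source_key_column => 'doc_id',       -- overrides the default 'id'
    destination_key_column => 'doc_id',  -- overrides the default 'id'
    options => aidb.chunk_text_config(100)
);
```

When the source and destination key columns are both named `id`, the last two key arguments can be omitted.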

`aidb.create_preparer_for_table` (deprecated)

Replaced by `aidb.create_table_preparer`, which takes the same arguments.

`aidb.create_volume_preparer`

Creates a preparer for a given PGFS volume.

Parameters

Parameter	Type	Default	Description
name	TEXT	Required	Name of the preparer
operation	aidb.DataPreparationOperation	Required	Type of data preparation operation
source_volume_name	TEXT	Required	Name of the source volume containing the raw data
destination_table	TEXT	Required	Name of the destination table
destination_data_column	TEXT	Required	Column in the destination table for processed data
destination_key_column	TEXT	'id'	Key column in the destination table that uniquely identifies the processed data
options	JSONB	'{}'::JSONB	Configuration options for the data preparation operation. Uses the same API as the data preparation primitives.
invoker_role	TEXT	NULL	Role owning the tables, pipelines, and background job execution.
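
A minimal sketch of a volume-based preparer, assuming a PGFS volume named `my_pdf_volume` already exists. The volume, preparer, table, and column names are hypothetical, and the example relies on the default (empty) options for the `ParsePdf` operation.

```sql
-- 'my_pdf_volume', 'pdf_parser', 'parsed_pdfs', and 'parsed_text' are placeholders.
SELECT aidb.create_volume_preparer(
    name => 'pdf_parser',
    operation => 'ParsePdf',
    source_volume_name => 'my_pdf_volume',
    destination_table => 'parsed_pdfs',
    destination_data_column => 'parsed_text',
    destination_key_column => 'id'
);
```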

`aidb.create_preparer_for_volume` (deprecated)

Replaced by `aidb.create_volume_preparer`, which takes the same arguments.

`aidb.bulk_data_preparation`

Executes the configured data preparation operation on all data from the specified preparer’s source.

Parameters

Parameter	Type	Description
preparer_name	TEXT	Name of the preparer
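
For illustration, the following sketch runs a preparer once over all of its source data and then inspects a few output rows. It assumes a preparer named `docs_chunker` with a destination table `docs_chunks` has already been created; both names are hypothetical.

```sql
-- Process everything currently in the preparer's source.
SELECT aidb.bulk_data_preparation('docs_chunker');

-- Look at a sample of the prepared output (placeholder table name).
SELECT * FROM docs_chunks LIMIT 5;
```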

`aidb.set_auto_preparer`

Sets the automatic processing mode for the specified preparer. Use it to enable or disable automatic data preparation: `Live` mode enables the Postgres trigger-based automation, and `Disabled` turns off all automation.

Parameters

Parameter	Type	Default	Description
preparer_name	TEXT		Name of the preparer
mode	aidb.PipelineAutoProcessingMode		Desired processing mode

Example

SELECT aidb.set_auto_preparer('test_preparer', 'Live');
SELECT aidb.set_auto_preparer('test_preparer', 'Disabled');

`aidb.set_preparer_auto_processing` (deprecated)

Replaced by `aidb.set_auto_preparer`, which takes the same arguments.

`aidb.delete_preparer`

Deletes the preparer's configuration.

Parameters

Parameter	Type	Description
preparer_name	TEXT	Name of the preparer to delete

Note

This function doesn't delete the destination table or any data in it.
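
A short sketch of a full cleanup, assuming a preparer named `docs_chunker` whose output lives in a table named `docs_chunks` (both names are hypothetical). Because deleting the preparer leaves the destination table in place, the table is dropped in a separate statement if it is no longer needed.

```sql
-- Remove the preparer's configuration only.
SELECT aidb.delete_preparer('docs_chunker');

-- Optionally remove the prepared output as well (placeholder table name).
DROP TABLE IF EXISTS docs_chunks;
```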

Helper functions

Helper functions simplify the creation of configuration JSON for data preparation operations by providing a structured way to specify options.

`aidb.chunk_text_config`

Creates a configuration JSON object for the ChunkText operation.

Parameters

Parameter	Type	Default	Description
desired_length	INTEGER	Required	Target chunk size (unit depends on `strategy`)
max_length	INTEGER	NULL	Maximum chunk size (unit depends on `strategy`)
overlap_length	INTEGER	NULL	Amount to overlap between consecutive chunks (unit depends on `strategy`)
strategy	TEXT	NULL	Chunking strategy: `'chars'` (default) or `'words'`. Determines the unit for all size parameters

Returns

JSONB configuration object for use with the ChunkText operation.

Examples

-- Basic chunking with desired length only
SELECT aidb.chunk_text_config(100);

-- Chunking with max length
SELECT aidb.chunk_text_config(100, 150);

-- Chunking with overlap
SELECT aidb.chunk_text_config(100, 150, 20);

-- Character-based chunking with overlap
SELECT aidb.chunk_text_config(100, 120, 10, 'chars');

-- Use in a preparer
SELECT aidb.create_table_preparer(
    name => 'my_chunker',
    operation => 'ChunkText',
    source_table => 'source_data',
    source_data_column => 'text_content',
    destination_table => 'chunked_data',
    -- destination_data_column is required; 'chunk' is a placeholder column name
    destination_data_column => 'chunk',
    options => aidb.chunk_text_config(120, 150, 15)
);

`aidb.summarize_text_config`

Creates a configuration JSON object for the SummarizeText operation.

Parameters

Parameter	Type	Default	Description
model	TEXT	Required	Name of the model to use for summarization
chunk_config	JSONB	NULL	Optional chunking configuration (created with `chunk_text_config`)
prompt	TEXT	NULL	Custom prompt to guide the summarization
strategy	TEXT	NULL	Summarization strategy: `'append'` (default) or `'reduce'`
reduction_factor	INTEGER	NULL	Used with `'reduce'` strategy to control how aggressively text is reduced in each iteration (default: 3)

Returns

JSONB configuration object for use with the SummarizeText operation or the `aidb.summarize_text_aggregate` function.

Examples

-- Basic summarization with model only
SELECT aidb.summarize_text_config('my_t5_model');

-- Summarization with chunking
SELECT aidb.summarize_text_config(
    'my_t5_model',
    aidb.chunk_text_config(100)
);

-- Summarization with a custom prompt (strategy is omitted, so the default 'append' applies)
SELECT aidb.summarize_text_config(
    'my_t5_model',
    aidb.chunk_text_config(80, 80, 5, 'words'),
    'create a concise summary'
);

-- Summarization with reduce strategy
SELECT aidb.summarize_text_config(
    'my_t5_model',
    aidb.chunk_text_config(100, 100, 5, 'words'),
    'summarize the key points',
    'reduce',
    5
);

-- Use in a preparer
SELECT aidb.create_table_preparer(
    name => 'my_summarizer',
    operation => 'SummarizeText',
    source_table => 'source_data',
    source_data_column => 'text_content',
    destination_table => 'summarized_data',
    -- destination_data_column is required; 'summary' is a placeholder column name
    destination_data_column => 'summary',
    options => aidb.summarize_text_config('my_t5_model')
);

-- Use with aggregate function
SELECT
    category,
    aidb.summarize_text_aggregate(
        content,
        aidb.summarize_text_config('my_t5_model')::json ORDER BY id
    ) AS summary
FROM my_table
GROUP BY category;
