Pipelines reference - Preparers Innovation Release
This reference documentation for Pipelines preparers includes information on the functions and views available in the aidb extension related to preparers. See Usage and Examples for more details.
Views
aidb.preparers
Also referenceable as aidb.preps, the aidb.preparers view contains information about the configured data preparers. It includes the following columns:
| Column | Type | Description |
|---|---|---|
| id | INTEGER | Internal identifier of the preparer. |
| name | TEXT | Name of the preparer. |
| operation | aidb.DataPreparationOperation | The kind of processing step to perform. |
| destination_schema | TEXT | Schema of the destination table to store the output data. |
| destination_table | TEXT | Name of the destination table to store the output data. |
| destination_key_column | TEXT | Column of the destination table that references the key in source data. |
| destination_data_column | TEXT | Column of the destination table to store the output data. |
| options | JSONB | Configuration options for the data preparation operation. Uses the same API as the data preparation primitives. |
| source_type | TEXT | Type of source data the preparer is working with. Can be either 'Table' or 'Volume'. |
| source_schema | TEXT | Schema of the table with the source data the preparer will process. Applies only to preparers of Table source type. |
| source_table | TEXT | Name of the table with the source data the preparer will process. Applies only to preparers of Table source type. |
| source_data_column | TEXT | Column in the source table with the source data the preparer will process. Applies only to preparers of Table source type. |
| source_key_column | TEXT | Name of the key column in the source table for reference with the output processed data. Applies only to preparers of Table source type. |
| source_volume_name | TEXT | Name of the volume to use as a data source. Applies only to preparers of Volume source type. |
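As a minimal sketch of how the view might be inspected, the queries below list the configured preparers and then narrow the result to volume-based ones. They use only the columns listed in the table above and assume the aidb extension is installed and at least one preparer exists.

```sql
-- List every configured preparer with its source and destination.
SELECT name,
       operation,
       source_type,
       source_schema,
       source_table,
       destination_schema,
       destination_table
FROM aidb.preparers
ORDER BY name;

-- Show only preparers that read from a volume.
SELECT name, operation, source_volume_name, destination_table
FROM aidb.preparers
WHERE source_type = 'Volume';
```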
Types
aidb.DataPreparationOperation
The aidb.DataPreparationOperation type is an enum that represents the different types of preprocessing steps that can be performed:
- ChunkText
- SummarizeText
- ParseHtml
- ParsePdf
- PerformOcr
Functions
aidb.create_table_preparer
Creates a preparer with a source data table.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | TEXT | Required | Name of the preparer |
| operation | aidb.DataPreparationOperation | Required | Type of data preparation operation |
| source_table | TEXT | Required | Name of the source data table |
| source_data_column | TEXT | Required | Column in the source table containing the raw data |
| destination_table | TEXT | Required | Name of the destination table |
| destination_data_column | TEXT | Required | Column in the destination table for processed data |
| source_key_column | TEXT | 'id' | Unique column in the source table to use as key to reference the rows. |
| destination_key_column | TEXT | 'id' | Key column in the destination table that references the source_key_column |
| options | JSONB | '{}'::JSONB | Configuration options for the data preparation operation. Uses the same API as the data preparation primitives. |
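The following sketch shows one way the parameters above could be combined when the key columns don't use the default name 'id'. The table, column, and preparer names are hypothetical placeholders. This reference doesn't state whether the destination table has to exist beforehand, so the sketch makes no assumption about it.

```sql
-- Hypothetical source table whose key column is not named 'id'.
CREATE TABLE source_data (
    doc_id       INTEGER PRIMARY KEY,
    text_content TEXT
);

-- Create a preparer that chunks text_content and keys the output by doc_id.
SELECT aidb.create_table_preparer(
    name                    => 'my_chunker_by_doc',   -- placeholder preparer name
    operation               => 'ChunkText',
    source_table            => 'source_data',
    source_data_column      => 'text_content',
    destination_table       => 'chunked_data',        -- placeholder table name
    destination_data_column => 'chunk',               -- placeholder column name
    source_key_column       => 'doc_id',              -- overrides the default 'id'
    destination_key_column  => 'source_doc_id',       -- overrides the default 'id'
    options                 => aidb.chunk_text_config(100)
);
```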
aidb.create_preparer_for_table (deprecated)
Replaced by aidb.create_table_preparer, which takes the same arguments.
aidb.create_volume_preparer
Creates a preparer for a given PGFS volume.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | TEXT | Required | Name of the preparer |
| operation | aidb.DataPreparationOperation | Required | Type of data preparation operation |
| source_volume_name | TEXT | Required | Name of the source volume containing the raw data |
| destination_table | TEXT | Required | Name of the destination table |
| destination_data_column | TEXT | Required | Column in the destination table for processed data |
| destination_key_column | TEXT | 'id' | Key column in the destination table that uniquely identifies the processed data |
| options | JSONB | '{}'::JSONB | Configuration options for the data preparation operation. Uses the same API as the data preparation primitives. |
| invoker_role | TEXT | NULL | Role owning the tables, pipelines, and background job execution. |
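A call using these parameters might look like the sketch below. It assumes a PGFS volume named 'docs_volume' has already been created. That volume name, the preparer name, and the destination table and column names are all hypothetical.

```sql
-- Create a preparer that parses PDF files stored in an existing volume.
SELECT aidb.create_volume_preparer(
    name                    => 'pdf_parser',      -- placeholder preparer name
    operation               => 'ParsePdf',
    source_volume_name      => 'docs_volume',     -- assumed to exist already
    destination_table       => 'parsed_pdfs',     -- placeholder table name
    destination_data_column => 'parsed_text',     -- placeholder column name
    destination_key_column  => 'id'               -- same as the documented default
);
```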
aidb.create_preparer_for_volume (deprecated)
Replaced by aidb.create_volume_preparer, which takes the same arguments.
aidb.bulk_data_preparation
Executes the configured data preparation operation on all data from the specified preparer’s source.
Parameters
| Parameter | Type | Description |
|---|---|---|
| preparer_name | TEXT | Name of the preparer |
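As an illustration, the sketch below runs a preparer once over all of its source data and then samples the output. It reuses the 'my_chunker' preparer and 'chunked_data' destination table names from the chunk_text_config example on this page, and assumes that preparer has already been created.

```sql
-- Process all rows currently in the preparer's source.
SELECT aidb.bulk_data_preparation('my_chunker');

-- Sample the results written to the destination table.
SELECT * FROM chunked_data LIMIT 5;
```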
aidb.set_auto_preparer
Sets the automatic processing mode for the named preparer. Use it to enable or disable automatic data preparation: Live mode enables Postgres trigger-based automation, and Disabled turns off all automation.
Parameters
| Parameter | Type | Description |
|---|---|---|
| preparer_name | TEXT | Name of preparer |
| mode | aidb.PipelineAutoProcessingMode | Desired processing mode |
Example
SELECT aidb.set_auto_preparer('test_preparer', 'Live');
SELECT aidb.set_auto_preparer('test_preparer', 'Disabled');
aidb.set_preparer_auto_processing (deprecated)
Replaced by aidb.set_auto_preparer, which takes the same arguments.
aidb.delete_preparer
Deletes the preparer's configuration.
Parameters
| Parameter | Type | Description |
|---|---|---|
| preparer_name | TEXT | Name of preparer to delete |
Note
This function doesn't delete the destination table or any data in it.
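Because the destination table is left in place, cleanup may need a second step. The sketch below reuses the 'my_chunker' preparer and 'chunked_data' table names from the chunk_text_config example on this page. Drop the table only if its data is no longer needed.

```sql
-- Remove the preparer's configuration.
SELECT aidb.delete_preparer('my_chunker');

-- Optional: the destination table is not removed automatically,
-- so drop it explicitly if the prepared data is no longer required.
DROP TABLE IF EXISTS chunked_data;
```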
Helper functions
Helper functions simplify the creation of configuration JSON for data preparation operations by providing a structured way to specify options.
aidb.chunk_text_config
Creates a configuration JSON object for the ChunkText operation.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| desired_length | INTEGER | Required | Target chunk size (unit depends on strategy) |
| max_length | INTEGER | NULL | Maximum chunk size (unit depends on strategy) |
| overlap_length | INTEGER | NULL | Amount to overlap between consecutive chunks (unit depends on strategy) |
| strategy | TEXT | NULL | Chunking strategy: 'chars' (default) or 'words'. Determines the unit for all size parameters |
Returns
JSONB configuration object for use with ChunkText operation.
Examples
-- Basic chunking with desired length only
SELECT aidb.chunk_text_config(100);

-- Chunking with max length
SELECT aidb.chunk_text_config(100, 150);

-- Chunking with overlap
SELECT aidb.chunk_text_config(100, 150, 20);

-- Character-based chunking with overlap
SELECT aidb.chunk_text_config(100, 120, 10, 'chars');

-- Use in a preparer
SELECT aidb.create_table_preparer(
    name => 'my_chunker',
    operation => 'ChunkText',
    source_table => 'source_data',
    source_data_column => 'text_content',
    destination_table => 'chunked_data',
    -- destination_data_column is listed as required; 'chunk' is a placeholder name
    destination_data_column => 'chunk',
    options => aidb.chunk_text_config(120, 150, 15)
);
aidb.summarize_text_config
Creates a configuration JSON object for the SummarizeText operation.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | TEXT | Required | Name of the model to use for summarization |
| chunk_config | JSONB | NULL | Optional chunking configuration (created with chunk_text_config) |
| prompt | TEXT | NULL | Custom prompt to guide the summarization |
| strategy | TEXT | NULL | Summarization strategy: 'append' (default) or 'reduce' |
| reduction_factor | INTEGER | NULL | Used with 'reduce' strategy to control how aggressively text is reduced in each iteration (default: 3) |
Returns
JSONB configuration object for use with the SummarizeText operation or the summarize_text_aggregate function.
Examples
-- Basic summarization with model only
SELECT aidb.summarize_text_config('my_t5_model');

-- Summarization with chunking
SELECT aidb.summarize_text_config(
    'my_t5_model',
    aidb.chunk_text_config(100)
);

-- Summarization with custom prompt and the default append strategy
SELECT aidb.summarize_text_config(
    'my_t5_model',
    aidb.chunk_text_config(80, 80, 5, 'words'),
    'create a concise summary'
);

-- Summarization with reduce strategy
SELECT aidb.summarize_text_config(
    'my_t5_model',
    aidb.chunk_text_config(100, 100, 5, 'words'),
    'summarize the key points',
    'reduce',
    5
);

-- Use in a preparer
SELECT aidb.create_table_preparer(
    name => 'my_summarizer',
    operation => 'SummarizeText',
    source_table => 'source_data',
    source_data_column => 'text_content',
    destination_table => 'summarized_data',
    -- destination_data_column is listed as required; 'summary' is a placeholder name
    destination_data_column => 'summary',
    options => aidb.summarize_text_config('my_t5_model')
);

-- Use with aggregate function
SELECT category,
       aidb.summarize_text_aggregate(
           content,
           aidb.summarize_text_config('my_t5_model')::json
           ORDER BY id
       ) AS summary
FROM my_table
GROUP BY category;