vastdb.table

VAST Database table.

class vastdb.table.Projection(name: str, table: Table, handle: int, stats: TableStats)

Bases: object

VAST semi-sorted projection.

property bucket

Return bucket.

columns() -> pyarrow.Schema

Return this projection's columns as an Arrow schema.

drop() -> None

Drop this projection.

handle: int
name: str
rename(new_name) -> None

Rename this projection.

property schema

Return schema.

stats: TableStats
table: Table
property tx

Return transaction.

class vastdb.table.SelectSplitState(query_data_request, table: Table, split_id: int, config: QueryConfig)

Bases: object

State of a specific query split execution.

property done

Return true iff the pagination is over.

process_split(api: VastdbApi, record_batches_queue: Queue[pyarrow.RecordBatch], check_stop: Callable)

Execute a sequence of QueryData requests, and queue the parsed RecordBatch objects.

Can be called repeatedly, to support resuming the query after a disconnection or a retriable error.
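The resumable behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the SDK: SplitState, run_with_retries and the fetch callback are hypothetical names, and the only idea taken from the text is that the pagination state lives in the object, so a repeated call continues where the previous one stopped.

```python
import queue


class SplitState:
    """Hypothetical stand-in for a resumable split: remembers the next page to fetch."""

    def __init__(self, pages):
        self.pages = pages    # pretend server-side pages of rows
        self.next_page = 0    # pagination cursor; survives failed attempts

    @property
    def done(self):
        # True once every page has been fetched and queued.
        return self.next_page >= len(self.pages)

    def process_split(self, fetch, out_queue):
        # Continue from the saved cursor; advance it only after a page is queued.
        while not self.done:
            batch = fetch(self.pages, self.next_page)  # may raise ConnectionError
            out_queue.put(batch)
            self.next_page += 1


def run_with_retries(state, fetch, out_queue, max_attempts=3):
    """Call process_split repeatedly until the split is done or attempts run out."""
    for _ in range(max_attempts):
        try:
            state.process_split(fetch, out_queue)
            return True
        except ConnectionError:
            continue  # retry; pages already queued are not fetched again
    return state.done
```

Because the cursor moves only after a page has been queued, a failure in the middle of a split neither drops nor duplicates pages in this sketch.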

class vastdb.table.Table(name: str, schema: Schema, handle: int, _imports_table: bool)

Bases: object

VAST Table.

add_column(new_column: pyarrow.Schema) -> None

Add a new column.

arrow_schema: pyarrow.Schema
property bucket

Return bucket.

columns() -> pyarrow.Schema

Return columns’ metadata.

create_imports_table(fail_if_exists=True) -> Table

Create imports table.

create_projection(projection_name: str, sorted_columns: List[str], unsorted_columns: List[str]) -> Projection

Create a new semi-sorted projection.
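As a hedged usage sketch, the helper below checks the requested column names against the schema returned by columns() before calling create_projection. The helper name and the up-front check are assumptions added for illustration. Only columns() and create_projection(), with the argument order documented above, come from this reference.

```python
def create_checked_projection(table, name, sorted_columns, unsorted_columns):
    """Create a projection after verifying that every requested column exists.

    `table` is expected to behave like vastdb.table.Table; this wrapper itself
    is a hypothetical convenience, not part of the SDK.
    """
    # columns() is documented to return a pyarrow.Schema, which exposes `.names`.
    known = set(table.columns().names)
    requested = list(sorted_columns) + list(unsorted_columns)
    unknown = [column for column in requested if column not in known]
    if unknown:
        raise ValueError(f"unknown columns for projection {name!r}: {unknown}")
    return table.create_projection(name, sorted_columns, unsorted_columns)
```

Failing early on the client side gives a clearer error than a rejected request, at the cost of one extra columns() call.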

delete(rows: pyarrow.RecordBatch | pyarrow.Table) -> None

Delete a subset of rows in this table.

Row IDs are specified using a special field (named “$row_id” of uint64 type).
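One way to obtain the row IDs is to run select() with internal_row_id=True and pass the result to delete(). The sketch below assumes that the returned rows then carry the "$row_id" field. The helper name delete_where is hypothetical, and keeping only the "$row_id" column is a conservative choice rather than a documented requirement.

```python
def delete_where(table, predicate):
    """Delete the rows of `table` matching `predicate`; return how many were found.

    `table` is expected to behave like vastdb.table.Table.
    """
    # Ask select() to include the internal "$row_id" field in the results.
    # Passing a narrow `columns` list here would reduce the amount of data read.
    reader = table.select(predicate=predicate, internal_row_id=True)
    rows = reader.read_all()  # RecordBatchReader.read_all() returns a pyarrow.Table
    if rows.num_rows == 0:
        return 0
    # Keep only the row-ID column so the delete payload stays small.
    table.delete(rows.select(["$row_id"]))
    return rows.num_rows
```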

drop() -> None

Drop this table.

drop_column(column_to_drop: pyarrow.Schema) -> None

Drop an existing column.

get_stats() -> TableStats

Get the statistics of this table.

handle: int
import_files(files_to_import: Iterable[str], config: ImportConfig | None = None) -> None

Import a list of Parquet files into this table.

The files must be on a VAST S3 server and be accessible using the current credentials.

import_partitioned_files(files_and_partitions: Dict[str, pyarrow.RecordBatch], config: ImportConfig | None = None) -> None

Import a list of partitioned Parquet files into this table.

The files must be on a VAST S3 server and be accessible using the current credentials. Each file must have its own partition values defined as an Arrow RecordBatch.

imports_table() -> Table | None

Get the imports table of this table.

insert(rows: pyarrow.RecordBatch | pyarrow.Table)

Insert a RecordBatch or Table into this table.

insert_in_column_batches(rows: pyarrow.RecordBatch)

Split the RecordBatch into groups of at most MAX_COLUMN_IN_BATCH columns, so that each group can be inserted in a single RPC.

Insert the first MAX_COLUMN_IN_BATCH columns and get the row IDs. Then loop over the remaining columns and update them in groups of MAX_COLUMN_IN_BATCH.
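The insert-then-update plan described above can be sketched in plain Python. This is an illustration of the chunking idea only, not the SDK's implementation: both function names are hypothetical, and the limit is passed as a parameter because the value of MAX_COLUMN_IN_BATCH is not given here.

```python
def split_columns(column_names, max_columns):
    """Split a list of column names into consecutive groups of at most max_columns."""
    if max_columns <= 0:
        raise ValueError("max_columns must be positive")
    return [
        column_names[start:start + max_columns]
        for start in range(0, len(column_names), max_columns)
    ]


def plan_column_batches(column_names, max_columns):
    """Return the sequence of operations for a wide insert.

    The first group is inserted, which yields the row IDs; every later group
    is applied as an update keyed by the "$row_id" field.
    """
    plan = []
    for index, group in enumerate(split_columns(list(column_names), max_columns)):
        if index == 0:
            plan.append(("insert", group))
        else:
            plan.append(("update", ["$row_id"] + group))
    return plan
```

For five columns and a limit of two, the plan is one insert of two columns followed by two updates, the last of which carries a single data column.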

name: str
property path

Return table’s path.

projection(name: str) -> Projection

Get a specific semi-sorted projection of this table.

projections(projection_name=None) -> Iterable[Projection]

List all semi-sorted projections of this table.

rename(new_name) -> None

Rename this table.

rename_column(current_column_name: str, new_column_name: str) -> None

Rename an existing column.

schema: Schema
select(columns: List[str] | None = None, predicate: ibis.expr.types.BooleanColumn | ibis.common.deferred.Deferred = None, config: QueryConfig | None = None, *, internal_row_id: bool = False) -> pyarrow.RecordBatchReader

Execute a query over this table.

To read a subset of the columns, specify their names via the columns argument. Otherwise, all columns will be read.

In order to apply a filter, a predicate can be specified. See https://github.com/vast-data/vastdb_sdk/blob/main/README.md#filters-and-projections for more details.

Query-execution configuration options can be specified via the optional config argument.
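Since select() returns a pyarrow.RecordBatchReader, the results can be consumed batch by batch instead of being materialized at once. The sketch below shows that pattern; count_rows is a hypothetical helper, and `table` is assumed to behave like vastdb.table.Table.

```python
def count_rows(table, columns=None, predicate=None):
    """Stream the result of table.select() and count the rows it yields."""
    reader = table.select(columns=columns, predicate=predicate)
    total = 0
    # Iterating a RecordBatchReader yields pyarrow.RecordBatch objects one at a time.
    for batch in reader:
        total += batch.num_rows
    return total
```

A predicate would typically be an ibis expression, for example `_.c1 > 5` after `from ibis import _`, where c1 is an assumed column name.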

property stats

Fetch table’s statistics from server.

property tx

Return transaction.

update(rows: pyarrow.RecordBatch | pyarrow.Table, columns: List[str] | None = None) -> None

Update a subset of cells in this table.

Row IDs are specified using a special field (named “$row_id” of uint64 type). This function assumes that this special field is part of the rows argument.

A subset of columns to be updated can be specified via the columns argument.
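Because update() assumes the row-ID field is present, a caller may want to check for it before sending the request. The wrapper below is a hypothetical sketch of such a guard. It relies only on the `.schema.names` attribute of pyarrow RecordBatch and Table objects and on the update() signature documented above.

```python
ROW_ID = "$row_id"  # name of the special row-ID field described above


def update_columns(table, rows, columns):
    """Validate `rows` and then update only `columns` in `table`.

    `table` is expected to behave like vastdb.table.Table and `rows` like a
    pyarrow.RecordBatch or pyarrow.Table.
    """
    names = rows.schema.names
    if ROW_ID not in names:
        raise ValueError(f"rows must include the {ROW_ID!r} field")
    missing = [column for column in columns if column not in names]
    if missing:
        raise ValueError(f"columns not present in rows: {missing}")
    table.update(rows, columns=columns)
```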

class vastdb.table.TableStats(num_rows: int, size_in_bytes: int, is_external_rowid_alloc: bool = False, endpoints: Tuple[str, ...] = ())

Bases: object

Table-related information.

endpoints: Tuple[str, ...] = ()
is_external_rowid_alloc: bool = False
num_rows: int
size_in_bytes: int
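As a small example of using these fields, the sketch below derives an average row size from num_rows and size_in_bytes. StatsLike is a hypothetical stand-in that mirrors the documented fields so the snippet runs without a server. With the SDK, the value returned by Table.get_stats() would be passed instead.

```python
from typing import NamedTuple, Tuple


class StatsLike(NamedTuple):
    """Hypothetical stand-in mirroring the documented TableStats fields."""
    num_rows: int
    size_in_bytes: int
    is_external_rowid_alloc: bool = False
    endpoints: Tuple[str, ...] = ()


def average_row_size(stats):
    """Return the mean number of bytes per row, or 0.0 for an empty table."""
    if stats.num_rows == 0:
        return 0.0
    return stats.size_in_bytes / stats.num_rows


# Usage with the stand-in object.
example = StatsLike(num_rows=4, size_in_bytes=100)
print(average_row_size(example))
```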