vastdb.table

VAST Database table.

class vastdb.table.Projection(name: str, table: Table, handle: int, stats: TableStats)[source]

Bases: object

VAST semi-sorted projection.

property bucket: Return bucket.

columns() → pyarrow.Schema[source]: Return this projection’s columns as an Arrow schema.

drop() → None[source]: Drop this projection.

handle: int

name: str

rename(new_name: str) → None[source]: Rename this projection.

property schema: Return schema.

stats: TableStats

table: Table

property tx: Return transaction.

class vastdb.table.SelectSplitState(query_data_request, table: Table, split_id: int, config: QueryConfig)[source]

Bases: object

State of a specific query split execution.

property done: Return True if and only if pagination over this split is finished.

process_split(api: VastdbApi, record_batches_queue: Queue[pyarrow.RecordBatch], check_stop: Callable)[source]

Execute a sequence of QueryData requests, and queue the parsed RecordBatch objects.

Can be called repeatedly, to support resuming the query after a disconnection / retriable error.
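The resumable behaviour described above can be illustrated with a toy model. This is only a sketch of the general pattern, not the SDK's implementation: `ToySplitState`, `RetriableError`, `fetch`, and `run_with_retries` are hypothetical names, and the real `process_split` takes a `VastdbApi` rather than a fetch callable. The point is that the pagination cursor lives on the state object, so a repeated call continues from the first page that was not yet queued.

```python
from queue import Queue


class RetriableError(Exception):
    """Hypothetical stand-in for a disconnection or other retriable failure."""


class ToySplitState:
    """Toy model of a resumable split (not the SDK's SelectSplitState)."""

    def __init__(self, pages):
        self._pages = list(pages)
        self._next = 0  # pagination cursor, kept between calls

    @property
    def done(self):
        return self._next >= len(self._pages)

    def process_split(self, fetch, record_batches_queue, check_stop):
        while not self.done:
            check_stop()  # may raise to abort the query
            batch = fetch(self._pages[self._next])  # may raise RetriableError
            record_batches_queue.put(batch)
            self._next += 1  # advance only after the page was queued


def run_with_retries(state, fetch, record_batches_queue, max_attempts=3):
    """Call process_split again after each retriable failure."""
    for _ in range(max_attempts):
        try:
            state.process_split(fetch, record_batches_queue, check_stop=lambda: None)
            return
        except RetriableError:
            continue  # the cursor is preserved, so the next call resumes
    raise RuntimeError("split did not complete within max_attempts")
```

Because the cursor advances only after a page has been queued, a failed request is retried but pages already delivered are not fetched again.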

class vastdb.table.Table(name: str, schema: Schema, handle: int, _imports_table: bool)[source]

Bases: object

VAST Table.

add_column(new_column: pyarrow.Schema) → None[source]: Add a new column.

arrow_schema: pyarrow.Schema

property bucket: Return bucket.

columns() → pyarrow.Schema[source]: Return columns’ metadata.

create_imports_table(fail_if_exists=True) → Table[source]: Create imports table.

create_projection(projection_name: str, sorted_columns: List[str], unsorted_columns: List[str]) → Projection[source]: Create a new semi-sorted projection.

delete(rows: pyarrow.RecordBatch | pyarrow.Table) → None[source]

Delete a subset of rows in this table.

Row IDs are specified using a special field (named “$row_id” of uint64 type).

drop() → None[source]: Drop this table.

drop_column(column_to_drop: pyarrow.Schema) → None[source]: Drop an existing column.

get_stats() → TableStats[source]: Get the statistics of this table.

handle: int

import_files(files_to_import: Iterable[str], config: ImportConfig | None = None) → None[source]

Import a list of Parquet files into this table.

The files must reside on a VAST S3 server and be accessible using the current credentials.

import_partitioned_files(files_and_partitions: Dict[str, pyarrow.RecordBatch], config: ImportConfig | None = None) → None[source]

Import a list of partitioned Parquet files into this table.

The files must reside on a VAST S3 server and be accessible using the current credentials. Each file must have its own partition values defined as an Arrow RecordBatch.

imports_table() → Table | None[source]: Get the imports table of this table.

insert(rows: pyarrow.RecordBatch | pyarrow.Table)[source]: Insert a RecordBatch (or an Arrow Table) into this table.

insert_in_column_batches(rows: pyarrow.RecordBatch)[source]

Split the RecordBatch into column groups small enough to be inserted in a single RPC.

Insert the first MAX_COLUMN_IN_BATCH columns and get the row IDs. Then loop over the remaining columns and update them in groups of MAX_COLUMN_IN_BATCH.
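The column grouping described above can be sketched in plain Python. This only illustrates the chunking step: `column_groups` is a hypothetical helper, and `max_columns` stands in for MAX_COLUMN_IN_BATCH, whose actual value is not stated here.

```python
def column_groups(column_names, max_columns):
    """Split column names into consecutive groups of at most max_columns."""
    if max_columns < 1:
        raise ValueError("max_columns must be positive")
    names = list(column_names)
    return [names[i:i + max_columns] for i in range(0, len(names), max_columns)]


groups = column_groups(["a", "b", "c", "d", "e"], max_columns=2)
first, rest = groups[0], groups[1:]
# `first` would be inserted (yielding row IDs); each group in `rest`
# would then be written as an update keyed by those row IDs.
```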

name: str

property path: Return table’s path.

projection(name: str) → Projection[source]: Get a specific semi-sorted projection of this table.

projections(projection_name: str = '') → Iterable[Projection][source]

List all semi-sorted projections of this table if projection_name is empty.

Otherwise, list only the specified projection (if it exists).

rename(new_name: str) → None[source]: Rename this table.

rename_column(current_column_name: str, new_column_name: str) → None[source]: Rename an existing column.

schema: Schema

select(columns: List[str] | None = None, predicate: ibis.expr.types.BooleanColumn | ibis.common.deferred.Deferred = None, config: QueryConfig | None = None, *, internal_row_id: bool = False) → pyarrow.RecordBatchReader[source]

Execute a query over this table.

To read a subset of the columns, specify their names via columns argument. Otherwise, all columns will be read.

In order to apply a filter, a predicate can be specified. See https://github.com/vast-data/vastdb_sdk/blob/main/README.md#filters-and-projections for more details.

Query-execution configuration options can be specified via the optional config argument.
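A minimal usage sketch: since `select` returns a `pyarrow.RecordBatchReader`, results can be consumed batch by batch instead of being materialized at once. The helper name `count_matching_rows` is hypothetical, `table` is assumed to be a `vastdb.table.Table` inside an open transaction, and any `predicate` passed in is assumed to be an ibis expression built as described in the linked README.

```python
def count_matching_rows(table, columns=None, predicate=None):
    """Stream the query results and count rows without keeping them all."""
    reader = table.select(columns=columns, predicate=predicate)
    total = 0
    for batch in reader:  # each item is a pyarrow.RecordBatch
        total += batch.num_rows
    return total
```

When the full result is needed in memory, `reader.read_all()` returns it as a single `pyarrow.Table` instead.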

property stats: Fetch table’s statistics from server.

property tx: Return transaction.

update(rows: pyarrow.RecordBatch | pyarrow.Table, columns: List[str] | None = None) → None[source]

Update a subset of cells in this table.

Row IDs are specified using a special field (named “$row_id” of uint64 type). This function assumes that this special field is part of the rows argument.

A subset of columns to be updated can be specified via the columns argument.

class vastdb.table.TableStats(num_rows: int, size_in_bytes: int, is_external_rowid_alloc: bool = False, endpoints: Tuple[str, ...] = ())[source]

Bases: object

Table-related information.

endpoints: Tuple[str, ...] = ()

is_external_rowid_alloc: bool = False

num_rows: int

size_in_bytes: int