vastdb.table
VAST Database table.
- class vastdb.table.Projection(name: str, table: Table, handle: int, stats: TableStats)
Bases: object
VAST semi-sorted projection.
- property bucket
Return bucket.
- columns() -> pyarrow.Schema
Return this projection’s columns as an Arrow schema.
- handle: int
- name: str
- property schema
Return schema.
- stats: TableStats
- property tx
Return transaction.
- class vastdb.table.SelectSplitState(query_data_request, table: Table, split_id: int, config: QueryConfig)
Bases: object
State of a specific query split execution.
- property done
Return true iff the pagination is over.
- process_split(api: VastdbApi, record_batches_queue: Queue[pyarrow.RecordBatch], check_stop: Callable)
Execute a sequence of QueryData requests, and queue the parsed RecordBatch objects.
Can be called repeatedly, to support resuming the query after a disconnection / retriable error.
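The resume-after-error behaviour described above can be illustrated with a toy model. This is a minimal sketch using only the standard library, not the SDK implementation: `ToySplitState`, `flaky_fetch`, and `run_demo` are hypothetical names, and the pages are plain Python lists rather than Arrow RecordBatch objects. The assumed design is that the pagination cursor advances only after a page has been queued, so a repeated call continues where the failed one stopped.

```python
import queue
from typing import Callable, List


class ToySplitState:
    """Toy stand-in for a per-split pagination state (not the SDK class)."""

    def __init__(self, pages: List[List[int]]):
        self.pages = pages    # pretend server-side pages of this split
        self.next_page = 0    # pagination cursor; survives failed calls

    @property
    def done(self) -> bool:
        # True iff the pagination is over.
        return self.next_page >= len(self.pages)

    def process_split(self, fetch: Callable[[int], List[int]],
                      out: "queue.Queue[List[int]]",
                      check_stop: Callable[[], None]) -> None:
        while not self.done:
            check_stop()                   # lets the caller abort early
            batch = fetch(self.next_page)  # may raise on a disconnection
            out.put(batch)
            self.next_page += 1            # advance only after success


def run_demo():
    pages = [[1, 2], [3], [4, 5]]
    state = ToySplitState(pages)
    out: "queue.Queue[List[int]]" = queue.Queue()
    failures = {"left": 1}

    def flaky_fetch(i: int) -> List[int]:
        # Fail once on page 1 to simulate a retriable error.
        if i == 1 and failures["left"] > 0:
            failures["left"] -= 1
            raise ConnectionError("simulated disconnection")
        return pages[i]

    attempts = 0
    while not state.done:
        attempts += 1
        try:
            state.process_split(flaky_fetch, out, check_stop=lambda: None)
        except ConnectionError:
            continue  # call again; the saved cursor resumes the split

    batches = []
    while not out.empty():
        batches.append(out.get())
    return attempts, batches
```

Calling `run_demo()` takes two attempts and still queues every page exactly once, in order.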
- class vastdb.table.Table(name: str, schema: Schema, handle: int, _imports_table: bool)
Bases: object
VAST Table.
- add_column(new_column: pyarrow.Schema) -> None
Add a new column.
- arrow_schema: pyarrow.Schema
- property bucket
Return bucket.
- columns() -> pyarrow.Schema
Return columns’ metadata.
- create_projection(projection_name: str, sorted_columns: List[str], unsorted_columns: List[str]) -> Projection
Create a new semi-sorted projection.
- delete(rows: pyarrow.RecordBatch | pyarrow.Table) -> None
Delete a subset of rows in this table.
Row IDs are specified using a special field (named “$row_id” of uint64 type).
- drop_column(column_to_drop: pyarrow.Schema) -> None
Drop an existing column.
- get_stats() -> TableStats
Get the statistics of this table.
- handle: int
- import_files(files_to_import: Iterable[str], config: ImportConfig | None = None) -> None
Import a list of Parquet files into this table.
The files must be on a VAST S3 server and be accessible using the current credentials.
- import_partitioned_files(files_and_partitions: Dict[str, pyarrow.RecordBatch], config: ImportConfig | None = None) -> None
Import a list of partitioned Parquet files into this table.
The files must be on a VAST S3 server and be accessible using the current credentials. Each file must have its own partition values defined as an Arrow RecordBatch.
- insert(rows: pyarrow.RecordBatch | pyarrow.Table)
Insert a RecordBatch or Table into this table.
- insert_in_column_batches(rows: pyarrow.RecordBatch)
Split the RecordBatch into column groups small enough to be inserted in a single RPC.
Insert the first MAX_COLUMN_IN_BATCH columns and get the row IDs. Then loop over the remaining columns and update them in groups of MAX_COLUMN_IN_BATCH.
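The column-splitting step can be sketched in plain Python. This is an illustration only: `plan_column_batches` is a hypothetical helper, it works on column names rather than on Arrow data, and the value 3 for `MAX_COLUMN_IN_BATCH` is an assumption picked to keep the demo small (the SDK's actual limit is not stated here).

```python
from typing import List, Tuple

# Hypothetical limit, chosen small for the demo; not the SDK's real value.
MAX_COLUMN_IN_BATCH = 3


def plan_column_batches(
    column_names: List[str],
    max_columns: int = MAX_COLUMN_IN_BATCH,
) -> Tuple[List[str], List[List[str]]]:
    """Return (columns for the initial insert, column groups for later updates)."""
    if max_columns < 1:
        raise ValueError("max_columns must be positive")
    # The first group is inserted and yields the row IDs.
    first = column_names[:max_columns]
    # Every later group is written as an update keyed by those row IDs.
    rest = [
        column_names[i:i + max_columns]
        for i in range(max_columns, len(column_names), max_columns)
    ]
    return first, rest
```

With seven columns and a limit of three, the plan is one insert of three columns followed by two updates (three columns, then one). A batch that already fits within the limit produces no update groups.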
- name: str
- property path
Return table’s path.
- projection(name: str) -> Projection
Get a specific semi-sorted projection of this table.
- projections(projection_name: str = '') -> Iterable[Projection]
List all semi-sorted projections of this table if projection_name is empty.
Otherwise, list only the specified projection (if it exists).
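The empty-name convention can be shown with a small stand-in. This is a sketch of the described semantics, not the SDK code: `list_projections` is a hypothetical function over plain name strings, and exact-name matching is an assumption.

```python
from typing import Iterable, List


def list_projections(all_names: Iterable[str], projection_name: str = "") -> List[str]:
    """Toy model: an empty name lists everything, otherwise only the match."""
    if not projection_name:
        return list(all_names)
    # Assumption: the name must match exactly; a missing name yields nothing.
    return [name for name in all_names if name == projection_name]
```

Under these assumptions, asking for a projection that does not exist returns an empty list rather than raising an error.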
- rename_column(current_column_name: str, new_column_name: str) -> None
Rename an existing column.
- select(columns: List[str] | None = None, predicate: ibis.expr.types.BooleanColumn | ibis.common.deferred.Deferred = None, config: QueryConfig | None = None, *, internal_row_id: bool = False) -> pyarrow.RecordBatchReader
Execute a query over this table.
To read a subset of the columns, specify their names via columns argument. Otherwise, all columns will be read.
In order to apply a filter, a predicate can be specified. See https://github.com/vast-data/vastdb_sdk/blob/main/README.md#filters-and-projections for more details.
Query-execution configuration options can be specified via the optional config argument.
- property stats
Fetch table’s statistics from server.
- property tx
Return transaction.
- update(rows: pyarrow.RecordBatch | pyarrow.Table, columns: List[str] | None = None) -> None
Update a subset of cells in this table.
Row IDs are specified using a special field (named “$row_id” of uint64 type). This function assumes that this special field is part of the rows argument.
A subset of columns to be updated can be specified via the columns argument.
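The interaction between the “$row_id” field and the columns argument can be modelled with an in-memory toy. This is a sketch under stated assumptions, not the SDK: `apply_update` is a hypothetical function, the table is a dict keyed by row ID, and rows are plain dicts instead of Arrow data. It assumes that omitting columns means every non-row-ID field in the rows is written.

```python
from typing import Dict, List, Optional

ROW_ID = "$row_id"  # name of the special row ID field described above


def apply_update(
    table: Dict[int, Dict[str, object]],
    rows: List[Dict[str, object]],
    columns: Optional[List[str]] = None,
) -> None:
    """Toy model: overwrite selected cells of the rows addressed by $row_id."""
    for row in rows:
        if ROW_ID not in row:
            raise KeyError(f"each row must carry a {ROW_ID} field")
        target = table[row[ROW_ID]]
        # Assumption: without an explicit column list, update every field
        # supplied in the row except the row ID itself.
        names = columns if columns is not None else [k for k in row if k != ROW_ID]
        for name in names:
            target[name] = row[name]
```

Passing `columns=["a"]` with a row that also carries a value for `b` changes only `a`; the extra field is ignored and untouched rows keep their contents.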