class Google::Cloud::Bigquery::LoadJob

# LoadJob

A {Job} subclass representing a load operation that may be performed on a {Table}. A LoadJob instance is created when you call {Table#load_job}.

@see cloud.google.com/bigquery/loading-data Loading Data Into BigQuery

@see cloud.google.com/bigquery/docs/reference/v2/jobs Jobs API reference

@example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

gcs_uri = "gs://my-bucket/file-name.csv"
load_job = dataset.load_job "my_new_table", gcs_uri do |schema|
  schema.string "first_name", mode: :required
  schema.record "cities_lived", mode: :repeated do |nested_schema|
    nested_schema.string "place", mode: :required
    nested_schema.integer "number_of_years", mode: :required
  end
end

load_job.wait_until_done!
load_job.done? #=> true

Public Instance Methods

allow_jagged_rows?()

Checks if the load operation accepts rows that are missing trailing optional columns. The missing values are treated as nulls. If `false`, records with missing trailing columns are treated as bad records, and if there are too many bad records, an error is returned. The default value is `false`. Only applicable to CSV, ignored for other formats.

@return [Boolean] `true` when jagged rows are allowed, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 252
def allow_jagged_rows?
  val = @gapi.configuration.load.allow_jagged_rows
  val = false if val.nil?
  val
end
autodetect?()

Checks if BigQuery should automatically infer the options and schema for CSV and JSON sources. The default is `false`.

@return [Boolean] `true` when autodetect is enabled, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 183
def autodetect?
  val = @gapi.configuration.load.autodetect
  val = false if val.nil?
  val
end
backup?()

Checks if the source data is a Google Cloud Datastore backup.

@return [Boolean] `true` when the source format is `DATASTORE_BACKUP`, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 218
def backup?
  @gapi.configuration.load.source_format == "DATASTORE_BACKUP"
end
clustering?()

Checks if the destination table will be clustered.

See {LoadJob::Updater#clustering_fields=}, {Table#clustering_fields} and {Table#clustering_fields=}.

@see cloud.google.com/bigquery/docs/clustered-tables Introduction to clustered tables

@see cloud.google.com/bigquery/docs/creating-clustered-tables Creating and using clustered tables

@return [Boolean] `true` when the table will be clustered, or `false` otherwise.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 619
def clustering?
  !@gapi.configuration.load.clustering.nil?
end
clustering_fields()

One or more fields on which the destination table should be clustered. When specified with time-based partitioning, data in the table is first partitioned and subsequently clustered. The order of the returned fields determines the sort order of the data.

BigQuery supports clustering for both partitioned and non-partitioned tables.

See {LoadJob::Updater#clustering_fields=}, {Table#clustering_fields} and {Table#clustering_fields=}.

@see cloud.google.com/bigquery/docs/clustered-tables Introduction to clustered tables

@see cloud.google.com/bigquery/docs/creating-clustered-tables Creating and using clustered tables

@return [Array<String>, nil] The clustering fields, or `nil` if the destination table will not be clustered.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 645
def clustering_fields
  @gapi.configuration.load.clustering.fields if clustering?
end
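The sort order implied by the clustering fields can be pictured in plain Ruby (an illustration only, not the gem API; BigQuery performs this ordering server-side). Rows are co-located by the first field, then by the second within each group, and so on:

```ruby
# Plain-Ruby sketch: clustering on ["last_name", "first_name"] orders
# rows by last_name first, then first_name within each last_name group.
rows = [
  { last_name: "Doe", first_name: "Jane" },
  { last_name: "Ada", first_name: "Lee"  },
  { last_name: "Doe", first_name: "Alex" }
]

sorted = rows.sort_by { |row| [row[:last_name], row[:first_name]] }
sorted.map { |row| row[:first_name] }
#=> ["Lee", "Alex", "Jane"]
```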
csv?()

Checks if the format of the source data is CSV. The default is `true`.

@return [Boolean] `true` when the source format is `CSV`, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 206
def csv?
  val = @gapi.configuration.load.source_format
  return true if val.nil?
  val == "CSV"
end
delimiter()

The delimiter used between fields in the source data. The default is a comma (`,`).

@return [String] A string containing the character, such as `","`.

# File lib/google/cloud/bigquery/load_job.rb, line 79
def delimiter
  @gapi.configuration.load.field_delimiter || ","
end
destination()

The table into which the operation loads data. This is the table on which {Table#load_job} was invoked.

@return [Table] A table instance.

# File lib/google/cloud/bigquery/load_job.rb, line 67
def destination
  table = @gapi.configuration.load.destination_table
  return nil unless table
  retrieve_table table.project_id, table.dataset_id, table.table_id
end
encryption()

The encryption configuration of the destination table.

@return [Google::Cloud::BigQuery::EncryptionConfiguration] Custom encryption configuration (e.g., Cloud KMS keys).

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 349
def encryption
  EncryptionConfiguration.from_gapi(
    @gapi.configuration.load.destination_encryption_configuration
  )
end
hive_partitioning?()

Checks if hive partitioning options are set.

@see cloud.google.com/bigquery/docs/hive-partitioned-loads-gcs Loading externally partitioned data

@return [Boolean] `true` when hive partitioning options are set, or `false` otherwise.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 376
def hive_partitioning?
  !@gapi.configuration.load.hive_partitioning_options.nil?
end
hive_partitioning_mode()

The mode of hive partitioning to use when reading data. The following modes are supported:

1. `AUTO`: automatically infer partition key name(s) and type(s).
2. `STRINGS`: automatically infer partition key name(s). All types are interpreted as strings.
3. `CUSTOM`: partition key schema is encoded in the source URI prefix.

@see cloud.google.com/bigquery/docs/hive-partitioned-loads-gcs Loading externally partitioned data

@return [String, nil] The mode of hive partitioning, or `nil` if not set.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 393
def hive_partitioning_mode
  @gapi.configuration.load.hive_partitioning_options.mode if hive_partitioning?
end
hive_partitioning_source_uri_prefix()

The common prefix for all source URIs when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:

    gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
    gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro

When hive partitioning is requested with either `AUTO` or `STRINGS` mode, the common prefix can be either of `gs://bucket/path_to_table` or `gs://bucket/path_to_table/` (trailing slash does not matter).

@see cloud.google.com/bigquery/docs/hive-partitioned-loads-gcs Loading externally partitioned data

@return [String, nil] The common prefix for all source URIs, or `nil` if not set.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 415
def hive_partitioning_source_uri_prefix
  @gapi.configuration.load.hive_partitioning_options.source_uri_prefix if hive_partitioning?
end
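How the prefix relates to the encoded partition keys can be sketched in plain Ruby (`partition_keys` is a hypothetical helper, not part of this gem): everything after the prefix, except the file name, is a `key=value` partition segment.

```ruby
# Hypothetical helper: given the configured source URI prefix, recover
# the partition keys encoded in a file's path. The trailing slash on
# the prefix is normalized away, mirroring the API's behavior.
def partition_keys(uri, prefix)
  relative = uri.delete_prefix(prefix.chomp("/")).delete_prefix("/")
  relative.split("/").each_with_object({}) do |segment, keys|
    key, _, value = segment.partition("=")
    keys[key] = value unless value.empty?  # skips the file name segment
  end
end

partition_keys "gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro",
               "gs://bucket/path_to_table"
#=> {"dt"=>"2019-01-01", "country"=>"BR", "id"=>"7"}
```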
ignore_unknown_values?()

Checks if the load operation allows extra values that are not represented in the table schema. If `true`, the extra values are ignored. If `false`, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned. The default is `false`.

@return [Boolean] `true` when unknown values are ignored, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 268
def ignore_unknown_values?
  val = @gapi.configuration.load.ignore_unknown_values
  val = false if val.nil?
  val
end
input_file_bytes()

The number of bytes of source data in the load job.

@return [Integer] The number of bytes.

# File lib/google/cloud/bigquery/load_job.rb, line 324
def input_file_bytes
  Integer @gapi.statistics.load.input_file_bytes
rescue StandardError
  nil
end
input_files()

The number of source data files in the load job.

@return [Integer] The number of source files.

# File lib/google/cloud/bigquery/load_job.rb, line 313
def input_files
  Integer @gapi.statistics.load.input_files
rescue StandardError
  nil
end
iso8859_1?()

Checks if the character encoding of the data is ISO-8859-1.

@return [Boolean] `true` when the character encoding is ISO-8859-1, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 114
def iso8859_1?
  @gapi.configuration.load.encoding == "ISO-8859-1"
end
json?()

Checks if the format of the source data is [newline-delimited JSON](jsonlines.org/). The default is `false`.

@return [Boolean] `true` when the source format is `NEWLINE_DELIMITED_JSON`, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 196
def json?
  @gapi.configuration.load.source_format == "NEWLINE_DELIMITED_JSON"
end
max_bad_records()

The maximum number of bad records that the load operation can ignore. If the number of bad records exceeds this value, an error is returned. The default value is `0`, which requires that all records be valid.

@return [Integer] The maximum number of bad records.

# File lib/google/cloud/bigquery/load_job.rb, line 140
def max_bad_records
  val = @gapi.configuration.load.max_bad_records
  val = 0 if val.nil?
  val
end
null_marker()

Specifies a string that represents a null value in a CSV file. For example, if you specify `\N`, BigQuery interprets `\N` as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

@return [String] A string representing a null value in a CSV file.

# File lib/google/cloud/bigquery/load_job.rb, line 157
def null_marker
  val = @gapi.configuration.load.null_marker
  val = "" if val.nil?
  val
end
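The effect of a custom marker can be sketched in plain Ruby (`interpret_row` is an illustration only; BigQuery applies this interpretation server-side during the load):

```ruby
require "csv"

# Fields equal to the null marker load as NULL. With the default marker
# (the empty string), empty fields load as NULL for columns of any type
# other than STRING and BYTE.
def interpret_row(line, null_marker: "")
  CSV.parse_line(line).map { |field| field.to_s == null_marker ? nil : field }
end

interpret_row '7,\N,BR', null_marker: '\N'  #=> ["7", nil, "BR"]
interpret_row "7,,BR"                       #=> ["7", nil, "BR"]
```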
orc?()

Checks if the source format is ORC.

@return [Boolean] `true` when the source format is `ORC`, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 228
def orc?
  @gapi.configuration.load.source_format == "ORC"
end
output_bytes()

The number of bytes that have been loaded into the table. While an import job is in the running state, this value may change.

@return [Integer] The number of bytes that have been loaded.

# File lib/google/cloud/bigquery/load_job.rb, line 361
def output_bytes
  Integer @gapi.statistics.load.output_bytes
rescue StandardError
  nil
end
output_rows()

The number of rows that have been loaded into the table. While an import job is in the running state, this value may change.

@return [Integer] The number of rows that have been loaded.

# File lib/google/cloud/bigquery/load_job.rb, line 336
def output_rows
  Integer @gapi.statistics.load.output_rows
rescue StandardError
  nil
end
parquet?()

Checks if the source format is Parquet.

@return [Boolean] `true` when the source format is `PARQUET`, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 238
def parquet?
  @gapi.configuration.load.source_format == "PARQUET"
end
parquet_enable_list_inference?()

Indicates whether to use schema inference specifically for Parquet `LIST` logical type.

@see cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet Loading Parquet data from Cloud Storage

@return [Boolean, nil] The `enable_list_inference` value in Parquet options, or `nil` if Parquet options are not set.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 444
def parquet_enable_list_inference?
  @gapi.configuration.load.parquet_options.enable_list_inference if parquet_options?
end
parquet_enum_as_string?()

Indicates whether to infer Parquet `ENUM` logical type as `STRING` instead of `BYTES` by default.

@see cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet Loading Parquet data from Cloud Storage

@return [Boolean, nil] The `enum_as_string` value in Parquet options, or `nil` if Parquet options are not set.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 458
def parquet_enum_as_string?
  @gapi.configuration.load.parquet_options.enum_as_string if parquet_options?
end
parquet_options?()

Checks if Parquet options are set.

@see cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet Loading Parquet data from Cloud Storage

@return [Boolean] `true` when Parquet options are set, or `false` otherwise.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 429
def parquet_options?
  !@gapi.configuration.load.parquet_options.nil?
end
quote()

The value that is used to quote data sections in a CSV file. The default value is a double-quote (`"`). If your data does not contain quoted sections, the value should be an empty string. If your data contains quoted newline characters, {#quoted_newlines?} should return `true`.

@return [String] A string containing the character, such as `"`.

# File lib/google/cloud/bigquery/load_job.rb, line 127
def quote
  val = @gapi.configuration.load.quote
  val = "\"" if val.nil?
  val
end
quoted_newlines?()

Checks if quoted data sections may contain newline characters in a CSV file. The default is `false`.

@return [Boolean] `true` when quoted newlines are allowed, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 170
def quoted_newlines?
  val = @gapi.configuration.load.allow_quoted_newlines
  val = false if val.nil?
  val
end
range_partitioning?()

Checks if the destination table will be range partitioned. See [Creating and using integer range partitioned tables](cloud.google.com/bigquery/docs/creating-integer-range-partitions).

@return [Boolean] `true` when the table is range partitioned, or `false` otherwise.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 470
def range_partitioning?
  !@gapi.configuration.load.range_partitioning.nil?
end
range_partitioning_end()

The end of range partitioning, exclusive. See [Creating and using integer range partitioned tables](cloud.google.com/bigquery/docs/creating-integer-range-partitions).

@return [Integer, nil] The end of range partitioning, exclusive, or `nil` if not range partitioned.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 522
def range_partitioning_end
  @gapi.configuration.load.range_partitioning.range.end if range_partitioning?
end
range_partitioning_field()

The field on which the destination table will be range partitioned, if any. The field must be a top-level `NULLABLE/REQUIRED` field. The only supported type is `INTEGER/INT64`. See [Creating and using integer range partitioned tables](cloud.google.com/bigquery/docs/creating-integer-range-partitions).

@return [String, nil] The partition field, if a field was configured, or `nil` if not range partitioned.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 484
def range_partitioning_field
  @gapi.configuration.load.range_partitioning.field if range_partitioning?
end
range_partitioning_interval()

The width of each interval. See [Creating and using integer range partitioned tables](cloud.google.com/bigquery/docs/creating-integer-range-partitions).

@return [Integer, nil] The width of each interval, for data in range partitions, or `nil` if not range partitioned.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 509
def range_partitioning_interval
  return nil unless range_partitioning?
  @gapi.configuration.load.range_partitioning.range.interval
end
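The relationship between start, end, and interval can be sketched in plain Ruby (`partition_lower_bound` is a hypothetical helper, not part of this gem; BigQuery routes out-of-range values to a special unpartitioned partition):

```ruby
# With start 0, end (exclusive) 100, and interval 10, values fall into
# ten partitions whose lower bounds are 0, 10, ..., 90. Values outside
# [start, end) get no range partition.
def partition_lower_bound(value, start:, end_exclusive:, interval:)
  return nil if value < start || value >= end_exclusive
  start + ((value - start) / interval) * interval
end

partition_lower_bound 42,  start: 0, end_exclusive: 100, interval: 10  #=> 40
partition_lower_bound 100, start: 0, end_exclusive: 100, interval: 10  #=> nil
```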
range_partitioning_start()

The start of range partitioning, inclusive. See [Creating and using integer range partitioned tables](cloud.google.com/bigquery/docs/creating-integer-range-partitions).

@return [Integer, nil] The start of range partitioning, inclusive, or `nil` if not range partitioned.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 496
def range_partitioning_start
  @gapi.configuration.load.range_partitioning.range.start if range_partitioning?
end
schema()

The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.

The returned object is frozen and changes are not allowed. Use {Table#schema} to update the schema.

@return [Schema, nil] A schema object, or `nil`.

# File lib/google/cloud/bigquery/load_job.rb, line 284
def schema
  Schema.from_gapi(@gapi.configuration.load.schema).freeze
end
schema_update_options()

Allows the schema of the destination table to be updated as a side effect of the load job if a schema is autodetected or supplied in the job configuration. Schema update options are supported in two cases: when write disposition is `WRITE_APPEND`; when write disposition is `WRITE_TRUNCATE` and the destination table is a partition of a table, specified by partition decorators. For normal tables, `WRITE_TRUNCATE` will always overwrite the schema. One or more of the following values are specified:

  • `ALLOW_FIELD_ADDITION`: allow adding a nullable field to the schema.

  • `ALLOW_FIELD_RELAXATION`: allow relaxing a required field in the original schema to nullable.

@return [Array<String>] An array of strings.

# File lib/google/cloud/bigquery/load_job.rb, line 304
def schema_update_options
  Array @gapi.configuration.load.schema_update_options
end
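Since only the two documented values are accepted, a caller might sanity-check an options array before configuring a job; a plain-Ruby sketch (not part of the gem):

```ruby
# The two values documented above, as a frozen constant.
SCHEMA_UPDATE_OPTIONS = ["ALLOW_FIELD_ADDITION", "ALLOW_FIELD_RELAXATION"].freeze

# Hypothetical validation helper.
def valid_schema_update_options?(options)
  Array(options).all? { |opt| SCHEMA_UPDATE_OPTIONS.include?(opt) }
end

valid_schema_update_options? ["ALLOW_FIELD_ADDITION"]                #=> true
valid_schema_update_options? ["ALLOW_FIELD_ADDITION", "DROP_COLUMN"] #=> false
```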
skip_leading_rows()

The number of rows at the top of a CSV file that BigQuery will skip when loading the data. The default value is `0`. This property is useful if you have header rows in the file that should be skipped.

@return [Integer] The number of header rows at the top of a CSV file to skip.
# File lib/google/cloud/bigquery/load_job.rb, line 91
def skip_leading_rows
  @gapi.configuration.load.skip_leading_rows || 0
end
sources()

The URI or URIs representing the Google Cloud Storage files from which the operation loads data.

# File lib/google/cloud/bigquery/load_job.rb, line 57
def sources
  Array @gapi.configuration.load.source_uris
end
time_partitioning?()

Checks if the destination table will be time partitioned. See [Partitioned Tables](cloud.google.com/bigquery/docs/partitioned-tables).

@return [Boolean] `true` when the table will be time-partitioned, or `false` otherwise.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 535
def time_partitioning?
  !@gapi.configuration.load.time_partitioning.nil?
end
time_partitioning_expiration()

The expiration for the destination table time partitions, if any, in seconds. See [Partitioned Tables](cloud.google.com/bigquery/docs/partitioned-tables).

@return [Integer, nil] The expiration time, in seconds, for data in time partitions, or `nil` if not present.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 579
def time_partitioning_expiration
  return nil unless time_partitioning?
  return nil if @gapi.configuration.load.time_partitioning.expiration_ms.nil?

  @gapi.configuration.load.time_partitioning.expiration_ms / 1_000
end
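As the implementation above shows, the underlying API field stores the expiration in milliseconds and this method converts it with integer division; a quick arithmetic sketch:

```ruby
# One week of partition expiration, as stored by the API (milliseconds)
# and as returned by this method (seconds). Integer division floors any
# sub-second remainder.
expiration_ms = 7 * 24 * 60 * 60 * 1_000  # 604_800_000
expiration_ms / 1_000                     #=> 604800
1_500 / 1_000                             #=> 1
```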
time_partitioning_field()

The field on which the destination table will be time partitioned, if any. If not set, the destination table will be time partitioned by pseudo column `_PARTITIONTIME`; if set, the table will be time partitioned by this field. See [Partitioned Tables](cloud.google.com/bigquery/docs/partitioned-tables).

@return [String, nil] The time partition field, if a field was configured, or `nil` if not time partitioned or not set (partitioned by pseudo column `_PARTITIONTIME`).

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 565
def time_partitioning_field
  @gapi.configuration.load.time_partitioning.field if time_partitioning?
end
time_partitioning_require_filter?()

If set to `true`, queries over the destination table will be required to include a time partition filter that can be used for partition elimination. See [Partitioned Tables](cloud.google.com/bigquery/docs/partitioned-tables).

@return [Boolean] `true` when a time partition filter will be required, or `false` otherwise.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 597
def time_partitioning_require_filter?
  tp = @gapi.configuration.load.time_partitioning
  return false if tp.nil? || tp.require_partition_filter.nil?
  tp.require_partition_filter
end
time_partitioning_type()

The period for which the destination table will be time partitioned, if any. See [Partitioned Tables](cloud.google.com/bigquery/docs/partitioned-tables).

@return [String, nil] The time partition type. The supported types are `DAY`, `HOUR`, `MONTH`, and `YEAR`, which will generate one partition per day, hour, month, and year, respectively; or `nil` if not present.

@!group Attributes

# File lib/google/cloud/bigquery/load_job.rb, line 549
def time_partitioning_type
  @gapi.configuration.load.time_partitioning.type if time_partitioning?
end
utf8?()

Checks if the character encoding of the data is UTF-8. This is the default.

@return [Boolean] `true` when the character encoding is UTF-8, `false` otherwise.
# File lib/google/cloud/bigquery/load_job.rb, line 102
def utf8?
  val = @gapi.configuration.load.encoding
  return true if val.nil?
  val == "UTF-8"
end