tally.dataset.DataSet.hmerge

DataSet.hmerge(**kwargs)

Merge Quantipy datasets together by appending rows. This function merges two Quantipy datasets together, updating variables that exist in the left dataset and appending others. New variables will be appended in the order indicated by the ‘data file’ set if found, otherwise they will be appended in alphanumeric order. This merge happens vertically (row-wise).
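The ordering rule for newly appended variables can be illustrated with a small pure-Python sketch. This is not the library's implementation: the helper name `order_new_columns` and its arguments are hypothetical, "alphanumeric order" is approximated by a plain string sort, and appending any names missing from the set in sorted order is an assumption.

```python
def order_new_columns(left_columns, right_columns, data_file_set=None):
    """Return the columns found only on the right, in the order they would be appended.

    Hypothetical helper for illustration only; not part of tally or Quantipy.
    """
    existing = set(left_columns)
    new = [name for name in right_columns if name not in existing]
    if data_file_set is None:
        # No 'data file' set available: fall back to a plain string sort.
        return sorted(new)
    wanted = set(new)
    # Follow the order given by the set, skipping names that are not new.
    ordered = [name for name in data_file_set if name in wanted]
    # Assumption: new names that the set does not list go last, sorted.
    leftovers = sorted(wanted - set(ordered))
    return ordered + leftovers


without_set = order_new_columns(["id", "q1"], ["id", "q3", "q2"])
with_set = order_new_columns(
    ["id", "q1"], ["id", "q3", "q2"], data_file_set=["id", "q3", "q2"]
)
```

Here `without_set` lists the new names in sorted order, while `with_set` keeps the order given by the set.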

Parameters
  • dataset (object) – A quantipy.DataSet instance. Tests whether all variables in the provided dataset are also in self and compares their metadata definitions.

  • on (str, default=None) – The column to use to identify unique rows in both datasets.

  • left_on (str, default=None) – The column to use to identify unique rows in the left dataset.

  • right_on (str, default=None) – The column to use to identify unique rows in the right dataset.

  • row_id_name (str, default=None) – The named column will be filled with the ids indicated for each dataset, as per left_id/right_id/row_ids. If meta for the named column doesn’t already exist, a new column definition will be added and assigned a reductive-appropriate type.

  • left_id (str, int, float, default=None) – Where the row_id_name column is not already populated for dataset_left, it will be filled with this value.

  • right_id (str, int, float, default=None) – Where the row_id_name column is not already populated for dataset_right, it will be filled with this value.

  • row_ids (array of (str, int, float), default=None) – When datasets has been used, this list provides the row ids that will be populated in the row_id_name column for each of those datasets, respectively.

  • overwrite_text (bool, default=False) – If True, text_keys in the left meta that also exist in right meta will be overwritten instead of ignored.

  • from_set (str, default=None) – Use a set defined in the right meta to control which columns are merged from the right dataset.

  • uniquify_key (str, default=None) – An int-like column name found in all the passed DataSet objects that will be protected from having duplicates. The original version of the column will be kept under its name prefixed with ‘original’.

  • reset_index (bool, default=True) – If True, pandas.DataFrame.reindex() will be applied to the merged dataframe.
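
The interaction between row_id_name, left_id and right_id can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: rows are modelled as dicts, "not already populated" is taken to mean missing or None, and the helper `fill_row_ids` is hypothetical.

```python
def fill_row_ids(rows, row_id_name, fill_value):
    """Return copies of rows with row_id_name set wherever it is missing or None.

    Hypothetical helper for illustration only; not part of tally or Quantipy.
    """
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        if row.get(row_id_name) is None:
            row[row_id_name] = fill_value
        filled.append(row)
    return filled


# The first left row already has a 'wave' value (0), so it is kept as-is.
left = [{"id": 10, "wave": 0}, {"id": 11}]
right = [{"id": 12}, {"id": 13, "wave": None}]

# Plays the role of row_id_name="wave", left_id=1, right_id=2, then stacks the rows.
stacked = fill_row_ids(left, "wave", 1) + fill_row_ids(right, "wave", 2)
waves = [row["wave"] for row in stacked]
```

Only unpopulated entries receive the dataset's id; values that are already present, including falsy ones such as 0, are preserved.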