MadLib._internal.featurization.featurize
- MadLib._internal.featurization.featurize(features: List[Callable], A, B, candidates, output_col: str = 'features', fill_na: float = 0.0) → DataFrame
Applies the features to the record pairs in candidates and returns the resulting feature vectors.
- Parameters:
features (List[Callable]) – a list of initialized feature objects for columns in A, B
A (Union[pd.DataFrame, SparkDataFrame]) – the records of table A
B (Union[pd.DataFrame, SparkDataFrame]) – the records of table B
candidates (Union[pd.DataFrame, SparkDataFrame]) – id pairs of A and B that are potential matches
output_col (str) – the name of the column for the resulting feature vectors, default 'features'
fill_na (float) – value to fill in for missing data, default 0.0
- Returns:
DataFrame of feature vectors with the following schema: (id2, id1, fv, other columns from candidates), where fv is the feature-vector column named by output_col
- Return type:
pandas DataFrame
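
The parameters above can be illustrated with a minimal pandas-only sketch. This is not MadLib's implementation and does not call MadLib: featurize_sketch, exact_name, and price_diff are hypothetical names. The sketch assumes that A and B are indexed by record id, that id1 refers to A and id2 to B, and that each feature is a callable taking one record from each table and returning a number.

```python
from typing import Callable, List

import pandas as pd


def featurize_sketch(
    features: List[Callable],
    A: pd.DataFrame,
    B: pd.DataFrame,
    candidates: pd.DataFrame,
    output_col: str = "features",
    fill_na: float = 0.0,
) -> pd.DataFrame:
    """Hypothetical stand-in that mirrors the documented parameters."""
    # Assumption: 'id1' indexes into A and 'id2' indexes into B.
    vectors = []
    for id1, id2 in zip(candidates["id1"], candidates["id2"]):
        rec_a, rec_b = A.loc[id1], B.loc[id2]
        row = []
        for feature in features:
            value = feature(rec_a, rec_b)
            # Replace missing feature values with fill_na.
            row.append(fill_na if pd.isna(value) else float(value))
        vectors.append(row)

    out = candidates.copy()
    # One feature vector (a list of floats) per candidate pair.
    out[output_col] = pd.Series(vectors, index=out.index)
    return out


# Hypothetical feature callables for illustration.
def exact_name(a, b):
    return float(a["name"] == b["name"])


def price_diff(a, b):
    # Yields NaN when either price is missing.
    return abs(a["price"] - b["price"])


A = pd.DataFrame({"name": ["apple", "pear"], "price": [1.0, 2.0]}, index=[10, 11])
B = pd.DataFrame({"name": ["apple", "plum"], "price": [1.5, None]}, index=[20, 21])
candidates = pd.DataFrame({"id2": [20, 21], "id1": [10, 11]})

result = featurize_sketch([exact_name, price_diff], A, B, candidates)
print(result)
```

In this sketch the second pair has a missing price in B, so price_diff produces NaN and the corresponding vector entry is replaced by fill_na (0.0). The other columns of candidates are carried through unchanged.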