GBDT (Gradient Boosted Decision Tree) models require you or your ML/DS teams to manually prepare the data; produce appropriate transformations, joins, feature embeddings, encodings, and aggregates; tune hyperparameters; and choose a loss function, data split strategy, and training & evaluation protocol for every single problem you want to solve. Any additional data source, schema change, or even a slight modification to the original problem statement requires further maintenance work to keep those pipelines up to date. BaseModel eliminates these problems.
GBDTs are discriminative models that work well on well-structured problems, for example classification based on a handful of attributes (age, height, total spending). You can think of this as a 0-dimensional problem: the input to a GBDT is a flat file with a single target column for each row. Sequence modeling, as in large language models, can be thought of as a 1-dimensional problem. It is not a good fit for GBDTs, because the input is a sequence of observations (e.g., text tokens), and such data cannot easily be represented as a flat file.
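As a minimal sketch of that 0-dimensional shape, the snippet below builds a flat table with the attribute columns mentioned above and a single target column, then fits a GBDT on it. The values, the `churned` target name, and the choice of scikit-learn are illustrative assumptions, not part of any specific pipeline:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# One row per customer, fixed attribute columns, a single target column.
# Attribute names mirror the examples above; the values are made up.
df = pd.DataFrame({
    "age":            [34, 52, 23, 41],
    "height":         [170, 182, 165, 175],
    "total_spending": [1200.0, 310.5, 89.9, 4500.0],
    "churned":        [0, 1, 1, 0],  # target: one label per row
})

model = GradientBoostingClassifier()
model.fit(df[["age", "height", "total_spending"]], df["churned"])
```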
Generalized behavioral modeling can be thought of as an N-dimensional problem. The data is multi-source and multi-modal, and exhibits graph and hypergraph interaction structures in addition to temporal dynamics. Translating rich, diverse, temporal hypergraph interaction data into a flat file is a formidable challenge: a product catalog may contain millions of SKUs, a website may have hundreds of thousands of URLs, and millions of telco subscribers may interact with each other, forming user-item, user-user, and item-item graphs, or hypergraphs with tens of millions of nodes and billions of edges.
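For contrast, here is a hypothetical event log of the kind described above. Every column name and value is invented for illustration; the point is that each row is a timestamped interaction from a different source, and the entities involved imply graph edges that no fixed set of flat-file columns can preserve:

```python
import pandas as pd

# Each row is a timestamped interaction between entities, drawn from a
# different source. Flattening this into one row per user with fixed
# columns discards both the graph and the temporal structure.
events = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-03 10:12", "2024-01-03 10:15", "2024-01-05 18:02"]),
    "source":    ["web", "transactions", "telco"],
    "user_id":   ["u1", "u1", "u2"],
    "target_id": ["url_48210", "sku_993184", "u1"],  # a URL, a SKU, or another user
    "event":     ["page_view", "purchase", "call"],
})
# Edges implied by the log: user-item (u1 -> url_48210, u1 -> sku_993184)
# and user-user (u2 -> u1), each with its own timestamp and modality.
```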
Manual feature aggregation treats the problem as if the richness of this information could be described with a few dozen coarse-grained features. It cannot, not without incurring a significant loss of information. Manually created aggregate features depend on the data scientist's effort, diligence, and creativity. Features suitable for, say, churn prediction for a debit card will be vastly different from features for propensity prediction for FX brokerage services. Features for predicting the re-purchase time of baked goods will be quite different from features for predicting expected future spending on a brand of detergents, or the probability of a first-time purchase of a never-before-bought product type.

Such manual, per-use-case feature creation not only risks missing crucial information about the customer across all available data sources; it also carries an exceedingly high maintenance cost. Every change in the underlying source schemas, every additional field or data source, requires work to keep the data engineering pipelines up to date and operational. If neglected, model quality can silently deteriorate.

BaseModel eliminates all of these problems by considering the totality of available information and deriving a universal understanding that is not limited to a few dozen manually created features. Internal representations in BaseModel can comprise hundreds of thousands of features created on the fly during model training and inference, from the freshest available data. This allows behavioral profiles to be represented with extremely high accuracy and in fine detail. Subsequent fine-tuning can then adapt the universal foundation model to specific use cases without any manual work.
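To make the maintenance burden concrete, the sketch below shows the kind of manual, per-use-case aggregation code that this approach makes unnecessary. The feature names, the `as_of` date, and the values are hypothetical; a different use case (FX propensity, re-purchase timing) would need a different hand-written pipeline like it:

```python
import pandas as pd

# Hypothetical per-user churn features for one narrow use case,
# compressing an event log into a few coarse aggregates.
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-03", "2024-01-20", "2024-01-28"]),
    "user_id":   ["u1", "u1", "u2"],
    "amount":    [120.0, 35.0, 810.0],
})

as_of = pd.Timestamp("2024-02-01")
features = events.groupby("user_id").agg(
    n_purchases=("timestamp", "size"),
    total_spend=("amount", "sum"),
    last_seen=("timestamp", "max"),
)
features["days_since_purchase"] = (as_of - features["last_seen"]).dt.days
features = features.drop(columns="last_seen")
# Any upstream schema change (a renamed column, a new source) breaks this,
# and the handful of aggregates discards most of the behavioral signal.
```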