Behavioral Signals of Audiobook App Churn: A Supervised Learning Approach to Retention Prediction
- smileytr
- Apr 4
Abstract
Customer retention is a key performance indicator for digital content platforms, yet churn prediction remains a challenging task due to complex user behavior. In this study, we investigate user disengagement in a commercial audiobook application using a dataset of over 14,000 anonymized user records. We define churn as a lack of purchases during the final six months of the observation period. Through data cleaning, feature engineering, and multicollinearity reduction, we isolate the most predictive behavioral variables. Surprisingly, we find that content completion—a conventional proxy for engagement—is a strong signal of churn. Using synthetic minority oversampling (SMOTE), class weighting, and supervised classification models, we achieve an F1-score of 0.82 for active user prediction. The final model, a tuned Random Forest, provides both high accuracy and interpretability, suggesting that sustained post-purchase engagement, not content completion, is the critical indicator of user retention.
1. Introduction
The ability to anticipate customer churn is vital for subscription-based platforms, especially those reliant on digital content consumption such as audiobook applications. Existing retention strategies typically focus on maximizing engagement, assuming that users who consume more content are more likely to stay. However, emerging research in behavioral analytics suggests this assumption may be overly simplistic. This paper explores churn prediction in an audiobook platform through the lens of user behavior and applies modern machine learning techniques to uncover predictive patterns.
We aim to build a robust, interpretable churn prediction model using historical user activity and purchase behavior data. Our primary contribution is twofold: (1) demonstrating that conventional indicators of engagement—such as content completion and listening time—may paradoxically signal user exit, and (2) implementing a tuned ensemble classification pipeline that accurately identifies users at risk of churning, enabling targeted intervention.
2. Data and Methodology
2.1 Dataset Overview
The dataset comprises 14,084 anonymized user records, each reflecting cumulative and session-level audiobook usage statistics. Key features include total minutes listened, book completion percentage, average book price, review behavior, customer support interactions, and time elapsed between the most recent activity and the user's first purchase.
2.2 Preprocessing and Feature Engineering
We first resolved discrepancies in feature naming, corrected mislabeled columns, and removed the non-informative user ID column. Correlation analysis and variance inflation factor (VIF) testing revealed multicollinearity among several predictors. To reduce redundancy and stabilize model training, we removed one variable from each highly correlated pair (e.g., total vs. average book length; total listening time vs. completion percentage).
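A minimal sketch of this screening step follows, assuming a flat CSV export; the file name, the column names, and which member of each correlated pair is kept are illustrative assumptions, not the study's exact choices.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("audiobook_users.csv")  # hypothetical file name
y = df["churn"]                          # 1 = churned, 0 = active
X = df.drop(columns=[
    "user_id", "churn",
    "book_length_total",        # correlated with book_length_avg (kept)
    "minutes_listened_total",   # correlated with completion (kept)
])

def vif_table(features: pd.DataFrame) -> pd.DataFrame:
    """Variance inflation factor for each remaining feature."""
    return pd.DataFrame({
        "feature": features.columns,
        "VIF": [variance_inflation_factor(features.values, i)
                for i in range(features.shape[1])],
    }).sort_values("VIF", ascending=False)

print(vif_table(X))  # values well above ~5-10 flag remaining collinearity
```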
2.3 Handling Class Imbalance
Approximately 80% of users in the dataset were labeled as churned. To address this imbalance, we employed SMOTE to synthetically augment the minority class (active users). We experimented with several minority-to-majority sampling ratios (0.5, 0.75, 1.0) and found that 0.75 provided the best tradeoff between bias and variance.
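A minimal sketch of this step using imbalanced-learn, continuing from the `X` and `y` defined above; the split parameters are assumptions.

```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Stratified split first, so the test set keeps the natural ~80:20 imbalance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# sampling_strategy=0.75 targets a 0.75:1 active-to-churned ratio after
# oversampling; SMOTE is fit on the training split only to avoid leakage.
smote = SMOTE(sampling_strategy=0.75, random_state=42)
X_res, y_res = smote.fit_resample(X_train, y_train)
```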
3. Exploratory Analysis
3.1 Content Completion as a Churn Signal
A key finding emerged from a binary comparison of users with nonzero content completion versus those with none. In this dataset, every user with nonzero completion ultimately churned, and every retained user had zero recorded completion. This suggests that for some users, the platform fulfills a narrow goal (e.g., finishing one book), after which engagement ceases.
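This split can be reproduced with a simple cross-tabulation; `completion` and `churn` follow the hypothetical naming used in the sketches above.

```python
import pandas as pd

completed_any = df["completion"] > 0
# normalize="index" shows churn rates within each completion group.
print(pd.crosstab(completed_any, df["churn"], normalize="index"))
```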
3.2 Listening Time and Post-Purchase Activity
Listening time and post-purchase engagement revealed similar trends. Churned users exhibited a wide spread in listening duration, including many with high totals. Retained users, on the other hand, tended to show minimal listening behavior but longer post-purchase activity windows. These patterns challenge traditional definitions of “high engagement” as a predictor of loyalty.
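A quick distributional summary by retention status illustrates the comparison; both column names are assumptions standing in for the listening-time and post-purchase-window features described in Section 2.1.

```python
cols = ["minutes_listened_total", "days_active_after_purchase"]
print(df.groupby("churn")[cols].describe().T)
```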
3.3 Price Sensitivity and User Value
Users who remained active spent significantly more per book on average, suggesting that higher spending correlates with perceived value and platform investment. This finding aligns with consumer behavior literature, where price may serve as a proxy for commitment.
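The spend comparison amounts to a one-line aggregation; `price_avg` is an assumed column name.

```python
print(df.groupby("churn")["price_avg"].agg(["mean", "median", "std"]))
```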
4. Model Development
4.1 Model Selection and Setup
We evaluated four classifiers: Logistic Regression, Support Vector Classifier (SVC), Random Forest, and Histogram-Based Gradient Boosting (HGBC). Each model was trained on the SMOTE-resampled dataset (a 0.75:1 ratio of active to churned users) with class weights that penalize misclassification of active users more heavily.
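A sketch of the four-model comparison on the resampled training data; hyperparameters shown are defaults plus class weighting, not the study's exact settings. Note that `class_weight` support on `HistGradientBoostingClassifier` requires scikit-learn 1.2 or later.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.metrics import classification_report

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "SVC": SVC(class_weight="balanced"),
    "RandomForest": RandomForestClassifier(class_weight="balanced", random_state=42),
    "HGBC": HistGradientBoostingClassifier(class_weight="balanced", random_state=42),
}
for name, model in models.items():
    model.fit(X_res, y_res)                 # resampled training split
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```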
4.2 Performance Evaluation
The Random Forest model outperformed all others, achieving an overall accuracy of 84%, with an F1-score of 0.82 on the active class. HGBC showed competitive results but slightly underperformed in recall. SVC, while theoretically suited for complex boundaries, lagged in both recall and computational efficiency. Hyperparameter tuning using grid search further enhanced the Random Forest model’s performance.
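An illustrative version of the grid search; the grid itself is an assumption, and the scorer targets F1 on the active class (label 0), matching the metric reported above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    param_grid,
    scoring=make_scorer(f1_score, pos_label=0),  # F1 on the active class
    cv=5,
    n_jobs=-1,
)
search.fit(X_res, y_res)
print(search.best_params_, round(search.best_score_, 3))
```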
5. Results and Discussion
The tuned Random Forest model identified churn with high precision (0.88) and maintained strong recall across both classes. Notably, it significantly outperformed the Logistic Regression baseline, especially in classifying minority (active) users. Post hoc analysis of feature importance underscored the predictive value of post-purchase engagement time and average book price, while content completion and listening time proved misleading.
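The post hoc importance inspection is straightforward with the tuned estimator; `search` and the feature frame `X` come from the sketches above.

```python
import pandas as pd

best_rf = search.best_estimator_
importances = pd.Series(best_rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```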
These results underscore the necessity of redefining user engagement in digital platforms. Rather than interpreting consumption as a loyalty metric, platforms may benefit from focusing on post-consumption behavior and recurring interaction. Triggering retention workflows based on completion events or declines in follow-up activity may yield more actionable insights.
6. Conclusion
This study challenges conventional assumptions in digital engagement analytics by demonstrating that deep engagement does not always equate to long-term retention. Through a robust machine learning framework, we identified predictive signals of churn that defy traditional metrics. The strongest model, a hyperparameter-tuned Random Forest classifier with adjusted SMOTE ratios, delivers a production-ready pipeline capable of guiding personalized retention strategies.
Future work may explore temporal sequence modeling or survival analysis to refine prediction horizons, and apply explainability techniques such as SHAP to better understand individual decision boundaries.
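As an illustration of that direction (not part of this study), a SHAP analysis of the tuned forest might look like the following sketch.

```python
import shap

# TreeExplainer handles tree ensembles; output shape varies by shap version
# (a list of per-class arrays in older releases, a 3-D array in newer ones).
explainer = shap.TreeExplainer(best_rf)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
```

Such per-user attributions would let retention teams see which specific behaviors drive an individual churn score, rather than relying on global feature importances alone.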