TÜBİTAK Project
This web application is a scientific output of a TÜBİTAK-funded research project conducted under
the ARDEB program (Project Code: 124K233). The project is led by Principal Investigator
Hasan TUTAR and aims to develop intelligent tools that support methodological
decisions in qualitative research.
Project Purpose
Q-Sat AI (Qualitative Data Saturation Estimation using AI) is an AI-supported prediction tool
for estimating the optimal sample size in qualitative research. The system guides researchers
using machine learning models trained on 3000+ qualitative research data points.
Technical Specifications
Dataset
- 3000+ qualitative research samples
- 6 different research designs
- 11 different parameters
- 95th percentile data cleaning
- Balanced dataset (n=500 per design)
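The cleaning and balancing steps listed above could look roughly like the following pandas sketch. It assumes that "95th percentile data cleaning" means dropping rows whose target value lies above the 95th percentile, and that the cutoff is computed over the whole dataset rather than per design; the source states neither. The column names `design` and `sample_size`, the helper `clean_and_balance`, and the synthetic demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_and_balance(df, target="sample_size", group="design",
                      q=0.95, n_per_group=500, seed=0):
    """Drop rows above the q-th quantile of the target, then draw n rows per group."""
    cutoff = df[target].quantile(q)          # assumed: one global cutoff
    trimmed = df[df[target] <= cutoff]
    parts = []
    for _, g in trimmed.groupby(group):
        # Sample with replacement only when a group has fewer rows than requested.
        parts.append(g.sample(n=n_per_group,
                              replace=len(g) < n_per_group,
                              random_state=seed))
    return pd.concat(parts, ignore_index=True)


# Synthetic stand-in data (not the project's dataset).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "design": rng.choice(["narrative", "case_study", "phenomenology"], size=600),
    "sample_size": rng.integers(3, 60, size=600),
})
balanced = clean_and_balance(demo, n_per_group=50)
print(balanced["design"].value_counts())
```

With the real data, `n_per_group=500` across 6 designs would give the 3000 rows mentioned above.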
Model Architecture
- Ensemble learning model
- Different machine learning algorithms as base models
- Different meta-models
- Cross-validation (k=5)
- R² ≈ 0.85 (coefficient of determination)
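One common way to combine several base models under a meta-model is stacking. The source does not say which ensemble method Q-Sat AI uses, so the following is only a minimal sketch with scikit-learn's `StackingRegressor`. The choice of base models, the `Ridge` meta-model, and the synthetic data are assumptions made for illustration; only k=5 and the KNeighbors settings come from this page.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 11 features (not the project's dataset).
X, y = make_regression(n_samples=300, n_features=11, noise=10.0, random_state=0)

# Base models: an assumed subset of the algorithms listed on this page.
base_models = [
    ("knn", make_pipeline(StandardScaler(),
                          KNeighborsRegressor(n_neighbors=15, p=1, weights="distance"))),
    ("gb", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# The meta-model is trained on out-of-fold predictions of the base models (cv=5).
stack = StackingRegressor(estimators=base_models,
                          final_estimator=Ridge(alpha=1.0), cv=5)

# Outer 5-fold cross-validation to estimate R² of the whole ensemble.
scores = cross_val_score(stack, X, y, cv=5, scoring="r2")
print("Mean R2 over 5 folds:", scores.mean())
```

Trying "different meta-models" would amount to swapping `final_estimator` and comparing the resulting cross-validated scores.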
Research Designs
- Narrative Research: Focus on storytelling
- Ethnographic Research: Cultural analysis
- Phenomenology: Experience analysis
- Grounded Theory: Theory development
- Case Study: In-depth examination
- Other Designs: Mixed approaches
Model Performance (Detailed)
| Model | Test R² (Avg.) | Train R² (Avg.) | Test MAE (Avg.) | Best Params |
| --- | --- | --- | --- | --- |
| KNeighbors | 0.852742 | 0.909446 | 0.151285 | {'model__n_neighbors': 15, 'model__p': 1, 'model__weights': 'distance'} |
| GradientBoosting | 0.852534 | 0.907133 | 0.175680 | {'learning_rate': 0.1, 'loss': 'squared_error', 'max_depth': 7, 'n_estimators': 200, 'subsample': 0.8} |
| RandomForest | 0.852449 | 0.905714 | 0.183468 | {'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 200} |
| XGBoost | 0.849898 | 0.904114 | 0.186369 | {'colsample_bytree': 1.0, 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 200, 'reg_alpha': 0, 'reg_lambda': 1.5, 'subsample': 0.8} |
| DecisionTree | 0.845724 | 0.912250 | 0.147432 | {'criterion': 'squared_error', 'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2} |
| SVR | 0.763296 | 0.849767 | 0.263101 | {'model__C': 10.0, 'model__degree': 2, 'model__gamma': 'scale', 'model__kernel': 'rbf'} |
| MLP | 0.685608 | 0.779126 | 0.370804 | {'activation': 'logistic', 'alpha': 0.01, 'early_stopping': True, 'hidden_layer_sizes': (30,), 'learning_rate': 'constant', 'solver': 'lbfgs'} |
| AdaBoost | 0.423281 | 0.438722 | 0.586717 | {'learning_rate': 0.05, 'loss': 'square', 'n_estimators': 100} |
| Ridge | 0.391545 | 0.400687 | 0.575250 | {'model__alpha': 50.0} |
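The `model__` prefix on some parameter names suggests those models were tuned inside a scikit-learn `Pipeline` with a step named `model`. The sketch below shows how a row of such a table could be produced with `GridSearchCV`. The scaler step, the candidate grid values, and the synthetic data are assumptions; the source gives only the winning parameters and k=5.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the project's dataset).
X, y = make_regression(n_samples=300, n_features=11, noise=10.0, random_state=0)

# The step name "model" yields parameter names such as "model__n_neighbors".
pipe = Pipeline([("scaler", StandardScaler()), ("model", KNeighborsRegressor())])
param_grid = {
    "model__n_neighbors": [5, 15],
    "model__p": [1, 2],
    "model__weights": ["uniform", "distance"],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=5,                                              # k=5 as stated above
    scoring={"r2": "r2", "mae": "neg_mean_absolute_error"},
    refit="r2",                                        # pick the best model by R²
    return_train_score=True,
)
search.fit(X, y)

best = search.best_index_
print("Best Params:", search.best_params_)
print("Test R2 (Avg.):", search.cv_results_["mean_test_r2"][best])
print("Train R2 (Avg.):", search.cv_results_["mean_train_r2"][best])
# scikit-learn reports MAE negated so that higher is better; flip the sign back.
print("Test MAE (Avg.):", -search.cv_results_["mean_test_mae"][best])
```

Comparing the train and test averages for one model gives a quick read on overfitting: a large gap means the model fits the training folds much better than held-out data.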
User Guide
- Select Data Quality: Choose the expected quality of your data.
- Select Information Power: Assess the knowledge level of your participants.
- Select Homogeneity/Heterogeneity: Define if your group is similar or
diverse.
- Select Number of Interviews: Specify the number of interviews per
participant.
- Select Researcher Competence: Rate the researcher's experience level.
- Select Research Scope: Define if the scope is narrow or broad.
- Select Data Diversity (Triangulation): Assess the diversity of your data
sources.
- Select Participant Originality: Rate the originality of participant
insights.
- Select Interview Duration: Specify the average length of the interviews.
- Get Prediction: Click "Predict Sample Size" to see the results.
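The nine selections above could be collected and validated before being passed to a model, as in the following sketch. The field names, the 1 to 5 ordinal scale, and the class `SaturationInputs` are all hypothetical; the page does not document how the form encodes its inputs.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SaturationInputs:
    """Hypothetical container for the nine form inputs (assumed 1-5 ordinal levels)."""
    data_quality: int
    information_power: int
    homogeneity: int
    interviews_per_participant: int
    researcher_competence: int
    research_scope: int
    data_diversity: int
    participant_originality: int
    interview_duration: int

    def __post_init__(self):
        # Reject any value outside the assumed scale.
        for name, value in asdict(self).items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def to_row(self):
        """Return the inputs as a flat list in field-declaration order."""
        return list(asdict(self).values())


inputs = SaturationInputs(4, 3, 2, 1, 5, 2, 3, 4, 3)
row = inputs.to_row()
print(row)
```

A fitted regressor would then receive `[row]` as a single-sample feature matrix, together with whatever encoding the real application uses for the research design.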
Important Warnings
Points to Consider
- Prediction results are for guidance and are not absolute values.
- You can make adjustments based on your research topic and methodology.
- Consider data saturation criteria.
- Seek expert opinions and conduct a literature review.
- Obtain ethics committee approval and any other necessary permissions.
Technical Details
Technologies Used
- Python 3.10+
- Scikit-learn
- Pandas & NumPy
- Flask Web Framework
- Tailwind CSS
Model Algorithms
- Ridge Regression
- K-Nearest Neighbors (KNeighbors)
- Support Vector Regression (SVR)
- Decision Tree
- Random Forest
- Gradient Boosting
- AdaBoost
- Neural Network (MLP)
- XGBoost
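For orientation, the algorithms above map onto scikit-learn estimators as in the sketch below, which compares them with 5-fold cross-validation on synthetic data. XGBoost is omitted because it lives in the separate `xgboost` package. Which models are wrapped with a scaler and all hyperparameters not shown in the performance table are assumptions, and the scores printed here say nothing about the project's actual results.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the project's dataset).
X, y = make_regression(n_samples=300, n_features=11, noise=10.0, random_state=0)


def scaled(estimator):
    """Wrap a scale-sensitive estimator with standardization (an assumption)."""
    return make_pipeline(StandardScaler(), estimator)


models = {
    "Ridge": scaled(Ridge(alpha=50.0)),
    "KNeighbors": scaled(KNeighborsRegressor(n_neighbors=15, p=1, weights="distance")),
    "SVR": scaled(SVR(C=10.0, kernel="rbf", gamma="scale")),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=0),
    "MLP": MLPRegressor(hidden_layer_sizes=(30,), max_iter=500, random_state=0),
}

results = {}
for name, estimator in models.items():
    cv = cross_validate(estimator, X, y, cv=5, scoring="r2", return_train_score=True)
    results[name] = (cv["train_score"].mean(), cv["test_score"].mean())

# Print models from best to worst average test R².
for name, (train_r2, test_r2) in sorted(results.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:18s} train R2={train_r2:.3f}  test R2={test_r2:.3f}")
```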