About the System

TÜBİTAK Project

This web application is a scientific output of a TÜBİTAK-funded research project conducted under the ARDEB program (Project Code: 124K233). The project is led by Principal Investigator Hasan TUTAR and aims to develop intelligent tools that support methodological decisions in qualitative research.

Project Purpose

Q-Sat AI (Qualitative Data Saturation Estimation using AI) is an AI-supported prediction tool developed to determine the optimal sample size in qualitative research. The system guides researchers using advanced machine learning models trained on 3000+ qualitative research data points.

Technical Specifications

Dataset
  • 3000+ qualitative research samples
  • 6 different research designs
  • 11 different parameters
  • Outlier cleaning at the 95th percentile
  • Balanced dataset (n=500 per design)
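The dataset preparation described above (95th-percentile outlier cleaning followed by balancing to n=500 per design) can be sketched in pandas. The column names and the synthetic data below are illustrative assumptions, not the project's actual schema:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the project dataset: a design label plus a
# recommended sample size. Real data would have 11 parameters.
designs = ["narrative", "ethnography", "phenomenology",
           "grounded_theory", "case_study", "other"]
df = pd.DataFrame({
    "design": rng.choice(designs, size=4000),
    "sample_size": rng.gamma(shape=4.0, scale=5.0, size=4000),
})

# Step 1: outlier cleaning -- drop rows above the 95th percentile
# of the target variable.
cutoff = df["sample_size"].quantile(0.95)
df = df[df["sample_size"] <= cutoff]

# Step 2: balancing -- draw an equal number of rows per design.
balanced = (
    df.groupby("design")
      .sample(n=500, random_state=0)
      .reset_index(drop=True)
)
print(balanced["design"].value_counts())
```

Balancing this way prevents the more common designs from dominating the training signal.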
Model Architecture
  • Ensemble learning (stacking) architecture
  • Multiple base machine learning algorithms (see Model Algorithms below)
  • Meta-models that combine the base models' predictions
  • Cross-validation (k=5)
  • Test R² (coefficient of determination) of ~0.85
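The ensemble described above can be sketched with scikit-learn's `StackingRegressor`: several base regressors are fitted, and a meta-model learns to combine their out-of-fold predictions, with 5-fold cross-validation throughout. The base-model choice and synthetic data here are illustrative assumptions, not the project's exact configuration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in: 11 features mirroring the 11 research parameters.
X, y = make_regression(n_samples=600, n_features=11, noise=10.0, random_state=0)

# Base models feed a RidgeCV meta-model; cv=5 generates the
# out-of-fold predictions the meta-model is trained on.
base_models = [
    ("knn", KNeighborsRegressor(n_neighbors=15, weights="distance")),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gb", GradientBoostingRegressor(n_estimators=200, random_state=0)),
]
ensemble = StackingRegressor(estimators=base_models,
                             final_estimator=RidgeCV(), cv=5)

# Outer 5-fold cross-validation reports the ensemble's R².
scores = cross_val_score(ensemble, X, y, cv=5, scoring="r2")
print(scores.mean())
```

Stacking typically outperforms any single base model because the meta-model downweights whichever base model is weakest on a given region of the feature space.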

Research Designs

  • Narrative Research: Focus on storytelling
  • Ethnographic Research: Cultural analysis
  • Phenomenology: Experience analysis
  • Grounded Theory: Theory development
  • Case Study: In-depth examination
  • Other Designs: Mixed approaches

Model Performance (Detailed)

Model            | Test R² (avg.) | Train R² (avg.) | Test MAE (avg.) | Best Params
KNeighbors       | 0.852742       | 0.909446        | 0.151285        | {'model__n_neighbors': 15, 'model__p': 1, 'model__weights': 'distance'}
GradientBoosting | 0.852534       | 0.907133        | 0.175680        | {'learning_rate': 0.1, 'loss': 'squared_error', 'max_depth': 7, 'n_estimators': 200, 'subsample': 0.8}
RandomForest     | 0.852449       | 0.905714        | 0.183468        | {'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 200}
XGBoost          | 0.849898       | 0.904114        | 0.186369        | {'colsample_bytree': 1.0, 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 200, 'reg_alpha': 0, 'reg_lambda': 1.5, 'subsample': 0.8}
DecisionTree     | 0.845724       | 0.912250        | 0.147432        | {'criterion': 'squared_error', 'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2}
SVR              | 0.763296       | 0.849767        | 0.263101        | {'model__C': 10.0, 'model__degree': 2, 'model__gamma': 'scale', 'model__kernel': 'rbf'}
MLP              | 0.685608       | 0.779126        | 0.370804        | {'activation': 'logistic', 'alpha': 0.01, 'early_stopping': True, 'hidden_layer_sizes': (30,), 'learning_rate': 'constant', 'solver': 'lbfgs'}
AdaBoost         | 0.423281       | 0.438722        | 0.586717        | {'learning_rate': 0.05, 'loss': 'square', 'n_estimators': 100}
Ridge            | 0.391545       | 0.400687        | 0.575250        | {'model__alpha': 50.0}
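Scores of this shape (averaged test/train R² plus best hyperparameters) are what scikit-learn's `GridSearchCV` reports; the `model__`-prefixed parameter names in the table indicate the estimator sits inside a `Pipeline`. A minimal sketch for one model, with an assumed grid and synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 11 features mirroring the 11 research parameters.
X, y = make_regression(n_samples=500, n_features=11, noise=5.0, random_state=0)

# 'model__...' params address the estimator step inside the pipeline.
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", KNeighborsRegressor())])
grid = {
    "model__n_neighbors": [5, 10, 15],
    "model__p": [1, 2],
    "model__weights": ["uniform", "distance"],
}

# k=5 cross-validation; return_train_score=True yields the
# train-vs-test comparison shown in the table.
search = GridSearchCV(pipe, grid,
                      cv=KFold(n_splits=5, shuffle=True, random_state=0),
                      scoring="r2", return_train_score=True)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

Comparing the train and test averages, as the table does, is what exposes overfitting: a large gap (e.g., DecisionTree's 0.912 train vs. 0.846 test) signals memorization rather than generalization.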

User Guide

  1. Select Data Quality: Choose the expected quality of your data.
  2. Select Information Power: Assess the knowledge level of your participants.
  3. Select Homogeneity/Heterogeneity: Define if your group is similar or diverse.
  4. Select Number of Interviews: Specify the number of interviews per participant.
  5. Select Researcher Competence: Rate the researcher's experience level.
  6. Select Research Scope: Define if the scope is narrow or broad.
  7. Select Data Diversity (Triangulation): Assess the diversity of your data sources.
  8. Select Participant Originality: Rate the originality of participant insights.
  9. Select Interview Duration: Specify the average length of the interviews.
  10. Get Prediction: Click "Predict Sample Size" to see the results.
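Behind "Predict Sample Size", the nine selections are encoded as a feature vector and passed to the trained model. The feature names, ordinal scale, and stand-in model below are illustrative assumptions, not the application's actual schema:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Stand-in model fitted on random data purely so .predict() runs;
# the real system loads its pre-trained ensemble instead.
rng = np.random.default_rng(0)
X_train = rng.integers(1, 6, size=(200, 9)).astype(float)
y_train = 8 + 0.5 * X_train.sum(axis=1) + rng.normal(0, 1, 200)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Hypothetical ordinal encoding of the nine guided inputs (1 = low ... 5 = high).
user_inputs = {
    "data_quality": 4,
    "information_power": 5,
    "homogeneity": 2,
    "interviews_per_participant": 1,
    "researcher_competence": 3,
    "research_scope": 2,
    "triangulation": 3,
    "participant_originality": 4,
    "interview_duration": 3,
}
features = np.array([list(user_inputs.values())], dtype=float)
predicted_sample_size = float(model.predict(features)[0])
print(round(predicted_sample_size))
```

The prediction is a point estimate of the sample size at which data saturation is expected, given the selected study characteristics.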

Technical Details

Technologies Used
  • Python 3.10+
  • Scikit-learn
  • Pandas & NumPy
  • Flask Web Framework
  • Tailwind CSS
Model Algorithms
  • Ridge Regression
  • K-Nearest Neighbors (KNeighbors)
  • Support Vector Regression (SVR)
  • Decision Tree
  • Random Forest
  • Gradient Boosting
  • AdaBoost
  • Neural Network (MLP)
  • XGBoost
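The algorithm pool above maps onto standard scikit-learn estimators. A minimal sketch of the candidate set, using hyperparameters from the performance table; XGBoost (`xgboost.XGBRegressor`) is omitted here only to keep the sketch dependent on scikit-learn alone:

```python
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Candidate pool keyed by the names used in the performance table.
candidates = {
    "Ridge": Ridge(alpha=50.0),
    "KNeighbors": KNeighborsRegressor(n_neighbors=15, p=1, weights="distance"),
    "SVR": SVR(C=10.0, gamma="scale", kernel="rbf"),
    "DecisionTree": DecisionTreeRegressor(max_features="sqrt", random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=200, max_features="sqrt",
                                          random_state=0),
    "GradientBoosting": GradientBoostingRegressor(n_estimators=200, max_depth=7,
                                                  subsample=0.8, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=100, learning_rate=0.05,
                                  random_state=0),
    "MLP": MLPRegressor(hidden_layer_sizes=(30,), activation="logistic",
                        alpha=0.01, solver="lbfgs", random_state=0),
}
print(sorted(candidates))
```

Each candidate would be tuned with the same cross-validation protocol so that the reported test scores are directly comparable.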