Train #3

TKira4 · 2025-11-16T11:09:58Z

Summary by Sourcery

Extend the CLI tool with trainable market-state analysis and forecasting by adding K-Means clustering, state classification, RandomForest-based next-state prediction, corresponding commands, and documentation

New Features:

Introduce a train command to fit and persist K-Means clustering models on historical kline data
Add a market command to label the latest data point as Bullish, Bearish, or Sideway using a trained clustering model
Implement a train-classifier command to train a supervised RandomForest classifier for next-state prediction
Provide a forecast command to predict and output the next market state with probabilities

Enhancements:

Register new training and forecasting commands in the CLI entrypoints
Encapsulate clustering and classification logic in dedicated service modules

Documentation:

Update README to document training, market-state, classifier, and forecast commands
Add a detailed guide for the market-state workflow in docs/market_state_guide.md

Introduces train and market commands for K-Means clustering of market states (Bullish/Bearish/Sideway). Adds MarketStateService for model training and prediction, updates CLI registration, expands README and documentation, and adds required dependencies (pandas, numpy, scikit-learn, joblib).

Introduces RandomForest-based classifier training and forecasting for market state prediction. Updates CLI, documentation, and service layer to support train-classifier and forecast commands, including new workflow and options in README and guide.

sourcery-ai · 2025-11-16T11:10:03Z

Reviewer's Guide

This PR extends the CLI tool with unsupervised market-state labeling (K-Means clustering) and supervised forecasting (RandomForest), introduces new commands to train and apply these models, and updates documentation and wiring to integrate the features throughout the application.

Sequence diagram for training and using market state models

sequenceDiagram
    actor User
    participant CLI
    participant MarketStateService
    participant KlineService
    participant FileSystem
    User->>CLI: run train -c <CRYPTO>
    CLI->>MarketStateService: train_model(crypto_name)
    MarketStateService->>KlineService: get_kline_data(crypto_name)
    KlineService-->>MarketStateService: kline data
    MarketStateService->>FileSystem: save trained model
    MarketStateService-->>CLI: training summary
    User->>CLI: run market -c <CRYPTO>
    CLI->>MarketStateService: predict_market_state(crypto_name)
    MarketStateService->>FileSystem: load trained model
    MarketStateService->>KlineService: get_kline_data(crypto_name)
    KlineService-->>MarketStateService: kline data
    MarketStateService-->>CLI: latest state and distribution

Sequence diagram for classifier training and forecasting

sequenceDiagram
    actor User
    participant CLI
    participant MarketClassifierService
    participant MarketStateService
    participant FileSystem
    User->>CLI: run train-classifier -c <CRYPTO>
    CLI->>MarketClassifierService: train_classifier(crypto_name)
    MarketClassifierService->>MarketStateService: get_labeled_feature_dataset(crypto_name)
    MarketStateService->>FileSystem: load clustering model
    MarketStateService-->>MarketClassifierService: labeled feature dataset
    MarketClassifierService->>FileSystem: save classifier model
    MarketClassifierService-->>CLI: training summary
    User->>CLI: run forecast -c <CRYPTO>
    CLI->>MarketClassifierService: forecast_next_state(crypto_name)
    MarketClassifierService->>MarketStateService: get_labeled_feature_dataset(crypto_name)
    MarketStateService->>FileSystem: load clustering model
    MarketStateService-->>MarketClassifierService: labeled feature dataset
    MarketClassifierService->>FileSystem: load classifier model
    MarketClassifierService-->>CLI: prediction and probabilities

Class diagram for new and updated market state and classifier services

classDiagram
    class MarketStateService {
        +train_model(crypto_name, n_clusters, min_clusters, max_clusters)
        +predict_market_state(crypto_name)
        +prepare_feature_dataset(crypto_name)
        +get_labeled_feature_dataset(crypto_name)
        -_model_path(symbol, interval)
        -_load_dataframe(crypto_name)
        -_compute_features(frame)
        -_assign_labels(cluster_returns)
        -_resolve_cluster_count(X, forced_clusters, min_clusters, max_clusters)
    }
    class MarketClassifierService {
        +train_classifier(crypto_name, test_size, random_state, n_estimators, max_depth)
        +forecast_next_state(crypto_name)
        -_model_path(symbol, interval)
        -state_service: MarketStateService
    }
    class KlineService {
        +get_kline_data(crypto_name)
    }
    MarketClassifierService --> MarketStateService
    MarketStateService --> KlineService
    class ModelTrainingError
    class MarketModelNotFoundError
    class ClassifierTrainingError
    class ClassifierModelNotFoundError
    MarketStateService ..> ModelTrainingError
    MarketStateService ..> MarketModelNotFoundError
    MarketClassifierService ..> ClassifierTrainingError
    MarketClassifierService ..> ClassifierModelNotFoundError

Class diagram for new CLI commands

classDiagram
    class train_command {
        +train_command(crypto, clusters, min_clusters, max_clusters)
    }
    class market_command {
        +market_command(crypto, show_history)
    }
    class train_classifier_command {
        +train_classifier_command(crypto, test_size, estimators, max_depth)
    }
    class forecast_command {
        +forecast_command(crypto)
    }
    train_command --> MarketStateService
    market_command --> MarketStateService
    train_classifier_command --> MarketClassifierService
    forecast_command --> MarketClassifierService

File-Level Changes

Change	Details	Files
Introduce MarketStateService for feature extraction, K-Means training, and market-state prediction	Add service class with data loading and feature computation Implement silhouette-based and forced cluster resolution Train and save KMeans payload including labels and returns Load model to label historical data and predict latest state	`src/service/market_state_service.py`
Introduce MarketClassifierService for supervised RandomForest classifier to forecast next market state	Add classifier service to build labeled dataset from clustering output Train/test split with scaling and RandomForest training Generate classification report, confusion matrix, and save model Load classifier to forecast next state with probability distribution	`src/service/market_classifier_service.py`
Add new CLI commands to train clusters, classify state, train classifier, and forecast	Create command modules for train, market, train-classifier, and forecast Update commands/init.py to export new commands Register new commands in run.py	`src/commands/train.py` `src/commands/market.py` `src/commands/train_classifier.py` `src/commands/forecast.py` `src/commands/__init__.py` `src/run.py`
Update documentation to include training and forecasting workflows	Expand README with new sections for training, market-state, classifier, forecast commands and options Add detailed market_state_guide.md with workflow, examples, and feature reference	`README.md` `docs/market_state_guide.md`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `src/service/market_state_service.py:152-154` </location>
<code_context>
+        self, X: np.ndarray, forced_clusters: Optional[int], min_clusters: int, max_clusters: int
+    ) -> int:
+        sample_count = len(X)
+        if sample_count < 2:
+            raise ModelTrainingError("Need at least two samples to train KMeans.")
+
</code_context>

<issue_to_address>
**suggestion:** ModelTrainingError for sample_count < 2 may not be sufficient for clustering.

Increase the minimum sample requirement to ensure KMeans produces stable, non-trivial clusters.

```suggestion
        sample_count = len(X)
        min_required_samples = max(6, min_clusters * 2)
        if sample_count < min_required_samples:
            raise ModelTrainingError(
                f"Need at least {min_required_samples} samples to train KMeans (got {sample_count})."
            )
```
</issue_to_address>

### Comment 2
<location> `src/service/market_state_service.py:163-180` </location>
<code_context>
+
+        best_k = None
+        best_score = -1.0
+        for k in range(min_clusters, max_clusters + 1):
+            if sample_count <= k:
+                break
+            model = KMeans(n_clusters=k, n_init=10, random_state=42)
+            labels = model.fit_predict(X)
+            if len(set(labels)) == 1:
+                continue
+            score = silhouette_score(X, labels)
+            if score > best_score:
+                best_score = score
</code_context>

<issue_to_address>
**suggestion:** Silhouette score may be unreliable for small sample sizes.

For small datasets, consider implementing a minimum sample size check or alternative logic to avoid unreliable silhouette scores.

```suggestion
        MIN_SILHOUETTE_SAMPLES = 10

        best_k = None
        best_score = -1.0

        if sample_count < MIN_SILHOUETTE_SAMPLES:
            # For small datasets, avoid silhouette_score and use min_clusters or raise an error
            if sample_count <= min_clusters:
                raise ModelTrainingError("Not enough samples for the minimum number of clusters.")
            return min_clusters

        for k in range(min_clusters, max_clusters + 1):
            if sample_count <= k:
                break
            model = KMeans(n_clusters=k, n_init=10, random_state=42)
            labels = model.fit_predict(X)
            if len(set(labels)) == 1:
                continue
            score = silhouette_score(X, labels)
            if score > best_score:
                best_score = score
                best_k = k

        if best_k is None:
            raise ModelTrainingError("Unable to determine an appropriate number of clusters.")

        return best_k
```
</issue_to_address>

### Comment 3
<location> `src/service/market_classifier_service.py:64-70` </location>
<code_context>
+        scaler = StandardScaler()
+        X_scaled = scaler.fit_transform(X)
+
+        stratify = y if len(np.unique(y)) > 1 else None
+        X_train, X_test, y_train, y_test = train_test_split(
+            X_scaled, y, test_size=test_size, random_state=random_state, stratify=stratify
</code_context>

<issue_to_address>
**suggestion:** Stratification may fail if class distribution is highly imbalanced.

Handle potential exceptions from train_test_split when stratification fails, or add a warning for cases where stratification cannot be performed.

```suggestion
        import warnings

        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)

        stratify = y if len(np.unique(y)) > 1 else None
        try:
            X_train, X_test, y_train, y_test = train_test_split(
                X_scaled, y, test_size=test_size, random_state=random_state, stratify=stratify
            )
        except ValueError as e:
            warnings.warn(
                f"Stratified train_test_split failed due to: {e}. Proceeding without stratification.",
                UserWarning
            )
            X_train, X_test, y_train, y_test = train_test_split(
                X_scaled, y, test_size=test_size, random_state=random_state, stratify=None
            )
```
</issue_to_address>

### Comment 4
<location> `src/service/market_classifier_service.py:53-54` </location>
<code_context>
+        feature_cols = labeled_dataset["feature_columns"]
+        meta = labeled_dataset["meta"]
+
+        if frame["state"].isna().all():
+            raise ModelTrainingError("No market-state labels available. Train the clustering model first.")
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** All NaN states may indicate a deeper issue with clustering or feature engineering.

Log diagnostic details to identify whether the issue originates from feature computation or clustering.

```suggestion
        if frame["state"].isna().all():
            # Diagnostic logging for feature engineering and clustering issues
            import logging
            logger = logging.getLogger(__name__)

            num_rows = len(frame)
            num_state_nans = frame["state"].isna().sum()
            feature_nan_counts = frame[feature_cols].isna().sum()
            feature_nan_summary = feature_nan_counts.to_dict()
            logger.error(
                "All market-state labels are NaN. Diagnostics: "
                f"num_rows={num_rows}, num_state_nans={num_state_nans}, "
                f"feature_nan_summary={feature_nan_summary}"
            )
            logger.error(
                "Feature columns summary statistics:\n%s",
                frame[feature_cols].describe(include='all').to_string()
            )
            raise ModelTrainingError("No market-state labels available. Train the clustering model first.")
```
</issue_to_address>

### Comment 5
<location> `README.md:42` </location>
<code_context>
 - Time range analysis
 - Clean CLI interface with detailed logging
 - Cross-platform support (Windows & Linux)
+- Trainable K-Means clustering to classify market states (Bullish/Bearish/Sideway)
+- RandomForest classifier to forecast the next market state

 ## Installation
</code_context>

<issue_to_address>
**issue (typo):** Replace 'Sideway' with 'Sideways' for correct terminology.

Update all instances of 'Sideway' to 'Sideways' for accuracy and consistency.

```suggestion
- Trainable K-Means clustering to classify market states (Bullish/Bearish/Sideways)
```
</issue_to_address>

### Comment 6
<location> `README.md:162` </location>
<code_context>
+./scripts/linux/run forecast -c <CRYPTO>
+```
+
+> Recommended workflow: fetch data with `dataset`, train the clustering model once with `train`, optionally train the classifier with `train-classifier`, then re-run `market` (current state) and `forecast` (next state) whenever new klines are pulled. Xem thêm hướng dẫn chi tiết tại `docs/market_state_guide.md`.
+
 ### Command Options
</code_context>

<issue_to_address>
**issue:** Translate the Vietnamese sentence to English for consistency.

Please translate the Vietnamese sentence to English to match the rest of the documentation and improve accessibility.
</issue_to_address>

### Comment 7
<location> `src/commands/forecast.py:18-27` </location>
<code_context>
def forecast_command(crypto: str = typer.Option(..., "--crypto", "-c", help="Crypto symbol to forecast (e.g., BTC)")):
    logger = logging.getLogger(__name__)

    try:
        service = MarketClassifierService()
        result = service.forecast_next_state(crypto)

        logger.info("Forecast for %s (%s)", result["symbol"], result["interval"])
        logger.info("=" * 50)
        logger.info("Prediction timestamp: %s", result["prediction_timestamp"])
        logger.info("Predicted next state: %s", result["predicted_state"])
        if result["state_probabilities"]:
            logger.info("Probabilities:\n%s", json.dumps(result["state_probabilities"], indent=2))
        logger.info("Model: %s", result["model_path"])

    except (
        ClassifierModelNotFoundError,
        ClassifierTrainingError,
        MarketModelNotFoundError,
        ModelTrainingError,
        KlineNotFoundError,
        ValueError,
    ) as exc:
        logger.error(f"Forecast failed: {exc}")
    except Exception as exc:
        logger.error(f"Unexpected error: {exc}")

</code_context>

<issue_to_address>
**issue (code-quality):** Extract code out into function ([`extract-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-method/))
</issue_to_address>

### Comment 8
<location> `src/commands/market.py:18` </location>
<code_context>
def market_command(
    crypto: str = typer.Option(..., "--crypto", "-c", help="Crypto symbol to analyze (e.g., BTC)"),
    show_history: bool = typer.Option(
        False, "--show-history", help="Print cluster distribution in JSON for deeper analysis"
    ),
):
    logger = logging.getLogger(__name__)

    try:
        service = MarketStateService()
        summary = service.predict_market_state(crypto)

        logger.info("Market state for %s (%s)", summary["symbol"], summary["interval"])
        logger.info("=" * 50)
        latest = summary["latest_state"]
        logger.info("Latest timestamp: %s", latest["timestamp"])
        logger.info("Closing price: $%s", f"{latest['close']:,.2f}")
        logger.info("Cluster %d → %s", latest["cluster"], latest["state"])

        logger.info("Feature snapshot:")
        for name, value in latest["features"].items():
            logger.info("  %s: %.6f", name, value)

        logger.info("State distribution:")
        for state, count in summary["state_distribution"].items():
            logger.info("  %s: %d", state, count)

        if show_history:
            logger.info("Full distribution JSON:\n%s", json.dumps(summary["state_distribution"], indent=2))

    except (MarketModelNotFoundError, ModelTrainingError, KlineNotFoundError, ValueError) as exc:
        logger.error(f"Market analysis failed: {exc}")
    except Exception as exc:
        logger.error(f"Unexpected error: {exc}")

</code_context>

<issue_to_address>
**issue (code-quality):** Extract code out into function ([`extract-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-method/))
</issue_to_address>

### Comment 9
<location> `src/service/market_state_service.py:68` </location>
<code_context>
    def _load_dataframe(self, crypto_name: str) -> Dict[str, Any]:
        data = self.kline_service.get_kline_data(crypto_name)
        columns = [
            "open_time",
            "open",
            "high",
            "low",
            "close",
            "volume",
            "close_time",
            "quote_asset_volume",
            "trade_count",
            "taker_buy_base",
            "taker_buy_quote",
            "ignore",
        ]
        frame = pd.DataFrame(data["klines"], columns=columns)
        numeric_cols = [
            "open",
            "high",
            "low",
            "close",
            "volume",
            "quote_asset_volume",
            "taker_buy_base",
            "taker_buy_quote",
        ]
        for col in numeric_cols:
            frame[col] = frame[col].astype(float)

        frame["open_time"] = pd.to_datetime(frame["open_time"], unit="ms")
        frame["close_time"] = pd.to_datetime(frame["close_time"], unit="ms")
        frame.sort_values("open_time", inplace=True)
        frame.reset_index(drop=True, inplace=True)
        return {"data": data, "frame": frame}

</code_context>

<issue_to_address>
**suggestion (code-quality):** Don't use `inplace` for methods that always create a copy under the hood ([`pandas-avoid-inplace`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/pandas-avoid-inplace/))

```suggestion
        frame = frame.sort_values("open_time")
```

<br/><details><summary>Explanation</summary>Some `DataFrame` methods can never operate inplace. Their operation (like reordering rows) requires copying, so they create a copy even if you provide `inplace=True`.

For these methods, `inplace` doesn't bring a performance gain.

It's only a "syntactic sugar for reassigning the new result to the calling DataFrame/Series."
[PDEP-8](https://github.com/pandas-dev/pandas/pull/51466) suggests to deprecate the `inplace` option for these methods.

Best practice: Explicitly reassign the result to the caller `DataFrame`.

E.g.

```python
df = df.sort_values("language")
```</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2025-11-16T11:10:47Z

src/service/market_state_service.py

+        sample_count = len(X)
+        if sample_count < 2:
+            raise ModelTrainingError("Need at least two samples to train KMeans.")


suggestion: ModelTrainingError for sample_count < 2 may not be sufficient for clustering.

Increase the minimum sample requirement to ensure KMeans produces stable, non-trivial clusters.

Suggested change

sample_count = len(X)

if sample_count < 2:

raise ModelTrainingError("Need at least two samples to train KMeans.")

sample_count = len(X)

min_required_samples = max(6, min_clusters * 2)

if sample_count < min_required_samples:

raise ModelTrainingError(

f"Need at least {min_required_samples} samples to train KMeans (got {sample_count})."

)

sourcery-ai · 2025-11-16T11:10:47Z

src/service/market_state_service.py

+        best_k = None
+        best_score = -1.0
+        for k in range(min_clusters, max_clusters + 1):
+            if sample_count <= k:
+                break
+            model = KMeans(n_clusters=k, n_init=10, random_state=42)
+            labels = model.fit_predict(X)
+            if len(set(labels)) == 1:
+                continue
+            score = silhouette_score(X, labels)
+            if score > best_score:
+                best_score = score
+                best_k = k
+
+        if best_k is None:
+            raise ModelTrainingError("Unable to determine an appropriate number of clusters.")
+
+        return best_k


suggestion: Silhouette score may be unreliable for small sample sizes.

For small datasets, consider implementing a minimum sample size check or alternative logic to avoid unreliable silhouette scores.

Suggested change

best_k = None

best_score = -1.0

for k in range(min_clusters, max_clusters + 1):

if sample_count <= k:

break

model = KMeans(n_clusters=k, n_init=10, random_state=42)

labels = model.fit_predict(X)

if len(set(labels)) == 1:

continue

score = silhouette_score(X, labels)

if score > best_score:

best_score = score

best_k = k

if best_k is None:

raise ModelTrainingError("Unable to determine an appropriate number of clusters.")

return best_k

MIN_SILHOUETTE_SAMPLES = 10

best_k = None

best_score = -1.0

if sample_count < MIN_SILHOUETTE_SAMPLES:

# For small datasets, avoid silhouette_score and use min_clusters or raise an error

if sample_count <= min_clusters:

raise ModelTrainingError("Not enough samples for the minimum number of clusters.")

return min_clusters

for k in range(min_clusters, max_clusters + 1):

if sample_count <= k:

break

model = KMeans(n_clusters=k, n_init=10, random_state=42)

labels = model.fit_predict(X)

if len(set(labels)) == 1:

continue

score = silhouette_score(X, labels)

if score > best_score:

best_score = score

best_k = k

if best_k is None:

raise ModelTrainingError("Unable to determine an appropriate number of clusters.")

return best_k

sourcery-ai · 2025-11-16T11:10:47Z

src/service/market_classifier_service.py

+        scaler = StandardScaler()
+        X_scaled = scaler.fit_transform(X)
+
+        stratify = y if len(np.unique(y)) > 1 else None
+        X_train, X_test, y_train, y_test = train_test_split(
+            X_scaled, y, test_size=test_size, random_state=random_state, stratify=stratify
+        )


suggestion: Stratification may fail if class distribution is highly imbalanced.

Handle potential exceptions from train_test_split when stratification fails, or add a warning for cases where stratification cannot be performed.

Suggested change

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

stratify = y if len(np.unique(y)) > 1 else None

X_train, X_test, y_train, y_test = train_test_split(

X_scaled, y, test_size=test_size, random_state=random_state, stratify=stratify

)

import warnings

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

stratify = y if len(np.unique(y)) > 1 else None

try:

X_train, X_test, y_train, y_test = train_test_split(

X_scaled, y, test_size=test_size, random_state=random_state, stratify=stratify

)

except ValueError as e:

warnings.warn(

f"Stratified train_test_split failed due to: {e}. Proceeding without stratification.",

UserWarning

)

X_train, X_test, y_train, y_test = train_test_split(

X_scaled, y, test_size=test_size, random_state=random_state, stratify=None

)

sourcery-ai · 2025-11-16T11:10:47Z

src/service/market_classifier_service.py

+        if frame["state"].isna().all():
+            raise ModelTrainingError("No market-state labels available. Train the clustering model first.")


suggestion (bug_risk): All NaN states may indicate a deeper issue with clustering or feature engineering.

Log diagnostic details to identify whether the issue originates from feature computation or clustering.

Suggested change

if frame["state"].isna().all():

raise ModelTrainingError("No market-state labels available. Train the clustering model first.")

if frame["state"].isna().all():

# Diagnostic logging for feature engineering and clustering issues

import logging

logger = logging.getLogger(__name__)

num_rows = len(frame)

num_state_nans = frame["state"].isna().sum()

feature_nan_counts = frame[feature_cols].isna().sum()

feature_nan_summary = feature_nan_counts.to_dict()

logger.error(

"All market-state labels are NaN. Diagnostics: "

f"num_rows={num_rows}, num_state_nans={num_state_nans}, "

f"feature_nan_summary={feature_nan_summary}"

)

logger.error(

"Feature columns summary statistics:\n%s",

frame[feature_cols].describe(include='all').to_string()

)

raise ModelTrainingError("No market-state labels available. Train the clustering model first.")

sourcery-ai · 2025-11-16T11:10:47Z

README.md

 - Time range analysis
 - Clean CLI interface with detailed logging
 - Cross-platform support (Windows & Linux)
+- Trainable K-Means clustering to classify market states (Bullish/Bearish/Sideway)


issue (typo): Replace 'Sideway' with 'Sideways' for correct terminology.

Update all instances of 'Sideway' to 'Sideways' for accuracy and consistency.

Suggested change

- Trainable K-Means clustering to classify market states (Bullish/Bearish/Sideway)

- Trainable K-Means clustering to classify market states (Bullish/Bearish/Sideways)

sourcery-ai · 2025-11-16T11:10:47Z

README.md

+./scripts/linux/run forecast -c <CRYPTO>
+```
+
+> Recommended workflow: fetch data with `dataset`, train the clustering model once with `train`, optionally train the classifier with `train-classifier`, then re-run `market` (current state) and `forecast` (next state) whenever new klines are pulled. Xem thêm hướng dẫn chi tiết tại `docs/market_state_guide.md`.


issue: Translate the Vietnamese sentence to English for consistency.

Please translate the Vietnamese sentence to English to match the rest of the documentation and improve accessibility.

sourcery-ai · 2025-11-16T11:10:47Z

src/commands/forecast.py

+        service = MarketClassifierService()
+        result = service.forecast_next_state(crypto)
+
+        logger.info("Forecast for %s (%s)", result["symbol"], result["interval"])
+        logger.info("=" * 50)
+        logger.info("Prediction timestamp: %s", result["prediction_timestamp"])
+        logger.info("Predicted next state: %s", result["predicted_state"])
+        if result["state_probabilities"]:
+            logger.info("Probabilities:\n%s", json.dumps(result["state_probabilities"], indent=2))
+        logger.info("Model: %s", result["model_path"])


issue (code-quality): Extract code out into function (extract-method)

src/commands/market.py

src/service/market_state_service.py

TKira4 added 2 commits November 16, 2025 17:03

sourcery-ai bot reviewed Nov 16, 2025

View reviewed changes

Merge branch 'main' into train

7931aaa

wwenrr merged commit bf55216 into main Nov 16, 2025
1 of 3 checks passed

		if frame["state"].isna().all():
		raise ModelTrainingError("No market-state labels available. Train the clustering model first.")

	- Trainable K-Means clustering to classify market states (Bullish/Bearish/Sideway)
	- Trainable K-Means clustering to classify market states (Bullish/Bearish/Sideways)

Train #3

Train #3

Uh oh!

Conversation

TKira4 commented Nov 16, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Nov 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for training and using market state models

Sequence diagram for classifier training and forecasting

Class diagram for new and updated market state and classifier services

Class diagram for new CLI commands

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Nov 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

TKira4 commented Nov 16, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Nov 16, 2025 •

edited

Loading