/data

5 datasets, 5 Snake models, 1 product result

The 5 Datasets

Dataset	Classes	Best Acc.	Samples	Drives
scoring	A (hot) / B / C / D (lost)	93.3%	1,500	Prospect priority + KPIs
objection	Prix / Delai / Qualite / Relation / Inertie / Concurrent	100%	1,000	Message strategy
action	Relancer / Attendre / Contre-offre / Escalade / Abandonner	93.3%	1,500	Decision: what to do
canal	Email / Telephone / Visite / LinkedIn	95.6%	1,700	Which channel to use
timing	Immediat / Sous_3j / Sous_7j / Attendre_signal	95.6%	1,700	When to make contact

All 5 share the same 21 features extracted from CRM data: segment, ca_potentiel, jours_sans_contact, has_concurrent, concurrent, phase, nb_signaux_positifs, nb_signaux_negatifs, has_visite, has_devis, has_volume_eleve, has_besoin_precis, has_objection_prix, has_non_reponse, objection_type, gravite, niveau, accessibilite, tone, nb_echanges, has_relance_recente.
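As a minimal sketch, the shared feature set can be written down once and used to check a row before it reaches any model. The names are copied from the list above; the check_features helper and the dict-shaped row are assumptions for illustration, not part of the product.

```python
# The 21 feature names, copied from the list above.
FEATURES = (
    "segment", "ca_potentiel", "jours_sans_contact", "has_concurrent",
    "concurrent", "phase", "nb_signaux_positifs", "nb_signaux_negatifs",
    "has_visite", "has_devis", "has_volume_eleve", "has_besoin_precis",
    "has_objection_prix", "has_non_reponse", "objection_type", "gravite",
    "niveau", "accessibilite", "tone", "nb_echanges", "has_relance_recente",
)


def check_features(row: dict) -> list:
    """Hypothetical helper: list missing or unexpected keys (empty list = usable row)."""
    missing = [name for name in FEATURES if name not in row]
    extra = [name for name in row if name not in FEATURES]
    return [f"missing: {m}" for m in missing] + [f"unexpected: {e}" for e in extra]


# A complete row passes; a row with a stray key is flagged.
complete_row = {name: None for name in FEATURES}
print(check_features(complete_row))
print(check_features({**complete_row, "foo": 1}))
```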

How Each Dataset Drives the Product

When a sales rep sends a text to /comprendre, the 5 models each run on the same 21 extracted features. Each one answers a different question. Together, they compose the full recommendation.

The Pipeline

Input: "Verrerie du Nord, trop cher, 3 semaines sans nouvelles"
                        |
                 extract 21 features
                        |
        +-------+-------+-------+-------+
        |       |       |       |       |
    scoring  objection  action   canal   timing
     C 87%   Prix 100%  Contre   Email   Immediat
        |       |      offre 99%  100%    70%
        |       |       |       |       |
        +-------+-------+-------+-------+
                        |
                compose response
                        |
   scoring ---> KPIs (P(closing)=20%, expected_value=10K)
   scoring ---> trust_score (snake_confidence component)
   objection -> message strategy ("concession partielle")
   objection -> xai ("Prix 1.00: trop cher vs GlassTech")
   action ----> message template (counter-offer structure)
   action ----> xai ("Contre-offre 0.99: objection traitable")
   canal -----> message tone (email = formal, phone = direct)
   canal -----> xai ("email prioritaire: 21j silence")
   timing ----> urgency flag ("Immediat: risque perte")
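The fan-out in the diagram can be sketched as follows. This is an illustration under assumptions, not the product's code: run_models and the stub models are hypothetical, and each model is assumed to be a callable returning a label-to-probability dict. The Prediction and Probability keys follow the field names used on this page. The top probabilities in the stubs mirror the example above; the secondary probabilities are made up so each distribution sums to 1.

```python
from typing import Callable, Dict

Features = Dict[str, object]
Distribution = Dict[str, float]            # class label -> probability
Model = Callable[[Features], Distribution]

MODEL_NAMES = ("scoring", "objection", "action", "canal", "timing")


def run_models(features: Features, models: Dict[str, Model]) -> Dict[str, dict]:
    """Run every model on the same feature dict; no model sees another's output."""
    results = {}
    for name in MODEL_NAMES:
        probs = models[name](features)
        label = max(probs, key=probs.get)  # most probable class
        results[name] = {"Prediction": label, "Probability": probs}
    return results


# Stub models (assumption): fixed distributions standing in for trained models.
stubs = {
    "scoring": lambda f: {"C": 0.87, "D": 0.13},
    "objection": lambda f: {"Prix": 1.0},
    "action": lambda f: {"Contre-offre": 0.99, "Relancer": 0.01},
    "canal": lambda f: {"Email": 1.0},
    "timing": lambda f: {"Immediat": 0.70, "Sous_3j": 0.30},
}

out = run_models({"jours_sans_contact": 21}, stubs)
print({name: result["Prediction"] for name, result in out.items()})
```

Because every model receives the same features dict, the loop could be replaced by parallel calls without changing the result.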

Dataset → Product Field Mapping

scoring → Who is hot, who is cold

Scoring is the priority filter: an A prospect moves ahead of a B in the sales rep's queue. Scoring also feeds the KPIs: probabilite_closing (A=70%, B=48%, C=20%, D=5%) and expected_value (potential revenue, ca_potentiel, × P(closing)).

Produced by scoring:
scoring.Prediction → A/B/C/D
scoring.Probability → distribution
scoring.evolution → "A→B (degrade)"
kpis.probabilite_closing
kpis.expected_value
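A minimal sketch of the two KPIs, using the closing probabilities given in this section. The compute_kpis function is hypothetical, and the 50,000 potential revenue is inferred from the pipeline example (expected_value of 10K at P(closing)=20%), not stated in the source.

```python
# Closing probability per scoring class, as given in this section.
P_CLOSING = {"A": 0.70, "B": 0.48, "C": 0.20, "D": 0.05}


def compute_kpis(scoring_prediction: str, ca_potentiel: float) -> dict:
    """Hypothetical helper: derive the two KPIs from the scoring class."""
    p = P_CLOSING[scoring_prediction]
    return {"probabilite_closing": p, "expected_value": ca_potentiel * p}


# Class C with an assumed potential revenue of 50,000.
kpis = compute_kpis("C", 50_000)
print(kpis)
```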

Training data:
• phase_cycle_vente (heavy weight)
• jours_sans_contact (thresholds: 14/30/45/60)
• nb_signaux_positifs vs nb_signaux_negatifs
• gravite of the objection
• tone of the last exchange
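The jours_sans_contact thresholds can be turned into a bucket index with the standard bisect module. This is only a sketch: the source lists the thresholds but not how they are applied, so the silence_bucket name and the choice to place a value equal to a threshold in the higher bucket are assumptions.

```python
from bisect import bisect_right

# Thresholds in days, from the list above.
THRESHOLDS = (14, 30, 45, 60)


def silence_bucket(jours_sans_contact: int) -> int:
    """Return 0 for under 14 days, 1 for 14-29, 2 for 30-44, 3 for 45-59, 4 for 60+."""
    # Assumption: a value equal to a threshold falls in the higher bucket.
    return bisect_right(THRESHOLDS, jours_sans_contact)


# 21 days of silence, as in the pipeline example, lands in bucket 1.
print(silence_bucket(21))
```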

objection → What we are up against

The objection dictates the message strategy: Prix (price) calls for a partial concession, Concurrent (competitor) for repositioning on value, Inertie (inertia) for a change of channel. The objection type is the number-one factor in choosing the action.

Produced by objection:
objection.Prediction → type
objection.detail → context
objection.sous_type → benchmark_concurrent
message_genere.strategie
xai.objection_audit
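The objection-to-strategy step reads like a lookup, sketched below. Only the three mappings named in this section are included; the strategies for Delai, Qualite and Relation are not given here, so the sketch returns None for them instead of guessing. The names STRATEGIE_BY_OBJECTION and pick_strategie are hypothetical, and the French strategy strings are assumed to be the values the product emits, as with "concession partielle" in the pipeline example.

```python
from typing import Optional

# Only the mappings stated in this section; other classes are intentionally absent.
STRATEGIE_BY_OBJECTION = {
    "Prix": "concession partielle",           # partial concession
    "Concurrent": "repositionnement valeur",  # reposition on value
    "Inertie": "changer de canal",            # switch channel
}


def pick_strategie(objection_prediction: str) -> Optional[str]:
    """Return the message strategy for an objection class, or None if unspecified."""
    return STRATEGIE_BY_OBJECTION.get(objection_prediction)


print(pick_strategie("Prix"))
print(pick_strategie("Delai"))
```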

Training data:
• objection_type (dominant feature)
• has_objection_prix
• has_concurrent
• gravite
• 100% accuracy = close to a lookup, since objection_type is itself a feature

action → What to do now

Action is the core of the decision. Its training labels build on the scoring label (is the prospect worth the effort?) and the objection label (what needs countering?); at inference it reads the same 21 features as the other models. It determines the structure of the generated message: counter-offer, follow-up, escalation, or disengagement.

Produced by action:
action.Prediction → decision
message_genere.corps (structure)
message_genere.contraintes_snake
xai.action_audit

Depends on:
• scoring (D + small ca_potentiel = Abandonner)
• objection (treatable price objection = Contre-offre)
• gravite × ca_potentiel (Escalade when a large deal is blocked)
• jours_sans_contact × scoring

canal → Which channel to use

The channel sets the tone and format of the message. Email = formal, detailed. Telephone = short, direct. LinkedIn = prospecting, light. Visite (in-person visit) = closing. The best channel depends on the contact's accessibilite and on the tone of the relationship.

Produced by canal:
canal.Prediction → channel
canal.raison → why this channel
message_genere (adapted tone)

Training data:
• accessibilite (difficile → Email)
• tone (positif → Telephone/Visite)
• niveau (inconnu → LinkedIn)
• jours_sans_contact × has_non_reponse
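The three explicit rules in the list above could be expressed as a rule-based label function along these lines. This is a hedged sketch, not the actual generator: the canal_label name, the order in which rules are tested, the use of has_visite to split Telephone from Visite, and the Email fallback are all assumptions. The jours_sans_contact × has_non_reponse interaction is left out because the source does not say how it resolves.

```python
def canal_label(row: dict) -> str:
    """Hypothetical label function covering only the rules listed above."""
    if row.get("niveau") == "inconnu":
        return "LinkedIn"
    if row.get("accessibilite") == "difficile":
        return "Email"
    if row.get("tone") == "positif":
        # The source says Telephone or Visite; splitting on has_visite is an assumption.
        return "Visite" if row.get("has_visite") else "Telephone"
    return "Email"  # assumed default when no listed rule applies


print(canal_label({"niveau": "inconnu"}))
print(canal_label({"niveau": "directeur", "accessibilite": "difficile"}))
print(canal_label({"niveau": "directeur", "tone": "positif", "has_visite": False}))
```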

timing → When to act

Timing turns the action into a level of urgency. The same action (Relancer) plays out very differently as Immediat versus Sous_7j. A competitor mentioned plus a prolonged silence means every day counts. A cold prospect after a refusal means waiting for a signal.

Produced by timing:
timing.Prediction → when
timing.raison → why now
kpis.jours_avant_perte_estimee

Training data:
• jours_sans_contact (dominant)
• has_concurrent (speeds things up)
• scoring (D → Attendre_signal)
• gravite × tone

The Interaction Graph

The 5 models are independent at inference (each takes the same 21 features), but the action dataset's label function depends on the scoring and objection labels. This creates an implicit dependency graph:

  features (21)
      |
      +---> scoring ----+
      |                  |
      +---> objection --+---> action ---> message structure
      |                  |
      +---> canal ------+---> message tone
      |                  |
      +---> timing -----+---> urgency

At inference, the 5 models run in parallel on the same features, with no chaining. The training data, however, was generated with the action label depending on the scoring and objection labels, so action's accuracy is bounded by the quality of those two label functions: if they are wrong in training, action inherits the error.
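The inherited-error effect can be illustrated with a toy label generator. Everything here is a sketch under assumptions: action_label and build_action_example are hypothetical names, the revenue thresholds and the 1-5 gravite scale are invented, and the rules only loosely follow the "Depends on" list in the action section.

```python
def action_label(scoring: str, objection: str, row: dict) -> str:
    """Hypothetical action label function built on the other two labels."""
    small_deal = row["ca_potentiel"] < 10_000    # assumed threshold
    big_deal = row["ca_potentiel"] >= 100_000    # assumed threshold
    if scoring == "D" and small_deal:
        return "Abandonner"
    if big_deal and row["gravite"] >= 4:         # assumed 1-5 gravite scale
        return "Escalade"
    if objection == "Prix":
        return "Contre-offre"
    return "Relancer"                            # assumed default


def build_action_example(row: dict, scoring_label_fn, objection_label_fn) -> dict:
    """Label one synthetic row; the action label is derived from the other two labels."""
    scoring = scoring_label_fn(row)
    objection = objection_label_fn(row)
    return {**row, "action": action_label(scoring, objection, row)}


row = {"ca_potentiel": 5_000, "gravite": 2}
# Same row, two scoring label functions: a correct one and one that mislabels D as C.
good = build_action_example(row, lambda r: "D", lambda r: "Prix")
bad = build_action_example(row, lambda r: "C", lambda r: "Prix")
print(good["action"], bad["action"])
```

The features are identical in both cases; only the scoring label function differs, and the action label changes with it. A model trained on the second dataset would learn the wrong action for this row.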

Synthetic vs Real — The Gap

Dimension	Synthetic (now)	Real CRM data (goal)
Labels	Deterministic rules	Human judgement + outcomes
Features	21 clean fields	21 fields + noise, nulls, typos
Distribution	Uniform random + edge cases	Power law (80% Prix, long tail)
Objection	1:1 feature-to-label	Implicit, mixed, ambiguous
Volume	800-1,700 / model	Need 300-500 labeled for production
Class balance	Attendre 1.4%, Visite 0.6%	Would be 10-15% each in reality

The synthetic models show that the pipeline works end-to-end and that the architecture is sound. What's missing is ground truth from real sales activity: Salesforce exports, call notes, win/loss post-mortems. Per the table above, 300-500 labeled interactions per model is the target for production quality.

Trust Score & Data Quality

The trust_score in every /comprendre response reports how much signal the system had to work with:

"trust_score": {
  "score": 85,                          // 0-100 overall
  "breakdown": {
    "extraction_quality": 100,          // did we find real data vs defaults
    "snake_confidence": 85,             // avg max prob across 5 models
    "input_richness": 90,               // how much signal in the text
    "method_bonus": 50                  // 50=keywords, 100=haiku
  }
}

snake_confidence is the direct link between data quality and product quality. When the models are uncertain (low max probability), the trust score drops, and the sales rep knows to verify manually. Better training data → higher confidence → higher trust → less manual work.
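A minimal sketch of the snake_confidence component, following the comment in the JSON above ("avg max prob across 5 models"). Scaling to 0-100 is assumed from the sample value, and the function and input shape are hypothetical. How the four breakdown components combine into the overall score is not stated here, so that step is not sketched. The top probabilities in the example reuse the pipeline example; the secondary probabilities are made up.

```python
def snake_confidence(distributions: dict) -> float:
    """Average of each model's highest class probability, on a 0-100 scale (assumed)."""
    top = [max(probs.values()) for probs in distributions.values()]
    return 100 * sum(top) / len(top)


example = {
    "scoring": {"C": 0.87, "D": 0.13},
    "objection": {"Prix": 1.0},
    "action": {"Contre-offre": 0.99, "Relancer": 0.01},
    "canal": {"Email": 1.0},
    "timing": {"Immediat": 0.70, "Sous_3j": 0.30},
}
print(round(snake_confidence(example), 1))
```

A single uncertain model, such as timing at 70% here, pulls the average down, which is how a weak dataset shows up in the trust score.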