/data

5 datasets, 5 Snake models, 1 product result

The 5 Datasets

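Dataset   | Best acc. | Samples | Drives                   | Classes
----------+-----------+---------+--------------------------+-----------------------------------------------------------
scoring   | 93.3%     | 1,500   | Prospect priority + KPIs | A (hot) / B / C / D (lost)
objection | 100%      | 1,000   | Message strategy         | Prix / Delai / Qualite / Relation / Inertie / Concurrent
action    | 93.3%     | 1,500   | Decision: what to do     | Relancer / Attendre / Contre-offre / Escalade / Abandonner
canal     | 95.6%     | 1,700   | Which channel to use     | Email / Telephone / Visite / LinkedIn
timing    | 95.6%     | 1,700   | When to make contact     | Immediat / Sous_3j / Sous_7j / Attendre_signal

Class labels are model outputs and stay in French: Prix = price, Delai = lead time, Qualite = quality, Relation = relationship, Inertie = inertia, Concurrent = competitor; Relancer = follow up, Attendre = wait, Contre-offre = counter-offer, Escalade = escalate, Abandonner = drop; Telephone = phone, Visite = on-site visit; Immediat = immediately, Sous_3j / Sous_7j = within 3 / 7 days, Attendre_signal = wait for a signal.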

All 5 share the same 21 features extracted from CRM data: segment, ca_potentiel, jours_sans_contact, has_concurrent, concurrent, phase, nb_signaux_positifs, nb_signaux_negatifs, has_visite, has_devis, has_volume_eleve, has_besoin_precis, has_objection_prix, has_non_reponse, objection_type, gravite, niveau, accessibilite, tone, nb_echanges, has_relance_recente.
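A minimal sketch of that shared feature record as a Python dataclass. The field names come from the list above; the types and default values are assumptions (no schema is given here), and `ProspectFeatures` is a hypothetical name.

```python
from dataclasses import dataclass, fields


@dataclass
class ProspectFeatures:
    """The 21 features shared by all 5 models.

    Field names come from the document; types and defaults are assumptions.
    """
    segment: str = "inconnu"
    ca_potentiel: float = 0.0          # potential revenue (CA)
    jours_sans_contact: int = 0        # days since last contact
    has_concurrent: bool = False
    concurrent: str = ""               # competitor name, e.g. "GlassTech"
    phase: str = "inconnu"             # sales-cycle phase
    nb_signaux_positifs: int = 0
    nb_signaux_negatifs: int = 0
    has_visite: bool = False
    has_devis: bool = False            # a quote has been sent
    has_volume_eleve: bool = False
    has_besoin_precis: bool = False
    has_objection_prix: bool = False
    has_non_reponse: bool = False
    objection_type: str = ""
    gravite: int = 0                   # objection severity (scale assumed)
    niveau: str = "inconnu"            # contact's level
    accessibilite: str = "normale"
    tone: str = "neutre"
    nb_echanges: int = 0
    has_relance_recente: bool = False


FEATURE_NAMES = [f.name for f in fields(ProspectFeatures)]

# The pipeline example prospect, partially filled in (ca_potentiel is assumed).
example = ProspectFeatures(
    ca_potentiel=50_000, jours_sans_contact=21, has_concurrent=True,
    concurrent="GlassTech", has_objection_prix=True, objection_type="Prix",
)
```

Each of the 5 models would receive the same instance, which is what lets them run independently of one another.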

How Each Dataset Drives the Product

When a salesperson sends a text to /comprendre, the 21 features are extracted once and the 5 models run on them in parallel. Each model answers a different question; together, their outputs compose the full recommendation.

The Pipeline

Input: "Verrerie du Nord, trop cher, 3 semaines sans nouvelles"
       ("Verrerie du Nord, too expensive, 3 weeks without news")
                                |
                       extract 21 features
                                |
      +------------+------------+------------+------------+
      |            |            |            |            |
   scoring     objection     action        canal       timing
    C 87%      Prix 100%  Contre-offre     Email      Immediat
                              99%          100%         70%
      |            |            |            |            |
      +------------+------------+------------+------------+
                                |
                        compose response
                                |
   scoring ---> KPIs (P(closing)=20%, expected_value=10K)
   scoring ---> trust_score (snake_confidence component)
   objection -> message strategy ("concession partielle" = partial concession)
   objection -> xai ("Prix 1.00: too expensive vs GlassTech")
   action ----> message template (counter-offer structure)
   action ----> xai ("Contre-offre 0.99: objection can be handled")
   canal -----> message tone (email = formal, phone = direct)
   canal -----> xai ("email first: 21 days of silence")
   timing ----> urgency flag ("Immediat: risk of losing the deal")
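The inference step in the diagram can be sketched as below, assuming each model is a plain callable that returns a (label, max probability) pair. `run_models`, `compose_response` and the stub models are hypothetical; the stubs simply return the example predictions. What the sketch shows is that every model receives the same features and none sees another model's output.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

Prediction = Tuple[str, float]            # (label, max probability)
Model = Callable[[dict], Prediction]


def run_models(models: Dict[str, Model], features: dict) -> Dict[str, Prediction]:
    """Run every model on the same features; no model sees another's output."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(model, features) for name, model in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


def compose_response(preds: Dict[str, Prediction]) -> dict:
    """Hypothetical composition: a few of the routes drawn in the diagram."""
    return {
        "scoring": preds["scoring"][0],
        "objection": preds["objection"][0],
        "structure": preds["action"][0],      # drives the message template
        "canal": preds["canal"][0],           # drives the message tone
        "urgent": preds["timing"][0] == "Immediat",
    }


# Stub models returning the example predictions (not the real Snake models).
stubs: Dict[str, Model] = {
    "scoring": lambda f: ("C", 0.87),
    "objection": lambda f: ("Prix", 1.00),
    "action": lambda f: ("Contre-offre", 0.99),
    "canal": lambda f: ("Email", 1.00),
    "timing": lambda f: ("Immediat", 0.70),
}
features = {"jours_sans_contact": 21, "has_objection_prix": True}
preds = run_models(stubs, features)
response = compose_response(preds)
print(response)
```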

Dataset → Product Field Mapping

scoring → Who is hot, who is cold

Scoring is the priority filter: an A prospect goes ahead of a B in the salesperson's queue. Scoring also feeds the KPIs: probabilite_closing (A=70%, B=48%, C=20%, D=5%) and expected_value (CA × P(closing), where CA is the potential revenue, ca_potentiel).

Produced by scoring:
scoring.Prediction → A/B/C/D
scoring.Probability → distribution
scoring.evolution → "A→B (degrade)", i.e. downgraded
kpis.probabilite_closing
kpis.expected_value
Training data:
• phase_cycle_vente (high weight)
• jours_sans_contact (thresholds: 14/30/45/60 days)
• nb_signaux_positifs vs nb_signaux_negatifs
• gravite (severity) of the objection
• tone of the last exchange
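A sketch of the two KPIs and of queue ordering, using the closing probabilities quoted above. The function names and the prospect names are hypothetical, and the example assumes a ca_potentiel of 50,000, since the text gives only the resulting expected_value of 10K for a C prospect.

```python
# Closing probability per scoring class, as quoted in the text.
PROBABILITE_CLOSING = {"A": 0.70, "B": 0.48, "C": 0.20, "D": 0.05}


def kpis(scoring: str, ca_potentiel: float) -> dict:
    """probabilite_closing and expected_value = CA x P(closing)."""
    p = PROBABILITE_CLOSING[scoring]
    return {"probabilite_closing": p, "expected_value": ca_potentiel * p}


def prioritise(prospects: list) -> list:
    """Queue order: A before B before C before D, then higher expected value first."""
    return sorted(
        prospects,
        key=lambda pr: (pr["scoring"], -kpis(pr["scoring"], pr["ca_potentiel"])["expected_value"]),
    )


# Assumed ca_potentiel of 50,000 for the C prospect of the pipeline example.
print(kpis("C", 50_000))
queue = prioritise([
    {"name": "Verrerie du Nord", "scoring": "C", "ca_potentiel": 50_000},
    {"name": "prospect_b", "scoring": "B", "ca_potentiel": 10_000},
    {"name": "prospect_a", "scoring": "A", "ca_potentiel": 5_000},
])
print([p["name"] for p in queue])
```

Sorting on the class letter first keeps the A/B/C/D priority strict; expected value only breaks ties within a class.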

objection → What we are up against

The objection dictates the message strategy: Prix = partial concession, Concurrent = value repositioning, Inertie = switch channel. The objection type is the #1 factor in choosing the action.

Produced by objection:
objection.Prediction → type
objection.detail → context
objection.sous_type → benchmark_concurrent
message_genere.strategie
xai.objection_audit
Training data:
• objection_type (dominant feature)
• has_objection_prix
• has_concurrent
• gravite
• 100% accuracy = almost a lookup
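The objection-to-strategy routing and the "almost a lookup" remark can be sketched as a dictionary plus a one-line label function. Only the Prix, Concurrent and Inertie strategies come from the text; Delai, Qualite and Relation are left unmapped because their strategies are not named here. Treating the synthetic label as a straight copy of objection_type is an assumption based on the 1:1 feature-to-label row in the Synthetic vs Real table.

```python
from typing import Optional

# Strategies named in the text; other objection types are deliberately unmapped.
STRATEGIE = {
    "Prix": "concession partielle",           # partial concession
    "Concurrent": "repositionnement valeur",  # value repositioning
    "Inertie": "changer de canal",            # switch channel
}


def synthetic_objection_label(features: dict) -> str:
    """Assumed synthetic label: a 1:1 copy of objection_type, hence ~100% accuracy."""
    return features["objection_type"]


def message_strategy(objection: str) -> Optional[str]:
    """Strategy for the message, or None when the text does not specify one."""
    return STRATEGIE.get(objection)


label = synthetic_objection_label({"objection_type": "Prix", "gravite": 1})
print(label, "->", message_strategy(label))
```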

action → What to do now

The action is the core of the decision. Its training labels are derived from the scoring label (is the prospect worth the effort?) and the objection label (what needs countering?). The action determines the structure of the generated message: counter-offer, follow-up, escalation, or disengagement.

Produced by action:
action.Prediction → decision
message_genere.corps (structure)
message_genere.contraintes_snake
xai.action_audit
Depends on:
• scoring (D + small CA = Abandonner)
• objection (a price objection that can be handled = Contre-offre)
• gravite × ca_potentiel (Escalade if a large deal is blocked)
• jours_sans_contact × scoring
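A hypothetical rule-based label function for the action dataset, written to mirror the dependencies listed above. The thresholds (`PETIT_CA`, `GROS_DEAL`, `GRAVITE_BLOQUANTE`, `SILENCE_JOURS`), the 0-3 gravite scale and the rule order are assumptions for illustration, not the generator actually used.

```python
# Hypothetical thresholds: the source names these interactions, not their values.
PETIT_CA = 20_000          # "small" potential revenue
GROS_DEAL = 100_000        # "large" deal
GRAVITE_BLOQUANTE = 3      # assumed gravite scale 0-3, 3 = blocking
SILENCE_JOURS = 14         # days of silence before a follow-up


def action_label(scoring: str, objection: str, f: dict) -> str:
    """Sketch of a rule-based training label for the action dataset.

    It takes the scoring and objection *labels* as inputs, not just the
    raw features, which is how the synthetic action labels are described.
    """
    ca, gravite = f["ca_potentiel"], f["gravite"]
    if scoring == "D" and ca < PETIT_CA:
        return "Abandonner"                     # D + small CA
    if gravite >= GRAVITE_BLOQUANTE and ca >= GROS_DEAL:
        return "Escalade"                       # large deal blocked
    if objection == "Prix" and gravite < GRAVITE_BLOQUANTE:
        return "Contre-offre"                   # price objection that can be handled
    if f["jours_sans_contact"] >= SILENCE_JOURS and scoring != "D":
        return "Relancer"                       # silence on a live prospect
    return "Attendre"


print(action_label("C", "Prix", {"ca_potentiel": 50_000, "gravite": 1, "jours_sans_contact": 21}))
```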

canal → Which way in

The channel sets the tone and format of the message. Email = formal, detailed. Telephone = short, direct. LinkedIn = prospecting, light touch. Visite = closing. The best channel depends on the contact's accessibilite and on the tone of the relationship.

Produced by canal:
canal.Prediction → medium
canal.raison → why this channel
message_genere (adapted tone)
Training data:
• accessibilite (difficile, i.e. hard to reach → Email)
• tone (positif → Telephone/Visite)
• niveau (inconnu, i.e. unknown contact level → LinkedIn)
• jours_sans_contact × has_non_reponse
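A hypothetical sketch of the canal rules and of the channel-to-format mapping described above. The feature values (difficile, inconnu, positif) follow the text; the rule order, the `has_devis` split between Telephone and Visite, and the 14-day non-response rule are assumptions.

```python
# Message format per channel, from the text above.
FORMAT = {
    "Email": "formal, detailed",
    "Telephone": "short, direct",
    "LinkedIn": "prospecting, light touch",
    "Visite": "closing",
}


def canal_label(f: dict) -> str:
    """Hypothetical canal rule; rule order and the non-response rule are assumptions."""
    if f["accessibilite"] == "difficile":
        return "Email"
    if f["niveau"] == "inconnu":
        return "LinkedIn"
    if f["tone"] == "positif":
        # Assumed split: a quote already sent (has_devis) justifies a visit.
        return "Visite" if f["has_devis"] else "Telephone"
    if f["has_non_reponse"] and f["jours_sans_contact"] >= 14:
        return "Telephone"              # assumed: switch medium after unanswered silence
    return "Email"


choix = canal_label({"accessibilite": "normale", "niveau": "inconnu", "tone": "neutre",
                     "has_devis": False, "has_non_reponse": False, "jours_sans_contact": 3})
print(choix, "->", FORMAT[choix])
```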

timing → When to act

Timing turns the action into a level of urgency. The same action (Relancer) means something very different with "Immediat" than with "Sous_7j". A competitor mentioned plus prolonged silence means every day counts. A cold prospect after a refusal means waiting for a signal.

Produced by timing:
timing.Prediction → when
timing.raison → why now
kpis.jours_avant_perte_estimee
Training data:
• jours_sans_contact (dominant)
• has_concurrent (accelerates)
• scoring (D → Attendre_signal)
• gravite × tone
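A hypothetical timing rule built from the drivers above. Reusing the 14- and 30-day cut-offs from the scoring thresholds, and the rule "gravite >= 2 with a negatif tone" (on an assumed 0-3 gravite scale), are assumptions. With the pipeline example (C prospect, 21 days of silence, competitor mentioned) it returns Immediat, the same label as the example prediction.

```python
def timing_label(scoring: str, f: dict) -> str:
    """Hypothetical timing rule; cut-offs and rule order are assumptions."""
    jours = f["jours_sans_contact"]
    if scoring == "D":
        return "Attendre_signal"                 # cold prospect: wait for a signal
    if f["has_concurrent"] and jours >= 14:
        return "Immediat"                        # competitor + silence: every day counts
    if f["gravite"] >= 2 and f["tone"] == "negatif":
        return "Immediat"                        # assumed gravite x tone interaction
    if jours >= 30:
        return "Sous_3j"
    return "Sous_7j"


# Pipeline example: C prospect, 21 days of silence, GlassTech mentioned.
print(timing_label("C", {"jours_sans_contact": 21, "has_concurrent": True,
                         "gravite": 1, "tone": "neutre"}))
```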

The Interaction Graph

The 5 models are independent at inference (each takes the same 21 features), but the action dataset's label function depends on the scoring and objection labels. This creates an implicit dependency graph:

  features (21)
      |
      +---> scoring ----+
      |                  |
      +---> objection --+---> action ---> message structure
      |                  |
      +---> canal ------+---> message tone
      |                  |
      +---> timing -----+---> urgence

At inference, the 5 models run in parallel on the same features, with no chaining. The training data, however, was generated with the action label computed from the scoring and objection labels. Action's accuracy is therefore bounded by the quality of the scoring and objection label functions: if those functions are wrong in training, action inherits the error.
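A small simulation of that inherited error. All three rules are hypothetical: a reference scoring rule that marks a prospect D after 60 days of silence, a buggy copy that uses 45 days, and an action rule that abandons D prospects with a small CA. The action rule itself is never wrong given its inputs, yet the action labels generated from the buggy scoring disagree with the reference labels on every row between 45 and 59 days.

```python
# All rules below are hypothetical; they only illustrate error propagation.
def scoring_reference(f: dict) -> str:
    return "D" if f["jours_sans_contact"] >= 60 else "B"


def scoring_buggy(f: dict) -> str:
    return "D" if f["jours_sans_contact"] >= 45 else "B"   # wrong threshold


def action_rule(scoring: str, f: dict) -> str:
    # Correct given its inputs: abandon D prospects with a small CA.
    return "Abandonner" if scoring == "D" and f["ca_potentiel"] < 20_000 else "Relancer"


rows = [{"jours_sans_contact": d, "ca_potentiel": 10_000} for d in range(0, 90, 5)]
reference = [action_rule(scoring_reference(f), f) for f in rows]
generated = [action_rule(scoring_buggy(f), f) for f in rows]

mismatches = sum(r != g for r, g in zip(reference, generated))
# Agreement with the reference of a model that reproduces `generated` exactly.
ceiling = 1 - mismatches / len(rows)
print(mismatches, len(rows), round(ceiling, 3))
```

With rows every 5 days from 0 to 85, 3 of the 18 labels disagree (45, 50 and 55 days), so an action model that fits its training labels perfectly still agrees with the reference on only 15 of 18 rows.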

Synthetic vs Real — The Gap

Dimension     | Synthetic (now)             | Real CRM data (goal)
--------------+-----------------------------+------------------------------------
Labels        | Deterministic rules         | Human judgement + outcomes
Features      | 21 clean fields             | 21 fields + noise, nulls, typos
Distribution  | Uniform random + edge cases | Power law (80% Prix, long tail)
Objection     | 1:1 feature-to-label        | Implicit, mixed, ambiguous
Volume        | 800-1,700 / model           | Need 300-500 labeled for production
Class balance | Attendre 1.4%, Visite 0.6%  | Would be 10-15% each in reality

The synthetic models show that the pipeline and its architecture work end to end. What is missing is ground truth from real commercial activity: Salesforce exports, call notes, win/loss post-mortems. Around 500 labeled interactions per model should get us to production quality.
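The class-balance row suggests checking each class's share before training. A minimal sketch with `collections.Counter`; the 5% cut-off is an assumption, and the label counts in the example are invented except that Attendre is set to 21 of 1,500 rows to reproduce the 1.4% figure.

```python
from collections import Counter
from typing import Dict, Iterable


def class_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of rows carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def rare_classes(labels: Iterable[str], min_share: float = 0.05) -> Dict[str, float]:
    """Labels whose share falls below min_share (an assumed cut-off)."""
    return {k: v for k, v in class_shares(labels).items() if v < min_share}


# Invented action labels: only Attendre's 21/1,500 (1.4%) mirrors the table.
labels = (["Relancer"] * 700 + ["Contre-offre"] * 500 + ["Escalade"] * 150
          + ["Abandonner"] * 129 + ["Attendre"] * 21)
print(rare_classes(labels))
```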

Trust Score & Data Quality

The trust_score in every /comprendre response reports how much signal the system had to work with:

"trust_score": {
  "score": 85,                          // 0-100 overall
  "breakdown": {
    "extraction_quality": 100,          // did we find real data vs defaults
    "snake_confidence": 85,             // avg max prob across 5 models
    "input_richness": 90,               // how much signal in the text
    "method_bonus": 50                  // 50=keywords, 100=haiku
  }
}

snake_confidence is the direct link between data quality and product quality. When the models are uncertain (low max probability), the trust score drops and the salesperson knows to verify manually. Better training data → higher confidence → higher trust → less manual work.
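A sketch of snake_confidence as described in the JSON comment (average of each model's max probability, on a 0-100 scale). The function names and the manual-check threshold of 70 are hypothetical. The example reuses the top-class probabilities from the pipeline example (87%, 100%, 99%, 100%, 70%); the rest of each distribution is made up so that it sums to 1. How snake_confidence is weighted into the overall score is not documented here, so the sketch stops at this component.

```python
from statistics import mean
from typing import Dict

Distribution = Dict[str, float]


def snake_confidence(distributions: Dict[str, Distribution]) -> int:
    """Average of each model's highest class probability, on a 0-100 scale."""
    return round(100 * mean(max(d.values()) for d in distributions.values()))


def needs_manual_check(confidence: int, threshold: int = 70) -> bool:
    """Hypothetical cut-off below which the salesperson should verify by hand."""
    return confidence < threshold


dists = {
    "scoring": {"C": 0.87, "B": 0.10, "D": 0.03},
    "objection": {"Prix": 1.0},
    "action": {"Contre-offre": 0.99, "Relancer": 0.01},
    "canal": {"Email": 1.0},
    "timing": {"Immediat": 0.70, "Sous_3j": 0.30},
}
conf = snake_confidence(dists)
print(conf, needs_manual_check(conf))
```

On these values the average max probability is 0.912, so the component comes out at 91.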