| Dataset | Classes | Best Acc. | Samples | Drives |
|---|---|---|---|---|
| scoring | A (hot) / B / C / D (lost) | 93.3% | 1,500 | Prospect priority + KPIs |
| objection | Prix / Delai / Qualite / Relation / Inertie / Concurrent | 100% | 1,000 | Message strategy |
| action | Relancer / Attendre / Contre-offre / Escalade / Abandonner | 93.3% | 1,500 | Decision: what to do |
| canal | Email / Telephone / Visite / LinkedIn | 95.6% | 1,700 | Which channel to use |
| timing | Immediat / Sous_3j / Sous_7j / Attendre_signal | 95.6% | 1,700 | When to contact |

All 5 share the same 21 features extracted from CRM data: segment, ca_potentiel, jours_sans_contact, has_concurrent, concurrent, phase, nb_signaux_positifs, nb_signaux_negatifs, has_visite, has_devis, has_volume_eleve, has_besoin_precis, has_objection_prix, has_non_reponse, objection_type, gravite, niveau, accessibilite, tone, nb_echanges, has_relance_recente.
When a sales rep sends a text to /comprendre, the 5 models run in sequence. Each one answers a different question. Together, they compose the full recommendation.

    Input: "Verrerie du Nord, trop cher, 3 semaines sans nouvelles"
                                  |
                         extract 21 features
                                  |
        +----------+----------+----------+----------+
        |          |          |          |          |
     scoring   objection   action      canal     timing
      C 87%    Prix 100%   Contre-     Email    Immediat
        |          |      offre 99%    100%        70%
        |          |          |          |          |
        +----------+----------+----------+----------+
                                  |
                          compose response
                                  |
    scoring   ---> KPIs (P(closing)=20%, expected_value=10K)
    scoring   ---> trust_score (snake_confidence component)
    objection ---> message strategy ("concession partielle")
    objection ---> xai ("Prix 1.00: trop cher vs GlassTech")
    action    ---> message template (counter-offer structure)
    action    ---> xai ("Contre-offre 0.99: objection traitable")
    canal     ---> message tone (email = formal, phone = direct)
    canal     ---> xai ("email prioritaire: 21j silence")
    timing    ---> urgency flag ("Immediat: risque perte")

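The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `extract_features`, `run_models`, and the `Model` type are hypothetical names, the extractor only fills two of the 21 features, and the stub models simply return the predictions shown in the example.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, object]
# Each model maps the shared feature dict to (predicted label, max probability).
Model = Callable[[Features], Tuple[str, float]]

MODEL_ORDER = ["scoring", "objection", "action", "canal", "timing"]


def extract_features(text: str) -> Features:
    """Hypothetical stand-in for the real extractor, which yields 21 features."""
    lowered = text.lower()
    return {
        "has_objection_prix": "trop cher" in lowered,
        "has_non_reponse": "sans nouvelles" in lowered,
    }


def run_models(features: Features, models: Dict[str, Model]) -> Dict[str, dict]:
    """Run the five models one after another on the same feature dict."""
    results = {}
    for name in MODEL_ORDER:
        label, prob = models[name](features)
        results[name] = {"Prediction": label, "Probability": prob}
    return results


# Stub models that return the values from the example diagram (illustration only).
STUB_MODELS: Dict[str, Model] = {
    "scoring": lambda f: ("C", 0.87),
    "objection": lambda f: ("Prix", 1.00),
    "action": lambda f: ("Contre-offre", 0.99),
    "canal": lambda f: ("Email", 1.00),
    "timing": lambda f: ("Immediat", 0.70),
}

features = extract_features("Verrerie du Nord, trop cher, 3 semaines sans nouvelles")
response = run_models(features, STUB_MODELS)
print(response["action"])
```

Every model receives the same `features` dict, so swapping a stub for a trained model does not change the loop.
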
Scoring is the priority filter. An A prospect goes ahead of a B in the sales rep's queue. Scoring also feeds the KPIs: probabilite_closing (A=70%, B=48%, C=20%, D=5%) and expected_value (potential revenue × P(closing)).

- scoring.Prediction → A/B/C/D
- scoring.Probability → distribution
- scoring.evolution → "A→B (degrade)"
- kpis.probabilite_closing
- kpis.expected_value

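The two KPIs follow directly from the scoring class. The sketch below uses the closing probabilities quoted above; `compute_kpis` is a hypothetical helper name, and the potential revenue of 50,000 is an assumed figure chosen to match the 20% / 10K example in the diagram.

```python
# Closing probability per scoring class, as stated in the text.
PROBABILITE_CLOSING = {"A": 0.70, "B": 0.48, "C": 0.20, "D": 0.05}


def compute_kpis(scoring_class: str, ca_potentiel: float) -> dict:
    """Return the two KPIs driven by the scoring prediction."""
    p_closing = PROBABILITE_CLOSING[scoring_class]
    return {
        "probabilite_closing": p_closing,
        "expected_value": ca_potentiel * p_closing,
    }


# A class-C prospect with an assumed potential revenue of 50,000.
kpis = compute_kpis("C", 50_000)
print(kpis)
```
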
The objection dictates the message strategy. Prix = partial concession. Concurrent = value repositioning. Inertie = switch channel. The objection type is the number one factor in choosing the action.

- objection.Prediction → type
- objection.detail → context
- objection.sous_type → benchmark_concurrent
- message_genere.strategie
- xai.objection_audit

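The objection-to-strategy mapping can be expressed as a plain lookup. This sketch covers only the three pairs given above; the strategies for Delai, Qualite, and Relation are not documented here, so they are deliberately left unmapped. The names `STRATEGIE_PAR_OBJECTION` and `strategie_for` are hypothetical.

```python
from typing import Optional

# Only the three mappings given in the text; other objection types
# (Delai, Qualite, Relation) are not documented and stay unmapped.
STRATEGIE_PAR_OBJECTION = {
    "Prix": "concession partielle",
    "Concurrent": "repositionnement valeur",
    "Inertie": "changer de canal",
}


def strategie_for(objection: str) -> Optional[str]:
    """Return the message strategy for an objection type, or None if unmapped."""
    return STRATEGIE_PAR_OBJECTION.get(objection)


print(strategie_for("Prix"))
```
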
The action is the core of the decision. It consumes the outputs of scoring (is the prospect worth the effort?) and of objection (what needs countering?). It determines the structure of the generated message: counter-offer, follow-up, escalation, or disengagement.

- action.Prediction → decision
- message_genere.corps (structure)
- message_genere.contraintes_snake
- xai.action_audit

The channel adapts the tone and format of the message. Email = formal, detailed. Telephone = short, direct. LinkedIn = prospecting, light. Visite = closing. The optimal channel depends on the accessibilite and tone of the relationship.

- canal.Prediction → channel
- canal.raison → why this channel
- message_genere (adapted tone)

Timing turns the action into urgency. The action may be the same (follow up), but "immediat" vs "sous 7j" changes everything. A competitor mentioned plus prolonged silence means every day counts. A cold prospect after a refusal means waiting for a signal.

- timing.Prediction → when
- timing.raison → why now
- kpis.jours_avant_perte_estimee

The 5 models are independent at inference (each takes the same 21 features), but the action dataset's label function depends on the scoring and objection labels. This creates an implicit dependency graph:

    features (21)
        |
        +---> scoring ----+
        |                 |
        +---> objection --+---> action ---> message structure
        |                 |
        +---> canal ------+---> message tone
        |                 |
        +---> timing -----+---> urgency

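One way to make this implicit graph explicit when generating labels is a topological sort. The sketch below encodes only the dependencies stated in the text (action labels depend on scoring and objection labels, and every label depends on the features); the name `LABEL_DEPENDENCIES` is hypothetical. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Label dependencies described in the text: node -> set of predecessors.
LABEL_DEPENDENCIES = {
    "scoring": {"features"},
    "objection": {"features"},
    "canal": {"features"},
    "timing": {"features"},
    "action": {"features", "scoring", "objection"},
}

# static_order() yields every node after all of its predecessors.
label_order = list(TopologicalSorter(LABEL_DEPENDENCIES).static_order())
print(label_order)
```

The relative order of the independent nodes is not fixed, but `action` always comes after both `scoring` and `objection`.
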
| Dimension | Synthetic (now) | Real CRM data (goal) |
|---|---|---|
| Labels | Deterministic rules | Human judgement + outcomes |
| Features | 21 clean fields | 21 fields + noise, nulls, typos |
| Distribution | Uniform random + edge cases | Power law (80% Prix, long tail) |
| Objection | 1:1 feature-to-label | Implicit, mixed, ambiguous |
| Volume | 800-1,700 / model | Need 300-500 labeled for production |
| Class balance | Attendre 1.4%, Visite 0.6% | Would be 10-15% each in reality |

The synthetic models prove the pipeline works end-to-end. The architecture is sound. What's missing is ground truth from real sales activity: Salesforce exports, call notes, win/loss post-mortems. 500 labeled interactions per model gets us to production quality.
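When labeled data arrives, the class-balance row in the table can be checked with a few lines. This is a sketch under stated assumptions: `class_shares` and `underrepresented` are hypothetical helpers, the 5% threshold is arbitrary, and the label list is toy data, not a real export.

```python
from collections import Counter
from typing import Dict, Iterable, List


def class_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return each class's share of the labeled examples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def underrepresented(labels: Iterable[str], min_share: float = 0.05) -> List[str]:
    """List classes whose share falls below min_share (assumed threshold)."""
    shares = class_shares(labels)
    return sorted(label for label, share in shares.items() if share < min_share)


# Toy labels for illustration only.
toy_labels = ["Relancer"] * 60 + ["Contre-offre"] * 38 + ["Attendre"] * 2
print(underrepresented(toy_labels))
```
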
The trust_score in every /comprendre response reports how much signal the system had to work with:

    "trust_score": {
      "score": 85,                    // 0-100 overall
      "breakdown": {
        "extraction_quality": 100,    // did we find real data vs defaults
        "snake_confidence": 85,       // avg max prob across 5 models
        "input_richness": 90,         // how much signal in the text
        "method_bonus": 50            // 50=keywords, 100=haiku
      }
    }

snake_confidence is the direct link between data quality and product quality. When the models are uncertain (low max probability), the trust score drops, and the sales rep knows to verify manually. Better training data → higher confidence → higher trust → less manual work.
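The "avg max prob across 5 models" definition can be sketched as below. The function body is an assumption about how that average is computed, and the non-winning probabilities in the example distributions are invented for illustration; only the top probabilities come from the /comprendre example earlier in this document. How the four breakdown components combine into the overall score is not specified here, so it is not modeled.

```python
from statistics import mean
from typing import Dict


def snake_confidence(distributions: Dict[str, Dict[str, float]]) -> float:
    """Average, over models, of each model's highest class probability (0-100)."""
    return 100 * mean(max(dist.values()) for dist in distributions.values())


# Top probabilities match the earlier example; the remainders are made up.
example = {
    "scoring": {"C": 0.87, "B": 0.13},
    "objection": {"Prix": 1.00},
    "action": {"Contre-offre": 0.99, "Relancer": 0.01},
    "canal": {"Email": 1.00},
    "timing": {"Immediat": 0.70, "Sous_3j": 0.30},
}
confidence = snake_confidence(example)
print(round(confidence, 1))
```

A single uncertain model (timing at 70% here) pulls the average down, which is the signal the sales rep sees in the trust score.
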