yzx 3 days ago
parent
commit
fca9d42d5c

+ 77 - 0
debug-xfyun-asr-no-response.md

@@ -0,0 +1,77 @@
+# [OPEN] xfyun-asr-no-response
+
+## Symptoms
+- Outbound call connects and TTS says "您好" for both XFYUN standard voice and clone voice.
+- After the greeting finishes, the callee speaks, but the robot does not respond.
+
+## Expected
+- After the greeting ends, ASR should receive the callee's speech and trigger the next interaction round.
+
+## Hypotheses
+1. ASR is still paused after TTS playback and never resumes, so the callee's speech is never delivered to the robot loop.
+2. `recvPlayBackEndEvent` or `ttsChannelClosed` is not transitioned correctly, so the main loop keeps waiting and drops ASR events.
+3. The first greeting path still goes through direct streaming `sendToTts()/closeTts()` and bypasses the new file-playback flow, causing TTS/ASR timing mismatch.
+4. ASR does receive speech, but `allowInterrupt` / VAD gating drops the result before `interactWithRobot()` can continue.
+5. The outbound greeting completes, but the robot never re-enters the next wait cycle because playback-finished signaling is released too early or too late.
+
+## Existing Evidence
+- Prior logs show XFYUN clone could buffer a large amount of audio and keep copying after final frame.
+- Prior logs also showed Doubao duplicate playback due to Java-side double feed, which has been narrowed down separately.
+- Current symptom now affects both XFYUN standard and clone, so shared post-TTS state handling is highly suspect.
+- The first 100 log lines of standard-voice call `2606022315190110001` confirm normal outbound setup:
+  - call answered successfully
+  - `record_session` attached
+  - `llm_wait.wav` played
+  - `xf_tts_mode` variable was set
+- The provided slice stops before the key evidence region. It does not yet include:
+  - first `speak(...)`
+  - `Speech-Open` / `Speech-Closed`
+  - `PLAYBACK_STOP`
+  - ASR middle/vad/result logs
+  - Java-side robot state logs (`waitForCustomerSpeak`, `resumeAsr`, dropped ASR decisions)
+- Additional evidence from standard voice call `2606022315190110001`:
+  - `Speech-Closed` is emitted at `23:15:25.928`
+  - ASR continues reading and sending audio after that
+  - ASR produces valid `middle` and `vad` results:
+    - `text=对`
+    - `text=哎你好`
+    - `text=喂,你好。`
+    - `text=你好`
+    - `text=你好。`
+  - Despite valid ASR results, the next robot reply at `23:15:33.448` is `不好意思我刚刚没听清,您能再说一遍吗?`
+- User confirmed Doubao TTS works normally in the same outbound conversation flow and can continue multi-turn dialogue.
+
+## Hypothesis Status
+- H1 rejected: ASR is not permanently paused. Evidence shows continuous `audio-read`, `audio-send`, and valid `result-generated`.
+- H2 confirmed as root cause in Java state handling: `Speech-Closed` does arrive, but `recvPlayBackEndEvent` was not set to `true` in the `TtsEvent -> Speech-Closed` branch. As a result, XFYUN ASR results were dropped by the playback gate (`!getAllowInterrupt() && !recvPlayBackEndEvent`).
+- H3 confirmed as contributing factor: the greeting still goes through direct streaming TTS path.
+- H4 strongly supported: the callee's speech reaches ASR, but the robot logic does not consume it as an effective next-turn input and falls back to the no-hear retry prompt.
+- H5 still possible: application-side playback-finished / wake-up signaling may be delayed or mismatched with ASR result timing.
+- New narrowing: because Doubao works, the failure is likely XFYUN-specific event semantics, XFYUN resume/close handling, or XFYUN path differences in the first-turn streaming flow, rather than a fully generic outbound/ASR loop failure.
+
+## Fix Applied
+- In `RobotChat`, when receiving `CUSTOM TtsEvent -> Speech-Closed`, also set:
+  - `recvPlayBackEndEvent = true`
+  - `playbackEndTime = System.currentTimeMillis()`
+- This aligns XFYUN `Speech-Closed` semantics with the gating logic used later in `waitForCustomerSpeakEx()` and ASR event consumption.
+- Added an XFYUN-specific ASR resume guard delay before `resumeAsr()`:
+  - config key: `xfyun-asr-resume-delay-ms`
+  - default: `600`
+- Added an XFYUN-specific short-window greeting echo filter:
+  - config key: `xfyun-greeting-filter-window-ms`
+  - default: `1800`
+  - filters immediate post-playback short greetings like `你好/您好/喂你好/嗯你好`
+  - goal: avoid self-playback bleed entering the next LLM turn
+
+## Next Evidence To Collect
+- Runtime logs around:
+  - greeting TTS start/end
+  - `Speech-Open` / `Speech-Closed`
+  - `PLAYBACK_STOP`
+  - `pauseAsr()` / `resumeAsr()`
+  - incoming ASR middle/vad events
+  - `recvPlayBackEndEvent`, `ttsChannelClosed`, `interactiveParam.inSpeaking`
+
+## Status
+- Session opened.
+- No business logic changed in this debug session yet.
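
Reviewer sketch: the gate from H2 and the greeting echo filter from "Fix Applied" can be pictured as one decision function. This is a minimal model, not the actual `RobotChat` code. The class `PostPlaybackAsrSketch`, the `Decision` enum, and the punctuation-stripping `normalize` step are assumptions made for illustration. Only the gate condition, the two fields set on `Speech-Closed`, the 1800 ms default window, and the greeting list come from the notes above. The 600 ms resume delay is not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the post-playback ASR handling described in the debug notes.
final class PostPlaybackAsrSketch {
    enum Decision { DROP_PLAYBACK_NOT_ENDED, DROP_GREETING_ECHO, ACCEPT }

    // Greeting list taken from the notes; the matching strategy is an assumption.
    private static final Set<String> SHORT_GREETINGS =
            new HashSet<>(Arrays.asList("你好", "您好", "喂你好", "嗯你好"));

    private final boolean allowInterrupt;
    private final long greetingFilterWindowMs; // e.g. 1800, per xfyun-greeting-filter-window-ms
    private boolean recvPlayBackEndEvent = false;
    private long playbackEndTime = 0L;

    PostPlaybackAsrSketch(boolean allowInterrupt, long greetingFilterWindowMs) {
        this.allowInterrupt = allowInterrupt;
        this.greetingFilterWindowMs = greetingFilterWindowMs;
    }

    // Mirrors the fix: Speech-Closed marks playback as finished and records the time.
    void onSpeechClosed(long nowMillis) {
        recvPlayBackEndEvent = true;
        playbackEndTime = nowMillis;
    }

    // Assumption: punctuation and whitespace are ignored when matching greetings.
    static String normalize(String text) {
        return text == null ? "" : text.replaceAll("[\\p{P}\\s]", "");
    }

    Decision onAsrResult(String text, long nowMillis) {
        // The playback gate that swallowed XFYUN results before the fix.
        if (!allowInterrupt && !recvPlayBackEndEvent) {
            return Decision.DROP_PLAYBACK_NOT_ENDED;
        }
        long sincePlaybackEnd = nowMillis - playbackEndTime;
        boolean insideWindow = recvPlayBackEndEvent
                && sincePlaybackEnd >= 0
                && sincePlaybackEnd <= greetingFilterWindowMs;
        if (insideWindow && SHORT_GREETINGS.contains(normalize(text))) {
            return Decision.DROP_GREETING_ECHO;
        }
        return Decision.ACCEPT;
    }

    public static void main(String[] args) {
        PostPlaybackAsrSketch sketch = new PostPlaybackAsrSketch(false, 1800L);
        System.out.println(sketch.onAsrResult("你好", 1000L));
        sketch.onSpeechClosed(1000L);
        System.out.println(sketch.onAsrResult("你好", 1500L));
        System.out.println(sketch.onAsrResult("你好", 4000L));
    }
}
```

In this model a result that arrives before `Speech-Closed` is dropped by the gate, which matches the symptom in the notes: ASR produced text, yet the robot fell back to the retry prompt.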

+ 1 - 0
src/main/java/com/telerobot/fs/entity/pojo/TtsProvider.java

@@ -4,6 +4,7 @@ public class TtsProvider {
     public static final String ALIYUN = "aliyun_tts";
     public static final String DOUBAO = "doubao_vcl_tts";
     public static final String XFYUN = "xf_tts";
+    public static final String TX_TTS1 = "tx_tts1";
     public static final String MICROSOFT = "microsoft_tts";
     public static final String CHINA_TELECOM = "chinatelecom_tts";
 

+ 10 - 2
src/main/java/com/telerobot/fs/ivr/IvrSession.java

@@ -4,6 +4,7 @@ import com.alibaba.fastjson.JSON;
 import com.telerobot.fs.config.AudioUtils;
 import com.telerobot.fs.config.SystemConfig;
 import com.telerobot.fs.entity.bo.InboundDetail;
+import com.telerobot.fs.entity.pojo.TtsProvider;
 import com.telerobot.fs.global.BizThreadPoolForEsl;
 import com.telerobot.fs.utils.*;
 import com.telerobot.fs.wshandle.MessageResponse;
@@ -238,7 +239,14 @@ public class IvrSession {
             lastPlaybackType = PlaybackType.PLAYBACK_TEXT;
             if (text != null && !text.trim().isEmpty()) {
                 setWavPlaybackTimeout();
-                String args = String.format("%s|%s|%s", ttsProvider, voiceCode, text);
+                String args;
+                if (TtsProvider.XFYUN.equalsIgnoreCase(ttsProvider)
+                        || TtsProvider.TX_TTS1.equalsIgnoreCase(ttsProvider)) {
+                    EslConnectionUtil.sendExecuteCommand("set", "cache_speech_handles=true", sessionId);
+                    args = String.format("%s|%s|{channel-uuid=%s}%s", ttsProvider, voiceCode, sessionId, text);
+                } else {
+                    args = String.format("%s|%s|%s", ttsProvider, voiceCode, text);
+                }
                 logger.info("Play TTS: SessionID={}, Text={}", sessionId, args);
                 EslConnectionUtil.sendExecuteCommand("speak",
                         args,
@@ -561,4 +569,4 @@ public class IvrSession {
         }
         return false;
     }
-}
+}

+ 10 - 3
src/main/java/com/telerobot/fs/robot/AbstractChatRobot.java

@@ -114,6 +114,9 @@ public abstract class AbstractChatRobot implements IChatRobot {
         if (isXfyunTtsProvider()) {
             return false;
         }
+        if (TtsProvider.TX_TTS1.equalsIgnoreCase(ttsProvider)) {
+            return false;
+        }
         return ttsTextLength >= 5 && checkPauseFlag(speechContent);
     }
 
@@ -227,14 +230,18 @@ public abstract class AbstractChatRobot implements IChatRobot {
     }
 
     private String buildSpeakCommand(String text) {
-        if (!TtsProvider.XFYUN.equalsIgnoreCase(ttsProvider)) {
+        if (!TtsProvider.XFYUN.equalsIgnoreCase(ttsProvider)
+                && !TtsProvider.TX_TTS1.equalsIgnoreCase(ttsProvider)) {
             return String.format("%s|%s|%s", ttsProvider, ttsVoiceName, text);
         }
 
         StringBuilder inlineParams = new StringBuilder();
         inlineParams.append("channel-uuid=").append(uuid);
-        inlineParams.append(",xf_tts_verify_peer=false");
-        if ("clone".equalsIgnoreCase(StringUtils.trimToEmpty(getAccount().ttsModels))) {
+        if (TtsProvider.XFYUN.equalsIgnoreCase(ttsProvider)) {
+            inlineParams.append(",xf_tts_verify_peer=false");
+        }
+        if (TtsProvider.XFYUN.equalsIgnoreCase(ttsProvider)
+                && "clone".equalsIgnoreCase(StringUtils.trimToEmpty(getAccount().ttsModels))) {
             inlineParams.append(",xf_tts_mode=clone");
         }
         return String.format("%s|%s|{%s}%s", ttsProvider, ttsVoiceName, inlineParams, text);
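
Reviewer sketch: the hunk above produces three shapes of `speak` argument. The standalone version below reproduces the same formatting so the shapes can be checked in isolation. `SpeakCommandSketch` and its `build` signature are hypothetical; the provider ids, the `channel-uuid`, `xf_tts_verify_peer`, and `xf_tts_mode` parameters, and the `provider|voice|{params}text` layout are taken from the diff.

```java
// Hypothetical standalone mirror of buildSpeakCommand(); not the project's class.
final class SpeakCommandSketch {
    static String build(String ttsProvider, String voice, String uuid, boolean cloneMode, String text) {
        boolean xfyun = "xf_tts".equalsIgnoreCase(ttsProvider);
        boolean tencent = "tx_tts1".equalsIgnoreCase(ttsProvider);
        if (!xfyun && !tencent) {
            // Other providers keep the plain three-field form.
            return String.format("%s|%s|%s", ttsProvider, voice, text);
        }
        // Both XFYUN and tx_tts1 receive the channel UUID as an inline parameter.
        StringBuilder inlineParams = new StringBuilder("channel-uuid=").append(uuid);
        if (xfyun) {
            // XFYUN-only switches, as in the diff.
            inlineParams.append(",xf_tts_verify_peer=false");
            if (cloneMode) {
                inlineParams.append(",xf_tts_mode=clone");
            }
        }
        return String.format("%s|%s|{%s}%s", ttsProvider, voice, inlineParams, text);
    }

    public static void main(String[] args) {
        System.out.println(build("aliyun_tts", "v1", "abc", false, "hi"));
        // prints aliyun_tts|v1|hi
        System.out.println(build("tx_tts1", "v1", "abc", false, "hi"));
        // prints tx_tts1|v1|{channel-uuid=abc}hi
        System.out.println(build("xf_tts", "v1", "abc", true, "hi"));
        // prints xf_tts|v1|{channel-uuid=abc,xf_tts_verify_peer=false,xf_tts_mode=clone}hi
    }
}
```

Note that `IvrSession` builds the XFYUN form with `channel-uuid` only, so the two call sites do not emit identical arguments for `xf_tts`.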

+ 3 - 1
src/main/java/com/telerobot/fs/robot/RobotBase.java

@@ -754,7 +754,9 @@ public abstract class RobotBase implements IEslEventListener {
     protected void resumeAsr(){
         if(asrPauseEnabled && !getAllowInterrupt()) {
             logger.info("{} try to resume asr  ", this.uuid);
-            EslConnectionUtil.sendExecuteCommand("pause_asr", "0", this.uuid, this.eslConnectionPool);
+            EslConnectionUtil.sendExecuteCommand(
+                    String.format("pause_%s_asr", chatRobot.getAccount().asrProvider),
+                    "0", this.uuid, this.eslConnectionPool);
         }
     }
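
Reviewer sketch: with the change above, a null `asrProvider` would format to the literal command `pause_null_asr`. One possible guard is shown below, assuming the previous generic `pause_asr` application is still available on the FreeSWITCH side, which this diff does not confirm. `AsrCommandSketch`, `pauseCommandFor`, and the provider value `demo` are hypothetical.

```java
// Hypothetical helper; the fallback to "pause_asr" is an assumption, not part of the commit.
final class AsrCommandSketch {
    static String pauseCommandFor(String asrProvider) {
        if (asrProvider == null || asrProvider.trim().isEmpty()) {
            // Fall back to the command name used before this change.
            return "pause_asr";
        }
        // Provider-specific command name, formatted as in the diff.
        return String.format("pause_%s_asr", asrProvider.trim());
    }

    public static void main(String[] args) {
        System.out.println(pauseCommandFor("demo")); // prints pause_demo_asr
        System.out.println(pauseCommandFor(null));   // prints pause_asr
    }
}
```

Whichever naming is chosen, the pause side needs to use the same command name as the resume side, otherwise ASR paused under one name is never resumed under the other.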
 

+ 8 - 0
src/main/java/com/telerobot/fs/robot/RobotChat.java

@@ -175,6 +175,14 @@ public class RobotChat extends RobotBase {
                     uuid
             );
         }
+        if(ttsProvider.equalsIgnoreCase(TtsProvider.TX_TTS1)) {
+            logger.info("{} Current tts provider is tx_tts1, enable cache_speech_handles for tx_tts1_resume.",
+                    getTraceId());
+            EslConnectionUtil.sendExecuteCommand("set",
+                    "cache_speech_handles=true",
+                    uuid
+            );
+        }
         if(ttsProvider.equalsIgnoreCase(TtsProvider.MICROSOFT)) {
             logger.info("{}  Current tts provider is microsoft!", getTraceId());
         }

+ 12 - 2
src/main/java/com/telerobot/fs/tts/TtsUtil.java

@@ -1,6 +1,7 @@
 package com.telerobot.fs.tts;
 
 import com.telerobot.fs.tts.aliyun.AliyunTTSWebApi;
+import com.telerobot.fs.tts.tencent.TencentTTSWebApi;
 import link.thingscloud.freeswitch.esl.CommonUtils;
 import link.thingscloud.freeswitch.esl.util.CurrentTimeMillisClock;
 import org.apache.commons.lang.StringUtils;
@@ -51,8 +52,17 @@ public class TtsUtil {
                         CommonUtils.getStackTraceString(e.getStackTrace())
                );
             }
-        } else
-        if("azure".equals(voiceSource)) {
+        } else if("tx_tts1".equals(voiceSource)) {
+            try {
+                result = TencentTTSWebApi.shortTextTTSWebAPI(voiceCode, text, wavSavePath);
+            } catch (Throwable e) {
+                logger.error(" TencentTTSWebApi.shortTextTTSWebAPI error! voiceCode={}, wavSavePath={}, text={}, {} {}",
+                        voiceCode, wavSavePath, text,
+                        e.toString(),
+                        CommonUtils.getStackTraceString(e.getStackTrace())
+                );
+            }
+        } else if("azure".equals(voiceSource)) {
 
         }else{
             logger.error("unSupported tts source :{}", voiceSource);

+ 250 - 0
src/main/java/com/telerobot/fs/tts/tencent/TencentTTSWebApi.java

@@ -0,0 +1,250 @@
+package com.telerobot.fs.tts.tencent;
+
+import com.alibaba.fastjson.JSON;
+import com.alibaba.fastjson.JSONObject;
+import com.telerobot.fs.config.SystemConfig;
+import okhttp3.*;
+import org.apache.commons.lang.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.crypto.Mac;
+import javax.crypto.spec.SecretKeySpec;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.nio.charset.StandardCharsets;
+import java.security.MessageDigest;
+import java.time.Instant;
+import java.time.ZoneOffset;
+import java.time.format.DateTimeFormatter;
+import java.util.Base64;
+import java.util.concurrent.TimeUnit;
+
+public class TencentTTSWebApi {
+    private static final Logger log = LoggerFactory.getLogger(TencentTTSWebApi.class);
+    private static final MediaType JSON_TYPE = MediaType.parse("application/json; charset=utf-8");
+    private static final DateTimeFormatter DATE_FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);
+    private static final OkHttpClient CLIENT = new OkHttpClient.Builder()
+            .connectTimeout(10, TimeUnit.SECONDS)
+            .readTimeout(20, TimeUnit.SECONDS)
+            .build();
+
+    private static volatile String ttsAccountJson;
+    private static volatile JSONObject ttsAccount;
+
+    private static JSONObject getAccount() {
+        String latestJson = SystemConfig.getValue("tx-tts1-account-json", "");
+        if (StringUtils.isBlank(latestJson)) {
+            log.error("param `tx-tts1-account-json` can not be blank.");
+            return null;
+        }
+        if (!latestJson.equals(ttsAccountJson) || ttsAccount == null) {
+            synchronized (TencentTTSWebApi.class) {
+                latestJson = SystemConfig.getValue("tx-tts1-account-json", "");
+                if (!latestJson.equals(ttsAccountJson) || ttsAccount == null) {
+                    ttsAccountJson = latestJson;
+                    try {
+                        ttsAccount = JSON.parseObject(latestJson);
+                    } catch (Throwable e) {
+                        log.error("parse `tx-tts1-account-json` error: {}", e.toString());
+                        ttsAccount = null;
+                    }
+                }
+            }
+        }
+        return ttsAccount;
+    }
+
+    private static String config(JSONObject account, String key, String defaultValue) {
+        if (account == null) {
+            return defaultValue;
+        }
+        String value = account.getString(key);
+        return StringUtils.isBlank(value) ? defaultValue : value.trim();
+    }
+
+    private static int configInt(JSONObject account, String key, int defaultValue) {
+        if (account == null) {
+            return defaultValue;
+        }
+        Integer value = account.getInteger(key);
+        return value == null ? defaultValue : value;
+    }
+
+    private static boolean configBool(JSONObject account, String key, boolean defaultValue) {
+        if (account == null) {
+            return defaultValue;
+        }
+        String value = account.getString(key);
+        if (StringUtils.isBlank(value)) {
+            return defaultValue;
+        }
+        return "1".equals(value) || "true".equalsIgnoreCase(value) || "yes".equalsIgnoreCase(value);
+    }
+
+    private static String normalizeLanguage(String value) {
+        if (StringUtils.isBlank(value)) {
+            return "zh";
+        }
+        String lang = value.trim().replace('_', '-');
+        if ("zh-CN".equalsIgnoreCase(lang) || "zh".equalsIgnoreCase(lang)) return "zh";
+        if ("en-US".equalsIgnoreCase(lang) || "en".equalsIgnoreCase(lang)) return "en";
+        if ("ja-JP".equalsIgnoreCase(lang) || "ja".equalsIgnoreCase(lang)) return "ja";
+        if ("ko-KR".equalsIgnoreCase(lang) || "ko".equalsIgnoreCase(lang)) return "ko";
+        if ("yue-HK".equalsIgnoreCase(lang) || "yue".equalsIgnoreCase(lang)) return "yue";
+        int idx = lang.indexOf('-');
+        return idx > 0 ? lang.substring(0, idx) : lang;
+    }
+
+    private static String sha256Hex(String value) throws Exception {
+        MessageDigest digest = MessageDigest.getInstance("SHA-256");
+        byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
+        return bytesToHex(hash);
+    }
+
+    private static byte[] hmacSha256(byte[] key, String message) throws Exception {
+        Mac mac = Mac.getInstance("HmacSHA256");
+        mac.init(new SecretKeySpec(key, "HmacSHA256"));
+        return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
+    }
+
+    private static byte[] hmacSha256(String key, String message) throws Exception {
+        return hmacSha256(key.getBytes(StandardCharsets.UTF_8), message);
+    }
+
+    private static String bytesToHex(byte[] bytes) {
+        StringBuilder sb = new StringBuilder(bytes.length * 2);
+        for (byte b : bytes) {
+            sb.append(String.format("%02x", b));
+        }
+        return sb.toString();
+    }
+
+    private static String buildAuthorization(JSONObject account, String body, long timestamp, String endpoint, String action) throws Exception {
+        String secretId = config(account, "secret-id", "");
+        String secretKey = config(account, "secret-key", "");
+        String date = DATE_FORMATTER.format(Instant.ofEpochSecond(timestamp));
+        String canonicalHeaders = "content-type:application/json; charset=utf-8\n" +
+                "host:" + endpoint + "\n" +
+                "x-tc-action:" + action.toLowerCase() + "\n";
+        String signedHeaders = "content-type;host;x-tc-action";
+        String canonicalRequest = "POST\n/\n\n" +
+                canonicalHeaders + "\n" +
+                signedHeaders + "\n" +
+                sha256Hex(body);
+        String credentialScope = date + "/mps/tc3_request";
+        String stringToSign = "TC3-HMAC-SHA256\n" +
+                timestamp + "\n" +
+                credentialScope + "\n" +
+                sha256Hex(canonicalRequest);
+
+        byte[] kDate = hmacSha256("TC3" + secretKey, date);
+        byte[] kService = hmacSha256(kDate, "mps");
+        byte[] kSigning = hmacSha256(kService, "tc3_request");
+        String signature = bytesToHex(hmacSha256(kSigning, stringToSign));
+        return "TC3-HMAC-SHA256 Credential=" + secretId + "/" + credentialScope +
+                ", SignedHeaders=" + signedHeaders +
+                ", Signature=" + signature;
+    }
+
+    private static boolean processPostRequest(JSONObject account, String voiceCode, String text, String audioSaveFile) {
+        try {
+            String endpoint = config(account, "endpoint", "mps.tencentcloudapi.com");
+            String action = config(account, "action", "SyncDubbing");
+            String version = config(account, "version", "2019-06-12");
+            String region = config(account, "region", "");
+            String resourceId = config(account, "resource-id", "");
+            String language = normalizeLanguage(config(account, "text-lang", "zh"));
+            int sampleRate = configInt(account, "sample-rate", 8000);
+            int pitch = configInt(account, "pitch", 0);
+            boolean verifyPeer = configBool(account, "verify-peer", false);
+
+            JSONObject synExt = new JSONObject();
+            synExt.put("sampleRate", sampleRate);
+            synExt.put("pitch", pitch);
+            JSONObject extParam = new JSONObject();
+            extParam.put("synExt", synExt);
+
+            JSONObject bodyJson = new JSONObject();
+            bodyJson.put("Text", text);
+            bodyJson.put("TextLang", language);
+            bodyJson.put("VoiceId", StringUtils.isNotBlank(voiceCode) ? voiceCode : config(account, "voice-id", ""));
+            if (StringUtils.isNotBlank(resourceId)) {
+                bodyJson.put("ResourceId", resourceId);
+            }
+            bodyJson.put("ExtParam", extParam.toJSONString());
+            String body = bodyJson.toJSONString();
+
+            long timestamp = System.currentTimeMillis() / 1000L;
+            Request.Builder builder = new Request.Builder()
+                    .url("https://" + endpoint + "/")
+                    .post(RequestBody.create(JSON_TYPE, body))
+                    .addHeader("Content-Type", "application/json; charset=utf-8")
+                    .addHeader("Host", endpoint)
+                    .addHeader("Authorization", buildAuthorization(account, body, timestamp, endpoint, action))
+                    .addHeader("X-TC-Action", action)
+                    .addHeader("X-TC-Version", version)
+                    .addHeader("X-TC-Timestamp", String.valueOf(timestamp))
+                    .addHeader("X-TC-Language", "zh-CN");
+            if (StringUtils.isNotBlank(region)) {
+                builder.addHeader("X-TC-Region", region);
+            }
+
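+            // verify-peer=false only skips hostname verification; certificate-chain validation still applies.
+            OkHttpClient client = verifyPeer ? CLIENT : CLIENT.newBuilder()
+                    .hostnameVerifier((hostname, session) -> true)
+                    .build();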
+
+            try (Response response = client.newCall(builder.build()).execute()) {
+                String responseBody = response.body() == null ? "" : response.body().string();
+                if (!response.isSuccessful()) {
+                    log.error("Tencent tts http request failed, code={}, body={}", response.code(), responseBody);
+                    return false;
+                }
+
+                JSONObject json = JSON.parseObject(responseBody);
+                JSONObject rsp = json.getJSONObject("Response");
+                if (rsp == null) {
+                    log.error("Tencent tts invalid response: {}", responseBody);
+                    return false;
+                }
+                if (rsp.containsKey("Error")) {
+                    log.error("Tencent tts api error: {}", rsp.getJSONObject("Error").toJSONString());
+                    return false;
+                }
+                Integer errorCode = rsp.getInteger("ErrorCode");
+                if (errorCode != null && errorCode != 0) {
+                    log.error("Tencent tts business error, code={}, msg={}", errorCode, rsp.getString("Msg"));
+                    return false;
+                }
+
+                String audioData = rsp.getString("AudioData");
+                if (StringUtils.isBlank(audioData)) {
+                    log.error("Tencent tts response AudioData is empty: {}", responseBody);
+                    return false;
+                }
+
+                byte[] wavBytes = Base64.getDecoder().decode(audioData);
+                File target = new File(audioSaveFile);
+                try (FileOutputStream fout = new FileOutputStream(target)) {
+                    fout.write(wavBytes);
+                }
+                return true;
+            }
+        } catch (Throwable e) {
+            // Pass the throwable itself so the stack trace is logged.
+            log.error("Tencent tts synthesize failed", e);
+            return false;
+        }
+    }
+
+    public static boolean shortTextTTSWebAPI(String voiceCode, String text, String ttsPath) {
+        if (StringUtils.isBlank(text)) {
+            log.info("tts text is blank, skipping synthesis, ttsPath={}", ttsPath);
+            return true;
+        }
+        JSONObject account = getAccount();
+        if (account == null) {
+            return false;
+        }
+        return processPostRequest(account, voiceCode, text, ttsPath);
+    }
+}
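
Reviewer sketch: the key-derivation chain inside `buildAuthorization` can be exercised on its own, as below. `Tc3SigningSketch`, its method names, and the sample inputs are hypothetical. The chain itself (`"TC3" + secretKey`, then date, then service, then `tc3_request`) follows the diff, which hard-codes `mps` as the service in both the credential scope and the derivation even though `endpoint` and `action` are configurable.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Hypothetical standalone mirror of the signing-key derivation in the diff.
final class Tc3SigningSketch {
    static byte[] hmacSha256(byte[] key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a standard JCA algorithm, so this is not expected in practice.
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // date is the UTC day (yyyy-MM-dd); service must match the credential scope.
    static String sign(String secretKey, String date, String service, String stringToSign) {
        byte[] kDate = hmacSha256(("TC3" + secretKey).getBytes(StandardCharsets.UTF_8), date);
        byte[] kService = hmacSha256(kDate, service);
        byte[] kSigning = hmacSha256(kService, "tc3_request");
        return toHex(hmacSha256(kSigning, stringToSign));
    }

    public static void main(String[] args) {
        // Sample inputs are placeholders, not real credentials.
        String signature = sign("example-secret", "2024-01-01", "mps", "example-string-to-sign");
        System.out.println(signature.length()); // prints 64
    }
}
```

Because the date and service are folded into the key, a request signed with a date that does not match `X-TC-Timestamp`, or with a service that does not match the endpoint, should be expected to fail authentication.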

+ 421 - 0
src/main/java/com/telerobot/fs/tts/xfyun/XfyunCloneTtsFileSynthesizer.java

@@ -0,0 +1,421 @@
+package com.telerobot.fs.tts.xfyun;
+
+import com.alibaba.fastjson.JSON;
+import com.alibaba.fastjson.JSONObject;
+import com.telerobot.fs.config.CallConfig;
+import com.telerobot.fs.config.SystemConfig;
+import com.telerobot.fs.utils.WaveHeader;
+import okhttp3.OkHttpClient;
+import okhttp3.Request;
+import okhttp3.Response;
+import okhttp3.WebSocket;
+import okhttp3.WebSocketListener;
+import okio.ByteString;
+import org.apache.commons.lang.StringUtils;
+import org.dom4j.Document;
+import org.dom4j.Element;
+import org.dom4j.io.SAXReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.crypto.Mac;
+import javax.crypto.spec.SecretKeySpec;
+import javax.net.ssl.SSLContext;
+import javax.net.ssl.TrustManager;
+import javax.net.ssl.X509TrustManager;
+import java.io.ByteArrayOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.net.URLEncoder;
+import java.nio.charset.StandardCharsets;
+import java.security.KeyManagementException;
+import java.security.NoSuchAlgorithmException;
+import java.security.SecureRandom;
+import java.time.ZoneId;
+import java.time.ZonedDateTime;
+import java.time.format.DateTimeFormatter;
+import java.util.Base64;
+import java.util.List;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+
+public class XfyunCloneTtsFileSynthesizer {
+    private static final Logger logger = LoggerFactory.getLogger(XfyunCloneTtsFileSynthesizer.class);
+    private static final String CLONE_TTS_WS_URL = "wss://cn-huabei-1.xf-yun.com/v1/private/voice_clone";
+    private static final String CLONE_TTS_HOST = "cn-huabei-1.xf-yun.com";
+    private static final String CLONE_TTS_PATH = "/v1/private/voice_clone";
+    private static final int DEFAULT_SAMPLE_RATE = 8000;
+    private static final int DEFAULT_TIMEOUT_SECONDS = 45;
+
+    private XfyunCloneTtsFileSynthesizer() {
+    }
+
+    public static String synthesizeToWavFile(String traceId, String assetId, String text) {
+        if (StringUtils.isBlank(assetId) || StringUtils.isBlank(text)) {
+            return "";
+        }
+        try {
+            CloneConfig config = loadCloneConfig();
+            byte[] audioBytes = requestCloneAudio(config, assetId.trim(), text);
+            if (audioBytes.length == 0) {
+                logger.warn("{} xfyun clone tts returned empty audio.", traceId);
+                return "";
+            }
+            String wavPath = buildWavFilePath(traceId);
+            writePcmAsWav(wavPath, audioBytes, config.sampleRate);
+            logger.info("{} xfyun clone tts synthesized wav file={}, bytes={}, sampleRate={}",
+                    traceId, wavPath, audioBytes.length, config.sampleRate);
+            return wavPath;
+        } catch (Throwable e) {
+            logger.error("{} xfyun clone tts synthesize failed, assetId={}, err={}",
+                    traceId,
+                    assetId,
+                    e.toString(),
+                    e);
+            return "";
+        }
+    }
+
+    private static byte[] requestCloneAudio(CloneConfig config, String assetId, String text) throws Exception {
+        OkHttpClient client = config.verifyPeer ? createDefaultClient() : createUnsafeOkHttpClient();
+        String authUrl = buildCloneWebsocketUrl(config.apiKey, config.apiSecret);
+        CountDownLatch latch = new CountDownLatch(1);
+        AtomicReference<String> errRef = new AtomicReference<String>("");
+        ByteArrayOutputStream audioOutput = new ByteArrayOutputStream();
+
+        Request request = new Request.Builder().url(authUrl).build();
+        client.newWebSocket(request, new WebSocketListener() {
+            @Override
+            public void onOpen(WebSocket webSocket, Response response) {
+                JSONObject payload = buildCloneTtsPayload(config, assetId, text);
+                webSocket.send(payload.toJSONString());
+            }
+
+            @Override
+            public void onMessage(WebSocket webSocket, String textMessage) {
+                try {
+                    JSONObject json = JSON.parseObject(textMessage);
+                    int code = parseResponseCode(json);
+                    if (code != 0) {
+                        errRef.set(buildResponseError(json));
+                        webSocket.close(1000, "error");
+                        latch.countDown();
+                        return;
+                    }
+
+                    JSONObject payload = json.getJSONObject("payload");
+                    if (payload == null) {
+                        return;
+                    }
+                    JSONObject audio = payload.getJSONObject("audio");
+                    if (audio == null) {
+                        return;
+                    }
+
+                    String audioChunk = audio.getString("audio");
+                    if (StringUtils.isNotBlank(audioChunk)) {
+                        audioOutput.write(Base64.getDecoder().decode(audioChunk));
+                    }
+
+                    if (audio.getIntValue("status") == 2) {
+                        webSocket.close(1000, "done");
+                        latch.countDown();
+                    }
+                } catch (Exception e) {
+                    errRef.set("parse clone tts response failed: " + e.getMessage());
+                    webSocket.close(1000, "parse-error");
+                    latch.countDown();
+                }
+            }
+
+            @Override
+            public void onMessage(WebSocket webSocket, ByteString bytes) {
+                onMessage(webSocket, bytes.utf8());
+            }
+
+            @Override
+            public void onFailure(WebSocket webSocket, Throwable t, Response response) {
+                errRef.set(buildWebsocketFailureMessage(t, response));
+                latch.countDown();
+            }
+
+            @Override
+            public void onClosed(WebSocket webSocket, int code, String reason) {
+                latch.countDown();
+            }
+        });
+
+        if (!latch.await(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS)) {
+            throw new IOException("xfyun clone tts timeout");
+        }
+        if (StringUtils.isNotBlank(errRef.get())) {
+            throw new IOException(errRef.get());
+        }
+        return audioOutput.toByteArray();
+    }
+
+    private static JSONObject buildCloneTtsPayload(CloneConfig config, String assetId, String text) {
+        JSONObject root = new JSONObject(true);
+
+        JSONObject header = new JSONObject(true);
+        header.put("app_id", config.appId);
+        header.put("status", 2);
+        header.put("res_id", assetId);
+        root.put("header", header);
+
+        JSONObject audio = new JSONObject(true);
+        audio.put("encoding", "raw");
+        audio.put("sample_rate", config.sampleRate);
+
+        JSONObject tts = new JSONObject(true);
+        tts.put("vcn", config.cloneVcn);
+        tts.put("volume", 50);
+        tts.put("rhy", 0);
+        tts.put("pybuffer", 1);
+        tts.put("speed", 50);
+        tts.put("pitch", 50);
+        tts.put("bgs", 0);
+        tts.put("reg", 0);
+        tts.put("rdn", 0);
+        tts.put("audio", audio);
+
+        JSONObject parameter = new JSONObject(true);
+        parameter.put("tts", tts);
+        root.put("parameter", parameter);
+
+        JSONObject textNode = new JSONObject(true);
+        textNode.put("encoding", "utf8");
+        textNode.put("compress", "raw");
+        textNode.put("format", "plain");
+        textNode.put("status", 2);
+        textNode.put("seq", 0);
+        textNode.put("text", Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8)));
+
+        JSONObject payload = new JSONObject(true);
+        payload.put("text", textNode);
+        root.put("payload", payload);
+        return root;
+    }
+
+    private static CloneConfig loadCloneConfig() throws Exception {
+        String fsConfDirectory = SystemConfig.getValue("fs_conf_directory");
+        if (StringUtils.isBlank(fsConfDirectory)) {
+            throw new IllegalStateException("fs_conf_directory is empty");
+        }
+
+        File confFile = new File(fsConfDirectory, "autoload_configs/xf_tts.conf.xml");
+        if (!confFile.exists()) {
+            throw new IllegalStateException("xf_tts.conf.xml not found: " + confFile.getAbsolutePath());
+        }
+
+        SAXReader reader = new SAXReader();
+        Document document = reader.read(confFile);
+        List<Element> params = new java.util.ArrayList<Element>();
+        collectElementsByName(document.getRootElement(), "param", params);
+
+        CloneConfig config = new CloneConfig();
+        for (Element element : params) {
+            String name = StringUtils.trimToEmpty(element.attributeValue("name"));
+            String value = StringUtils.trimToEmpty(element.attributeValue("value"));
+            if ("app-id".equalsIgnoreCase(name)) {
+                config.appId = value;
+            } else if ("api-key".equalsIgnoreCase(name)) {
+                config.apiKey = value;
+            } else if ("api-secret".equalsIgnoreCase(name)) {
+                config.apiSecret = value;
+            } else if ("sample-rate".equalsIgnoreCase(name)) {
+                config.sampleRate = parseSampleRate(value);
+            } else if ("verify-peer".equalsIgnoreCase(name)) {
+                config.verifyPeer = !"false".equalsIgnoreCase(value);
+            } else if ("clone-vcn".equalsIgnoreCase(name)) {
+                config.cloneVcn = value;
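+            } else if ("clone-engine-version".equalsIgnoreCase(name) && "omni_v1".equalsIgnoreCase(value)) {
+                // Order-dependent: a clone-vcn param later in the file overrides this value,
+                // and a clone-vcn param earlier in the file is overwritten by it.
+                config.cloneVcn = "x6_clone";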
+            }
+        }
+
+        if (StringUtils.isBlank(config.appId)
+                || StringUtils.isBlank(config.apiKey)
+                || StringUtils.isBlank(config.apiSecret)) {
+            throw new IllegalStateException("xf_tts.conf.xml is missing clone account credentials");
+        }
+        if (StringUtils.isBlank(config.cloneVcn)) {
+            config.cloneVcn = "x5_clone";
+        }
+        return config;
+    }
+
+    private static void collectElementsByName(Element element, String name, List<Element> result) {
+        if (element == null) {
+            return;
+        }
+        if (name.equalsIgnoreCase(element.getName())) {
+            result.add(element);
+        }
+        List<Element> children = element.elements();
+        for (Element child : children) {
+            collectElementsByName(child, name, result);
+        }
+    }
+
+    private static int parseSampleRate(String value) {
+        try {
+            int sampleRate = Integer.parseInt(StringUtils.trimToEmpty(value));
+            return sampleRate > 0 ? sampleRate : DEFAULT_SAMPLE_RATE;
+        } catch (NumberFormatException e) {
+            // Non-numeric or empty value: fall back to the default rate.
+            return DEFAULT_SAMPLE_RATE;
+        }
+    }
+
+    private static String buildCloneWebsocketUrl(String apiKey, String apiSecret) throws Exception {
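+        // Zero-padded RFC 1123 date, e.g. "Thu, 01 Aug 2019 01:53:21 GMT". RFC_1123_DATE_TIME prints a
+        // one-digit day-of-month on days 1-9, which a strict HTTP-date parser may reject.
+        String date = DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", java.util.Locale.US)
+                .format(ZonedDateTime.now(ZoneId.of("GMT")));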
+        String signatureOrigin = "host: " + CLONE_TTS_HOST + "\n"
+                + "date: " + date + "\n"
+                + "GET " + CLONE_TTS_PATH + " HTTP/1.1";
+        Mac mac = Mac.getInstance("HmacSHA256");
+        mac.init(new SecretKeySpec(apiSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
+        String signature = Base64.getEncoder().encodeToString(mac.doFinal(signatureOrigin.getBytes(StandardCharsets.UTF_8)));
+        String authorizationOrigin = String.format(
+                "api_key=\"%s\", algorithm=\"hmac-sha256\", headers=\"host date request-line\", signature=\"%s\"",
+                apiKey,
+                signature
+        );
+        String authorization = Base64.getEncoder().encodeToString(authorizationOrigin.getBytes(StandardCharsets.UTF_8));
+        return CLONE_TTS_WS_URL
+                + "?authorization=" + urlEncode(authorization)
+                + "&date=" + urlEncode(date)
+                + "&host=" + urlEncode(CLONE_TTS_HOST);
+    }
+
+    private static String buildWavFilePath(String traceId) {
+        String recordRoot = StringUtils.defaultIfEmpty(CallConfig.RECORDINGS_PATH, ".");
+        File targetDir = new File(recordRoot, "xf_tts_clone");
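+        // No lock needed: concurrent mkdirs() calls are harmless, the loser just gets false for an existing directory.
+        // Locking on an interned String would share a JVM-wide monitor with unrelated code.
+        if (!targetDir.exists()) {
+            targetDir.mkdirs();
+        }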
+        String safeTraceId = StringUtils.defaultIfEmpty(traceId, "xf_clone").replaceAll("[^0-9A-Za-z_-]", "_");
+        String fileName = safeTraceId + "_" + System.currentTimeMillis() + "_" + UUID.randomUUID().toString().replace("-", "") + ".wav";
+        return new File(targetDir, fileName).getAbsolutePath();
+    }
+
+    private static void writePcmAsWav(String filePath, byte[] pcmBytes, int sampleRate) throws IOException {
+        File targetFile = new File(filePath);
+        File parentDir = targetFile.getParentFile();
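+        // No lock needed: mkdirs() may lose a race to another thread, so re-check isDirectory() before failing.
+        if (parentDir != null && !parentDir.exists() && !parentDir.mkdirs() && !parentDir.isDirectory()) {
+            throw new IOException("failed to create directory: " + parentDir.getAbsolutePath());
+        }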
+
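+        // 16-bit mono PCM. The RIFF size field covers everything after itself:
+        // 36 bytes of remaining header plus the PCM payload (44-byte canonical header assumed).
+        WaveHeader header = new WaveHeader();
+        header.fileLength = pcmBytes.length + 36;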
+        header.FmtHdrLeth = 16;
+        header.FormatTag = 1;
+        header.Channels = 1;
+        header.SamplesPerSec = sampleRate;
+        header.BitsPerSample = 16;
+        header.BlockAlign = (short) (header.Channels * header.BitsPerSample / 8);
+        header.AvgBytesPerSec = header.BlockAlign * header.SamplesPerSec;
+        header.DataHdrLeth = pcmBytes.length;
+
+        try (FileOutputStream outputStream = new FileOutputStream(targetFile)) {
+            outputStream.write(header.getHeader());
+            outputStream.write(pcmBytes);
+            outputStream.flush();
+        }
+    }
+
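+    /**
+     * Returns the top-level "code" if present, otherwise header.code.
+     * Returns -1 when the response or its header is missing.
+     */
+    private static int parseResponseCode(JSONObject json) {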
+        if (json == null) {
+            return -1;
+        }
+        if (json.containsKey("code")) {
+            return json.getIntValue("code");
+        }
+        JSONObject header = json.getJSONObject("header");
+        return header == null ? -1 : header.getIntValue("code");
+    }
+
+    private static String buildResponseError(JSONObject json) {
+        JSONObject header = json == null ? null : json.getJSONObject("header");
+        if (header != null) {
+            return "xfyun clone tts error: code=" + header.getIntValue("code")
+                    + ", message=" + header.getString("message")
+                    + ", body=" + json.toJSONString();
+        }
+        return "xfyun clone tts error: " + (json == null ? "" : json.toJSONString());
+    }
+
+    private static String buildWebsocketFailureMessage(Throwable throwable, Response response) {
+        StringBuilder sb = new StringBuilder("xfyun clone websocket failed");
+        if (response != null) {
+            sb.append(": http ").append(response.code());
+        }
+        if (throwable != null && StringUtils.isNotBlank(throwable.getMessage())) {
+            sb.append(", ").append(throwable.getMessage());
+        }
+        return sb.toString();
+    }
+
+    private static String urlEncode(String value) throws Exception {
+        return URLEncoder.encode(value, "UTF-8");
+    }
+
+    private static OkHttpClient createDefaultClient() {
+        return new OkHttpClient.Builder()
+                .connectTimeout(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS)
+                .readTimeout(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS)
+                .writeTimeout(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS)
+                .build();
+    }
+
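+    /**
+     * Builds a client that trusts every certificate and every hostname.
+     * This disables TLS peer verification entirely and is open to man-in-the-middle attacks;
+     * use it only when verification is deliberately turned off.
+     */
+    private static OkHttpClient createUnsafeOkHttpClient() {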
+        try {
+            final TrustManager[] trustAllCerts = new TrustManager[]{
+                    new X509TrustManager() {
+                        @Override
+                        public void checkClientTrusted(java.security.cert.X509Certificate[] chain, String authType) {
+                        }
+
+                        @Override
+                        public void checkServerTrusted(java.security.cert.X509Certificate[] chain, String authType) {
+                        }
+
+                        @Override
+                        public java.security.cert.X509Certificate[] getAcceptedIssuers() {
+                            return new java.security.cert.X509Certificate[0];
+                        }
+                    }
+            };
+            SSLContext sslContext = SSLContext.getInstance("TLS");
+            sslContext.init(null, trustAllCerts, new SecureRandom());
+            return new OkHttpClient.Builder()
+                    .connectTimeout(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS)
+                    .readTimeout(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS)
+                    .writeTimeout(DEFAULT_TIMEOUT_SECONDS, TimeUnit.SECONDS)
+                    .sslSocketFactory(sslContext.getSocketFactory(), (X509TrustManager) trustAllCerts[0])
+                    .hostnameVerifier((hostname, session) -> true)
+                    .build();
+        } catch (NoSuchAlgorithmException | KeyManagementException e) {
+            throw new IllegalStateException("create unsafe okhttp client failed", e);
+        }
+    }
+
+    private static class CloneConfig {
+        private String appId;
+        private String apiKey;
+        private String apiSecret;
+        private int sampleRate = DEFAULT_SAMPLE_RATE;
+        private boolean verifyPeer = true;
+        private String cloneVcn = "x5_clone";
+    }
+}