Added volume, rate, pitch attributes and getVoices method which returns SpeechSynthesisVoiceList.
author Glen Shires <gshires@google.com>
Fri, 05 Oct 2012 11:07:29 -0700
changeset 41 fdc26488164f
parent 40 95fa61bdb089
child 42 dcc75df666a5
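As a rough illustration of what this changeset introduces, the sketch below sets the new volume, rate, and pitch values on an utterance-like object and picks a voice from a list shaped like SpeechSynthesisVoiceList (a length plus an item getter). It is not part of the changeset. The helper names (clamp, applyProsody, pickVoice, makeVoiceList), the example voice URNs, and the choice to clamp out-of-range values are all assumptions made for illustration; the draft only states the allowed ranges.

```javascript
// Illustrative sketch only; not part of changeset 41 or of speechapi.html.

// Clamping is an assumption of this sketch: the draft states ranges,
// not what to do with values outside them.
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Copy prosody settings onto an utterance-like object using the ranges
// given in the draft: volume 0..1, rate 0.1..10, pitch 0..2.
// Missing values fall back to 1 (an assumption for rate and pitch).
function applyProsody(utterance, settings) {
  const pick = (value) => (typeof value === "number" ? value : 1);
  utterance.volume = clamp(pick(settings.volume), 0, 1);
  utterance.rate = clamp(pick(settings.rate), 0.1, 10);
  utterance.pitch = clamp(pick(settings.pitch), 0, 2);
  return utterance;
}

// Walk a SpeechSynthesisVoiceList-like object and return the default
// voice for the language, else the first voice with that language,
// else null.
function pickVoice(voiceList, lang) {
  let firstMatch = null;
  for (let i = 0; i < voiceList.length; i++) {
    const voice = voiceList.item(i);
    if (voice === null || voice.lang !== lang) continue;
    if (voice.default) return voice;
    if (firstMatch === null) firstMatch = voice;
  }
  return firstMatch;
}

// Hypothetical stand-in for the list a user agent would return.
function makeVoiceList(voices) {
  return {
    length: voices.length,
    item: (index) => (index < voices.length ? voices[index] : null),
  };
}

const voices = makeVoiceList([
  { voiceURI: "urn:example:voice-a", name: "Voice A", lang: "en-US", localService: true, default: false },
  { voiceURI: "urn:example:voice-b", name: "Voice B", lang: "en-US", localService: false, default: true },
  { voiceURI: "urn:example:voice-c", name: "Voice C", lang: "fr-FR", localService: true, default: true },
]);

const utterance = applyProsody({ text: "Hello", lang: "en-US" }, { rate: 12, pitch: 0.8 });
const chosen = pickVoice(voices, utterance.lang);
if (chosen !== null) utterance.voiceURI = chosen.voiceURI;
```

In a user agent that implements this draft, the list would presumably come from the static getVoices method on SpeechSynthesis rather than from makeVoiceList.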
speechapi.html
--- a/speechapi.html	Thu Oct 04 13:01:43 2012 -0700
+++ b/speechapi.html	Fri Oct 05 11:07:29 2012 -0700
@@ -343,7 +343,7 @@
       <p><a href="http://www.w3.org/"><img alt=W3C height=48 src="http://www.w3.org/Icons/w3c_home" width=72></a></p>
       <!--end-logo-->
       <h1 id="title_heading">Speech JavaScript API Specification</h1>
-      <h2 class="no-num no-toc" id="draft_date">Editor's Draft: 4 October 2012</h2>
+      <h2 class="no-num no-toc" id="draft_date">Editor's Draft: 5 October 2012</h2>
       <dl>
         <dt>Editors:</dt>
         <dd>Glen Shires, Google Inc.</dd>
@@ -398,6 +398,8 @@
       <li><a href="#utterance-attributes"><span class=secno>5.2.3 </span>SpeechSynthesisUtterance Attributes</a></li>
       <li><a href="#utterance-events"><span class=secno>5.2.4 </span>SpeechSynthesisUtterance Events</a></li>
       <li><a href="#callback-parameters"><span class=secno>5.2.5 </span>SpeechSynthesisCallback Parameters</a></li>
+      <li><a href="#speechsynthesisvoice"><span class=secno>5.2.6 </span>SpeechSynthesisVoice</a></li>
+      <li><a href="#speechsynthesisvoicelist"><span class=secno>5.2.7 </span>SpeechSynthesisVoiceList</a></li>
       <li><a href="#examples"><span class=secno>6 </span>Examples</a></li>
       <li class=no-num><a href="#acknowledgments">Acknowledgments</a></li>
       <li class=no-num><a href="#references">References</a></li>
@@ -631,7 +633,7 @@
       <dd>The grammars attribute stores the collection of SpeechGrammar objects which represent the grammars that are active for this recognition.</dd>
 
       <dt><dfn id="dfn-lang">lang</dfn> attribute</dt>
-      <dd>This attribute will set the language of the recognition for the request, using a valid <a href="http://www.ietf.org/rfc/bcp/bcp47.txt">BCP 47</a> language tag.
+      <dd>This attribute will set the language of the recognition for the request, using a valid BCP 47 language tag. <a href="#ref-bcp47">[BCP47]</a>
       If unset it remains unset for getting in script, but will default to use the <a href="http://www.w3.org/TR/html5/elements.html#the-lang-and-xml:lang-attributes">lang</a> of the html document root element and associated hierarchy.
       This default value is computed and used when the input request opens a connection to the recognition service.</dd>
 
@@ -929,6 +931,7 @@
       static void <a href="#dfn-ttscancel">cancel</a>();
       static void <a href="#dfn-ttspause">pause</a>();
       static void <a href="#dfn-ttsresume">resume</a>();
+      static SpeechSynthesisVoiceList <a href="#dfn-ttsgetvoices">getVoices</a>();
     };
 
     [NoInterfaceObject]
@@ -950,7 +953,10 @@
     interface SpeechSynthesisUtterance {
       attribute DOMString <a href="#dfn-utterancetext">text</a>;
       attribute DOMString <a href="#dfn-utterancelang">lang</a>;
-      attribute DOMString <a href="#dfn-utteranceserviceuri">serviceURI</a>;
+      attribute DOMString <a href="#dfn-utterancevoiceuri">voiceURI</a>;
+      attribute float <a href="#dfn-utterancevolume">volume</a>;
+      attribute float <a href="#dfn-utterancerate">rate</a>;
+      attribute float <a href="#dfn-utterancepitch">pitch</a>;
 
       attribute Function <a href="#dfn-utteranceonstart">onstart</a>;
       attribute SpeechSynthesisCallback <a href="#dfn-utteranceonend">onend</a>;
@@ -958,6 +964,21 @@
       attribute Function <a href="#dfn-utteranceonresume">onresume</a>;
       attribute SpeechSynthesisUpdateCallback <a href="#dfn-utteranceonupdate">onupdate</a>;
     };
+
+    interface SpeechSynthesisVoice {
+      readonly attribute DOMString <a href="#dfn-voicevoiceuri">voiceURI</a>;
+      readonly attribute DOMString <a href="#dfn-voicename">name</a>;
+      readonly attribute DOMString <a href="#dfn-voicelang">lang</a>;
+      readonly attribute boolean <a href="#dfn-voicelocalservice">localService</a>;
+      readonly attribute boolean <a href="#dfn-voicedefault">default</a>;
+    };
+
+    interface SpeechSynthesisVoiceList {
+      readonly attribute unsigned long <a href="#dfn-voicelistlength">length</a>;
+      getter SpeechSynthesisVoice <a href="#dfn-voicelistitem">item</a>(in unsigned long index);
+    };
+
+
           </code>
         </pre>
       </div>
@@ -1004,28 +1025,52 @@
       <dd>This method puts the global SpeechSynthesis instance into the non-paused state.
       If an utterance was speaking, it continues speaking the utterance at the point at which it was paused, else it begins speaking the next utterance in the queue (if any).
       (If called when the SpeechSynthesis instance was already in the non-paused state, it does nothing.)</dd>
+
+      <dt><dfn id="dfn-ttsgetvoices">getVoices</dfn> method</dt>
+      <dd>This method returns the available voices.
+      It is user agent dependent which voices are available.</dd>
     </dl>
 
     <h4 id="utterance-attributes"><span class=secno>5.2.3 </span>SpeechSynthesisUtterance Attributes</h4>
 
     <dl>
       <dt><dfn id="dfn-utterancetext">text</dfn> attribute</dt>
-      <dd>The text to be synthesized and spoken for this utterance.
+      <dd>This attribute specifies the text to be synthesized and spoken for this utterance.
       This may be either plain text or a complete, well-formed SSML document. <a href="#ref-ssml">[SSML]</a>
       For speech synthesis engines that do not support SSML, or only support certain tags, the user agent or speech engine must strip away the tags they do not support and speak the text.
       There may be a maximum length of the text; it may be limited to 32,767 characters.</dd>
 
       <dt><dfn id="dfn-utterancelang">lang</dfn> attribute</dt>
-      <dd>This attribute will set the language of the speech synthesis for the request, using a valid <a href="http://www.ietf.org/rfc/bcp/bcp47.txt">BCP 47</a> language tag.
+      <dd>This attribute specifies the language of the speech synthesis for the utterance, using a valid BCP 47 language tag. <a href="#ref-bcp47">[BCP47]</a>
       If unset it remains unset for getting in script, but will default to use the <a href="http://www.w3.org/TR/html5/elements.html#the-lang-and-xml:lang-attributes">lang</a> of the html document root element and associated hierarchy.
       This default value is computed and used when the input request opens a connection to the recognition service.</dd>
 
-      <dt><dfn id="dfn-utteranceserviceuri">serviceURI</dfn> attribute</dt>
-      <dd>The serviceURI attribute specifies the location of the speech synthesis service that the web application wishes to use.
+      <dt><dfn id="dfn-utterancevoiceuri">voiceURI</dfn> attribute</dt>
+      <dd>The voiceURI attribute specifies the speech synthesis voice and the location of the speech synthesis service that the web application wishes to use.
       If this attribute is unset at the time of the play method call, then the user agent <em class="rfc2119" title="must">must</em> use the user agent default speech service.
-      Note that the serviceURI is a generic URI and can thus point to local services either through use of a URN with meaning to the User Agent or by specifying a URL that the User Agent recognizes as a local service.
+      Note that the voiceURI is a generic URI and can thus point to local services either through use of a URN with meaning to the User Agent or by specifying a URL that the User Agent recognizes as a local service.
       Additionally, the User Agent default can be local or remote and can incorporate end user choices via interfaces provided by the User Agent such as browser configuration parameters.
       </dd>
+
+      <dt><dfn id="dfn-utterancevolume">volume</dfn> attribute</dt>
+      <dd>This attribute specifies the speaking volume for the utterance.
+      It ranges between 0 and 1 inclusive, with 0 being the lowest volume and 1 the highest volume, with a default of 1.
+      If SSML is used, this value will be overridden by prosody tags in the markup.</dd>
+
+      <dt><dfn id="dfn-utterancerate">rate</dfn> attribute</dt>
+      <dd>This attribute specifies the speaking rate for the utterance.
+      It is relative to the default rate for this voice.
+      1 is the default rate supported by the speech synthesis engine or specific voice (which should correspond to a normal speaking rate).
+      2 is twice as fast, and 0.5 is half as fast.
+      Values below 0.1 or above 10 are strictly disallowed, but speech synthesis engines or specific voices may constrain the minimum and maximum rates further. For example, a particular voice may not actually speak faster than 3 times normal even if you specify a value larger than 3.
+      If SSML is used, this value will be overridden by prosody tags in the markup.</dd>
+
+      <dt><dfn id="dfn-utterancepitch">pitch</dfn> attribute</dt>
+      <dd>This attribute specifies the speaking pitch for the utterance.
+      It ranges between 0 and 2 inclusive, with 0 being the lowest pitch and 2 the highest pitch.
+      1 corresponds to the default pitch of the speech synthesis engine or specific voice.
+      Speech synthesis engines or voices may constrain the minimum and maximum pitch further.
+      If SSML is used, this value will be overridden by prosody tags in the markup.</dd>
     </dl>
 
     <h4 id="utterance-events"><span class=secno>5.2.4 </span>SpeechSynthesisUtterance Events</h4>
@@ -1070,6 +1115,44 @@
       For events with updateType of "word", this value should be undefined.</dd>
     </dl>
 
+    <h4 id="speechsynthesisvoice"><span class=secno>5.2.6 </span>SpeechSynthesisVoice</h4>
+
+    <dl>
+      <dt><dfn id="dfn-voicevoiceuri">voiceURI</dfn> attribute</dt>
+      <dd>The voiceURI attribute specifies the speech synthesis voice and the location of the speech synthesis service for this voice.
+      Note that the voiceURI is a generic URI and can thus point to local or remote services, as described in the SpeechSynthesisUtterance <a href="#dfn-utterancevoiceuri">voiceURI</a> attribute.</dd>
+
+      <dt><dfn id="dfn-voicename">name</dfn> attribute</dt>
+      <dd>This attribute is a human-readable name that represents the voice.
+      There is no guarantee that all names returned are unique.</dd>
+
+      <dt><dfn id="dfn-voicelang">lang</dfn> attribute</dt>
+      <dd>This attribute is a BCP 47 language tag indicating the language of the voice. <a href="#ref-bcp47">[BCP47]</a></dd>
+
+      <dt><dfn id="dfn-voicelocalservice">localService</dfn> attribute</dt>
+      <dd>This attribute is true for voices supplied by a local speech synthesizer, and is false for voices supplied by a remote speech synthesizer service.
+      (This may be useful because remote services may imply additional latency, bandwidth or cost, whereas local voices may imply lower quality; however, there is no guarantee that any of these implications are true.)</dd>
+
+      <dt><dfn id="dfn-voicedefault">default</dfn> attribute</dt>
+      <dd>This attribute is true for at most one voice per language.
+      There may be a different default for each language.
+      It is user agent dependent how default voices are determined.</dd>
+    </dl>
+
+    <h4 id="speechsynthesisvoicelist"><span class=secno>5.2.7 </span>SpeechSynthesisVoiceList</h4>
+
+    <p>The SpeechSynthesisVoiceList object holds a collection of SpeechSynthesisVoice objects. This structure has the following attributes.</p>
+
+    <dl>
+      <dt><dfn id="dfn-voicelistlength">length</dfn></dt>
+      <dd>The length attribute indicates how many voices are represented in the item array.</dd>
+
+      <dt><dfn id="dfn-voicelistitem">item</dfn></dt>
+      <dd>The item getter returns a SpeechSynthesisVoice from the index into an array of voices.
+      If index is greater than or equal to length, this returns null.
+      The user agent <em class="rfc2119" title="must">must</em> ensure that the length attribute is set to the number of elements in the array.</dd>
+    </dl>
+
     <h2 id="examples"><span class=secno>6 </span>Examples</h2>
 
     <p><em>This section is non-normative.</em></p>
@@ -1168,6 +1251,11 @@
     <h2 class="no-num" id="references">References</h2>
 
     <dl>
+      <dt><a id="ref-bcp47">[BCP47]</a></dt>
+      <dd><a href="http://www.ietf.org/rfc/bcp/bcp47.txt"><cite>Tags for Identifying Languages</cite></a>, A. Phillips, et al. September 2009.
+      Internet BCP 47.
+      URL: <a href="http://www.ietf.org/rfc/bcp/bcp47.txt">http://www.ietf.org/rfc/bcp/bcp47.txt</a></dd>
+
       <dt><a id="ref-rfc2119">[RFC2119]</a></dt>
       <dd><a href="http://www.ietf.org/rfc/rfc2119.txt"><cite>Key words for use in RFCs to Indicate Requirement Levels</cite></a>, S. Bradner. March 1997.
       Internet RFC 2119.
@@ -1184,7 +1272,7 @@
       URL: <a href="http://dev.w3.org/2006/webapi/WebIDL">http://dev.w3.org/2006/webapi/WebIDL</a></dd>
 
       <dt><a id="ref-1">[1]</a></dt>
-      <dd><cite><a href="http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/">HTML Speech Incubator Group Final Report</a></cite>, World Wide Web Consortium, 6 December 2011.
+      <dd><cite><a href="http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/">HTML Speech Incubator Group Final Report</a></cite>, Michael Bodell, et al., Editors. World Wide Web Consortium, 6 December 2011.
       URL: <a href="http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/">http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/</a></dd>
 
       <dt><a id="ref-2">[2]</a></dt>