Commit fe616ff

Author: Steve Canny

docs: document Font analysis

* Introduce a hierarchy for text analysis

Parent: 6186c6a

9 files changed (+239, -79 lines)

docs/api/text.rst

Lines changed: 9 additions & 0 deletions

@@ -21,3 +21,12 @@ Text-related objects
 
 .. autoclass:: Run
    :members:
+
+
+|Font| objects
+--------------
+
+.. currentmodule:: docx.text.run
+
+.. autoclass:: Font
+   :members:
File renamed without changes.
File renamed without changes.

docs/dev/analysis/features/bool-run-props.rst renamed to docs/dev/analysis/features/text/font.rst

Lines changed: 212 additions & 71 deletions

@@ -1,6 +1,61 @@
 
-Boolean Run properties
-======================
+Font
+====
+
+Word supports a rich variety of character formatting. Character formatting
+can be applied at various levels in the *style hierarchy*. At the lowest
+level, it can be applied directly to a run of text content. Above that, it
+can be applied to character, paragraph and table styles. It can also be
+applied to an abstract numbering definition. At the highest levels it can be
+applied via a theme or document defaults.
+
+
+Typeface name
+-------------
+
+Word allows multiple typefaces to be specified for character content in
+a single run. This allows different Unicode character ranges such as ASCII
+and Arabic to be used in a single run, each being rendered in the typeface
+specified for that range.
+
+Up to eight distinct typefaces may be specified for a font. Four are used to
+specify a typeface for a distinct code point range. These are:
+
+* `w:ascii` - used for the first 128 Unicode code points
+* `w:cs` - used for complex script code points
+* `w:eastAsia` - used for East Asian code points
+* `w:hAnsi` - standing for *high ANSI*, but effectively the catch-all for any
+  code points not specified by one of the other three.
+
+The other four, `w:asciiTheme`, `w:cstheme`, `w:eastAsiaTheme`, and
+`w:hAnsiTheme`, are used to indirectly specify a theme-defined font. This
+allows the typeface to be set centrally in the document. These four attributes
+have lower precedence than the first four, so for example the value of
+`w:asciiTheme` is ignored if a `w:ascii` attribute is also present.
+
+The typeface name used for a run is specified in the `w:rPr/w:rFonts`
+element. There are 8 attributes that in combination specify the typeface to
+be used.
+
+Protocol
+~~~~~~~~
+
+Initially, only the base typeface name is supported by the API, using the
+:attr:`~.Font.name` property. Its value is that of the `w:rFonts/@w:ascii`
+attribute or |None| if not present. Assignment to this property sets both the
+`w:ascii` and the `w:hAnsi` attribute to the assigned string or removes them
+both if |None| is assigned::
+
+    >>> font = document.styles['Normal'].font
+    >>> font.name
+    None
+    >>> font.name = 'Arial'
+    >>> font.name
+    'Arial'
+
+
+Boolean run properties
+----------------------
 
 Character formatting that is either on or off, such as bold, italic, and
 small caps. Certain of these properties are *toggle properties* that may
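A minimal sketch of the `Font.name` protocol and of the attribute precedence described in this hunk, written against raw XML with Python's standard `xml.etree.ElementTree` rather than python-docx's own element classes. All helper names (`qn`, `get_font_name`, `set_font_name`, `effective_typeface`) and the `theme_fonts` mapping are hypothetical; resolving a theme reference for real would mean reading the document's theme part, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def qn(tag):
    """Hypothetical helper: turn 'w:ascii' into Clark notation '{ns}ascii'."""
    prefix, local = tag.split(":")
    if prefix != "w":
        raise ValueError("only the w: prefix is handled here: %r" % tag)
    return "{%s}%s" % (W_NS, local)


def get_font_name(rPr):
    """Return w:rFonts/@w:ascii, or None if the element or attribute is absent."""
    rFonts = rPr.find(qn("w:rFonts"))
    return None if rFonts is None else rFonts.get(qn("w:ascii"))


def set_font_name(rPr, name):
    """Set w:ascii and w:hAnsi to name, or remove both when name is None."""
    rFonts = rPr.find(qn("w:rFonts"))
    if name is None:
        if rFonts is not None:
            for attr in ("w:ascii", "w:hAnsi"):
                rFonts.attrib.pop(qn(attr), None)
        return
    if rFonts is None:
        # Assumption: keep w:rStyle first and place w:rFonts right after it,
        # following the element order Word writes.
        index = 1 if len(rPr) and rPr[0].tag == qn("w:rStyle") else 0
        rFonts = ET.Element(qn("w:rFonts"))
        rPr.insert(index, rFonts)
    rFonts.set(qn("w:ascii"), name)
    rFonts.set(qn("w:hAnsi"), name)


# Explicit attribute, then its lower-precedence theme attribute, per range.
# The complex-script theme attribute is spelled "cstheme" in the schema.
_RANGE_ATTRS = {
    "ascii": ("w:ascii", "w:asciiTheme"),
    "hAnsi": ("w:hAnsi", "w:hAnsiTheme"),
    "eastAsia": ("w:eastAsia", "w:eastAsiaTheme"),
    "cs": ("w:cs", "w:cstheme"),
}


def effective_typeface(rFonts, code_range, theme_fonts):
    """Resolve the typeface for one code point range of a w:rFonts element.

    theme_fonts is a hypothetical dict mapping ST_Theme values such as
    'minorHAnsi' to typeface names.
    """
    explicit, theme = _RANGE_ATTRS[code_range]
    name = rFonts.get(qn(explicit))
    if name is not None:
        return name  # an explicit typeface wins over the theme attribute
    theme_ref = rFonts.get(qn(theme))
    return None if theme_ref is None else theme_fonts.get(theme_ref)


rPr = ET.Element(qn("w:rPr"))
set_font_name(rPr, "Arial")
print(get_font_name(rPr))  # Arial
```

Removing the name leaves an empty `w:rFonts` element in place; whether to delete the element too is a design choice this sketch leaves open.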
@@ -96,14 +151,63 @@ The semantics of the three values are as follows:
 +-------+---------------------------------------------------------------+
 
 
+Toggle properties
+-----------------
+
+Certain of the boolean run properties are *toggle properties*. A toggle
+property is one that behaves like a *toggle* at certain places in the style
+hierarchy. Toggle here means that setting the property on has the effect of
+reversing the prior setting rather than unconditionally setting the property
+on.
+
+This behavior allows these properties to be overridden (turned off) in
+inheriting styles. For example, consider a character style `emphasized` that
+sets bold on. Another style, `strong`, inherits from `emphasized` but should
+display in italic rather than bold. Setting bold off in `strong` has no effect
+because it is overridden by the bold in `emphasized` (I think). Because bold
+is a toggle property, setting bold on in `strong` causes its value to be
+toggled to False, achieving the desired effect. See §17.7.3 for more details
+on toggle properties.
+
+The following run properties are toggle properties:
+
++----------------+------------+-------------------------------------------+
+| element        | spec       | name                                      |
++================+============+===========================================+
+| `<b/>`         | §17.3.2.1  | Bold                                      |
++----------------+------------+-------------------------------------------+
+| `<bCs/>`       | §17.3.2.2  | Complex Script Bold                       |
++----------------+------------+-------------------------------------------+
+| `<caps/>`      | §17.3.2.5  | Display All Characters as Capital Letters |
++----------------+------------+-------------------------------------------+
+| `<emboss/>`    | §17.3.2.13 | Embossing                                 |
++----------------+------------+-------------------------------------------+
+| `<i/>`         | §17.3.2.16 | Italics                                   |
++----------------+------------+-------------------------------------------+
+| `<iCs/>`       | §17.3.2.17 | Complex Script Italics                    |
++----------------+------------+-------------------------------------------+
+| `<imprint/>`   | §17.3.2.18 | Imprinting                                |
++----------------+------------+-------------------------------------------+
+| `<outline/>`   | §17.3.2.23 | Display Character Outline                 |
++----------------+------------+-------------------------------------------+
+| `<shadow/>`    | §17.3.2.31 | Shadow                                    |
++----------------+------------+-------------------------------------------+
+| `<smallCaps/>` | §17.3.2.33 | Small Caps                                |
++----------------+------------+-------------------------------------------+
+| `<strike/>`    | §17.3.2.37 | Single Strikethrough                      |
++----------------+------------+-------------------------------------------+
+| `<vanish/>`    | §17.3.2.41 | Hidden Text                               |
++----------------+------------+-------------------------------------------+
+
+
 Specimen XML
 ------------
 
 .. highlight:: xml
 
 ::
 
-  <w:r w:rsidRPr="00FA3070">
+  <w:r>
     <w:rPr>
       <w:b/>
       <w:i/>
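The toggle behavior described in the hunk above can be modeled roughly as follows. This is a simplified sketch, assuming that within a style an "on" value flips the state accumulated so far while "off" or an absent value leaves it unchanged, and that direct formatting on the run sets the state absolutely. It ignores document defaults and the exact level ordering set out in §17.7.3; `resolve_toggle` is a hypothetical helper, not part of python-docx.

```python
def resolve_toggle(style_values, direct_value=None):
    """Resolve a toggle property such as bold for one run (simplified model).

    style_values: the property's value at each style level, in the order the
        levels are applied; each is True, False, or None (not specified).
    direct_value: the run's own w:rPr value, True, False, or None.
    """
    state = False  # assumption: off unless something turns it on
    for value in style_values:
        if value:
            # In a style, "on" flips the state reached so far; "off" and
            # "not specified" leave it unchanged.
            state = not state
    if direct_value is not None:
        # Direct formatting sets the state absolutely rather than toggling.
        state = direct_value
    return state


# Levels standing in for the `emphasized` / `strong` example above.
print(resolve_toggle([True]))         # True: `emphasized` alone
print(resolve_toggle([True, True]))   # False: bold on in `strong` toggles it
print(resolve_toggle([True, False]))  # True: bold off in `strong` changes nothing
```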
@@ -113,11 +217,11 @@ Specimen XML
       <w:szCs w:val="28"/>
       <w:u w:val="single"/>
     </w:rPr>
-    <w:t>bold, italic, small caps, strike, size, and underline, applied in
-      reverse order but not to paragraph mark</w:t>
+    <w:t>bold, italic, small caps, strike, 14 pt, and underline</w:t>
   </w:r>
 
 
+
 Schema excerpt
 --------------
 
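The specimen text now says "14 pt" next to `<w:szCs w:val="28"/>` because run sizes are expressed in half-points (`ST_HpsMeasure` in the schema excerpt); `w:szCs` is the size used for complex script characters. The sketch below reads that value from a reduced copy of the specimen. The fragment keeps only the specimen lines visible in this diff (the lines between the two hunks are not reproduced), and `size_pt` is a hypothetical helper that assumes a plain integer `w:val`.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# A reduced run built from the specimen lines visible in this diff, with a
# namespace declaration added so it parses standalone.
FRAGMENT = (
    '<w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:rPr><w:b/><w:i/><w:szCs w:val="28"/><w:u w:val="single"/></w:rPr>'
    "<w:t>bold, italic, small caps, strike, 14 pt, and underline</w:t>"
    "</w:r>"
)


def size_pt(rPr, local_name):
    """Return the w:sz or w:szCs size in points, or None if not specified.

    Assumes w:val is a plain integer count of half-points, as in the specimen.
    """
    element = rPr.find(W + local_name)
    if element is None:
        return None
    return int(element.get(W + "val")) / 2


run = ET.fromstring(FRAGMENT)
rPr = run.find(W + "rPr")
print([child.tag[len(W):] for child in rPr])  # ['b', 'i', 'szCs', 'u']
print(size_pt(rPr, "szCs"))  # 14.0
print(size_pt(rPr, "sz"))  # None (not in this reduced fragment)
```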
@@ -128,16 +232,6 @@ times each. Not sure what the semantics of that would be or why one would
 want to do it, but something to note. Word seems to place them in the order
 below when it writes the file.::
 
-  <xsd:complexType name="CT_R">  <!-- denormalized -->
-    <xsd:sequence>
-      <xsd:element name="rPr" type="CT_RPr" minOccurs="0"/>
-      <xsd:group ref="EG_RunInnerContent" minOccurs="0" maxOccurs="unbounded"/>
-    </xsd:sequence>
-    <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
-    <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
-    <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
-  </xsd:complexType>
-
   <xsd:complexType name="CT_RPr">  <!-- denormalized -->
     <xsd:sequence>
       <xsd:choice minOccurs="0" maxOccurs="unbounded"/>
@@ -185,10 +279,61 @@ below when it writes the file.::
     </xsd:sequence>
   </xsd:group>
 
+  <xsd:complexType name="CT_Fonts">
+    <xsd:attribute name="hint" type="ST_Hint"/>
+    <xsd:attribute name="ascii" type="s:ST_String"/>
+    <xsd:attribute name="hAnsi" type="s:ST_String"/>
+    <xsd:attribute name="eastAsia" type="s:ST_String"/>
+    <xsd:attribute name="cs" type="s:ST_String"/>
+    <xsd:attribute name="asciiTheme" type="ST_Theme"/>
+    <xsd:attribute name="hAnsiTheme" type="ST_Theme"/>
+    <xsd:attribute name="eastAsiaTheme" type="ST_Theme"/>
+    <xsd:attribute name="cstheme" type="ST_Theme"/>
+  </xsd:complexType>
+
+  <xsd:complexType name="CT_HpsMeasure">
+    <xsd:attribute name="val" type="ST_HpsMeasure" use="required"/>
+  </xsd:complexType>
+
   <xsd:complexType name="CT_OnOff">
     <xsd:attribute name="val" type="s:ST_OnOff"/>
   </xsd:complexType>
 
+  <xsd:complexType name="CT_SignedHpsMeasure">
+    <xsd:attribute name="val" type="ST_SignedHpsMeasure" use="required"/>
+  </xsd:complexType>
+
+  <xsd:complexType name="CT_String">
+    <xsd:attribute name="val" type="s:ST_String" use="required"/>
+  </xsd:complexType>
+
+  <xsd:complexType name="CT_Underline">
+    <xsd:attribute name="val" type="ST_Underline"/>
+    <xsd:attribute name="color" type="ST_HexColor"/>
+    <xsd:attribute name="themeColor" type="ST_ThemeColor"/>
+    <xsd:attribute name="themeTint" type="ST_UcharHexNumber"/>
+    <xsd:attribute name="themeShade" type="ST_UcharHexNumber"/>
+  </xsd:complexType>
+
+  <xsd:complexType name="CT_VerticalAlignRun">
+    <xsd:attribute name="val" type="s:ST_VerticalAlignRun" use="required"/>
+  </xsd:complexType>
+
+  <!-- simple types -->
+
+  <xsd:simpleType name="ST_Hint">
+    <xsd:restriction base="xsd:string">
+      <xsd:enumeration value="default"/>
+      <xsd:enumeration value="eastAsia"/>
+      <xsd:enumeration value="cs"/>
+    </xsd:restriction>
+  </xsd:simpleType>
+
+  <xsd:simpleType name="ST_HpsMeasure">
+    <xsd:union memberTypes="s:ST_UnsignedDecimalNumber
+      s:ST_PositiveUniversalMeasure"/>
+  </xsd:simpleType>
+
   <xsd:simpleType name="ST_OnOff">
     <xsd:union memberTypes="xsd:boolean ST_OnOff1"/>
   </xsd:simpleType>
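`CT_OnOff` gives boolean properties such as `<w:b/>` an optional `w:val` typed as `ST_OnOff`, a union of `xsd:boolean` and `ST_OnOff1`. A tri-state reading of such an element might look like the sketch below. It assumes the lexical forms `true`/`false`/`1`/`0` for `xsd:boolean` and `on`/`off` for `ST_OnOff1` (whose enumeration lies outside this hunk), and treats an element with no `w:val` as on; `parse_on_off` and `bool_prop` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Accepted lexical forms: xsd:boolean ("true", "false", "1", "0") plus the
# assumed ST_OnOff1 values ("on", "off").
_ON = {"true", "1", "on"}
_OFF = {"false", "0", "off"}


def parse_on_off(val):
    """Interpret the optional w:val of a CT_OnOff element such as <w:b/>."""
    if val is None:
        return True  # a bare <w:b/> turns the property on
    token = val.strip()  # xsd:boolean collapses surrounding whitespace
    if token in _ON:
        return True
    if token in _OFF:
        return False
    raise ValueError("not an ST_OnOff value: %r" % val)


def bool_prop(rPr, local_name):
    """Tri-state read: None if the element is absent, else True or False."""
    element = rPr.find(W + local_name)
    if element is None:
        return None  # not specified at this level
    return parse_on_off(element.get(W + "val"))


rPr = ET.fromstring(
    '<w:rPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:b/><w:i w:val="0"/><w:caps w:val="on"/></w:rPr>'
)
print([bool_prop(rPr, name) for name in ("b", "i", "caps", "strike")])
# [True, False, True, None]
```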
@@ -200,64 +345,60 @@ below when it writes the file.::
     </xsd:restriction>
   </xsd:simpleType>
 
+  <xsd:simpleType name="ST_PositiveUniversalMeasure">
+    <xsd:restriction base="ST_UniversalMeasure">
+      <xsd:pattern value="[0-9]+(\.[0-9]+)?(mm|cm|in|pt|pc|pi)"/>
+    </xsd:restriction>
+  </xsd:simpleType>
 
-Toggle properties
-------------------
-
-Certain of the boolean run properties are *toggle properties*. A toggle
-property is one that behaves like a *toggle* at certain places in the style
-hierarchy. Toggle here means that setting the property on has the effect of
-reversing the prior setting rather than unconditionally setting the property
-on.
-
-This behavior allows these properties to be overridden (turned off) in
-inheriting styles. For example, consider a character style `emphasized` that
-sets bold on. Another style, `strong` inherits from `emphasized`, but should
-display in italic rather than bold. Setting bold off has no effect because it
-is overridden by the bold in `strong` (I think). Because bold is a toggle
-property, setting bold on in `emphasized` causes its value to be toggled, to
-False, achieving the desired effect. See §17.7.3 for more details on toggle
-properties.
-
-The following run properties are toggle properties:
-
-+----------------+------------+-------------------------------------------+
-| element        | spec       | name                                      |
-+================+============+===========================================+
-| `<b/>`         | §17.3.2.1  | Bold                                      |
-+----------------+------------+-------------------------------------------+
-| `<bCs/>`       | §17.3.2.2  | Complex Script Bold                       |
-+----------------+------------+-------------------------------------------+
-| `<caps/>`      | §17.3.2.5  | Display All Characters as Capital Letters |
-+----------------+------------+-------------------------------------------+
-| `<emboss/>`    | §17.3.2.13 | Embossing                                 |
-+----------------+------------+-------------------------------------------+
-| `<i/>`         | §17.3.2.16 | Italics                                   |
-+----------------+------------+-------------------------------------------+
-| `<iCs/>`       | §17.3.2.17 | Complex Script Italics                    |
-+----------------+------------+-------------------------------------------+
-| `<imprint/>`   | §17.3.2.18 | Imprinting                                |
-+----------------+------------+-------------------------------------------+
-| `<outline/>`   | §17.3.2.23 | Display Character Outline                 |
-+----------------+------------+-------------------------------------------+
-| `<shadow/>`    | §17.3.2.31 | Shadow                                    |
-+----------------+------------+-------------------------------------------+
-| `<smallCaps/>` | §17.3.2.33 | Small Caps                                |
-+----------------+------------+-------------------------------------------+
-| `<strike/>`    | §17.3.2.37 | Single Strikethrough                      |
-+----------------+------------+-------------------------------------------+
-| `<vanish/>`    | §17.3.2.41 | Hidden Text                               |
-+----------------+------------+-------------------------------------------+
-
+  <xsd:simpleType name="ST_SignedHpsMeasure">
+    <xsd:union memberTypes="xsd:integer s:ST_UniversalMeasure"/>
+  </xsd:simpleType>
 
-Resources
-----------
+  <xsd:simpleType name="ST_Theme">
+    <xsd:restriction base="xsd:string">
+      <xsd:enumeration value="majorEastAsia"/>
+      <xsd:enumeration value="majorBidi"/>
+      <xsd:enumeration value="majorAscii"/>
+      <xsd:enumeration value="majorHAnsi"/>
+      <xsd:enumeration value="minorEastAsia"/>
+      <xsd:enumeration value="minorBidi"/>
+      <xsd:enumeration value="minorAscii"/>
+      <xsd:enumeration value="minorHAnsi"/>
+    </xsd:restriction>
+  </xsd:simpleType>
 
-* `WdBreakType Enumeration on MSDN`_
-* `Range.InsertBreak Method (Word) on MSDN`_
+  <xsd:simpleType name="ST_Underline">
+    <xsd:restriction base="xsd:string">
+      <xsd:enumeration value="single"/>
+      <xsd:enumeration value="words"/>
+      <xsd:enumeration value="double"/>
+      <xsd:enumeration value="thick"/>
+      <xsd:enumeration value="dotted"/>
+      <xsd:enumeration value="dottedHeavy"/>
+      <xsd:enumeration value="dash"/>
+      <xsd:enumeration value="dashedHeavy"/>
+      <xsd:enumeration value="dashLong"/>
+      <xsd:enumeration value="dashLongHeavy"/>
+      <xsd:enumeration value="dotDash"/>
+      <xsd:enumeration value="dashDotHeavy"/>
+      <xsd:enumeration value="dotDotDash"/>
+      <xsd:enumeration value="dashDotDotHeavy"/>
+      <xsd:enumeration value="wave"/>
+      <xsd:enumeration value="wavyHeavy"/>
+      <xsd:enumeration value="wavyDouble"/>
+      <xsd:enumeration value="none"/>
+    </xsd:restriction>
+  </xsd:simpleType>
 
-.. _WdBreakType Enumeration on MSDN:
-   http://msdn.microsoft.com/en-us/library/office/ff195905.aspx
+  <xsd:simpleType name="ST_UnsignedDecimalNumber">
+    <xsd:restriction base="xsd:unsignedLong"/>
+  </xsd:simpleType>
 
-.. _Range.InsertBreak Method (Word) on MSDN:
-   http://msdn.microsoft.com/en-us/library/office/ff835132.aspx
+  <xsd:simpleType name="ST_VerticalAlignRun">
+    <xsd:restriction base="xsd:string">
+      <xsd:enumeration value="baseline"/>
+      <xsd:enumeration value="superscript"/>
+      <xsd:enumeration value="subscript"/>
+    </xsd:restriction>
+  </xsd:simpleType>
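`ST_HpsMeasure` is a union of `s:ST_UnsignedDecimalNumber`, a count of half-points, and `s:ST_PositiveUniversalMeasure`, a number with a unit suffix matching the pattern in this hunk; `ST_SignedHpsMeasure` also admits negative values. A conversion to points could be sketched as below. The unit factors (72 pt per inch, 2.54 cm per inch, and a 12 pt pica for both `pc` and `pi`) are standard typographic values rather than something this diff states, the leading `-` allowed for signed measures is an assumption about `ST_UniversalMeasure` (not shown here), and `hps_to_points` is a hypothetical helper. Integers written with an explicit `+` sign are not handled.

```python
import re

# Points per unit for the universal-measure suffixes; "pc" and "pi" are both
# taken to be a pica of 12 pt.
_PT_PER_UNIT = {
    "mm": 72 / 25.4,
    "cm": 72 / 2.54,
    "in": 72.0,
    "pt": 1.0,
    "pc": 12.0,
    "pi": 12.0,
}

# Unsigned form follows the ST_PositiveUniversalMeasure pattern; the signed
# form adds an optional leading "-" (assumed for ST_UniversalMeasure).
_UNSIGNED_MEASURE = re.compile(r"([0-9]+(?:\.[0-9]+)?)(mm|cm|in|pt|pc|pi)")
_SIGNED_MEASURE = re.compile(r"(-?[0-9]+(?:\.[0-9]+)?)(mm|cm|in|pt|pc|pi)")


def hps_to_points(value, signed=False):
    """Convert an ST_HpsMeasure value (ST_SignedHpsMeasure if signed) to points.

    A bare integer counts half-points; a number with a unit suffix is an
    absolute measure.
    """
    pattern = _SIGNED_MEASURE if signed else _UNSIGNED_MEASURE
    measure = pattern.fullmatch(value)
    if measure:
        number, unit = measure.groups()
        return float(number) * _PT_PER_UNIT[unit]
    integer = r"-?[0-9]+" if signed else r"[0-9]+"
    if re.fullmatch(integer, value):
        return int(value) / 2
    raise ValueError("not a half-point measure: %r" % value)


print(hps_to_points("28"))     # 14.0
print(hps_to_points("0.5in"))  # 36.0
```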
Lines changed: 15 additions & 0 deletions

@@ -0,0 +1,15 @@
+
+Text
+====
+
+.. toctree::
+   :titlesonly:
+
+   font
+   underline
+   par-alignment
+   run-content
+   breaks
+   char-style
+
+Word supports the definition of *styles* to allow a group of formatting
File renamed without changes.
File renamed without changes.

docs/dev/analysis/features/underline.rst renamed to docs/dev/analysis/features/text/underline.rst

Lines changed: 2 additions & 2 deletions

@@ -1,6 +1,6 @@
 
-Run underline
-=============
+Underline
+=========
 
 Text in a Word document can be underlined in a variety of styles.
 