XML Tag API¶
The Imprint engine comes with a complete set
of processors for the tags specified in the XML Template Specification. However, additional
tags may be necessary for highly customized applications, so an API exists for
defining and registering new tags. The API is defined in the
imprint.core.tags
module. Example usage can be found in the
Writing Custom Tags tutorial.
Tag Descriptors¶
The tag API revolves around the TagDescriptor
class. The class can
be extended directly, or instantiated through a delegate object that fulfills
the necessary duck-type API. Objects contain a set of attributes and two
callbacks that define how to handle XML tags of a given type. All the elements
are optional and have sensible default values.
Any registered object will be viewed through TagDescriptor.wrap
, so
it is not necessary to extend or instantiate TagDescriptor
to
create a working tag descriptor.
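As a rough illustration of the duck-typed approach, the sketch below defines a delegate that supplies only some of the elements. The class name `ShoutTag`, its attribute values, and the `SimpleNamespace` stand-in for the engine state are hypothetical. Only the element names (`content`, `tags`, `required`, `optional`, `start`, `end`) and the callback arguments are taken from this page.

```python
from types import SimpleNamespace


class ShoutTag:
    """Hypothetical delegate: only some descriptor elements are defined."""

    content = True                # text content is expected
    tags = False                  # nested tags are not allowed
    required = ('id',)            # attributes that must be present
    optional = {'suffix': '!'}    # optional attributes and their defaults

    def start(self, state, name, attr):
        # Mark the tag as open; `opened` is a made-up state attribute.
        state.opened = name

    def end(self, state, name, attr):
        # Delete the marker rather than setting it to None.
        del state.opened
        state.closed = (name, attr['id'], attr.get('suffix', '!'))


# Stand-in for the real engine state, used only to exercise the callbacks.
state = SimpleNamespace()
tag = ShoutTag()
tag.start(state, 'shout', {'id': 'a1'})
tag.end(state, 'shout', {'id': 'a1'})
```

With Imprint available, an object like this would presumably be registered as described under Registering New Tags.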
Errors¶
Tag descriptors may raise any type of error they deem necessary in their
start
and end
methods. Most
classes of errors will be logged and cause the application to abort. However,
two special classes of errors will not cause a fatal crash:
- KnownError is used to flag known conditions that can be handled gracefully by the tag.
- OSError. Specifically, the FileNotFoundError and PermissionError subclasses are deemed to be “known errors”. If they represent a fatal condition, they should be wrapped in another exception type.
Any plugins with a dynamic Data Configuration will generally receive an alt-text placeholder where the content would normally go instead of completely aborting.
-
exception
imprint.core.
KnownError
¶ A custom exception class that is used by the engine to indicate that a tag or plugin handler exited for a known reason.
In cases where this exception is logged, the message is printed without a stack trace.
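The sketch below shows one way to escalate a missing file to a fatal condition by wrapping it in another exception type, as suggested above. The function name `load_required` and the choice of `RuntimeError` are assumptions made for illustration. Any exception type that is not an `OSError` subclass or `KnownError` should serve the same purpose.

```python
def load_required(path):
    """Read a file whose absence should abort processing."""
    try:
        with open(path, encoding='utf-8') as stream:
            return stream.read()
    except (FileNotFoundError, PermissionError) as exc:
        # These OSError subclasses count as "known errors", so re-raise
        # them as a different type to make the condition fatal.
        raise RuntimeError(f'required file unavailable: {path}') from exc
```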
Configuration¶
Tags have two types of configuration available to them. Static configuration for a given XML Template is provided through the tag attributes in the XML file. Dynamic configuration through the IDC File can be enabled to provide per-document fine-tuning.
XML Attributes¶
XML attributes are supplied to the start
and
end
methods of a TagDescriptor
as the
second argument. The inputs are presented to both methods as a vanilla
dict
. The dictionary is meant to be treated as read-only, but this
is not a requirement, meaning that technically start
can modify what end
sees. The dictionary is filtered
to exclude any attributes that are not listed in the
required
and optional
elements of the TagDescriptor
.
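The filtering described above can be approximated as follows. This is a sketch, not the engine's code: the function name `filter_attributes`, the use of `KeyError` for a missing required attribute, and the filling-in of defaults from `optional` are all assumptions.

```python
def filter_attributes(raw, required=(), optional=None):
    """Keep only the attributes named in `required` and `optional`."""
    optional = dict(optional or {})
    missing = [key for key in required if key not in raw]
    if missing:
        # Assumption: the real error type is not documented here.
        raise KeyError(f'missing required attributes: {missing}')
    # Assumption: defaults from `optional` fill in unspecified attributes.
    filtered = dict(optional)
    filtered.update((key, value) for key, value in raw.items()
                    if key in required or key in optional)
    return filtered


attr = filter_attributes(
    {'id': 'fig1', 'width': '3in', 'bogus': 'x'},
    required=('id',),
    optional={'width': None, 'style': None},
)
```

The unlisted `bogus` attribute is dropped, so neither callback ever sees it.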
Data Configuration¶
For some types of content, static configuration is not enough. To allow
per-document configurations, a TagDescriptor
must define a
non-None
data_config
attribute. This
attribute gives the name of the dictionary to extract from the
IDC File.
start
and end
methods of a
TagDescriptor
with the data_config
attribute set will receive an additional input argument containing the
Data Configuration loaded from the IDC File.
The data configuration can override some of the static XML Attributes of a tag. For built-in tags, the XML Template Specification notes which attributes can be overridden. Built-in tags that support dynamic configuration are <figure>, <table> and <string>.
All built-in tags that support dynamic configuration also support a type of plugin, but this is not a requirement for custom tags.
References¶
A TagDescriptor
is referenceable if it has a
non-None
reference
. A reference made to a
tag will be substituted by the appropriate reference text. By default reference
tags have the target tag name with “-ref” appended:
<figure-ref> references <figure>,
<table-ref> references <table>. A notable
exception is <segment-ref>, which references paragraphs
(<par> tags), but only ones that have a heading style.
References are usually identified by a required id
attribute. Segments can
also be identified by the title of the segment, which is the aggressively
trimmed collection of all the text in the paragraph. For example,
the title of the following XML snippet would be 'Example Heading'
:
<par style="Heading 3">
<run style="Default Paragraph Font">
Example
Heading
</run>
</par>
<segment-ref> tags can therefore identify their target with
either an id
or title
attribute. User-defined tags can implement their
own customized rules for identifying targets.
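The engine's exact trimming rules are not shown here. One simple approach that produces the title quoted above is to collapse every run of whitespace into a single space. The helper name `trim_title` is hypothetical.

```python
def trim_title(text):
    """Collapse all runs of whitespace and strip both ends."""
    return ' '.join(text.split())


# Text as it might be read from the <run> element in the snippet above.
raw = '\n    Example\n    Heading\n  '
title = trim_title(raw)
```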
Roles¶
For the purpose of creating references, any tag may impersonate, or play the role of, any other tag using a special role attribute. This attribute is implicitly optional for every tag. It is interpreted directly by the parsers in the Engine Layer to determine the type of reference that a tag will represent.
For example, a <table> tag (or any other tag for that
matter), which has role="figure"
must be referenced by a
<figure-ref> tag, not a <table-ref> tag,
in the XML Template. That table will be a figure for the purposes
of the document in question.
Any arbitrary tag can be referenced the same way with the appropriate role. Usually, such a referenceable tag will be styled appropriately, and will have the headings, captions, etc. appropriate for its role rather than its nominal tag.
A specific case is arbitrary tags that have a <par> role.
Such tags are automatically referenceable by <segment-ref>.
Their entire contents will be treated as the title of the heading, so the
par
role must be used carefully.
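The relationship between a tag, its role and the tag that references it can be sketched as below. Both function names are hypothetical. The special case for `par` mirrors the built-in <segment-ref> behavior described above, and custom tags may follow different rules.

```python
def computed_role(name, attr):
    """The role attribute, when given, overrides the tag name."""
    return attr.get('role', name)


def referencing_tag(name, attr):
    """Name of the tag that must be used to reference this target."""
    role = computed_role(name, attr)
    # Paragraph-like targets are referenced through <segment-ref>.
    return 'segment-ref' if role == 'par' else role + '-ref'
```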
Registering New Tags¶
Once a TagDescriptor
or a delegate object has been constructed,
there are two main ways to get Imprint to use the descriptor for actual tag
processing.
Via Configuration¶
In the normal course of things, Imprint will not automatically import unspecified user-defined modules. To let it know where to find tag extensions, add them, by name or by reference, to the mapping in the tags keyword of the IPC File. This will automatically import all the necessary modules, and register the custom descriptor under the requested tag name.
Programmatically¶
Under the hood, tags are registered with the Imprint core simply by adding them
to tag_registry
:
tag_registry[name] = descriptor
The registry is a special mapping that ensures that name
is a string not
representing an existing tag. While it is not possible to remove or overwrite
existing tags, the same descriptor can be registered under multiple names.
This method is useful mostly to users wishing to write a custom driver program for the engine. Under normal circumstances, the configuration solution will be more suitable.
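To make the registry's restrictions concrete, here is a stand-in mapping with similar behavior. It is not Imprint's implementation. The class name `RestrictedRegistry` is hypothetical, and raising `TypeError` for a non-string name is an assumption. The `KeyError` on overwrite and `TypeError` on deletion follow the description of `tag_registry` in the API section.

```python
from collections.abc import MutableMapping


class RestrictedRegistry(MutableMapping):
    """Stand-in mapping that only allows adding new string keys."""

    def __init__(self):
        self._items = {}

    def __getitem__(self, name):
        return self._items[name]

    def __setitem__(self, name, descriptor):
        if not isinstance(name, str):
            raise TypeError('tag names must be strings')
        if name in self._items:
            raise KeyError(f'tag {name!r} is already registered')
        self._items[name] = descriptor

    def __delitem__(self, name):
        raise TypeError('registered tags can not be removed')

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


registry = RestrictedRegistry()
descriptor = object()             # placeholder for a real descriptor
registry['note'] = descriptor
registry['remark'] = descriptor   # same descriptor under a second name
```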
Engine State¶
Both callbacks of a TagDescriptor
accept an
EngineState
object as their first argument,
which supports stateful tag processing. The engine state provides a mutable
container for arbitrary attributes. Each TagDescriptor
can add,
remove and modify attributes of the state object to communicate with itself,
the engine, and other tags.
As a rule, objects should prefer to delete state attributes rather than setting
them to None
. This meshes well with the fact that
EngineState
provides a containment check. For
example, to check if the parser is in the middle of a run of text, descriptors
should check
if 'run' in state: ...
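The pattern of deleting attributes and testing for them with `in` can be tried out with a small stand-in class. `StateSketch` is hypothetical and is not the real `EngineState`. It only mimics the containment check.

```python
class StateSketch:
    """Stand-in for the engine state; not the real EngineState."""

    def __contains__(self, name):
        # An attribute "exists" only while it is actually set.
        return hasattr(self, name)


state = StateSketch()
state.run = 'current run'     # entering a run
inside = 'run' in state
del state.run                 # leaving the run: delete, do not set None
outside = 'run' in state
```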
The built-in tags and the engine use a set of attributes and methods to operate
properly. Modifying these predefined attributes in a way other than explicitly
documented will almost inevitably lead to unexpected behavior. Properties are
used instead of simple attributes in a few cases to provide sanity checks for
the supported modifications. Custom tags can add, remove and modify any
additional attributes they choose. The full list of built-in attributes is
available in the EngineState
documentation.
The API¶
The imprint.core
package contains the Imprint
Engine Layer. The tags
and
state
modules implement most of the
functionality useful to end-users through the public XML Tag API. The
parsers
and utilities
contain the Internal API.
The imprint.core.tags
module implements the base
XML Tag API, as well as all the predefined
Built-in Tag Descriptors and Reference Descriptors.
The following members are used to construct and register new tags:
A limited mapping type that contains all the currently registered tag descriptors.
Registering a new descriptor is as easy as doing:
tag_registry[name] = descriptor
The registry is a restricted mapping type that supports adding new elements only if they are not already registered. Existing elements can not be deleted. Deletion operations will raise a
TypeError
, while overwriting existing keys will raise a KeyError
. Aside from that, all operations supported by dict
are allowed (including things like update
).
Any tag that is referenceable by design (has a valid
reference
attribute) will have the ReferenceDescriptor
’s registration hook invoked after the tag proper is registered.
The built-in tags are registered when the current module is imported.
The basis of the tag API.
Instances of this class contain the information required to process a custom tag. They must contain all of the attributes listed below, with the expected types. The elements in
tag_registry
may be delegate objects that supply only part of the attribute set. In that case, they are wrapped in a proxy as needed at runtime, never up-front. The reason for this is twofold:
- There may be stateful objects registered for multiple tags, and wrapping in a proxy will not allow the tags to share state. This would not be a problem, except it would be unexpected behavior.
- Some of the attributes may be dynamic properties (or other descriptors). Fixing the value once would completely defeat such behavior.
Creating an occasional wrapper around a delegate is not expected to be particularly expensive, even if it had to be done for every tag encountered in the XML file. On the other hand, it allows for some very flexible behaviors. At the same time, very few instances of wrapping should occur, since most tags will be implemented by extending this class and implementing it properly. The
wrap
method ensures that all extensions are passed through as-is.
All the Built-in Tag Descriptors are instances of children of this class.
A tri-state
bool
flag indicating whether the tag is allowed/expected to have textual content or not. The values are interpreted as follows:
- None: The tag may not have any content. It must be of the form <tag/> or <tag><otherTag>...</otherTag></tag>. Anything else will raise a fatal error. If tags is set to False, only the former form is allowed.
- False: The tag should not have content, but content will not raise an error. A warning will be raised instead.
- True: The tag is expected to have content, but the content may be empty.
Any value is allowed in a delegate. If defined, the value will be converted to bool if it is not None. Defaults to None
if not defined.
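The tri-state conversion described above might look like the following. The helper name `normalize_content` is hypothetical, and the lookup through `getattr` is an assumption about how a delegate is inspected.

```python
from types import SimpleNamespace


def normalize_content(delegate):
    """Return None, True or False for the delegate's content flag."""
    value = getattr(delegate, 'content', None)   # missing means None
    return None if value is None else bool(value)


flags = [
    normalize_content(SimpleNamespace()),               # not defined
    normalize_content(SimpleNamespace(content=0)),      # falsy value
    normalize_content(SimpleNamespace(content='yes')),  # truthy value
]
```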
A
bool
indicating whether or not nested tags are allowed within this one.
Any value is allowed in a delegate. If defined, the value will be converted to
bool
. Defaults to True
if not defined.
A
tuple
of strings containing the names of required tag attributes. A tag encountered without all of these attributes will raise an error.
In a delegate, this may be a single string, an iterable of strings, None or simply omitted. Every element of an iterable must be a string, or a TypeError is raised immediately during construction. Defaults to an empty tuple
if not defined.
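A minimal sketch of how the accepted delegate forms could be normalized into a tuple is given below. The function name `normalize_required` is hypothetical, and the real implementation may differ in its details.

```python
def normalize_required(value=None):
    """Turn None, a single string or an iterable of strings into a tuple."""
    if value is None:
        return ()
    if isinstance(value, str):
        return (value,)
    items = tuple(value)
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f'attribute names must be strings: {item!r}')
    return items
```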
A dictionary mapping the names of optional attributes to their default values. Optional attributes are ones that are expected to be present in processing, but have sensible defaults that can be used, meaning that they do not have to be specified explicitly in the XML Template.
In a delegate, this may be any mapping type, an iterable of strings, a single string,
None
or simply omitted. In the case of an iterable or individual string, all the defaults will be None. Iterable elements and mapping keys must be strings, or a TypeError will be raised during construction. Defaults to an empty dict
if not defined.
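The same kind of normalization for optional attributes could be sketched as follows. The function name `normalize_optional` is hypothetical. The behavior is inferred from the description above, not copied from the library.

```python
from collections.abc import Mapping


def normalize_optional(value=None):
    """Turn the accepted delegate forms into a dict of defaults."""
    if value is None:
        return {}
    if isinstance(value, str):
        return {value: None}
    if isinstance(value, Mapping):
        result = dict(value)
    else:
        # An iterable of names: every default becomes None.
        result = {key: None for key in value}
    for key in result:
        if not isinstance(key, str):
            raise TypeError(f'attribute names must be strings: {key!r}')
    return result
```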
The name of the attribute containing the data configuration name for the tag. This should only be provided for tags that require Data Configuration. If provided, this attribute will automatically be added to the required sequence.
In a delegate, this object must be an instance of str or None. Defaults to None
if not defined.
A
ReferenceDescriptor
that is only present if this type of tag can be the target of a reference.
Examples of referenceable built-in tags are <figure>, <table> and sometimes <par>. Referenceable tags can have an optional role attribute that changes the type of reference they represent. See the Roles description for more information.
In a delegate, this object must be an instance of ReferenceDescriptor or None. Defaults to None
if not defined.
After completion, this instance has all of the required attributes defined in the delegate, wrapped in the required types.
A reference to the delegate object is not retained. This method can be invoked multiple times. It updates the current descriptor with the attributes of the delegate, leaving undefined attributes in the delegate untouched.
Create an empty instance, with all required attributes set to default values.
This method is provided to allow bypassing the default
__init__
in child classes. All arguments are ignored.
Each descriptor should provide a method with this signature to process closing tags.
If implemented, this method must accept the Engine State, a tag name and a
dict
of attributes. Normally, the tag name is ignored since a separate descriptor is registered for each tag. The attributes are the same as those passed to start
, barring any modifications made in start
.
Descriptors that have a non-
None
data_config
attribute set will receive an additional argument containing the Data Configuration.
The default implementation just logs itself.
Each descriptor should provide a method with this signature to process opening tags.
If implemented, this method must accept the Engine State, a tag name and a
dict
of attributes. Normally, the tag name is ignored since a separate descriptor is registered for each tag.
Descriptors that have a non-
None
data_config
attribute set will receive an additional argument containing the Data Configuration.
The default implementation just logs itself.
Construct a proxy from the descriptor if it isn’t already one.
This method is provided so that when
TagDescriptor
objects are implemented properly up front, they do not need to be wrapped in an additional layer.
If the input is a delegate, the return value will always be of the type that this method was invoked on. However, the type check will always be done against the base
TagDescriptor
class.
Bases:
imprint.core.tags.TagDescriptor
The base class of all the built-in
TagDescriptor
implementations.
Custom tag implementations are welcome to use this class as a base instead of a raw
TagDescriptor
.
Updates the required fields with the keywords that are passed in.
If no delegate object (or
None
) is supplied, bypass the default constructor (see TagDescriptor.__new__
). kwargs will override any defaults and attributes set by a delegate.
Built-in Tag Descriptors¶
The existing tag descriptors implement the XML Template Specification:
Bases:
imprint.core.tags.BuiltinTag
Implements the <break> tag.
Insert a page break into the document.
Bases:
imprint.core.tags.BuiltinTag
Implements the <expr> tag.
Warning
This descriptor uses
eval
to execute arbitrary code and assign the result to a new keyword. Use with extreme caution!
Evaluate the expression found inside the tag, and add a new entry to the state’s keywords.
The content_stack will be popped.
All errors in importing and evaluation will be propagated up and will terminate the parser.
Begin a new expression.
This just pushes a new
content_stack
entry in the state. All content until the closing tag will be evaluated as a set of Python statements.
Bases:
imprint.core.tags.BuiltinTag
Implements the <figure> tag.
Generate and insert a figure based on the selected handler.
Figures can appear in a run, a paragraph, or on their own.
Just log the tag.
Bases:
imprint.core.tags.BuiltinTag
Implements the <kwd> tag.
Find the value of the keyword in the state’s
keywords
and place it into the current content.
If the keyword is not found, a KeyError will be raised. If the tag has a format attribute, it is interpreted as a format_spec, and used to convert the value. If the attribute is not present, the value is converted with a simple call to str
.
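The conversion rule can be illustrated with Python's built-in `format` and `str`. The function name `render_keyword` and its signature are hypothetical. The sketch only mirrors the behavior described above.

```python
def render_keyword(keywords, name, format_spec=None):
    """Convert a keyword value to text, optionally using a format spec."""
    value = keywords[name]            # raises KeyError if missing
    if format_spec is None:
        return str(value)
    return format(value, format_spec)


keywords = {'Pi': 3.14159, 'Count': 5}
formatted = render_keyword(keywords, 'Pi', '.2f')
plain = render_keyword(keywords, 'Count')
```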
Bases:
imprint.core.tags.BuiltinTag
Implements the <latex> tag.
Convert the equation in the text of the current tag into an image using
haggis.latex_util.render_latex
, and insert the image into the parent tag.
The parent can be a run or a paragraph. If the requested run style does not match the current run, the current run will be interrupted by a run containing a new picture with the requested style, and resumed afterwards. If there is no run to begin with, a new run will be created, but not stored in the
run
attribute of the state.
Formulas are rendered at 96dpi in JPEG format by default.
Begin a new LaTeX formula.
Just push a new
content_stack
entry into state. All content until the closing tag is evaluated as a LaTeX document.
Bases:
imprint.core.tags.BuiltinTag
Implements the <n> tag.
Add a line break to the current run.
If not inside a run, append the break to the last run. Make a new run only at the start of a paragraph. Ignore with a warning outside of a paragraph.
Bases:
imprint.core.tags.BuiltinTag
Implements the <par> tag.
Validate the
list
attribute that is found.
Log an error if the attribute is invalid, but do not terminate processing. The attribute is simply ignored if the list is neither numbered, bulleted nor continued.
Return the type normalized to a
ListType
, or None
if not a list item. If the type is valid, and list-level
is set, it is converted to an integer.
Compute the paragraph style based on whether an explicit style is set in the attributes, and whether or not the paragraph is a list.
1. If an explicit style is requested, return it. Otherwise:
2. If the paragraph is not a list, return the default paragraph style. Otherwise:
3. If the previous paragraph is a list item in the same list (i.e., the current list-level attribute is non-zero), return the style of the previous paragraph. Otherwise:
4. Return the default list item style.
Parameters:
- state (EngineState) – The state is used to check for the previous item’s style in case #3.
- attr (dict) – The tag attributes, used to check for an explicitly set style as well as for a style reset with list-level = 0.
- list_type (ListType or None) – The type of the list, if a list at all, as returned by check_list.
Terminate the current paragraph.
See
end_paragraph
in EngineState
.
Terminate any existing paragraph, flush all text and start a new paragraph.
If the new paragraph is a list item, add the necessary metadata to it.
Issue a warning if an existing paragraph is found.
Bases:
imprint.core.tags.BuiltinTag
Implements the <run> tag.
Place any remaining text into the current run, and remove the
run
attribute of state
.
Create a new run, ensuring that there is a paragraph to go with it.
Creating a run outside a paragraph raises a warning and creates a paragraph with a default style. See
imprint.core.state.EngineState.new_run
.
Bases:
imprint.core.tags.BuiltinTag
Implements the <section> tag.
Begin a new section in the document, optionally altering the page orientation.
Bases:
imprint.core.tags.BuiltinTag
Implements the <skip> tag.
Bases:
imprint.core.tags.BuiltinTag
Implements the <string> tag.
Generate a string based on the appropriate handler.
If the
log_images
key is set to a truthy value in state.keywords
, the content will also be dumped to a file.
Just log the tag.
Bases:
imprint.core.tags.BuiltinTag
Implements the <table> tag.
Generate and insert a table based on the selected handler.
The handler creates the table directly in the document (unlike for figures, where only the final product is inserted). Any error that occurs mid-processing leaves a stub table in the document in addition to the automatically-inserted alt-text.
Tables appear on their own, outside any paragraph or run, so if a table is nested in a run or paragraph, a warning will be issued. Any interrupted run or paragraph resumes after the table with their prior styles.
Just log the tag.
Bases:
imprint.core.tags.BuiltinTag
Implements the <toc> tag.
Terminate and insert the TOC.
Gather any text that has been acquired into the heading, which will be a separate paragraph preceding the TOC.
If the TOC interrupted an existing paragraph, a new paragraph will be resumed with the same style as the original. If a run style is present as well, a run will be recreated too.
Create a new TOC.
Log a warning if the tag appears within a paragraph. Truncate the paragraph, and resume with the prior style. The same happens to the current run, if there is one.
Bases:
imprint.core.tags.BuiltinTag
Implements the <figure-ref> and <table-ref> tags.
This processor is not registered explicitly. It gets added by all of the target tags that use it as part of their registration process. Registering this processor under a name that does not end in
'-ref'
will lead to a runtime error in resolve
.
Insert a string with the specified reference into the current
content
.
Returns a quasi-singleton instance of the current class.
This instance is not exposed directly, but it is registered by the built-in referenceable tags.
Overridable operation for fetching and logging the reference that is to be inserted.
The default is to look up the reference by
'id'
in the imprint.core.state.EngineState’s references.
Used by the default implementation of
end
.
Bases:
imprint.core.tags.ReferenceProcessor
Implements the <segment-ref> tag.
This is a special case of
ReferenceProcessor
that allows access by both title and id. Its references always resolve to a <par> tag, or a tag playing that role.
Resolve a segment reference by either text or ID.
Either the id or title
tag attribute must be present. If both are present, they must resolve to the same heading in the document or an error is raised.
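The consistency rule for `id` and `title` can be sketched with two plain lookup tables. Everything here is an assumption for illustration: the function name `resolve_segment`, the dictionaries standing in for the engine's references, and the exception types.

```python
def resolve_segment(attr, by_id, by_title):
    """Resolve a segment reference by id, by title, or by both."""
    found = [table[attr[key]]
             for key, table in (('id', by_id), ('title', by_title))
             if key in attr]
    if not found:
        raise KeyError("either 'id' or 'title' must be present")
    if len(found) == 2 and found[0] != found[1]:
        raise ValueError("'id' and 'title' refer to different headings")
    return found[0]


# Hypothetical reference tables keyed by id and by trimmed title.
by_id = {'intro': '1.1 Introduction'}
by_title = {'Introduction': '1.1 Introduction', 'Summary': '4 Summary'}
```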
Reference Descriptors¶
Defines the process for creating References and using them through the appropriate tag.
References are made by processing the XML Template and mapping out any referenceable tags using the
start
and end methods. In the default implementation, the reference text is created by the make_reference method, invoked from end.
start and end return a boolean value to allow custom tags to be processed selectively. A return value of False from either method means that the specific instance of the tag being processed is not a valid reference target. Normally both methods always return True, but for the built-in <par> tag, for example, an exception must be made.
References are placed into the document by a special TagDescriptor, which is generally registered along with the parent tag that contains a ReferenceDescriptor using the register method.
Current references are purely textual, rather than having a dynamic field assigned to them. This is still a work in progress.
The prefix that normally gets prepended to the reference text. Used by
make_reference
to construct the output string. Extensions are welcome to ignore this attribute.
A string or iterable of strings that lists the attributes that are used to identify the target for this reference type. The attribute may be either required or optional for the target tag, but it must be recognized either way. This attribute is used to check for attributes on tags with a non-default role. Defaults to
'id'
.
Process the closing tag for a referenceable tag.
The default is to add the reference to the appropriate map in
references
by ID, based on the role, and log the operation. The attribute id is required.
The actual reference is created by make_reference.
Returns True if the tag is definitely a reference target, False
if not.
-
identifiers
Ensure that
identifiers
is read-only.
Returns a string referring to the specified tag in the specified role.
Keep in mind that the
ReferenceDescriptor
is selected based on the role, not necessarily the tag name. Therefore, the role
argument should always be the “computed” role: the name of the tag should be overridden by the value of the attribute, if it was specified.
A registration hook that is invoked when the parent
TagDescriptor
is registered.
The default implementation registers an additional TagDescriptor under the name name + '-ref', which replaces the <name-ref/> tag with the formatted reference. See ReferenceProcessor.
Parameters:
- registry – The tag registry that the parent TagDescriptor is being inserted into. See tag_registry for details on the interface.
- name (str) – The name under which the parent tag is being registered.
- descriptor – The parent object being registered, not necessarily a TagDescriptor. The TagDescriptor.wrap method can be used to retrieve the corresponding TagDescriptor if necessary.
Check that the reference identified by
key
does not already exist and set it.
Duplicate reference targets cause an error, unless
duplicates
is True
, in which case a warning is logged and the new value is discarded.
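A sketch of the duplicate handling described above, using a plain dictionary in place of the engine's reference map. The function name `set_reference`, the use of `KeyError` and the logger are assumptions. The documentation only states that duplicates cause an error or a logged warning.

```python
import logging

log = logging.getLogger('reference_sketch')


def set_reference(references, key, value, duplicates=False):
    """Store a new reference, rejecting or ignoring duplicates."""
    if key in references:
        if not duplicates:
            raise KeyError(f'duplicate reference target: {key!r}')
        # Duplicates are tolerated: warn and discard the new value.
        log.warning('ignoring duplicate reference %r', key)
        return
    references[key] = value


references = {}
set_reference(references, 'fig1', 'Figure 1')
set_reference(references, 'fig1', 'Figure 2', duplicates=True)
```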
Bases:
imprint.core.tags.ReferenceDescriptor
Extension of
ReferenceDescriptor
to accumulate heading text and allow references through the title
attribute.
Used by <par> tags to create heading references.
A class-level regular expression for identifying the <par> tags that represent referenceable headings.
Create a dual reference based on the title and optional ID in addition to the default logging.
Ensure that
identifiers
is read-only.
Add the section heading to the usual reference text.
Register a
SegmentRefProcessor
for the <segment-ref> tag.
This registration hook uses a fixed name, so can only be called once.
Check that the reference identified by
key
does not already exist and set it.
Duplicate reference targets cause an error, unless
duplicates
is True
, in which case a warning is logged and the new value is discarded.
Start accumulating content in addition to the default logging.
If an actual <par> tag is encountered (as opposed to a tag playing that role), and the heading matches
Heading \d+
, the current heading is incremented in the state.
If any heading tag, or any tag with
role="par"
is encountered, a new reference will be created. Non-heading paragraphs with no explicit role are non-referenceable. A non-heading paragraph can be made referenceable by explicitly setting the role.
Keep in mind that the title for a segment reference is accumulated from all the text in the paragraph. Use carefully with non-default tags.
Utility Functions¶
Resolve the value of
key
with respect to attr
, but with the option to override by the data configuration dictionary.If the final value is sentinel, return default instead. Return default if key is missing entirely as well. Both attr and data must be mapping types that support a get method.
Convert a string, number or pre-constructed size to a
docx.shared.Length
object, using get_key
for value resolution.
Common options for key are 'width' and 'height'.
Valid unit suffixes are ", in, cm, mm, pt, emu and twip
. Default when no units are specified is inches ("
).
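To show how such unit suffixes can be interpreted, the sketch below parses a size into English Metric Units (EMU), the integer unit underlying `docx.shared.Length`. It returns a plain `int` so that it runs without python-docx. The function name `parse_size` and the regular expression are assumptions, not the library's parser.

```python
import re

# EMU per unit: 914400 per inch, 360000 per cm, 12700 per point.
EMU_PER_UNIT = {
    '"': 914400, 'in': 914400, 'cm': 360000, 'mm': 36000,
    'pt': 12700, 'emu': 1, 'twip': 635,
}

_SIZE = re.compile(r'([-+]?\d*\.?\d+)\s*("|in|cm|mm|pt|emu|twip)?')


def parse_size(value):
    """Convert a number or string such as '2.5cm' to integer EMU."""
    match = _SIZE.fullmatch(str(value).strip())
    if match is None:
        raise ValueError(f'unrecognized size: {value!r}')
    number, unit = match.groups()
    # Inches are assumed when no unit suffix is given.
    return round(float(number) * EMU_PER_UNIT[unit or 'in'])
```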
Retrieve and load the handler for the specified attribute mapping and data configuration.
If the handler can not be found, a detailed exception is logged and a
KnownError
is raised.
Load and run the handler for the specified attribute mapping and data configuration.
If the handler can not be found, a detailed exception is logged, as with
get_handler
.
All exceptions that occur during execution are converted into
KnownError
.
Compute the required styles based on attr and data configurations.
Style keys are taken from the keys of defaults, while values provide the fallback names used if the keys do not appear in either attr or data. Similarly named keys in data will override ones in
attr
.
Create a dictionary with keys
width
and height
and values that are instances of docx.shared.Length.
Values are resolved according to the rules of get_key, with width_key and height_key as the inputs. String values may contain units, and will be parsed according to get_size.
If neither key is present in either configuration (or present but set to None), set the width to default_width. If that is None
as well, return an empty dictionary.
Parser State Objects¶
The imprint.core.state
module supplies the state objects that
enable communication within the Engine Layer
between the engine itself and the tags. The state is therefore crucial
to the XML Tag API without being completely a part of it.
-
class
imprint.core.state.
EngineState
(doc, keywords, references, log)¶ A simple container type used by the main parser to communicate document state to the tag descriptors.
Most of the state is dedicated to monitoring the status of the text acquisition from the XML. The engine and built-in tags rely on a set of attributes to function. A description of acceptable use of these attributes is provided here. Any other use may lead to unexpected behavior. Custom tags may define and use any attributes that are not explicitly documented as they choose.
This class allows for a containment check using
in
in preference to hasattr
.
-
doc
¶ -
The document that is being built. Set once by the engine.
Implemented as a read-only property.
-
keywords
¶ -
The keywords configured for this document by the IPC File. Normally, this dictionary should be treated as read-only, but
ExprTag
can add new entries.
As a rule, keywords with lowercase names are system configuration options, while keywords that start with upper case letters affect document content.
Implemented as a read-only property.
-
references
¶ -
A multi-level mapping type that allows references to be fetched by role and attribute. Access to this map is performed by providing a tuple
(role, attribute, key)
. For example:
state.references['figure', 'id', 'my_figure']
The map’s values may be of any type, as long as they can be converted to the desired content using
str
.
The mapping is made immutable as soon as it becomes part of the state. The read-only lock is irreversible.
Implemented as a read-only property.
-
paragraph
¶ -
A paragraph represents a collection of runs and other objects that make up a logical segment in a document. This attribute exists only when parsing a <par> tag. Usually set and unset by
ParTag
, but can be temporarily switched off and reinstated in response to other tags as well.
end_paragraph
deletes this attribute.
-
run
¶ -
A run is a collection of characters with similar formatting within a paragraph. This attribute exists only when parsing a <run> tag. Usually set and unset by
RunTag
.
end_paragraph
deletes this attribute.
-
content
¶ -
A mutable buffer used by the engine to accumulate text from the XML Template.
Since whitespace needs to be trimmed rather aggressively from an XML file, this object gets an extra (non-standard) attribute:
-
content.
leading_space
¶ Indicates whether or not to prepend a space when concatenating this buffer with others. In general, the text of the first run in a paragraph is the only one that does not have this attribute set to
True
. This flag is set on the buffer rather than the state object itself so that buffers can be pushed onto and popped from the content_stack
to handle nested tags.
This attribute should be manipulated mostly through the
new_content
, get_content
and flush_run
methods.
This attribute must always be present, regardless of the position within the document.
Implemented as a read-write property that can not be deleted or set to
None
. -
-
content_stack
¶ collections.deque
[io.StringIO
]A stack for nested content buffers. Each buffer represents a tag containing independent content. Some tags append to the parent’s buffer, some close the current buffer to start a new one and others, such as <figure>, use a temporary buffer for their content.
The stack allows for a theoretically indefinite level of nesting of text elements. In reality, it will only contain one or two elements: the current run text and the contents of interspersed tags like <figure>.
This attribute should be manipulated through the
push_content_stack
and pop_content_stack
methods.
This attribute may be empty, but never missing. Implemented as a read-only property.
-
last_list_item
¶ -
List items in Word are just paragraphs with a particular style and numbering scheme. All of this information can be gathered from the previous paragraph that was assigned a concrete list numbering instance.
This attribute should never be missing. It should only be
None
to indicate that no prior numbered paragraph has occurred in the document yet. To this end, it is implemented as a read-only property.
-
latex_count
¶ -
A counter for the number of <latex> tags encountered so far. Used to generate the file name for the equations if Image Logging is enabled. Missing otherwise.
-
__contains__
(name)¶ Checks if the specified name represents an attribute.
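The containment check can be pictured with a small stand-in class. This is only a sketch: `ToyState` is a hypothetical name, and delegating to `hasattr` is an assumption about how such a check might be written, not the engine's actual implementation.

```python
class ToyState:
    """Hypothetical stand-in for a state object; not the Imprint class."""

    def __contains__(self, name):
        # Treat "name in state" as "state currently has this attribute".
        return hasattr(self, name)


state = ToyState()
state.paragraph = object()      # pretend a <par> tag is currently open
has_par = 'paragraph' in state
has_run = 'run' in state
del state.paragraph             # pretend the paragraph was ended
has_par_after = 'paragraph' in state
```

Written this way, a tag descriptor can test `'run' in state` before touching an attribute that only exists while a particular tag is being parsed.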
-
check_content_tail
()¶ Include any remaining text in
content
into the last run of the last paragraph.This ensures that paragraphs get truncated properly, and that spurious text between paragraphs is cleaned up.
A warning is issued if any non-whitespace text is found.
-
end_paragraph
(tag=None)¶ Terminate the current paragraph.
Any existing run is immediately terminated. Spurious text is appended to the last available run. Both
paragraph
andrun
attributes are deleted by this method.If there is no paragraph to terminate, this method is equivalent to calling
check_content_tail
.Parameters: tag (str or None) – The name of a tag that interrupts the paragraph. If present, a warning will be issued. If omitted, no warning will be issued.
-
flush_run
(renew=True, default='')¶ Flush the text buffer accumulating the current run into the document.
Text flushing aggressively removes whitespace from around individual lines. A single space character is prepended before the text if
content.leading_space
isTrue
.If not inside a run, this is a no-op.
Parameters: - renew (bool) – Whether or not to create a new text buffer when finished.
This is generally a good idea, since the content will
already be in the document, so the default is
True
. The new buffer hasleading_space
set toTrue
. - default (str) – The text to insert if the current
content
buffer is empty. Defaults to nothing (''
).
-
get_content
(default='')¶ Retrieve the text in the current
content
buffer.Whitespace is stripped from each line in the text, which is then recombined with spaces instead of newlines.
If the buffer is empty (or contains only whitespace), return default instead.
If the text is non-empty, and
content
hasleading_space
set toTrue
, a space is prepended.
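The stripping rules described above can be approximated in a few lines. This is a rough sketch rather than the engine's code: `get_content_sketch` is a hypothetical free function, the `leading_space` flag is passed as an argument instead of being read from the buffer, and dropping blank lines is an assumption.

```python
import io


def get_content_sketch(buffer, leading_space=False, default=''):
    """Collapse the buffer's lines into one space-separated string."""
    lines = (line.strip() for line in buffer.getvalue().splitlines())
    # Assumption: blank lines are dropped so they do not create double spaces.
    text = ' '.join(line for line in lines if line)
    if not text:
        # Empty or whitespace-only buffers fall back to the default.
        return default
    return ' ' + text if leading_space else text


buf = io.StringIO('\n    First line of a run\n    continues here.\n')
plain = get_content_sketch(buf)
spaced = get_content_sketch(buf, leading_space=True)
empty = get_content_sketch(io.StringIO('  \n\t\n'), default='n/a')
```

Here `plain` holds the two source lines joined by a single space, `spaced` is the same text with one space in front, and `empty` falls back to the supplied default.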
-
image_log_name
(id, ext='')¶ Create an output name to log an image (or data), for a Data Configuration with the given ID, and an optional extension.
This is the standard name-generator for any component (tag descriptor or plugin handler) that enables image logging in response to log_images.
The base name is the result of concatenating an extension-less log_file (or output_docx if not set), with
id
, separated by an underscore.ext
is appended as-is, if provided.
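The naming rule can be illustrated as follows. This is a sketch under assumptions: `image_log_name_sketch` is a hypothetical free function, and `log_file` and `output_docx` are passed as plain arguments here, whereas the real method obtains them from the engine's configuration.

```python
import os.path


def image_log_name_sketch(id, ext='', log_file=None, output_docx='output.docx'):
    # Assumption: the base comes from log_file when set, else output_docx.
    base, _ = os.path.splitext(log_file or output_docx)
    # The ID is joined with an underscore; ext is appended unchanged.
    return f'{base}_{id}{ext}'


with_log = image_log_name_sketch('fig1', '.png', log_file='logs/report.log')
without_log = image_log_name_sketch('eq3', output_docx='out/report.docx')
```

Since `ext` is appended as-is, the caller is responsible for including the leading dot.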
-
inject_par
(style='Default Paragraph Font', pstyle='Normal', text='')¶ Insert a new paragraph into the document with the specified styles and text, and return it.
The contents of the paragraph will be a single run with the specified text. Any previously existing
paragraph
andrun
will be terminated (seeend_paragraph
) and reinstated with their prior styles once the new content is inserted.Parameters: Returns:
-
insert_picture
(img, flush_existing=True, style='Default Paragraph Font', pstyle='Quote', **kwargs)¶ Insert an image into the current document.
Images must be inserted into a run, so the following cases are recognized:
- Outside <par>
- Create a new temporary
Paragraph
and a newRun
. Neither object is retained (i.e. inparagraph
andrun
). - Inside <par> but outside <run>
- Create a new temporary
Run
, which will not be retained. - Inside <run>
- If the requested
style
matches the style of the currentrun
, it will be flushed and extended. Otherwise, the currentrun
will be interrupted by a temporary run with the new style, and then reinstated.
It is an error to have a run outside a paragraph.
Parameters: - img (str or file-like) – The image can be the name of a file on disk, or an open file
(including in memory files like
io.BytesIO
). In the latter case, the file pointer must be at the beginning of the image data. - style (str) – The name of the Character Style to apply to a new run.
- pstyle (str) – The name of the Paragraph Style to apply if a new paragraph needs to be created.
Two additional keyword-only arguments can be supplied to
add_picture
:width
andheight
.
-
interrupt_paragraph
(warn=None)¶ A context manager for interrupting the current run/paragraph and resuming it when complete.
The current paragraph and run are ended before the body of the
with
block executes. They are reinstated afterwards, if they existed to begin with, with the same styles as before.Parameters: warn (str, bool or None) – If a boolean, determines whether or not to issue a generic warning if a paragraph is actually interrupted. If a string, it is interpreted as the name of the tag that is interrupting the paragraph, and mentioned in the warning. No warning will be issued if falsy. Defaults to None
.
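The save-and-restore pattern behind such a context manager might look roughly like this. It is a toy model only: `ToyState` and `interrupt_paragraph_sketch` are hypothetical names, the attributes hold style names rather than python-docx objects, no warning is issued, and restoring inside a `finally` clause is an assumption.

```python
from contextlib import contextmanager


class ToyState:
    """Hypothetical state that tracks only style names, not docx objects."""


@contextmanager
def interrupt_paragraph_sketch(state):
    # Remember the styles of whatever is currently open, then close it.
    saved = {name: getattr(state, name)
             for name in ('paragraph', 'run') if hasattr(state, name)}
    for name in saved:
        delattr(state, name)
    try:
        yield state
    finally:
        # Reinstate only what existed before, with the same styles.
        for name, style in saved.items():
            setattr(state, name, style)


state = ToyState()
state.paragraph, state.run = 'Normal', 'Emphasis'
with interrupt_paragraph_sketch(state):
    inside = (hasattr(state, 'paragraph'), hasattr(state, 'run'))
after = (state.paragraph, state.run)
```

Inside the `with` block neither attribute exists, so the interrupting tag is free to emit its own paragraph; both are back with their earlier styles once the block exits.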
-
log
(lvl, msg, *args, **kwargs)¶ Provide access to the engine’s logging facility.
Usage is analogous to
logging.log
. XML location meta-data will be inserted into any log messages.
-
new_content
(leading_space=None)¶ Update the
content
text buffer to a new, emptyStringIO
.Calling this method is faster than doing a seek-truncate according to http://stackoverflow.com/a/4330829/2988730.
Parameters: leading_space (tri-state bool) – If None
, copyleading_space
from the currentcontent
. Otherwise, set to the provided value. The default is to copy the existing value.
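The tri-state behavior of `leading_space` can be sketched as below. The `Buffer` subclass and `new_content_sketch` function are hypothetical names used for illustration; the engine may attach the flag to its buffers differently.

```python
import io


class Buffer(io.StringIO):
    """Hypothetical buffer type carrying the extra leading_space flag."""
    leading_space = False


def new_content_sketch(current, leading_space=None):
    # Make a brand-new buffer instead of current.seek(0); current.truncate().
    fresh = Buffer()
    # None means "inherit the flag from the buffer being replaced".
    fresh.leading_space = (current.leading_space if leading_space is None
                           else leading_space)
    return fresh


old = Buffer('stale text')
old.leading_space = True
inherited = new_content_sketch(old)
overridden = new_content_sketch(old, leading_space=False)
```

Both results start out empty. Only the flag differs: `inherited` copies `True` from the old buffer, while `overridden` takes the explicit `False`.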
-
new_run
(tag, style='Default Paragraph Font', pstyle='Normal', check_in_par=True, keep_par=True)¶ Create a new
run
.This method handles cases when a run is requested outside a paragraph, or inside an existing run:
- Nested runs are forbidden, but run injection is not.
- Existing content is flushed for injected runs.
- Runs outside a paragraph will generate a temporary paragraph
with a default style.
- Missing paragraphs can optionally raise a warning.
- The temporary paragraph can optionally be retained as the current paragraph.
Parameters: - tag (str) – The name of the tag requesting the run. If there is already
a
run
attribute present, setting tag='run'
will raise an error because of nesting. - style (str) – The name of the style to use for the new run.
- pstyle (str) – The name of the style to use for a new paragraph, if one has
to be created. Moot if there is already a
paragraph
attribute. - check_in_par (bool) – Whether or not to warn if not in a paragraph. Defaults to
True
. - keep_par (bool) – Whether or not to retain a newly created paragraph object in
the
paragraph
attribute. Moot if there is already aparagraph
attribute.
Returns: - par (docx.text.paragraph.Paragraph) – The paragraph that the run was added to. If
keep_par
isTrue
or there was already aparagraph
attribute set, this will be theparagraph
attribute. - run (docx.run.Run) – The newly created run. This will be set to the
run
attribute unless there is no existingparagraph
attribute, andkeep_par
is set toFalse
.
Notes
Setting
keep_par
toFalse
for a <run> tag outside a paragraph will cause a situation whererun
is set butparagraph
is not. This may cause a problem for the engine, but should never arise with the builtin parsers.
-
number_paragraph
(list_type, level)¶ Turn the current paragraph into a list item, and store it into
last_list_item
.The exact numbering scheme depends on
last_list_item
, which will be updated to refer to the current paragraph when this method completes.The following behaviors occur in response to
list_type
:

list_type   Behavior
None        Not a list paragraph. Do not set numbering or change last_list_item.
CONTINUED   Same type and numbering as last_list_item. Set last_list_item.
NUMBERED    Start a new numbered list. Set last_list_item.
BULLETED    Start a new bulleted list. Set last_list_item.

Parameters:
-
pop_content_stack
()¶ Reinstate the previous level of the
content_stack
to the currentcontent
.Calling this method on an empty stack will cause an error. The current
content
is completely discarded.
-
push_content_stack
(flush=False, leading_space=False)¶ Temporarily create a new text buffer for the
content
.If
flush
isTrue
, the old buffer is flushed to the document and cleared before being pushed to thecontent_stack
. Ifflush
isFalse
, the existing buffer is pushed unchanged. If the content is flushed, itsleading_space
attribute is set toTrue
.If the existing buffer is flushed, the buffer that will be reinstated when the new one is popped will have
leading_space
set toTrue
.The new buffer can have its
leading_space
attribute configured by theleading_space
parameter, which defaults toFalse
.
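The interplay between `content` and `content_stack` can be modeled with a `deque` of buffers. This is a simplified sketch: `ToyState` and `Buffer` are hypothetical names, and only the `flush=False` path is shown because flushing requires a real document.

```python
import io
from collections import deque


class Buffer(io.StringIO):
    """Hypothetical buffer type carrying the extra leading_space flag."""
    leading_space = False


class ToyState:
    """Hypothetical state holding only the content-related attributes."""

    def __init__(self):
        self.content = Buffer()
        self.content_stack = deque()

    def push_content_stack(self, leading_space=False):
        # Park the current buffer unchanged (the flush=False case).
        self.content_stack.append(self.content)
        self.content = Buffer()
        self.content.leading_space = leading_space

    def pop_content_stack(self):
        # Discard the current buffer; raises IndexError on an empty stack.
        self.content = self.content_stack.pop()


state = ToyState()
state.content.write('run text')
state.push_content_stack()
state.content.write('figure caption')
nested = state.content.getvalue()
state.pop_content_stack()
restored = state.content.getvalue()
```

Text written between the push and the pop never mixes with the parent buffer, which is what lets a nested tag collect its own content independently.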
-
temp_run
(style='Default Paragraph Font', pstyle='Normal', keep_same=False)¶ Create a temporary run in the current context.
The run and paragraph styles will be preserved after the context manager exits. If the run is injected outside a paragraph, a temporary paragraph will be created and forgotten.
Within the context manager, both
paragraph
andrun
are guaranteed to be set.
run
will have the style named bystyle
, butparagraph
will only have the style named bypstyle
if it is a temporary paragraph.All content is flushed into the temporary run when this manager exits.
Parameters: - style (str) – The style of the new run.
- pstyle (str) – The style of a new paragraph to contain the run. Used only
if
paragraph
is unset. - keep_same (bool) – If
True
, and a run already exists, and has the same style as this one, retain it instead of making a new one. IfFalse
(the default), always create a new run.
-
-
class
imprint.core.state.
ReferenceState
(registry, log, heading_depth=None)¶ A simple container type used by the reference parser to communicate state to the reference descriptors and accumulate the reference map.
Most of the state is dedicated to monitoring referenceable tags and creating references to them. The engine and built-in tags rely on a set of attributes to function properly. A description of acceptable use of these attributes is provided here. Any other use may lead to unexpected behavior. Custom tags may define and use any attributes that are not explicitly documented as they choose.
This class allows for a containment check using
in
in preference to hasattr
.-
registry
¶ Mapping
A subtype of
dict
that follows the same rules astag_registry
. Normally a reference to that attribute.Implemented as a read-only property.
-
references
¶ -
A multi-level mapping type that allows references to be fetched and set by role and attribute. Access to this map is performed by providing a tuple
(role, attribute, key)
. For example:state.references['figure', 'id', 'my_figure']
The map’s values may be of any type, as long as they can be converted to the desired content using
str
.The map is mutable at this stage in the processing. It accumulates all the referenceable tags found in the document. Setting a value for a key any of whose levels do not exist is completely acceptable: the missing levels will be filled in.
Implemented as a read-only property.
-
heading_depth
¶ -
The configured depth after which
heading_counter
stops having an effect when a subheading is entered. If omitted entirely (None
), all available heading levels will be used.Implemented as a writable property.
-
heading_counter
¶ -
A list containing counters for each heading level encountered. The list is popped back one element whenever a higher level heading is encountered.
len(heading_counter)
is the depth of the outline the parser is currently in. E.g., if the parser is parsing text underSection 3.4.5
,heading_counter
contains[3, 4, 5]
. WhenSection 4
is encountered next, the counter will be reset to[4]
. The heading may be referenced later by title or by ID.A
deque
is not used because it does not support slice deletion, which makes jumping back a few heading levels much easier.Implemented as a read-only property.
-
item_counters
¶ -
A mapping of the referenceable roles to the counters of items in the current heading. All the counters are reset to zero when a new heading below
heading_depth
is encountered.Implemented as a read-only property. The keys of the mapping should not be modified, but the values may be.
-
content
¶ -
A mutable buffer used by the engine to accumulate text from the XML Template only when necessary.
This attribute should be manipulated mostly through the
start_content
andend_content
methods. It should only be present for tags that care about accumulating content for a reference, like <par>. When present, all content, regardless of nested tags, will be accumulated.
-
__contains__
(name)¶ Checks if the specified name represents an attribute.
-
end_content
()¶ Terminate the current content buffer, if any, and return the content after aggressive stripping of whitespace.
If there is no
content
buffer to begin with, an empty string is returned.
-
format_heading
(prefix=None, prefix_sep=' ', sep='.', suffix_sep='-', suffix=None)¶ Format
heading_counter
for display.If suffix is set to a truthy value, only
heading_depth
items are shown. Otherwise, the entire list is shown.
-
get_content
(default='')¶ Retrieve the text in the current
content
buffer.Whitespace is stripped from each line in the text, which is then recombined with spaces instead of newlines.
If the buffer is non-existent, empty or contains only whitespace, return default instead.
-
heading_counter
Ensure that
heading_counter
is read-only.
-
heading_depth
Ensure that
heading_depth
is set to a legitimate value.
-
increment_heading
(level)¶ Increment
heading_counter
at the requested level.Any missing levels are set to 1 with a warning. Any further levels are truncated.
item_counters
is reset ifheading_depth
is unset or a greater value than level.
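The counter update can be sketched with a plain list. Several points here are assumptions: `increment_heading_sketch` is a hypothetical free function, `level` is taken to be 1-based, the warning goes through the standard `warnings` module rather than the engine's logger, and the reset of `item_counters` is left out.

```python
import warnings


def increment_heading_sketch(counter, level):
    """Update ``counter`` in place for a heading at 1-based ``level``."""
    if level > len(counter):
        missing = level - len(counter) - 1
        if missing:
            warnings.warn(f'{missing} heading level(s) skipped')
        # Skipped levels and the new level itself all start at 1.
        counter.extend([1] * (missing + 1))
    else:
        counter[level - 1] += 1
        # Slice deletion drops every deeper level in one step.
        del counter[level:]
    return counter


counter = [3, 4, 5]
increment_heading_sketch(counter, 1)      # entering Section 4
top = list(counter)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    increment_heading_sketch(counter, 3)  # jumps over level 2
deep = list(counter)
```

The `del counter[level:]` line is the slice deletion that a `deque` would not support.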
-
item_counters
Ensure that
item_counters
is read-only.
-
log
(lvl, msg, *args, **kwargs)¶ Provide access to the engine’s logging facility.
Usage is analogous to
logging.log
. XML location meta-data will be inserted into any log messages.
-
registry
Ensure that
registry
is read-only.
-
reset_counters
()¶ Set all the values of
item_counters
to zero.
-
-
class
imprint.core.state.
ReferenceMap
¶ A multi-level mapping that stores references in the values.
Values are accessed through a three-level key
(role, attribute, key)
: For a given role, the type of key is determined by theattribute
that names the target. Most tags only supportattribute='id'
, but <segment-ref> also supportsattribute='title'
.key
is the actual value of the attribute that is used to identify the reference.Reference values can be any object whose
__str__
method returns the correct replacement text for the reference.-
__contains__
(key)¶ Checks if this mapping has the specified partial key.
Key may be a single string or a
tuple
with a length between 1 and 3. Checks will be made for the appropriate depth.
-
__getitem__
(key)¶ Retrieve the value for the specified three-level key.
-
static
__new__
(cls, *args, **kwargs)¶ Ensure that the map is unlocked when it is first created.
This way calling
__init__
is not a trick for unlocking the map.
-
__setitem__
(key, value)¶ If this mapping is not locked, set the attribute for the specified three-level key.
If any of the levels are new, they are created along the way.
-
__str__
(indent=2)¶ Creates a pretty representation of this map, with indented heading levels.
-
lock
()¶ Lock this mapping to prevent unintentional modification.
This is a one-time operation. There is no way to unlock. After locking,
__setitem__
will raise an error.
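A toy version of such a map shows how three-level keys, partial-key containment and locking fit together. It is a sketch, not the Imprint implementation: `ToyReferenceMap` is a hypothetical name, nested dictionaries are an assumed storage layout, and raising `TypeError` after locking is an assumption, since the error type is not stated here.

```python
class ToyReferenceMap:
    """Hypothetical three-level map; not the Imprint implementation."""

    def __init__(self):
        self._data = {}
        self._locked = False

    def __setitem__(self, key, value):
        if self._locked:
            raise TypeError('map is locked')  # error type is an assumption
        role, attribute, name = key
        # Missing levels are created on the way down.
        self._data.setdefault(role, {}).setdefault(attribute, {})[name] = value

    def __getitem__(self, key):
        role, attribute, name = key
        return self._data[role][attribute][name]

    def __contains__(self, key):
        # Accept a bare string or a tuple of one to three levels.
        parts = (key,) if isinstance(key, str) else tuple(key)
        level = self._data
        for part in parts:
            if part not in level:
                return False
            level = level[part]
        return True

    def lock(self):
        self._locked = True


refs = ToyReferenceMap()
refs['figure', 'id', 'my_figure'] = 'Figure 1'
refs.lock()
try:
    refs['table', 'id', 'my_table'] = 'Table 1'
    rejected = False
except TypeError:
    rejected = True
```

After `lock()`, lookups and containment checks keep working, but the assignment for `my_table` is refused.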
-