Document Format Registry¶
This page explains the concept of the document format registry used by JODConverter’s Java library. If you are new to JODConverter, start with the Java Library overview, then come back here for details about how formats are defined and found during a conversion.
What is a Document Format?¶
A DocumentFormat
describes a file type that OOo can read or write. It includes:
- Name, primary extension and alternate extensions (e.g., pdf, docx, odt, xlsx, html(htm), png…).
- Media type (MIME type).
- Optional load properties applied when opening a document of that format.
- Optional store properties per document family (TEXT, SPREADSHEET, PRESENTATION, DRAWING) used when exporting.
What is the Registry?¶
The DocumentFormatRegistry
is a lookup service that maps extensions and media types to DocumentFormat
descriptors,
and lists the available output formats for a document family.
JODConverter ships with a comprehensive default registry covering common office, text, spreadsheet, presentation, drawing and image formats.
Core Types¶
DocumentFormat
: The immutable description of a format (org.jodconverter.core.document.DocumentFormat
).DocumentFormatRegistry
: Interface to look up formats (org.jodconverter.core.document.DocumentFormatRegistry
).DefaultDocumentFormatRegistry
: Static convenience access to the default registry and well-known constants (e.g.,DefaultDocumentFormatRegistry.PDF
).SimpleDocumentFormatRegistry
: A mutable in-memory registry you can build programmatically.JsonDocumentFormatRegistry
: A registry that can be loaded from JSON.
Default Registry¶
The default document format registry is backed by a bundled JSON file: /document-formats.json
.
If a /custom-document-formats.json
is present on the application classpath, it is automatically loaded and merged on
top of the defaults (overrides existing formats or adds new ones).
Typical Usage¶
- Let JODConverter pick formats automatically by file extension: If you pass File or stream with an explicit target format, converters will use the registry to resolve the correct configuration.
- Query formats yourself: Access by extension or media type
import org.jodconverter.core.document.DefaultDocumentFormatRegistry;
import org.jodconverter.core.document.DocumentFormat;
DocumentFormat pdf = DefaultDocumentFormatRegistry.getFormatByExtension("pdf");
DocumentFormat docx = DefaultDocumentFormatRegistry.getFormatByMediaType("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
- List available output formats for a given family
import org.jodconverter.core.document.DefaultDocumentFormatRegistry;
import org.jodconverter.core.document.DocumentFamily;
var outputForText = DefaultDocumentFormatRegistry.getOutputFormats(DocumentFamily.TEXT);
Custom Registry¶
Both LocalConverter
and RemoteConverter
accept a format registry in their builder.
import org.jodconverter.core.document.DocumentFormatRegistry;
import org.jodconverter.core.document.JsonDocumentFormatRegistry;
import org.jodconverter.local.LocalConverter;
// Example: load additional/overridden formats from a JSON string or stream
DocumentFormatRegistry custom = JsonDocumentFormatRegistry.create(jsonInputStream);
LocalConverter converter = LocalConverter.builder()
.formatRegistry(custom) // use your registry
.build();
Programmatic Customization¶
Build a registry in code when you only need a few tweaks:
import org.jodconverter.core.document.*;
SimpleDocumentFormatRegistry reg = new SimpleDocumentFormatRegistry();
DocumentFormat myPdf = DocumentFormat.builder()
.name("PDF-A1")
.extension("pdf")
.mediaType("application/pdf")
// Customize store properties for TEXT family (export as PDF/A-1)
.storeProperty(DocumentFamily.TEXT, "FilterName", "writer_pdf_Export")
.storeProperty(DocumentFamily.TEXT, "SelectPdfVersion", 1)
.unmodifiable(true)
.build();
reg.addFormat(myPdf);
Overriding the Default Registry¶
If you prefer to change the global default used by DefaultDocumentFormatRegistry
constants, set the instance:
import org.jodconverter.core.document.*;
DefaultDocumentFormatRegistryInstanceHolder.setInstance(registry);
Load/Store Property Precedence¶
Load properties used when opening a document are determined as follows:
- Input
DocumentFormat
load properties (from the registry), then - Converter default load properties (
Hidden=true
,ReadOnly=true
,UpdateDocMode=NO_UPDATE
unless configured otherwise), then - Explicit per-converter loadProperty/loadProperties you set in the builder
Store properties used when saving the output are determined as follows:
- Target
DocumentFormat
store properties (from the registry), then - Explicit per-converter storeProperty/storeProperties you set in the builder
Where to put custom JSON¶
- Place
custom-document-formats.json
on your application’s runtime classpath (e.g., insrc/main/resources
) to have it automatically merged with the defaults - Or load any JSON at runtime and pass a
JsonDocumentFormatRegistry
instance to your converter
Troubleshooting¶
If a format cannot be resolved, ensure the extension is correct and present in the registry.
For advanced PDF export options (bookmarks, PDF/A, tagged PDF, etc.), set the corresponding store properties on the
target DocumentFormat
or via converter.storeProperty.
Related Pages¶
Whenever OpenOffice.org (OOo for short) is mentioned, this can generally be interpreted to include any office suite derived from OOo such as Apache OpenOffice and LibreOffice.