Implement the IMSC HRM for EBU-TT-D documents #66

nigelmegitt · 2023-12-07T12:19:53Z

This PR improves support for creating EBU-TT-D documents from XML sources and implements validation of those documents against the IMSC HRM (https://www.w3.org/TR/imsc-hrm/). It incorporates the tests from w3c/imsc-hrm-tests that are valid EBU-TT-D and runs them.

Also closes #62 by incorporating the changes proposed in #63.

Previously, we processed each separate `p` in an ISD distinctly as a separate ISD, which was wrong. Now, gather all the elements in each ISD together and process as a group. Also tidies up the time handling. When an ISD has a region with an opaque background colour and showBackground="always", there can _never_ be an empty ISD, because the background always needs to be painted.
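As a rough illustration of the emptiness rule described above, here is a minimal sketch. The `Region` class and `isd_is_empty` function are hypothetical names invented for this example and are not taken from the PR's code; the only fact assumed from TTML is that the initial value of `tts:showBackground` is `always`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """Hypothetical stand-in for one region as it appears in a single ISD."""
    opaque_background: bool = False
    show_background: str = 'always'  # TTML initial value of tts:showBackground
    # Text of every p element selected into this region for this ISD,
    # gathered together rather than treated as separate ISDs.
    content: List[str] = field(default_factory=list)


def isd_is_empty(regions):
    """Return True only if nothing needs to be painted for this ISD."""
    for region in regions:
        if region.content:
            return False
        # An opaque background with showBackground="always" is painted
        # even when the region has no content.
        if region.opaque_background and region.show_background == 'always':
            return False
    return True
```

With this model, an ISD containing a region that has an opaque background and `showBackground="always"` is never reported as empty, whether or not it has text.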

nigelmegitt marked this pull request as ready for review December 7, 2023 12:20

nigelmegitt added 29 commits December 26, 2023 13:14

wip

e5d5d11

Remove Pipfile and don't need pytest-catchlog

541a48b

Load EBU-TT-D documents from XML

04a1044

Test EBU-TT-D <--> XML

379c181

Fix element name hack when there's no prefix

883e5e3

Empty validator code with tests running

913a5d6

Includes all the imsc-hrm-tests test files that can be valid EBU-TT-D, fixed up to be so.

Remove content with no associated region from imsc-hrm tests

1c5f6c6

Calculate if an ISD is empty

ca05e81

Compute styles for spans in EBU-TT-D

3d2578f

Compute drawing area S

fcb6a56

Add hash function to CellFontSizeType

e6a1ba0

WIP implementing textDuration

e5005a3

Iterate through the characters getting the glyphs. Decide if they need to be rendered or copied from the cache. Compute NRGA, and check the Glyph Cache size on each iteration. Still to do: compute copyDur and renderDur. Needs UAX24 implementation too.
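The loop described in this commit message might look roughly like the sketch below. It is a simplification under stated assumptions, not the PR's implementation: `Glyph`, `nrga` and `text_duration` are hypothetical names, the cache is a single set rather than whatever buffering the HRM specification requires, and the default `ren`, `gcpy` and `max_cache_size` values are illustrative placeholders that should be checked against the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """Hypothetical glyph identity: the same character in the same style."""
    char: str
    font_family: str
    font_size: float  # assumed to be a fraction of the root container height
    color: str


def nrga(glyph):
    """Normalised rendered glyph area: the normalised font size squared."""
    return glyph.font_size ** 2


def text_duration(glyphs, cache, ren=1.2, gcpy=12.0, max_cache_size=1.0):
    """Sum the render or copy cost of each glyph, checking the cache size.

    ren, gcpy and max_cache_size are placeholder values for this sketch.
    """
    duration = 0.0
    cache_size = sum(nrga(g) for g in cache)
    for glyph in glyphs:
        area = nrga(glyph)
        if glyph in cache:
            duration += area / gcpy  # already cached: copy it
        else:
            duration += area / ren   # not cached: render it and cache it
            cache.add(glyph)
            cache_size += area
            if cache_size > max_cache_size:
                raise ValueError('glyph cache size exceeded')
    return duration
```

Passing the same `cache` set to successive calls would let glyphs rendered for one ISD be copied in the next, which is the behaviour the commit message hints at.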

Script to generate Python file for uax24

f964cf2

Parses the Unicode UAX24 scripts list and generates a python file that specifies those lists in a way that can be queried later. Needed for the IMSC-HRM implementation.
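For readers unfamiliar with the UAX #24 `Scripts.txt` format, here is a minimal parsing sketch. It is not the generator script from this PR: `parse_scripts`, `script_of` and `SAMPLE` are hypothetical names, and the sketch returns a dictionary instead of writing a Python file. It accepts code points of 4 to 6 hex digits and treats range ends as inclusive.

```python
import re

# Data lines look like "0041..005A    ; Latin # L&  [26] LATIN CAPITAL ..."
# or, for a single code point, "00AA          ; Latin # Lo  FEMININE ...".
LINE_RE = re.compile(
    r'^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)')

SAMPLE = [
    '# Scripts.txt style sample, abbreviated for illustration',
    '',
    '0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z',
    '00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR',
    '20000..2A6DF  ; Han # Lo [42720] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6DF',
]


def parse_scripts(lines):
    """Return {script_name: [(first, last), ...]} with inclusive bounds."""
    scripts = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # comment or blank line
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        scripts.setdefault(match.group(3), []).append((first, last))
    return scripts


def script_of(char, scripts):
    """Look up the script of a single character, or 'Unknown'."""
    code = ord(char)
    for name, ranges in scripts.items():
        if any(first <= code <= last for first, last in ranges):
            return name
    return 'Unknown'
```

Keeping the bounds inclusive in the data and comparing with `first <= code <= last` avoids the off-by-one that appears when an inclusive end is passed straight to Python's `range()`.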

Add missing region

0baeb58

Complete implementation

4395753

* Fix NRGA calculation to square the area
* Calculate _GCpy and _Ren based on uax24 script

Passes all but 2 of the tests.

TODO: tidy up stdout and log messages

Handle text broken into different children of span

7ed4c34

For example, as might occur if text is broken with a `<br>` child element of the `<span>`.

log not print, don't reprocess text due to `<br>`

4a13966

Change all the `print()`s to `log.debug()`s so the log is readable. Fix a bug where a `<br>` after some text content would cause the previous text to be processed and counted again, which caused dur014-pass to fail incorrectly.

Fix up dur003 tests

c784d56

Should have 0.5s available render time for second ISD.

Handle character codes more than 4 digits long

b1a948b

And regenerate uax24.py script

extend the character ranges by 1 at the end

70053d5

More elegant solution to the range problem

e9636da

Ignore content without a region when checking if an ISD is empty

94af5b8

Integrate the IMSC HRM Validator with a command line switch

d0fdf17

The log handling in --verbose mode is flaky, not sure why.

Add showBackground region tests

9321951

Add unit tests for IMSC HRM Validator

348e650

Edge case fixes

d0093e6

Incorporate proposed fix for w3c/imsc-hrm-tests#12

45454e7

Makes the tests behave as expected.

Delete p0bmslf8_gaps.json

e030def

Delete statement about conversion to EBU-TT-D

8e835ad

The upstream repo has had those conversions applied, so the tests are now the same.

nigelmegitt force-pushed the imsc-hrm branch from 2704d51 to 8e835ad Compare December 26, 2023 13:19

nigelmegitt temporarily deployed to CI December 26, 2023 13:19 — with GitHub Actions Inactive

Don't re-add Pipfile

13c2a7f

Came back during rebase

nigelmegitt temporarily deployed to CI December 26, 2023 13:23 — with GitHub Actions Inactive

Add documentation for imscHrmValidator

dc086c9

nigelmegitt temporarily deployed to CI December 26, 2023 14:30 — with GitHub Actions Inactive

nigelmegitt mentioned this pull request Dec 26, 2023

Issue 0062 support reading ebuttd #63

Open
