Update pdfalto recognition of non-standard fonts #1216

Status: Open. 6 commits to be merged into base: master.
@@ -923,8 +923,8 @@ else if (biblio.getE_Year().length() == 4)
tei.append("\t\t\t<abstract>\n");
}

- if ((abstractText != null) && (abstractText.length() != 0)) {
- if ( (biblio.getLabeledAbstract() != null) && (biblio.getLabeledAbstract().length() > 0) ) {
+ if (StringUtils.isNotBlank(abstractText)) {
+ if (StringUtils.isNotBlank(biblio.getLabeledAbstract())) {
// we have available structured abstract, which can be serialized as a full text "piece"
StringBuilder buffer = new StringBuilder();
try {
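Note on the hunk above: swapping the explicit null/length test for `StringUtils.isNotBlank` is not purely cosmetic, because whitespace-only strings are now rejected as well. The sketch below illustrates the difference under stated assumptions: `oldCheck` and the local `isNotBlank` are hypothetical stand-ins written for this example (the latter mirrors the documented behaviour of Apache Commons Lang's `StringUtils.isNotBlank`), and `AbstractCheckDemo` is not part of the PR.

```java
public class AbstractCheckDemo {

    // Condition used by the removed lines: non-null and non-empty.
    static boolean oldCheck(String s) {
        return (s != null) && (s.length() != 0);
    }

    // Local stand-in for StringUtils.isNotBlank (assumption: same semantics):
    // true only if the string is non-null and has at least one non-whitespace character.
    static boolean isNotBlank(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative inputs only; they are not taken from GROBID test data.
        String[] samples = { null, "", "   ", "\n\t", "Background: sample text" };
        for (String s : samples) {
            String shown = (s == null) ? "null" : "\"" + s.replace("\n", "\\n").replace("\t", "\\t") + "\"";
            System.out.println(shown + " old=" + oldCheck(s) + " new=" + isNotBlank(s));
        }
    }
}
```

The two checks agree on `null`, the empty string, and ordinary text. They differ only for whitespace-only input such as `"   "`, which the old condition accepted and the new one rejects.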
@@ -0,0 +1,7 @@
0030 zero.fitted
0031 one.fitted
0032 two.fitted
0033 three.fitted
0034 four.fitted
0035 five.fitted
0036 six.fitted
@@ -0,0 +1,6 @@
fb00 f_f
fb01 f_i
fb02 f_l
fb03 f_f_i


10 changes: 10 additions & 0 deletions grobid-home/pdfalto/languages/xpdf-others/oldstyle.nameToUnicode
@@ -0,0 +1,10 @@
0030 zero.oldstyle
0031 one.oldstyle
0032 two.oldstyle
0033 three.oldstyle
0034 four.oldstyle
0035 five.oldstyle
0036 six.oldstyle
0037 seven.oldstyle
0038 eight.oldstyle
0039 nine.oldstyle
@@ -0,0 +1,4 @@
2113 lscript
2202 partialdiff
21A9 arrowhookleft
21A9 arrowrighttophalf
26 changes: 26 additions & 0 deletions grobid-home/pdfalto/languages/xpdf-others/sc.nameToUnicode
@@ -0,0 +1,26 @@
0061 a.sc
0062 b.sc
0063 c.sc
0064 d.sc
0065 e.sc
0066 f.sc
0067 g.sc
0068 h.sc
0069 i.sc
006a j.sc
006c l.sc
006d m.sc
006e n.sc
006f o.sc
0070 p.sc
0071 q.sc
0072 r.sc
0073 s.sc
0074 t.sc
0075 u.sc
0076 v.sc
0077 w.sc
0078 x.sc
0079 y.sc
007a z.sc
002d hyphen.sc
@@ -0,0 +1,10 @@
0030 zero.taboldstyle
0031 one.taboldstyle
0032 two.taboldstyle
0033 three.taboldstyle
0034 four.taboldstyle
0035 five.taboldstyle
0036 six.taboldstyle
0037 seven.taboldstyle
0038 eight.taboldstyle
0039 nine.taboldstyle
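Note on the mapping files above: each line pairs a hexadecimal Unicode code point with a glyph name, which is the line format xpdf documents for `nameToUnicode` files. As a rough illustration of how such lines can be read, here is a minimal sketch. The class `NameToUnicodeDemo` and its `parse` helper are hypothetical names for this example and are not part of pdfalto or GROBID; silently skipping malformed lines is an assumption of the sketch, not documented xpdf behaviour.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NameToUnicodeDemo {

    // Parses lines of the form "<hex code point> <glyph name>"
    // into a map from glyph name to code point.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                continue; // assumption of this sketch: ignore malformed lines
            }
            map.put(parts[1], Integer.parseInt(parts[0], 16));
        }
        return map;
    }

    public static void main(String[] args) {
        // Sample lines copied from the files added in this PR.
        List<String> sample = List.of(
                "0030 zero.oldstyle",
                "0061 a.sc",
                "fb01 f_i",
                "2113 lscript");
        for (Map.Entry<String, Integer> e : parse(sample).entrySet()) {
            System.out.printf("%s -> U+%04X%n", e.getKey(), e.getValue());
        }
    }
}
```

With these mappings, a font-specific glyph name such as `zero.oldstyle` resolves to the ordinary digit U+0030, so the extracted text is expected to contain a standard character where the font used an unrecognised variant glyph name.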
8 changes: 8 additions & 0 deletions grobid-home/pdfalto/languages/xpdfrc
@@ -59,3 +59,11 @@ unicodeMap TIS-620 languages/xpdf-thai/TIS-620.unicodeMap
#----- begin Turkish support package (2011-aug-15)
unicodeMap ISO-8859-9 languages/xpdf-turkish/ISO-8859-9.unicodeMap
#----- end Turkish support package
#----- begin oldstyle support package (2024-dec-31)
nameToUnicode languages/xpdf-others/oldstyle.nameToUnicode
nameToUnicode languages/xpdf-others/taboldstyle.nameToUnicode
nameToUnicode languages/xpdf-others/ligatures.nameToUnicode
nameToUnicode languages/xpdf-others/fitted.nameToUnicode
nameToUnicode languages/xpdf-others/others.nameToUnicode
nameToUnicode languages/xpdf-others/sc.nameToUnicode
#----- end oldstyle support package
9 changes: 9 additions & 0 deletions grobid-home/pdfalto/lin-64/xpdfrc
@@ -59,3 +59,12 @@ unicodeMap TIS-620 ../languages/xpdf-thai/TIS-620.unicodeMap
#----- begin Turkish support package (2011-aug-15)
unicodeMap ISO-8859-9 ../languages/xpdf-turkish/ISO-8859-9.unicodeMap
#----- end Turkish support package
#----- begin oldstyle support package (2024-dec-31)
nameToUnicode ../languages/xpdf-others/oldstyle.nameToUnicode
nameToUnicode ../languages/xpdf-others/taboldstyle.nameToUnicode
nameToUnicode ../languages/xpdf-others/ligatures.nameToUnicode
nameToUnicode ../languages/xpdf-others/fitted.nameToUnicode
nameToUnicode ../languages/xpdf-others/others.nameToUnicode
nameToUnicode ../languages/xpdf-others/sc.nameToUnicode
#----- end oldstyle support package

8 changes: 8 additions & 0 deletions grobid-home/pdfalto/mac-64/xpdfrc
@@ -59,3 +59,11 @@ unicodeMap TIS-620 ../languages/xpdf-thai/TIS-620.unicodeMap
#----- begin Turkish support package (2011-aug-15)
unicodeMap ISO-8859-9 ../languages/xpdf-turkish/ISO-8859-9.unicodeMap
#----- end Turkish support package
#----- begin oldstyle support package (2024-dec-31)
nameToUnicode ../languages/xpdf-others/oldstyle.nameToUnicode
nameToUnicode ../languages/xpdf-others/taboldstyle.nameToUnicode
nameToUnicode ../languages/xpdf-others/ligatures.nameToUnicode
nameToUnicode ../languages/xpdf-others/fitted.nameToUnicode
nameToUnicode ../languages/xpdf-others/others.nameToUnicode
nameToUnicode ../languages/xpdf-others/sc.nameToUnicode
#----- end oldstyle support package
8 changes: 8 additions & 0 deletions grobid-home/pdfalto/mac_arm-64/xpdfrc
@@ -59,3 +59,11 @@ unicodeMap TIS-620 ../languages/xpdf-thai/TIS-620.unicodeMap
#----- begin Turkish support package (2011-aug-15)
unicodeMap ISO-8859-9 ../languages/xpdf-turkish/ISO-8859-9.unicodeMap
#----- end Turkish support package
#----- begin oldstyle support package (2024-dec-31)
nameToUnicode ../languages/xpdf-others/oldstyle.nameToUnicode
nameToUnicode ../languages/xpdf-others/taboldstyle.nameToUnicode
nameToUnicode ../languages/xpdf-others/ligatures.nameToUnicode
nameToUnicode ../languages/xpdf-others/fitted.nameToUnicode
nameToUnicode ../languages/xpdf-others/others.nameToUnicode
nameToUnicode ../languages/xpdf-others/sc.nameToUnicode
#----- end oldstyle support package