| Name | Size | Revision | Age | Author | Comment |
|---|---|---|---|---|---|
| core | | 36337 | over 9 years | Marek Horst | #1257 raising oozie.action.max.output.data to 8192 |
| src | | 37343 | over 9 years | Marek Horst | #1329 adding affiliations field in ExtractedDoc... |
| README.markdown | 662 Bytes | | almost 12 years | marek.horst | |
| deploy.info | 811 Bytes | 32243 | about 10 years | Marek Horst | introducing embedded integration test entry |
| pom.xml | 7.44 KB | 37348 | over 9 years | Marek Horst | #1330 icm-iis-metadataextraction and icm-iis-in... |
  • svn:ignore: .* bin target build

Latest revisions

# Date Author Comment
37348 20/05/2015 07:00 PM Marek Horst

#1330 icm-iis-metadataextraction and icm-iis-ingest-pmc modules cermine dependency upgraded to recently released 1.6 version

37343 20/05/2015 06:49 PM Marek Horst

#1329 adding affiliations field in ExtractedDocumentMetadata PMC schema. Metadata extraction code refactoring by extracting code responsible for building Affiliation avro records to AffiliationBuilder class and sharing it with pmc ingestion. Implementing affiliations ingestion functionality in PmcXmlHandler covered with unit tests. Adding affiliations field support in ingest pmc metadata transformer.

36394 15/04/2015 05:17 PM Marek Horst

#1240 raising mapred.task.timeout to 3600000 (1h) just in case any extremely complex PDF documents appear. All time-consuming documents will be registered in the failure sink.

36366 14/04/2015 12:54 PM Marek Horst

#1277 upgrading cermine dependency to most recent 1.5 release

36337 13/04/2015 01:33 PM Marek Horst

#1257 raising oozie.action.max.output.data to 8192

36289 09/04/2015 07:10 PM Marek Horst

#1257 dropping schema generation related hacks in all map-reduce modules, switching to literal schema parameters

35986 03/04/2015 01:30 PM Marek Horst

#1248 persisting content url in supplementaryData to make it easier to find content causing failure

35937 02/04/2015 04:15 PM Marek Horst

#1248 bugfix renaming inputEntityId to inputObjectId after schema changes

35936 02/04/2015 04:01 PM Marek Horst

#1248 introducing failures sink datastore support in metadata extraction module

35832 30/03/2015 07:17 PM Marek Horst

#1240 extending mapred.task.timeout for metadata extraction to 30 minutes
