Data Mining with Forms

QEST Platform 5.0 Documentation
Applies to QESTField Forms

This article describes the structure of forms data as it is saved in the QEST Platform database, so that it can serve as a starting point for extracting this data for reporting purposes.

Overview

Forms will typically contain two types of data:

  • mapped data, which is tightly bound to the work order and report hierarchy in the QESTLab database
  • unmapped data, which includes all of the entry fields used by the field technician when filling out the bulk of the report, and which usually does not correspond to existing QESTLab tests

When a form is uploaded, both types of data are recorded in the database in the DocumentExternal.XmlData field.

Field Types

Fields Defined in Adobe Acrobat

Consider a mapped form that contains a mix of data: some mapped from the QEST Platform database and some entered by the user in the field. The field names are defined in Adobe Acrobat. The header of the form includes, among others, the following field names:

  • Contract No
  • Date
  • TIP Number Inspector

Fields Mapped in QEST Form Mapper

In the QEST Form Mapper, mappings are created for the header data, which is known to be available in the QEST Platform database. The fields at the bottom are left unmapped, as they are not known to QEST Platform and will be completed by the field technician. Note that the mappings below apply to the same fields listed above:

  • Contract No - ID20002/ProjectCode
  • Date - ID101/WorkDate
  • TIP Number Inspector - ID101/PersonName

Fields Completed by the Field Operator

Once an instance of the form is created, some of the header information is populated from the QESTLab database, and the field technician enters data for the remaining parts. Neither the names of the form fields nor the mappings on those fields are visible to the user.

The field technician completes the rest of the fields, including:

  • TIP Number
  • High Temp
  • Low Temp
  • AM Conditions
  • PM Conditions

XML data

Once the field technician uploads the form, the form is analysed and the data is extracted. The extracted data is divided into:

  • mapped data, in the Data/QestData node
  • unmapped data, in the Data/Raw node

Note that for each form there is only a single block of XML in the database, but the two parts will be considered separately in the sections below.
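As a starting point for extraction, the two parts can be separated once the XML has been read from DocumentExternal.XmlData. The following is a minimal Python sketch, assuming the XML text has already been fetched from the database; the sample string and the function name split_form_xml are illustrative only and are not part of QEST Platform.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample standing in for one DocumentExternal.XmlData value.
SAMPLE_XML = """
<Data>
  <QestData>
    <ID101>
      <WorkDate>11-May-18</WorkDate>
    </ID101>
  </QestData>
  <Raw>
    <Date>11-May-18</Date>
    <HighTemp>70</HighTemp>
  </Raw>
</Data>
"""


def split_form_xml(xml_text):
    """Return the (QestData, Raw) elements of a form's XML; either may be None."""
    root = ET.fromstring(xml_text.strip())  # root is the <Data> element
    return root.find("QestData"), root.find("Raw")


mapped, raw = split_form_xml(SAMPLE_XML)
mapped_children = [child.tag for child in mapped]  # top-level mapped documents
raw_children = [child.tag for child in raw]        # flat list of form fields
```

Both parts come from the same XML block, so a single read of the database field is enough to process them.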

Mapped data

The data in the QestData node is hierarchical and corresponds to the structure of QESTLab documents. In this sense it most closely reflects the field mappings seen in the QEST Form Mapper.

QEST Data
<Data>
  <QestData>
    <ID20002>
      <ProjectCode>DSI</ProjectCode>
      <QestUUID>f775c498-cb94-4aa3-8954-a6ff012ce0fc</QestUUID>
    </ID20002>
    <ID101>
      <PersonName>Field Tech 1</PersonName>
      <WorkDate>11-May-18</WorkDate>
      <QestUUID>926d68bd-608c-415c-bf87-a8dd000acbec</QestUUID>
      <ID190001>
        <SignatureImage/>
        <QestUUID>19067c18-56c8-46de-bfa8-a8dd005b4308</QestUUID>
      </ID190001>
    </ID101>
  </QestData>
  <Raw>
    <!-- Redacted -->
  </Raw>
</Data>

In the above example the top node (ID101) is the work order, and the child node (ID190001) is the form itself.
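The hierarchy can be flattened into one record per document for reporting. The sketch below is illustrative only: it assumes that every document node is named ID followed by digits (as in the example above) and that all other child elements are properties. The function name walk_documents is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: document nodes are named "ID" followed by digits, as in the sample.
DOC_TAG = re.compile(r"ID\d+")

QEST_DATA_XML = """
<QestData>
  <ID20002>
    <ProjectCode>DSI</ProjectCode>
    <QestUUID>f775c498-cb94-4aa3-8954-a6ff012ce0fc</QestUUID>
  </ID20002>
  <ID101>
    <PersonName>Field Tech 1</PersonName>
    <WorkDate>11-May-18</WorkDate>
    <QestUUID>926d68bd-608c-415c-bf87-a8dd000acbec</QestUUID>
    <ID190001>
      <SignatureImage/>
      <QestUUID>19067c18-56c8-46de-bfa8-a8dd005b4308</QestUUID>
    </ID190001>
  </ID101>
</QestData>
"""


def walk_documents(node, path=""):
    """Yield (path, properties) for every document node below `node`."""
    for child in node:
        if not DOC_TAG.fullmatch(child.tag):
            continue  # a property of the current document, not a child document
        child_path = f"{path}/{child.tag}" if path else child.tag
        # Properties are the non-document children; empty elements become "".
        props = {p.tag: (p.text or "") for p in child
                 if not DOC_TAG.fullmatch(p.tag)}
        yield child_path, props
        yield from walk_documents(child, child_path)


documents = dict(walk_documents(ET.fromstring(QEST_DATA_XML.strip())))
```

Each path, such as ID101/ID190001, has the same shape as the mappings shown in the QEST Form Mapper, which makes it straightforward to look up a property by its mapping.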

Property names

The properties on each document will usually (but not always) correspond to a database field of the same name. For example, a ReportNo property corresponds to DocumentExternal.ReportNo in the database.

Date and numeric formats

At this time, the data is WYSIWYG (what you see is what you get): the XML will reflect what the user has entered, regardless of whether the corresponding database field is a strongly typed date (or similar) or not. For example, suppose that a PDF form has two date fields:

  • one has been configured to display as dd-mmm-yyyy, and a field technician sets a value of 12-May-2018
  • one has been configured to display as mm/dd/yy, and a field technician sets a value of 08/30/18

In the resultant XML, these will not be converted back to a standardized date format, but rather will directly contain the text "12-May-2018" and "08/30/18". This imposes some limits on data mining, as in this scenario it's not possible to compare such dates without first using a specific format conversion.
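One way to make such values comparable is to try each display format that is known to be in use. The following Python sketch is an assumption-laden illustration: the list KNOWN_FORMATS and the function parse_form_date are hypothetical, and the formats must be adjusted to match what is actually configured on the forms.

```python
from datetime import date, datetime

# Assumption: the display formats configured on the forms are known in advance.
# %b matches English month abbreviations only under an English or C locale.
KNOWN_FORMATS = [
    "%d-%b-%Y",  # dd-mmm-yyyy, e.g. 12-May-2018
    "%d-%b-%y",  # dd-mmm-yy,   e.g. 11-May-18
    "%m/%d/%y",  # mm/dd/yy,    e.g. 08/30/18
]


def parse_form_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next known format
    return None


first = parse_form_date("12-May-2018")
second = parse_form_date("08/30/18")
is_earlier = first < second  # comparison is only meaningful after conversion
```

Values that match none of the formats return None rather than raising an error, so they can be reported and reviewed separately. Ambiguous formats (for example dd/mm/yy versus mm/dd/yy) cannot be told apart from the text alone.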

Identifiers

Every document will have a QestUUID, which uniquely identifies the document in the database, and can be used in conjunction with the qestReverseLookup table to traverse document relationships.

Unmapped data

The data in the Raw node is flat and corresponds to the names of the form fields. In this sense it most closely reflects the fields defined in Adobe Acrobat. Note that the mapped fields will be included in this block as well.

Raw Data
<Data>
  <QestData>
    <!-- redacted -->
  </QestData>
  <Raw>
    <ContractNo>DSI</ContractNo>
    <undefined>227261</undefined>
    <TIPNumberInspector>Field Tech 1</TIPNumberInspector>
    <Day/>
    <Date>11-May-18</Date>
    <HighTemp>70</HighTemp>
    <LowTemp>55</LowTemp>
    <AMConditions>Good</AMConditions>
    <PMConditions>Bad</PMConditions>
    <ItemsofWorkRow1/>
    <!--snip-->
  </Raw>
</Data>
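Because the Raw node is flat, it converts naturally into a dictionary keyed by field name. The following is a minimal Python sketch under that assumption; the function name raw_to_dict is illustrative, and the sample is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

RAW_SAMPLE_XML = """
<Data>
  <QestData/>
  <Raw>
    <ContractNo>DSI</ContractNo>
    <TIPNumberInspector>Field Tech 1</TIPNumberInspector>
    <Day/>
    <Date>11-May-18</Date>
    <HighTemp>70</HighTemp>
    <LowTemp>55</LowTemp>
  </Raw>
</Data>
"""


def raw_to_dict(xml_text):
    """Return {field name: text} for every element in Data/Raw."""
    raw = ET.fromstring(xml_text.strip()).find("Raw")
    if raw is None:
        return {}
    # Empty elements such as <Day/> have no text; store them as "".
    # If a field name were repeated, the last value would win.
    return {field.tag: (field.text or "") for field in raw}


fields = raw_to_dict(RAW_SAMPLE_XML)
```

All values are returned as text, so numeric fields such as HighTemp still need to be converted before they can be aggregated.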

Property names

The properties correspond to the names of the fields that the administrator configured (for example, in Adobe Acrobat), with the spaces removed.

  • by coincidence only, properties such as ReportNo and ClientName correspond to the same field names in the QestData node (at Data/QestData/ID101/ID190010)
  • other properties such as ClientStreet1 are named differently to the corresponding field Street in the QestData node (at Data/QestData/ID20001)

Date and numeric formats

The same limitations apply as with mapped data.

Fields with invalid characters

There are reasonably strict naming rules for XML which, if broken, will render the XML invalid and make it impossible to parse. The naming conventions of PDF form fields are generally less strict, so an administrator may create form fields whose names need to be sanitized in the raw XML. For example:

  • the field 90 Degrees would appear in the XML as <_x0039_0_x0020_Degrees>
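When mining the data, sanitized names like this can be converted back for display. The sketch below assumes that every sanitized character is written as _xHHHH_, where HHHH is the character's four-digit hexadecimal code; this pattern is inferred from the single example above and is not a documented guarantee. The function name decode_field_name is hypothetical.

```python
import re

# Assumption: sanitized characters appear as _xHHHH_ (four hex digits).
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_field_name(xml_name):
    """Replace each _xHHHH_ sequence with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), xml_name)


decoded = decode_field_name("_x0039_0_x0020_Degrees")  # "90 Degrees"
unchanged = decode_field_name("HighTemp")              # no escapes, returned as-is
```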

If it is necessary to ensure that the original field names are preserved, make certain that the field names entered in Adobe Acrobat conform to the same rules as for XML naming. Namely:

  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml (or XML, or Xml, etc)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces
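The rules above can be checked before a form is published. The following Python sketch is a simplified illustration: it only accepts ASCII letters and digits, which is stricter than the XML specification, and the function name is_xml_safe_field_name is hypothetical.

```python
import re

# Simplification: ASCII only. A letter or underscore first, then letters,
# digits, hyphens, underscores, or periods. Spaces are never allowed.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_xml_safe_field_name(name):
    """Return True if `name` follows the naming rules listed above."""
    if name.lower().startswith("xml"):
        return False  # names may not start with xml in any letter case
    return _NAME.fullmatch(name) is not None


results = {
    name: is_xml_safe_field_name(name)
    for name in ["HighTemp", "90 Degrees", "High Temp", "XmlData", "_ok-1.2"]
}
```

Names that fail this check are the ones that would need to be sanitized in the raw XML.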

Versioning of XML data

Multiple versions of the XmlData are not currently retained. The data in the table will always reflect that of the most recently uploaded version of the form.



Products described on these pages, including but not limited to QESTLab®, QESTNet, QESTField, QEST Web App, Construction Hive, and associated products are trademarks (™) of Spectra QEST Australia Pty Ltd and/or related companies.

The content of this page is confidential. Do not share, duplicate or distribute without permission.

© 2021 Spectra QEST® Australia Pty Ltd and/or related companies.