TaXML Presentation
From: http://www.taxadmin.org/fta/meet/2000tech/andrson/
"TaXML Presentation."
By Michael Sanzi and Lesley Anderson
September 12, 2000
Also in:
http://www.taxadmin.org/fta/meet/2000tech/andrson/anderson.ppt
[Slide #1]
TaXML Presentation
Lesley Anderson
[Slide #2]
Introduction
"Straw man" XML-based schema
The schema authors
Notes:
Straw man
This schema is presented as a straw man -- it is a starting point for working with XML.
The people working on this were all tax professionals with many years of experience.
Even though this has been carefully reviewed, there may be
inconsistencies in the examples.
This proposal may not exactly meet the requirements of the agencies that would use it,
simply because there has not been sufficient (any!) communication between
developers and customers.
The Dev Team
Tax professionals with many years of experience
Worked for 10 years developing commercial tax software for professional use
Worked at MS for the last 2 years developing the TaxSaver product
XML experience provided by programmers at MS
[Slide #3]
Agenda
Developing a hierarchy
Creating the schema
Creating the XML data file
Validating the XML data file with the schema
Displaying data using XSL
[Slide #4]
Developing a Hierarchy
Tax forms and electronic filing
Included fields that need data entered
Included data only once
Exceptions: key fields and placeholders
Notes:
Tax forms and electronic filing
Analyzed forms and flow, and reviewed the electronic filing specs
Looked for ways to logically group data, rather than just reiterate the forms
Looked for ways to economize the amount of data by avoiding duplication
Used descriptive tag names
Included entry fields
Entered fields on forms are included in the hierarchy
Subtotals and non-significant computed fields are not included
Included fields that would be computed on a worksheet or on a form that is not supported
in this version. These are placeholders for a group of fields that would be entered
later.
Included data only once
If data appeared in more than one place, only included data at its source
entry point.
For example, advance EIC is entered on the W-2 and then carried to the 1040.
In our model, the field only shows on the W-2.
[Slide #5]
Forms Supported
Form 1040
Schedule A
Schedule B
Schedule C
Schedule E (pg.1)
Schedule EIC
Schedule F
Schedule H
Schedule J
Schedule R
Schedule SE
Form 1040A
Schedule 1
Schedule 2
Schedule 3
Form 1040EZ
Form 2210-F
Form 2441
Form 4255
Form 4562
Form 4797
Form 4835
Form 8606
Form 8615
Form 8812
Form 8815
Form 8828
Form 8829
Form 8839
Form 8863
Form 9465
Form W-2
Form 1099-INT, DIV, & MISC
Form 1099-R
Notes:
It is somewhat misleading to speak in terms of forms.
It is more accurate to say that the fields with entered data, as contrasted
with computed data, are included in the schema.
[Slide #6]
At the Top of the Hierarchy
TaXML
Authentication
Identification
KeyID
TaxYear
Version
Major
Minor
IndividualTax
CorporateTax
W-2 & W-3
Notes:
TaXML
XML schemas must have a unique root element. In this case, it's TaXML.
There would, of course, need to be elements for authentication, signon, and
versioning. This is not addressed in this hierarchy except for a couple of
placeholders. As security and transmission needs are determined, this part
of the schema will be filled in.
Individual Tax is the top level for both federal and state income tax for individuals.
I'm showing Corporate Tax as an example of the level at which another entity
would appear in the hierarchy. This could be included as an actual part of the schema,
by including it through a namespace or a data island.
Also, this type of schema could be used to transmit information from
employers and financial institutions to the IRS.
[Slide #7]
The Taxpayer Element
Taxpayer
IDNumber
Name
FirstName
MiddleInitial
LastName
Suffix
CompleteName
NameControl
Age65OrOlder
Blind
MilitaryIndicator
HomePhone
WorkPhone
PresidentFund
Exemption
Notes:
The Taxpayer Element
ID number is the Social Security number. We have used the more general
term IDNumber to allow this tag to be used for either SSN or EIN.
The taxpayer's name is made up of the six pieces shown. We gather the
first, middle initial, last name, and suffix separately to make name control and matching
easier. The name is then displayed in full in the CompleteName field.
The NameControl represents the same content that is currently used in
electronic filing. Whether this would still be needed in this new methodology is unclear.
Then we have pulled together other information that is specifically
related to the taxpayer.
There are a couple of other fields that are in this section that
were deleted on this slide due to space constraints.
The spouse has an identical element.
[Slide #8]
Address
Address
Street
Street2
ApartmentSuite
City
State
ZipCode
NewAddress
Notes:
Address
The hierarchy was developed looking at federal and California. If California
needed a piece of information that would be relevant to federal or another state, that field
was added to the federal portion of the hierarchy.
An example of this is that Street2 is a second line for the street address.
The NewAddress tag is a boolean (Yes/No) field to indicate whether this is a new address.
[Slide #9]
FilingStatusInformation
FilingStatusInformation
FilingStatus
MFS
Name
IDNumber
DidNotLiveWithSpouse
HeadofHousehold
Name
IDNumber
QWYearSpouseDied
MustItemizeIndicator
Notes:
FilingStatusInformation
FilingStatusInformation includes the filing status used in the return.
The MFS element was created to group the information needed for the spouse
when the status is married filing separately.
Under the MFS element is the Name element. This is bolded to show that this
element has been previously used. This simplifies the hierarchy by not needing
to again list the components.
The Head of Household information is for the qualifying dependent.
[Slide #10]
DependentList
DependentList
Dependent
Name
IDNumber
Relationship
QualifyforTaxCredit
QualifiedCareExpense
YearofBirth
Student
Disabled
NumberOfMonths
PYChildCareIndicator
Notes:
DependentList
Here is the first example of dealing with an element that may occur more than
once, or in English, that may be present in the XML data file more than once.
Throughout this proposed schema, we have used the word "List" to indicate that
more than one of the elements that follows the list is allowed.
In short, here is what this hierarchy means:
One DependentList is allowed.
Within each DependentList, there can be an unlimited number of Dependent
elements.
Within each Dependent element, there can be only one of the items listed.
When we move to examine the schema you will be able to see that this is
accomplished using the 'minOccurs' and 'maxOccurs' attributes.
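The three rules above can be sketched as an XDR fragment. This is an illustration only, not the authors' schema: the occurrence values and the choice of Relationship as the sample child are assumptions, and Python is used here simply as a convenient way to read the fragment back. In XDR, maxOccurs="*" means "unlimited".

```python
import xml.etree.ElementTree as ET

# Hypothetical XDR fragment. The minOccurs/maxOccurs values are assumptions
# chosen to match the rules in the notes, not values from the real schema.
XDR = """
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="Relationship" content="textOnly"/>
  <ElementType name="Dependent" content="eltOnly">
    <element type="Relationship" minOccurs="0" maxOccurs="1"/>
  </ElementType>
  <ElementType name="DependentList" content="eltOnly">
    <element type="Dependent" minOccurs="0" maxOccurs="*"/>
  </ElementType>
</Schema>
"""

NS = {"x": "urn:schemas-microsoft-com:xml-data"}


def occurrence_rules(schema_text):
    """Return {(parent, child): (minOccurs, maxOccurs)} for each child reference."""
    root = ET.fromstring(schema_text)
    rules = {}
    for parent in root.findall("x:ElementType", NS):
        for child in parent.findall("x:element", NS):
            key = (parent.get("name"), child.get("type"))
            rules[key] = (child.get("minOccurs"), child.get("maxOccurs"))
    return rules


rules = occurrence_rules(XDR)
for (parent, child), (low, high) in rules.items():
    print(f"{parent} may contain {child}: min {low}, max {high}")
```

Reading the constraints back this way makes it easy to confirm that a "List" element is the only place where an unlimited maximum appears.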
[Slide #11]
Digging Into the Hierarchy
Wages
Demonstrates adding levels to the hierarchy
Shows how state data can be gathered
ActivityList
Combining business, rental, farm, and farm rentals
Depreciation
California
Integrating state into the mix
Notes:
Digging Into the Hierarchy
We're going to be using a paper handout for this part of the presentation
because of space constraints on the screen.
We'll take a look at the Wages, ActivityList, and how California is included
in the hierarchy.
[Slide #12]
Creating the TaXML Schema
XDR rather than DTD
Working in XML
Using a browser
Using an XML editor
Declaring the name space
Notes:
XDR vs DTD
There are several languages for writing schemas; the XML Document Type Definition
(DTD) is the only standard one and the most widely used today.
Microsoft's XML tools use a proprietary schema language called XML Data Reduced
(XDR), which makes a number of improvements over DTD. For example, XDR
schemas can specify data types (e.g., decimal number, date) and allowable values
of elements, enabling p
Microsoft says it will eventually replace XDR with the XML Schema language,
currently under development by the W3C with input from Microsoft and other
companies. Microsoft also says it will provide automated tools for translating
XDR schemas to XML Schema.
We are using XDR in this schema.
Working in XML
Now we're going to look at XML. Up until now, the material we've looked at has
been the analysis and organization of the material, without actually being in an XML
file or schema.
I'm opening the schema in Internet Explorer. So technically what you're
looking at is an HTML representation of the schema, rather than pure XML. HTML adds
the colors and allows you to expand or collapse parts of the schema to make
it easier to view.
Declaring the name space
The first line in the file is the declaration of the name space. We're using
the Microsoft versions of XML for both the schema and the datatypes.
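As a rough sketch of what such a first line looks like, the fragment below uses the two namespace URIs that Microsoft's XDR tools recognize for the schema vocabulary and for the datatypes. The schema name "TaXML" and the sample TaxYear declaration are assumptions for illustration; the Python code only reads the declaration back.

```python
import xml.etree.ElementTree as ET

# Opening of a hypothetical XDR schema file. The default namespace covers the
# schema vocabulary; the "dt" prefix covers the datatypes.
SCHEMA_HEAD = """<Schema name="TaXML"
    xmlns="urn:schemas-microsoft-com:xml-data"
    xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="TaxYear" content="textOnly" dt:type="string"/>
</Schema>"""

root = ET.fromstring(SCHEMA_HEAD)

# ElementTree expands prefixes into "{namespace-uri}local-name" form.
schema_ns = root.tag.split("}")[0].lstrip("{")
first_decl = root[0]
dt_type = first_decl.get("{urn:schemas-microsoft-com:datatypes}type")

print(schema_ns)  # namespace of the Schema element
print(dt_type)    # datatype declared for TaxYear
```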
[Slide #13]
TaXML AttributeType
AttributeType
tsj
state
keyfield
Format
Notes:
AttributeTypes
The attribute feature allows you to add additional information to an element. This can
be used to validate, sort, and display data.
TSJ
The 'tsj' attribute is probably one you can figure out with little help. Does
anyone know what this is?
By adding the 'tsj' attribute, we set the stage to be able to identify which
income belongs to the taxpayer, the spouse, or to them as a couple. This can be used to
determine the optimal filing status, and is especially critical to many of the states.
State
Sourcing income to a particular state or city is integral to the preparation
of correct state returns.
Keyfield
The keyfield attribute might be better named something like check total.
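To make the use of these attributes concrete, here is a minimal sketch that groups wage amounts by owner and by state. The element names WagesList and Amount, the attribute values T, S, and J, and the figures are all assumptions made for this example; only the attribute names tsj and state come from the slide.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal

# Hypothetical data: tsj is assumed to take T (taxpayer), S (spouse), J (joint).
DATA = """
<WagesList>
  <Wages tsj="T" state="CA"><Amount>41000.00</Amount></Wages>
  <Wages tsj="S" state="CA"><Amount>28500.50</Amount></Wages>
  <Wages tsj="T" state="OR"><Amount>3000.00</Amount></Wages>
</WagesList>
"""


def total_by(attribute, data_text):
    """Sum the Amount of every Wages element, grouped by the given attribute."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at zero
    for wages in ET.fromstring(data_text).iter("Wages"):
        totals[wages.get(attribute)] += Decimal(wages.findtext("Amount"))
    return dict(totals)


by_owner = total_by("tsj", DATA)
by_state = total_by("state", DATA)
print(by_owner)
print(by_state)
```

Because the ownership and the source state travel with each income item, the same data can be regrouped for a joint return, a separate return, or a state return without re-entering anything.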
[Slide #14]
TaXML Data Types
Data types
fixed.14.4
float
boolean
date
int
string
Notes:
TaXML Data Types
Fixed.14.4
Fixed.14.4 is the type used for money fields.
Float
Used for percentages.
Boolean
Used for Yes/No or True/False situations.
Date
Used for dates as long as a word such as "various" is not permitted. If it is,
then the string type must be used.
Int
Used for numeric fields that are not currency.
String
Used for anything else! Text is obvious, but also for IDNumbers, calendar
years, etc., where you want to be able to control format.
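The following sketch shows one way a program receiving a data file might convert text into values for each of the six types. It rests on assumptions: fixed.14.4 is read as at most 14 digits before the decimal point and at most 4 after, booleans are assumed to be written as 1 or 0, and dates are assumed to be in YYYY-MM-DD form. The function names are made up for this example.

```python
import re
from datetime import date
from decimal import Decimal

# Assumed reading of fixed.14.4: up to 14 integer digits, up to 4 decimals.
_FIXED_14_4 = re.compile(r"[+-]?\d{1,14}(\.\d{1,4})?")


def parse_fixed_14_4(text):
    if not _FIXED_14_4.fullmatch(text):
        raise ValueError(f"not a fixed.14.4 value: {text!r}")
    return Decimal(text)  # Decimal avoids binary rounding on money fields


def parse_boolean(text):
    # Assumption: 1 means true/yes and 0 means false/no.
    if text not in ("0", "1"):
        raise ValueError(f"not a boolean value: {text!r}")
    return text == "1"


CONVERTERS = {
    "fixed.14.4": parse_fixed_14_4,
    "float": float,
    "boolean": parse_boolean,
    "date": date.fromisoformat,  # rejects words such as "various"
    "int": int,
    "string": str,
}


def convert(dt_type, text):
    """Convert element text according to its declared data type."""
    return CONVERTERS[dt_type](text)


print(convert("fixed.14.4", "1234.50"))
print(convert("date", "2000-09-12"))
```

The date converter raises an error for a word such as "various", which is why a field that allows such words has to be declared as a string.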
[Slide #15]
ElementType Declarations
Declaring the elements
Order
Format
Beginning '<'
ElementType
Name
Content
Dt:type
Ending '/>'
Notes:
Declaring the Elements
The methodology we used to generate this schema was to set up a database.
This gave us a dependable, stable data source that was easily maintainable. A
program was written using C++ that creates the schema.
Order
Each tag name that we use in the hierarchy must be declared in the schema
using the 'ElementType' statement.
The ElementType declaration is best done for an item before it is included
as a child of another element.
Another way to view this is that the elements that contain text, rather
than other elements are handled first.
Although the first half of the schema is in alphabetical order, this is
only because the programmer chose to alphabetize the
elements that did not have child elements.
The order in which a schema is written is not hard and fast. The
language is still growing.
Format
Each line begins with a less than sign. All XML schemas are XML files and
must conform to the required XML syntax.
Let's look at the AccountingMethod line as an example.
Note to self: Go to next slide.
[Slide #16]
AccountingMethod Example
Notes:
Begin the line with the less than sign.
Next is ElementType to identify what is to be defined.
Next is 'name'. This is the tag name from the hierarchy.
Content must be declared using the content="XXX" format.
dt:type="string" (or "boolean", etc.) is the format for the data type.
The ending of the line is />. The slash is used to denote an empty element,
if you are familiar with that from your XML class. This simply means that
no data will be stored on this line.
Content
Content identifies the content of the element. The choices that XDR allows
are empty, eltOnly, textOnly, and mixed.
'textOnly' means that the element contains data. Although 'textOnly' sounds
like it might only hold string data, this will hold any kind of data.
'eltOnly' means that this element can only contain other elements. An
example of this is the Name element within the Taxpayer element that we looked at early in the
presentation. Name contains the elements FirstName and LastName that actually contain the data.
Mixed is not used in this schema.
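The slide's example line is not reproduced in this text, so the declaration below is a reconstruction from the description in the notes, not the original. The content and data type chosen for AccountingMethod (textOnly and string) are assumptions, and the namespace declarations are repeated on the line only so that it can be parsed on its own.

```python
import xml.etree.ElementTree as ET

DT = "urn:schemas-microsoft-com:datatypes"

# Reconstructed from the notes: '<', ElementType, name, content, dt:type, '/>'.
# In a real schema file the xmlns declarations would sit on the Schema element.
LINE = (
    '<ElementType xmlns="urn:schemas-microsoft-com:xml-data" '
    f'xmlns:dt="{DT}" '
    'name="AccountingMethod" content="textOnly" dt:type="string"/>'
)

decl = ET.fromstring(LINE)
summary = {
    "name": decl.get("name"),
    "content": decl.get("content"),
    "type": decl.get(f"{{{DT}}}type"),
    # The trailing '/>' makes the declaration itself an empty element.
    "is_empty": len(decl) == 0 and decl.text is None,
}
print(summary)
```

Note that the declaration line is empty even though the element it declares is textOnly: the declaration describes the element and holds no data itself.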
[Slide #17]
Building the Tree in XML
The tree
Declaring elements that contain other elements
Example
Notes:
Note to self: Search on TaXML to find the place in the file.
The Tree
The tree in an XML schema really just represents the parent-child
relationship between elements.
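A minimal sketch of that parent-child relationship follows, using a cut-down version of the Taxpayer element. The fragment is an assumption for illustration and omits most of the real children; it also follows the ordering advice given earlier by declaring the text-holding elements before the elements that contain them.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:schemas-microsoft-com:xml-data"}

# Hypothetical fragment: leaf elements first, then their parents.
XDR = """
<Schema xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="FirstName" content="textOnly"/>
  <ElementType name="LastName" content="textOnly"/>
  <ElementType name="IDNumber" content="textOnly"/>
  <ElementType name="Name" content="eltOnly">
    <element type="FirstName"/>
    <element type="LastName"/>
  </ElementType>
  <ElementType name="Taxpayer" content="eltOnly">
    <element type="IDNumber"/>
    <element type="Name"/>
  </ElementType>
</Schema>
"""


def children_by_parent(schema_text):
    """Map each declared element name to the list of its child element names."""
    root = ET.fromstring(schema_text)
    return {
        decl.get("name"): [ref.get("type") for ref in decl.findall("x:element", NS)]
        for decl in root.findall("x:ElementType", NS)
    }


def outline(name, tree, depth=0):
    """Return the hierarchy below `name` as indented lines."""
    lines = ["  " * depth + name]
    for child in tree[name]:
        lines.extend(outline(child, tree, depth + 1))
    return lines


tree = children_by_parent(XDR)
hierarchy = outline("Taxpayer", tree)
print("\n".join(hierarchy))
```

The printed outline has the same shape as the hierarchy slides shown earlier, which is a useful cross-check between the paper hierarchy and the schema file.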
[Slide #18]
The XML Data File
Creating the data file in "real life"
Schema under control of IRS
XML data files produced by 3rd party software
XML data files created by taxpayer entry on IRS web site
Typed in for this presentation
Demo of the XML file
Notes:
For this presentation, we simply typed the data into a well-formed XML
file and validated it against the schema. All that means is that if we entered data that did not
match something in the schema, an error was displayed when we tried to view it in Internet Explorer.
In real life, the schema would be established by the IRS much the same as the
electronic filing record layouts and specifications are done currently. Third party
software would then produce valid XML data files in the same way that they produce well forme
In the near future, we hope to see XML data files produced by a taxpayer's
entries into the fields on an IRS web site.
Now let's take a look at an XML data file. I'm going to show you this file in
an XML editor instead of Internet Explorer just to give you another view of the world.
Whether you use this tool or IE, the validations done are the same.
I should mention that we have used EF PATS files for our samples, so these may look familiar!
[Slide #19]
Sample XML Data File
Identify the version
Include the schema to be used to validate this file
Data must be included between correctly named tags
Case sensitive
End tags
No overlap
Notes:
Version
The version is 1.0. To create a sample file of your own, just copy this line.
Schema
You must give the name (and path if it's in a different directory) of the schema.
Well-formed Rules
I'm sure you covered these rules in your class yesterday.
XML is case sensitive -- not a problem for those who grew up with UNIX.
You must have beginning and ending tags.
Unlike HTML, the tags may not overlap.
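The rules above can be tried out with a small check. This is only a sketch: the schema file name TaXML.xml, the x-schema: reference used to attach it, and the nesting of the sample elements are assumptions. Python's standard parser tests well-formedness only; it does not validate against the XDR schema the way Internet Explorer or an XML editor would.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Version line, schema reference (hypothetical file name), correctly paired tags.
GOOD = """<?xml version="1.0"?>
<TaXML xmlns="x-schema:TaXML.xml">
  <TaxYear>1999</TaxYear>
</TaXML>"""

BROKEN = {
    "case mismatch": "<TaxYear>1999</taxyear>",
    "missing end tag": "<TaXML><TaxYear>1999</TaXML>",
    "overlapping tags": "<Name><FirstName>Pat</Name></FirstName>",
}

results = {label: is_well_formed(text) for label, text in BROKEN.items()}
print("good file:", is_well_formed(GOOD))
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```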
[Slide #20]
Validation of Data
XML validates data against the schema and thus ensures a correctly formed file
As with our current electronic filing system, however, there will be a need for checking
content
There would need to be calculations done with the XML data after transmission of the file to
the IRS
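As a hedged illustration of the difference between a correctly formed file and correct content, the sketch below passes any structural check yet still needs arithmetic to confirm it. The element names InterestList, Interest, and Amount, the checkTotal attribute, and the figures are all invented for this example and are not part of the proposed schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical data: the list carries a total that the receiver can recompute.
DATA = """
<InterestList checkTotal="150.25">
  <Interest><Amount>100.00</Amount></Interest>
  <Interest><Amount>50.25</Amount></Interest>
</InterestList>
"""


def check_total_matches(data_text):
    """Recompute the sum of all Amount elements and compare it to checkTotal."""
    root = ET.fromstring(data_text)
    computed = sum((Decimal(a.text) for a in root.iter("Amount")), Decimal("0"))
    return computed == Decimal(root.get("checkTotal"))


matches = check_total_matches(DATA)
print("check total matches:", matches)
```

A schema can confirm that each Amount is a well-typed money field, but only a calculation like this one can confirm that the amounts add up.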
[Slide #21]
Displaying Data With XSL
XML data storage versus use of the data
XSL is a separate language
Very new so hard to find information
Uses XML syntax
XSL file
Notes:
XML Storage Vs Display
Perhaps the greatest advantage of using XML is the power to store
data separately from how you use the data. I'll show you a demo in
a minute that displays the same XML data file in two different formats.
XSL as a Language
Information is beginning to become available about XSL. My latest
check of the bookstores shows several good looking books due to be published this summer.
Since XSL uses the same syntax as XML, you don't have to learn another language.
[Slide #22]
Summary
Hierarchy
XML schema and XML data file
XSL
Questions?
Notes:
Hierarchy
We covered a possible data hierarchy that would reduce the amount of
data to be transmitted.
----------------------------
Prepared by Robin Cover for The XML Cover Pages archive.

