TaXML Presentation
From: http://www.taxadmin.org/fta/meet/2000tech/andrson/
"TaXML Presentation." By Michael Sanzi and Lesley Anderson. September 12, 2000.
Also in: http://www.taxadmin.org/fta/meet/2000tech/andrson/anderson.ppt

[Slide #1] TaXML Presentation
Lesley Anderson

[Slide #2] Introduction
  "Straw man" XML-based schema
  The schema authors

Notes:

Straw man
This schema is presented as a straw man -- it is a starting point for working with XML. The people working on this were all tax professionals with many years of experience. Even though this has been carefully reviewed, there may be inconsistencies in the examples. This proposal may not exactly meet the requirements of the agencies that would use it, simply because there has not been sufficient (any!) communication between developers and customers.

The Dev Team
  Tax professionals with many years of experience
  Worked for 10 years developing commercial tax software for professional use
  Worked at MS for the last 2 years developing the TaxSaver product
  XML experience provided by programmers at MS

[Slide #3] Agenda
  Developing a hierarchy
  Creating the schema
  Creating the XML data file
  Validating the XML data file with the schema
  Displaying data using XSL

[Slide #4] Developing a Hierarchy
  Tax forms and electronic filing
  Included fields that need data entered
  Included data only once
  Exceptions: key fields and placeholders

Notes:

Tax forms and electronic filing
  Analyzed forms and flow, and reviewed the electronic filing specs
  Looked for ways to logically group data, rather than just reiterate the forms
  Looked for ways to economize the amount of data by avoiding duplication
  Used descriptive tag names

Included entry fields
  Entered fields on forms are included in the hierarchy
  Subtotals and non-significant computed fields are not included
  Included fields that would be computed on a worksheet or a form not supported for this version. These are placeholders for a group of fields that would be entered later.

Included data only once
If data appeared in more than one place, we included it only at its source entry point. For example, advance EIC is entered on the W-2 and then carried to the 1040. In our model, the field only shows on the W-2.

[Slide #5] Forms Supported
  Form 1040
    Schedule A, Schedule B, Schedule C, Schedule E (pg.1), Schedule EIC, Schedule F, Schedule H, Schedule J, Schedule R, Schedule SE
  Form 1040A
    Schedule 1, Schedule 2, Schedule 3
  Form 1040EZ
  Form 2210-F, Form 2441, Form 4255, Form 4562, Form 4797, Form 4835, Form 8606, Form 8615, Form 8812, Form 8815, Form 8828, Form 8829, Form 8839, Form 8863, Form 9465
  Form W-2
  Form 1099-INT, DIV, & MISC
  Form 1099-R

Notes:
It is somewhat misleading to speak in terms of forms. It is more accurate to say that the fields with entered data, as contrasted with computed data, are included in the schema.

[Slide #6] At the Top of the Hierarchy
  TaXML Authentication Identification KeyID TaxYear Version Major Minor IndividualTax CorporateTax W-2 & W-3

Notes:

TaXML
XML documents must have a single root element. In this case, it's TaXML.

There would, of course, need to be elements for authentication, signon, and versioning. This is not addressed in this hierarchy except for a couple of placeholders. As security and transmission needs are determined, this part of the schema will be filled in.

Individual Tax is the top level for both federal and state income tax for individuals.

I'm showing Corporate Tax as an example of the level at which another entity would appear in the hierarchy. This could be included as an actual part of the schema, by including it through a namespace or a data island. Also, this type of schema could be used to transmit information from employers and financial institutions to the IRS.
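
As a rough illustration of the top of the hierarchy, the sketch below builds a skeleton document in Python using the standard xml.etree.ElementTree module. The element names come from the slide; the nesting (KeyID, TaxYear, and Version under Identification, with Major and Minor under Version) is an assumption, since the slide text does not preserve indentation, and the tax year and version values are made up for the example. CorporateTax and the W-2 & W-3 entries are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_skeleton(tax_year):
    """Build a bare TaXML tree; the nesting here is assumed, not confirmed."""
    root = ET.Element("TaXML")  # the single root element
    ET.SubElement(root, "Authentication")  # placeholder, as in the talk
    ident = ET.SubElement(root, "Identification")
    ET.SubElement(ident, "KeyID")  # left empty: its content is not described
    ET.SubElement(ident, "TaxYear").text = tax_year
    version = ET.SubElement(ident, "Version")
    ET.SubElement(version, "Major").text = "1"  # illustrative value
    ET.SubElement(version, "Minor").text = "0"  # illustrative value
    ET.SubElement(root, "IndividualTax")  # federal and state individual data
    return root


skeleton = build_skeleton("1999")  # illustrative tax year
print(ET.tostring(skeleton, encoding="unicode"))
```

Another entity, such as CorporateTax, would be added as one more child of the root in the same way.
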

[Slide #7] The Taxpayer Element
  Taxpayer
    IDNumber
    Name
      FirstName
      MiddleInitial
      LastName
      Suffix
      CompleteName
      NameControl
    Age65OrOlder
    Blind
    MilitaryIndicator
    HomePhone
    WorkPhone
    PresidentFund
    Exemption

Notes:

The Taxpayer Element
The ID number is the Social Security number. We have used the more general term IDNumber to allow this tag to be used for either SSN or EIN.

The taxpayer's name is made up of the 6 pieces shown. We gather the first name, middle initial, last name, and suffix separately to make name control and matching easier. The name is then displayed in full in the CompleteName field. The NameControl represents the same content that is currently used in electronic filing. Whether this would still be needed in this new methodology is unclear.

Then we have pulled together other information that is specifically related to the taxpayer. A couple of other fields in this section were left off this slide due to space constraints.

The spouse has an identical element.

[Slide #8] Address
  Address
    Street
    Street2
    ApartmentSuite
    City
    State
    ZipCode
    NewAddress

Notes:

Address
The hierarchy was developed looking at federal and California. If California needed a piece of information that would be relevant to federal or another state, that field was added to the federal portion of the hierarchy. An example of this is Street2, a second line for the street address.

The NewAddress tag is a boolean (Yes/No) field to indicate if this is a new address.

[Slide #9] FilingStatusInformation
  FilingStatusInformation FilingStatus MFS Name IDNumber DidNotLiveWithSpouse HeadofHousehold Name IDNumber QWYearSpouseDied MustItemizeIndicator

Notes:

FilingStatusInformation
FilingStatusInformation includes the filing status used in the return. The MFS element was created to group the information needed for the spouse when the status is married filing separately. Under the MFS element is the Name element. This is bolded to show that this element has been previously used.
This simplifies the hierarchy because the components do not need to be listed again. The Head of Household information is for the qualifying dependent.

[Slide #10] DependentList
  DependentList
    Dependent
      Name
      IDNumber
      Relationship
      QualifyforTaxCredit
      QualifiedCareExpense
      YearofBirth
      Student
      Disabled
      NumberOfMonths
      PYChildCareIndicator

Notes:

DependentList
Here is the first example of dealing with an element that may occur more than once, or in English, that may be present in the XML data file more than once. Throughout this proposed schema, we have used the word "List" to indicate that more than one of the element that follows the list is allowed.

In short, here is what this hierarchy means:
  One DependentList is allowed.
  Within each DependentList, there can be an unlimited number of Dependent elements.
  Within each Dependent element, there can be only one of each of the items listed.

When we move to examine the schema you will be able to see that this is accomplished using the 'minOccurs' and 'maxOccurs' attributes.

[Slide #11] Digging Into the Hierarchy
  Wages
    Demonstrates adding levels to the hierarchy
    Shows how state data can be gathered
  ActivityList
    Combining business, rental, farm, and farm rentals
    Depreciation
  California
    Integrating state into the mix

Notes:

Digging Into the Hierarchy
We're going to be using a paper handout for this part of the presentation because of space constraints on the screen. We'll take a look at the Wages, ActivityList, and how California is included in the hierarchy.

[Slide #12] Creating the TaXML Schema
  XDR rather than DTD
  Working in XML
    Using a browser
    Using an XML editor
  Declaring the namespace

Notes:

XDR vs DTD
There are several languages for writing schemas; XML Document Type Definitions (DTD) is the only standard one and the most widely used today. Microsoft's XML tools use a proprietary schema language called XML Data Reduced (XDR), which makes a number of improvements over DTD.
For example, XDR schemas can specify data types (e.g., decimal number, date) and allowable values of elements, enabling p...

Microsoft says it will eventually replace XDR with the XML Schema language, currently under development by the W3C with input from Microsoft and other companies. Microsoft also says it will provide automated tools for translating XDR schemas to XML Schema.

We are using XDR in this schema.

Working in XML
Now we're going to look at XML. Up until now, we've looked at the analysis and organization of the material, without actually being in an XML file or schema.

I'm opening the schema in Internet Explorer. So technically what you're looking at is an HTML representation of the schema, rather than pure XML. HTML adds the colors and allows you to expand or collapse parts of the schema to make it easier to view.

Declaring the namespace
The first line in the file is the declaration of the namespace. We're using the Microsoft versions of XML for both the schema and the datatypes.

[Slide #13] TaXML AttributeType
  AttributeType
    tsj
    state
    keyfield
  Format

Notes:

AttributeTypes
The attribute feature allows you to add additional information to an element. This can be used to validate, sort, and display data.

TSJ
The 'tsj' attribute is probably one you can figure out with little help. Does anyone know what this is? By adding the 'tsj' attribute, we set the stage to be able to identify which income belongs to the taxpayer, the spouse, or to them as a couple. This can be used to determine the optimal filing status, and is especially critical to many of the states.

State
Sourcing income to a particular state or city is integral to the preparation of correct state returns.

Keyfield
The keyfield attribute might be better named something like check total.

[Slide #14] TaXML Data Types
  Data types
    fixed.14.4
    float
    boolean
    date
    int
    string

Notes:

TaXML Data Types
Fixed.14.4: Fixed.14.4 is the type used for money fields.
Float: Used for percentages.
Boolean: Used for Yes/No or True/False situations.
Date: Used for dates as long as a word such as "various" is not permitted. If it is, then the string type must be used.
Int: Used for numeric fields that are not currency.
String: Used for anything else! Text is obvious, but also for IDNumbers, calendar years, etc., where you want to be able to control format.

[Slide #15] ElementType Declarations
  Declaring the elements
  Order
  Format
    Beginning '<'
    ElementType
    name
    content
    dt:type
    Ending '/>'

Notes:

Declaring the Elements
The methodology we used to generate this schema was to set up a database. This gave us a dependable, stable data source that was easily maintainable. A program written in C++ then creates the schema.

Order
Each tag name that we use in the hierarchy must be declared in the schema using the 'ElementType' statement. The ElementType declaration is best done for an item before it is included as a child of another element. Another way to view this is that the elements that contain text, rather than other elements, are handled first.

Although the first half of the schema is in alphabetical order, this is only because the programmer chose to alphabetize the elements that have no child elements. The order in which a schema is written is not hard and fast. The language is still growing.

Format
Each line begins with a less than sign. XDR schemas are XML files and must conform to the required XML syntax. Let's look at the AccountingMethod line as an example.

Note to self: Go to next slide.

[Slide #16] AccountingMethod Example

Notes:
  Begin the line with the less than sign.
  Next is ElementType to identify what is to be defined.
  Next is 'name'. This is the tag name from the hierarchy.
  Content must be declared using the content="XXX" format.
  The data type is declared using the dt:type="string", dt:type="boolean", etc. format.
  The ending of the line is '/>'. The slash is used to denote an empty element, if you are familiar with that from your XML class.
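
The slide's example line is not reproduced in this transcript, so the following is only a sketch of what such a declaration might look like, parsed with Python's standard xml.etree.ElementTree module. The two namespace URIs are the ones Microsoft's XDR uses for schemas and datatypes; the choice of textOnly content and the string data type for AccountingMethod is an assumption made for illustration, as is the helper name describe_declarations.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by Microsoft's XDR schema language.
XDR_NS = "urn:schemas-microsoft-com:xml-data"
DT_NS = "urn:schemas-microsoft-com:datatypes"

# Hypothetical fragment: only the name AccountingMethod comes from the talk;
# the textOnly/string choices are assumptions.
XDR_FRAGMENT = f"""
<Schema name="TaXML" xmlns="{XDR_NS}" xmlns:dt="{DT_NS}">
  <ElementType name="AccountingMethod" content="textOnly" dt:type="string"/>
</Schema>
""".strip()


def describe_declarations(xdr_text):
    """Return (name, content, data type) for each ElementType declaration."""
    schema = ET.fromstring(xdr_text)
    return [
        (decl.get("name"), decl.get("content"), decl.get(f"{{{DT_NS}}}type"))
        for decl in schema.findall(f"{{{XDR_NS}}}ElementType")
    ]


declarations = describe_declarations(XDR_FRAGMENT)
print(declarations)
```

Note that the declaration in this sketch closes with '/>', so the ElementType line itself is an empty element.
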
An empty element simply means that no data will be stored on this line.

Content
Content identifies the content of the element. The choices that XDR allows include eltOnly, textOnly, and mixed.

'textOnly' means that the element contains data. Although 'textOnly' sounds like it might only hold string data, this will hold any kind of data.

'eltOnly' means that this element can only contain other elements. An example of this is the Name element within the Taxpayer element that we looked at early in the presentation. Name contains the elements FirstName and LastName that actually contain the...

Mixed is not used in this schema.

[Slide #17] Building the Tree in XML
  The tree
  Declaring elements that contain other elements
  Example

Notes:
Note to self: Search on TaXML to find the place in the file.

The Tree
The tree in an XML schema really just represents the parent-child relationship between elements.

[Slide #18] The XML Data File
  Creating the data file in "real life"
    Schema under control of IRS
    XML data files produced by 3rd party software
    XML data files created by taxpayer entry on IRS web site
  Typed in for this presentation
  Demo of the XML file

Notes:
For this presentation, we simply typed the data into a well-formed XML file and validated it against the schema. All that means is that if we entered data that did not match something in the schema, an error was displayed when we tried to view it in Inter...

In real life, the schema would be established by the IRS much the same as the electronic filing record layouts and specifications are done currently. Third party software would then produce valid XML data files in the same way that they produce well forme...

In the near future, we hope to see XML data files produced by a taxpayer's entries into the fields on an IRS web site.

Now let's take a look at an XML data file. I'm going to show you this file in an XML editor instead of Internet Explorer just to give you another view of the world.
Whether you use this tool or IE, the validations done are the same. I should mention that we have used EF PATS files for our samples, so these may look familiar!

[Slide #19] Sample XML Data File
  Identify the version
  Include the schema to be used to validate this file
  Data must be included between correctly named tags
    Case sensitive
    End tags
    No overlap

Notes:

Version
The version is 1.0. To create a sample file of your own, just copy this line.

Schema
You must give the name (and path if it's in a different directory) of the schema.

Well-formed Rules
I'm sure you covered these rules in your class yesterday.
  XML is case sensitive -- not a problem for those who grew up with UNIX.
  You must have beginning and ending tags.
  Unlike HTML, the tags may not overlap.

[Slide #20] Validation of Data
  XML validates data against the schema and thus ensures a correctly formed file
  As with our current electronic filing system, however, there will be a need for checking content
  There would need to be calculations done with the XML data after transmission of the file to the IRS

[Slide #21] Displaying Data With XSL
  XML data storage versus use of the data
  XSL is a separate language
    Very new so hard to find information
    Uses XML syntax
  XSL file

Notes:

XML Storage Vs Display
Perhaps the greatest advantage of using XML is the power to store data separately from how you use the data. I'll show you a demo in a minute that displays the same XML data file in two different formats.

XSL as a Language
Information is beginning to become available about XSL. My latest check of the bookstores shows several good looking books due to be published this summer. Since XSL uses the same syntax as XML, you don't have to learn another language.

[Slide #22] Summary
  Hierarchy
  XML schema and XML data file
  XSL
  Questions?

Notes:

Hierarchy
We covered a possible data hierarchy that would reduce the amount of data to be transmitted.

----------------------------
Prepared by Robin Cover for The XML Cover Pages archive.