Afternoon Breakout Sessions, February 16, 2005 (2PHC)
"Key Challenges to Using Small Area Data Effectively" (2PC3)
Learning Phase Workshop on National Non-Profit and Commercial Sector Organizations (2PHD)
A note on the discussion format: For the final Learning Phase Workshop, the NICS moderator changed the format of the afternoon breakout sessions. Instead of three separate discussion groups for the afternoon (Systems Architecture, Participant Needs and Applications, and Governance and Finance), meeting participants went to one of three breakout sessions that each discussed the same topic: “Key challenges to accessing and using small area data effectively.” (2PC7)
The aim of each session was to suggest possible NICS tools and methods to overcome challenges regarding small area (i.e., neighborhood) data access and use. Examples of such tools and methods might include metadata standards, synthetic data sets, and larger sample sizes. The notes from all three sessions have been combined into one set of notes. (2PC8)
First, some definitions… (2PC9)
- Small area data is defined by the geography of the data: (2PCA)
- “Low level” data that is individually identifiable at the various sub-national, sub-state levels (2PCB)
- Standardization or good documentation should enable small area data to be aggregated up to the regional, state, or national level. (2PCG)
- One approach is to think of the most inclusive system: we need tools that allow us to get down to that level as well as to higher geographies. (2PCH)
- Not all the data you are able to get will be available at the lowest levels of geography (parcel level, tract level), but this is not an either/or proposition: we need data at all levels so that users can access data at the appropriate level. It may not be useful to start with one definition of what “small” geography is, because it will vary by topic; a tool that can access data at the appropriate level, given these limitations on the data, would be most useful. (2PCI)
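The roll-up described in these definitions works only when the nesting of geographies is documented. A minimal sketch in Python, assuming each tract nests wholly within one county and using hypothetical tract IDs, county IDs, and counts: counts are summed up the hierarchy, and rates are recomputed from the summed numerators and denominators rather than averaged.

```python
from collections import defaultdict

# Hypothetical tract-level records: (tract_id, numerator, denominator),
# e.g. households below a poverty threshold and total households.
# Values are illustrative only.
TRACTS = [
    ("T1", 120, 800),
    ("T2", 90, 300),
    ("T3", 200, 1000),
]

# Hypothetical crosswalk: each tract nests wholly inside one county.
TRACT_TO_COUNTY = {"T1": "C1", "T2": "C1", "T3": "C2"}


def roll_up(records, crosswalk):
    """Sum numerators and denominators per parent geography, then recompute rates."""
    totals = defaultdict(lambda: [0, 0])
    for tract_id, num, den in records:
        parent = crosswalk[tract_id]  # raises KeyError for a tract missing from the crosswalk
        totals[parent][0] += num
        totals[parent][1] += den
    return {
        parent: {"numerator": n, "denominator": d, "rate": n / d if d else None}
        for parent, (n, d) in totals.items()
    }


if __name__ == "__main__":
    for county, row in sorted(roll_up(TRACTS, TRACT_TO_COUNTY).items()):
        print(county, row)
```

For county C1, the pooled rate is 210/1100, about 0.19; averaging the two tract rates (0.15 and 0.30) would give 0.225 and overweight the smaller tract. Tracts that straddle a parent boundary would need an allocation step instead of a simple crosswalk.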
CHALLENGES and OPPORTUNITIES (2PCJ)
- Metadata and Standards (2PCK)
- Comparability of data and disclosure across different types of data and methodologies (2PCL)
- What standards and how can we set them for data? (2PCM)
- Defining a "small area"? (2PCN)
- What is the right level of geography to answer my question? (2PCO)
- Defining "quality" which is relative? (2PCP)
- Metadata with a Social Infrastructure. We lack the social treaties or social conventions (metadata, standards, notion system agreement) that would make the data easy to tap. We need to ensure that the context and meaning travel with the data when it is shared or mined. (2PCR)
- Legislation (for social issues around data) that allows us to guarantee that people who are involved in these large data pools, including participants in NICS, don’t get hurt. (2PCS)
- Descriptors of the lowest common denominator (e.g., income definitions): there are least common denominators across each definition, and it would be useful for NICS to come up with this basic metadata. (2PCT)
- Agreement on land use schemes: standard definitions for the data collected (2PCV)
- We should embrace inconsistencies in data. No one is going to be able to decide or agree upon a single standard for metadata and data, so we shouldn’t wait for this to happen before making data that is good enough for decision-making. Think of modeling this community to enable pure, unmediated democratic exchanges, the way eBay or Amazon designed their peer rating systems. Such a system allows bottom-up or top-down exchanges that build the best data, and comments about the data, for decision-making. (2PHQ)
- A set of standards could be a set of blinders if we are not careful (2PCW)
- Do you use metadata or standards or some other approach to ensure geographic comparability? (2PCX)
- Small grassroots organizations need support in creating metadata; otherwise metadata projects take time away from moving forward with the real analytical work (2PCY)
- (Lack of/Need for) Geographic Metadata/Standards (2PD0)
- Using data with standard geographies in user-defined geographies. Reconciling different geographies, and access to data at specific (and low-level) geographies. (2PD1)
- Standardizing a base map at the point level. Inventory of point and parcel data. (2PD2)
- Dealing with changes in geographic boundaries (2PD3)
- Outreach: early intervention to raise awareness among those who aren’t as savvy with data collection issues. The earlier NICS reaches people who are just starting to collect data, the sooner more quality data will be generally available. (2PD4)
- Spatial and temporal mismatch of data, geographic consistency. Lack of co-terminality of different geographies (police districts, school districts, neighborhoods) – need to be reconciled for use. (2PD8)
- There are real complications related to various geographic boundaries (e.g., county boundaries do not correspond to congressional districts). (2PD9)
- Self-defined geographic boundaries also require reconciling data coded in various geographies (ZCTAs, census blocks, political jurisdictions, etc.). This is a complicated issue when you want to allow user-defined geographic boundaries and still ensure statistical significance. (2PDA)
- Organizational and Governance Challenges (2PDB)
- Adopting a system to capture changes (2PDC)
- Who is the audience? Who is the design for? (2PDD)
- How to convince people to share the data? (2PDE)
- Privacy concerns regarding data. (2PDF)
- Sorting through the inherent contradictions in the goals of organizations (2PDG)
- How to retain the grassroots effort of NICS? How to design a flexible framework? (2PDH)
- How can we encourage collaboration? (2PDI)
- Comprehensible user interfaces to find what you need and use what you find (2PDJ)
- Privacy and Confidentiality (2PDO)
- Assembly of data with privacy and confidentiality considerations. Trying to put together data in a meaningful way that also protects privacy and confidentiality. The data exists, but we need to make it usable and easier to integrate. (2PDP)
- CIPS related to sharing data reinforcing confidentiality protections (2PDQ)
- Insufficient/Differing Levels of Statistical Literacy (2PDR)
- Users are not just looking for data but answers (2PDS)
- Use of data in decision-making. Data is available, but we need to help public policy makers (particularly local level officials) use the data appropriately (for their purposes, and guard against misuse of the data) (2PDT)
- Separating description from prescription. (2PDU)
- Using appropriate geographic boundaries: ZIP code data are less useful because ZIP codes change frequently over time, but this is what many people use. Boundaries need to be boundaries of consequence so that people are able to act on them; they need to match units of government. (2PDV)
- If you want to engage the public in data, how do you engage them, teach them to use the data, and help them understand it? (2PDW)
- How to help people not make stupid errors? (2PDX)
- Data/Information Access (2PDY)
- Not able to get the detailed geography needed for analysis. (2PDZ)
- Cannot answer some basic questions, e.g., what is the average rent? (2PE0)
- How to engage identified organizations that hold enterprise data which would support further analysis but which they do not sell (2PE1)
- Data acquisition of administrative data sets, etc. (2PE2)
- Can we provide a data structure that adds value, with rules for rolling data up, so that communities can use it intelligently? (Repurposing the administrative data) (2PE3)
- Private Sector Data, for example Marketing Data: is it possible to make these data available? Marketing data provide psychographic profile data that are very specific to neighborhoods. (2PE5)
- Can I get a tool to get below the county level? (2PE9)
- Data are time-consuming to collect, so the expectations of the community are high. A major source of data now is transaction data. How do you use existing data? (2PEA)
- Meeting demand – current tools and data offerings don’t meet demand (2PEB)
- What is the real demand for data about their local areas? (2PEC)
- NICS users are data intermediaries: (1) churches, (2) community groups, (3) researchers. Data intermediaries know the demand but can't get the data. (2PEH)
- What kinds of decisions are we looking to influence or help people make? (2PIV)
- Do we want to help them make better decisions on the ground than they make now? If people are just looking for more appropriate formats of current data, an easier approach would be to convince Claritas to provide better products (2PEI)
- Ensuring that the products being developed are influencing decisions (2PEJ)
- Data Quality (2PEK)
- Trust and provenance of local data. Assessing confidence in the data collected at the local level: accuracy, trustworthiness, usability of data with others. (2PEL)
- Tools that help individuals make decisions around quality of data (2PEM)
- Collection sustainability: keeping the information fresh and real-time (2PEN)
- Data integration needs to ensure statistical quality. For example, current tools split geographies and propagate duplicated data. Statistical tools that manipulate data should be designed to carry statistical significance through all transformations. (2PEO)
- How do you maintain the currency of data? (2PEP)
- The key is not to wait for perfect data, but to ask how you communicate information about the data. (2PEQ)
- Are we overly concerned about the quality or interpretation of data? For the vast majority of the users that we are talking about, if we give users data with some documentation this will be a great help. (2PER)
- We need to elucidate “what's in it for me” at the local level (2PET)
- How will NICS be available and supported by the lowest level of users? What is the structure of support of the user groups? (2PEU)
- There is more that we could do for small areas with the existing data, but we can't, primarily because of staffing and funding concerns (2PEV)
- FedStats, for example, had to decide to go only down to the county level because the data get flaky below the county level. We need to work with individuals at the local level to get data. (2PEX)
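The point above about propagating statistical significance through transformations can be made concrete with margins of error (MOEs). A minimal sketch, assuming inputs published as 90 percent MOEs in the style of the American Community Survey and using the Census Bureau's published approximation formulas (root sum of squares for a sum of estimates, and the standard formula for a proportion whose numerator is a subset of its denominator). These approximations ignore covariance between component estimates, and the function names and values are hypothetical.

```python
import math

Z_90 = 1.645  # z-value for a 90 percent confidence level


def moe_of_sum(moes):
    """Approximate MOE of a sum of estimates: square root of the sum of squared MOEs."""
    return math.sqrt(sum(m * m for m in moes))


def moe_of_proportion(num, num_moe, den, den_moe):
    """Approximate MOE of p = num / den, where num is a subset of den.

    If the term under the square root is negative, fall back to the ratio
    formula, which adds the two terms instead of subtracting them.
    """
    p = num / den
    under = num_moe ** 2 - p ** 2 * den_moe ** 2
    if under < 0:
        under = num_moe ** 2 + p ** 2 * den_moe ** 2
    return math.sqrt(under) / den


def significantly_different(est1, moe1, est2, moe2, z=Z_90):
    """Test two estimates whose MOEs were computed at the confidence level matching z."""
    se1, se2 = moe1 / z, moe2 / z  # convert MOEs back to standard errors
    return abs(est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2) > z
```

For example, two neighborhood estimates of 1,200 ± 150 and 800 ± 120 combine to 2,000 ± about 192. A tool that splits or combines geographies without carrying the MOE along loses exactly this information.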
SOLUTIONS: Tools and Methods to Overcome Challenges (2PEY)
- We want to go towards “open architecture” instead of “open standards” or necessarily “open source” (2PEZ)
- NICS needs to have a serious support capability (2PF0)
- Elucidate what our goal is, what the purpose of the data is, and how it can affect decision-making, quality of life, etc. E.g., child care: who is using services, and who needs them? In formulating this, we should ask the question: "What arguments would a city council person need to make to change child care policy, for example?" In this example, you need to know what the target population is, and to determine what percent of those in need the community currently serves. Then you need to consider the supply issues: what is the quality of care, and what percent of those who need it get good quality services? As we think through specific problems, we'll figure out what this system needs in order to demonstrate successes early on. (2PIW)
- Understand data within an analytical framework to make the data tell the story (a mediated discussion) (2PF2)
- Indicators that have a numerator and a denominator (2PF3)
- Solving Confidentiality and Privacy Concerns and Issues (2PF5)
- Criminal penalties for misuse of data? (2PF6)
- Documentation (2PF7)
- Spatial and temporal catalog or indexing of available data, its collection and use. (2PF8)
- Data Standards/Metadata Standards/Documentation (2PF9)
- Basic elements of Metadata standards: (2PFA)
- OMB as a clearing house to define metadata and statistical data (2PFG)
- Develop a training guide to train users (white to black belt NICS user) (2PFH)
- Develop a top ten list of mistakes within the context of the framework. (2PFI)
- Funding a community statistics standards office, tying funding programs to those which subscribe to the centralized organizations. (2PFJ)
- Where there are multiple accepted definitions (e.g., income), or various results for different indicators of a particular effect (unemployment/employment rate), a NICS-powered portal could display all versions, explaining the differences. For example, if there are three different figures for the homeownership rate, the system could pull up all the numbers and explain the differences. (2PFL)
- Data come from a lot of different places so there should be documentation or resources available about the data. (2PFM)
- Supporting Participants, all data users, NICS users (2PFN)
- Develop Communities of Practice for communities, local users. (2PFO)
- Educate communities on the importance of sharing data, and being statistically literate, and on data reference (2PFP)
- Encourage collaboration by tying resource allocation to participation (2PFQ)
- Making Participants/Users Aware of Best Statistical and Decision-Making Practices (2PFR)
- Tyranny of tools: best practice examples so that the tools don’t define the problem being studied. People should be made aware (through guides, documentation, training) of the uses and limits of tools. Currently, tools guide collection: some people at the local level won’t collect data because the current tools for data use do not help a user at that geography, but they don’t realize that their data are extremely important for users at higher-level geographies. Data should be collected according to statistical principles, or more generalized guidelines, rather than according to what current tools can and cannot do. (2PFS)
- Imagine a future where there are organizations that support the data, with funding distributed to all the organizations that helped collect the data. (2PFT)
- Improving Data Quality and Use of Data (2PFU)
- Incentives to improve data. It’s a resources issue: private industry has overtaken the public sector’s drive to do a lot of these initiatives, because when companies create tools or combine data, they create a commercialized product (and a sustainable business model) in the process. Some companies struggle with this because they can’t find the business model or the market for a new tool or a new use of information. (2PFV)
- Develop a peer rating system within NICS, like an Amazon rating system or a blog format, as an interim solution while we develop this piece further. (2PFW)
- Demonstration of good data, and of the consequences of bad data, to sell users and data producers on what good data can do and to educate them on what it looks like (standard formats, good documentation, metadata, notes on limitations and use) (2PFZ)
- Identify data gaps (2PG1)
- Using allocation methodologies as a way to propagate data to user-defined geographies. (2PG2)
- Technology can create and drive demand: bring data in a basic format to verify other data. EDGAR is a good example of how difficult it is to bring this type of data online. If you make data available, tools will come. (2PG3)
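The notes do not name a specific allocation methodology; areal weighting, one of the simplest, is used here purely as an illustration. A minimal sketch, assuming the intersection areas between source zones (e.g., tracts) and user-defined target zones have already been computed by a GIS overlay (not shown), and assuming the counted population is spread evenly within each source zone; the zone IDs and values are hypothetical.

```python
from collections import defaultdict


def areal_weighted_allocation(source_counts, source_areas, overlaps):
    """Allocate counts from source zones to target zones in proportion to shared area.

    source_counts: {source_id: count in the source zone}
    source_areas:  {source_id: total area of the source zone}
    overlaps:      {(source_id, target_id): area of the intersection}
    """
    allocated = defaultdict(float)
    for (src, tgt), overlap_area in overlaps.items():
        share = overlap_area / source_areas[src]  # fraction of the source zone inside the target
        allocated[tgt] += source_counts[src] * share
    return dict(allocated)


# Hypothetical example: two tracts, two user-defined neighborhoods.
counts = {"A": 1000, "B": 600}
areas = {"A": 4.0, "B": 2.0}
overlaps = {("A", "N1"): 3.0, ("B", "N1"): 1.0, ("A", "N2"): 1.0, ("B", "N2"): 1.0}

if __name__ == "__main__":
    print(areal_weighted_allocation(counts, areas, overlaps))
```

Here N1 receives 750 + 300 = 1,050 and N2 receives 250 + 300 = 550, so the 1,600 total is preserved when the target zones exactly cover the source zones. The even-spread assumption is the main weakness; weighting by housing units or address points instead of land area (dasymetric allocation) can give better estimates, and allocated figures should carry a note that they are modeled rather than observed.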
GENERAL DISCUSSION (2PG4)
- Roles (2PG5)
- The vision for NICS is abstract; should we determine the things that NICS is not going to do? (2PG6)
- NICS would: (2PG7)
- GASB might cause smaller municipalities to think differently about their payoff in lifetime account. (2PGA)
- Is PART a similar opportunity that would persuade organizations to start paying attention to metadata, the need for it, and the need for metadata standards? (2PGB)
- SWOT analysis of how programs are affecting infrastructure (2PGC)
- Need to do a market analysis to determine who the users are and what they need. (2PGD)
- Create 465 centers at the congressional district level and education projects (2PGE)
- Data seems like a bad word in DC at this time (2PGF)
- Need some early success stories to establish the “best practice,” creating competition against that benchmark to acquire funding (2PGG)
- Obtaining and Supporting “NICS-Ready” data (2PGH)
- System Design, Functionality, Meeting User Needs (2PGM)
- What does this system look like - what are some of the delivery systems that are effective in addressing small area issues? (2PGN)
- Should there be a focus on states that have very good state data systems? Homeland security and justice programs have driven a lot of the data to the state level; we could look at these systems as models for NICS. (2PGP)
- Uniformity in system design, database architecture, and user interface (2PGQ)
- Users need the flexibility to answer the questions they are interested in: a system that gives them a number of results and allows them to refine their search in NICS as they go. (2PGS)
- Think about scalability - need to demonstrate the benefits of a system like this. (2PGU)
- Could be useful to report back on what types of data are available, what are the consistencies in the data - this was mentioned in the last session as a way of identifying low hanging fruit. (2PGV)
- How do we develop a product at any scale that has viability in the private market? (2PGX)
- Recognition of low-income Asian American communities (2PGZ)
- Best-practices (on data collection side, multidisciplinary) (2PH1)
NEXT STEPS (2PH2)
- Developing a use case that establishes best practices. (2PH3)
- Need to demonstrate NICS ready data (2PH4)
- Need to demonstrate rewards and gaps in the data. (2PH5)
- Open technology - developing a standard by which technology standards over time. (2PH6)
- Peer rating process? Kind of like a marketplace: do we trust the original source, as on Amazon? How do you develop and identify those things? (2PH7)
- Legislation to balance the need for privacy and confidentiality. What would victory be? Being able to point to a set of data that are NICS ready: data that would be able to flow in and out. (2PHS)
- Ability to take NICS data and expand on local datasets, with attributes associated with the data. (2PHT)
- Use cases to show who is NICS ready, to demonstrate win-wins from data sharing. (2PH9)
- Case studies will populate a user's guide and documentation about NICS (2PHA)
- Goal for next year is a set of data that is NICS ready that we can point to as we move ahead. (2PHB)