Transcript of Bergen 2010
Experiences from real estate and labor market research
Maciej Piotrowski
Institute of Economics
University of Information Technology in Rzeszów, Poland
http://www.wsiz.pl
[Figure: 20 threads, 5 threads, 1 thread]
UITM
Web-based data collection: sample approaches
http://www.msnbc.msn.com/id/38463013/ns/technology_and_science-security/
http://www.techeye.net/internet/apple-bbc-ibm-intel-and-more-downloaded-facebook-user-torrent
http://www.wolframalpha.com
But:
a limited set of linked data sources
no flexibility to create one's own databases
no way to browse through pages automatically
no option to automate incremental storage of data at specified time intervals
Efficiency
Other factors:
our network connection
the target website's network connection
number of variables
complexity of the regular expressions used for the variables
DoS attack?!
The need
Our solution
Challenges
Business application
Research opportunities
Thank you for your attention! Questions?
Maciej Piotrowski
Institute of Economics
University of Information Technology in Rzeszów, Poland
Complex quantitative analysis of the real estate market in Poland:
price levels by property type, location and other parameters
analysis of dynamics (monthly)
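A minimal C# sketch of the monthly dynamics analysis, assuming each collected listing has been reduced to a month, an asking price and a floor area. The tuple layout, the sample values and the function names are hypothetical, not taken from the project's database. It averages the price per m² by month and reports the relative change against the previous month; it targets a current .NET SDK (top-level statements), not .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample: (month as "yyyy-MM", asking price in PLN, floor area in m²).
var listings = new List<(string Month, double Price, double Area)>
{
    ("2010-04", 300_000, 60), ("2010-04", 250_000, 50),
    ("2010-05", 315_000, 60), ("2010-05", 262_500, 50),
};

foreach (var (month, mean, change) in MonthOverMonth(MonthlyMeanPricePerM2(listings)))
{
    string changeText = change is double c ? c.ToString("P1") : "n/a";
    Console.WriteLine($"{month}: {mean:F0} PLN/m², change: {changeText}");
}

// Mean price per m² for each month, in chronological order ("yyyy-MM" sorts correctly as text).
static List<(string Month, double MeanPricePerM2)> MonthlyMeanPricePerM2(
    IEnumerable<(string Month, double Price, double Area)> rows) =>
    rows.Where(r => r.Area > 0) // skip records with a missing or zero floor area
        .GroupBy(r => r.Month)
        .OrderBy(g => g.Key, StringComparer.Ordinal)
        .Select(g => (Month: g.Key, MeanPricePerM2: g.Average(r => r.Price / r.Area)))
        .ToList();

// Relative change against the previous month (null for the first month).
static List<(string Month, double MeanPricePerM2, double? Change)> MonthOverMonth(
    List<(string Month, double MeanPricePerM2)> means) =>
    means.Select((m, i) => (
            Month: m.Month,
            MeanPricePerM2: m.MeanPricePerM2,
            Change: i == 0 ? (double?)null : m.MeanPricePerM2 / means[i - 1].MeanPricePerM2 - 1.0))
        .ToList();
```

Averaging price per m² rather than raw prices keeps months comparable when the mix of flat sizes changes; with roughly a million records per month, the same grouping would more likely run as a SQL `GROUP BY` in the database.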
Complex quantitative analysis of the labor market in Poland:
availability of jobs by region
analysis of structure and dynamics
skills and competences sought by employers
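A minimal C# sketch of the regional structure and skills analysis, assuming each job offer is stored with its region and a list of required skills. The record shape, the sample offers and the function names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample of collected job offers: region (voivodeship) and required skills.
var sampleOffers = new List<(string Region, string[] Skills)>
{
    ("podkarpackie", new[] { "C#", "SQL" }),
    ("podkarpackie", new[] { "English", "SQL" }),
    ("mazowieckie", new[] { "SQL", "Excel" }),
    ("małopolskie", new[] { "English" }),
};

foreach (var (region, count, share) in RegionalShares(sampleOffers))
    Console.WriteLine($"{region}: {count} offers ({share:P0})");
foreach (var (skill, count) in TopSkills(sampleOffers, 3))
    Console.WriteLine($"{skill}: {count}");

// Number of offers per region and each region's share of all offers, largest first.
static List<(string Region, int Count, double Share)> RegionalShares(
    IReadOnlyCollection<(string Region, string[] Skills)> offers) =>
    offers.GroupBy(o => o.Region)
          .Select(g => (Region: g.Key, Count: g.Count(), Share: (double)g.Count() / offers.Count))
          .OrderByDescending(r => r.Count)
          .ThenBy(r => r.Region, StringComparer.Ordinal)
          .ToList();

// Most frequently requested skills; each skill counts once per offer, ignoring case.
static List<(string Skill, int Count)> TopSkills(
    IEnumerable<(string Region, string[] Skills)> offers, int n) =>
    offers.SelectMany(o => o.Skills.Distinct(StringComparer.OrdinalIgnoreCase))
          .GroupBy(s => s, StringComparer.OrdinalIgnoreCase)
          .Select(g => (Skill: g.Key, Count: g.Count()))
          .OrderByDescending(s => s.Count)
          .ThenBy(s => s.Skill, StringComparer.Ordinal)
          .Take(n)
          .ToList();
```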
Exploitation of real estate data:
Real estate agencies
Investors and developers
Media (especially local and regional)
Exploitation of labor market data:
job offices and agencies
Complex spatial analysis
Development of a "price attractiveness mark" and its integration with real estate listings
efficiency of various algorithms, regular expressions, databases, infrastructures
optimization of large databases for future processing
optimization of end-user interfaces linking to databases
data mining techniques
lawfulness of extensive web-data collection and database building
comparative analysis of national and international regulations
Statistics, econometrics, forecasting:
spatial analysis of data (spatial dependence, autocorrelation, spatial interpolation) with GIS software
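Spatial autocorrelation is commonly measured with the global Moran's I statistic. The slide does not say which statistic the project uses, so the following C# sketch shows one standard choice, not the author's method. It assumes a precomputed spatial weight matrix; the four districts in a row and the binary contiguity helper `Contiguity1D` are hypothetical.

```csharp
using System;

// Hypothetical example: mean price per m² in four districts lying in a row,
// where each district borders only the one before and the one after it.
double[] pricePerM2 = { 4800, 5100, 5600, 6000 };
double[,] weights = Contiguity1D(pricePerM2.Length);
Console.WriteLine($"Moran's I = {MoransI(pricePerM2, weights):F3}");

// Global Moran's I: I = (N / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
// where z_i = x_i - mean(x) and S0 is the sum of all weights.
// Positive values: neighbouring areas have similar values; negative: neighbours tend to differ.
static double MoransI(double[] x, double[,] w)
{
    int n = x.Length;
    if (w.GetLength(0) != n || w.GetLength(1) != n)
        throw new ArgumentException("Weight matrix must be n x n.");

    double mean = 0;
    foreach (double v in x) mean += v;
    mean /= n;

    double numerator = 0, denominator = 0, s0 = 0;
    for (int i = 0; i < n; i++)
    {
        double zi = x[i] - mean;
        denominator += zi * zi;
        for (int j = 0; j < n; j++)
        {
            numerator += w[i, j] * zi * (x[j] - mean);
            s0 += w[i, j];
        }
    }
    if (denominator == 0 || s0 == 0)
        throw new ArgumentException("Moran's I is undefined for constant values or empty weights.");
    return n / s0 * numerator / denominator;
}

// Binary contiguity weights for n areas in a row: w_ij = 1 when |i - j| = 1, else 0.
static double[,] Contiguity1D(int n)
{
    var w = new double[n, n];
    for (int i = 0; i + 1 < n; i++)
        w[i, i + 1] = w[i + 1, i] = 1;
    return w;
}
```

In practice the weights would come from GIS data (shared borders or distance bands between districts) rather than from a one-dimensional helper like this.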
page-turning rules (pagination)
flexibility and extensibility
optimization of data cleaning, sorting and other data operations mechanisms
data presentation and visualization interfaces for large databases (e.g. the real estate database holds ca. 1 million records per month, each with 10-30 variables depending on the type of property)
optimization of data collection algorithms
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart)
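The pagination and data-collection challenges listed above can be illustrated with a minimal C# sketch: it derives the page numbers from a total record count and fetches the pages with a capped number of parallel workers, so the collector can use several cores without putting an unbounded load on the target site. `SimulatedFetch` is a hypothetical stand-in for an HTTP request, the page size of 20 is assumed, and `Parallel.ForEach` comes from the Task Parallel Library in .NET 4 and later, not from .NET 3.5.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical inputs: a total record count read from the first results page,
// and an assumed page size of 20 records.
int recordCount = 1234, recordsPerPage = 20;
List<int> pageNumbers = PageNumbers(recordCount, recordsPerPage);
var pagesHtml = CollectPages(pageNumbers, maxThreads: 5, fetchPage: SimulatedFetch);
Console.WriteLine($"{pageNumbers.Count} pages expected, {pagesHtml.Count} collected");

// Page-turning rule: enough pages to cover every record (ceiling division), numbered from 1.
static List<int> PageNumbers(int totalRecords, int pageSize)
{
    if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
    int pageCount = (totalRecords + pageSize - 1) / pageSize;
    return Enumerable.Range(1, pageCount).ToList();
}

// Fetches pages on at most maxThreads workers at once; the cap keeps the request
// rate on the target site bounded while still using several cores.
static ConcurrentDictionary<int, string> CollectPages(
    IEnumerable<int> pages, int maxThreads, Func<int, string> fetchPage)
{
    var collected = new ConcurrentDictionary<int, string>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
    Parallel.ForEach(pages, options, page => { collected[page] = fetchPage(page); });
    return collected;
}

// Hypothetical stand-in for an HTTP request: waits briefly and returns placeholder HTML.
static string SimulatedFetch(int page)
{
    Thread.Sleep(10); // simulated network latency
    return $"<html>results page {page}</html>";
}
```

`maxThreads` is the knob that trades collection speed against load on both network connections; a per-request delay inside `fetchPage` would be a further, stricter form of politeness.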
System architecture
Basic facts:
Data collection application written in C# (.NET Framework 3.5); it is fully scalable and can benefit from multiple CPUs and multicore CPUs
Data saved in a PostgreSQL database through Npgsql
Web-based management interface written in PHP, xHTML and AJAX; the SOAP protocol is used to communicate with the server
Web-based presentation interface written in PHP, xHTML and AJAX, with Adobe Flex used in addition
4 thousand lines of code
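Each variable is extracted with a .NET regular expression, as in the patterns listed next. A minimal C# sketch of applying two of them, the record-count pattern and the land-dimensions pattern, to hypothetical HTML fragments shaped to fit them. In the dimensions pattern the character class is taken to be `[&#0-9ęóąśłżźćń;]`, which also accepts HTML numeric entities such as `&#322;` for "ł"; that reading is an assumption. Because listings write decimals with a comma, the numbers are normalized before parsing.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical HTML fragments shaped to fit the patterns; not copied from a real listing site.
const string listingPage = "<td class=\"paginateopis\" nowrap>wyniki 1 - 20 spośród 1234</td>";
const string detailPage = "<p>Wymiary działki: 20,5 x 40 m</p>";

Console.WriteLine($"records: {RecordCount(listingPage)}");
Console.WriteLine($"plot: {LandDimensions(detailPage)}");

// Record-count pattern; each '.' stands in for a quote mark or a Polish letter
// ("spo.r.d" matches "spośród"), so the match does not depend on the page encoding.
static int? RecordCount(string html)
{
    Match m = Regex.Match(html,
        @"(?<=<td class=.paginateopis. nowrap>wyniki[0-9 -]+spo.r.d )([0-9]+)");
    return m.Success ? int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture) : (int?)null;
}

// Land-dimensions pattern; group 1 only holds the alternative lookbehinds, so the two
// numbers are in groups 2 and 4 and the separator ("x", "na", ...) is in group 3.
static (double First, double Second)? LandDimensions(string text)
{
    const string pattern =
        @"((?<=Wymiary dzia[&#0-9ęóąśłżźćń;]+ki:[a-z\s,.-]+)|(?<=wymiary[ ]*)|(?<=kszta[&#0-9ęóąśłżźćń;]+t:[a-z&#0-9ęóąśłżźćń;.\s]+)|(?<=na[ ]*)|(?<=wa[ ]*)|(?<=wy[ ]*)|(?<=wymiarach ok.[\s]*))([0-9,.]+)([-mxszer.dl/\s]+|[na\s]+)([0-9,.]+)";
    Match m = Regex.Match(text, pattern);
    if (!m.Success) return null;
    return (ParseNumber(m.Groups[2].Value), ParseNumber(m.Groups[4].Value));
}

// Listings use a decimal comma ("20,5"); normalise it to a point before parsing.
static double ParseNumber(string s) =>
    double.Parse(s.Replace(',', '.'), NumberStyles.Float, CultureInfo.InvariantCulture);
```

.NET supports variable-length lookbehind, which lets these patterns anchor on label text such as "Wymiary działki:" without consuming it; some other engines, such as Python's `re`, accept only fixed-width lookbehind, so the patterns would need capture groups there instead.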
getting number of records: @"(?<=<td class=.paginateopis. nowrap>wyniki[0-9 -]+spo.r.d )([0-9]+)"
getting links of records: @"(?<=<div class=.line0.><a href=./details,[0-9]+,)([0-9]+)"
getting street name: @"((?<=ulica:.)[\w\s-]+)"
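getting house type (blok: block of flats, kamienica: tenement house, dom wolnostojący: detached house, apartamentowiec: apartment building, wieżowiec: high-rise, inny: other): @"(?<=<div class=.opis.>Podstawowe informacje</div>[\w\s.,-]+)(blok|kamienica|dom wolnostoj.cy|apartamentowiec|wie.owiec|inny?)"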
getting dimensions of the land: @"((?<=Wymiary dzia[&#0-9ęóąśłżźćń;]+ki:[a-z\s,.-]+)|(?<=wymiary[ ]*)|(?<=kszta[&#0-9ęóąśłżźćń;]+t:[a-z&#0-9ęóąśłżźćń;.\s]+)|(?<=na[ ]*)|(?<=wa[ ]*)|(?<=wy[ ]*)|(?<=wymiarach ok.[\s]*))([0-9,.]+)([-mxszer.dl/\s]+|[na\s]+)([0-9,.]+)"
http://ig.wsiz.pl
Other promising areas of application:
Business entities database
Online auction websites (e.g. a used-car price index, automatic valuation of cars based on their features)
Social networks and other e-communities websites
Automatic sentiment analysis